Jupyter AI Magics Are Not ✨Magic✨

It doesn’t take ✨magic✨ to integrate ChatGPT into your Jupyter workflow.

New tools to help write code using large language models (LLMs) are seemingly coming out daily, and they appear to have a considerable positive impact on developers’ lives. GitHub claims 88% of developers feel more productive when using GitHub Copilot, and that they complete tasks more than twice as fast. In addition to proprietary tools like Copilot, there are open-source integrations with various LLM providers (OpenAI, Anthropic, Hugging Face Hub, etc.), like the Jupyter AI extension for JupyterLab, that can provide significant gains.

The Jupyter AI extension is featureful, with a custom chat interface, Q&A about your notebook, a vector database for querying local data, and notebook magics. All these features can feel like magic, but they’re not. Some of them are even pretty simple, like the notebook magics.

Making something that works is often surprisingly easy (the devil is in the details of making it robust and flexible). So I’d like to show you the internals of building your own AI magic command.

At the core, there are two things we need to know: how to create a magic command, and how to send a prompt to the LLM provider.

Creating a magic command requires us to decorate a function with @register_cell_magic:

from IPython.core.magic import register_cell_magic

@register_cell_magic
def llm(line, cell):
    # Do something
    pass

That’s it. It’s sufficient to make %%llm a cell magic command.
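
A note on the two arguments: IPython passes whatever follows %%llm on the first line as line, and the body of the cell as cell. A quick illustration (the --verbose flag is hypothetical; our magic ignores line entirely):

# Given this cell:
#
#     %%llm --verbose
#     Compute a moving average.
#
# IPython calls the magic as:
#     llm(line="--verbose", cell="Compute a moving average.\n")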

To interact with the LLM provider, we’ll use LangChain, but we could use the provider’s package directly, like OpenAI’s Python package. We instantiate our chat object and create the sequence of messages to send to the model. The SystemMessage tells the model what we want it to do (you could do some prompt engineering to, say, adjust the verbosity of the response). The HumanMessage contains the instructions for the code we want it to write.

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

model = ChatOpenAI(
    openai_api_key="YOUR-OPENAI-API-KEY",
    model="gpt-3.5-turbo",
)

# `cell` is the body of the %%llm cell, available inside the magic function
messages = [
    SystemMessage(content="""
        You're an experienced Python programmer.
        Write code that does the task below.
        Return only the code."""
    ),
    HumanMessage(content=cell)
]

Then there’s actually a third part: inserting a new cell in the notebook with the code the model generated. By default, a magic function just returns a value; to insert a cell instead, we have to dig a little into the internals of IPython and use its payload manager:

from IPython import get_ipython

ip = get_ipython()
# The 'set_next_input' payload asks the frontend to create a new input
# cell containing `text`, without replacing or executing anything
ip.payload_manager.write_payload(
    dict(source='set_next_input', text=code, replace=False, execute=False)
)
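
If you’d rather not touch the payload manager directly, IPython also exposes a small convenience method that, as far as I can tell, writes the same payload:

ip = get_ipython()
ip.set_next_input(code, replace=False)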

Putting all the pieces together, we get our full generative AI IPython cell magic command:

from IPython.core.magic import register_cell_magic
from IPython import get_ipython
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

@register_cell_magic
def llm(line, cell):
    model = ChatOpenAI(
        openai_api_key="YOUR-OPENAI-API-KEY",
        model="gpt-3.5-turbo",
    )

    messages = [
        SystemMessage(content="""
            You're an experienced Python programmer.
            Write code that does the task below.
            Return only the code."""
        ),
        HumanMessage(content=cell)
    ]

    response = model.predict_messages(messages)

    # Remove the Markdown code fences the model often wraps around its answer
    code = response.content.strip()
    if code.startswith("```"):
        code = code.removeprefix("```python").removeprefix("```")
        code = code.removesuffix("```").strip()

    # Ask the frontend to insert the generated code as a new cell
    ip = get_ipython()
    ip.payload_manager.write_payload(
        dict(source='set_next_input', text=code, replace=False, execute=False)
    )

Once you’ve defined the function above, you can use it in a new cell with the %%llm cell magic.

%%llm
A numpy array of time samples from 0 to 2 seconds in 1 ms increment.
x that's the sine of the time.
A line plot of the result, with a purple line.

The prompt above generated the code below. Be aware that the model will probably produce slightly different code every time you execute the cell; you can adjust how much it varies with the temperature argument of the ChatOpenAI class.

import numpy as np
import matplotlib.pyplot as plt

# Generate time samples from 0 to 2 seconds with 1 ms increment
time = np.arange(0, 2, 0.001)

# Compute the sine of the time samples
x = np.sin(time)

# Plot the result
plt.plot(time, x, color='purple')
plt.show()
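
For instance, setting a low temperature makes the output more deterministic (0 is the most deterministic; higher values yield more varied completions):

model = ChatOpenAI(
    openai_api_key="YOUR-OPENAI-API-KEY",
    model="gpt-3.5-turbo",
    temperature=0,  # less varied, more reproducible code
)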

The next step would be to package it up as an IPython extension we can reuse across notebooks rather than copy-pasting it from notebook to notebook.
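
Here’s a rough sketch of what that could look like (the module name llm_magic is my invention; load_ipython_extension is the standard hook IPython looks for):

# llm_magic.py

def llm(line, cell):
    ...  # the function we defined above

def load_ipython_extension(ipython):
    # Called when you run %load_ext llm_magic
    ipython.register_magic_function(llm, magic_kind="cell")

With that file on your Python path, running %load_ext llm_magic makes %%llm available in any notebook.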

Because it’s your extension, you can modify it to suit your needs. You could replace the call to OpenAI with a local Code Llama model using ctransformers. Or you could parametrize the prompt to support different programming languages, or use the IPython object to collect code from previous cells and inject it as context into the HumanMessage, as sketched below.
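
Here’s a hedged sketch of that last idea, using the In input history that IPython keeps in the user namespace (the five-cell window is an arbitrary choice):

@register_cell_magic
def llm(line, cell):
    ip = get_ipython()
    # `In` is IPython's running list of all inputs executed so far
    context = "\n\n".join(ip.user_ns.get("In", [])[-5:])
    messages = [
        SystemMessage(content="""
            You're an experienced Python programmer.
            Write code that does the task below.
            Return only the code."""
        ),
        HumanMessage(content=f"Context from previous cells:\n{context}\n\nTask:\n{cell}")
    ]
    ...  # the rest is unchanged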

Using other people’s tools is often the pragmatic solution.

But sometimes, building your own tools is the only way to learn (that there is no ✨magic✨).

___________________________

Author: Alexandre Chabot-Leclerc, Vice President, Digital Transformation Solutions, holds a Ph.D. in electrical engineering and an M.Sc. in acoustics engineering from the Technical University of Denmark and a B.Eng. in electrical engineering from the Université de Sherbrooke. He is passionate about transforming people and the work they do. He has taught the scientific Python stack and machine learning to hundreds of scientists, engineers, and analysts at the world’s largest corporations and national laboratories. After seven years in Denmark, Alexandre is totally sold on commuting by bicycle. If you have any free time you’d like to fill, ask him for a book, music, podcast, or restaurant recommendation.
