Jupyter AI Magics Are Not ✨Magic✨
Integrating ChatGPT into your Jupyter workflow doesn’t have to be magic.
New tools are seemingly coming out daily to help write code using large language models (LLMs). They appear to have a considerable positive impact on developers’ lives. GitHub claims 88% of developers feel more productive when using GitHub Copilot, and that they’re more than twice as fast. In addition to proprietary tools like Copilot, there are open-source integrations with various LLM providers (OpenAI, Anthropic, Hugging Face Hub, etc.), like the Jupyter AI extension for JupyterLab, that can provide significant gains.
The Jupyter AI extension is featureful, with a custom chat interface, Q&A about your notebook, a vector database to query local data, and notebook magics. All these features can feel like magic, but they’re not. Sometimes they’re even pretty simple, like the notebook magics.
Making something that works is often surprisingly easy (the devil is in the details of making it robust and flexible). So, I’d like to show you the internals of making your own AI magic command.
At the core, there are two things we need to know:
- How to create new magic commands, and
- How to interact with the LLM provider.
Creating a magic command requires us to decorate a function with `@register_cell_magic`:

```python
from IPython.core.magic import register_cell_magic


@register_cell_magic
def llm(line, cell):
    # Do something
    pass
```
That’s it. That’s enough to make `%%llm` a cell magic command.
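To see what the decorated function receives, here’s a small sketch (my own illustration, not part of the final code): IPython passes everything after `%%llm` on the first line as `line` and the rest of the cell body as `cell`.

```python
def llm(line, cell):
    # `line` is the text after `%%llm` on the magic line;
    # `cell` is the rest of the cell body as a single string.
    return line, cell

# Simulating the arguments IPython would pass for a cell that starts
# with `%%llm --verbose` followed by `Plot a sine wave.`:
line, cell = llm("--verbose", "Plot a sine wave.\n")
```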
To interact with the LLM provider, we’ll use LangChain, but we could use the provider’s package directly, like OpenAI’s Python package. We instantiate our chat object and create the sequence of messages to send to the model. The SystemMessage tells the model what we want it to do (you could do some prompt engineering to, say, adjust the verbosity of the response). The HumanMessage contains the instructions for the code we want it to write.
```python
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

model = ChatOpenAI(
    openai_api_key="YOUR-OPENAI-API-KEY",
    model="gpt-3.5-turbo",
)
messages = [
    SystemMessage(content="""
        You're an experienced Python programmer.
        Write code that does the task below.
        Return only the code."""
    ),
    HumanMessage(content=cell),
]
```
Then there’s actually a third part: inserting a new cell into the notebook with the code the model generated. By default, magic functions just return a value. To insert a cell instead, we have to dig a little into the internals of IPython and use its payload manager (IPython’s `ip.set_next_input(code)` method is a convenience wrapper around the same payload):
```python
from IPython import get_ipython

ip = get_ipython()
ip.payload_manager.write_payload(
    dict(source='set_next_input', text=code, replace=False, execute=False)
)
```
Putting all the pieces of code together, we get our full generative AI IPython cell magic command:

```python
from IPython.core.magic import register_cell_magic
from IPython import get_ipython
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage


@register_cell_magic
def llm(line, cell):
    model = ChatOpenAI(
        openai_api_key="YOUR-OPENAI-API-KEY",
        model="gpt-3.5-turbo",
    )
    messages = [
        SystemMessage(content="""
            You're an experienced Python programmer.
            Write code that does the task below.
            Return only the code."""
        ),
        HumanMessage(content=cell),
    ]
    response = model.predict_messages(messages)
    # Remove the Markdown code fences the model tends to wrap its answer in
    code = response.content.strip('```python').strip('`').strip()
    # Insert the generated code as the next cell
    ip = get_ipython()
    ip.payload_manager.write_payload(
        dict(source='set_next_input', text=code, replace=False, execute=False)
    )
```
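One caveat about the fence-stripping line above: `str.strip` removes individual characters, not a prefix, so it can eat code that happens to start or end with those characters. A slightly more robust sketch uses a regular expression to pull the code out of the Markdown fence (the `extract_code` helper is my own name, not part of the original):

```python
import re


def extract_code(response_text: str) -> str:
    """Return the code inside a Markdown fence, or the raw text if none."""
    match = re.search(r"```(?:python)?\s*\n(.*?)```", response_text, re.DOTALL)
    if match:
        return match.group(1).strip()
    return response_text.strip()
```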
Once you’ve defined the function above, you can use it in a new cell with the `%%llm` cell magic.

```
%%llm
A numpy array of time samples from 0 to 2 seconds in 1 ms increment.
x that's the sine of the time.
A line plot of the result, with a purple line.
```
The prompt above generated the code below. But be aware that it’ll probably generate slightly different code every time you execute the cell. You can adjust how much it varies using the `temperature` argument of the `ChatOpenAI` class (lower values make the output more deterministic).
```python
import numpy as np
import matplotlib.pyplot as plt

# Generate time samples from 0 to 2 seconds with 1 ms increment
time = np.arange(0, 2, 0.001)

# Compute the sine of the time samples
x = np.sin(time)

# Plot the result
plt.plot(time, x, color='purple')
plt.show()
```
The next step would be to package it up as an IPython extension we can load in any notebook, rather than copy-pasting it from notebook to notebook.
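As a sketch of what that packaging could look like: an IPython extension is just a module that defines `load_ipython_extension`. Assuming we save the magic in a file called `llm_magic.py` (a hypothetical name), `%load_ext llm_magic` would register it:

```python
# llm_magic.py -- hypothetical module name for the packaged extension


def llm(line, cell):
    # ... the magic function defined above ...
    pass


def load_ipython_extension(ipython):
    # Called by `%load_ext llm_magic`; registers `llm` as a cell magic
    ipython.register_magic_function(llm, magic_kind="cell")
```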
Because it’s your extension, you can modify it to suit your needs. You could replace the call to OpenAI with a local Code Llama model using ctransformers. Or you could parametrize the prompt to support different programming languages, or use the IPython object to collect code from previous cells and inject it as context into the HumanMessage.
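For instance, the target language could come from the magic’s `line` argument, so `%%llm rust` would ask for Rust. A minimal sketch of such a prompt builder, using plain tuples instead of LangChain’s message classes to keep it self-contained (`build_prompt` is a hypothetical helper, not part of the code above):

```python
def build_prompt(line, cell):
    # Default to Python when no language is given on the magic line
    language = line.strip() or "Python"
    system = (
        f"You're an experienced {language} programmer. "
        "Write code that does the task below. "
        "Return only the code."
    )
    return [("system", system), ("human", cell)]
```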
Using other people’s tools is often the pragmatic solution.
But sometimes, building your own tools is the only way to learn (that there is no ✨magic✨).
___________________________
Author: Alexandre Chabot-Leclerc, Vice President, Digital Transformation Solutions, holds a Ph.D. in electrical engineering and an M.Sc. in acoustics engineering from the Technical University of Denmark and a B.Eng. in electrical engineering from the Université de Sherbrooke. He is passionate about transforming people and the work they do. He has taught the scientific Python stack and machine learning to hundreds of scientists, engineers, and analysts at the world’s largest corporations and national laboratories. After seven years in Denmark, Alexandre is totally sold on commuting by bicycle. If you have any free time you’d like to fill, ask him for a book, music, podcast, or restaurant recommendation.