Using Ollama with Python
Ollama provides a Python SDK that allows you to interact with locally running models directly from your Python environment. This SDK makes it easy to integrate natural language processing tasks into your Python projects, enabling operations like text generation, conversational AI, and model management—all without the need for manual command-line interactions.
Installing the Python SDK
To get started, you’ll need to install the Ollama Python SDK. You can do this using pip:
```bash
pip install ollama
```
Make sure you have Python 3.x installed and that your environment can access the Ollama local service.
Starting the Local Service
Before using the Python SDK, ensure that the Ollama local service is up and running. You can start it using the command line:
```bash
ollama serve
```
Once the local service is running, the Python SDK will communicate with it to perform tasks like model inference.
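If you want to confirm that the service is reachable before running any SDK code, a quick sketch (assuming the default address http://localhost:11434) is to request the server's root endpoint, which replies with a short status message:

```python
import urllib.request

# Assumes the default local service address; a running server typically answers "Ollama is running"
with urllib.request.urlopen('http://localhost:11434') as resp:
    print(resp.read().decode())
```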
Using the Ollama Python SDK for Inference
After installing the SDK and starting the local service, you can interact with Ollama using Python code. Here’s how:
- Import the necessary modules:
```python
from ollama import chat, ChatResponse
```
- Send requests to a specified model to generate text or dialogue:
```python
response: ChatResponse = chat(model='deepseek-r1:1.5b', messages=[
    {
        'role': 'user',
        'content': 'Who are you?',
    },
])

# Print the response content
print(response['message']['content'])

# Alternatively, access the response object directly
# print(response.message.content)
```
When you run this code, the output might look like this:
```
Greetings! I'm DeepSeek-R1, an artificial intelligence assistant created by DeepSeek. I'm at your service and would be delighted to assist you with any inquiries or tasks you may have.
```
Streaming Responses
The Ollama SDK also supports streaming responses, which you enable by setting stream=True when sending a request.
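A minimal sketch, reusing the deepseek-r1:1.5b model from the earlier example, prints each chunk as soon as it arrives:

```python
from ollama import chat

stream = chat(
    model='deepseek-r1:1.5b',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

# With stream=True, chat returns an iterator of partial messages
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
```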
Custom Clients
For more control over request configurations, such as custom headers or specifying the local service URL, you can create a custom client with the Client class.
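A sketch, assuming the default local service address http://localhost:11434; the extra header is purely illustrative:

```python
from ollama import Client

# Point the client at the local service and attach any custom headers you need
client = Client(
    host='http://localhost:11434',
    headers={'x-some-header': 'some-value'},  # illustrative header
)

response = client.chat(model='deepseek-r1:1.5b', messages=[
    {'role': 'user', 'content': 'Why is the sky blue?'},
])
print(response['message']['content'])
```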
Asynchronous Clients
If you need to handle requests asynchronously, you can use the AsyncClient class, which is ideal for high-concurrency scenarios.
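A sketch built on asyncio; the model and prompt mirror the earlier examples:

```python
import asyncio
from ollama import AsyncClient

async def main():
    message = {'role': 'user', 'content': 'Why is the sky blue?'}
    # Await the chat call instead of blocking the event loop
    response = await AsyncClient().chat(model='deepseek-r1:1.5b', messages=[message])
    print(response['message']['content'])

asyncio.run(main())
```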
Asynchronous Streaming
For asynchronous streaming responses, pass stream=True to the asynchronous client; the awaited call then yields its output as an asynchronous generator.
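A sketch along the same lines as the synchronous streaming example above, again with a placeholder prompt:

```python
import asyncio
from ollama import AsyncClient

async def main():
    message = {'role': 'user', 'content': 'Why is the sky blue?'}
    # stream=True turns the awaited call into an async generator of chunks
    async for part in await AsyncClient().chat(
        model='deepseek-r1:1.5b',
        messages=[message],
        stream=True,
    ):
        print(part['message']['content'], end='', flush=True)

asyncio.run(main())
```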
Here, the response is returned in parts asynchronously, allowing you to process each part immediately.
Common API Methods
The Ollama Python SDK provides several useful API methods for managing and interacting with models:
- Chat: Generate conversational responses.

```python
ollama.chat(model='deepseek-r1:1.5b', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
```

- Generate: Generate text based on a prompt.

```python
ollama.generate(model='deepseek-r1:1.5b', prompt='Why is the sky blue?')
```

- List: List all locally available models.

```python
ollama.list()
```

- Show: Display details about a specific model.

```python
ollama.show('deepseek-r1:1.5b')
```

- Create: Create a new model from an existing one.

```python
ollama.create(model='example', from_='deepseek-r1:1.5b', system="You are Mario from Super Mario Bros.")
```

- Copy: Copy a model under a new name.

```python
ollama.copy('deepseek-r1:1.5b', 'user/deepseek-r1:1.5b')
```

- Delete: Delete a model.

```python
ollama.delete('deepseek-r1:1.5b')
```

- Pull: Download a model from a remote repository.

```python
ollama.pull('deepseek-r1:1.5b')
```

- Push: Upload a model to a remote repository.

```python
ollama.push('user/deepseek-r1:1.5b')
```

- Embed: Generate text embeddings.

```python
ollama.embed(model='deepseek-r1:1.5b', input='The sky is blue because of Rayleigh scattering')
```

- Ps: List currently running models.

```python
ollama.ps()
```
Error Handling
The Ollama SDK raises errors when a request fails or a streaming problem occurs. You can handle these errors with try-except blocks.
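A sketch that deliberately requests a model that has not been pulled (the model name here is a placeholder) and pulls it when the server reports a 404:

```python
import ollama

model = 'does-not-yet-exist'  # placeholder model name

try:
    ollama.chat(model=model, messages=[{'role': 'user', 'content': 'Hello'}])
except ollama.ResponseError as e:
    print('Error:', e.error)
    if e.status_code == 404:
        # The model is not available locally, so download it
        ollama.pull(model)
```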
In this example, if the model doesn’t exist, a ResponseError is raised, and you can choose to pull the model or handle the error accordingly.