Interacting with the Ollama API
Ollama provides an HTTP-based API that allows developers to programmatically interact with its models. This guide will walk you through the detailed usage of the Ollama API, including request formats, response formats, and example code.
Starting the Ollama Service
Before using the API, ensure the Ollama service is running. You can start it with the following command:
```
ollama serve
```
By default, the service runs at http://localhost:11434, and all API endpoints are relative to this base URL.
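If you want to verify the service is reachable before calling the API, the root endpoint returns a short plain-text status message. A minimal check in Python (assuming the requests library is installed and the default port is used):

```python
import requests

# The root endpoint responds with a plain-text status when the server is up.
resp = requests.get("http://localhost:11434")
resp.raise_for_status()
print(resp.text)  # e.g. "Ollama is running"
```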
Conventions
Model Names
Model names follow a model:tag format. The model part can include an optional namespace, like example/model. For instance, deepseek-r1:14b and llama3.2:1b are valid names. The tag is optional and defaults to latest if not specified; it is used to pinpoint a specific version of the model.
Durations
All durations are measured and returned in nanoseconds.
Streaming Responses
Some endpoints stream responses as JSON objects. You can disable streaming by passing {"stream": false} in the request for these endpoints.
API Endpoints
Ollama offers several key API endpoints:
Generate Text
Sends a prompt to the model and retrieves the generated text.
HTTP Method: POST
URL: /api/generate
Parameters:
- model: (required) the model name
- prompt: the prompt to generate a response for
- suffix: the text after the model response
- images: (optional) a list of base64-encoded images (for multimodal models such as llava)
Advanced Parameters (Optional):
- format: the format to return a response in. Format can be json or a JSON schema
- options: additional model parameters listed in the documentation for the Modelfile, such as temperature
- system: system message (overrides what is defined in the Modelfile)
- template: the prompt template to use (overrides what is defined in the Modelfile)
- stream: if false, the response will be returned as a single response object rather than a stream of objects
- raw: if true, no formatting will be applied to the prompt. You may choose to use the raw parameter if you are specifying a full templated prompt in your request to the API
- keep_alive: controls how long the model will stay loaded in memory following the request (default: 5m)
- context (deprecated): the context parameter returned from a previous request to /generate; this can be used to keep a short conversational memory
Request Format:
```
{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}
```
Response Format:
```
{
  "model": "llama3.2",
  "created_at": "2024-01-01T12:00:00.000000Z",
  "response": "The",
  "done": false
}
```
Important
When using the JSON format, it's important to instruct the model to use JSON in the prompt. Otherwise, the model may generate large amounts of whitespace.
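For instance, a JSON-mode request from Python might look like the following (a minimal sketch; the model name llama3.2 is only an example and must already be pulled locally):

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",  # example model; use any model you have pulled
        "prompt": "List three primary colors. Respond using JSON.",
        "format": "json",     # constrain the output to valid JSON
        "stream": False,
    },
)
print(resp.json()["response"])
```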
Request Example:
```
curl http://localhost:11434/api/generate --data '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'
```
Response (Success):
A stream of JSON objects is returned:
```
{
  "model": "llama3.2",
  "created_at": "2024-01-01T12:00:00.000000Z",
  "response": "The",
  "done": false
}
```
The final response in the stream also includes additional data about the generation:
- total_duration: time spent generating the response
- load_duration: time spent in nanoseconds loading the model
- prompt_eval_count: number of tokens in the prompt
- prompt_eval_duration: time spent in nanoseconds evaluating the prompt
- eval_count: number of tokens in the response
- eval_duration: time in nanoseconds spent generating the response
- context: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory
- response: empty if the response was streamed; if not streamed, this will contain the full response
To calculate how fast the response is generated in tokens per second (tokens/s), compute eval_count / eval_duration * 10^9.
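As a worked example, the calculation from a non-streamed response might look like this (a sketch; the model name is illustrative):

```python
import requests

data = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": False},
).json()

# eval_duration is in nanoseconds, so scale by 10^9 to get tokens per second.
tokens_per_second = data["eval_count"] / data["eval_duration"] * 1e9
print(f"{tokens_per_second:.1f} tokens/s")
```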
Chat
Supports multi-turn conversations, with the model retaining context.
HTTP Method: POST
URL: /api/chat
Parameters:
- model: (required) the model name
- messages: the messages of the chat; this can be used to keep a chat memory
- tools: a list of tools in JSON for the model to use, if supported
The message object has the following fields:
- role: the role of the message, either system, user, assistant, or tool
- content: the content of the message
- images (optional): a list of images to include in the message (for multimodal models such as llava)
- tool_calls (optional): a list of tools in JSON that the model wants to use
Advanced parameters (optional):
- format: the format to return a response in. Format can be json or a JSON schema
- options: additional model parameters listed in the documentation for the Modelfile, such as temperature
- stream: if false, the response will be returned as a single response object rather than a stream of objects
- keep_alive: controls how long the model will stay loaded in memory following the request (default: 5m)
Request Format:
```
{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ]
}
```
Response Format:
```
{
  "model": "llama3.2",
  "created_at": "2024-01-01T12:00:00.000000Z",
  "message": {
    "role": "assistant",
    "content": "The sky is blue because..."
  },
  "done": true
}
```
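To actually retain context across turns, the client appends each assistant reply to the messages list before the next request. A minimal sketch in Python (non-streamed; the model name is illustrative):

```python
import requests

messages = [{"role": "user", "content": "Why is the sky blue?"}]

# First turn: the model answers the initial question.
reply = requests.post(
    "http://localhost:11434/api/chat",
    json={"model": "llama3.2", "messages": messages, "stream": False},
).json()["message"]

# Append the assistant's reply so the second turn keeps the context.
messages.append(reply)
messages.append({"role": "user", "content": "Explain that more simply."})

reply = requests.post(
    "http://localhost:11434/api/chat",
    json={"model": "llama3.2", "messages": messages, "stream": False},
).json()["message"]
print(reply["content"])
```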
List Local Models
Lists all locally downloaded models.
HTTP Method: GET
URL: /api/tags
Response Format:
```
{
  "models": [
    {
      "name": "llama3.2:latest",
      "modified_at": "2024-01-01T12:00:00.000000Z",
      "size": 2019393189,
      "digest": "sha256:..."
    }
  ]
}
```
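For example, you can print the name and size of each local model (a minimal sketch using requests):

```python
import requests

data = requests.get("http://localhost:11434/api/tags").json()
for model in data["models"]:
    # size is reported in bytes
    print(model["name"], f"{model['size'] / 1e9:.1f} GB")
```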
Pull a Model
Downloads a model from the model repository.
HTTP Method: POST
URL: /api/pull
Parameters:
- model: name of the model to pull
- insecure: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.
- stream: (optional) if false, the response will be returned as a single response object rather than a stream of objects
Request Format:
```
{
  "model": "llama3.2"
}
```
Response Format:
```
{"status": "pulling manifest"}
{"status": "downloading sha256:...", "digest": "sha256:...", "total": 2019393189, "completed": 104857600}
{"status": "verifying sha256 digest"}
{"status": "success"}
```
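Since the pull endpoint streams status objects, a client can report download progress as the layers arrive. A sketch in Python (the total and completed fields only appear on downloading status lines):

```python
import json
import requests

with requests.post(
    "http://localhost:11434/api/pull",
    json={"model": "llama3.2"},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        status = json.loads(line)
        if "total" in status and "completed" in status:
            print(f"{status['status']}: {status['completed'] / status['total']:.0%}")
        else:
            print(status["status"])
```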
Usage Examples
Generate Text
Using curl
to send a request:
```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```
Multi-turn Chat
Using curl
to send a request:
```
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ],
  "stream": false
}'
```
List Local Models
Using curl
to send a request:
```
curl http://localhost:11434/api/tags
```
Pull a Model
Using curl
to send a request:
```
curl http://localhost:11434/api/pull -d '{
  "model": "llama3.2"
}'
```
Streaming Responses
Ollama supports streaming responses, which is useful for real-time text generation.
Enabling Streaming
Set "stream": true
in the request to receive responses line by line.
Example:
```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": true
}'
```
Response Format
Each line returns a JSON object:
```
{
  "model": "llama3.2",
  "created_at": "2024-01-01T12:00:00.000000Z",
  "response": "The",
  "done": false
}
```
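On the client side, the stream can be consumed line by line, printing each fragment as it arrives (a minimal sketch using requests; the model name is illustrative):

```python
import json
import requests

with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2", "prompt": "Why is the sky blue?"},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each object carries a fragment of the text until done is true.
        print(chunk["response"], end="", flush=True)
        if chunk["done"]:
            break
print()
```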
Programming Language Examples
Python (using the requests library)
Generate Text:
```python
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": "Why is the sky blue?",
        "stream": False,
    },
)
print(response.json()["response"])
```
Multi-turn Chat:
```python
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "messages": [
            {"role": "user", "content": "Why is the sky blue?"}
        ],
        "stream": False,
    },
)
print(response.json()["message"]["content"])
```
JavaScript (using the fetch API)
Generate Text:
```javascript
fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.2",
    prompt: "Why is the sky blue?",
    stream: false,
  }),
})
  .then((res) => res.json())
  .then((data) => console.log(data.response));
```
Multi-turn Chat:
```javascript
fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.2",
    messages: [{ role: "user", content: "Why is the sky blue?" }],
    stream: false,
  }),
})
  .then((res) => res.json())
  .then((data) => console.log(data.message.content));
```