跳到主要内容

API

Endpoints

Conventions

Model names

Model names follow a model:tag format, where model can have an optional namespace such as example/model. Some examples are orca-mini:3b-q8_0 and llama3:70b. The tag is optional and, if not provided, will default to latest. The tag is used to identify a specific version.

Durations

All durations are returned in nanoseconds.

Streaming responses

Certain endpoints stream responses as JSON objects. Streaming can be disabled by providing {"stream": false} for these endpoints.

Generate a completion

POST /api/generate

Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.

Parameters

  • model: (required) the model name
  • prompt: the prompt to generate a response for
  • suffix: the text after the model response
  • images: (optional) a list of base64-encoded images (for multimodal models such as llava)

Advanced parameters (optional):

  • format: the format to return a response in. Format can be json or a JSON schema
  • options: additional model parameters listed in the documentation for the Modelfile such as temperature
  • system: system message to (overrides what is defined in the Modelfile)
  • template: the prompt template to use (overrides what is defined in the Modelfile)
  • stream: if false the response will be returned as a single response object, rather than a stream of objects
  • raw: if true no formatting will be applied to the prompt. You may choose to use the raw parameter if you are specifying a full templated prompt in your request to the API
  • keep_alive: controls how long the model will stay loaded into memory following the request (default: 5m)
  • context (deprecated): the context parameter returned from a previous request to /generate, this can be used to keep a short conversational memory

Structured outputs

Structured outputs are supported by providing a JSON schema in the format parameter. The model will generate a response that matches the schema. See the structured outputs example below.

JSON mode

Enable JSON mode by setting the format parameter to json. This will structure the response as a valid JSON object. See the JSON mode example below.

[!IMPORTANT] It's important to instruct the model to use JSON in the prompt. Otherwise, the model may generate large amounts whitespace.

Examples

Generate request (Streaming)

Request
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?"
}'
Response

A stream of JSON objects is returned:

{
"model": "llama3.2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"response": "The",
"done": false
}

The final response in the stream also includes additional data about the generation:

  • total_duration: time spent generating the response
  • load_duration: time spent in nanoseconds loading the model
  • prompt_eval_count: number of tokens in the prompt
  • prompt_eval_duration: time spent in nanoseconds evaluating the prompt
  • eval_count: number of tokens in the response
  • eval_duration: time in nanoseconds spent generating the response
  • context: an encoding of the conversation used in this response, this can be sent in the next request to keep a conversational memory
  • response: empty if the response was streamed, if not streamed, this will contain the full response

To calculate how fast the response is generated in tokens per second (token/s), divide eval_count / eval_duration * 10^9.

{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "",
"done": true,
"context": [
1,
2,
3
],
"total_duration": 10706818083,
"load_duration": 6338219291,
"prompt_eval_count": 26,
"prompt_eval_duration": 130079000,
"eval_count": 259,
"eval_duration": 4232710000
}

Request (No streaming)

Request

A response can be received in one reply when streaming is off.

curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false
}'
Response

If stream is set to false, the response will be a single JSON object:

{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
"context": [
1,
2,
3
],
"total_duration": 5043500667,
"load_duration": 5025959,
"prompt_eval_count": 26,
"prompt_eval_duration": 325953000,
"eval_count": 290,
"eval_duration": 4709213000
}

Request (with suffix)

Request
curl http://localhost:11434/api/generate -d '{
"model": "codellama:code",
"prompt": "def compute_gcd(a, b):",
"suffix": " return result",
"options": {
"temperature": 0
},
"stream": false
}'
Response
{
"model": "codellama:code",
"created_at": "2024-07-22T20:47:51.147561Z",
"response": "\n if a == 0:\n return b\n else:\n return compute_gcd(b % a, a)\n\ndef compute_lcm(a, b):\n result = (a * b) / compute_gcd(a, b)\n",
"done": true,
"done_reason": "stop",
"context": [
],
"total_duration": 1162761250,
"load_duration": 6683708,
"prompt_eval_count": 17,
"prompt_eval_duration": 201222000,
"eval_count": 63,
"eval_duration": 953997000
}

Request (Structured outputs)

Request
curl -X POST http://localhost:11434/api/generate -H "Content-Type: application/json" -d '{
"model": "llama3.1:8b",
"prompt": "Ollama is 22 years old and is busy saving the world. Respond using JSON",
"stream": false,
"format": {
"type": "object",
"properties": {
"age": {
"type": "integer"
},
"available": {
"type": "boolean"
}
},
"required": [
"age",
"available"
]
}
}'
Response
{
"model": "llama3.1:8b",
"created_at": "2024-12-06T00:48:09.983619Z",
"response": "{\n \"age\": 22,\n \"available\": true\n}",
"done": true,
"done_reason": "stop",
"context": [
1,
2,
3
],
"total_duration": 1075509083,
"load_duration": 567678166,
"prompt_eval_count": 28,
"prompt_eval_duration": 236000000,
"eval_count": 16,
"eval_duration": 269000000
}

Request (JSON mode)

[!IMPORTANT] When format is set to json, the output will always be a well-formed JSON object. It's important to also instruct the model to respond in JSON.

Request
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "What color is the sky at different times of the day? Respond using JSON",
"format": "json",
"stream": false
}'
Response
{
"model": "llama3.2",
"created_at": "2023-11-09T21:07:55.186497Z",
"response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
"done": true,
"context": [
1,
2,
3
],
"total_duration": 4648158584,
"load_duration": 4071084,
"prompt_eval_count": 36,
"prompt_eval_duration": 439038000,
"eval_count": 180,
"eval_duration": 4196918000
}

The value of response will be a string containing JSON similar to:

{
"morning": {
"color": "blue"
},
"noon": {
"color": "blue-gray"
},
"afternoon": {
"color": "warm gray"
},
"evening": {
"color": "orange"
}
}

Request (with images)

To submit images to multimodal models such as llava or bakllava, provide a list of base64-encoded images:

Request

curl http://localhost:11434/api/generate -d '{
"model": "llava",
"prompt":"What is in this picture?",
"stream": false,
"images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
}'

Response

{
"model": "llava",
"created_at": "2023-11-03T15:36:02.583064Z",
"response": "A happy cartoon character, which is cute and cheerful.",
"done": true,
"context": [
1,
2,
3
],
"total_duration": 2938432250,
"load_duration": 2559292,
"prompt_eval_count": 1,
"prompt_eval_duration": 2195557000,
"eval_count": 44,
"eval_duration": 736432000
}

Request (Raw Mode)

In some cases, you may wish to bypass the templating system and provide a full prompt. In this case, you can use the raw parameter to disable templating. Also note that raw mode will not return a context.

Request
curl http://localhost:11434/api/generate -d '{
"model": "mistral",
"prompt": "[INST] why is the sky blue? [/INST]",
"raw": true,
"stream": false
}'

Request (Reproducible outputs)

For reproducible outputs, set seed to a number:

Request
curl http://localhost:11434/api/generate -d '{
"model": "mistral",
"prompt": "Why is the sky blue?",
"options": {
"seed": 123
}
}'
Response
{
"model": "mistral",
"created_at": "2023-11-03T15:36:02.583064Z",
"response": " The sky appears blue because of a phenomenon called Rayleigh scattering.",
"done": true,
"total_duration": 8493852375,
"load_duration": 6589624375,
"prompt_eval_count": 14,
"prompt_eval_duration": 119039000,
"eval_count": 110,
"eval_duration": 1779061000
}

Generate request (With options)

If you want to set custom options for the model at runtime rather than in the Modelfile, you can do so with the options parameter. This example sets every available option, but you can set any of them individually and omit the ones you do not want to override.

Request
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false,
"options": {
"num_keep": 5,
"seed": 42,
"num_predict": 100,
"top_k": 20,
"top_p": 0.9,
"min_p": 0.0,
"typical_p": 0.7,
"repeat_last_n": 33,
"temperature": 0.8,
"repeat_penalty": 1.2,
"presence_penalty": 1.5,
"frequency_penalty": 1.0,
"penalize_newline": true,
"stop": ["\n", "user:"],
"numa": false,
"num_ctx": 1024,
"num_batch": 2,
"num_gpu": 1,
"main_gpu": 0,
"use_mmap": true,
"num_thread": 8
}
}'
Response
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
"context": [
1,
2,
3
],
"total_duration": 4935886791,
"load_duration": 534986708,
"prompt_eval_count": 26,
"prompt_eval_duration": 107345000,
"eval_count": 237,
"eval_duration": 4289432000
}

Load a model

If an empty prompt is provided, the model will be loaded into memory.

Request
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2"
}'
Response

A single JSON object is returned:

{
"model": "llama3.2",
"created_at": "2023-12-18T19:52:07.071755Z",
"response": "",
"done": true
}

Unload a model

If an empty prompt is provided and the keep_alive parameter is set to 0, a model will be unloaded from memory.

Request
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"keep_alive": 0
}'
Response

A single JSON object is returned:

{
"model": "llama3.2",
"created_at": "2024-09-12T03:54:03.516566Z",
"response": "",
"done": true,
"done_reason": "unload"
}

Generate a chat completion

POST /api/chat

Generate the next message in a chat with a provided model. This is a streaming endpoint, so there will be a series of responses. Streaming can be disabled using "stream": false. The final response object will include statistics and additional data from the request.

Parameters

  • model: (required) the model name
  • messages: the messages of the chat, this can be used to keep a chat memory
  • tools: list of tools in JSON for the model to use if supported

The message object has the following fields:

  • role: the role of the message, either system, user, assistant, or tool
  • content: the content of the message
  • images (optional): a list of images to include in the message (for multimodal models such as llava)
  • tool_calls (optional): a list of tools in JSON that the model wants to use

Advanced parameters (optional):

  • format: the format to return a response in. Format can be json or a JSON schema.
  • options: additional model parameters listed in the documentation for the Modelfile such as temperature
  • stream: if false the response will be returned as a single response object, rather than a stream of objects
  • keep_alive: controls how long the model will stay loaded into memory following the request (default: 5m)

Structured outputs

Structured outputs are supported by providing a JSON schema in the format parameter. The model will generate a response that matches the schema. See the Chat request (Structured outputs) example below.

Examples

Chat Request (Streaming)

Request

Send a chat message with a streaming response.

curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
}
]
}'
Response

A stream of JSON objects is returned:

{
"model": "llama3.2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"message": {
"role": "assistant",
"content": "The",
"images": null
},
"done": false
}

Final response:

{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"message": {
"role": "assistant",
"content": ""
},
"done": true,
"total_duration": 4883583458,
"load_duration": 1334875,
"prompt_eval_count": 26,
"prompt_eval_duration": 342546000,
"eval_count": 282,
"eval_duration": 4535599000
}

Chat request (No streaming)

Request
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
}
],
"stream": false
}'
Response
{
"model": "llama3.2",
"created_at": "2023-12-12T14:13:43.416799Z",
"message": {
"role": "assistant",
"content": "Hello! How are you today?"
},
"done": true,
"total_duration": 5191566416,
"load_duration": 2154458,
"prompt_eval_count": 26,
"prompt_eval_duration": 383809000,
"eval_count": 298,
"eval_duration": 4799921000
}

Chat request (Structured outputs)

Request
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
"model": "llama3.1",
"messages": [{"role": "user", "content": "Ollama is 22 years old and busy saving the world. Return a JSON object with the age and availability."}],
"stream": false,
"format": {
"type": "object",
"properties": {
"age": {
"type": "integer"
},
"available": {
"type": "boolean"
}
},
"required": [
"age",
"available"
]
},
"options": {
"temperature": 0
}
}'
Response
{
"model": "llama3.1",
"created_at": "2024-12-06T00:46:58.265747Z",
"message": {
"role": "assistant",
"content": "{\"age\": 22, \"available\": false}"
},
"done_reason": "stop",
"done": true,
"total_duration": 2254970291,
"load_duration": 574751416,
"prompt_eval_count": 34,
"prompt_eval_duration": 1502000000,
"eval_count": 12,
"eval_duration": 175000000
}

Chat request (With History)

Send a chat message with a conversation history. You can use this same approach to start the conversation using multi-shot or chain-of-thought prompting.

Request
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
},
{
"role": "assistant",
"content": "due to rayleigh scattering."
},
{
"role": "user",
"content": "how is that different than mie scattering?"
}
]
}'
Response

A stream of JSON objects is returned:

{
"model": "llama3.2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"message": {
"role": "assistant",
"content": "The"
},
"done": false
}

Final response:

{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"done": true,
"total_duration": 8113331500,
"load_duration": 6396458,
"prompt_eval_count": 61,
"prompt_eval_duration": 398801000,
"eval_count": 468,
"eval_duration": 7701267000
}

Chat request (with images)

Request

Send a chat message with images. The images should be provided as an array, with the individual images encoded in Base64.

curl http://localhost:11434/api/chat -d '{
"model": "llava",
"messages": [
{
"role": "user",
"content": "what is in this image?",
"images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
}
]
}'
Response
{
"model": "llava",
"created_at": "2023-12-13T22:42:50.203334Z",
"message": {
"role": "assistant",
"content": " The image features a cute, little pig with an angry facial expression. It's wearing a heart on its shirt and is waving in the air. This scene appears to be part of a drawing or sketching project.",
"images": null
},
"done": true,
"total_duration": 1668506709,
"load_duration": 1986209,
"prompt_eval_count": 26,
"prompt_eval_duration": 359682000,
"eval_count": 83,
"eval_duration": 1303285000
}

Chat request (Reproducible outputs)

Request
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "Hello!"
}
],
"options": {
"seed": 101,
"temperature": 0
}
}'
Response
{
"model": "llama3.2",
"created_at": "2023-12-12T14:13:43.416799Z",
"message": {
"role": "assistant",
"content": "Hello! How are you today?"
},
"done": true,
"total_duration": 5191566416,
"load_duration": 2154458,
"prompt_eval_count": 26,
"prompt_eval_duration": 383809000,
"eval_count": 298,
"eval_duration": 4799921000
}

Chat request (with tools)

Request
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "What is the weather today in Paris?"
}
],
"stream": false,
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The location to get the weather for, e.g. San Francisco, CA"
},
"format": {
"type": "string",
"description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
"enum": ["celsius", "fahrenheit"]
}
},
"required": ["location", "format"]
}
}
}
]
}'
Response
{
"model": "llama3.2",
"created_at": "2024-07-22T20:33:28.123648Z",
"message": {
"role": "assistant",
"content": "",
"tool_calls": [
{
"function": {
"name": "get_current_weather",
"arguments": {
"format": "celsius",
"location": "Paris, FR"
}
}
}
]
},
"done_reason": "stop",
"done": true,
"total_duration": 885095291,
"load_duration": 3753500,
"prompt_eval_count": 122,
"prompt_eval_duration": 328493000,
"eval_count": 33,
"eval_duration": 552222000
}

Load a model

If the messages array is empty, the model will be loaded into memory.

Request
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": []
}'
Response
{
"model": "llama3.2",
"created_at": "2024-09-12T21:17:29.110811Z",
"message": {
"role": "assistant",
"content": ""
},
"done_reason": "load",
"done": true
}

Unload a model

If the messages array is empty and the keep_alive parameter is set to 0, a model will be unloaded from memory.

Request
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [],
"keep_alive": 0
}'
Response

A single JSON object is returned:

{
"model": "llama3.2",
"created_at": "2024-09-12T21:33:17.547535Z",
"message": {
"role": "assistant",
"content": ""
},
"done_reason": "unload",
"done": true
}

Create a Model

POST /api/create

Create a model from:

  • another model;
  • a safetensors directory; or
  • a GGUF file.

If you are creating a model from a safetensors directory or from a GGUF file, you must create a blob for each of the files and then use the file name and SHA256 digest associated with each blob in the files field.

Parameters

  • model: name of the model to create
  • from: (optional) name of an existing model to create the new model from
  • files: (optional) a dictionary of file names to SHA256 digests of blobs to create the model from
  • adapters: (optional) a dictionary of file names to SHA256 digests of blobs for LORA adapters
  • template: (optional) the prompt template for the model
  • license: (optional) a string or list of strings containing the license or licenses for the model
  • system: (optional) a string containing the system prompt for the model
  • parameters: (optional) a dictionary of parameters for the model ( see Modelfile for a list of parameters)
  • messages: (optional) a list of message objects used to create a conversation
  • stream: (optional) if false the response will be returned as a single response object, rather than a stream of objects
  • quantize (optional): quantize a non-quantized (e.g. float16) model

Quantization types

TypeRecommended
q4_K_M*
q4_K_S
q8_0*

Examples

Create a new model

Create a new model from an existing model.

Request
curl http://localhost:11434/api/create -d '{
"model": "mario",
"from": "llama3.2",
"system": "You are Mario from Super Mario Bros."
}'
Response

A stream of JSON objects is returned:

{
"status": "reading model metadata"
}
{
"status": "creating system layer"
}
{
"status": "using already created layer sha256:22f7f8ef5f4c791c1b03d7eb414399294764d7cc82c7e94aa81a1feb80a983a2"
}
{
"status": "using already created layer sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b"
}
{
"status": "using already created layer sha256:7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d"
}
{
"status": "using already created layer sha256:2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988"
}
{
"status": "using already created layer sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9"
}
{
"status": "writing layer sha256:df30045fe90f0d750db82a058109cecd6d4de9c90a3d75b19c09e5f64580bb42"
}
{
"status": "writing layer sha256:f18a68eb09bf925bb1b669490407c1b1251c5db98dc4d3d81f3088498ea55690"
}
{
"status": "writing manifest"
}
{
"status": "success"
}

Quantize a model

Quantize a non-quantized model.

Request
curl http://localhost:11434/api/create -d '{
"model": "llama3.2:quantized",
"from": "llama3.2:3b-instruct-fp16",
"quantize": "q4_K_M"
}'
Response

A stream of JSON objects is returned:

{
"status": "quantizing F16 model to Q4_K_M",
"digest": "0",
"total": 6433687776,
"completed": 12302
}
{
"status": "quantizing F16 model to Q4_K_M",
"digest": "0",
"total": 6433687776,
"completed": 6433687552
}
{
"status": "verifying conversion"
}
{
"status": "creating new layer sha256:fb7f4f211b89c6c4928ff4ddb73db9f9c0cfca3e000c3e40d6cf27ddc6ca72eb"
}
{
"status": "using existing layer sha256:966de95ca8a62200913e3f8bfbf84c8494536f1b94b49166851e76644e966396"
}
{
"status": "using existing layer sha256:fcc5a6bec9daf9b561a68827b67ab6088e1dba9d1fa2a50d7bbcc8384e0a265d"
}
{
"status": "using existing layer sha256:a70ff7e570d97baaf4e62ac6e6ad9975e04caa6d900d3742d37698494479e0cd"
}
{
"status": "using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"
}
{
"status": "writing manifest"
}
{
"status": "success"
}

Create a model from GGUF

Create a model from a GGUF file. The files parameter should be filled out with the file name and SHA256 digest of the GGUF file you wish to use. Use /api/blobs/:digest to push the GGUF file to the server before calling this API.

Request
curl http://localhost:11434/api/create -d '{
"model": "my-gguf-model",
"files": {
"test.gguf": "sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"
}
}'
Response

A stream of JSON objects is returned:

{
"status": "parsing GGUF"
}
{
"status": "using existing layer sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"
}
{
"status": "writing manifest"
}
{
"status": "success"
}

Create a model from a Safetensors directory

The files parameter should include a dictionary of files for the safetensors model which includes the file names and SHA256 digest of each file. Use /api/blobs/:digest to first push each of the files to the server before calling this API. Files will remain in the cache until the Ollama server is restarted.

Request
curl http://localhost:11434/api/create -d '{
"model": "fred",
"files": {
"config.json": "sha256:dd3443e529fb2290423a0c65c2d633e67b419d273f170259e27297219828e389",
"generation_config.json": "sha256:88effbb63300dbbc7390143fbbdd9d9fa50587b37e8bfd16c8c90d4970a74a36",
"special_tokens_map.json": "sha256:b7455f0e8f00539108837bfa586c4fbf424e31f8717819a6798be74bef813d05",
"tokenizer.json": "sha256:bbc1904d35169c542dffbe1f7589a5994ec7426d9e5b609d07bab876f32e97ab",
"tokenizer_config.json": "sha256:24e8a6dc2547164b7002e3125f10b415105644fcf02bf9ad8b674c87b1eaaed6",
"model.safetensors": "sha256:1ff795ff6a07e6a68085d206fb84417da2f083f68391c2843cd2b8ac6df8538f"
}
}'
Response

A stream of JSON objects is returned:

{"status":"converting model"}
{"status":"creating new layer sha256:05ca5b813af4a53d2c2922933936e398958855c44ee534858fcfd830940618b6"}
{"status":"using autodetected template llama3-instruct"}
{"status":"using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"}
{"status":"writing manifest"}
{"status":"success"}

Check if a Blob Exists

HEAD /api/blobs/:digest

Ensures that the file blob (Binary Large Object) used with create a model exists on the server. This checks your Ollama server and not ollama.com.

Query Parameters

  • digest: the SHA256 digest of the blob

Examples

Request

curl -I http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2

Response

Return 200 OK if the blob exists, 404 Not Found if it does not.

Push a Blob

POST /api/blobs/:digest

Push a file to the Ollama server to create a "blob" (Binary Large Object).

Query Parameters

  • digest: the expected SHA256 digest of the file

Examples

Request

curl -T model.gguf -X POST http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2

Response

Return 201 Created if the blob was successfully created, 400 Bad Request if the digest used is not expected.

List Local Models

GET /api/tags

List models that are available locally.

Examples

Request

curl http://localhost:11434/api/tags

Response

A single JSON object will be returned.

{
"models": [
{
"name": "deepseek-r1:latest",
"model": "deepseek-r1:latest",
"modified_at": "2025-05-10T08:06:48.639712648-07:00",
"size": 4683075271,
"digest": "0a8c266910232fd3291e71e5ba1e058cc5af9d411192cf88b6d30e92b6e73163",
"details": {
"parent_model": "",
"format": "gguf",
"family": "qwen2",
"families": [
"qwen2"
],
"parameter_size": "7.6B",
"quantization_level": "Q4_K_M"
}
},
{
"name": "llama3.2:latest",
"model": "llama3.2:latest",
"modified_at": "2025-05-04T17:37:44.706015396-07:00",
"size": 2019393189,
"digest": "a80c4f17acd55265feec403c7aef86be0c25983ab279d83f3bcd3abbcb5b8b72",
"details": {
"parent_model": "",
"format": "gguf",
"family": "llama",
"families": [
"llama"
],
"parameter_size": "3.2B",
"quantization_level": "Q4_K_M"
}
}
]
}

Show Model Information

POST /api/show

Show information about a model including details, modelfile, template, parameters, license, system prompt.

Parameters

  • model: name of the model to show
  • verbose: (optional) if set to true, returns full data for verbose response fields

Examples

Request

curl http://localhost:11434/api/show -d '{
"model": "llava"
}'

Response

{
"modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llava:latest\n\nFROM /Users/matt/.ollama/models/blobs/sha256:200765e1283640ffbd013184bf496e261032fa75b99498a9613be4e94d63ad52\nTEMPLATE \"\"\"{{ .System }}\nUSER: {{ .Prompt }}\nASSISTANT: \"\"\"\nPARAMETER num_ctx 4096\nPARAMETER stop \"\u003c/s\u003e\"\nPARAMETER stop \"USER:\"\nPARAMETER stop \"ASSISTANT:\"",
"parameters": "num_keep 24\nstop \"<|start_header_id|>\"\nstop \"<|end_header_id|>\"\nstop \"<|eot_id|>\"",
"template": "{{ if .System }}<|start_header_id|>system<|end_header_id|>\n\n{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>\n\n{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>\n\n{{ .Response }}<|eot_id|>",
"details": {
"parent_model": "",
"format": "gguf",
"family": "llama",
"families": [
"llama"
],
"parameter_size": "8.0B",
"quantization_level": "Q4_0"
},
"model_info": {
"general.architecture": "llama",
"general.file_type": 2,
"general.parameter_count": 8030261248,
"general.quantization_version": 2,
"llama.attention.head_count": 32,
"llama.attention.head_count_kv": 8,
"llama.attention.layer_norm_rms_epsilon": 0.00001,
"llama.block_count": 32,
"llama.context_length": 8192,
"llama.embedding_length": 4096,
"llama.feed_forward_length": 14336,
"llama.rope.dimension_count": 128,
"llama.rope.freq_base": 500000,
"llama.vocab_size": 128256,
"tokenizer.ggml.bos_token_id": 128000,
"tokenizer.ggml.eos_token_id": 128009,
"tokenizer.ggml.merges": [],
// populates if `verbose=true`
"tokenizer.ggml.model": "gpt2",
"tokenizer.ggml.pre": "llama-bpe",
"tokenizer.ggml.token_type": [],
// populates if `verbose=true`
"tokenizer.ggml.tokens": []
// populates if `verbose=true`
},
"capabilities": [
"completion",
"vision"
],
}

Copy a Model

POST /api/copy

Copy a model. Creates a model with another name from an existing model.

Examples

Request

curl http://localhost:11434/api/copy -d '{
"source": "llama3.2",
"destination": "llama3-backup"
}'

Response

Returns a 200 OK if successful, or a 404 Not Found if the source model doesn't exist.

Delete a Model

DELETE /api/delete

Delete a model and its data.

Parameters

  • model: model name to delete

Examples

Request

curl -X DELETE http://localhost:11434/api/delete -d '{
"model": "llama3:13b"
}'

Response

Returns a 200 OK if successful, 404 Not Found if the model to be deleted doesn't exist.

Pull a Model

POST /api/pull

Download a model from the ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.

Parameters

  • model: name of the model to pull
  • insecure: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.
  • stream: (optional) if false the response will be returned as a single response object, rather than a stream of objects

Examples

Request

curl http://localhost:11434/api/pull -d '{
"model": "llama3.2"
}'

Response

If stream is not specified, or set to true, a stream of JSON objects is returned:

The first object is the manifest:

{
"status": "pulling manifest"
}

Then there is a series of downloading responses. Until any of the download is completed, the completed key may not be included. The number of files to be downloaded depends on the number of layers specified in the manifest.

{
"status": "downloading digestname",
"digest": "digestname",
"total": 2142590208,
"completed": 241970
}

After all the files are downloaded, the final responses are:

{
"status": "verifying sha256 digest"
}
{
"status": "writing manifest"
}
{
"status": "removing any unused layers"
}
{
"status": "success"
}

if stream is set to false, then the response is a single JSON object:

{
"status": "success"
}

Push a Model

POST /api/push

Upload a model to a model library. Requires registering for ollama.ai and adding a public key first.

Parameters

  • model: name of the model to push in the form of <namespace>/<model>:<tag>
  • insecure: (optional) allow insecure connections to the library. Only use this if you are pushing to your library during development.
  • stream: (optional) if false the response will be returned as a single response object, rather than a stream of objects

Examples

Request

curl http://localhost:11434/api/push -d '{
"model": "mattw/pygmalion:latest"
}'

Response

If stream is not specified, or set to true, a stream of JSON objects is returned:

{
"status": "retrieving manifest"
}

and then:

{
"status": "starting upload",
"digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
"total": 1928429856
}

Then there is a series of uploading responses:

{
"status": "starting upload",
"digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
"total": 1928429856
}

Finally, when the upload is complete:

{
"status": "pushing manifest"
}
{
"status": "success"
}

If stream is set to false, then the response is a single JSON object:

{
"status": "success"
}

Generate Embeddings

POST /api/embed

Generate embeddings from a model

Parameters

  • model: name of model to generate embeddings from
  • input: text or list of text to generate embeddings for

Advanced parameters:

  • truncate: truncates the end of each input to fit within context length. Returns error if false and context length is exceeded. Defaults to true
  • options: additional model parameters listed in the documentation for the Modelfile such as temperature
  • keep_alive: controls how long the model will stay loaded into memory following the request (default: 5m)

Examples

Request

curl http://localhost:11434/api/embed -d '{
"model": "all-minilm",
"input": "Why is the sky blue?"
}'

Response

{
"model": "all-minilm",
"embeddings": [
[
0.010071029,
-0.0017594862,
0.05007221,
0.04692972,
0.054916814,
0.008599704,
0.105441414,
-0.025878139,
0.12958129,
0.031952348
]
],
"total_duration": 14143917,
"load_duration": 1019500,
"prompt_eval_count": 8
}

Request (Multiple input)

curl http://localhost:11434/api/embed -d '{
"model": "all-minilm",
"input": ["Why is the sky blue?", "Why is the grass green?"]
}'

Response

{
"model": "all-minilm",
"embeddings": [
[
0.010071029,
-0.0017594862,
0.05007221,
0.04692972,
0.054916814,
0.008599704,
0.105441414,
-0.025878139,
0.12958129,
0.031952348
],
[
-0.0098027075,
0.06042469,
0.025257962,
-0.006364387,
0.07272725,
0.017194884,
0.09032035,
-0.051705178,
0.09951512,
0.09072481
]
]
}

List Running Models

GET /api/ps

List models that are currently loaded into memory.

Examples

Request

curl http://localhost:11434/api/ps

Response

A single JSON object will be returned.

{
"models": [
{
"name": "mistral:latest",
"model": "mistral:latest",
"size": 5137025024,
"digest": "2ae6f6dd7a3dd734790bbbf58b8909a606e0e7e97e94b7604e0aa7ae4490e6d8",
"details": {
"parent_model": "",
"format": "gguf",
"family": "llama",
"families": [
"llama"
],
"parameter_size": "7.2B",
"quantization_level": "Q4_0"
},
"expires_at": "2024-06-04T14:38:31.83753-07:00",
"size_vram": 5137025024
}
]
}

Generate Embedding

Note: this endpoint has been superseded by /api/embed

POST /api/embeddings

Generate embeddings from a model

Parameters

  • model: name of model to generate embeddings from
  • prompt: text to generate embeddings for

Advanced parameters:

  • options: additional model parameters listed in the documentation for the Modelfile such as temperature
  • keep_alive: controls how long the model will stay loaded into memory following the request (default: 5m)

Examples

Request

curl http://localhost:11434/api/embeddings -d '{
"model": "all-minilm",
"prompt": "Here is an article about llamas..."
}'

Response

{
"embedding": [
0.5670403838157654,
0.009260174818336964,
0.23178744316101074,
-0.2916173040866852,
-0.8924556970596313,
0.8785552978515625,
-0.34576427936553955,
0.5742510557174683,
-0.04222835972905159,
-0.137906014919281
]
}

Version

GET /api/version

Retrieve the Ollama version

Examples

Request

curl http://localhost:11434/api/version

Response

{
"version": "0.5.1"
}

Model File

说明

Modelfile syntax is in development

A model file is the blueprint to create and share models with Ollama.

Table of Contents

Format

The format of the Modelfile:

## comment
INSTRUCTION arguments
InstructionDescription
FROM (required)Defines the base model to use.
PARAMETERSets the parameters for how Ollama will run the model.
TEMPLATEThe full prompt template to be sent to the model.
SYSTEMSpecifies the system message that will be set in the template.
ADAPTERDefines the (Q)LoRA adapters to apply to the model.
LICENSESpecifies the legal license.
MESSAGESpecify message history.

Examples

Basic Modelfile

An example of a Modelfile creating a mario blueprint:

FROM llama3.2
## sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
## sets the context window size to 4096, this controls how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx 4096

## sets a custom system message to specify the behavior of the chat assistant
SYSTEM You are Mario from super mario bros, acting as an assistant.

To use this:

  1. Save it as a file (e.g. Modelfile)
  2. ollama create choose-a-model-name -f <location of the file e.g. ./Modelfile>
  3. ollama run choose-a-model-name
  4. Start using the model!

To view the Modelfile of a given model, use the ollama show --modelfile command.

ollama show --modelfile llama3.2

Output:

## Modelfile generated by "ollama show"
## To build a new Modelfile based on this one, replace the FROM line with:
## FROM llama3.2:latest
FROM /Users/pdevine/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|reserved_special_token"

Instructions

FROM (Required)

The FROM instruction defines the base model to use when creating a model.

FROM <model name>:<tag>
Build from existing model
FROM llama3.2
Build from a Safetensors model
FROM <model directory>

The model directory should contain the Safetensors weights for a supported architecture.

Currently supported model architectures:

  • Llama (including Llama 2, Llama 3, Llama 3.1, and Llama 3.2)
  • Mistral (including Mistral 1, Mistral 2, and Mixtral)
  • Gemma (including Gemma 1 and Gemma 2)
  • Phi3
Build from a GGUF file
FROM ./ollama-model.gguf

The GGUF file location should be specified as an absolute path or relative to the Modelfile location.

PARAMETER

The PARAMETER instruction defines a parameter that can be set when the model is run.

PARAMETER <parameter> <parametervalue>
Valid Parameters and Values
ParameterDescriptionValue TypeExample Usage
num_ctxSets the size of the context window used to generate the next token. (Default: 2048)intnum_ctx 4096
repeat_last_nSets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)intrepeat_last_n 64
repeat_penaltySets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)floatrepeat_penalty 1.1
temperatureThe temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8)floattemperature 0.7
seedSets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0)intseed 42
stopSets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate stop parameters in a modelfile.stringstop "AI assistant:"
num_predictMaximum number of tokens to predict when generating text. (Default: -1, infinite generation)intnum_predict 42
top_kReduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)inttop_k 40
top_pWorks together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)floattop_p 0.9
min_pAlternative to the top_p, and aims to ensure a balance of quality and variety. The parameter p represents the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with p=0.05 and the most likely token having a probability of 0.9, logits with a value less than 0.045 are filtered out. (Default: 0.0)floatmin_p 0.05

TEMPLATE

TEMPLATE of the full prompt template to be passed into the model. It may include (optionally) a system message, a user's message and the response from the model. Note: syntax may be model specific. Templates use Go template syntax.

Template Variables
VariableDescription
{{ .System }}The system message used to specify custom behavior.
{{ .Prompt }}The user prompt message.
{{ .Response }}The response from the model. When generating a response, text after this variable is omitted.
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

SYSTEM

The SYSTEM instruction specifies the system message to be used in the template, if applicable.

SYSTEM """<system message>"""

ADAPTER

The ADAPTER instruction specifies a fine tuned LoRA adapter that should apply to the base model. The value of the adapter should be an absolute path or a path relative to the Modelfile. The base model should be specified with a FROM instruction. If the base model is not the same as the base model that the adapter was tuned from the behaviour will be erratic.

Safetensor adapter
ADAPTER <path to safetensor adapter>

Currently supported Safetensor adapters:

  • Llama (including Llama 2, Llama 3, and Llama 3.1)
  • Mistral (including Mistral 1, Mistral 2, and Mixtral)
  • Gemma (including Gemma 1 and Gemma 2)
GGUF adapter
ADAPTER ./ollama-lora.gguf

LICENSE

The LICENSE instruction allows you to specify the legal license under which the model used with this Modelfile is shared or distributed.

LICENSE """
<license text>
"""

MESSAGE

The MESSAGE instruction allows you to specify a message history for the model to use when responding. Use multiple iterations of the MESSAGE command to build up a conversation which will guide the model to answer in a similar way.

MESSAGE <role> <message>
Valid roles
RoleDescription
systemAlternate way of providing the SYSTEM message for the model.
userAn example message of what the user could have asked.
assistantAn example message of how the model should respond.
Example conversation
MESSAGE user Is Toronto in Canada?
MESSAGE assistant yes
MESSAGE user Is Sacramento in Canada?
MESSAGE assistant no
MESSAGE user Is Ontario in Canada?
MESSAGE assistant yes

Notes

  • the Modelfile is not case sensitive. In the examples, uppercase instructions are used to make it easier to distinguish it from arguments.
  • Instructions can be in any order. In the examples, the FROM instruction is first to keep it easily readable.