Ollama is a framework for running machine learning models locally, with a focus on natural language processing (NLP) tasks. It handles model downloading, loading, and inference, making it easy for users to interact with large pre-trained models deployed on their own hardware.

Models

Models are the heart of Ollama. These are pre-trained machine learning models capable of performing tasks like text generation, summarization, sentiment analysis, and dialogue generation. Ollama supports a wide range of popular open-weight models, including:

  • DeepSeek-V3 / DeepSeek-R1: Large language models by DeepSeek, specialized in text generation and reasoning.
  • Llama 2 / Llama 3: Large language models by Meta, designed for text generation and dialogue.
  • Mistral: An efficient open-weight model by Mistral AI, suitable for dialogue generation, text reasoning, and more.
  • Embedding models (e.g., nomic-embed-text): Models for sentence representation, semantic search, and question-answering pipelines.
  • Custom Models: Users can import and use their own models (e.g., GGUF files) for inference.

Key Features of Models:

  • Inference: Generating output based on user input.
  • Fine-tuning: Adapting pre-trained models to specific tasks or domains using custom datasets; models fine-tuned with external tools can be imported into Ollama for local inference.

Models are typically neural networks with millions (or billions) of parameters, trained on vast amounts of text data to learn language patterns and perform efficient reasoning.

Supported Models:

Ollama’s library of supported models can be browsed at https://ollama.com/library.

Click on a model in the library to view its download command. Below are some examples of models and their download commands:

Model        Parameters   Size     Download Command
DeepSeek-R1  671B         404GB    ollama run deepseek-r1:671b
DeepSeek-R1  70B          43GB     ollama run deepseek-r1:70b
DeepSeek-R1  32B          20GB     ollama run deepseek-r1:32b
DeepSeek-R1  14B          9.0GB    ollama run deepseek-r1:14b
DeepSeek-R1  8B           4.9GB    ollama run deepseek-r1:8b
DeepSeek-R1  7B           4.7GB    ollama run deepseek-r1:7b
DeepSeek-R1  1.5B         1.1GB    ollama run deepseek-r1:1.5b
Llama 3.3    70B          43GB     ollama run llama3.3
Phi 4        14B          9.1GB    ollama run phi4
Llama 3.2    3B           2.0GB    ollama run llama3.2
Llama 3.2    1B           1.3GB    ollama run llama3.2:1b

Note: ollama run downloads a model on first use and then starts an interactive session; to download a model without running it, use ollama pull instead.

Tasks

Ollama supports a variety of NLP tasks, each corresponding to different use cases for the models. Key tasks include:

  • Chat Generation: Generating natural dialogue responses in conversations.
  • Text Generation: Creating natural language text based on prompts (e.g., articles, stories).
  • Sentiment Analysis: Determining the emotional tone of a text (positive, negative, neutral).
  • Text Summarization: Condensing long texts into concise summaries.
  • Translation: Translating text between languages.

Using Ollama’s command-line tools or API, users can load an appropriate model and express the task through the prompt they send to it.
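Because tasks are expressed through prompts, different NLP tasks can share a single model and endpoint. The sketch below builds request payloads for Ollama’s REST generate endpoint (served at http://localhost:11434/api/generate by a local Ollama instance); the helper function and example prompts are illustrative, and actually sending the requests requires a running Ollama server.

```python
import json

# A local Ollama server exposes a REST endpoint for text generation;
# the same endpoint serves every task — only the prompt changes.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Build a non-streaming generation request payload for Ollama."""
    return {"model": model, "prompt": prompt, "stream": False}

# Different NLP tasks expressed as prompts against the same model.
summarize = build_request(
    "llama3.2",
    "Summarize the following text in two sentences:\n<your text here>",
)
sentiment = build_request(
    "llama3.2",
    "Classify the sentiment (positive/negative/neutral):\n<your text here>",
)

print(json.dumps(summarize, indent=2))
```

A payload like this would then be POSTed to the endpoint with any HTTP client; the choice of model tag (llama3.2 here) is just one of the entries from the table above.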

Inference

Inference is the process of feeding input into a trained model to generate output. Ollama provides user-friendly command-line tools and APIs, allowing users to quickly input text and receive results.

How Inference Works:

  1. Input: The user provides text input, such as a question, prompt, or dialogue.
  2. Model Processing: The model processes the input using its neural network to generate relevant output.
  3. Output: The model returns the generated text, which could be a response, article, translation, etc.

Ollama’s API and CLI make it simple to interact with local models and perform inference tasks efficiently.
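By default, Ollama’s generate endpoint streams its output as newline-delimited JSON objects, each carrying a "response" fragment and a "done" flag. The sketch below reassembles such a stream into the final text; the sample lines are hand-written for illustration, not real model output.

```python
import json

# Illustrative sample of Ollama's streaming response format:
# one JSON object per line, with a text fragment and a done flag.
sample_stream = """\
{"response": "Hello", "done": false}
{"response": ", world", "done": false}
{"response": "!", "done": true}
"""

def collect_response(stream_text: str) -> str:
    """Concatenate the "response" fragments of a streamed generation."""
    parts = []
    for line in stream_text.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

print(collect_response(sample_stream))  # -> Hello, world!
```

Setting "stream": false in the request instead returns the whole generation in a single JSON object, which is simpler for scripting.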

Fine-tuning

Fine-tuning involves further training a pre-trained model on domain-specific data to improve its performance on specific tasks or domains. Ollama does not train models itself, but it works well with fine-tuned models: users can fine-tune a model with an external training framework and then import the result into Ollama for local inference.

Fine-tuning Process:

  1. Prepare Dataset: Gather and format domain-specific data (e.g., text files or JSON).
  2. Load Pre-trained Model: Select a suitable base model (e.g., Llama 2 or Mistral).
  3. Train: Fine-tune the model with an external training framework, then export the weights (e.g., to GGUF format).
  4. Import and Deploy: Register the exported model with Ollama so it can be run locally like any other model.
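Where Ollama is the deployment target, the fine-tuned weights are brought in via a Modelfile. A minimal sketch follows; the file path, parameter value, and system prompt are hypothetical placeholders:

```
# Modelfile: import an externally fine-tuned model
# (the .gguf path below is a hypothetical example)
FROM ./my-finetuned-model.gguf
# Optional inference parameter: sampling temperature
PARAMETER temperature 0.7
# Optional system prompt baked into the model
SYSTEM You are a domain-specific assistant.
```

The model is then registered with a command such as ollama create my-model -f Modelfile and run with ollama run my-model.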

Fine-tuning helps models achieve higher accuracy and efficiency in specialized tasks.