Text Generation

This guide covers the text generation capabilities of the Ollama service.

Generation Endpoints

Generate Text

POST /api/generate

curl -X POST "https://ollama.moodmnky.com/api/generate" \
  -H "x-api-key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "prompt": "Write a story about a robot learning to paint.",
    "parameters": {
      "temperature": 0.7,
      "max_tokens": 500
    }
  }'

Stream Generation

POST /api/generate/stream

curl -X POST "https://ollama.moodmnky.com/api/generate/stream" \
  -H "x-api-key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "prompt": "Write a story about a robot learning to paint.",
    "parameters": {
      "temperature": 0.7,
      "max_tokens": 500
    }
  }'

Generation Parameters

Core Parameters

Parameter           Type      Description                    Default
model               string    Model to use                   Required
prompt              string    Input text                     Required
temperature         float     Randomness (0.0-1.0)           0.8
max_tokens          integer   Maximum tokens to generate     2048
top_p               float     Nucleus sampling threshold     0.9
top_k               integer   Top-k sampling threshold       40
repeat_penalty      float     Repetition penalty             1.1
presence_penalty    float     Presence penalty               0.0
frequency_penalty   float     Frequency penalty              0.0

Advanced Parameters

Parameter        Type      Description                      Default
stop_sequences   string[]  Sequences to stop generation     []
seed             integer   Random seed for reproducibility  null
num_ctx          integer   Context window size              2048
num_predict      integer   Number of tokens to predict      -1
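
Advanced parameters go in the same parameters object as the core ones. The example below uses the client.ollama.generate call from the Generation Examples section further down; it is an illustrative sketch, and the specific values are arbitrary rather than recommendations.

// Illustrative request mixing core and advanced parameters.
// Parameter names come from the tables above; the values are arbitrary examples.
const response = await client.ollama.generate({
  model: "llama2",
  prompt: "List three facts about the Hubble Space Telescope.",
  parameters: {
    temperature: 0.2,         // low randomness for a factual answer
    max_tokens: 200,
    seed: 42,                 // fixed seed for reproducible output
    num_ctx: 4096,            // larger context window
    stop_sequences: ["\n\n"]  // stop at the first blank line
  }
});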

Response Format

Standard Response

{
  "text": "Generated text response",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 100,
    "total_tokens": 110
  },
  "model": "llama2",
  "created_at": "2024-04-05T12:00:00Z"
}
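
For reference, the standard response maps onto a TypeScript shape like the one below. The field names come from the example above; the interface name itself is only illustrative.

// Illustrative type for the non-streaming response shown above.
interface GenerateResponse {
  text: string;                  // generated text
  usage: {
    prompt_tokens: number;       // tokens consumed by the prompt
    completion_tokens: number;   // tokens produced by the model
    total_tokens: number;        // prompt_tokens + completion_tokens
  };
  model: string;                 // model that produced the response
  created_at: string;            // ISO 8601 timestamp
}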

Stream Response

{
  "text": "Partial",
  "usage": {
    "completion_tokens": 1
  }
}
// ... more chunks ...
{
  "text": " response.",
  "usage": {
    "completion_tokens": 1
  },
  "done": true
}
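
If you consume the streaming endpoint over plain HTTP rather than through the client library, you need to read the response body incrementally and parse each chunk. The sketch below assumes the chunks arrive as newline-delimited JSON objects shaped like the examples above; adjust the parsing if the service uses a different framing (for example, server-sent events).

// Minimal sketch of consuming /api/generate/stream directly with fetch (Node 18+).
// Assumes newline-delimited JSON chunks; StreamChunk mirrors the examples above.
interface StreamChunk {
  text: string;
  usage?: { completion_tokens: number };
  done?: boolean;
}

async function streamGenerate(prompt: string): Promise<void> {
  const response = await fetch("https://ollama.moodmnky.com/api/generate/stream", {
    method: "POST",
    headers: {
      "x-api-key": process.env.OLLAMA_API_KEY ?? "",
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ model: "llama2", prompt })
  });
  if (!response.ok || !response.body) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Parse every complete line; keep any trailing partial line in the buffer.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";
    for (const line of lines) {
      if (!line.trim()) continue;
      const chunk: StreamChunk = JSON.parse(line);
      process.stdout.write(chunk.text);
      if (chunk.done) return;
    }
  }
}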

Generation Examples

Basic Text Generation

const response = await client.ollama.generate({
  model: "llama2",
  prompt: "Write a story about a robot learning to paint.",
  parameters: {
    temperature: 0.7,
    max_tokens: 500
  }
});

console.log(response.text);

Streaming Generation

const stream = await client.ollama.generateStream({
  model: "llama2",
  prompt: "Write a story about a robot learning to paint.",
  parameters: {
    temperature: 0.7,
    max_tokens: 500
  }
});

for await (const chunk of stream) {
  process.stdout.write(chunk.text);
}

Chat Completion

const response = await client.ollama.generate({
  model: "llama2",
  prompt: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Tell me about painting techniques." }
  ],
  parameters: {
    temperature: 0.7
  }
});

Best Practices

  1. Prompt Engineering
    • Be specific and clear in prompts
    • Provide context when needed
    • Use system messages for behavior control
    • Structure multi-turn conversations properly
  2. Parameter Tuning (see the preset sketch after this list)
    • Lower temperature for factual responses
    • Higher temperature for creative tasks
    • Adjust max_tokens based on expected length
    • Use stop sequences to control output
  3. Performance Optimization
    • Use streaming for long responses
    • Batch similar requests when possible
    • Cache frequently used responses
    • Monitor token usage
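
As a starting point for parameter tuning, the sketch below pairs a low-temperature preset for factual answers with a higher-temperature preset for creative writing. The preset values are illustrative defaults to tune, not recommendations from the service.

// Illustrative temperature presets; tune the values for your own workload.
const presets = {
  factual: { temperature: 0.2, top_p: 0.9, max_tokens: 256 },    // deterministic, concise
  creative: { temperature: 0.9, top_p: 0.95, max_tokens: 1024 }  // varied, longer output
};

// Factual query: keep the output focused and repeatable.
const answer = await client.ollama.generate({
  model: "llama2",
  prompt: "Summarize the water cycle in three sentences.",
  parameters: presets.factual
});

// Creative task: allow more variation and a longer response.
const story = await client.ollama.generate({
  model: "llama2",
  prompt: "Write a story about a robot learning to paint.",
  parameters: presets.creative
});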

Error Handling

try {
  const response = await client.ollama.generate({
    model: "llama2",
    prompt: "Hello, world!"
  });
} catch (error) {
  switch (error.code) {
    case "CONTEXT_LENGTH_EXCEEDED":
      // Handle prompt too long
      break;
    case "RATE_LIMIT_EXCEEDED":
      // Handle rate limiting
      break;
    case "INVALID_PARAMETERS":
      // Handle invalid parameters
      break;
  }
}
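
For transient failures such as RATE_LIMIT_EXCEEDED, a retry with exponential backoff is usually enough. The wrapper below is a minimal sketch around the same generate call; the attempt limit and delays are illustrative defaults, not part of the service API.

// Minimal retry wrapper with exponential backoff for rate-limited requests.
// The 3-attempt limit and 500 ms base delay are illustrative choices.
async function generateWithRetry(request: any, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await client.ollama.generate(request);
    } catch (error: any) {
      const retryable = error.code === "RATE_LIMIT_EXCEEDED";
      if (!retryable || attempt === maxAttempts) throw error;
      // Wait 500 ms, 1000 ms, 2000 ms, ... before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** (attempt - 1)));
    }
  }
}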

Support & Resources