Text Generation

This guide covers text generation capabilities of the Ollama service.

Generation Endpoints

Generate text using AI models, either in a single response or as a token stream.

Generate Text

POST /api/generate

curl -X POST "https://ollama.moodmnky.com/api/generate" \
  -H "x-api-key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "prompt": "Write a story about a robot learning to paint.",
    "parameters": {
      "temperature": 0.7,
      "max_tokens": 500
    }
  }'

Stream Generation

POST /api/generate/stream

curl -X POST "https://ollama.moodmnky.com/api/generate/stream" \
  -H "x-api-key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "prompt": "Write a story about a robot learning to paint.",
    "parameters": {
      "temperature": 0.7,
      "max_tokens": 500
    }
  }'

Generation Parameters

Core Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| model | string | Model to use | Required |
| prompt | string | Input text | Required |
| temperature | float | Randomness (0.0-1.0) | 0.8 |
| max_tokens | integer | Maximum tokens to generate | 2048 |
| top_p | float | Nucleus sampling threshold | 0.9 |
| top_k | integer | Top-k sampling threshold | 40 |
| repeat_penalty | float | Repetition penalty | 1.1 |
| presence_penalty | float | Presence penalty | 0.0 |
| frequency_penalty | float | Frequency penalty | 0.0 |

Advanced Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| stop_sequences | string[] | Sequences to stop generation | [] |
| seed | integer | Random seed for reproducibility | null |
| num_ctx | integer | Context window size | 2048 |
| num_predict | integer | Number of tokens to predict | -1 |
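
For example, a request that pins a seed for reproducible output and stops at the first blank line could look like the following. This is a sketch only: the parameter names come from the tables above, while the prompt and values are illustrative.

const response = await fetch("https://ollama.moodmnky.com/api/generate", {
  method: "POST",
  headers: {
    "x-api-key": "your_api_key",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "llama2",
    prompt: "List three painting techniques.",
    parameters: {
      temperature: 0.2,        // low randomness for a factual list
      max_tokens: 200,
      seed: 42,                // fixed seed for reproducibility
      stop_sequences: ["\n\n"] // stop at the first blank line
    }
  })
});

const data = await response.json();
console.log(data.text);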

Response Format

Standard Response

{
  "text": "Generated text response",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 100,
    "total_tokens": 110
  },
  "model": "llama2",
  "created_at": "2024-04-05T12:00:00Z"
}
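
For typed clients, the response can be modeled directly from the JSON above; this TypeScript sketch assumes no fields beyond those shown.

interface GenerateResponse {
  text: string;       // the generated text
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
  model: string;      // model that produced the response
  created_at: string; // ISO 8601 timestamp
}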

Stream Response

The stream endpoint returns a sequence of JSON chunks; each carries a text fragment, and the final chunk sets "done": true.

{
  "text": "Partial",
  "usage": {
    "completion_tokens": 1
  }
}
// ... more chunks ...
{
  "text": " response.",
  "usage": {
    "completion_tokens": 1
  },
  "done": true
}
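
Outside the SDK, the stream endpoint can be consumed over raw HTTP. The sketch below assumes each chunk arrives as one newline-delimited JSON object, which matches the chunks shown above but is an assumption about the wire framing.

const response = await fetch("https://ollama.moodmnky.com/api/generate/stream", {
  method: "POST",
  headers: { "x-api-key": "your_api_key", "Content-Type": "application/json" },
  body: JSON.stringify({ model: "llama2", prompt: "Write a haiku about paint." })
});

const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";

outer: while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? ""; // keep any partial line for the next read
  for (const line of lines) {
    if (!line.trim()) continue;
    const chunk = JSON.parse(line); // assumed NDJSON framing
    process.stdout.write(chunk.text);
    if (chunk.done) break outer;    // final chunk sets done: true
  }
}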

Generation Examples

Basic Text Generation

// Assumes `client` is an initialized SDK client for the service
const response = await client.ollama.generate({
  model: "llama2",
  prompt: "Write a story about a robot learning to paint.",
  parameters: {
    temperature: 0.7,
    max_tokens: 500
  }
});

console.log(response.text);
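
The response also includes token usage (see Response Format above), which is useful when monitoring consumption:

console.log(response.usage.total_tokens);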

Streaming Generation

const stream = await client.ollama.generateStream({
  model: "llama2",
  prompt: "Write a story about a robot learning to paint.",
  parameters: {
    temperature: 0.7,
    max_tokens: 500
  }
});

for await (const chunk of stream) {
  process.stdout.write(chunk.text);
}

Chat Completion

For multi-turn conversations, pass an array of role-tagged messages as the prompt:

const response = await client.ollama.generate({
  model: "llama2",
  prompt: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Tell me about painting techniques." }
  ],
  parameters: {
    temperature: 0.7
  }
});

Best Practices

  1. Prompt Engineering
    • Be specific and clear in prompts
    • Provide context when needed
    • Use system messages for behavior control
    • Structure multi-turn conversations properly
  2. Parameter Tuning (see the sketch after this list)
    • Lower temperature for factual responses
    • Higher temperature for creative tasks
    • Adjust max_tokens based on expected length
    • Use stop sequences to control output
  3. Performance Optimization
    • Use streaming for long responses
    • Batch similar requests when possible
    • Cache frequently used responses
    • Monitor token usage
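
To illustrate the tuning advice above, here is a hedged sketch contrasting a factual and a creative configuration; the numbers are illustrative starting points, not service recommendations.

// Factual lookup: keep output consistent and short
const factual = await client.ollama.generate({
  model: "llama2",
  prompt: "What year was oil paint first widely used in Europe?",
  parameters: {
    temperature: 0.2,      // low randomness favors consistent answers
    max_tokens: 100,
    stop_sequences: ["\n"] // stop after a single line
  }
});

// Creative writing: allow more randomness and room to generate
const creative = await client.ollama.generate({
  model: "llama2",
  prompt: "Write a story about a robot learning to paint.",
  parameters: {
    temperature: 0.9, // higher randomness encourages variety
    max_tokens: 800
  }
});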

Error Handling

try {
  const response = await client.ollama.generate({
    model: "llama2",
    prompt: "Hello, world!"
  });
} catch (error) {
  switch (error.code) {
    case "CONTEXT_LENGTH_EXCEEDED":
      // Prompt exceeds the model's context window; shorten it or raise num_ctx
      break;
    case "RATE_LIMIT_EXCEEDED":
      // Too many requests; back off and retry (see the sketch below)
      break;
    case "INVALID_PARAMETERS":
      // Fix the offending parameter values before retrying
      break;
    default:
      throw error; // surface unexpected errors instead of swallowing them
  }
}
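
For rate limits in particular, a simple retry with exponential backoff is usually enough. This is a minimal sketch, assuming error.code surfaces as shown above; the delay values are illustrative.

async function generateWithRetry(request, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await client.ollama.generate(request);
    } catch (error) {
      if (error.code !== "RATE_LIMIT_EXCEEDED" || attempt === maxRetries) {
        throw error; // not retryable, or retries exhausted
      }
      // Exponential backoff: 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
    }
  }
}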

Support & Resources