# Text Generation

This guide covers the text generation capabilities of the Ollama service.

## Generation Endpoints
### Generate Text
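Assuming the service follows Ollama's standard HTTP API, a single completion is requested with `POST /api/generate`. The host, port, and model name in this sketch are assumptions:

```python
import requests

# Non-streaming generation via POST /api/generate
# (localhost:11434 is Ollama's default address; adjust as needed).
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",               # any locally available model
        "prompt": "Why is the sky blue?",
        "stream": False,                 # return one complete JSON object
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])       # the generated text
```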
### Stream Generation
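Streaming uses the same endpoint with `"stream": true`; Ollama's native API then emits newline-delimited JSON, one object per chunk. A minimal sketch under the same assumptions:

```python
import json
import requests

# Streaming generation: read newline-delimited JSON chunks as they arrive.
with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Tell me a short story.", "stream": True},
    stream=True,
    timeout=300,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):        # final chunk carries timing/token stats
            break
```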
## Generation Parameters

### Core Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| model | string | Model to use | Required |
| prompt | string | Input text | Required |
| temperature | float | Sampling randomness (0.0-1.0; lower is more deterministic) | 0.8 |
| max_tokens | integer | Maximum number of tokens to generate | 2048 |
| top_p | float | Nucleus sampling: cumulative-probability cutoff | 0.9 |
| top_k | integer | Restricts sampling to the k most likely tokens | 40 |
| repeat_penalty | float | Penalty applied to recently repeated tokens | 1.1 |
| presence_penalty | float | Penalizes tokens that have already appeared at all | 0.0 |
| frequency_penalty | float | Penalizes tokens in proportion to how often they appear | 0.0 |
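In Ollama's native API, the sampling parameters above are nested under an `options` object rather than passed at the top level; whether a wrapper service flattens them is an assumption to verify. A sketch:

```python
import requests

payload = {
    "model": "llama3",
    "prompt": "Summarize the theory of relativity in two sentences.",
    "stream": False,
    "options": {                 # Ollama nests sampling parameters here
        "temperature": 0.3,      # low randomness for a factual summary
        "top_p": 0.9,
        "top_k": 40,
        "repeat_penalty": 1.1,
    },
}
result = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
print(result.json()["response"])
```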
### Advanced Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| stop_sequences | string[] | Sequences that terminate generation when produced | [] |
| seed | integer | Random seed for reproducible output | null |
| num_ctx | integer | Context window size in tokens | 2048 |
| num_predict | integer | Maximum tokens to predict (-1 = no limit) | -1 |
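A hedged sketch of the advanced parameters. Note that in Ollama's native API the stop list is named `stop` rather than `stop_sequences`; if the service forwards the table's names unchanged, adjust accordingly:

```python
import requests

payload = {
    "model": "llama3",
    "prompt": "List three uses of sodium bicarbonate.",
    "stream": False,
    "options": {
        "seed": 42,          # fixed seed for reproducible sampling
        "num_ctx": 4096,     # enlarge the context window
        "num_predict": 256,  # cap generated tokens (-1 = no limit)
        "stop": ["\n\n"],    # stop at the first blank line
    },
}
result = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
print(result.json()["response"])
```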
## Response Format
### Standard Response
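For reference, a non-streaming response in Ollama's standard API carries roughly the fields annotated below; the exact set can vary by version, so treat this shape as an assumption:

```python
# Approximate shape of a completed /api/generate response.
standard_response = {
    "model": "llama3",
    "created_at": "2024-01-01T00:00:00Z",
    "response": "The sky appears blue because...",  # full generated text
    "done": True,
    "total_duration": 5_000_000_000,  # end-to-end time in nanoseconds
    "prompt_eval_count": 26,          # tokens consumed by the prompt
    "eval_count": 112,                # tokens generated
}
```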
### Stream Response
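Streamed responses arrive as a sequence of JSON objects: intermediate chunks carry a fragment of text, and the final chunk sets `done` to true and includes the counters. Again, the field set is an assumption based on Ollama's standard API:

```python
# Approximate chunk shapes in a streamed response.
intermediate_chunk = {"model": "llama3", "response": " blue", "done": False}
final_chunk = {
    "model": "llama3",
    "response": "",
    "done": True,
    "eval_count": 112,  # total tokens generated across all chunks
}
```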
## Generation Examples
### Basic Text Generation
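Assuming the official `ollama` Python client works against this service, a basic one-shot generation looks like the following; the model name is an assumption:

```python
import ollama

# One-shot generation via the ollama Python client.
result = ollama.generate(
    model="llama3",
    prompt="Explain recursion in one paragraph.",
    options={"temperature": 0.7},
)
print(result["response"])
```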
### Streaming Generation
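With `stream=True` the client returns an iterator of chunks instead of a single response; a minimal sketch under the same assumptions:

```python
import ollama

# Print tokens as they arrive rather than waiting for the full response.
stream = ollama.generate(model="llama3", prompt="Write a haiku about rain.", stream=True)
for chunk in stream:
    print(chunk["response"], end="", flush=True)
print()
```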
### Chat Completion
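Chat completion takes a list of role-tagged messages; a system message steers behavior, and appending each reply preserves multi-turn context. A sketch, again assuming the official client applies:

```python
import ollama

messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "What does the num_ctx option control?"},
]
reply = ollama.chat(model="llama3", messages=messages)
print(reply["message"]["content"])

# Keep the assistant's reply in the history for the next turn.
messages.append(reply["message"])
messages.append({"role": "user", "content": "And num_predict?"})
followup = ollama.chat(model="llama3", messages=messages)
print(followup["message"]["content"])
```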
## Best Practices
- **Prompt Engineering**
  - Be specific and clear in prompts
  - Provide context when needed
  - Use system messages for behavior control
  - Structure multi-turn conversations properly
- **Parameter Tuning**
  - Lower temperature for factual responses
  - Higher temperature for creative tasks (see the preset sketch after this list)
  - Adjust max_tokens based on the expected output length
  - Use stop sequences to control where output ends
- **Performance Optimization**
  - Use streaming for long responses
  - Batch similar requests when possible
  - Cache frequently used responses
  - Monitor token usage
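As referenced in the tuning notes above, one way to operationalize the temperature guidance is a pair of presets: cool for factual answers, warm for creative ones. The values here are illustrative assumptions, not service defaults:

```python
import ollama

# Illustrative presets; tune per task rather than treating these as canonical.
FACTUAL = {"temperature": 0.2, "top_p": 0.9, "num_predict": 256}
CREATIVE = {"temperature": 0.9, "top_p": 0.95, "num_predict": 512}

def answer(prompt: str, creative: bool = False) -> str:
    # Select a preset and cap output length via num_predict.
    opts = CREATIVE if creative else FACTUAL
    return ollama.generate(model="llama3", prompt=prompt, options=opts)["response"]

print(answer("In what year was the transistor invented?"))
print(answer("Invent a name for a coffee shop on Mars.", creative=True))
```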