Ollama Service API

The MOOD MNKY Ollama service provides enterprise-grade AI model management and inference capabilities. This documentation covers all available endpoints, authentication methods, and best practices for integration.

Base URL

https://ollama.moodmnky.com

Available Endpoints

Endpoints are grouped into three categories:
  • Model Management
  • Generation & Inference
  • Monitoring & Health

Authentication

All requests to the Ollama service must include an API key in the Authorization header:
Authorization: Bearer your-api-key
To obtain an API key:
  1. Contact the DevOps team through the developer portal
  2. Specify your use case and required rate limits
  3. Follow our security best practices for API key management
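
For example, the key can be loaded from an environment variable and attached as a bearer token. This is a minimal sketch; the variable name OLLAMA_API_KEY and the /api/tags model-listing route (Ollama's standard listing endpoint) are illustrative assumptions, not part of this documentation:
import os
import requests

# Keep the key out of source code; the variable name is illustrative.
API_KEY = os.environ["OLLAMA_API_KEY"]

response = requests.get(
    "https://ollama.moodmnky.com/api/tags",  # assumed model-listing endpoint
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
print(response.status_code)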

Request Format

All POST requests must send a JSON body and set the content type header:
Content-Type: application/json

Response Format

Successful responses return a 2xx status code and a JSON body:
{
  "status": "success",
  "data": {
    // Response data here
  }
}
Error responses follow the standard error format:
{
  "error": {
    "code": "error_code",
    "message": "Error description",
    "details": {
      // Additional error details
    }
  }
}
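
A small helper can normalize both shapes. This is a sketch based on the formats above; the function name and the choice of RuntimeError are illustrative:
def parse_response(response):
    # Return the success payload, or raise using the documented error shape.
    body = response.json()
    if response.ok:
        return body.get("data")
    err = body.get("error", {})
    raise RuntimeError(f"{err.get('code')}: {err.get('message')}")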

Rate Limiting

The service implements the following rate limits:
Endpoint Category    Rate Limit    Burst Limit
Generation           100/min       120/min
Model Management     1000/min      1200/min
Monitoring           1000/min      1200/min
Rate limit headers are included in all responses:
  • X-RateLimit-Limit: Rate limit ceiling
  • X-RateLimit-Remaining: Remaining requests
  • X-RateLimit-Reset: Time until limit reset
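
Clients can use these headers to throttle proactively. A minimal sketch, assuming X-RateLimit-Reset carries seconds until the window resets (confirm the exact semantics with the DevOps team):
import time

def wait_if_exhausted(response):
    # Sleep out the window when no requests remain in the current quota.
    remaining = int(response.headers.get("X-RateLimit-Remaining", "1"))
    if remaining == 0:
        time.sleep(int(response.headers.get("X-RateLimit-Reset", "1")))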

Monitoring

The service exposes Prometheus metrics at:
https://ollama.moodmnky.com/metrics
Available metrics include:
  • Request counts and latencies
  • Model loading/unloading events
  • Resource utilization
  • Error rates
  • Token usage
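
Since the endpoint serves the Prometheus text exposition format, it can be scraped by a Prometheus server or inspected directly. A quick check, assuming /metrics requires the same bearer token as the API (verify this with the DevOps team):
import requests

response = requests.get(
    "https://ollama.moodmnky.com/metrics",
    headers={"Authorization": "Bearer your-api-key"},  # auth assumed
    timeout=10,
)
# Print the first few samples, skipping HELP/TYPE comment lines.
samples = [line for line in response.text.splitlines()
           if line and not line.startswith("#")]
print("\n".join(samples[:10]))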

Best Practices

  1. Model Management
    • Cache model information locally
    • Implement exponential backoff for retries (see the sketch after this list)
    • Monitor model versions
  2. Generation Requests
    • Use streaming for long generations
    • Implement request timeouts
    • Handle rate limits gracefully
  3. Production Usage
    • Monitor API response times
    • Set up alerts for error rates
    • Track token usage
    • Implement circuit breakers
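
The retry sketch referenced above might look like the following: it retries on HTTP 429 and transient network errors with exponentially growing waits, and applies a request timeout. The function name and retry budget are illustrative:
import time
import requests

def post_with_backoff(url, headers, payload, max_retries=5):
    # Exponential backoff: wait 1s, 2s, 4s, ... between attempts.
    for attempt in range(max_retries):
        try:
            response = requests.post(url, headers=headers, json=payload, timeout=30)
            if response.status_code != 429:
                return response
        except requests.RequestException:
            pass  # transient network error: retry after backing off
        time.sleep(2 ** attempt)
    raise RuntimeError("request failed after retries")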

Examples

Curl

# Generate a completion
curl -X POST https://ollama.moodmnky.com/api/generate \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "prompt": "Why is the sky blue?"
  }'

Python

import requests

API_KEY = "your-api-key"
BASE_URL = "https://ollama.moodmnky.com"

def generate_completion(prompt, model="llama3.2"):
    # Request a single JSON response; Ollama's generate endpoint streams
    # newline-delimited JSON by default unless "stream" is false.
    response = requests.post(
        f"{BASE_URL}/api/generate",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json"
        },
        json={
            "model": model,
            "prompt": prompt,
            "stream": False
        },
        timeout=30  # per the request-timeout best practice above
    )
    response.raise_for_status()
    return response.json()
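
For long generations, the best practices above recommend streaming. The sketch below reuses API_KEY and BASE_URL from the previous example and assumes the service follows Ollama's newline-delimited JSON streaming format (one JSON object per line, with a "done" flag on the final chunk):
import json

def stream_completion(prompt, model="llama3.2"):
    with requests.post(
        f"{BASE_URL}/api/generate",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json"
        },
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=60,
    ) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                break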

JavaScript

const API_KEY = 'your-api-key';
const BASE_URL = 'https://ollama.moodmnky.com';

async function generateCompletion(prompt, model = 'llama3.2') {
  const response = await fetch(`${BASE_URL}/api/generate`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model,
      prompt,
      stream: false // single JSON response; the endpoint streams by default
    })
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return await response.json();
}

Support