# Monitoring
This guide covers the monitoring capabilities of the Ollama service.

## Monitoring Overview
Ollama provides comprehensive monitoring capabilities to help you track the performance, health, and usage of your AI models and service deployments.

## Health Check Endpoint
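A running Ollama server answers plain HTTP on its default port (11434), and its root path returns a short status banner, which makes it a convenient liveness probe. Below is a minimal sketch using the third-party `requests` package; adjust the URL if your server listens elsewhere.

```python
import requests  # third-party: pip install requests

OLLAMA_URL = "http://localhost:11434"  # default Ollama listen address

def is_healthy() -> bool:
    """Return True if the Ollama server answers on its root path."""
    try:
        resp = requests.get(f"{OLLAMA_URL}/", timeout=5)
        return resp.status_code == 200  # body is a short "Ollama is running" banner
    except requests.RequestException:
        return False

if __name__ == "__main__":
    print("healthy" if is_healthy() else "unhealthy")
```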
## Available Metrics
| Metric Category | Metric | Type | Description |
|---|---|---|---|
| System | cpu_usage | float | CPU usage percentage |
| System | memory_usage | integer | Memory usage in MB |
| System | gpu_usage | float | GPU usage percentage |
| System | uptime | integer | Service uptime in seconds |
| Requests | total | integer | Total requests received |
| Requests | success | integer | Successful requests |
| Requests | failed | integer | Failed requests |
| Requests | average_latency | integer | Average latency in ms |
| Model-specific | requests | integer | Requests per model |
| Model-specific | average_tokens_per_request | integer | Average tokens generated per request |
| Model-specific | average_latency | integer | Average latency per model in ms |
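How these metrics are exposed depends on your deployment; a stock Ollama install does not ship a single JSON metrics endpoint. The sketch below therefore assumes you have already collected a payload shaped like the table above — the `metrics` dict is illustrative sample data, not an Ollama API response.

```python
# Sketch: deriving a success rate from a metrics payload shaped like the
# table above. The `metrics` dict is illustrative, not an Ollama response.
metrics = {
    "system": {"cpu_usage": 42.5, "memory_usage": 8192, "gpu_usage": 61.0, "uptime": 86400},
    "requests": {"total": 1200, "success": 1164, "failed": 36, "average_latency": 230},
}

success_rate = metrics["requests"]["success"] / metrics["requests"]["total"]
print(f"success rate: {success_rate:.1%}, "
      f"avg latency: {metrics['requests']['average_latency']} ms")
```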
## Logging Configuration
Ollama provides configurable logging levels that can be adjusted based on your monitoring needs.

### Logging Levels
| Level | Description |
|---|---|
| debug | Most verbose, includes all details for debugging |
| info | General operational information |
| warn | Potential issues that don’t affect operation |
| error | Errors that affect operation but don’t cause service failure |
| critical | Critical issues that may cause service failure |
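How a level is selected depends on how you run the server. One concrete, documented switch is the `OLLAMA_DEBUG` environment variable, which enables the most verbose (debug) output; the sketch below sets it before launching `ollama serve`. Mapping the finer-grained levels in the table is deployment-specific.

```python
import os
import subprocess

# Enable debug-level logging via Ollama's documented OLLAMA_DEBUG
# environment variable, then launch the server. Log output goes to stderr.
env = dict(os.environ, OLLAMA_DEBUG="1")
subprocess.run(["ollama", "serve"], env=env)
```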
## Usage Examples
### Monitor System Health
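A sketch of a periodic health poll that combines the liveness probe above with host-side resource readings from the third-party `psutil` package. The polling interval and CPU threshold are illustrative; tune them to your deployment.

```python
import time
import psutil    # third-party: pip install psutil
import requests  # third-party: pip install requests

OLLAMA_URL = "http://localhost:11434"

# Poll service liveness and host resource usage once a minute.
while True:
    try:
        alive = requests.get(f"{OLLAMA_URL}/", timeout=5).ok
    except requests.RequestException:
        alive = False
    cpu = psutil.cpu_percent(interval=1)         # CPU usage, percent
    mem = psutil.virtual_memory().used // 2**20  # memory usage, MB
    print(f"alive={alive} cpu={cpu:.1f}% mem={mem}MB")
    if not alive or cpu > 90:
        print("WARNING: service unhealthy or CPU saturated")
    time.sleep(60)
```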
### Track Model Performance
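Ollama’s `/api/generate` response includes timing fields — `eval_count` (tokens generated), `eval_duration` and `total_duration` (both in nanoseconds) — from which you can compute per-request latency and token throughput. A minimal sketch, assuming a local server with a model such as `llama3` already pulled:

```python
import requests  # third-party: pip install requests

OLLAMA_URL = "http://localhost:11434"
MODEL = "llama3"  # assumes this model is already pulled locally

# Run one non-streaming generation and derive latency and token throughput
# from the timing fields Ollama returns alongside the response text.
resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={"model": MODEL, "prompt": "Why is the sky blue?", "stream": False},
    timeout=120,
).json()

tokens = resp["eval_count"]                # tokens generated
latency_ms = resp["total_duration"] / 1e6  # nanoseconds -> milliseconds
tokens_per_sec = tokens / (resp["eval_duration"] / 1e9)
print(f"{MODEL}: {tokens} tokens, {latency_ms:.0f} ms total, "
      f"{tokens_per_sec:.1f} tokens/s")
```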
### Monitor Request Patterns
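Stock Ollama does not expose per-model request counters, so a common approach is client-side instrumentation: wrap your calls and aggregate counts and latencies per model. A minimal in-process sketch (the `track` wrapper is illustrative, not an Ollama API):

```python
import time
from collections import defaultdict

# Client-side request-pattern tracking: per-model counts and mean latency.
stats = defaultdict(lambda: {"requests": 0, "total_ms": 0.0})

def track(model: str, call):
    """Run `call()`, recording its latency under `model`."""
    start = time.perf_counter()
    try:
        return call()
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        stats[model]["requests"] += 1
        stats[model]["total_ms"] += elapsed_ms

# Example usage with a stand-in workload:
track("llama3", lambda: time.sleep(0.05))
for model, s in stats.items():
    print(f"{model}: {s['requests']} requests, "
          f"avg {s['total_ms'] / s['requests']:.0f} ms")
```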
## Best Practices
- Health Monitoring
  - Implement regular health checks
  - Set up automated alerting for unhealthy status (see the sketch after this list)
  - Monitor system resource usage trends
  - Track model load status
- Performance Tracking
  - Monitor latency across different models
  - Track token throughput
  - Analyze request patterns
  - Identify performance bottlenecks
- Error Monitoring
  - Track error rates by endpoint
  - Categorize errors by type
  - Set up alerts for unusual error patterns
  - Maintain error logs for troubleshooting
- Resource Optimization
  - Monitor GPU/CPU utilization
  - Track memory usage patterns
  - Implement resource-based auto-scaling
  - Optimize model loading based on usage patterns
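As a concrete example of the health-monitoring practices above, the sketch below alerts after several consecutive failed health checks. The `send_alert` hook and the threshold are placeholders; wire them to your paging or chat integration.

```python
import time
import requests  # third-party: pip install requests

OLLAMA_URL = "http://localhost:11434"
FAILURE_THRESHOLD = 3  # consecutive failures before alerting (illustrative)

def send_alert(message: str) -> None:
    """Placeholder: wire this to your pager, Slack webhook, etc."""
    print(f"ALERT: {message}")

failures = 0
while True:
    try:
        ok = requests.get(f"{OLLAMA_URL}/", timeout=5).ok
    except requests.RequestException:
        ok = False
    failures = 0 if ok else failures + 1
    if failures == FAILURE_THRESHOLD:
        send_alert(f"Ollama unhealthy for {failures} consecutive checks")
    time.sleep(30)
```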
## Integration with Monitoring Tools
Ollama’s metrics can be integrated with standard monitoring platforms:

- Prometheus Integration (see the exporter sketch after this list)
  - Export metrics in Prometheus format
  - Set up scrape configuration
  - Create Grafana dashboards
  - Configure alerting rules
- Centralized Logging
  - Forward logs to ELK stack
  - Configure log parsing
  - Set up log-based alerts
  - Implement log retention policies
- Custom Dashboards
  - Aggregate metrics across instances
  - Visualize performance trends
  - Create service-level dashboards
  - Implement custom alerting
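Ollama does not natively export Prometheus metrics, so one option is a small sidecar exporter built with the third-party `prometheus_client` package. The sketch below exposes a single gauge on port 9877; `read_request_total()` is a hypothetical hook standing in for however you collect the counters from the metrics table (log parsing, client-side instrumentation, and so on).

```python
import time
from prometheus_client import Gauge, start_http_server  # pip install prometheus-client

# Hypothetical collector: replace with however you gather the counters
# from the metrics table above.
def read_request_total() -> int:
    return 0  # stub value for the sketch

requests_total = Gauge("ollama_requests_total", "Total requests received")

if __name__ == "__main__":
    start_http_server(9877)  # Prometheus scrapes http://<host>:9877/metrics
    while True:
        requests_total.set(read_request_total())
        time.sleep(15)
```

A `Gauge` is used rather than a `Counter` because the value is mirrored from an external source and set directly, not incremented in-process.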