Understanding the Prometheus up Metric

Overview

The up metric is one of the most fundamental metrics in Prometheus. It indicates whether Prometheus's most recent scrape of a target (a service, endpoint, or exporter) succeeded, that is, whether the target is reachable and returning valid metrics.

What is the up Metric?

The up metric is automatically generated by Prometheus for every target it scrapes. It’s not exposed by the target itself; Prometheus records a new up sample after every scrape attempt, based on whether that attempt succeeded or failed.

Possible Status Values

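The up series itself only ever takes the values 1 or 0, but from a monitoring point of view a target has 3 possible states, because it can also have no up series at all: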

Status  | Value        | Meaning                                               | When it occurs
--------|--------------|-------------------------------------------------------|---------------
UP      | up = 1       | Target is accessible and responding successfully      | Scrape returns HTTP 200; target responds within the scrape timeout; response is in valid Prometheus metrics format; no network issues
DOWN    | up = 0       | Target is not accessible or not responding properly   | Network connectivity issues; HTTP error status codes (4xx, 5xx); target exceeds the scrape timeout; malformed metrics response; target completely unreachable
MISSING | no up series | Target is not configured or has not been scraped yet  | Target not in the Prometheus configuration; service discovery hasn’t found the target; target discovered but its first scrape hasn’t happened yet; configuration error preventing scraping

Detailed Explanation

1. up = 1 (Target is UP)

  • Meaning: The target is accessible and responding successfully
  • When it occurs:
    • The scrape request returns HTTP 200 status code
    • The target responds within the configured timeout
    • The response contains valid Prometheus metrics format
    • No network connectivity issues

2. up = 0 (Target is DOWN)

  • Meaning: The target is not accessible or not responding properly
  • When it occurs:
    • Network connectivity issues (DNS resolution failure, connection timeout)
    • Target returns HTTP error status codes (4xx, 5xx)
    • Target doesn’t respond within the configured scrape timeout
    • Target returns malformed metrics (invalid Prometheus format)
    • Target is completely unreachable

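3. No up series (Target is MISSING)

  • Meaning: Prometheus has no up series for the target, because it is not configured, not discovered, or not scraped yet
  • When it occurs:
    • Target is not included in the Prometheus configuration
    • Service discovery hasn’t discovered the target yet
    • Target was just discovered and its first scrape hasn’t happened yet (a scrape that is attempted and fails records up = 0, not a missing series)
    • Configuration error prevents Prometheus from attempting to scrape, for example relabel_configs dropping the target
    • Target was removed from the configuration: its up series goes stale and disappears from instant queries, although historical samples remain visible in range queries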
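The three states can be summarized in a small model. The Python sketch below is a simplification, not Prometheus's actual scrape code: up_value, up_state and the (status, elapsed, parses_ok) tuple are hypothetical, and the 10-second default mirrors Prometheus's default scrape_timeout. It shows the key distinction: a scrape that is attempted and fails still produces an up series (DOWN), while a target with no scrape attempt has no series at all (MISSING).

```python
from typing import Dict, Optional, Tuple

# Hypothetical scrape outcome: (HTTP status or None if no response arrived,
# seconds the scrape took, whether the body parsed as valid exposition format).
# A value of None means the target is active but has not been scraped yet.
ScrapeResult = Optional[Tuple[Optional[int], float, bool]]


def up_value(status: Optional[int], elapsed_s: float, parses_ok: bool,
             timeout_s: float = 10.0) -> int:
    """Simplified rule for the up value recorded after one scrape attempt."""
    if status is None:         # DNS failure, refused connection, unreachable host
        return 0
    if elapsed_s > timeout_s:  # scrape took longer than scrape_timeout
        return 0
    if status != 200:          # 4xx, 5xx, or any other non-200 status
        return 0
    if not parses_ok:          # malformed metrics response
        return 0
    return 1


def up_state(target: str, active: Dict[str, ScrapeResult]) -> str:
    """Classify a target as 'UP', 'DOWN' or 'MISSING' (hypothetical helper)."""
    result = active.get(target)
    if result is None:
        # Not configured, not discovered, or not scraped yet: no up series exists.
        return "MISSING"
    status, elapsed_s, parses_ok = result
    return "UP" if up_value(status, elapsed_s, parses_ok) == 1 else "DOWN"


if __name__ == "__main__":
    active = {
        "web1:8080": (200, 0.05, True),    # healthy
        "web2:8080": (503, 0.02, True),    # HTTP error -> up = 0
        "web3:8080": (None, 10.0, False),  # unreachable -> up = 0
        "web4:8080": None,                 # discovered, first scrape pending
    }
    for t in ["web1:8080", "web2:8080", "web3:8080", "web4:8080", "web5:8080"]:
        print(t, up_state(t, active))
```

Note that web3 (unreachable) is DOWN rather than MISSING: once Prometheus attempts a scrape, an up sample is always recorded.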

Key Characteristics

Automatic Generation

  • Prometheus automatically creates this metric for every active target, that is, every target that remains after service discovery and relabeling
  • No need to explicitly expose this metric from your application
  • Available for all scrape targets (applications, exporters, etc.)

Labels

The up metric includes labels that identify the target:

up{instance="localhost:9090", job="prometheus"} 1
up{instance="localhost:9100", job="node-exporter"} 0

Common labels:

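  • instance: The target’s address (host:port), taken from __address__ unless relabeling sets it explicitly
  • job: The job_name from the scrape configuration
  • Any additional target labels, such as those set under static_configs or in file_sd target files, or added through relabel_configs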
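Label assignment can be sketched as below. This is a simplified Python model, not Prometheus code; target_labels and render_up are hypothetical helpers. It reflects two documented defaults: the job label comes from the scrape config's job_name, and if no instance label is set after relabeling, instance takes the value of __address__. Labels beginning with __ are internal and are dropped before the series is stored.

```python
from typing import Dict


def target_labels(job_name: str, after_relabel: Dict[str, str]) -> Dict[str, str]:
    """Simplified view of how a target's labels end up on its up series.

    after_relabel: the target's labels after service discovery and
    relabel_configs, including internal labels such as __address__.
    """
    labels = dict(after_relabel)
    labels.setdefault("job", job_name)                    # job comes from job_name
    labels.setdefault("instance", labels["__address__"])  # instance defaults to __address__
    # Labels starting with "__" are internal and are not stored on the series.
    return {k: v for k, v in labels.items() if not k.startswith("__")}


def render_up(labels: Dict[str, str], value: int) -> str:
    """Format an up sample in the style used in this document."""
    inner = ", ".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"up{{{inner}}} {value}"


if __name__ == "__main__":
    lbls = target_labels("node-exporter", {"__address__": "localhost:9100",
                                           "__metrics_path__": "/metrics"})
    print(render_up(lbls, 0))  # up{instance="localhost:9100", job="node-exporter"} 0
```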

Practical Usage

Monitoring Target Health

# Check if any targets are down
up == 0

# Returns 1 only if there are no up series at all
# (Prometheus cannot list targets it does not know about)
absent(up)

# Count total targets (returns 0 instead of nothing when there are none)
count(up) or vector(0)

# Count targets that are up
count(up == 1)

# Count targets that are down (returns nothing, not 0, when none are down)
count(up == 0)

# Detect missing targets by comparing against an expected count
# (here, the 3 web servers from Scenario 2)
(count(up{job="web-servers"}) or vector(0)) < 3

# Percentage of targets that are up (same as sum(up) / count(up) * 100)
avg(up) * 100

# Check for a job that should exist but has no up series
absent(up{job="my-service"})
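count(up or vector(0)) and count(up) or vector(0) look alike but behave differently. The Python sketch below is a toy model of PromQL's or operator and count() on in-memory instant vectors; the helper names are hypothetical, and label sets exclude the metric name, as in PromQL's vector matching. vector(0) has an empty label set, so it never matches a labelled up series: applying or inside count() adds one extra element, whereas applying or to the count() result only supplies 0 when there are no up series.

```python
from typing import FrozenSet, List, Tuple

# Toy instant vector: (label set, value) pairs. Label sets exclude __name__,
# matching how PromQL's set operators compare series.
Labels = FrozenSet[Tuple[str, str]]
Vector = List[Tuple[Labels, float]]


def lbls(**kv: str) -> Labels:
    return frozenset(kv.items())


def vec_or(left: Vector, right: Vector) -> Vector:
    """left or right: every left element, plus right elements whose label set
    matches no element on the left."""
    seen = {labels for labels, _ in left}
    return left + [(lb, v) for lb, v in right if lb not in seen]


def count(vec: Vector) -> Vector:
    """count(): a single label-less element, or an empty result for empty input."""
    return [(frozenset(), float(len(vec)))] if vec else []


def vector0() -> Vector:
    """vector(0): one element with value 0 and no labels."""
    return [(frozenset(), 0.0)]


if __name__ == "__main__":
    up: Vector = [
        (lbls(instance="web1:8080", job="web-servers"), 1.0),
        (lbls(instance="web2:8080", job="web-servers"), 0.0),
        (lbls(instance="web3:8080", job="web-servers"), 1.0),
    ]
    for name, result in [
        ("count(up or vector(0))", count(vec_or(up, vector0()))),  # 4.0: one too many
        ("count(up) or vector(0)", vec_or(count(up), vector0())),  # 3.0
        ("same, with no up series", vec_or(count([]), vector0())), # 0.0
    ]:
        print(name, "->", result[0][1])
```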

Alerting Rules

groups:
- name: target.rules
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  
  - alert: InstanceMissing
    expr: absent(up{job="my-service"})
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance missing from monitoring"
      description: "No target for job {{ $labels.job }} is being scraped by Prometheus."
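The for: 5m clause means InstanceDown only fires after up == 0 has held at every rule evaluation for at least five minutes; until then the alert is pending, and any evaluation where the condition is false resets it. The Python sketch below is a simplified model of that behaviour for a single series (alert_states is a hypothetical helper; real rule evaluation also depends on evaluation_interval and tracks state per series). It models a series whose value is 0: if the up series disappears entirely, up == 0 returns nothing and InstanceDown resolves, which is why a separate absent()-based rule is useful.

```python
from typing import List, Optional, Tuple


def alert_states(evals: List[Tuple[int, int]], for_s: int) -> List[Tuple[int, str]]:
    """Simplified `expr: up == 0` + `for:` model for a single series.

    evals: (evaluation timestamp in seconds, up value) in time order.
    Returns (timestamp, 'inactive' | 'pending' | 'firing') per evaluation.
    """
    out: List[Tuple[int, str]] = []
    active_since: Optional[int] = None
    for ts, up in evals:
        if up == 0:
            if active_since is None:
                active_since = ts  # condition first became true here
            held = ts - active_since
            out.append((ts, "firing" if held >= for_s else "pending"))
        else:
            active_since = None    # recovery resets the for: timer
            out.append((ts, "inactive"))
    return out


if __name__ == "__main__":
    # Evaluations every 60 s; the target goes down at t=60 and recovers at t=420.
    evals = [(0, 1), (60, 0), (120, 0), (180, 0), (240, 0), (300, 0), (360, 0), (420, 1)]
    for ts, state in alert_states(evals, for_s=300):
        print(ts, state)
```

With evaluations every 60 seconds, the alert in this example is pending from t=60 to t=300 and fires at t=360, the first evaluation at which the outage has lasted a full five minutes.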

Service Discovery Integration

Because Prometheus generates up for every target it scrapes, the metric behaves the same way however the targets are found, including:

  • Static configuration
  • File-based service discovery
  • Consul, etcd, Kubernetes service discovery
  • DNS-based service discovery
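With file-based service discovery, every entry in a target file becomes a target with its own up series, and removing an entry makes that series go stale (the MISSING state). The Python sketch below writes a target file in the file_sd JSON format: a list of groups, each holding targets and labels. It assumes a scrape config whose file_sd_configs files pattern matches the output path; write_file_sd is a hypothetical helper. Writing to a temporary file and renaming it is a design choice here, so that Prometheus never reads a half-written file.

```python
import json
import os
import tempfile
from typing import Dict, List


def write_file_sd(path: str, targets: List[str], labels: Dict[str, str]) -> None:
    """Write one file_sd target group to path (hypothetical helper).

    The temporary file uses a .tmp suffix so a files pattern such as *.json
    does not match it; os.replace then swaps it in (atomically on POSIX).
    """
    groups = [{"targets": targets, "labels": labels}]
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(groups, f, indent=2)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave stray temp files behind
        raise
```

For example, write_file_sd("targets/web-servers.json", ["web1:8080", "web2:8080", "web3:8080"], {"env": "prod"}) would produce three targets, each getting an up series carrying env="prod" (the path is hypothetical).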

Common Scenarios

Scenario 1: Application Metrics

# prometheus.yml
scrape_configs:
- job_name: 'my-app'
  static_configs:
  - targets: ['localhost:8080']

Result:

up{instance="localhost:8080", job="my-app"} 1  # App is running
up{instance="localhost:8080", job="my-app"} 0  # App is down

Scenario 2: Multiple Instances

scrape_configs:
- job_name: 'web-servers'
  static_configs:
  - targets: 
    - 'web1:8080'
    - 'web2:8080'
    - 'web3:8080'

Result:

up{instance="web1:8080", job="web-servers"} 1
up{instance="web2:8080", job="web-servers"} 0  # This instance is down
up{instance="web3:8080", job="web-servers"} 1
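The same results can be read programmatically through Prometheus's HTTP API, whose /api/v1/query endpoint returns instant-vector results as JSON, with each sample's value encoded as a [timestamp, "string"] pair. The Python sketch below lists the down instances from such a response; query_url, down_instances and fetch are hypothetical helpers, and http://localhost:9090 is an assumed server address.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple


def query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the /api/v1/query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def down_instances(response: dict) -> List[Tuple[str, str]]:
    """Return sorted (job, instance) pairs whose up value is 0."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector")
    down = []
    for series in data["result"]:
        _ts, value = series["value"]  # value arrives as a string, e.g. "0"
        if float(value) == 0:
            metric = series["metric"]
            down.append((metric.get("job", ""), metric.get("instance", "")))
    return sorted(down)


def fetch(base_url: str, promql: str = "up") -> dict:
    """Run an instant query (requires a reachable Prometheus server)."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=10) as resp:
        return json.load(resp)
```

Against the Scenario 2 setup, down_instances(fetch("http://localhost:9090")) would be expected to return [("web-servers", "web2:8080")].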

Troubleshooting

When up = 0

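  1. Check Network Connectivity

    curl -v http://target:port/metrics

    Run this from the Prometheus host, since that is the network path the scrape uses. The Status > Targets page of the Prometheus web UI also shows the last scrape error for each target.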
    
  2. Verify Target Configuration

    • Correct host and port
    • Valid scrape path (usually /metrics)
    • Proper authentication if required
  3. Check Target Logs

    • Application errors
    • Resource constraints (CPU, memory)
    • Port binding issues
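  4. Prometheus Configuration

    • Scrape interval settings
    • A scrape_timeout long enough for the target to respond
    • relabel_configs that rewrite __address__ or the scrape path, or drop the target entirely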

Common Issues

  • DNS Resolution: Target hostname cannot be resolved
  • Port Issues: Target not listening on expected port
  • Authentication: Missing or incorrect credentials
  • Firewall: Network blocking access to target
  • Resource Exhaustion: Target too busy to respond

Best Practices

  1. Set Appropriate Scrape Intervals

    scrape_interval: 15s
    scrape_timeout: 10s
    
  2. Use Service Discovery

    • Automatically discover new targets
    • Handle dynamic environments
    • Reduce manual configuration
  3. Implement Proper Alerting

    • Alert on up == 0 for critical services
    • Use a for: duration long enough to ride out brief network blips
    • Include meaningful alert messages
  4. Monitor Target Availability

    # Alert if more than 10% of targets are down
    count(up == 0) / count(up) > 0.1
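Prometheus refuses to load a scrape config whose scrape_timeout is greater than its scrape_interval, so the two values should be checked together when generating configuration. The Python sketch below parses a subset of Prometheus's duration syntax (units y, w, d, h, m, s, ms, largest first, with 1y = 365d) and performs that check; duration_ms and check_scrape_timing are hypothetical helpers, and the error text is not Prometheus's own message.

```python
import re

# Subset of Prometheus's duration syntax: e.g. "15s", "1m30s", "500ms", "2h".
_DURATION = re.compile(
    r"(?:(\d+)y)?(?:(\d+)w)?(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?(?:(\d+)ms)?"
)
# Milliseconds per unit, in the same order as the regex groups.
_UNIT_MS = (365 * 86_400_000, 7 * 86_400_000, 86_400_000, 3_600_000, 60_000, 1_000, 1)


def duration_ms(text: str) -> int:
    """Convert a duration string to milliseconds; raise ValueError if invalid."""
    if text == "0":
        return 0
    match = _DURATION.fullmatch(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"not a valid duration: {text!r}")
    return sum(int(g) * unit for g, unit in zip(match.groups(), _UNIT_MS) if g)


def check_scrape_timing(scrape_interval: str, scrape_timeout: str) -> None:
    """Raise ValueError if scrape_timeout is longer than scrape_interval."""
    if duration_ms(scrape_timeout) > duration_ms(scrape_interval):
        raise ValueError(
            f"scrape_timeout {scrape_timeout} exceeds scrape_interval {scrape_interval}"
        )
```

With the values from item 1, check_scrape_timing("15s", "10s") passes, while swapping them raises ValueError.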
    

Summary

The up metric indicates the status of Prometheus targets with 3 possible states: 1 (up), 0 (down), or missing (no up series, because the target is not configured, not discovered, or not yet scraped). It’s automatically generated by Prometheus and is essential for monitoring the health of your infrastructure. Understanding all three states is crucial for effective monitoring and alerting in Prometheus-based observability stacks.

Key Takeaways:

  • 3 possible states: 1 (up), 0 (down), or missing
  • Automatically generated by Prometheus for configured targets
  • Missing state indicates configuration or discovery issues
  • Essential for comprehensive target health monitoring
  • Foundation for reliable alerting systems
  • Works with all service discovery mechanisms
  • Use absent() with a job selector, e.g. absent(up{job="my-service"}), to detect missing targets