Monitoring & Logging Stack Workflow

This project shows a full observability pipeline running on four servers: a Web Server that hosts the app and Alloy, a Prometheus Server for metrics collection, a Loki Server for centralized logs, and a Grafana Server for dashboards, visualization, alerting, and Slack notifications.

💻 Web Server (Web + Alloy)

Hosts the Flask application, exposes app metrics, exposes node metrics through Node Exporter, and uses Alloy to ship logs and forward metrics.

Main components

  • Flask / Titan application on port 5000
  • Node Exporter on port 9100
  • Alloy agent for metrics and logs
  • Load-generation and synthetic-traffic scripts

What it sends

  • Application metrics → Prometheus
  • System metrics → Prometheus
  • Application logs → Loki

Why it matters

  • Shows CPU, memory, disk, and process health
  • Shows request rate, endpoint traffic, and app behavior
  • Produces logs for troubleshooting and alert investigations

🔥 Prometheus Server

Scrapes metrics from configured targets, stores them as time-series data, and answers PromQL queries used by Grafana dashboards and alerts.

Main responsibilities

  • Scrape targets at a fixed interval
  • Store metrics in a time-series database
  • Support PromQL queries for monitoring panels
  • Track target health and availability

Key targets

  • Prometheus self-monitoring
  • Web server system metrics job
  • Web server application metrics job

Example metrics

  • CPU utilization and load average
  • Memory available and disk usage
  • HTTP request totals and request rate
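Request-rate panels are typically built with PromQL's `rate()` over a counter, for example `rate(http_requests_total[5m])` (metric name assumed). The sketch below shows the core idea on raw `(timestamp, value)` samples: sum the increases, treat any drop as a counter reset, and divide by the elapsed time. It is deliberately simplified; the real `rate()` also extrapolates toward the edges of the range window, so its results differ slightly.

```python
from typing import Optional, Sequence, Tuple


def simple_rate(samples: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Per-second increase of a counter from (unix_seconds, value) samples.

    Simplified version of PromQL rate(): handles counter resets but does not
    extrapolate to the boundaries of the range window.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed to measure an increase
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        # A lower value means the counter restarted (e.g. after an app restart),
        # so the whole new value counts as increase since the reset.
        increase += value - prev if value >= prev else value
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```

For samples taken every 15 seconds that go 100, 130, 160, then reset to 10 and reach 40, the total increase is 100 over 60 seconds, roughly 1.67 requests per second.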

🟢 Loki Server

Receives and stores logs sent by Alloy, keeps labels for efficient log filtering, and makes application logs searchable inside Grafana.

Main responsibilities

  • Log aggregation from the web server
  • Label-based log indexing and querying
  • Centralized log storage for troubleshooting
  • Support for log exploration in Grafana

Incoming data

  • App logs from /var/log/titan
  • Error logs, access logs, info logs
  • Streams forwarded by Alloy

Why it matters

  • Correlates logs with metrics during incidents
  • Helps find root cause faster
  • Adds log-level detail that dashboards alone cannot show
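Alloy takes care of shipping, but the data it delivers to Loki has a simple shape: streams keyed by a label set, each holding `[timestamp, line]` pairs. This sketch builds a JSON body in the format accepted by Loki's `/loki/api/v1/push` endpoint, where timestamps are Unix nanoseconds encoded as strings. The `job` and `level` labels in the example are illustrative assumptions, not the project's actual label set.

```python
import json


def build_push_payload(entries):
    """Group (timestamp_ns, labels, line) entries into Loki push-API streams.

    Entries with an identical label set end up in the same stream; each value
    is [timestamp in Unix nanoseconds as a string, log line].
    """
    streams = {}
    for ts_ns, labels, line in entries:
        key = tuple(sorted(labels.items()))  # label order must not matter
        stream = streams.setdefault(key, {"stream": dict(key), "values": []})
        stream["values"].append([str(ts_ns), line])
    return {"streams": list(streams.values())}


def encode_payload(payload):
    """Serialize the payload as the JSON body for POST /loki/api/v1/push."""
    return json.dumps(payload).encode("utf-8")
```

The encoded body would be sent with `Content-Type: application/json`. Keeping labels low-cardinality (a job name, a log level) and leaving request-specific detail in the log line itself is what lets Loki keep its label index small.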

📊 Grafana Server

Connects to Prometheus and Loki, builds dashboards, visualizes metrics and logs, defines threshold-based alerts, and sends notifications to Slack.

Main responsibilities

  • Dashboards for system and application monitoring
  • Panels for CPU, memory, disk, and HTTP traffic
  • Alert rules, thresholds, and evaluation groups
  • Notification routing to Slack contact points

Data sources and integrations

  • Prometheus for metrics
  • Loki for logs
  • Slack webhook for notifications
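Grafana's Prometheus data source sends PromQL to Prometheus over its HTTP API. As a rough illustration of that exchange, this sketch builds an instant-query URL for `/api/v1/query` and turns a `vector` response into a dict keyed by label set. The host name `prometheus:9090` used in the check is a placeholder, not this project's address.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build a GET URL for Prometheus's /api/v1/query (instant query) endpoint."""
    query = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def parse_vector(body):
    """Map each series' sorted label pairs to its value from an instant-vector response."""
    data = json.loads(body)
    if data.get("status") != "success":
        raise ValueError(f"query failed: {data.get('error')}")
    if data["data"]["resultType"] != "vector":
        raise ValueError("expected an instant vector result")
    result = {}
    for series in data["data"]["result"]:
        labels = tuple(sorted(series["metric"].items()))
        _ts, value = series["value"]  # [unix_seconds, "value as a string"]
        result[labels] = float(value)
    return result
```

Prometheus returns sample values as strings, which is why the parser converts them with `float()`.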

Example outcomes

  • Production dashboards by environment and service
  • Root disk usage alerts and resolved notifications
  • Live troubleshooting with metrics + logs together

Web Server exposes metrics and logs
  → Prometheus scrapes metrics / Loki ingests logs
  → Grafana builds dashboards and sends alerts

Dashboards & Observability Views

System Dashboard

  • CPU utilization
  • Memory available
  • Root disk usage
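Root disk usage is commonly derived from Node Exporter's filesystem metrics, e.g. `100 * (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})`. That expression is an assumption about how this panel is built; the sketch applies the same arithmetic to plain byte counts and checks the result against the 65% threshold used by this project's critical alert.

```python
DISK_ALERT_THRESHOLD = 65.0  # percent, as in this project's critical disk alert


def root_disk_usage_percent(size_bytes, avail_bytes):
    """Percent of the root filesystem in use, from Node Exporter-style byte counts."""
    if size_bytes <= 0:
        raise ValueError("filesystem size must be positive")
    return 100.0 * (1.0 - avail_bytes / size_bytes)


def breaches_threshold(usage_percent, threshold=DISK_ALERT_THRESHOLD):
    """True when usage is strictly above the threshold (the rule uses '> 65%')."""
    return usage_percent > threshold
```

Using `avail` rather than `free` bytes excludes blocks reserved for root, so the figure reflects the space ordinary processes can still use.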

Application Dashboard

  • HTTP requests total
  • Request rate by endpoint
  • Dashboard variables for dynamic filtering

Logs & Investigation

  • Search app logs by labels
  • Inspect errors and request traces
  • Correlate metrics with incidents

Signals combined in these views: metrics from Prometheus, logs from Loki, app traffic and request rate, and threshold breaches and alerts.

Alerting & Notifications

🚨 Critical Alert

  • Root disk usage > 65%
  • Grafana rule enters the Firing state

⚠️ Warning Alert

  • CPU or memory threshold crossed
  • Evaluation group checks every minute

✅ Resolved Notification

  • Metric returned to the normal range
  • Slack receives a recovery message
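The alert lifecycle above can be modeled as a small state machine evaluated once per interval. This is a simplified, hypothetical sketch, not Grafana's implementation: `pending_evals` stands in for the rule's pending period (a 2-minute pending period with a 1-minute evaluation interval is roughly two evaluations), and notifications are plain strings rather than Slack messages.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlertRule:
    """Simplified threshold alert: Normal -> Pending -> Firing -> resolved."""

    threshold: float
    pending_evals: int = 2  # consecutive breaching evaluations before Firing (assumed)
    state: str = "Normal"
    breach_count: int = 0
    notifications: List[str] = field(default_factory=list)

    def evaluate(self, value: float) -> str:
        """Run one evaluation (e.g. once per minute) and return the new state."""
        if value > self.threshold:
            self.breach_count += 1
            if self.state != "Firing":
                if self.breach_count >= self.pending_evals:
                    self.state = "Firing"
                    self.notifications.append(f"FIRING: {value:.1f} > {self.threshold:.1f}")
                else:
                    self.state = "Pending"  # breached, but not for long enough yet
        else:
            if self.state == "Firing":
                # Only a rule that actually fired produces a resolved message.
                self.notifications.append(f"RESOLVED: {value:.1f} <= {self.threshold:.1f}")
            self.state = "Normal"
            self.breach_count = 0
        return self.state
```

Feeding disk-usage readings of 50, 70, 72, 74, and 60 into a rule with a threshold of 65 moves it through Normal, Pending, Firing, Firing, and back to Normal, producing one firing and one resolved notification. The Pending step is what keeps a single noisy reading from paging Slack.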