
Monitoring and Logging: What to Track, How to Alert, and Tools That Work
Practical monitoring and logging advice from three decades of production operations. What metrics matter, how to build alerts that work, and tools I trust.

How to build observability pipelines with the OpenTelemetry Collector, Cribl, and Vector to cut telemetry costs by 60-80% without losing diagnostic visibility.

Continuous profiling is the fourth pillar of observability most teams skip. Learn how Pyroscope, Parca, and eBPF-based profilers find CPU and memory bottlenecks that metrics and traces can't.

A deep-dive into building a production-grade observability stack with Prometheus, Loki, Grafana, and Tempo. Learn the architecture, scaling trade-offs, cardinality traps, and when the open-source stack beats a $40k/month SaaS bill.

ClickHouse is a columnar database built for real-time analytics at absurd scale. Here's how it works, why it's faster than the alternatives, and where it fits in your data stack.

SLAs are for lawyers. SLOs are for engineers. Here's how to define meaningful service level objectives, measure them properly, and use error budgets to make smarter deployment decisions.

A practical guide to eBPF: how it works at the kernel level, why Cilium replaced iptables for Kubernetes networking, and how eBPF powers next-generation observability without sidecars.

How OpenTelemetry works, why distributed tracing is different from logging and metrics, and how to instrument your services without drowning in overhead and noise.

AIOps applies machine learning to operations data to reduce alert noise, detect anomalies, and accelerate incident response. Here's what works in practice and what's still hype.