Performance

Troubleshooting Latency: A Systematic Approach to Finding the Bottleneck

A systematic method for tracking down latency issues in production systems, from network to application to database, built from decades of war stories.

Oct 10, 2025

Databases

Stored Procedures: When to Use Them, When to Avoid Them

An opinionated guide to stored procedures covering performance benefits, maintainability costs, security implications, and practical guidelines for when they help and when they hurt.

Jul 22, 2025

Data & Analytics

Apache Polars in Production: The Rust-Built DataFrame Library Replacing Pandas for Real Data Engineering Work

A principal cloud architect's guide to Polars: why this Rust-based DataFrame library is replacing pandas in production pipelines, how lazy evaluation and the Apache Arrow memory format make it dramatically faster, and where it fits in the modern data stack alongside DuckDB and Apache Iceberg.

May 29, 2025

DevOps

Continuous Profiling in Production: Pyroscope, Parca, and Finding the CPU Hog You Never Knew You Had

Continuous profiling is the fourth pillar of observability most teams skip. Learn how Pyroscope, Parca, and eBPF-based profilers find CPU and memory bottlenecks that metrics and traces can't.

May 19, 2025

Databases

SSD vs HDD: How to Choose the Right Storage for Your Workload

A practical guide to choosing SSD or HDD storage for databases, analytics, and archival workloads based on real-world performance, cost, and endurance tradeoffs.

May 18, 2025

Databases

Sharding vs Partitioning: Database Scaling Strategies Compared

Sharding and partitioning are related but different database scaling strategies. A veteran architect explains both approaches, their trade-offs, and when to use each.

May 5, 2025

Cloud Architecture

Bare Metal Cloud: When the Hypervisor Is the Problem, Not the Solution

Virtual machines are the default in cloud computing, but for a growing set of workloads, the hypervisor is pure overhead. Bare metal cloud gives you dedicated hardware without the colocation operations burden. Here's when it makes sense.

Apr 11, 2025

Databases

Distributed Caching Explained: Redis, Memcached, Valkey, and How to Choose

A principal architect's guide to distributed caching: how Redis, Memcached, and Valkey work, when to use each, and lessons from running caches at scale in production.

Apr 5, 2025

Cloud Architecture

ARM in the Cloud: AWS Graviton, Ampere Altra, and Why CPU Architecture Actually Matters Now

ARM-based cloud instances are delivering 40-60% better price-performance than x86 equivalents. Here's how AWS Graviton and Ampere Altra work, what workloads benefit most, and how to migrate.

Mar 12, 2025

Networking

Latency vs Bandwidth: What's the Real Difference and Why It Matters

Understand the critical difference between latency and bandwidth, why both matter for performance, and how to optimize each in real-world cloud and network architectures.

Jul 15, 2024
