
Post-Quantum Cryptography: Migrating Your Cloud Infrastructure Before Q-Day Changes Everything

A practical guide to migrating cloud infrastructure to post-quantum cryptography. Covers NIST ML-KEM and ML-DSA standards, hybrid TLS deployment, PKI migration, and a phased roadmap for engineering teams facing real 2026 compliance deadlines.

Diagram showing classical cryptography algorithms being replaced by post-quantum alternatives in cloud infrastructure

I have been watching the post-quantum cryptography story unfold for years, and I will tell you honestly: most engineering teams are about five years behind on this. I spent a chunk of last year helping three different organizations do crypto inventory audits, and in every case the results were worse than expected. RSA and ECDSA are everywhere, buried deep in certificate chains, JWT signing keys, SSH host keys, internal PKI, TLS between microservices, and the API credentials nobody has touched since 2019. The migration is not a one-afternoon job.

The urgency is real. NIST published finalized post-quantum standards in August 2024 after an eight-year evaluation process. FIPS 140-2 validations move to the historical list in September 2026, leaving FIPS 140-3 as the only active validation program. The CNSA 2.0 requirements apply to new National Security Systems acquisitions from January 2027. And the “harvest now, decrypt later” threat, where adversaries are already hoarding encrypted traffic to decrypt once a cryptographically relevant quantum computer exists, means the window for protecting long-lived secrets started closing years ago.

This article is a practical engineering guide. I will walk through the threat model, the NIST standards you need to know, how hybrid TLS works, where cloud providers actually stand today, and the concrete steps to migrate without burning down your existing infrastructure.

Why the Threat Is Not Hypothetical

Let me be precise about what breaks and what does not. A cryptographically relevant quantum computer (CRQC) running Shor’s algorithm can factor large integers and solve discrete logarithm problems in polynomial time. That breaks:

  • RSA key exchange and RSA digital signatures
  • ECDSA (elliptic curve digital signatures)
  • ECDH and ECDHE key agreement
  • Classic Diffie-Hellman

Symmetric encryption (AES) and hash functions (SHA-2, SHA-3) are not broken by Shor’s algorithm. They are weakened by Grover’s algorithm, which provides a quadratic speedup for unstructured search, but doubling key lengths (AES-256 instead of AES-128, SHA-384 instead of SHA-256) addresses that. The problem is almost entirely in asymmetric cryptography.
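As a back-of-the-envelope illustration of the Grover point, the sketch below halves the key length to estimate brute-force security against a quantum adversary. The function name is hypothetical, and the halving is an idealized model: Grover's algorithm parallelizes poorly, so practical security is likely higher than this figure.

```python
def grover_effective_bits(key_bits: int) -> int:
    """Idealized brute-force security of a symmetric key under Grover's algorithm.

    Grover searches N = 2**key_bits candidates in about sqrt(N) = 2**(key_bits / 2)
    steps, so the effective security level is roughly half the key length.
    """
    if key_bits <= 0:
        raise ValueError("key_bits must be positive")
    return key_bits // 2


for name, bits in [("AES-128", 128), ("AES-256", 256)]:
    print(f"{name}: ~{grover_effective_bits(bits)}-bit security against Grover")
```

AES-256 lands at roughly 128 bits under this model, which is why doubling the key length is considered sufficient.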

The uncomfortable timeline: most credible estimates put a CRQC capable of breaking 2048-bit RSA somewhere between 2030 and 2040. Some estimates push further. IBM’s quantum roadmap, Google’s Willow chip announcements, and Microsoft’s topological qubit research all suggest the timeline is accelerating faster than it was five years ago. No one can give you a firm date, but “not in my career” is no longer a defensible position for a principal architect.

The harvest now, decrypt later attack is the most immediate reason to act today. Nation-state adversaries with the resources to eventually build a CRQC have obvious incentives to capture TLS traffic now and decrypt it when the hardware arrives. Health records, financial data, intellectual property, government communications, anything with a long confidentiality requirement is already at risk. If you are protecting data that needs to stay secret for more than ten years, you needed to migrate yesterday.
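One common way to turn this into a yes/no answer is Mosca's inequality, a framing from the PQC literature rather than anything in the NIST standards: if the years your data must stay confidential plus the years your migration will take exceed the years until a CRQC exists, traffic captured today is exposed. A minimal sketch, with a hypothetical function name and purely illustrative inputs:

```python
def exposed_to_harvest_now_decrypt_later(
    confidentiality_years: float,
    migration_years: float,
    years_until_crqc: float,
) -> bool:
    """Mosca's inequality: data is at risk when X + Y > Z.

    X = how long the data must stay secret
    Y = how long the migration to PQC will take
    Z = how long until a cryptographically relevant quantum computer exists
    """
    return confidentiality_years + migration_years > years_until_crqc


# Illustrative inputs only: a 10-year secret, a 3-year migration,
# and a CRQC assumed to arrive in 8 years.
print(exposed_to_harvest_now_decrypt_later(10, 3, 8))  # True
# A 1-year token with the same migration and timeline is not exposed.
print(exposed_to_harvest_now_decrypt_later(1, 3, 8))   # False
```

Nobody knows Z, so run this with a pessimistic estimate as well as an optimistic one.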

The NIST Standards You Need to Know

In August 2024, NIST published three finalized post-quantum cryptographic standards. These are not research proposals or candidates. They are the official standards that hardware vendors, cloud providers, and governments are building against.

FIPS 203: ML-KEM (Module-Lattice Key Encapsulation Mechanism) is what was called CRYSTALS-Kyber during the competition. It replaces RSA and ECDH for key exchange. A key encapsulation mechanism (KEM) works differently from Diffie-Hellman: one party generates a public key, the other party encapsulates a shared secret using that public key, and only the holder of the private key can decapsulate it to recover the shared secret. The math is based on the hardness of the Module Learning With Errors (MLWE) problem, which resists both classical and quantum attacks. Three parameter sets exist: ML-KEM-512, ML-KEM-768, and ML-KEM-1024, offering progressively higher security levels.
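To make the keygen, encapsulate, decapsulate flow concrete, here is a toy KEM built from the standard library only. It is emphatically not ML-KEM: it is a small Diffie-Hellman construction with made-up parameters, so it is neither quantum-safe nor secure at this size. Only the shape of the three operations carries over to a real ML-KEM library.

```python
import hashlib
import secrets

# Toy parameters: a 127-bit Mersenne prime and a small base. Far too small for
# real use, and Diffie-Hellman based, so NOT quantum-safe. Illustration only.
P = 2**127 - 1
G = 3


def _derive(value: int) -> bytes:
    """Hash the raw group element down to a 32-byte shared secret."""
    return hashlib.sha256(value.to_bytes(16, "big")).digest()


def keygen() -> tuple[int, int]:
    """Receiver: return (public_key, private_key)."""
    private_key = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    return pow(G, private_key, P), private_key


def encapsulate(public_key: int) -> tuple[int, bytes]:
    """Sender: return (ciphertext, shared_secret) for the receiver's public key."""
    ephemeral = secrets.randbelow(P - 3) + 2
    ciphertext = pow(G, ephemeral, P)
    return ciphertext, _derive(pow(public_key, ephemeral, P))


def decapsulate(private_key: int, ciphertext: int) -> bytes:
    """Receiver: recover the same shared secret from the ciphertext."""
    return _derive(pow(ciphertext, private_key, P))


public_key, private_key = keygen()
ciphertext, sender_secret = encapsulate(public_key)
receiver_secret = decapsulate(private_key, ciphertext)
print(sender_secret == receiver_secret)
```

Note that the sender never chooses the shared secret directly and the receiver contributes nothing per session beyond the long-lived public key, which is the practical difference from an interactive Diffie-Hellman exchange.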

FIPS 204: ML-DSA (Module-Lattice Digital Signature Algorithm) is what was CRYSTALS-Dilithium. It replaces ECDSA and RSA signatures. It uses the same family of lattice mathematics as ML-KEM, comes in three parameter sets (ML-DSA-44, ML-DSA-65, ML-DSA-87), and produces significantly larger signatures than ECDSA. An ML-DSA-65 signature is 3,309 bytes compared to 64 bytes for a raw ECDSA P-256 signature. This matters when you have constrained environments or high-frequency signing.

FIPS 205: SLH-DSA (Stateless Hash-Based Digital Signature Algorithm) is what was SPHINCS+. It uses hash functions rather than lattice mathematics, which means its security assumptions are more conservative and well-understood. The trade-off is significantly larger signatures and much slower signing. Most organizations will use ML-DSA for performance-critical signing and SLH-DSA as a conservative backup or for very long-lived signatures like code signing roots.

A fourth algorithm, FN-DSA (FALCON), is being standardized as FIPS 206 and was not finalized alongside the first three. It is relevant for embedded and constrained environments, but I will not cover it here because it is less relevant for typical cloud infrastructure.

Post-quantum cryptography algorithm comparison: ML-KEM vs ML-DSA vs classical algorithms by key size, signature size, and performance

The Hybrid Approach: Why You Cannot Just Switch

The naive migration plan, replace all RSA/ECDSA with ML-KEM/ML-DSA and call it done, fails immediately against reality. You have clients running software you do not control. You have HSMs that do not support PQC algorithms yet. You have certificate authorities whose intermediate CAs are signed with ECDSA. You have partners and vendors whose systems need to interoperate with yours for years.

The answer is hybrid cryptography: use both a classical algorithm and a PQC algorithm simultaneously, such that breaking the session requires breaking both. If you use X25519+ML-KEM-768 for TLS key exchange, an attacker needs to break both X25519 (classically hard) and ML-KEM-768 (quantum-hard) to recover the session key. A quantum computer breaks X25519 but not ML-KEM. A classical attack that somehow breaks ML-KEM (the lattice math is new, after all) does not break X25519. The hybrid gives you defense in depth during the transition period.
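The "must break both" property comes from how the two shared secrets are combined: they are concatenated and run through a key derivation function, so the output is unpredictable unless the attacker knows both inputs. The sketch below shows that idea with a hand-rolled HKDF (RFC 5869) from the standard library. In real TLS 1.3 the concatenated secret feeds the TLS key schedule rather than a standalone HKDF call like this, and the function names and the `hybrid-kex-demo` label are hypothetical.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869) with SHA-256."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869) with SHA-256."""
    if length > 255 * 32:
        raise ValueError("requested output too long for HKDF-SHA256")
    output, block, counter = b"", b"", 1
    while len(output) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]


def combine_shared_secrets(
    pq_secret: bytes,
    classical_secret: bytes,
    context: bytes = b"hybrid-kex-demo",  # hypothetical label for this sketch
) -> bytes:
    """Derive one 32-byte session key from both secrets.

    Both inputs are fixed-length (32 bytes each here), so plain
    concatenation is unambiguous.
    """
    if len(pq_secret) != 32 or len(classical_secret) != 32:
        raise ValueError("expected two 32-byte shared secrets")
    prk = hkdf_extract(salt=b"\x00" * 32, ikm=pq_secret + classical_secret)
    return hkdf_expand(prk, info=context, length=32)


# Stand-in values; in practice these come from ML-KEM-768 and X25519.
session_key = combine_shared_secrets(b"\x01" * 32, b"\x02" * 32)
print(session_key.hex())
```

An attacker who recovers only one of the two inputs still faces a KDF output that depends on the 32 unknown bytes of the other.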

For TLS 1.3, the hybrid approach uses X25519MLKEM768 or SecP256r1MLKEM768 as the key share in the ClientHello. The client's key share grows by the 1,184-byte ML-KEM-768 encapsulation key and the server's reply grows by the 1,088-byte ciphertext, so each direction carries a bit over a kilobyte of extra data, plus 80 to 150 microseconds of additional compute overhead. That is a one-time cost per connection establishment, not per-request overhead. In practice I have seen this be completely invisible in production metrics for most workloads.

Chrome shipped hybrid key exchange in version 124. As of early 2026, over 60 percent of TLS traffic hitting Cloudflare uses hybrid ML-KEM. The ecosystem is moving. The question is whether your infrastructure is moving with it.

NIST officially permits hybrid mode under FIPS 140-3, provided the PQC component is a NIST-approved algorithm. This matters for organizations that need FIPS-compliant deployments.

Where Cloud Providers Actually Stand

The provider support story is better than it was 18 months ago, but not uniform.

AWS has AWS-LC, their open-source cryptographic library, which was the first to include ML-KEM in a FIPS 140-3 validation. AWS Certificate Manager has hybrid certificate support on its roadmap but has not shipped it to all regions as of mid-2025. AWS KMS does not yet support PQC key types for customer-managed keys, though they have published a roadmap. CloudFront and Application Load Balancer support hybrid TLS key exchange in select configurations.

GCP has Google Certificate Authority Service with hybrid certificate support in preview. Google’s own products, including Gmail and Chrome, are already running hybrid TLS. Cloud KMS PQC support is in limited preview. GKE supports PQC cipher suites through custom Envoy configurations.

Azure has a crypto agility framework and has published detailed guidance through their Security Development Lifecycle team. Azure Key Vault PQC support is in preview for ML-DSA. Azure Front Door hybrid TLS is available in specific regions.

HashiCorp Vault added ML-KEM and ML-DSA support in Vault 1.17. If you use Vault for secrets management and PKI, you can start issuing hybrid certificates and managing PQC key material today.

For organizations running their own PKI with OpenSSL, version 3.5 added native ML-KEM and ML-DSA support; earlier 3.x releases need the Open Quantum Safe provider (oqs-provider) to get them. BoringSSL (used by Chrome and Android, and the codebase AWS-LC was forked from) has had hybrid TLS support since 2024. LibreSSL lags behind. If your TLS termination uses a library you build yourself, check your version carefully.

Cloud provider post-quantum cryptography support matrix showing AWS, GCP, Azure, and open-source library readiness as of 2025

Building Cryptographic Agility First

Before you touch a single certificate, you need cryptographic agility: the architectural property that lets you swap cryptographic algorithms without rewriting applications. I have seen teams that needed six months to rotate a single signing key because the algorithm was hardcoded in three different places, required a coordinated deployment across five services, and had no automated testing for the rotation procedure. That team was not going to migrate to PQC in any reasonable timeline.

Cryptographic agility means:

Centralize all cryptographic operations. If every service implements its own TLS configuration, signature verification, or key derivation, you have no leverage. Put crypto behind internal libraries or services where a one-line config change can swap algorithms.

Abstract algorithm selection from key material. Your application should reference a key identifier, not an algorithm. The key management system (KMS, Vault, or cloud-native service) handles the algorithm. When you rotate to an ML-DSA key, the application does not need to know.

Make certificate rotation fast and automated. If rotating a certificate requires a manual ticket to a different team, you will not survive a PQC migration. Automate certificate lifecycle with tools like cert-manager on Kubernetes or AWS Private CA automation. The secret management and automated rotation patterns you already apply to other credentials carry over directly.

Inventory your cryptographic dependencies. You cannot migrate what you cannot find. Build a crypto bill of materials: every certificate, every signing key, every TLS configuration, every use of asymmetric cryptography in your codebase. Tools like Entrust PKI Spotlight, Keyfactor, and Venafi Discovery can automate discovery across your fleet. Open-source options include cryptographic linting in CI/CD with tools like crypto-verify and ssl-checker.
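The second point in the list above, referencing a key identifier rather than an algorithm, is the one that most often needs a concrete picture. Here is a minimal sketch of the idea. `KeyRegistry` is a hypothetical stand-in for a KMS or Vault, and the two "algorithms" are HMAC placeholders rather than real ECDSA or ML-DSA signers, which keeps the example runnable with the standard library alone.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class KeyEntry:
    algorithm: str                   # metadata for audits; callers never branch on it
    sign: Callable[[bytes], bytes]


class KeyRegistry:
    """Hypothetical stand-in for a KMS or Vault: callers know key IDs, not algorithms."""

    def __init__(self) -> None:
        self._keys: Dict[str, KeyEntry] = {}

    def put(self, key_id: str, entry: KeyEntry) -> None:
        self._keys[key_id] = entry   # putting an existing key_id rotates it in place

    def sign(self, key_id: str, message: bytes) -> bytes:
        return self._keys[key_id].sign(message)


def hmac_signer(key: bytes, digest) -> Callable[[bytes], bytes]:
    # Placeholder signer. A real deployment would delegate to the KMS here.
    return lambda message: hmac.new(key, message, digest).digest()


def sign_report(registry: KeyRegistry, report: bytes) -> bytes:
    # Application code: names the key, never the algorithm.
    return registry.sign("report-signing", report)


registry = KeyRegistry()
registry.put("report-signing", KeyEntry("placeholder-classical", hmac_signer(b"old-key", hashlib.sha256)))
before = sign_report(registry, b"daily report")

# Rotation: swap the algorithm behind the same key ID. sign_report is untouched.
registry.put("report-signing", KeyEntry("placeholder-pqc", hmac_signer(b"new-key", hashlib.sha512)))
after = sign_report(registry, b"daily report")

print(len(before), len(after))
```

The signature length changes across the rotation while `sign_report` does not, which is also a reminder that anything downstream with a fixed-size signature field needs to be found during inventory.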

This is directly analogous to what software supply chain security work demands. The SBOM approach to supply chain security gave us a model for tracking software dependencies; a crypto inventory is the same principle applied to cryptographic primitives.

The Migration Roadmap

Here is the phased approach I recommend. I have run variations of this at organizations ranging from 50 engineers to multi-thousand-person engineering orgs, and the phases are roughly universal, though the timelines vary by starting point.

Phase 1: Inventory and classify (months 1 to 3)

Run discovery tooling across your fleet. Build a spreadsheet (or better, a database) of every certificate, key, and TLS configuration. Classify by risk: long-lived secrets protecting data with 10+ year confidentiality requirements go in a critical bucket. Short-lived tokens for stateless API calls go in a lower-priority bucket. Root and intermediate CA certificates are the highest priority because every leaf certificate is downstream of them.
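The classification rules in the paragraph above are simple enough to encode directly, which keeps the buckets consistent when several people are filling in the inventory. A minimal sketch follows; the record fields, bucket names, and sample assets are hypothetical, and the 10-year threshold is the one stated above.

```python
from dataclasses import dataclass

# Families broken by Shor's algorithm. Symmetric and PQC algorithms are excluded.
QUANTUM_VULNERABLE = {"RSA", "ECDSA", "ECDH", "DH"}


@dataclass
class CryptoAsset:
    name: str
    algorithm_family: str
    is_ca: bool                   # root or intermediate CA certificate
    confidentiality_years: float  # how long the protected data must stay secret


def migration_priority(asset: CryptoAsset) -> str:
    """Map an inventory record to a migration bucket."""
    if asset.algorithm_family not in QUANTUM_VULNERABLE:
        return "none"
    if asset.is_ca:
        return "highest"          # every leaf certificate is downstream of a CA
    if asset.confidentiality_years >= 10:
        return "critical"
    return "lower"


inventory = [
    CryptoAsset("internal-root-ca", "ECDSA", is_ca=True, confidentiality_years=0),
    CryptoAsset("patient-records-tls", "ECDH", is_ca=False, confidentiality_years=25),
    CryptoAsset("api-session-token-key", "ECDSA", is_ca=False, confidentiality_years=0.01),
    CryptoAsset("backup-encryption-key", "AES", is_ca=False, confidentiality_years=25),
]

for asset in inventory:
    print(f"{asset.name}: {migration_priority(asset)}")
```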

Identify your crypto-agility gaps. Where is cryptography hardcoded? Where do you lack automated rotation?

Phase 2: Crypto agility remediation (months 2 to 6)

Fix the agility problems before you migrate algorithms. Centralize TLS configuration. Automate certificate rotation. Build the internal library abstractions that let you swap algorithms.

Enable hybrid TLS on your external-facing load balancers and CDN layer. This is low risk and high visibility, demonstrating that hybrid PQC TLS works with your client base. Monitor for compatibility issues with older TLS clients.

Phase 3: PKI migration (months 4 to 12)

Establish a new PQC root CA using ML-DSA. Issue hybrid intermediate CAs (dual-algorithm: classical ECDSA plus ML-DSA). Begin issuing hybrid leaf certificates for external-facing services.

This is the part that takes longer than people expect. Certificate trust is hierarchical and has inertia. Getting PQC roots into browser trust stores, mobile operating systems, and partner trust anchors takes months of coordination. Start early.

Phase 4: Internal service migration (months 6 to 18)

Migrate internal service-to-service TLS to hybrid key exchange. Migrate JWT signing keys to ML-DSA. Migrate SSH host keys. Migrate data encryption keys to use PQC KEMs for key wrapping.

For Kubernetes environments, migrate the cluster CA and API server certificates. Update service mesh mTLS configuration to use hybrid cipher suites. If you use Istio or Linkerd for zero-trust service-to-service authentication, both have roadmaps for PQC cipher suite support in their data planes.

Phase 5: Deprecate classical-only (months 18 to 36)

Once hybrid deployment is stable, begin phasing out classical-only configurations. Enforce hybrid or PQC-only cipher suites on internal services. Rotate out ECDSA-only signing keys.
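Enforcement is easier to sustain as an automated check than as a policy document. The sketch below is a hypothetical CI-style lint that takes the key exchange groups a service is configured with and fails if any fall outside an allowlist. The two hybrid group names are the ones discussed earlier; how you extract the configured groups from your load balancer or mesh configuration is left out, and PQC-only group names can be added to the allowlist as your stack supports them.

```python
from typing import Dict, List

# Allowlist for Phase 5. Extend with PQC-only groups when you adopt them.
ALLOWED_GROUPS = {"X25519MLKEM768", "SecP256r1MLKEM768"}


def disallowed_groups(configured_groups: List[str]) -> List[str]:
    """Return the configured groups that are not on the allowlist."""
    return [group for group in configured_groups if group not in ALLOWED_GROUPS]


def check_services(configs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each non-compliant service name to its offending groups."""
    findings = {}
    for service, groups in configs.items():
        offenders = disallowed_groups(groups)
        if offenders:
            findings[service] = offenders
    return findings


# Hypothetical service configurations.
configs = {
    "payments-api": ["X25519MLKEM768"],
    "legacy-batch": ["X25519MLKEM768", "x25519", "secp256r1"],
}

findings = check_services(configs)
for service, offenders in findings.items():
    print(f"{service}: classical-only groups still enabled: {', '.join(offenders)}")
```

Running it in report-only mode for a while before making it a hard failure gives you the list of integrations that still need a classical fallback.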

This phase depends heavily on partner and vendor ecosystem readiness. You may need to maintain classical fallback for specific integrations for longer than you would like.

Post-quantum migration timeline showing five phases from inventory through classical deprecation

Practical Implementation Notes

A few things I have learned from running these migrations that are not obvious from the standards documents:

Performance is not your bottleneck. Engineers worry about ML-KEM performance, but benchmarks on modern hardware show ML-KEM-768 key generation and encapsulation at sub-millisecond times on commodity x86. The handshake overhead is real but small. The operational overhead, automation, monitoring, and change management, is where projects slow down.

Signature sizes matter at scale. ML-DSA-65 signatures are about 50 times larger than ECDSA P-256 signatures. For most applications this is irrelevant. For high-frequency event signing, append-only audit logs, or JWT-heavy architectures, the size increase can affect storage costs and network throughput. Profile before you migrate signing keys.
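A quick way to profile before migrating is to do the arithmetic for your own volumes. The sketch below uses the signature sizes from FIPS 204 and a raw 64-byte ECDSA P-256 signature; the function names and the 10 million signatures per day figure are illustrative assumptions.

```python
SIGNATURE_BYTES = {
    "ECDSA-P256": 64,   # raw r || s encoding
    "ML-DSA-44": 2420,
    "ML-DSA-65": 3309,
    "ML-DSA-87": 4627,
}


def base64url_len(raw_bytes: int) -> int:
    """Length of unpadded base64url text, as used for the signature part of a JWT."""
    return (raw_bytes * 4 + 2) // 3


def extra_bytes_per_day(signatures_per_day: int, old: str = "ECDSA-P256", new: str = "ML-DSA-65") -> int:
    """Additional raw signature bytes stored or transmitted per day after migrating."""
    return signatures_per_day * (SIGNATURE_BYTES[new] - SIGNATURE_BYTES[old])


# Illustrative volume: 10 million signed events per day.
extra = extra_bytes_per_day(10_000_000)
print(f"Extra signature data per day: {extra / 1e9:.2f} GB")

# JWT signature segment length, old versus new.
print(base64url_len(SIGNATURE_BYTES["ECDSA-P256"]), base64url_len(SIGNATURE_BYTES["ML-DSA-65"]))
```

At that volume the signatures alone add about 32 GB per day, and a JWT signature segment grows from 86 to 4,412 characters, which is enough to exceed header size limits on some proxies.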

OpenSSL 3.5+ is your friend. If you control your TLS stack, upgrading to OpenSSL 3.5 or later is the single highest-leverage action you can take. It gives you native ML-KEM and ML-DSA support, hybrid key exchange, and the provider architecture that makes algorithm swapping manageable. You keep the TLS fundamentals you already rely on, just with quantum-safe algorithms.

Java is a problem. The JCA (Java Cryptography Architecture) provider ecosystem for PQC is still maturing. Bouncy Castle has solid PQC support and is the most practical path for Java applications today. The AWS SDK for Java uses AWS-LC-FIPS, which has ML-KEM support, but many enterprise Java middleware stacks have not been updated.

HSMs are often the bottleneck. Hardware security modules certified under older FIPS standards often do not support PQC algorithms. Check your HSM vendor’s roadmap. Thales and Entrust (whose nShield line was formerly nCipher) both have PQC roadmaps, but certification takes time. If your root CA key is in an HSM that does not support ML-DSA, your Phase 3 timeline is dictated by your HSM vendor’s schedule, not yours.

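SSH is often forgotten. SSH host keys and user keys are RSA, ECDSA, or Ed25519 in the vast majority of deployments. OpenSSH has negotiated a hybrid post-quantum key exchange by default since 9.0 (released 2022), initially Streamlined NTRU Prime plus X25519 (sntrup761x25519-sha512); the ML-KEM-768 plus X25519 hybrid (mlkem768x25519-sha256) arrived in 9.9 and became the default in 10.0. That protects session keys, but your SSH host keys (the ones that identify servers to clients) are still classical signatures and will need to move to a post-quantum or hybrid signature algorithm as implementations add support. This is particularly relevant for bastion hosts and jump boxes that you rely on for emergency access.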
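To see where a given host stands, `ssh -Q kex` lists the key exchange algorithms the local OpenSSH client supports. The sketch below filters that output for the hybrid post-quantum names. The function name is hypothetical, the sample text is illustrative rather than captured from a real host, and in practice you would feed it the output of `subprocess.run(["ssh", "-Q", "kex"], capture_output=True, text=True).stdout`.

```python
from typing import List

# Hybrid post-quantum key exchange names used by OpenSSH.
PQ_HYBRID_KEX = {
    "mlkem768x25519-sha256",
    "sntrup761x25519-sha512",
    "sntrup761x25519-sha512@openssh.com",
}


def pq_hybrid_kex(ssh_q_kex_output: str) -> List[str]:
    """Return the hybrid PQ key exchange algorithms found in `ssh -Q kex` output."""
    offered = [line.strip() for line in ssh_q_kex_output.splitlines() if line.strip()]
    return [name for name in offered if name in PQ_HYBRID_KEX]


# Illustrative sample, one algorithm per line as `ssh -Q kex` prints them.
sample_output = """\
curve25519-sha256
ecdh-sha2-nistp256
sntrup761x25519-sha512@openssh.com
mlkem768x25519-sha256
"""

supported = pq_hybrid_kex(sample_output)
print(supported if supported else "no hybrid PQ key exchange available; upgrade OpenSSH")
```

An empty result on a bastion host is a strong signal that its OpenSSH build predates hybrid key exchange and belongs near the top of the upgrade list.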

The encryption-at-rest and in-transit patterns you follow today will need updating. Any pattern that uses RSA or ECDH to wrap symmetric encryption keys needs to be migrated to ML-KEM wrapping. The symmetric encryption itself (AES-256-GCM) stays the same.

What to Watch For in the Ecosystem

The PQC landscape is still moving. A few things worth tracking:

FIPS 203/204/205 implementation quality varies. The standards are final, but implementations are new. Side-channel attacks against early ML-KEM implementations have been published in academic literature. Use well-reviewed implementations like AWS-LC, BoringSSL, or OpenSSL 3.5 rather than rolling your own.

Certificate transparency and PQC. CT logs currently use ECDSA for their signatures. The CT ecosystem has not fully worked out PQC migration. This creates a temporary gap for organizations that rely on CT for compliance. Watch the CA/Browser Forum discussions.

The internet PKI timeline. Browser vendors, the CA/Browser Forum, and root program operators (Apple, Microsoft, Mozilla, Google) need to coordinate on PQC root certificate requirements. The timeline for PQC root certificates in browser trust stores is still being worked out. Start your conversations with your CA vendor now.

Regulatory compliance. If your organization is in a regulated industry, check your specific compliance framework. CNSA 2.0 is mandatory for US National Security Systems, and therefore for the contractors and suppliers that sell into them. ENISA has published PQC migration guidance for EU organizations. The NIS2 directive creates indirect pressure on cloud security posture including cryptographic standards.

What I Learned from Doing This Under Pressure

The engagement I remember most was a financial services firm that had a legacy batch job signing daily reports with RSA-2048. The keys were stored in a software keystore that had not been rotated in seven years. The signing code was in a monolithic Python service that also handled compliance reporting, tax calculations, and about a dozen other things. Nobody wanted to touch it.

We could not migrate them to ML-DSA directly because their downstream partners (auditors, regulators, counterparties) could not yet verify ML-DSA signatures. So we implemented a hybrid approach: the service signed with both RSA-2048 (existing) and ML-DSA-65 (new), embedding both signatures in the output. Partners verified the RSA signature. They started logging ML-DSA signature verification attempts on their test systems. Eighteen months later, their partners were ready, and they deprecated the RSA path.
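The dual-signature pattern from that engagement looks roughly like the sketch below. This is not the client's code: the envelope layout and function names are hypothetical, and the two signers are HMAC placeholders labeled with the algorithm names from the story so the example runs with the standard library alone. The part that carries over is the structure, where the producer attaches every signature it can and each verifier declares which ones it requires.

```python
import base64
import hashlib
import hmac
import json
from typing import Callable, Dict, Set, Tuple

Signer = Callable[[bytes], bytes]
Verifier = Callable[[bytes, bytes], bool]


def hmac_pair(key: bytes, digest) -> Tuple[Signer, Verifier]:
    """Placeholder sign/verify pair standing in for a real signature scheme."""
    def sign(message: bytes) -> bytes:
        return hmac.new(key, message, digest).digest()

    def verify(message: bytes, signature: bytes) -> bool:
        return hmac.compare_digest(sign(message), signature)

    return sign, verify


def sign_dual(payload: bytes, signers: Dict[str, Signer]) -> str:
    """Sign the payload with every configured algorithm and embed all signatures."""
    return json.dumps({
        "payload": base64.b64encode(payload).decode(),
        "signatures": {alg: base64.b64encode(sign(payload)).decode() for alg, sign in signers.items()},
    })


def verify_envelope(envelope_json: str, verifiers: Dict[str, Verifier], required: Set[str]) -> bool:
    """Accept only if every required algorithm has a valid signature.

    An empty `required` set is rejected so a misconfigured verifier fails closed.
    """
    envelope = json.loads(envelope_json)
    payload = base64.b64decode(envelope["payload"])
    for alg in required:
        encoded = envelope["signatures"].get(alg)
        if encoded is None or alg not in verifiers:
            return False
        if not verifiers[alg](payload, base64.b64decode(encoded)):
            return False
    return bool(required)


rsa_sign, rsa_verify = hmac_pair(b"legacy-key", hashlib.sha256)     # stands in for RSA-2048
mldsa_sign, mldsa_verify = hmac_pair(b"pqc-key", hashlib.sha512)    # stands in for ML-DSA-65

envelope = sign_dual(b"daily report", {"RSA-2048": rsa_sign, "ML-DSA-65": mldsa_sign})
verifiers = {"RSA-2048": rsa_verify, "ML-DSA-65": mldsa_verify}

print(verify_envelope(envelope, verifiers, required={"RSA-2048"}))    # partner today
print(verify_envelope(envelope, verifiers, required={"ML-DSA-65"}))   # partner after cutover
```

Deprecating the RSA path later is then a change to the producer's signer map and each partner's `required` set, not a change to the envelope format.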

The lesson: design for coexistence, not replacement. Ripping out the old and replacing with new is almost never the right migration strategy in production systems. Adding the new while keeping the old, with a clear deprecation timeline, is how these migrations actually succeed.

With twenty years of watching cryptographic migrations play out, from MD5 to SHA-1 to SHA-256, from SSL 3 to TLS 1.0 to TLS 1.3, I can tell you the pattern is always the same. The technology is ready years before the ecosystem is, and the organizations that invest in agility early are the ones that migrate without incidents. The ones that wait until they are forced to act are the ones you read about in post-mortems.

Start your crypto inventory now. The Q-Day deadline may be uncertain. The compliance deadlines are not.