Ramesh’s Security and Technology Blogs

Can Your Agents Prove Their Identity Without a Central Authority

2026-05-25T00:00:00+00:00

Part-1: Can your Agents prove their identity without a central authority?

If you are building multi-agent systems and looking at the future of Agentic identity, this post is for you. As agents become more autonomous and operate across team and organizational boundaries, who can talk to whom becomes a security problem, not a routing problem. This post describes a practical approach using W3C Decentralized Identifiers and Verifiable Credentials, with working Python code you can run locally in under 10 minutes.

The Problem

Multi-agent systems today rely on shared secrets (API keys) or central registries (service meshes, config files) for trust. Both break down when:

An agent is compromised and you need to revoke access immediately
Agents span organizational boundaries (different teams, companies, cloud accounts)
The central registry goes down or is itself compromised

These patterns assume you control the entire system. They do not scale to autonomous agents cooperating across trust boundaries.

What Is Decentralized Identity?

To understand decentralized identity (DID), it helps to see where identity systems have been and why each generation solved one problem while introducing another.

Centralized Identity (LDAP, Active Directory)

In a centralized model, a single authority owns and manages all identities. Microsoft Active Directory is the classic example: every user, every service account, every permission lives in one directory. To determine “is Agent X allowed to do Y?” you query the directory.

This works inside one organization. It creates a single point of failure. If the directory is down, nothing authenticates. If it is compromised, an attacker controls every identity. For multi-agent AI, centralized identity does not work across organizational boundaries. Your LDAP server cannot vouch for an agent running in someone else’s infrastructure.

Federated Identity (SAML, OAuth/OIDC)

Federation addresses cross-organization trust. Instead of one authority, multiple Identity Providers (IdPs) agree to trust each other. SAML and OAuth/OIDC enable “log in with Google” or accept tokens from a partner’s IdP.

Federation reduces single-point-of-failure risk but introduces structural dependencies. You need pre-negotiated trust relationships between IdPs. Token verification requires a round-trip to the issuer (or access to a JWKS endpoint). Setting up federation across many parties is operationally heavy.

Decentralized Identity (DIDs + Verifiable Credentials)

Decentralized identity removes the central authority entirely. Each entity creates its own identity (a DID backed by a cryptographic key pair) and carries its own credentials. Verification happens locally. The verifier checks a cryptographic signature, not a database.

The benefit of decentralized identity is that there is no single point of failure, no pre-negotiated trust relationship, and no issuer callback at verification time. All verification is handled cryptographically.

Property	Centralized	Federated	Decentralized
Authority	Single (LDAP/AD)	Multiple IdPs	None (self-sovereign)
Single point of failure	Yes	Reduced	No
Cross-org trust	Not possible	Requires federation setup	Built-in
Verification	Query the directory	Token introspection / JWKS	Local signature check
Revocation	Delete from directory	Token expiry / revoke at IdP	Revocation list
Agent suitability	Poor (designed for humans in one org)	Moderate (token-based)	Strong (peer-to-peer, no human in loop)

Components of Decentralized Identity

A decentralized identity system has four core components.

1. Decentralized Identifier (DID) is a globally unique string like did:web:example.com:flight that the agent owns. The did:web method means “resolve this DID by fetching a document over HTTPS.” Other methods exist (did:key, did:ion, did:ethr) but did:web is the simplest for production web services.

2. DID Document is a JSON document published at the URL derived from the DID. It contains the agent’s public key, what that key can be used for (authentication, signing credentials), and service endpoints (where to reach the agent). Anyone who resolves the DID gets this document.

3. Verifiable Credential (VC) is a signed assertion from an issuer about a subject. “The Orchestrator certifies that Flight Agent has the capabilities: flight_search, flight_booking.” The credential is portable. The agent holds it and presents it on demand. The verifier checks the issuer’s signature without contacting the issuer.

4. Revocation List is a list of credential IDs that are no longer valid. The verifier checks this list during authorization. If the credential ID appears, the agent is rejected, even if the signature is perfect and the credential has not expired.
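As a rough illustration of the first two components, the sketch below maps a did:web identifier to the HTTPS URL where its DID Document would be published, and builds a minimal document for it. The function names, the #key-1 and #agent fragments, the AgentService type, and the placeholder key value are assumptions made for illustration and are not taken from the prototype. The URL mapping follows the did:web convention of turning colons into path separators and falling back to /.well-known/did.json when the DID has no path.

```python
from urllib.parse import unquote


def did_web_to_url(did: str) -> str:
    """Map a did:web identifier to the URL of its DID Document."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError(f"not a did:web identifier: {did!r}")
    parts = did[len(prefix):].split(":")
    if not all(parts):
        raise ValueError(f"empty segment in {did!r}")
    # A port is percent-encoded inside the DID (e.g. localhost%3A5001).
    host = unquote(parts[0])
    path = "/".join(parts[1:])
    if path:
        return f"https://{host}/{path}/did.json"
    return f"https://{host}/.well-known/did.json"


def build_did_document(did: str, public_key_multibase: str, endpoint: str) -> dict:
    """Build a minimal DID Document with one Ed25519 key and one service."""
    key_id = f"{did}#key-1"  # hypothetical key identifier
    return {
        "@context": [
            "https://www.w3.org/ns/did/v1",
            "https://w3id.org/security/suites/ed25519-2020/v1",
        ],
        "id": did,
        "verificationMethod": [{
            "id": key_id,
            "type": "Ed25519VerificationKey2020",
            "controller": did,
            "publicKeyMultibase": public_key_multibase,
        }],
        # What the key may be used for: proving control and signing credentials.
        "authentication": [key_id],
        "assertionMethod": [key_id],
        "service": [{
            "id": f"{did}#agent",       # hypothetical service identifier
            "type": "AgentService",     # hypothetical service type
            "serviceEndpoint": endpoint,
        }],
    }
```

Calling did_web_to_url("did:web:example.com:flight") gives https://example.com/flight/did.json, which is where a verifier would expect to fetch the document built by build_did_document.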

How the Trust Chain Works

When the Orchestrator needs to delegate a task to the Flight Agent, it runs a 5-step trust chain. Each step builds on the previous one. Failure at any step means the task is never sent.

Step 1, Discovery: The Orchestrator fetches the Flight Agent’s Agent Card (a JSON file at a well-known URL). The card states: “I am Flight Agent, I can search flights, my DID is did:web:example.com:flight.”

Step 2, Resolution: The Orchestrator resolves the DID. It converts did:web:example.com:flight into https://example.com/flight/did.json, fetches the document, and extracts the public key and service endpoints.

Step 3, Authentication: The Orchestrator sends a random 32-byte nonce to the Flight Agent: “sign this.” The Flight Agent signs it with its private key. The Orchestrator verifies the signature using the public key from the DID Document. If it verifies, the agent provably holds the private key. It is who it claims to be.

Step 4, Authorization: The Orchestrator fetches the Flight Agent’s Verifiable Credential and runs four checks locally: (1) Is the signature from the claimed issuer? (2) Has it expired? (3) Is it on the revocation list? (4) Does it grant the needed capability? All four checks must pass.

Step 5, Delegation: Only after all checks pass does the task flow. The agent has been discovered, identified, authenticated, and authorized. The Orchestrator sends the task.

The full chain completes in under 200ms. Five HTTP requests. Zero central databases.
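The four local checks in Step 4 can be sketched as a single function. This is a minimal illustration, not the prototype's code: the Credential fields, the authorize name, and the reason strings are assumptions, and the signature check is passed in as a callable because the real verification depends on the cryptographic library in use.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, FrozenSet, Optional, Set, Tuple


@dataclass(frozen=True)
class Credential:
    """Hypothetical in-memory view of a Verifiable Credential."""
    id: str
    issuer: str
    subject: str
    capabilities: FrozenSet[str]
    expires_at: datetime


def authorize(
    cred: Credential,
    needed: str,
    trusted_issuer: str,
    revoked_ids: Set[str],
    signature_ok: Callable[[Credential], bool],
    now: Optional[datetime] = None,
) -> Tuple[bool, str]:
    """Run the four checks in order; stop at the first failure."""
    now = now or datetime.now(timezone.utc)
    # (1) Is the signature from the claimed, trusted issuer?
    if cred.issuer != trusted_issuer or not signature_ok(cred):
        return False, "bad_signature"
    # (2) Has it expired?
    if now >= cred.expires_at:
        return False, "expired"
    # (3) Is it on the revocation list?
    if cred.id in revoked_ids:
        return False, "revoked"
    # (4) Does it grant the needed capability?
    if needed not in cred.capabilities:
        return False, "missing_capability"
    return True, "ok"
```

Returning a reason alongside the decision keeps each rejection explainable, which fits the goal of being able to reconstruct why an agent was trusted or rejected.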

Use Cases This Solves

Cross-organization agent collaboration. Agents from different companies verify each other without a shared authority. Each agent’s DID is self-sovereign.

Instant revocation without key rotation. A compromised agent is cut off by adding one credential ID to a revocation list. No restart, no config changes, no cascading updates across services.

Least-privilege enforcement. Credentials explicitly list granted capabilities. An agent authorized for flight_search cannot perform flight_booking unless its credential grants that capability.

Replay attack prevention. Every authentication uses a fresh 32-byte random nonce. A captured response is useless for future challenges.

Decentralized verification. Verifiers resolve DIDs and check credential signatures locally. No round-trip to an issuer or central authority at verification time.

Auditable trust decisions. Every step (discovery, resolution, authentication, authorization) produces a verifiable artifact. You can reconstruct exactly why an agent was trusted or rejected.

Graceful credential rotation. Credentials expire (for example, after 30 days). New ones are issued without downtime. Old ones naturally stop working.
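The replay protection described above depends on each nonce being accepted only once. The sketch below shows one way a verifier might keep track of that; the ChallengeStore class and its 30-second lifetime are assumptions for illustration, and signature verification is deliberately left out because it belongs to the cryptographic library.

```python
import secrets
import time


class ChallengeStore:
    """Tracks outstanding nonces so each one can be redeemed at most once."""

    def __init__(self, ttl_seconds: float = 30.0):
        self._ttl = ttl_seconds
        self._pending = {}  # nonce -> expiry on the monotonic clock

    def issue(self) -> bytes:
        """Create a fresh 32-byte random nonce and remember it."""
        nonce = secrets.token_bytes(32)
        self._pending[nonce] = time.monotonic() + self._ttl
        return nonce

    def consume(self, nonce: bytes) -> bool:
        """Return True only for a known, unexpired nonce, and forget it."""
        expiry = self._pending.pop(nonce, None)
        return expiry is not None and time.monotonic() <= expiry
```

A verifier would call issue() before sending the challenge and consume() when the signed response arrives, checking the signature only if consume() returns True. A captured response replayed later fails because its nonce has already been removed.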

Try It Yourself

The prototype uses three Flask servers (orchestrator, flight agent, hotel agent) with Ed25519 cryptography via PyNaCl. The code is available at github.com/r2rajan/did-vc under the sample1 directory.

Setup:

git clone https://github.com/r2rajan/did-vc.git
cd did-vc/sample1
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

What Is Next: Part 2, From Flask to AWS

The trust primitives stay the same. The deployment changes: the local prototype becomes a minimum viable product (MVP) running on AWS.

Flask becomes Lambda (serverless)
localhost becomes API Gateway + CloudFront (HTTPS, global)
Files on disk become DynamoDB + Secrets Manager (encrypted, managed)
Hardcoded responses become Amazon Bedrock (LLM-powered agent reasoning)

The identity layer is independent of the infrastructure layer. That is the value of building on standards.

Part 1 of a two-part series. Part 2 deploys the system to AWS with Lambda and Bedrock, and uses real-time agents, LLMs, and a UI.

JA4 Signatures: The fingerprint that bots find difficult to fake

2026-05-21T00:00:00+00:00

JA4 Signatures: The fingerprint that bots find difficult to fake

Last week, I encountered a security incident that resulted in a Denial of Service (DoS) for an e-commerce API. Every request carried a Chrome User-Agent header. Valid cookies. Residential IP addresses. Rate limiting didn’t trigger because the requests trickled in at human-like intervals across thousands of IPs.

The question isn’t whether these attacks will reach your infrastructure. It’s whether you can fingerprint each connection and identify what it actually is, regardless of what it claims to be.

For centuries, people in the Indian state of Kerala have used a composite identifier with three parts to uniquely identify a person, a practice that still exists today.

Their ancestral house name (veedu peru)
Their given name (first name and surname)
Their village (desam)

The naming system works in layers.

The ancestral house name (veedu peru or tharavadu peru) carries more weight than a family name in the Western sense. It identifies your specific lineage and property. Two families with the surname “Kutty” in the same village might be completely unrelated, but their house names distinguish them immediately.
Then comes the given name.
Then the village or the locality the family belongs to (desam or sthalam).

For example “Ramesh” could be anyone.

The fingerprint quality comes from the combination. “Ramesh” alone is common. “Ramesh from Kottayam” narrows it. “Padinjarekara Ramesh from Kottayam” is essentially unique — you’ve identified not just the person but their lineage, their ancestral property, and their geographic origin in one string.

There’s a layer of network identity that works the same way as the Kerala naming system to uniquely identify each connection. It’s called JA4 fingerprinting, and it lives in the TLS handshake. It produces a three-part signature that reveals what a client actually is, regardless of what it claims to be.

What Happens Before Your App Even Sees a Request

When a client connects to your server over HTTPS, a handshake happens before any application data flows. Think of it like arriving at a building with a security desk. Before you get to your meeting, you show your ID, exchange credentials, agree on how you’ll communicate.

The TCP handshake establishes the connection (SYN, SYN-ACK, ACK). Then the TLS handshake negotiates encryption. The very first message the client sends in TLS is called the ClientHello. It contains:

Which TLS versions the client supports
Which cipher suites it offers (the encryption algorithms it knows)
Which extensions it wants to use
What protocols it prefers (HTTP/2, HTTP/1.1, HTTP/3)

A real browser, a Python script, a Go binary, and a piece of malware all construct this message differently. They use different libraries, different defaults, different capabilities. The ClientHello is an involuntary fingerprint: the client can’t help but reveal what it actually is.

JA3 was the first widely adopted method for fingerprinting ClientHellos. It worked, but it produced opaque MD5 hashes that told you nothing at a glance, broke when browsers began randomizing the order of TLS extensions, and couldn’t distinguish between similar clients. JA4 addresses each of these.

What is JA4?

JA4 isn’t a single fingerprint. It’s a family — the JA4+ suite — and each member fingerprints a different layer of the connection.

Three innovations make JA4 fundamentally better than its predecessors.

It’s human-readable. A JA4 fingerprint looks like t13d1515h2_8daaf6152771_e5627efa2ab1. That first section — t13d1515h2 — tells you immediately: TCP connection, TLS 1.3, domain name present, 15 cipher suites, 15 extensions, HTTP/2. You can glance at it and know you’re looking at a modern browser. Compare that to JA3’s 66918128f1b9b03303d77c6f2eefd128. Which one tells you something useful at 3 AM during an incident?

GREASE removal and sorting produce stable fingerprints. GREASE (Generate Random Extensions And Sustain Extensibility) values are dummy entries browsers inject to prevent server ossification. They change randomly between connections. JA4 strips them and sorts the remaining values, so the same client produces the same fingerprint every time — regardless of GREASE randomization.

Layered fingerprinting catches sophisticated evasion. A bot might match a browser’s TLS fingerprint by using a patched TLS library. But does its TCP window size match? Does its HTTP header order match? JA4T + JA4 + JA4H together create a multi-dimensional identity that’s expensive to fully replicate.
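The GREASE removal and sorting step can be sketched in a few lines. This is an illustration under stated assumptions: is_grease and stable_hash are names chosen here, and the GREASE test relies on the reserved values having the pattern 0x0a0a, 0x1a1a, and so on up to 0xfafa.

```python
import hashlib
from typing import Iterable


def is_grease(value: int) -> bool:
    """True for the reserved GREASE values 0x0a0a, 0x1a1a, ..., 0xfafa."""
    same_bytes = (value >> 8) == (value & 0xFF)
    return same_bytes and (value & 0x0F0F) == 0x0A0A


def stable_hash(values: Iterable[int]) -> str:
    """Drop GREASE, sort as 4-digit hex, hash, keep the first 12 hex chars."""
    cleaned = sorted(f"{v:04x}" for v in values if not is_grease(v))
    return hashlib.sha256(",".join(cleaned).encode()).hexdigest()[:12]


# Two connections from the same client: different GREASE value, different order.
first = [0x0A0A, 0x1301, 0x1302, 0x1303]
second = [0x1303, 0x2A2A, 0x1301, 0x1302]
print(stable_hash(first) == stable_hash(second))
```

Because the random entries are stripped and the remaining values are sorted before hashing, both lists reduce to the same string and therefore the same truncated hash.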

How JA4 Works — Under the Hood

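A JA4 fingerprint has three sections separated by underscores, in the form {Section A}_{Section B}_{Section C}. In t13d1515h2_8daaf6152771_e5627efa2ab1, for example, Section A is t13d1515h2, Section B is 8daaf6152771, and Section C is e5627efa2ab1.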

Each section adds a layer of specificity.

Section A is the human-readable metadata. It encodes protocol type, TLS version, whether SNI is present, cipher count, extension count, and ALPN value. Ten characters that tell you immediately what category of client you’re looking at.

Section B is the first 12 characters of a SHA-256 hash computed over the sorted, GREASE-removed cipher suites. Two clients might share the same Section A (same TLS version, same extension count) but their cipher suite selections reveal different TLS libraries and configurations.

Section C hashes the extensions (excluding SNI and ALPN, already captured in Section A) plus signature algorithms. Two clients using the same library version might still differ here based on how they’ve been configured.

Any single section has collisions. Together, they produce a composite identifier with enough entropy to differentiate millions of distinct clients.

The sorting step is what gives JA4 its stability. Two connections from the same client will produce identical fingerprints even if the underlying library randomizes the order of ciphers or extensions. The hash truncation keeps the fingerprint compact without giving up much uniqueness.

Here’s what the comparison looks like across three very different clients:

Chrome Browser:   t13d1515h2_8daaf6152771_e5627efa2ab1

Python requests: t12d0907h1_ac4b62f6e85_7cdb5ce3f4e2

Malware (minimal): t12d030100_b8c8b6e2a142_3f2e7a9d1bc4

Even without decoding the hashes, Section A alone is a self-indictment for the malware signature. A request that claims to be a Chrome browser but shows t12d030100, that is TLS 1.2, three cipher suites, one extension, and no Application-Layer Protocol Negotiation (ALPN), is lying. No modern browser looks like that, least of all Chrome.
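Reading Section A by eye can also be done in code. The sketch below splits the ten characters into their fields and applies a deliberately crude mismatch rule. The names SectionA, parse_section_a, and contradicts_chrome_claim are assumptions for illustration, and the rule itself (Chrome should show TLS 1.3 and an ALPN value) is a simplification rather than a production detection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionA:
    protocol: str         # "t" for TLS over TCP, "q" for QUIC
    tls_version: str      # e.g. "13" or "12"
    sni: bool             # True when the flag is "d" (domain name present)
    cipher_count: int
    extension_count: int
    alpn: str             # e.g. "h2", or "00" when no ALPN was offered


def parse_section_a(fingerprint: str) -> SectionA:
    """Decode the human-readable first section of a JA4 fingerprint."""
    a = fingerprint.split("_")[0]
    if len(a) != 10:
        raise ValueError(f"Section A should be 10 characters, got {a!r}")
    return SectionA(
        protocol=a[0],
        tls_version=a[1:3],
        sni=(a[3] == "d"),
        cipher_count=int(a[4:6]),
        extension_count=int(a[6:8]),
        alpn=a[8:10],
    )


def contradicts_chrome_claim(user_agent: str, fingerprint: str) -> bool:
    """Hypothetical rule: a Chrome User-Agent without TLS 1.3 or ALPN is suspect."""
    a = parse_section_a(fingerprint)
    claims_chrome = "Chrome/" in user_agent
    return claims_chrome and (a.tls_version != "13" or a.alpn == "00")
```

Applied to the malware fingerprint above with a Chrome User-Agent, the rule flags the request. Applied to the Chrome fingerprint, it does not.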

Code Demo: Building a JA4 Fingerprint

The following Python script (included in this repo as ja4_fingerprint_demo.py) demonstrates the complete JA4 construction algorithm. It doesn’t require packet capture — it uses simulated ClientHello messages to show the math clearly.

The key functions:

def remove_grease(values: list) -> list:
    """Remove all GREASE values from a list of cipher/extension IDs."""
    return [v for v in values if not is_grease(v)]


def build_section_a(hello: ClientHello) -> str:
    """
    Build the human-readable section.
    Format: {protocol}{version}{sni}{cipher_count}{ext_count}{alpn}
    Example: t13d1516h2
    """
    proto = hello.protocol
    version = TLS_VERSION_MAP.get(hello.tls_version, "00")
    sni_flag = "d" if hello.sni else "i"
    ciphers_no_grease = remove_grease(hello.cipher_suites)
    cipher_count = f"{len(ciphers_no_grease):02d}"
    extensions_no_grease = remove_grease(hello.extensions)
    ext_count = f"{len(extensions_no_grease):02d}"

    if hello.alpn:
        first_alpn = hello.alpn[0]
        alpn_mapped = ALPN_MAP.get(first_alpn, f"{first_alpn[0]}{first_alpn[-1]}")
    else:
        alpn_mapped = "00"

    return f"{proto}{version}{sni_flag}{cipher_count}{ext_count}{alpn_mapped}"

Section B sorts cipher suites as hex strings and hashes them:

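def build_section_b(hello: ClientHello) -> str:
    """Hash the sorted, GREASE-free cipher suites; keep the first 12 hex chars."""
    ciphers = remove_grease(hello.cipher_suites)
    # Sorting makes the result independent of the order the client sent.
    cipher_hex = sorted(f"{c:04x}" for c in ciphers)
    cipher_string = ",".join(cipher_hex)
    return hashlib.sha256(cipher_string.encode()).hexdigest()[:12]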

Section C does the same for extensions, but excludes SNI and ALPN (already captured in Section A) and appends signature algorithms:

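def build_section_c(hello: ClientHello) -> str:
    """Hash the sorted extensions plus signature algorithms (first 12 hex chars)."""
    # Drop GREASE, SNI (0x0000) and ALPN (0x0010); the latter two are in Section A.
    extensions = [
        e for e in hello.extensions
        if not is_grease(e) and e not in {0x0000, 0x0010}
    ]
    ext_hex = sorted(f"{e:04x}" for e in extensions)
    ext_string = ",".join(ext_hex)
    # Signature algorithms keep the order the client sent them in (not sorted).
    sig_algs = [f"{s:04x}" for s in hello.signature_algorithms]
    sig_string = ",".join(sig_algs)
    combined = f"{ext_string}_{sig_string}"
    return hashlib.sha256(combined.encode()).hexdigest()[:12]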

Run python ja4_fingerprint_demo.py to see the full output with three simulated clients — Chrome, Python requests, and a minimal malware implementation. The difference is immediately visible.

For production use, see FoxIO’s official JA4+ implementation which handles the full spec including edge cases around QUIC, raw packet parsing, and integration with common network tools.

Real-World Security Use Cases

Bot detection. This is one use case where JA4 is effective. A credential-stuffing bot sets its User-Agent to Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36... — identical to Chrome. But it’s written in Go using a standard net/http client. Its JA4 fingerprint reveals TLS 1.2, 9 cipher suites, no ALPN. Chrome hasn’t looked like that since 2019. Blocked.

Malware hunting. Command-and-control frameworks leave distinctive fingerprints. Cobalt Strike’s default HTTPS beacon, Metasploit’s Meterpreter, Sliver, BruteRatel — they all use specific TLS libraries with specific defaults. Security teams publish known-bad JA4 fingerprints the same way they publish known-bad IP addresses, except fingerprints are harder for attackers to rotate.

API protection. Your mobile app uses certificate pinning and a specific HTTP client. You know its JA4 fingerprint. When someone reverse-engineers your API and makes calls from a Python script using stolen tokens, the fingerprint mismatch gives them away — even if every other header is perfect.

WAF enhancement. JA4 rules complement traditional signatures. A request might pass every content-based rule but get flagged because no legitimate client produces that fingerprint for that endpoint. The ja4db.com database catalogs fingerprints for known applications, making rule authoring straightforward.

Limitations and Considerations

JA4 isn’t a silver bullet. Sophisticated attackers using headless Chrome or patched browsers produce legitimate-looking fingerprints because they are legitimate browsers — just automated ones. Fingerprinting catches the gap between what traffic claims to be and what it is, but when the traffic genuinely is what it claims to be (just automated), you need behavioral analysis on top.

There are privacy implications. The same properties that let you fingerprint bots let you fingerprint users. JA4 is less granular than canvas fingerprinting or font enumeration, but it still contributes to a trackable identity. Use it for security, not surveillance.

Encrypted Client Hello (ECH), currently in draft, will eventually encrypt the ClientHello contents. When ECH reaches widespread adoption, passive fingerprinting becomes harder. Active fingerprinting techniques and server-side analysis will matter more.

JA4 works best as one signal in an ensemble — combined with behavioral analysis, rate limiting, device fingerprinting, and challenge-response mechanisms.

Getting Started

JA4 is already integrated into tools you probably run:

Zeek — native JA4 support via package
Suricata — JA4 keywords in rules
Wireshark — JA4 column available in recent versions
Cloudflare, AWS WAF, Fastly — various levels of JA4 support in CDN/WAF products

For the fastest path to value: start with JA4, the TLS fingerprint, to identify what each client actually is at the handshake layer. Then combine JA4 with JA4H (HTTP fingerprint) for deeper coverage — TLS-layer identity plus HTTP-layer behavior together catches the widest range of automated traffic with minimal false positives.

FoxIO maintains the open-source reference implementation with libraries for multiple languages and integration guides for common platforms.

Try it on your own traffic. Capture a few minutes of TLS sessions, compute the JA4 fingerprints, and see how many distinct client types appear. You’ll likely find that 80% of your traffic produces fewer than 10 unique fingerprints — and anything outside that set deserves a closer look.
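Once you have a list of fingerprints, one per captured session, the concentration check is a short exercise with a counter. The sketch below is a minimal illustration: fingerprints_covering is a name chosen here, and the input is assumed to be a plain list of JA4 strings that you have already computed with a tool of your choice.

```python
from collections import Counter
from typing import Iterable, List


def fingerprints_covering(fingerprints: Iterable[str], share: float = 0.8) -> List[str]:
    """Return the most common fingerprints that together reach `share` of traffic."""
    counts = Counter(fingerprints)
    total = sum(counts.values())
    covered = 0
    top = []
    for fp, n in counts.most_common():
        if covered >= share * total:
            break
        top.append(fp)
        covered += n
    return top


def outliers(fingerprints: Iterable[str], share: float = 0.8) -> List[str]:
    """Distinct fingerprints outside the dominant set, worth a closer look."""
    seen = list(fingerprints)
    common = set(fingerprints_covering(seen, share))
    return sorted(set(seen) - common)
```

The fingerprints returned by outliers are not necessarily malicious. They are simply the ones that the bulk of your traffic does not produce, which makes them a reasonable place to start investigating.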

The naming analogy in this post is inspired by the traditional naming conventions of the people of Kerala, India. Their system achieved unique identification through layered context long before centralized identity systems existed. Credit and gratitude to the Malayali community for a cultural practice that elegantly illustrates how composite identifiers work.

What recent military conflicts teach us about Kinetic Resilience

2026-05-13T00:00:00+00:00

What recent military conflicts teach us about kinetic resilience

I have always been drawn to military history. The strategies, the engineering, the way wars force innovation at a pace that peacetime never does. From Alexander to General Bernard Montgomery, I find myself reading about how leaders and armies adapted to new threats in real time. So when the conflict in Ukraine began reshaping land warfare in Europe, I followed it closely. Not the politics of it, but the engineering of it. Specifically, how aerial threats have rendered the Main Battle Tank, a platform that dominated land warfare for a century, vulnerable in ways its designers never anticipated.

The Leopard 2, the T-90, the Challenger, and the M1 Abrams are among the best Main Battle Tanks in service. It does not matter whose flag is painted on the hull. A first-person-view drone costing a few hundred dollars, piloted by a soldier with a headset and a gaming controller, can disable or even destroy a sixty-tonne machine worth several million. The threat comes from above. The armour was designed for the front and the sides.

What fascinated me was the response and not the destruction. Crews in the field, with limited resources and no time, began improvising defences. Welded metal cages on turret roofs. Netting draped over vehicles. Electronic jammers strapped to hulls. These were not elegant solutions. They were born of necessity, built from whatever was available, and they worked well enough to keep crews alive and protect their equipment.

Then in March 2026, military strikes damaged cloud data centre facilities in the Middle East for the first time. The threat I had been watching on the battlefield had arrived at the doorstep of digital infrastructure. In my previous post, I explored how to architect workloads to survive the loss of a facility. But that post deliberately left one question underexplored. Can the battlefield improvisations be adapted to add a layer of physical protection to the data centre itself?

This post is my attempt to answer that question. I lean more on curiosity and creative thinking than hard facts or core engineering. Consider this post as a thought experiment, not a technical specification.

The Roof Nobody Thought About

Data centre physical security is a mature discipline. Perimeter fencing with anti-climb measures. Vehicle bollards rated to stop a lorry at speed. Mantraps with biometric authentication. Security operations centres monitoring every door and corridor. Access control systems that would make a bank vault envious. All of these measures focus on the ground.

The roof, by contrast, is where the HVAC systems sit. Where the skylights are. Where the cable trays run. It is protected against weather, against water ingress, against the occasional bird strike. It is not protected against a deliberate aerial threat.

This was perfectly reasonable for decades. The threat model for a data centre did not include someone flying an explosive device into the roof at 120 kilometres per hour. That threat model has now changed. The Ukraine conflict demonstrated that small, inexpensive drones can deliver shaped charges with precision. The recent strikes on cloud infrastructure in the Middle East confirmed that data centres are real targets.

Lessons from the Battlefield

The soldiers in Ukraine did not have the luxury of waiting for a perfect solution. They needed something that worked today, built from materials they could source this week. The data centre industry has more time and more resources, but the engineering principles are the same.

Netting as a First Line of Defence

In Izyum, in northeastern Ukraine, high-tensile netting is suspended over civilian infrastructure to protect against daily FPV drone attacks. The nets serve three functions. They physically trap incoming drones, entangling rotors and arresting forward motion. They can detonate a drone’s payload at a safe distance above the roof surface, dissipating the blast energy before it reaches the structure. And they create uncertainty for the drone operator, who cannot be certain whether the payload will reach the intended target.

For a data centre, the same principle applies. Netting suspended above the roofline, at sufficient height to create a detonation gap, provides a passive defence layer that requires no power, no operator, and no maintenance beyond periodic inspection. It does not stop every threat. But it raises the difficulty and reduces the probability of a clean strike.

Slat Armour for Critical Rooftop Equipment

Tank crews in Ukraine weld rigid metal grids, known as cope cages or slat armour, to their turret roofs. The engineering is straightforward. A shaped charge warhead, the type carried by most FPV drones, requires a specific standoff distance to form its penetrating jet. A metal grid detonates the warhead prematurely, before it reaches the optimal standoff distance, causing the jet to disperse rather than penetrate.

Data centre roofs have specific vulnerable points. HVAC units, skylights, cable penetrations, and exhaust vents. These are the points where a shaped charge could breach the roof envelope and damage equipment below. Rigid metal grids installed over these points replicate the cope cage principle. They do not make the equipment invulnerable. They make a successful penetration far less likely.

Electronic Countermeasures

The third layer is electronic. FPV drones rely on a radio link between the operator and the aircraft. Disrupt that link and the drone becomes uncontrollable. It either crashes, flies off course, or enters a failsafe mode that takes it away from the target.

Electronic warfare systems that emit interference across the frequency bands used by commercial and military drones are already deployed on vehicles in the Ukraine conflict. They create a protective bubble within which drone control signals cannot reach the aircraft.

For data centres, the challenge is specificity. A facility that jams drone control frequencies indiscriminately will also disrupt its own wireless networks, cellular connectivity, and potentially GPS-dependent systems. The implementation requires directional emission, careful frequency selection, and coordination with telecommunications regulators. It is not a simple installation. But the technology exists and is proven in the field.

Visual Obscuration

The simplest and cheapest measure is making the target harder to find and identify. FPV drone operators navigate visually. They identify the target through a camera feed, often at speed, and guide the drone to a specific point on the structure.

Reflective netting, camouflage patterns, and visual disruption materials on rooftops interfere with this process. They do not make the building invisible. They make it harder to identify the precise aim point, which reduces the accuracy of a manual strike. Against autonomous drones that navigate by GPS coordinates rather than visual identification, this measure is less effective. Against the manually piloted FPV drones that constitute the majority of the current threat, it adds a meaningful layer of difficulty.

Conclusion: The Cope Cage is Not the Strategy

If you think any of these measures would make the data centre completely invulnerable to an aerial threat, you are chasing a mirage. A determined adversary with sufficient resources will find a way through netting, past jammers, and around camouflage.

The soldiers in Ukraine understood this. The cope cage on a tank is not a guarantee of survival. It is a way to improve the odds. It buys time. It turns a certain kill into a probable miss. It keeps the crew alive long enough to reach cover or for the electronic countermeasures to take effect.

The same logic applies to data centres. Physical hardening is not a strategy in itself. It is one layer in a defence that must also include architectural resilience at the workload level. The netting buys you time at the roof level. The cages keep the HVAC running a bit longer. The jammers might stop the drone before it arrives at all. But none of them guarantee the building survives, which is why the workload architecture underneath matters more than any of them.

Harden the facility to improve the odds. Architect the workload to survive regardless.

After Thoughts

The data centre industry has not yet had its “cope cage moment.” The improvised, field-driven solutions that emerged in Ukraine were born out of immediate necessity. Data centre operators have the advantage of time, resources, and engineering rigour. They can design these defences properly rather than welding them together under fire.

But the window for preparation is not unlimited. The threat has moved from theoretical to demonstrated. The engineering principles are proven. The materials and technologies exist. What remains is the decision to act.

The roof deserves the same attention we have given the perimeter. The sky is no longer empty, and distance from a land border can no longer be treated as safety.

Part-3: Who are you? How client Registration works in the Agentic World

2026-05-10T00:00:00+00:00

Part-3: Who Are You? How Client Registration works in the Agentic World

Every time I go to Las Vegas to attend a technology conference, I dread the soul-crushing, velvet-rope maze at the hotel front desk, where you stand for forty minutes to register and prove your identity just to get that plastic key card.

In December 2025, I landed in Las Vegas to speak at re:Invent, AWS’s flagship conference, along with more than 50,000 in-person attendees descending on the city in a single week. This time, as I walked into the lobby, I didn’t stop. While a sea of people stood in line to register, handing over IDs, waiting for the front desk to manually type data into a database, and receiving a piece of plastic, I kept moving. My room was assigned, my identity was verified, and my phone was already a functioning key. I bypassed the front desk entirely and went straight to my room. It was a frictionless experience.

In the world of OAuth, standing in line for that plastic key card is called Dynamic Client Registration (DCR). We’ve spent years getting applications to stand in that same “front desk” line, registering with an authorization server, waiting for credentials, storing them in a database. In my previous post, I walked through how Alice grants consent to individual agents using exactly this model. Each agent registers as its own OAuth client via DCR (RFC 7591), getting per-agent revocation, independent scope ceilings, and clean audit trails.

But what if our agents could just walk past the velvet rope? What if they could carry their own Digital Key that a server could verify on the fly? No registration desk, no waiting, no database entry.

That’s the promise of Client ID Metadata Documents (CIMD). This post breaks down both models, when each applies, and how they work in the agentic world.

The Registration Problem in Agentic Systems

Traditional OAuth assumes a small, known set of clients registered with a small, known set of authorization servers. An admin creates a client in the IdP dashboard, copies the client_id and client_secret, and hardcodes them into the app config.

Agents break this assumption in two directions:

Agent-side explosion. An enterprise might deploy dozens of agents (reporting-agent, coding-agent, deployment-agent), each needing its own identity. Manual registration doesn’t scale.

Server-side explosion. In MCP, a single agent might connect to hundreds of tool servers. If each server requires separate registration, the agent needs hundreds of client_id values, one per server.

DCR and CIMD each address one side of this problem.

Dynamic Client Registration (DCR): The Server-Managed Model

DCR (RFC 7591) lets a client programmatically register with an authorization server by POSTing its metadata to a /register endpoint. The server validates, stores the registration, and returns credentials.

POST /register HTTP/1.1
Content-Type: application/json

{
  "client_name": "Reporting Agent",
  "redirect_uris": ["https://agents.example.com/callback"],
  "grant_types": ["authorization_code", "refresh_token"],
  "scope": "reports:read",
  "agent_metadata": {
    "owner": "data-platform-team",
    "agent_type": "autonomous-reporting",
    "capability_version": "2.3.0"
  }
}

The server responds with a unique client_id and (for confidential clients) a client_secret. From that point forward, the agent authenticates using those credentials.
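To make the server side of that exchange concrete, here is a minimal sketch of what a /register handler might do. It is an illustration under assumptions, not any particular IdP's implementation: the in-memory REGISTRY, the `agent-` prefix on client_id, and the https-only rule for redirect_uris are all hypothetical choices.

```python
import secrets

# Hypothetical in-memory registry standing in for the AS database.
REGISTRY = {}

def register_client(metadata, confidential=True):
    """Validate RFC 7591-style metadata and return the registration response."""
    redirect_uris = metadata.get("redirect_uris", [])
    if not redirect_uris:
        raise ValueError("redirect_uris is required for authorization_code clients")
    # Assumed policy for this sketch: only https redirect URIs are accepted.
    if any(not uri.startswith("https://") for uri in redirect_uris):
        raise ValueError("redirect_uris must use https")

    # The server, not the client, chooses the client_id.
    client_id = "agent-" + secrets.token_urlsafe(16)
    record = dict(metadata, client_id=client_id)
    if confidential:
        # A real AS would store only a hash of the secret.
        record["client_secret"] = secrets.token_urlsafe(32)
    REGISTRY[client_id] = record
    return record

response = register_client({
    "client_name": "Reporting Agent",
    "redirect_uris": ["https://agents.example.com/callback"],
    "grant_types": ["authorization_code", "refresh_token"],
    "scope": "reports:read",
})
print(response["client_id"])
```

The point to notice is that the server owns the state: every call adds a row to the registry, which is exactly what gives you governance and exactly what grows without bound in open ecosystems.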

Why DCR Works for Agent Fleets

In my previous post, I made the case for per-agent registration: automated onboarding, scoped revocation, independent scope ceilings, and clean audit attribution. DCR makes all of that practical. Each agent gets its own client_id, its own scope ceiling, and its own entry in the registry that doubles as your source of truth for what agents exist and who owns them.

Where DCR Breaks Down

DCR was designed for a world where the number of clients is bounded and the authorization server is a known entity. In the world of agents and open ecosystems, three problems emerge:

Unbounded database growth. If 10,000 users each run the same coding-agent, that’s 10,000 registrations for what is logically one application.

The open endpoint problem. The /register endpoint must be accessible to unauthenticated clients. This makes it a target for DDoS attacks and registration flooding.

N × M coordination. If an agent connects to M different MCP servers, each with its own AS, it needs M separate registrations. This is the “registration wall” that blocks agent-to-server connectivity.

To make matters worse, many identity providers (Entra ID, some Okta configurations) either don’t expose a public DCR endpoint or require a pre-provisioned API key to reach the registration_endpoint. Teams end up building OAuth proxy infrastructure just to work around this.

Client ID Metadata Documents (CIMD): The Client-Hosted Model

CIMD flips the registration model entirely. Instead of the client registering with the authorization server (AS), the client hosts its own identity document for the AS to fetch.

The client_id is an HTTPS URL. The authorization server fetches that URL, reads the JSON metadata, and validates it on demand.

{
  "client_id": "https://coding-agent.example.com/.well-known/oauth-client.json",
  "client_name": "Coding Agent",
  "redirect_uris": [
    "https://coding-agent.example.com/callback",
    "http://localhost:3000/callback"
  ],
  "grant_types": ["authorization_code"],
  "response_types": ["code"],
  "token_endpoint_auth_method": "none"
}

The agent hosts this file. Any MCP server that supports CIMD can fetch it, validate it, and proceed with the OAuth flow.

The CIMD Flow

Agent sends authorization request with client_id = https://coding-agent.example.com/.well-known/oauth-client.json
AS fetches that URL via an HTTPS GET
AS validates: client_id in JSON matches the URL, redirect_uris are consistent
AS shows consent screen using client_name and logo_uri from the metadata
AS caches the metadata (respecting HTTP cache headers)
OAuth flow proceeds normally

There is no registration phase and, for public clients, no credentials to issue or store. The AS ties the client’s identity to control of the domain where the metadata lives.
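The fetch-and-validate steps of that flow can be sketched in a few lines. This is a simplified illustration, not a complete CIMD implementation: `fetch_json` is a hypothetical injected function that performs the HTTPS GET and returns parsed JSON, and `fake_fetch` stands in for the network so the example runs offline. Caching and the consent screen are left out.

```python
from urllib.parse import urlparse

def validate_cimd(client_id_url, fetch_json):
    """Fetch a client metadata document and run the basic consistency checks."""
    parsed = urlparse(client_id_url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError("client_id must be an https URL")

    doc = fetch_json(client_id_url)
    # The document must name the same URL it was fetched from.
    if doc.get("client_id") != client_id_url:
        raise ValueError("client_id in document does not match the fetched URL")
    if not doc.get("redirect_uris"):
        raise ValueError("document must list at least one redirect_uri")
    return doc

# Stand-in for the network: what the agent hosts at its client_id URL.
CLIENT_ID = "https://coding-agent.example.com/.well-known/oauth-client.json"
HOSTED = {
    CLIENT_ID: {
        "client_id": CLIENT_ID,
        "client_name": "Coding Agent",
        "redirect_uris": ["https://coding-agent.example.com/callback"],
    }
}

def fake_fetch(url):
    return HOSTED[url]

doc = validate_cimd(CLIENT_ID, fake_fetch)
print(doc["client_name"])
```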

How CIMD Prevents Impersonation

The obvious question: what stops a malicious agent at evil.com from claiming to be coding-agent.example.com?

The answer is redirect_uri validation. The attacker sends an authorization request using the legitimate agent’s client_id URL but includes their own redirect_uri (https://evil.com/callback). The AS fetches the metadata from coding-agent.example.com, reads the redirect_uris list, and sees that evil.com isn’t in it. Request denied.

The attacker can’t modify the metadata file because they don’t control coding-agent.example.com. Domain ownership is the identity proof.
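In code, the check is small. The sketch below assumes exact string matching against the published list, which is the conservative choice; the helper name is hypothetical.

```python
def redirect_uri_allowed(requested_redirect_uri, metadata):
    """Exact match against the redirect_uris the client itself published."""
    return requested_redirect_uri in metadata.get("redirect_uris", [])

# Metadata the AS fetched from the legitimate agent's domain.
metadata = {
    "client_id": "https://coding-agent.example.com/.well-known/oauth-client.json",
    "redirect_uris": [
        "https://coding-agent.example.com/callback",
        "http://localhost:3000/callback",
    ],
}

legit = redirect_uri_allowed("https://coding-agent.example.com/callback", metadata)
attack = redirect_uri_allowed("https://evil.com/callback", metadata)
# Exact matching also rejects look-alike URLs that merely share a prefix.
prefix_trick = redirect_uri_allowed(
    "https://coding-agent.example.com/callback.evil.com", metadata
)
print(legit, attack, prefix_trick)
```

Exact matching matters here: prefix or substring comparison would let the third request through.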

Confidential Clients with CIMD

DCR gives confidential clients a client_secret from the server. CIMD takes a different approach: the client proves identity using its own private key.

{
  "client_id": "https://coding-agent.example.com/.well-known/oauth-client.json",
  "client_name": "Coding Agent",
  "redirect_uris": ["https://coding-agent.example.com/callback"],
  "token_endpoint_auth_method": "private_key_jwt",
  "jwks_uri": "https://coding-agent.example.com/.well-known/jwks.json"
}

When the agent authenticates at the token endpoint, it signs a JWT with its private key. The AS fetches the public key from jwks_uri and verifies the signature. No shared secret required.
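Here is a rough sketch of what goes into that JWT, following the client-assertion claims from RFC 7523. It builds only the unsigned part (header and claims); producing the signature is left to a JOSE library and is not shown. The RS256 algorithm, the token endpoint URL, and the key id are assumptions for illustration.

```python
import base64
import json
import time
import uuid

def b64url(data: bytes) -> str:
    """Base64url without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def client_assertion_signing_input(client_id, token_endpoint, key_id,
                                   now=None, lifetime=60):
    """Return the 'header.payload' string to be signed, plus the claims."""
    now = int(time.time()) if now is None else now
    header = {"alg": "RS256", "typ": "JWT", "kid": key_id}  # alg is an assumption
    claims = {
        "iss": client_id,          # the client issues the assertion...
        "sub": client_id,          # ...about itself
        "aud": token_endpoint,     # the AS the assertion is meant for
        "jti": str(uuid.uuid4()),  # unique id so the AS can reject replays
        "iat": now,
        "exp": now + lifetime,     # keep assertions short-lived
    }
    parts = [b64url(json.dumps(p, separators=(",", ":")).encode())
             for p in (header, claims)]
    return ".".join(parts), claims

signing_input, claims = client_assertion_signing_input(
    "https://coding-agent.example.com/.well-known/oauth-client.json",
    "https://as.example.com/token",  # hypothetical token endpoint
    key_id="agent-key-1",            # hypothetical key id listed in jwks.json
    now=1_700_000_000,
)
```

The agent would sign `signing_input` with its private key and send the result as client_assertion. The private key never leaves the agent, which is the whole point.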

When to Use DCR for Agents?

DCR earns its complexity in one specific scenario: you own the authorization server and you need per-instance control over your agent fleet.

If you’re running an enterprise where security teams need to revoke a single compromised agent without touching the rest, DCR is your tool. If compliance requires a registry of every authorized agent with its owner, permission history, and lifecycle state, DCR gives you that registry as a byproduct of registration.

The concrete cases:

You need the AS to enforce scope ceilings at registration time. Reporting-agent caps at reports:read. Coding-agent gets code:read code:write. The ceiling is set before the user ever sees a consent screen.
Your agents evolve. Coding-agent ships a new capability next quarter that requires deployments:trigger. You want the AS to track that scope escalation and force re-consent. DCR’s registration record gives you the audit trail.
An agent gets compromised at 2am. You revoke its client_id and every token tied to it dies immediately. The other 15 agents in your fleet keep running.

The pattern: DCR works when the number of agents is bounded, the AS is a known entity you control, and governance matters more than onboarding speed.
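The 2am revocation case is worth sketching, because it shows why a per-agent client_id matters. This is a toy model with hypothetical names, and it assumes token validity is checked against the AS (for example via introspection). Self-contained tokens validated offline would stay usable until they expire.

```python
class TokenStore:
    """Toy AS-side store mapping issued tokens to the client that owns them."""

    def __init__(self):
        self.tokens = {}             # token -> client_id
        self.revoked_clients = set()

    def issue(self, token, client_id):
        self.tokens[token] = client_id

    def revoke_client(self, client_id):
        # One operation invalidates every token tied to this client.
        self.revoked_clients.add(client_id)

    def is_valid(self, token):
        client_id = self.tokens.get(token)
        return client_id is not None and client_id not in self.revoked_clients

store = TokenStore()
store.issue("tok-coding-1", "coding-agent-prod")
store.issue("tok-coding-2", "coding-agent-prod")
store.issue("tok-reporting-1", "reporting-agent-prod")

store.revoke_client("coding-agent-prod")
print(store.is_valid("tok-coding-1"), store.is_valid("tok-reporting-1"))
```

With a single shared "agent platform" client, the same revocation would have taken down the whole fleet.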

When Is CIMD Useful?

For most agent-to-server connections in the MCP world, CIMD should be your default, because your agent will connect to servers it has never seen before. If each connection requires a registration step, you’ve built a toll booth on every on-ramp. CIMD removes the toll booth.

Where it shines:

Your agent connects to 20+ external MCP servers (GitHub tools, Slack tools, monitoring APIs). One metadata file at https://your-agent.com/oauth.json works for all of them. No per-server credentials to manage.
You ship an IDE extension or CLI tool used by 10,000 developers. With DCR, that’s 10,000 registrations in every server’s database. With CIMD, it’s one URL.
You want new server connections to work the moment a user clicks “connect.” No admin tickets, no API keys, no waiting for IT to provision a client.

In short, if your agent needs to talk to servers you don’t control, CIMD is the path of least resistance.

The Hybrid Model: DCR Internally, CIMD Externally

These two models aren’t mutually exclusive. They solve different problems at different layers. Consider an architecture where both work together.

Internally: coding-agent registers via DCR with your enterprise AS. It gets a unique client_id, scope ceilings, and all the governance benefits. Alice’s consent is managed through the patterns from my previous post: standing authorization for routine work, task-scoped grants for sensitive operations.

Externally: When coding-agent needs to connect to an external MCP server (GitHub tools, Slack tools, third-party APIs), it presents its CIMD. No registration required. The external server fetches the metadata, validates the domain, and proceeds.

The agent holds two identities:

An internal DCR-issued client_id for enterprise governance
A CIMD URL for open federation

This gives you governance internally and speed externally.

Security Trade-offs

Let’s look at the security threat surface of each model so you can make an informed decision.

DCR Risks

DCR’s biggest exposure is the /register endpoint itself. It’s open by design, which means an attacker can flood it with junk registrations until your AS database chokes. Rate limiting and requiring initial access tokens help, but you’re still defending an open door. Beyond flooding, there’s impersonation: nothing stops an attacker from registering with client_name: "Official Coding Agent" and tricking users on the consent screen. Software statements can mitigate this, but few teams implement them today. And then there’s the long tail problem. Agents get decommissioned, teams move on, but the registrations stay in the database. Without TTLs or periodic cleanup, you accumulate dead entries that bloat storage and complicate audits.

CIMD Risks

CIMD trades the open registration endpoint for a different risk: your AS now fetches URLs from strangers. A malicious client could submit https://169.254.169.254/ as its client_id and trick your AS into hitting internal infrastructure. You need a hardened fetcher that blocks private IP ranges, enforces timeouts, and caps response size. The localhost problem is subtler: if a client claims http://localhost:1234 as its identity, the AS can’t verify which application is actually listening on that port. In production, restrict CIMD to non-localhost HTTPS URLs. Finally, domain ownership proves identity but not intent. Anyone can register my-evil-agent.io and host valid metadata there. The AS knows who is asking, but not whether they should be trusted. Trust policies, warning messages for unknown domains, and eventually Software Statements are the path forward here.
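A first line of defense for that hardened fetcher is a URL pre-check before any request is made. The sketch below is deliberately partial and the function name is hypothetical: it rejects non-HTTPS URLs, localhost, and IP literals in private, loopback, or link-local ranges. It does not resolve DNS names, so a real fetcher would also need to check the address it actually connects to, plus enforce timeouts and response size caps.

```python
import ipaddress
from urllib.parse import urlparse

def passes_url_precheck(url):
    """Return True if the client_id URL is safe to *attempt* fetching."""
    parsed = urlparse(url)
    host = parsed.hostname
    if parsed.scheme != "https" or not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: allowed here, but the resolved address must still be
        # checked by the fetcher at connection time (not shown).
        return True
    # Rejects private, loopback, link-local (e.g. 169.254.169.254) and
    # other non-public ranges.
    return ip.is_global

for candidate in [
    "https://coding-agent.example.com/.well-known/oauth-client.json",
    "https://169.254.169.254/",
    "http://localhost:1234",
    "https://10.0.0.5/meta.json",
]:
    print(candidate, passes_url_precheck(candidate))
```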

What’s Next: Software Statements and Platform Attestation

Both DCR and CIMD have a gap: neither proves the agent is who it claims to be beyond domain ownership or registration-time trust.

The emerging answer is Software Statements (defined in RFC 7591 §2.3). These are signed JWTs issued by a trusted third party (an app store, an OS vendor, a corporate registry) that attest to the agent’s identity.

coding-agent hosts its CIMD at https://coding-agent.example.com/oauth.json
The metadata includes a software_statement, a JWT signed by your enterprise’s agent registry
The external MCP server validates the statement against the registry’s public key
Trust is established not just by domain ownership, but by a verifiable chain of attestation

This bridges the gap between CIMD’s scalability and DCR’s trust guarantees. The agent gets frictionless connectivity and verifiable identity.

Conclusion

DCR and CIMD aren’t competitors. They answer different questions at different trust boundaries. DCR answers “should this agent be allowed to exist in my system?” CIMD answers “how does this agent introduce itself to a server it’s never met?”

Use DCR inside your enterprise where you need the AS to gatekeep. Use CIMD at the edges where your agents meet the open world. Most teams will end up running both.

The consent patterns from my previous post still apply here. Per-agent scope ceilings, incremental consent, standing vs. task-scoped authorization: all of that operates at the DCR layer, governing what Alice approves and how those approvals evolve over time. CIMD operates one layer below, solving the connectivity problem so your agents can actually reach the servers where those tokens need to work, without the friction of registration.

Designing Workloads for Kinetic Resilience

2026-05-02T00:00:00+00:00

Designing Workloads for Kinetic Resilience

Early in my career, I worked for a construction equipment company in Peoria, Illinois. Every week, I drove to a facility in Mossville, Illinois to rotate backup tapes. The first time I made that drive, I expected a data center. What I found was a factory floor, rows of diesel engines painted bright yellow, lined up in neat formation. The “computer room” sat underground, accessed through a walkway cut into the middle of the factory. Next to it was a safe room, built for the people working above.

I grew up in the south of India, known for its dry, tropical, and predictable climate, and I had never experienced a tornado. My only reference point for a tornado was the movie Twister. But Mossville sits in central Illinois, where tornadoes are not a hypothetical risk. They are a recurring one. The factory’s designers understood this. They placed the computer room underground, beside the shelter, because they recognized that protecting digital infrastructure meant accounting for physical destruction.

That underground computer room was our disaster recovery facility. It was not elegant. But it reflected a design principle that remains relevant today. Infrastructure must survive the loss of the building above it.

Decades later, in 2026, the threat landscape has changed. Data centers no longer face only tornadoes, floods, or earthquakes. Recent geopolitical events have exposed a different category of risk: deliberate, targeted, physical attacks on digital infrastructure. Drone strikes have disabled power stations in Ukraine. Undersea cables have been severed in the Baltic Sea. Governments are reassessing the physical vulnerability of facilities they once considered secure.

The question is no longer whether a data center can withstand an equipment failure or a natural disaster. The question is whether your workloads can survive the intentional destruction of the facility that hosts them.

This post examines what it means to design for that scenario.

The Problem: Infrastructure Built for Accidents, Not Attacks

Data center design has been driven by a single objective for decades. Maintain availability in the face of failure. The industry has built mature approaches to achieve this through redundant power, backup cooling, multiple availability zones, automated failover, and cross-region replication. These practices work. But they share a common assumption that the facility itself continues to exist.

Kinetic threats break that assumption. A drone strike does not cause a recoverable hardware fault. A missile does not trigger a graceful failover. The destruction of a facility can be permanent, immediate, and total.

Since 2022, military strikes have destroyed power infrastructure that data centers depended on. Undersea cables have been severed. Attacks on maritime chokepoints have threatened cable routes carrying an estimated 17% of global internet traffic. And in 2026, drone strikes directly damaged cloud data center facilities for the first time, marking the moment kinetic threats moved from adjacent infrastructure to the cloud infrastructure itself. The World Economic Forum responded by calling for digital infrastructure to be treated as critical infrastructure on par with power grids and water systems (source).

The Limits of Traditional Availability Models

Traditional high availability works through layered redundancy. Redundant power, cooling, and network paths at the facility level. Load balancing and automated instance replacement at the application level. Multiple availability zones at the regional level. A 2025 Uptime Institute report found that 55% of data center outages were power-related, and the majority were resolved within hours through existing redundancy. The model works because it addresses the failure modes that occur with the highest frequency.

The gap appears when you examine the assumptions underneath. Availability zones within a single region are typically located within the same metropolitan area, within 100 kilometers of each other. They share the same power grid, the same internet exchange points, and the same political jurisdiction. A localized natural disaster can affect one zone while sparing others. A coordinated physical attack does not respect availability zone boundaries. If three zones sit within the same city and that city becomes a conflict zone, all three are at risk simultaneously.

Multi-region architectures reduce this exposure, but a majority of organizations default to a primary-secondary model where one region handles writes and another serves as a warm standby. In a kinetic scenario, the loss of the primary region means the loss of all data not yet replicated, and a failover process that has never been tested under real conditions.

The result is a gap between what organizations believe their architecture can survive and what it actually can. Closing that gap is what kinetic resilience is about.

From Availability to Kinetic Resilience

Traditional resilience asks, “What happens when a component fails?” Kinetic resilience asks a different question. “What happens when the facility no longer exists?”

Component failure is temporary and recoverable. Facility destruction can be permanent and total. The recovery playbook for the first scenario does not apply to the second. There is no hardware to replace. There is no facility to restore into.

Kinetic resilience accounts for three scenarios that traditional models treat as edge cases. The permanent loss of a facility. The prolonged inaccessibility of an entire region due to conflict, government shutdown, or sustained infrastructure damage. And the disruption of external dependencies that facilities rely on to function, particularly power grids, fuel supply chains, and network interconnects.

The objective is not to prevent damage. The objective is to ensure that the destruction of any single facility, or even an entire region, does not cause a corresponding destruction of the services running on it. Resilience becomes a property of the workload, not the building. The workload survives because it was designed to exist independently of any single location.

Architectural Strategies for Kinetic Resilience

If resilience is a property of the workload rather than the facility, then the architecture must reflect that. Four strategies form the foundation.

Geographic Distribution Beyond Availability Zones

Multi-region architectures are the strongest foundation for kinetic resilience. By distributing workloads across geographically separated regions, they isolate failures and reduce the blast radius of any single event. For workloads that face kinetic risk, multi-region is not optional. It is the starting point.

The next step is ensuring that the regions themselves are distributed across boundaries that matter. Regions that do not share the same power grid, the same government jurisdiction, or the same geopolitical risk profile provide stronger isolation than regions clustered within a single country. A workload running in three regions across two continents is harder to disable through physical attack than one running in three availability zones within the same metropolitan corridor.

The trade-off is latency. Synchronous replication across regions is impractical for latency-sensitive applications. Workloads that have traditionally relied on strong consistency need conflict resolution mechanisms, eventual consistency models, or partitioned write domains that allow each region to operate independently while reconciling state asynchronously.

The other trade-off is regulatory. Data sovereignty laws require that certain categories of data remain within national or regional boundaries. Kinetic resilience does not require ignoring these constraints. It requires designing around them by separating the data layer from the compute and control layers. Regulated data stays within the required jurisdiction. The compute layer distributes across broader boundaries so the system continues to operate even if one jurisdiction’s facilities are compromised.

Elimination of Single Points of Failure

No individual facility can be essential to the operation of the whole system. Single points of failure hide in unexpected places. A “multi-region” deployment where all DNS is managed from one provider in one jurisdiction. A primary write database that has never been promoted under load. A secrets manager or identity provider that runs in one location.

Eliminating these requires a systematic audit of every dependency in the stack. Each dependency must answer one question. If the facility hosting this component is destroyed in the next ten minutes, does the system continue to function? The implementation is active-active deployments where each site handles full production traffic, multi-writer databases or partitioned ownership models, and stateless service components wherever possible. A facility loss should result in reduced capacity, not systemic failure.
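The audit question can be asked mechanically once you have an inventory of where each dependency runs. Here is a minimal sketch; the inventory below is hypothetical, and a real audit would also need to capture transitive dependencies.

```python
def single_points_of_failure(dependencies):
    """Names of dependencies hosted in fewer than two distinct facilities."""
    return sorted(name for name, sites in dependencies.items()
                  if len(set(sites)) < 2)

def survives_loss_of(facility, dependencies):
    """True if every dependency still has at least one site left."""
    return all(set(sites) - {facility} for sites in dependencies.values())

# Hypothetical inventory: dependency -> facilities that can serve it.
deps = {
    "dns": ["provider-a/eu"],
    "primary-db": ["region-1", "region-2"],
    "identity-provider": ["region-1"],
    "api": ["region-1", "region-2", "region-3"],
}

spofs = single_points_of_failure(deps)
print(spofs)
print(survives_loss_of("region-1", deps))
print(survives_loss_of("region-3", deps))
```

In this made-up inventory the "multi-region" API looks healthy, but the identity provider quietly pins the whole system to region-1.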

Graceful Degradation

A system that loses 30% of its infrastructure is not down. It is constrained. Graceful degradation means the system sheds non-essential functions to preserve essential ones. A financial platform disables analytics while maintaining transaction processing. A communications platform reduces video quality while maintaining voice and text.

This requires explicit decisions about service priority, made before the crisis, not during it. Every service needs a tier assignment. Load shedding strategies, circuit breakers, and feature flags become operational necessities. The data layer must also handle partial failure by serving requests with potentially stale data rather than returning errors, which means designing for eventual consistency from the start.
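A tier assignment only helps if something acts on it. The sketch below shows one simple load shedding rule, assuming each service has a tier (1 is most essential) and a capacity cost in arbitrary units. The services and numbers are hypothetical.

```python
def services_to_keep(services, available_capacity):
    """Keep services in tier order until the next one no longer fits.

    Stops at the first service that does not fit, so a lower-tier service
    can never displace a higher-tier one.
    """
    kept, used = [], 0
    for name, tier, cost in sorted(services, key=lambda s: s[1]):
        if used + cost > available_capacity:
            break
        kept.append(name)
        used += cost
    return kept

# (name, tier, capacity units) for a hypothetical financial platform.
services = [
    ("transaction-processing", 1, 40),
    ("account-lookup", 2, 20),
    ("analytics", 3, 30),
    ("recommendations", 3, 10),
]

print(services_to_keep(services, 100))  # full capacity
print(services_to_keep(services, 70))   # after losing 30% of infrastructure
```

At 70 units the platform keeps transaction processing and account lookup and sheds the tier 3 services, which matches the "constrained, not down" outcome described above.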

Independence from External Infrastructure

A data center that survives a military strike but loses power three hours later because the regional grid was also targeted has not achieved kinetic resilience. Physical disruptions affect the surrounding infrastructure, including power generation, fuel delivery, cooling water supply, and network connectivity.

Facilities in high-risk environments need independent power generation, cooling systems that operate without municipal water and power, and diverse network paths with multiple upstream providers and satellite backup links. Full independence is not feasible indefinitely. But extending the operational window from hours to days can mean the difference between a managed failover and a catastrophic loss.

Physical Hardening and Countermeasures

Architecture alone does not address the full threat surface. Physical hardening reduces the likelihood and severity of facility loss. Data center security has historically focused on the perimeter, with fences, bollards, and biometric access control protecting against ground-level entry. The roof has received far less attention, and the drone era has exposed that gap. FPV drones attack from above, cost a few hundred dollars, and are increasingly autonomous. Countermeasures proven in active conflict zones, including anti-drone netting, slat armor over rooftop equipment, electronic countermeasures, and visual obscuration, can be adapted for data center rooftops (source).

Structural reinforcement, such as blast-resistant construction and subterranean placement, further reduces vulnerability. The Mossville computer room from the opening of this post is an early example: placing infrastructure underground eliminates the roof as an attack surface entirely.

Physical hardening buys time, but it does not eliminate risk. It does not address regional disruptions that affect multiple facilities simultaneously, and it creates a false sense of security if the workload architecture underneath still has single points of failure. Hardening protects the facility. Architecture protects the workload. The correct approach is to pair both: harden the facility to reduce risk and architect the workload to absorb loss.

Testing and Validation

An architecture designed for kinetic resilience is only as credible as the scenarios it has been tested against. Conventional DR testing simulates the loss of a component, verifies failover, and records a pass. It does not validate that the system can survive the permanent destruction of a facility.

Kinetic resilience testing requires three additional scenarios, incorporated into the DR program organizations already operate.

Permanent facility loss. Remove a facility from the system entirely. Take the DNS supporting the region offline, withdraw its network routes, and mark its data stores as permanently unavailable. The question is whether the system reaches a stable operating state and sustains production load for 48 to 72 hours without it.

Extended regional outage. Simulate a full region offline condition. A 30-minute failover test does not reveal the challenges that emerge at hour 12 or hour 48, such as certificate expirations, token refreshes, cache warming, and the human fatigue of operating in degraded mode.

Cascading dependency failure. This is a very difficult scenario to test, but an important one. Simulate a facility loss while simultaneously disabling the network paths and monitoring system that would normally alert the on-call team. This surfaces the hidden dependencies and circular alerting paths that architecture reviews miss.

These scenarios belong inside existing DR programs, not alongside them. The investment is in expanding the scope of what gets tested, not in building a new process. And the tests must include the human response. The on-call engineer will be operating under stress, with incomplete information, without access to tools hosted in the destroyed facility. If the team cannot stabilize the system under those conditions, the architecture is not kinetically resilient regardless of what the design documents say.

Conclusion

Not every system needs kinetic resilience. Traditional high availability with regular backups remains appropriate for the majority of applications. This level of investment is justified when the consequences of failure extend beyond the organization itself, in systems like critical infrastructure, financial platforms, government services, and large-scale cloud providers where the impact of a facility loss reaches millions of downstream users.

The Shift in Perspective

The underground computer room in Mossville was built by engineers who understood that the building above it could be destroyed by a tornado. They did not try to make the building tornado-proof. They placed the infrastructure where the tornado could not reach it.

That same principle applies today, at a different scale and against a different threat. Traditional availability practices remain necessary, but they are no longer sufficient for systems that face deliberate physical threats. Kinetic resilience builds on that foundation by shifting the unit of resilience from the facility to the workload, from the building to the system.

Harden the facility to reduce risk. Architect the workload to absorb loss. Test under realistic conditions. And design every system to answer one question. If this building is gone tomorrow, does the service continue?

Part-2: Who Said Yes? Designing User Consent for AI Agents

2026-04-21T00:00:00+00:00

In the previous post, Alice had a token with exactly the right scopes, and reporting-agent exchanged it for a narrower delegated token before calling downstream services. The whole flow assumed that first token already existed and already carried the right scopes.

This post rewinds to the step before that. How did Alice actually authorize reporting-agent to act for her? And what changes when she adds a second agent, coding-agent, that needs a completely different set of permissions? If consent is wrong here, every downstream token carries the mistake with it.

The OAuth 2.0 authorization code flow was designed for a specific scenario: a human sitting at a browser, reviewing a consent screen for a single application, at one moment in time. “ExampleApp wants to read your reports. Allow or deny?”

Agents break three of those assumptions.

Agents can be long-lived. Alice approves reporting-agent once and it runs for months, often without her watching.

Agents can accumulate capabilities. Coding-agent might start out reading code, then later need to open pull requests, then later need to trigger deployments. Each of those is a different scope.

Agents come in populations. Alice doesn’t use one agent. She uses several, each with different purposes, different risk profiles, and different permission needs. The standard consent screen gives her no way to tell reporting-agent from coding-agent from the dozen other agents her team has rolled out.

Fixing this doesn’t require a new protocol. It requires using OAuth more deliberately.

Register Each Agent as Its Own Client

The first fix is the most important: reporting-agent and coding-agent each get their own OAuth client registration. Not a single shared “agent platform” client that every agent authenticates through.

This matters for four reasons.

Onboarding is automated. RFC 7591 Dynamic Client Registration (DCR) lets an agent platform register new clients programmatically when a new agent is deployed. You attach metadata (owner, agent type, declared capabilities, lifecycle state) and treat the client registry as your source of truth for what agents exist and what they are allowed to ask for.

Revocation is scoped. If coding-agent is compromised, you revoke its client registration and every token tied to it. Reporting-agent keeps working. Alice doesn’t get logged out.

Scope ceilings are independent. Reporting-agent’s client is registered with a maximum scope set of reports:read. Coding-agent’s client has code:read code:write. Neither can ever request a scope it wasn’t registered for, regardless of what Alice approves at runtime.

Audit attribution is clean. Every log line carries the specific client ID of the agent that made the call, not a shared identifier that spreads attribution across the whole fleet.

A registered agent client looks roughly like this:

{
  "client_id": "reporting-agent-prod",
  "client_name": "Reporting Agent",
  "grant_types": ["authorization_code", "refresh_token"],
  "scope": "reports:read",
  "agent_metadata": {
    "owner": "data-platform-team",
    "agent_type": "autonomous-reporting",
    "capability_version": "2.3.0"
  }
}

The agent_metadata block is a custom extension. IdPs like Entra ID, Okta, and Cognito let you attach arbitrary metadata to client registrations, and it becomes useful later for policy decisions and incident response.
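The scope ceiling from that registration record is enforced with a simple subset check at authorization time. A minimal sketch, assuming scopes are space-delimited strings as in the examples here; the function name is hypothetical.

```python
def enforce_scope_ceiling(client, requested_scope):
    """Reject any authorization request that asks for more than was registered."""
    ceiling = set(client["scope"].split())
    requested = set(requested_scope.split())
    excess = requested - ceiling
    if excess:
        raise PermissionError(
            f"scope exceeds registered ceiling: {sorted(excess)}")
    return " ".join(sorted(requested))

coding_agent = {"client_id": "coding-agent-prod", "scope": "code:read code:write"}

granted = enforce_scope_ceiling(coding_agent, "code:read")
print(granted)

try:
    enforce_scope_ceiling(coding_agent, "code:read deployments:trigger")
except PermissionError as err:
    print(err)
```

This check runs before Alice ever sees a consent screen, so she is never asked to approve something the agent was not registered for.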

With per-agent clients in place, each agent runs its own authorization code flow. Alice sees a distinct consent screen for each one, and grants a distinct set of scopes.

For reporting-agent, the authorization request looks like this:

GET /authorize
  ?response_type=code
  &client_id=reporting-agent-prod
  &redirect_uri=https://agents.example.com/callback
  &scope=reports:read
  &state=xyz123

Alice sees “Reporting Agent wants to read your reports” and approves. Reporting-agent receives a refresh token tied to its client ID with a scope ceiling of reports:read.

For coding-agent, she runs a separate flow. Different client ID, different scope set (code:read code:write), different consent screen, different refresh token.

The key idea: the scope Alice approves at this step is the ceiling, not the per-call scope. The refresh token she grants reporting-agent carries reports:read as its maximum. When the agent later calls the token exchange service (as described in the previous post), the exchange narrows the scope further based on the specific downstream service being called. Alice’s consent sets the upper bound; the token exchange sets the actual permission on each call.

This separation is important. Alice is not approving every individual API call. She is approving a bounded capability, and trusting the delegation chain to narrow things appropriately.
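
To make the ceiling idea concrete, here is a minimal sketch of the check a token exchange service might run before narrowing. The function name per_call_scope is hypothetical, and a real service would also apply its own per-service policy on top of this.

```python
def per_call_scope(consented_ceiling, requested):
    """Return the scopes for one call, refusing anything above the ceiling."""
    ceiling = set(consented_ceiling.split())
    wanted = set(requested.split())
    excess = wanted - ceiling
    if excess:
        raise PermissionError(f"scope exceeds consented ceiling: {sorted(excess)}")
    return " ".join(sorted(wanted))


coding_ceiling = "code:read code:write"   # what Alice approved for coding-agent
print(per_call_scope(coding_ceiling, "code:read"))

try:
    per_call_scope(coding_ceiling, "deployments:trigger")
except PermissionError as err:
    print(err)
```

The first call succeeds with a narrower scope than Alice approved; the second fails because no amount of narrowing can produce a scope she never consented to.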

Agents change. Six weeks after Alice first approved coding-agent, the agent’s capabilities expand. It now needs deployments:trigger to push code through to staging. Alice’s existing refresh token has code:read code:write as its ceiling and cannot cover the new scope.

You have two options.

Prompt for the delta: The agent initiates a new authorization request that includes only the new scope. The consent screen shows Alice what is changing: “Coding Agent is requesting a new permission: trigger deployments.” She approves, and the refresh token is upgraded, or a second token is issued alongside the first.

GET /authorize
  ?response_type=code
  &client_id=coding-agent-prod
  &scope=deployments:trigger
  &prompt=consent
  &state=abc456

Force full re-consent: For sensitive scope escalations (anything that moves the agent from read to write or touches production systems), requiring a fresh grant from scratch makes the decision visible rather than incremental. The UX cost is real, but so is the risk of scope creep through small, easily-approved increments.

A defensible policy: Allow delta consent for same-tier scopes, force full re-consent when crossing a sensitivity boundary (read to write, non-prod to prod, internal to external data). Record the consent decisions with timestamps and scope deltas so you can reconstruct how an agent’s permissions evolved.
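
One way to encode that policy is a small decision function plus a consent record. This is a sketch under stated assumptions: the SCOPE_TIER mapping, the ConsentRecord fields, and the tier numbers are all hypothetical, and a real platform would define its own sensitivity boundaries.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical sensitivity tiers; a real platform would define its own mapping.
SCOPE_TIER = {
    "reports:read": 1,
    "code:read": 1,
    "code:write": 2,
    "deployments:trigger": 3,
}


@dataclass
class ConsentRecord:
    agent: str
    added_scopes: tuple
    mode: str            # "delta" or "full"
    decided_at: datetime


def consent_mode(current_scopes, new_scopes):
    """Delta consent within the current tier; full re-consent when a new scope is more sensitive."""
    current_max = max((SCOPE_TIER[s] for s in current_scopes), default=0)
    new_max = max(SCOPE_TIER[s] for s in new_scopes)
    return "full" if new_max > current_max else "delta"


def record_consent(agent, current_scopes, new_scopes):
    """Capture which scopes were added, how, and when, for later reconstruction."""
    return ConsentRecord(agent, tuple(sorted(new_scopes)),
                         consent_mode(current_scopes, new_scopes),
                         datetime.now(timezone.utc))


record = record_consent("coding-agent", {"code:read", "code:write"}, {"deployments:trigger"})
print(record.mode, record.added_scopes)
```

Storing the mode alongside the scope delta and timestamp is what lets you answer, months later, whether a permission arrived through a quick delta prompt or a full re-consent.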

Standing Authorization vs. Task-Scoped Authorization

Consent comes in two shapes, and agentic platforms need both.

Standing authorization is the default most teams reach for. Think of it like setting up ACH autopay for your homeowners association (HOA) dues. You authorize the HOA once to pull a fixed amount from your bank account every month. The payments run on schedule without you approving each one. You set the ceiling (the monthly amount), and the HOA operates within it indefinitely until you revoke the mandate. That is exactly how standing authorization works for agents. Alice grants reporting-agent a refresh token valid for 90 days. The agent runs on a schedule, exchanges the refresh token for short-lived access tokens, and does its work without Alice being involved. This is the right model when the agent’s task is ongoing and the scope is stable.

Task-scoped authorization is narrower. Think of the one-time password your bank sends to your phone when you initiate a wire transfer. The OTP is bound to that specific transaction, expires in minutes, and cannot be reused for a second transfer. You need a fresh code each time. That is task-scoped authorization. Alice is in a chat session with coding-agent and asks it to deploy a specific branch to staging. The agent requests a grant bound to this session and this task: short TTL, single-use refresh, tied to a session ID in the grant metadata. When the session ends, the grant is dead. This is the right model for high-risk, user-present actions where standing authority would be excessive.

The two compose. The coding-agent might hold a standing grant for code:read code:write and request task-scoped grants on top of it for sensitive operations like deployments:trigger. The standing grant handles the common case; the task-scoped grant handles the exception that needs a fresh “yes” from Alice.
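
The difference between the two shapes shows up clearly in a data model. The Grant class below is a hypothetical sketch, not any IdP’s API: a standing grant has a long expiry and no session binding, while a task-scoped grant is tied to one session and can be used once.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Grant:
    agent: str
    scopes: frozenset
    expires_at: float                 # seconds on whatever clock the caller uses
    session_id: Optional[str] = None  # set only for task-scoped grants
    single_use: bool = False
    used: bool = False

    def authorize(self, scope, now, session_id=None):
        """Return True if this grant covers the call; consume it if single-use."""
        if now >= self.expires_at or scope not in self.scopes:
            return False
        if self.session_id is not None and session_id != self.session_id:
            return False
        if self.single_use:
            if self.used:
                return False
            self.used = True
        return True


DAY = 86_400
standing = Grant("coding-agent", frozenset({"code:read", "code:write"}), expires_at=90 * DAY)
task = Grant("coding-agent", frozenset({"deployments:trigger"}), expires_at=300,
             session_id="chat-42", single_use=True)

print(standing.authorize("code:write", now=10))                             # routine work
print(task.authorize("deployments:trigger", now=10, session_id="chat-42"))  # first use
print(task.authorize("deployments:trigger", now=20, session_id="chat-42"))  # reuse attempt
```

The standing grant never covers deployments:trigger, so the sensitive action always has to go through a task-scoped grant.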

Comparing the Two Client Models

Dimension	Shared client for all agents	Per-agent client registration
Revocation granularity	All-or-nothing (affects every agent)	Per-agent (isolated blast radius)
Scope ceiling	Union of all agent needs (over-broad)	Tailored per agent (least-privilege)
Audit attribution	Shared client ID in every log	Distinct client ID per agent
Onboarding cost	Low (one-time setup)	Moderate (DCR automation required)
Compromise blast radius	Every agent that shares the client	One agent only

Conclusion

The delegation pattern from the previous post is only as strong as the consent that seeds it. If every agent shares a client, the downstream token exchange has nothing meaningful to narrow from. If consent is granted once and never revisited, scope ceilings drift away from what Alice actually intended. If standing and task-scoped authorization are treated as the same thing, you end up over-authorizing routine work or under-authorizing sensitive actions.

Per-agent client registrations, explicit scope ceilings, incremental consent for evolving capabilities, and a clear line between standing and task-scoped grants give Alice real control and give your security team something defensible when someone asks how an agent came to hold the permissions it did.

There is a third case: agents running with no user present at all, like a scheduled agent that triggers at 2am. Standing authorization gets you partway there, but the model starts to strain when the human is fully out of the loop. Watch out for the next post in this series.

Part-1: Who Called That API? Why AI Agents Need Delegation, Not Impersonation

2026-04-13T00:00:00+00:00

When an AI agent accesses a service on behalf of a user, who shows up in the audit log? If the answer is just the user or just the agent, you have a gap that will surface during your next security review.

Most agentic platforms today use one of two flawed patterns: the agent impersonates the user (forwarding their token directly), or the agent authenticates as itself with no link to the user who initiated the request. Both patterns break down when you need to answer the question every incident responder asks: who did what? When AI agents act on behalf of a human user or another agent, you also need to ask a second question: through which system did they do it?

This post explains why delegation with dual identity is the correct model for agentic systems, and how OAuth 2.0 Token Exchange (RFC 8693) provides the standard to implement it.

The Problem with Impersonation

In the impersonation model, the agent receives the user’s access token and presents it directly to downstream services. The service sees the user’s identity and grants access based on the user’s permissions.

The token that reaches the downstream service looks like this:

{
  "iss": "https://idp.example.com",
  "sub": "alice@example.com",
  "scope": "reports:read tickets:read tickets:write",
  "aud": "https://api.example.com",
  "exp": 1743120000
}

This creates three problems.

First, the downstream service cannot distinguish between Alice calling the API directly and an agent calling it on her behalf. The audit trail shows alice@example.com for both. During an incident, you cannot determine whether a human or an automated agent performed a specific action.

Second, the agent inherits all of Alice’s permissions. If Alice has tickets:write scope but the agent only needs reports:read, the agent still carries the full scope set. A compromised or misbehaving agent can exercise permissions it was never intended to use.

Third, revoking the agent’s access means revoking Alice’s token, which cuts Alice herself off everywhere that token is used, not just the agent.

Delegation: The Agent Authenticates as Itself

In the delegation model, the agent does not forward the user’s token. Instead, it presents both the user’s token and its own identity to a token exchange service. The service issues a new token that carries both identities: who authorized the action (the user) and who performed it (the agent).

The delegated token carries dual identity:

{
  "iss": "https://idp.example.com",
  "sub": "alice@example.com",
  "act": {
    "sub": "reporting-agent"
  },
  "aud": "https://api.example.com",
  "scope": "reports:read",
  "exp": 1743120300
}

The sub claim identifies Alice as the authorizing user. The act.sub claim identifies the agent that performed the action. The aud claim restricts this token to a specific downstream service. The scope is narrowed to only what the agent needs for this particular call.

The downstream service now has everything it needs: authorize based on sub, attribute the action to act.sub, reject the token if the audience does not match, and log both identities for audit.
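
Those four checks can be sketched as a single function on the downstream service. This is an illustration only: check_delegated_token is a hypothetical name, the claims are assumed to come from a token whose signature has already been verified, and aud is treated as a single string for simplicity.

```python
import time


def check_delegated_token(claims, expected_audience, required_scope, now=None):
    """Validate already-verified claims and return an audit record.

    Signature verification is assumed to have happened before this call.
    """
    now = time.time() if now is None else now
    if claims.get("aud") != expected_audience:
        raise PermissionError("audience mismatch")
    if now >= claims.get("exp", 0):
        raise PermissionError("token expired")
    if required_scope not in claims.get("scope", "").split():
        raise PermissionError("missing scope")
    # Authorize on sub, attribute to act.sub, and log both.
    return {
        "authorized_by": claims["sub"],
        "performed_by": claims.get("act", {}).get("sub"),
        "scope": required_scope,
    }


claims = {
    "iss": "https://idp.example.com",
    "sub": "alice@example.com",
    "act": {"sub": "reporting-agent"},
    "aud": "https://api.example.com",
    "scope": "reports:read",
    "exp": 1743120300,
}
audit = check_delegated_token(claims, "https://api.example.com", "reports:read", now=1743120000)
print(audit)
```

The returned record is what goes into the audit log, so every entry names both Alice and the agent.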

How RFC 8693 Makes This Work

RFC 8693 defines OAuth 2.0 Token Exchange, a standard protocol for exchanging one security token for another. It is the mechanism that turns impersonation into delegation.

The exchange takes two inputs:

A subject_token: the user’s JWT from the identity provider
An actor_token: the agent’s workload identity credential

The token exchange service validates both tokens and issues a new token with narrowed scope. The authorization policy determines how scopes are narrowed. A common pattern for agentic platforms is to compute the intersection of user permissions, agent permissions, and service requirements, ensuring the delegated token never exceeds what any single party allows. The resulting token is scoped to a specific downstream service audience.
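
The intersection pattern is small enough to show directly. In this sketch the three scope sets are made-up examples, and narrow_scopes is a hypothetical helper rather than part of any token exchange product.

```python
def narrow_scopes(user_scopes, agent_scopes, service_scopes):
    """Return the scopes all three parties allow, as a sorted space-delimited string."""
    allowed = set(user_scopes) & set(agent_scopes) & set(service_scopes)
    return " ".join(sorted(allowed))


user = {"reports:read", "tickets:read", "tickets:write"}   # Alice's permissions
agent = {"reports:read"}                                     # what reporting-agent declared
service = {"reports:read", "reports:export"}                 # what the reporting API accepts

print(narrow_scopes(user, agent, service))
```

Because the result is an intersection, adding a permission to any one party never widens the delegated token on its own; all three have to agree.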

Here is what the token exchange request looks like:

POST /token HTTP/1.1
Content-Type: application/x-www-form-urlencoded

grant_type=urn:ietf:params:oauth:grant-type:token-exchange
&subject_token=eyJhbGciOiJSUzI1Alice...   (Alice's JWT)
&subject_token_type=urn:ietf:params:oauth:token-type:jwt
&actor_token=eyJhbGciOiJSUzI1Agent...     (Agent's credential)
&actor_token_type=urn:ietf:params:oauth:token-type:jwt
&audience=https://api.example.com
&scope=reports:read
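
For reference, the same request body can be built with the Python standard library. The token values below are placeholders, and build_token_exchange_body is a hypothetical helper; the grant type and token type URNs are the ones defined by RFC 8693.

```python
from urllib.parse import urlencode, parse_qs

TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def build_token_exchange_body(subject_token, actor_token, audience, scope):
    """Form-encode an RFC 8693 token exchange request body."""
    return urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,
        "subject_token_type": JWT_TOKEN_TYPE,
        "actor_token": actor_token,
        "actor_token_type": JWT_TOKEN_TYPE,
        "audience": audience,
        "scope": scope,
    })


body = build_token_exchange_body(
    subject_token="user-jwt-placeholder",    # Alice's JWT would go here
    actor_token="agent-jwt-placeholder",     # the agent's workload credential
    audience="https://api.example.com",
    scope="reports:read",
)
fields = parse_qs(body)
print(fields["grant_type"][0])
```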

The response is a new token with the dual-identity structure shown above. The token exchange service can enforce several constraints:

Scope narrowing: the issued token’s scope can be limited to only the permissions required for the target service. An agent declared with reports:read cannot obtain a token with tickets:write, even if the user has that permission.
Audience restriction: each token is bound to a single downstream service via the aud claim. A token minted for the reporting service is rejected by the ticketing service. This limits blast radius if a single service is compromised.
Short TTL: delegated tokens are issued with short expiration times (typically 5 minutes). The agent caches them and re-requests transparently when they expire. If the agent is compromised, the window of exposure is limited to the token expiry.
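
The cache-and-re-request behavior behind the short TTL can be sketched as follows. Everything here is an assumption for illustration: the DelegatedTokenCache class, the 30-second refresh margin, and the fake_exchange stand-in for the real token exchange call.

```python
import time


class DelegatedTokenCache:
    """Cache one delegated token per audience and refresh shortly before expiry."""

    def __init__(self, exchange, clock=time.time, skew=30):
        self._exchange = exchange   # callable(audience) -> (token, ttl_seconds)
        self._clock = clock
        self._skew = skew           # refresh this many seconds early
        self._entries = {}

    def get(self, audience):
        entry = self._entries.get(audience)
        if entry and self._clock() < entry[1] - self._skew:
            return entry[0]
        token, ttl = self._exchange(audience)
        self._entries[audience] = (token, self._clock() + ttl)
        return token


calls = []

def fake_exchange(audience):
    # Stand-in for the real token exchange call; returns a 5-minute token.
    calls.append(audience)
    return f"token-{len(calls)}", 300

now = [0.0]
cache = DelegatedTokenCache(fake_exchange, clock=lambda: now[0])
first = cache.get("https://api.example.com")
second = cache.get("https://api.example.com")   # served from cache
now[0] = 280.0                                   # inside the 30-second refresh window
third = cache.get("https://api.example.com")    # triggers a new exchange
print(first, second, third)
```

Keying the cache by audience matches the audience restriction above: a token cached for one service is never handed to a call bound for another.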

Comparing the Models

Dimension	Impersonation	Delegation (RFC 8693)
Audit trail	User identity only	User + agent identity
Scope control	Full user permissions	Intersection of user, agent, and service scopes
Token audience	Broad (all services)	Per-service restriction
Revocation	Revoke user token (locks out user)	Revoke agent identity (user unaffected)
Incident response	Cannot distinguish human from agent actions	Full attribution chain
Token lifetime	Matches user session (minutes to hours)	Short-lived (5 minutes), auto-refreshed

Conclusion

If you are building an agentic platform where AI agents call services on behalf of users, the identity model you choose determines whether your audit trails hold up under scrutiny.

Impersonation is simpler to implement. Forward the user’s token and move on. But it creates a blind spot in every audit log, grants agents more permissions than they need, and couples agent lifecycle to user credentials.

Delegation with RFC 8693 requires a token exchange service and per-service audience management. The operational overhead is higher. But it gives you individual attribution on every call, least-privilege enforcement at the token level, and independent revocation of agent and user identities. For security-sensitive environments where compliance, auditability, and least-privilege access are requirements, RFC 8693 token exchange is the foundation to build on.

Understanding OAuth Authentication in Amazon Bedrock AgentCore: A Deep Dive

2025-12-23T00:00:00+00:00

Introduction

Amazon Bedrock AgentCore introduces a sophisticated authentication pattern that enables AI agents to securely access external services on behalf of users. This architecture implements a dual authentication pattern that separates inbound authentication (who can call your agent) from outbound authentication (how your agent accesses external services).

In this post, we’ll explore how this OAuth flow works, why it’s designed this way, and walk through a complete authentication cycle with detailed diagrams.

The Dual Authentication Pattern

Traditional applications typically implement authentication in one direction: users authenticate to access the application. But AI agents introduce a new challenge: the agent itself needs to authenticate to external services on behalf of the user.

Bedrock AgentCore solves this with two separate authentication layers:

1. Inbound Authentication (User → Agent)

Controls who can invoke your agent runtime. Uses JWT tokens validated against a Cognito User Pool.

2. Outbound Authentication (Agent → External Services)

Controls how your agent accesses external services. Uses OAuth 2.0 with user federation to act on behalf of the authenticated user.

Architecture Overview

Two Cognito Pools: Why?

At first glance, using two separate Cognito User Pools might seem like unnecessary complexity. However, this architectural decision is fundamental to implementing secure, scalable AI agents that can access external services on behalf of users. The key insight is that authenticating who can invoke your agent is conceptually different from authenticating which external services your agent can access. By separating these concerns into two distinct authentication layers, we achieve better security isolation, clearer audit trails, and the flexibility to integrate with multiple identity providers without coupling them together. Think of it as having two separate security checkpoints: one at the entrance to your building (who can use the agent) and another at specific rooms inside (which external services the agent can access).

The architecture uses two separate Cognito User Pools, each serving a distinct purpose:

Runtime Pool (Inbound Authentication):

Purpose: Authenticate callers who invoke the agent
Users: Application users
Flow: User → JWT Token → Runtime validates token
Think of it as: “Who can talk to my agent?”

Identity Pool (Outbound Authentication):

Purpose: Store credentials for external services
Users: Service accounts
Flow: Agent → Request token → External service validates
Think of it as: “What can my agent access?”

graph TB
    subgraph "InboundAuth Cognito Pool"
        R1[Cognito Pool - InboundAuth]
        R2[Authenticates: Callers]
        R4[Validates: Inbound JWT tokens]
    end
    subgraph "OutboundAuth Cognito Pool"
        I1[Cognito Pool - OutboundAuth]
        I2[Authenticates: External Services]
        I4[Provides: OAuth tokens for agents]
    end
    User[End User] -->|Authenticate| R1
    R1 -->|JWT| Runtime[Agent Runtime]
    Runtime -->|Execute| Agent[Agent Code]
    Agent -->|Need token| Vault[Token Vault]
    Vault -->|OAuth flow| I1
    I1 -->|Access token| Vault
    style R1 fill:#4dabf7
    style I1 fill:#fab005
    style Runtime fill:#51cf66

This separation provides several benefits:

Security: Compromising user credentials doesn’t expose service credentials, because each pool is isolated from the other
Scalability: Different user pools can scale independently of each other
Flexibility: Can integrate multiple external identity providers if you choose to do so
Audit: Clear separation between user actions and service actions

The Complete OAuth Flow

Let’s walk through a complete authentication cycle, from initial invocation to cached token usage.

sequenceDiagram
    participant User
    participant Browser
    participant CLI as AgentCore CLI
    participant Runtime as Bedrock AgentCore<br/>Runtime
    participant Agent as Agent Code<br/>(agent.py)
    participant TokenVault as Identity Token Vault
    participant RuntimeCognito as Cognito Pool - InboundAuth
    participant IdentityCognito as Cognito Pool - OutboundAuth

    rect rgb(200, 230, 255)
    Note over User,RuntimeCognito: PHASE 1: INBOUND AUTHENTICATION
    User->>CLI: 1. Get authentication token
    CLI->>RuntimeCognito: 2. Username/Password
    RuntimeCognito-->>CLI: 3. JWT Access Token
    Note right of CLI: Token contains:<br/>• client_id<br/>• username<br/>• scopes<br/>• expiry
    CLI->>User: 4. Return JWT token
    end

    rect rgb(255, 230, 200)
    Note over User,Runtime: PHASE 2: INVOKE AGENT
    User->>CLI: 5. Invoke agent with JWT
    CLI->>Runtime: 6. InvokeAgentRuntime API call
    Runtime->>RuntimeCognito: 7. Validate JWT against OIDC discovery
    RuntimeCognito-->>Runtime: 8. ✅ Token Valid
    Runtime->>Agent: 9. Execute agent code
    end

    rect rgb(230, 255, 230)
    Note over Agent,IdentityCognito: PHASE 3A: OUTBOUND AUTH - FIRST CALL
    Agent->>Agent: 10. Function with @requires_access_token
    Note right of Agent: Decorator parameters:<br/>• provider_name<br/>• callback_url<br/>• auth_flow: USER_FEDERATION<br/>• scopes: [openid]
    Agent->>TokenVault: 11. Request token for provider
    TokenVault->>TokenVault: 12. Check cache → Not found
    TokenVault->>IdentityCognito: 13. Initiate OAuth authorization
    IdentityCognito-->>TokenVault: 14. Authorization URL
    TokenVault-->>Agent: 15. Return auth URL
    Agent->>Agent: 16. on_auth_url callback fires
    Agent-->>Runtime: 17. Response: "Authorization Required"
    Runtime-->>CLI: 18. Forward response
    CLI-->>User: 19. Display authorization URL
    end

    rect rgb(255, 240, 230)
    Note over User,IdentityCognito: PHASE 3B: USER AUTHORIZATION
    User->>Browser: 20. Opens authorization URL
    Browser->>TokenVault: 21. GET authorization endpoint
    TokenVault->>IdentityCognito: 22. Redirect to Cognito login
    User->>Browser: 23. Enter credentials
    Browser->>IdentityCognito: 24. Submit authentication
    IdentityCognito->>IdentityCognito: 25. Validate credentials
    IdentityCognito->>IdentityCognito: 26. Generate authorization code
    IdentityCognito->>Browser: 27. Redirect with auth code
    Browser->>TokenVault: 28. Callback with code
    end

    rect rgb(240, 230, 255)
    Note over TokenVault,IdentityCognito: PHASE 3C: TOKEN EXCHANGE
    TokenVault->>IdentityCognito: 29. Exchange code for tokens
    Note right of TokenVault: Grant Type: authorization_code<br/>Client credentials included
    IdentityCognito-->>TokenVault: 30. Return tokens
    Note right of IdentityCognito: Returns:<br/>• access_token<br/>• id_token<br/>• refresh_token
    TokenVault->>TokenVault: 31. Cache tokens by session_id
    TokenVault->>Browser: 32. Redirect to callback_url
    end

    rect rgb(230, 255, 255)
    Note over User,IdentityCognito: PHASE 4: SUBSEQUENT CALLS (Cached)
    User->>CLI: 33. Invoke agent again (same session)
    CLI->>Runtime: 34. InvokeAgentRuntime with JWT
    Runtime->>Runtime: 35. Validate JWT (cached)
    Runtime->>Agent: 36. Execute agent code
    Agent->>TokenVault: 37. Request token
    TokenVault->>TokenVault: 38. Check cache → ✅ Found!
    TokenVault-->>Agent: 39. Return cached access_token
    Note right of Agent: Decorator injects token<br/>directly into function
    Agent->>Agent: 40. Execute function with token
    Agent-->>Runtime: 41. Success response
    Runtime-->>CLI: 42. Forward response
    CLI-->>User: 43. Display result
    end

Understanding the Flow: A Simplified Walkthrough

The sequence diagram above shows the complete technical flow, but let’s break it down into simple, digestible steps. Think of this as a story with four chapters: getting your ticket to use the agent, using the agent, getting permission for external access, and then enjoying fast subsequent access.

Chapter 1: Getting Your Ticket to Talk to the Agent (Steps 1-4)

Before you can ask your agent to do anything, you need to prove who you are. This is the inbound authentication step.

Step 1: You ask for credentials You run a command like agentcore identity get-cognito-inbound-token. Think of this as walking up to a ticket booth and asking for admission.

Step 2: System checks your identity Your username and password are sent to the Runtime Cognito Pool. This is like showing your ID to the ticket seller.

Step 3: System gives you a JWT token If your credentials are valid, you receive a JWT (JSON Web Token). This token is like a concert ticket or an all-access pass - it proves you’re allowed to invoke the agent. The token contains important information:

Your client ID (which application you’re using)
Your username (who you are)
Scopes (what you’re allowed to do)
Expiry time (typically 1 hour)

Step 4: You hold onto your ticket You’ll present this JWT token on every invocation of the agent.
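
If you want to see what is inside the ticket, the payload of a JWT is just base64url-encoded JSON. The sketch below decodes it with the standard library for inspection only; it does not verify the signature, so never use it to make an authorization decision. The demo token and its claim values are made up, with claim names mirroring the list above.

```python
import base64
import json


def _b64url(data: bytes) -> str:
    """Encode bytes as unpadded base64url, the encoding JWT segments use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def peek_jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)   # restore the base64 padding JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))


# Build a throwaway, unsigned token so the example runs offline.
header = _b64url(json.dumps({"alg": "none"}).encode())
body = _b64url(json.dumps({
    "client_id": "example-client",
    "username": "alice",
    "scope": "openid",
    "exp": 1743120000,
}).encode())
demo_token = f"{header}.{body}.signature-placeholder"

claims = peek_jwt_claims(demo_token)
print(claims["username"], claims["exp"])
```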

Chapter 2: Talking to the Agent (Steps 5-9)

Now that you have your JWT ticket, you can actually invoke the agent. This is still part of inbound authentication - proving you have the right to use the agent.

Step 5: You show your ticket and make a request You run: agentcore invoke '{"prompt": "Check my external account"}' --bearer-token You’re essentially saying: “Here’s my ticket, please do this task for me.”

Step 6: Ticket gets validated The Bedrock AgentCore Runtime receives your JWT and needs to verify it’s legitimate. Just like a bouncer at a concert scanning your ticket.

Step 7: Runtime calls the ticket office The Runtime asks the Runtime Cognito Pool: “Is this JWT token real? Is it still valid? Has it expired?” In practice this means checking the token against the OIDC discovery endpoint configured for your agent, which points to the public signing keys used to verify the token’s signature.

Step 8: Cognito confirms ✅ “Yes, this token is valid. This user is authorized to invoke the agent.” The signature is valid, the token hasn’t expired, and it was issued by the correct authority.

Step 9: Agent begins execution With authentication confirmed, the agent code starts executing with your request. Your prompt is passed to the agent, and it begins processing.

Chapter 3A: Agent Needs External Access - First Time (Steps 10-19)

Now we switch to outbound authentication. Your agent needs to access an external service on your behalf, but it doesn’t have permission yet.

Step 10: Agent encounters a protected function Your agent code calls a function decorated with @requires_access_token. This decorator is the key to the OAuth flow.

Step 11: Agent asks the Token Vault The decorator automatically asks: “Do I have an access token for this external service provider for this session?” The Token Vault is a secure storage system that caches OAuth tokens by session ID.

Step 12: Vault checks its cache The Token Vault looks up: Session ID → Provider → Token Result: ❌ “No token found. This is the first time this session is accessing this provider.”

Step 13: Vault initiates OAuth flow Since there’s no cached token, the Token Vault starts the OAuth 2.0 authorization code flow with the Identity Cognito Pool.

Step 14: External service creates authorization URL The Identity Provider (Identity Cognito) generates a special authorization URL. This URL contains:

Encrypted state (including your session ID)
Requested scopes (what permissions you’re asking for)
Callback URL (where to redirect after authorization)
Client ID (which application is requesting access)
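
To show how a session ID can ride along in the state parameter and survive the browser round trip, here is a minimal sketch. Note the swap: it signs the state with an HMAC so tampering is detectable, rather than encrypting it as the flow above describes, and the function names and demo key are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def encode_state(session_id: str, key: bytes) -> str:
    """Pack the session ID into a tamper-evident state value (signed, not encrypted)."""
    payload = base64.urlsafe_b64encode(json.dumps({"session_id": session_id}).encode()).decode()
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{tag}"


def decode_state(state: str, key: bytes) -> str:
    """Return the session ID, or raise ValueError if the state was altered."""
    payload, _, tag = state.rpartition(".")
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("state failed integrity check")
    return json.loads(base64.urlsafe_b64decode(payload))["session_id"]


key = b"demo-only-key"   # hypothetical; a real service would use a managed secret
state = encode_state("demo_session_ABC123", key)
print(decode_state(state, key))
```

When the callback arrives, recovering the session ID from state is what lets the vault file the new tokens under the right session.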

Step 15: Vault returns the authorization URL to the agent Instead of a token, the Vault returns: “Authorization required - here’s the URL”

Step 16: Agent’s callback hook fires The on_auth_url callback you specified in the decorator triggers. This gives your code a chance to handle the authorization URL appropriately.

Step 17-19: Agent tells you authorization is needed The agent responds with a message like: “🔐 Authorization Required - Please open this URL in your browser to authorize: [URL]” The Runtime forwards this response, and you see it in your CLI. The ball is now in your court - you need to authorize the access.
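
The control flow in steps 10 through 19 can be imitated locally. The decorator below is a stand-in written for this post, not the AgentCore SDK: requires_access_token_sketch, the module-level cache, and the auth.example.com URL are all assumptions, and the real decorator takes different parameters and talks to the actual Token Vault.

```python
import asyncio
import functools

_token_cache = {}   # (session_id, provider_name) -> access token; stand-in for the Token Vault


def requires_access_token_sketch(*, provider_name, session_id, on_auth_url):
    """Local imitation of the decorator's control flow; NOT the AgentCore SDK."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            token = _token_cache.get((session_id, provider_name))
            if token is None:
                # First call for this session: surface the URL instead of running the function.
                url = f"https://auth.example.com/authorize?provider={provider_name}"  # hypothetical
                on_auth_url(url)
                return "Authorization Required"
            return await func(*args, access_token=token, **kwargs)
        return wrapper
    return decorator


seen_urls = []


@requires_access_token_sketch(provider_name="ExternalServiceProvider",
                              session_id="demo_session_ABC123",
                              on_auth_url=seen_urls.append)
async def get_identity_token(*, access_token: str) -> str:
    return access_token


first = asyncio.run(get_identity_token())           # no cached token yet
_token_cache[("demo_session_ABC123", "ExternalServiceProvider")] = "cached-token"  # simulate user authorization
second = asyncio.run(get_identity_token())          # token injected from the cache
print(first, "|", second)
```

The first call never reaches the function body; it only fires the callback with the URL. Once a token is cached for the session, the same call runs the function with the token injected.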

Chapter 3B: You Grant Permission (Steps 20-28)

This is where you (the user) explicitly grant permission for the agent to access the external service on your behalf. This is the heart of USER_FEDERATION - you’re in control.

Step 20: You open the authorization URL You click the link (or copy-paste it into your browser). This opens the OAuth authorization flow in your web browser.

Step 21: Browser navigates to the authorization endpoint Your browser makes a GET request to the Token Vault’s authorization endpoint.

Step 22: Redirected to the login page The Token Vault redirects you to the Identity Cognito Pool login page. This is where you’ll authenticate to the external service.

Step 23: You enter your credentials You type in your username and password for the external service. In this demo, that’s the Identity Cognito user (like externaluser24a901fd). Important: These are different credentials from your Runtime Cognito credentials! You’re now proving you own the external account.

Step 24: You submit the login form Browser sends your credentials to the Identity Cognito Pool.

Step 25: Identity Cognito validates your credentials The external service checks: “Is this the correct password for this user?” If valid, it proceeds.

Step 26: Identity Cognito generates an authorization code Instead of giving you the actual access token directly, OAuth uses an intermediate step: an authorization code. This is a short-lived, one-time-use code that can be exchanged for tokens. Why a code? Security! The code is sent via the browser (less secure channel), but the actual tokens are exchanged server-to-server (more secure).

Step 27: Browser redirected with the authorization code Identity Cognito redirects your browser back to the Token Vault callback URL, including the authorization code in the URL parameters.

Step 28: Token Vault receives the code The Token Vault’s callback endpoint receives the authorization code. Now it’s ready for the final exchange.

Chapter 3C: Authorization Code Becomes Real Access (Steps 29-32)

The Token Vault now exchanges the temporary authorization code for real, usable access tokens. This happens server-to-server, away from the browser.

Step 29: Vault exchanges code for tokens The Token Vault makes a server-to-server call to Identity Cognito: “Here’s the authorization code. Please give me access tokens. Here’s my client secret to prove I’m authorized.”

This exchange uses the authorization_code grant type and includes:

The authorization code (from step 27)
Client ID (identifies your application)
Client secret (proves your application is legitimate)
Redirect URI (must match the original request)
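
Put together, those four values form a standard OAuth 2.0 token request. The sketch below builds the headers and body without sending anything; it assumes the client authenticates with HTTP Basic (client_secret_basic) and that the credentials contain no special characters. The helper name, client values, and callback URL are placeholders.

```python
import base64
from urllib.parse import urlencode, parse_qs


def build_code_exchange(code, client_id, client_secret, redirect_uri):
    """Return (headers, body) for an authorization_code token request."""
    # client_secret_basic: credentials travel in an HTTP Basic Authorization header.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "redirect_uri": redirect_uri,   # must match the value sent in the authorization request
    })
    return headers, body


headers, body = build_code_exchange(
    code="auth-code-placeholder",
    client_id="example-client-id",
    client_secret="example-secret",
    redirect_uri="https://vault.example.com/callback",   # hypothetical
)
print(parse_qs(body)["grant_type"][0])
```

The client secret appears only in the header of this server-to-server call, which is why it never passes through the browser.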

Step 30: Identity Cognito returns three tokens The Identity Provider responds with a token bundle:

access_token: This is the golden ticket! Your agent uses this to make API calls to the external service. It’s typically valid for 1 hour.
id_token: A JWT containing claims about the user’s identity (who they are, when they logged in, etc.). Useful for displaying user information.
refresh_token: A long-lived token used to obtain new access tokens when they expire. The agent can use this automatically to refresh access without asking you to re-authorize.

Step 31: Vault caches the tokens The Token Vault stores all three tokens in its cache, indexed by:

Session ID (e.g., demo_session_ABC123)
Provider name (e.g., ExternalServiceProvider)

This cache means future requests in the same session won’t need re-authorization!
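
A toy version of that lookup makes the indexing explicit. SessionTokenVault is a hypothetical in-memory stand-in, not the real Token Vault, and the token values are placeholders.

```python
class SessionTokenVault:
    """In-memory stand-in for a vault that caches tokens per (session, provider)."""

    def __init__(self):
        self._tokens = {}

    def store(self, session_id, provider, tokens):
        self._tokens[(session_id, provider)] = dict(tokens)

    def get_access_token(self, session_id, provider):
        entry = self._tokens.get((session_id, provider))
        return entry["access_token"] if entry else None

    def revoke_session(self, session_id):
        """Drop every token cached for one session; other sessions are untouched."""
        for key in [k for k in self._tokens if k[0] == session_id]:
            del self._tokens[key]


vault = SessionTokenVault()
vault.store("demo_session_ABC123", "ExternalServiceProvider",
            {"access_token": "at-1", "id_token": "id-1", "refresh_token": "rt-1"})

print(vault.get_access_token("demo_session_ABC123", "ExternalServiceProvider"))  # cache hit
print(vault.get_access_token("demo_session_XYZ789", "ExternalServiceProvider"))  # different session: miss
```

Because the session ID is part of the key, a second session starts with an empty cache and has to go through its own authorization.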

Step 32: Browser redirected to callback URL Your browser is redirected to the callback_url specified in the decorator (e.g., https://example.com/oauth/callback). In this demo, it’s a dummy URL that does nothing. In production, this would be your application’s URL that handles post-authorization logic (like showing a success message or closing the auth window).

Chapter 4: Subsequent Calls Are Lightning Fast! (Steps 33-43)

This is where you see the real benefit of OAuth token caching. The second time you invoke the agent in the same session, everything is already set up.

Step 33: You invoke the agent again You run the same command: agentcore invoke '{"prompt": "Check my account"}' --bearer-token Crucially, you’re using the same session ID as before.

Step 34: JWT validation (same as before) The Runtime still validates your JWT token - you still need to prove you’re authorized to invoke the agent.

Step 35: Validation is cached/fast The Runtime may have cached the JWT validation results, making this step very quick.

Step 36: Agent code executes Your agent code runs, and again encounters the function with @requires_access_token.

Step 37: Agent asks Token Vault for the token The decorator asks: “Do I have an access token for this provider and session?”

Step 38: Vault checks cache - SUCCESS! ✅ The Token Vault finds the cached token from Phase 3C: “Found it! Session demo_session_ABC123 → Provider ExternalServiceProvider → access_token: eyJraWQi...”

Step 39: Vault returns the cached access token The Token Vault immediately returns the access token. No authorization URL, no user interaction needed!

Step 40: Function executes with the token The decorator automatically injects the token into your function’s access_token parameter. Your function code runs with the token available:

async def get_identity_token(*, access_token: str) -> str:
    # access_token is already here! No OAuth flow needed!
    return access_token

Step 41: Agent completes successfully Your agent logic executes, possibly making API calls to the external service using the access token. It returns a success response.

Step 42: Runtime forwards the response The Bedrock AgentCore Runtime sends the response back to the CLI.

Step 43: You see the result You see: “✅ Authenticated to external service. Token length: 847 characters. Status: Active and cached for this session”

The entire flow from step 33 to 43 takes just milliseconds because everything is cached!

Why This Two-Step Process?

Now that we’ve walked through all four chapters of the OAuth flow, you can see how the architecture elegantly handles both authentication challenges. The first time through requires user interaction and multiple network calls, taking several seconds to complete. But subsequent invocations in the same session are blazingly fast because everything is cached - the JWT validation is quick, and the OAuth token is retrieved from memory rather than requiring another authorization flow. This pattern strikes a perfect balance between security (explicit user authorization) and user experience (fast, seamless subsequent operations). The question naturally arises: why go through this two-step process at all? Why not use a single token for everything? The answer lies in the fundamental separation of concerns between who can invoke your agent versus what your agent can access on your behalf.

Security Isolation: Compromising your inbound JWT doesn’t expose your external service credentials, and vice versa.
Different Lifetimes: Your JWT and OAuth tokens can expire independently and be refreshed separately.
Principle of Least Privilege: The agent only gets access to external services when you explicitly grant it.
Auditability: Clear separation between “who invoked the agent” (JWT) and “what external services were accessed” (OAuth).
Flexibility: You can revoke external service access without revoking agent access, or the other way around.

Why Cache Tokens?

Another design decision that might seem obvious in retrospect but is critical to understand is the token caching mechanism. Without caching, every single agent invocation would require a complete OAuth authorization flow - you’d need to click an authorization link, log in, and grant permission every time you ask your agent a simple question. This would make the system practically unusable. Token caching solves this by storing the OAuth access tokens (and refresh tokens) in memory, indexed by session ID and provider name. When your agent needs to access an external service, it first checks the cache: if a valid token exists, it’s used immediately; if not, the OAuth flow kicks in. This approach transforms the user experience from “authorize every request” to “authorize once per session,” while maintaining security through session isolation and token expiration.

The caching in Phase 4 is crucial for user experience:

Performance: Token exchange is slow (involves multiple network calls and redirects). Caching makes subsequent calls 10-100x faster.
User Experience: Imagine having to click an authorization link every single time you ask your agent a question! Caching means you authorize once per session.
Rate Limiting: Many OAuth providers have rate limits on token exchanges. Caching reduces the number of authorization flows.
Security: Tokens are cached per session ID, ensuring isolation between different users and contexts.
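The check-the-cache-first behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual token vault: `SessionTokenCache`, `get_or_authorize`, and the `authorize` callback (which stands in for the full OAuth flow and returns a token plus its lifetime in seconds) are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CachedToken:
    access_token: str
    expires_at: float  # epoch seconds


class SessionTokenCache:
    """In-memory cache indexed by (session ID, provider name)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._tokens: Dict[Tuple[str, str], CachedToken] = {}

    def put(self, session_id: str, provider: str, access_token: str, ttl_seconds: float) -> None:
        expires_at = self._clock() + ttl_seconds
        self._tokens[(session_id, provider)] = CachedToken(access_token, expires_at)

    def get(self, session_id: str, provider: str) -> Optional[str]:
        key = (session_id, provider)
        entry = self._tokens.get(key)
        if entry is None:
            return None
        if entry.expires_at <= self._clock():
            del self._tokens[key]  # expired tokens are treated as a cache miss
            return None
        return entry.access_token


def get_or_authorize(
    cache: SessionTokenCache,
    session_id: str,
    provider: str,
    authorize: Callable[[], Tuple[str, float]],
) -> Tuple[str, bool]:
    """Return (token, was_cached). Runs the slow authorize() flow only on a miss."""
    token = cache.get(session_id, provider)
    if token is not None:
        return token, True
    token, ttl_seconds = authorize()
    cache.put(session_id, provider, token, ttl_seconds)
    return token, False
```

On the first call for a session, `authorize` runs and its result is stored; later calls in the same session return the stored token until it expires. A different session ID always misses, which is what forces re-authorization per session.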

Session Isolation: Why It Matters

A subtle but powerful aspect of the token caching architecture is session-based isolation. You might wonder: why not cache tokens globally per user, so that once you authorize, all future agent invocations by that user across any session can use the same token? While this would be more convenient, it would also create significant security risks. By tying tokens to specific session IDs rather than user identities, the system ensures that each invocation context is isolated from others. This means that if a session is compromised, only that session’s tokens are at risk - not all of the user’s access across all sessions. It also enables fine-grained control: you can revoke access for a specific session without affecting other active sessions, and audit logs can precisely track which session performed which action. Session isolation is the foundation that makes the entire caching mechanism both performant and secure.

Session-Based Token Caching

Token caching is tied to session IDs, not user IDs. This provides important security and isolation benefits:

graph TB
    subgraph "Session A: demo_session_ABC123"
        A1[First Invocation] -->|No token| A2[User authorizes]
        A2 --> A3[Token cached for Session A]
        A3 --> A4[Second Invocation]
        A4 -->|Token found| A5[✅ Use cached token]
        A5 --> A6[Subsequent calls fast]
    end
    subgraph "Session B: demo_session_XYZ789"
        B1[First Invocation] -->|Different session<br/>No token| B2[User must authorize again]
        B2 --> B3[Token cached for Session B]
        B3 --> B4[Isolated from Session A]
    end
    subgraph "Token Vault Cache"
        Cache[Session → Token Map]
        Cache -.->|Lookup| A3
        Cache -.->|Lookup| B3
    end
    style A3 fill:#51cf66
    style A5 fill:#51cf66
    style B4 fill:#fab005

Notice that tokens are tied to your session ID (e.g., demo_session_ABC123). This is a critical security feature:

Different session = Different tokens: If you start a new session (new session ID), you’ll need to re-authorize. The previous session’s tokens aren’t accessible.
Multi-user safety: In a shared environment, User A’s tokens never leak to User B because they have different session IDs.
Granular control: You can invalidate a single session’s access without affecting other sessions.
Audit trail: Every action is tied to a specific session, making it easy to trace who did what.
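The four properties above can be demonstrated with a small session-scoped store. This is a sketch only: `SessionScopedVault`, its methods, and the audit-log tuple format are hypothetical, and the session IDs are the demo values used in the diagram.

```python
from typing import Dict, List, Optional, Tuple


class SessionScopedVault:
    """Hypothetical store where tokens are reachable only through their session ID."""

    def __init__(self) -> None:
        self._by_session: Dict[str, Dict[str, str]] = {}  # session_id -> {provider: token}
        self.audit_log: List[Tuple[str, str, str]] = []    # (session_id, action, provider)

    def store(self, session_id: str, provider: str, token: str) -> None:
        self._by_session.setdefault(session_id, {})[provider] = token
        self.audit_log.append((session_id, "store", provider))

    def lookup(self, session_id: str, provider: str) -> Optional[str]:
        # A lookup never crosses session boundaries: an unknown session sees nothing.
        self.audit_log.append((session_id, "lookup", provider))
        return self._by_session.get(session_id, {}).get(provider)

    def revoke_session(self, session_id: str) -> int:
        """Drop every token for one session; return how many were removed."""
        removed = self._by_session.pop(session_id, {})
        self.audit_log.append((session_id, "revoke_session", "*"))
        return len(removed)


vault = SessionScopedVault()
vault.store("demo_session_ABC123", "example_provider", "token-for-session-a")

# Session B has not authorized, so it cannot see Session A's token.
assert vault.lookup("demo_session_XYZ789", "example_provider") is None

# Revoking Session A leaves any other session's tokens untouched.
vault.store("demo_session_XYZ789", "example_provider", "token-for-session-b")
vault.revoke_session("demo_session_ABC123")
assert vault.lookup("demo_session_ABC123", "example_provider") is None
assert vault.lookup("demo_session_XYZ789", "example_provider") == "token-for-session-b"
```

Every entry in `audit_log` carries a session ID, so tracing an action back to the session that performed it is a simple filter over the log.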

Conclusion

Amazon Bedrock AgentCore’s dual authentication pattern addresses one of the harder problems in AI agent development: how to let agents securely access external services on behalf of users while maintaining strong security boundaries, a good user experience, and clear auditability. By separating inbound authentication (who can invoke your agent) from outbound authentication (what external services your agent can access), the architecture balances security and usability.

Threat Modeling an AI Inference Pipeline on AWS

2025-12-22T00:00:00+00:00

Why Threat Modeling Matters for GenAI

Generative AI inference pipelines introduce attack surfaces beyond those of traditional web applications:

Prompt injection
Model abuse and data exfiltration
Over-privileged IAM roles
Supply chain risks in model artifacts

A structured threat model helps identify and mitigate these risks before production deployment.
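Of the risks listed, over-privileged IAM roles are the easiest to check mechanically. The sketch below scans an IAM policy document, represented as a plain Python dict, for `Allow` statements that use wildcard actions or a wildcard resource. The function name, the sample policy, and the model ARN are illustrative assumptions, and the check is a deliberately simple heuristic that ignores conditions and partial wildcards; it is not a substitute for a full policy analysis.

```python
from typing import Any, Dict, List, Tuple


def _as_list(value: Any) -> List[Any]:
    """IAM allows a single string or a list for Action/Resource; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_overly_broad_statements(policy: Dict[str, Any]) -> List[Tuple[int, List[str], bool]]:
    """Return (statement index, wildcard actions, uses_wildcard_resource) for risky Allow statements."""
    findings = []
    for index, statement in enumerate(_as_list(policy.get("Statement"))):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        broad_actions = [a for a in actions if a == "*" or a.endswith(":*")]
        wildcard_resource = "*" in resources
        if broad_actions or wildcard_resource:
            findings.append((index, broad_actions, wildcard_resource))
    return findings


# Hypothetical policy for the Lambda function in an inference pipeline.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Scoped: one action on one (placeholder) model ARN.
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/example-model-id",
        },
        {   # Over-privileged: every DynamoDB action on every resource.
            "Effect": "Allow",
            "Action": ["dynamodb:*"],
            "Resource": "*",
        },
    ],
}

for index, actions, wildcard_resource in find_overly_broad_statements(EXAMPLE_POLICY):
    print(f"Statement {index}: wildcard actions={actions}, wildcard resource={wildcard_resource}")
```

Running a check like this against each role in the pipeline before deployment gives the threat model a concrete starting list of permissions to tighten.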

Reference Architecture

The following architecture represents a common serverless GenAI inference flow on AWS.

graph TD
    User -->|HTTPS| CloudFront
    CloudFront --> WAF
    WAF --> API_Gateway
    API_Gateway --> Lambda
    Lambda -->|Invoke| Bedrock
    Lambda --> DynamoDB

AI for Security and Security for AI

2025-01-01T00:00:00+00:00

A practical, security-first walkthrough for deploying AI.
