Nice project, but there are several dozen "AI/LLM gateways" now, all doing roughly the same thing. Kong AI gateway [1] was maybe the first to attack LLM traffic governance and is far ahead in both features and adoption. Trying to understand the value add and the differentiator here, since it's a problem that's largely solved already.

[1] https://github.com/Kong/kong
There are a few critical differences. First, archgw is designed as a data plane for agents, handling and processing ingress and egress (prompt) traffic to and from them. Unlike frameworks or libraries, it runs as a single process that includes edge functionality and task-specific LLMs, tightly integrated to reduce latency and complexity (rough config sketch below).
Second, it’s about where the project is headed. Because archgw is built as a proxy server for agents, it’s designed to support emerging low-level protocols like A2A and MCP in a consistent, unified way—so developers can focus purely on high-level agent logic. This borrows from the same design decision that made Envoy successful for microservices: offload infrastructure concerns to a specialized layer, and keep application code clean. In our next big release, you will be able to run archgw as a sidecar proxy for improved orchestration and observability of agents. Something that other projects just won't be able to do.
Kong was designed for APIs. Envoy was built for microservices. Arch is built for agents.
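To make the first point concrete, the whole thing is driven by a single declarative config. A rough sketch; the field names here are illustrative approximations of the arch_config.yaml style, not the exact schema, so treat the repo docs as the source of truth:

    # Illustrative sketch only; field names may not match the current archgw schema.
    version: v0.1

    listener:                        # ingress: where agent/prompt traffic enters
      address: 0.0.0.0
      port: 10000

    llm_providers:                   # egress: upstream models the gateway routes to
      - name: openai
        access_key: $OPENAI_API_KEY
        model: gpt-4o
        default: true

    prompt_guards:                   # safety handled at the edge, not in app code
      input_guards:
        jailbreak:
          on_exception:
            message: "Sorry, I can't help with that."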
MCP implementation is trivial, I agree. But A2A will require a mesh-like structure, meaning it's not just about north/south traffic. It will be about east/west traffic as agents coordinate with each other. That communication and coordination among agents will need to be robust, and that's where a sidecar proxy built on top of Envoy will offer certain properties in a first-class way that Kong can't easily support today.
This was the insight behind Envoy's initial design. Handle north/south and east/west traffic equally well as a universal data plane.
fwiw, if I were evaluating these proxies against each other, I would be intrigued by the solution built by people from the Envoy team. Envoy is great software and I’m sure there are many lessons you took from building it.
It looks like you’re even building on Envoy as the foundation for the system which just makes it more compelling.
It's a core dependency for rate limiting, traffic shaping, and failover detection. Its cluster subsystem is super convenient for local LLM calls too. We'll write up a blog post on the lessons, because there were many. For example, for intelligent routing decisions we can't create an upstream connection to a cluster based on route paths or host; Envoy forces a more static binding. That doesn't work when you're making decisions about a prompt and have to inject more dynamic flow control.
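To illustrate the cluster point: a local model server is just another upstream to Envoy. A minimal sketch, assuming an OpenAI-compatible server on localhost (the cluster name and port are made up for illustration):

    clusters:
      - name: local_llm                    # illustrative name for a local model server
        type: STRICT_DNS
        connect_timeout: 5s
        load_assignment:
          cluster_name: local_llm
          endpoints:
            - lb_endpoints:
                - endpoint:
                    address:
                      socket_address:
                        address: 127.0.0.1
                        port_value: 11434  # e.g. an Ollama-style local endpoint (assumption)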
We’re using proxy-wasm and compiling to wasm32-wasip1, then mounting the .wasm binaries into Envoy as HTTP filters via envoy.filters.http.wasm. The line you're referring to:

    vm_config:
      runtime: "envoy.wasm.runtime.v8"
      code:
        local:
          filename: "/etc/envoy/proxy-wasm-plugins/prompt_gateway.wasm"
…is where the integration happens. There's no need to modify envoy.bootstrap.wasm; instead, Arch loads the WASM modules at runtime using standard Envoy config templating. The filters (prompt_gateway for ingress, llm_gateway for egress) sit in the request path and do things like prompt inspection, model routing, header rewrites, and telemetry collection.
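For anyone wanting the fuller picture, this is roughly how that vm_config slots into a listener's HTTP filter chain (standard Envoy v3 config, trimmed to the relevant parts):

    http_filters:
      - name: envoy.filters.http.wasm
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          config:
            name: prompt_gateway           # llm_gateway is wired up the same way for egress
            vm_config:
              runtime: "envoy.wasm.runtime.v8"
              code:
                local:
                  filename: "/etc/envoy/proxy-wasm-plugins/prompt_gateway.wasm"
      - name: envoy.filters.http.router    # the router must be the last filter in the chain
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router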
What’s missing right now are our guides showing how well ArchGW integrates with existing frameworks and tools. But the core idea is simple: it offloads low-level responsibilities—like routing, safety, and observability—that frameworks like LangChain currently try to handle inside the app. That means less bloat and more clarity in your agent logic.
And importantly, some things just can’t be done well in a framework. For example, enforcing global rate limits across LLMs isn’t realistic when each agent instance holds its own local state. That kind of cross-cutting concern needs to live in infrastructure—not in application code.
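To make that concrete: because the gateway sits on the shared egress path to the providers, a limit enforced there covers every agent instance behind it. A sketch using Envoy's stock local rate limit filter (the numbers are illustrative, and archgw's own config surface for this may look different):

    - name: envoy.filters.http.local_ratelimit
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
        stat_prefix: llm_egress_limiter
        token_bucket:
          max_tokens: 100          # burst budget for LLM calls (illustrative)
          tokens_per_fill: 100
          fill_interval: 60s       # ~100 requests/min across all agents behind this gateway
        filter_enabled:            # enable and enforce for 100% of requests
          default_value: { numerator: 100, denominator: HUNDRED }
        filter_enforced:
          default_value: { numerator: 100, denominator: HUNDRED }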
ArchGW complements LangChain rather than replacing it: LangChain handles agent orchestration/reasoning, while ArchGW provides the infrastructure layer for prompt processing, guardrails, and routing across your entire system.
And since everything is an API, Kong also supports MCP natively (among many other protocols, including all LLMs): https://konghq.com/blog/product-releases/securing-observing-...
And I was pleased with what I was able to do. Thanks!