Tackle MCP-Servers on Kubernetes with kmcp and AgentGateway
Published on 19.12.2025 by Lucas Brüning
Understand the model context protocol: its usage, its functionality, how to secure it, and why it is about to become one of the fundamental building blocks of AI-agent architectures. Explore different ways and examples of how to set up MCP-Servers on Kubernetes with AgentGateway, kgateway, and kmcp, and learn how to make MCP-Servers enterprise-ready with features such as authentication, authorization, and scalability.
What is the “model context protocol” (MCP)?
LLMs are gaining widespread adoption in our day-to-day lives. They are good at generating and predicting text and even visuals, but they lack the ability to connect to external systems in order to alter them or retrieve information from them. The model context protocol was created to solve this: a standard across all LLM providers for connecting their LLMs to external systems.
The model context protocol, or MCP, is a standard created by Anthropic in late 2024. By mid-2025, it had gained increasing adoption from OpenAI and Google. Its goal is to provide a standardized way for the LLM in an LLM-powered application to access external systems. This enables your LLM to invoke functions and procedures (referred to as “tool calls” in the MCP specification) or to retrieve information from different data sources (called “resources” in the MCP specification). Although Anthropic, OpenAI, and other providers already offer ways to enable tool calling and resource retrieval, MCP provides a standardized approach that works across different providers and frameworks in different applications.
The MCP works in a client-server architecture. It usually contains the following actors:
- Host: our AI application, for example a text editor, a code editor, or any other application that uses LLM technologies.
- LLM: the LLM the application uses. This can be any LLM, such as OpenAI’s ChatGPT or Anthropic’s Claude, or even an on-premises LLM.
- MCP-Client: essentially the “pipe” through which the host, the LLM, and the MCP-Server communicate. It forwards requests to the LLM, and the LLM invokes tool calls through the MCP-Client.
- MCP-Server: contains all tool and resource definitions as well as the actual implementation code of the tools. The logic for fetching resources also lives here.
In the following chart, you can find a standard MCP flow where the LLM uses a specific tool of an MCP-Server to fulfill the task. In this case, we assume that the host and the client are part of the same application. But it is also possible that the client and the host are fully decoupled.

As you can see here, the LLM doesn’t execute the tool directly, because LLMs are only capable of generating text and visuals. Instead, the LLM responds to the client with text stating that it would like to call a tool with specific parameters; after the user’s approval, the client calls the MCP-Server to execute the actual tool code. Communication between all actors in the MCP specification happens via JSON-RPC, and MCP-Servers can communicate either over standard input and standard output (stdio) or via Streamable HTTP.
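To make the flow above concrete, the JSON-RPC exchange for a tool call can be sketched as follows. The tool name, arguments, and result are made up for illustration; see the MCP specification for the full message shapes:

```python
import json

# Illustrative JSON-RPC 2.0 "tools/call" request an MCP-Client would send
# to an MCP-Server (tool name and arguments are hypothetical).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather",
        "arguments": {"city": "Berlin"},
    },
}

# A matching response: MCP tool results arrive as a list of content blocks.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "Sunny, 21 °C"}],
        "isError": False,
    },
}

# On the wire, both sides exchange these objects as serialized JSON.
wire = json.dumps(request)
print(json.loads(wire)["method"])  # prints "tools/call"
```

The `id` field ties the response back to the request, which matters because Streamable HTTP allows multiple in-flight messages on one connection.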
AgentGateway
In essence, MCP-Servers are software that can run in a container, which makes them a great fit for Kubernetes and containerized workflows in general. But this does not apply to every MCP-Server. In Kubernetes, a pod needs to expose an API or provide some kind of network reachability before it can be exposed via a service, ingress, or gateway. Some MCP-Servers can only communicate via standard input and standard output, which cannot provide network reachability directly. In addition, because MCP is a new and emerging technology, not every MCP-Server implements essential enterprise features such as authentication, authorization, and observability. Using a large number of different MCP-Servers also poses the challenge of maintaining many servers with different endpoints in your host application.
These problems can be tackled with an “MCP-Gateway” such as AgentGateway, an open-source gateway designed for AI agents and MCP-Servers. You can use it to extend your MCP-Servers with observability and security mechanisms, for example to observe tool and resource calls or to guard your MCP-Servers with OAuth2. It can also provide a network transport layer for stdio MCP-Servers, enabling them to communicate via Streamable HTTP and thus run properly on Kubernetes. Additionally, it supports serving LLMs and the Agent-to-Agent (A2A) protocol, but those features won’t be covered in this blog.
To set up AgentGateway on Kubernetes, you need to use kgateway, an implementation of the Kubernetes Gateway API. This can then be used to expose and serve your MCP-Servers on Kubernetes.
First, you need some kind of pod or deployment and a service for the MCP-Server, so it is reachable via the network. Then you need to create a so-called “Backend” of type MCP with kgateway. The examples use kgateway version 2.2.0 and AgentGateway version 0.10.5:

```yaml
apiVersion: gateway.kgateway.dev/v1alpha1
kind: Backend
metadata:
  name: mcp-backend
spec:
  type: MCP
  mcp:
    name: mcp-server
    targets:
    - static:
        name: the-mcp-server
        host: mcp-service.default.svc.cluster.local
        port: 12001
        protocol: StreamableHTTP
```

This example uses a so-called “static” MCP backend, where you need to provide the exact service hostname. Kgateway also supports “dynamic” MCP backends, where the backend resource is attached to the service via label selectors.
Next, you need to create an HTTPRoute, a native resource of the Kubernetes Gateway API, which refers to the backend we just created:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: mcp
spec:
  parentRefs:
  - name: agentgateway
  rules:
  - backendRefs:
    - name: mcp-backend
      group: gateway.kgateway.dev
      kind: Backend
```

Finally, you need to create the Gateway resource. In a cloud environment, a cloud load balancer will be created, which then exposes your gateway.
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: agentgateway
spec:
  gatewayClassName: agentgateway
  listeners:
  - protocol: HTTP
    port: 8080
    name: http
    allowedRoutes:
      namespaces:
        from: All
```

Now your MCP-Server should be reachable on port 8080. You can use the /mcp endpoint to connect your agents via Streamable HTTP.
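As a quick reachability check (not a full MCP handshake), you can send an initialize request to the /mcp endpoint with curl. `GATEWAY_IP` below is a placeholder for your load balancer address, and the Accept header is needed because Streamable HTTP responses may arrive as plain JSON or as a server-sent event stream:

```shell
# GATEWAY_IP is a placeholder for your gateway's external address.
curl -s -X POST "http://${GATEWAY_IP}:8080/mcp" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl-smoke-test","version":"0.0.1"}}}'
```

A successful response contains the server’s capabilities; a real MCP-Client would follow up with a notifications/initialized message before listing or calling tools.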
One current drawback is that the AgentGateway setup with kgateway does not yet support all features, such as wrapping stdio servers or properly setting up MCP authentication. This will most likely change in future releases of kgateway. It also cannot be deployed via the Ingress API. To use these important features on Kubernetes, we will look at a tool called kmcp.
kmcp
Kmcp is a CLI tool for developing custom MCP-Servers in Go or Python, but it can also be used to deploy and set up MCP-Servers on Kubernetes. You can deploy your developed MCP-Server by building a container image and deploying it directly to Kubernetes with a single command. When deploying MCP-Servers, kmcp uses AgentGateway under the hood, which lets us leverage more of its features. With kmcp’s CRDs, we can easily set up stdio MCP-Servers; kmcp then creates a Kubernetes service and deployment for us. One example is the official GitHub MCP-Server: although GitHub also offers a hosted version, sometimes you would like to host it yourself on your Kubernetes cluster:
```yaml
apiVersion: kagent.dev/v1alpha1
kind: MCPServer
metadata:
  name: github-stdio-mcp
spec:
  deployment:
    image: "ghcr.io/github/github-mcp-server:latest"
    port: 12001
    cmd: "/server/github-mcp-server"
    args: ["stdio"]
    secretRefs:
    - name: github-token
  transportType: "stdio"
```

We are using kmcp version 0.1.9 here.
For this specific example, keep in mind that the GitHub MCP-Server needs a token, which you must provide as a Kubernetes secret; put the name of your secret in the “secretRefs” name field. The created service can then be used in combination with an ingress to expose the MCP-Server. Alternatively, you can again use kgateway together with AgentGateway and combine them with kmcp: as shown in the AgentGateway example, you can use the internal service URL to expose the MCP-Server with kgateway.
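The secret itself can be created with kubectl. The key name below is an assumption: the official GitHub MCP-Server reads its token from the `GITHUB_PERSONAL_ACCESS_TOKEN` environment variable, and we assume kmcp injects the referenced secret’s keys as environment variables into the deployment:

```shell
# <your-token> is a placeholder for a GitHub personal access token.
kubectl create secret generic github-token \
  --from-literal=GITHUB_PERSONAL_ACCESS_TOKEN=<your-token>
```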
MCP Authentication with OAuth2
In an enterprise context, you may not want to expose your MCP-Servers to the public without authentication and authorization. The protocol allows you to guard your MCP-Servers and authenticate against them via OAuth2, which enables your MCP-Servers, and even individual MCP tools and resources, to be controlled with RBAC. AgentGateway makes it quite simple to put OAuth2 in front of an MCP-Server, even if the server does not support it natively. To do that, you need an identity provider; we will use Keycloak in our examples. OAuth2 is currently not natively supported in AgentGateway when used in combination with kgateway. It is possible to provide a raw AgentGateway configuration through kgateway, but this is an undocumented and experimental feature, so we will simplify the process here by using AgentGateway as a standalone solution. Let’s assume we have Keycloak properly set up, with a realm and different users created for our MCP authentication. For an MCP-Server set up with kmcp, for example, our AgentGateway config can look like this:
```yaml
binds:
- port: 8080
  listeners:
  - routes:
    - backends:
      - mcp:
          targets:
          - name: mcp
            mcp:
              host: http://localhost:8081/mcp/
      matches:
      - path:
          exact: /github/mcp
      - path:
          exact: /.well-known/oauth-protected-resource/github/mcp
      - path:
          exact: /.well-known/oauth-authorization-server/github/mcp
      - path:
          exact: /.well-known/oauth-authorization-server/github/mcp/client-registration
      name: mcp-github
      policies:
        cors:
          allowHeaders:
          - "*"
          allowOrigins:
          - "*"
        mcpAuthentication:
          issuer: http://keycloak.192.168.49.2.nip.io/realms/mcp
          jwksUrl: http://keycloak.192.168.49.2.nip.io/realms/mcp/protocol/openid-connect/certs
          audience: mcp_proxy
          provider:
            keycloak: {}
          resourceMetadata:
            resource: http://localhost:8080/github/mcp
            scopesSupported:
            - profile
            - offline_access
            - openid
            - email
            - roles
            bearerMethodsSupported:
            - header
            - body
            resourceDocumentation: http://localhost:8080/github/mcp/docs
        mcpAuthorization:
          rules:
          # Allow anyone to call "get_me"
          - 'mcp.tool.name == "get_me"'
          # Only the admin user can call "delete_repository"
          - 'jwt.sub == "admin" && mcp.tool.name == "delete_repository"'
```

This example wraps AgentGateway’s authentication configuration around our GitHub MCP-Server, which is reachable from within our cluster. In the configuration, you define the Keycloak URLs required for authentication. You can also create authorization rules using the Common Expression Language (CEL), for example to restrict certain tools so they can only be called by specific users or specific groups.
When your MCP-Client attempts to connect to the MCP-Server, it will open a login form where you can log in with your configured user credentials. Note that your MCP-Client must support the OAuth2 flow for MCP.
This example covers one important part of MCP authentication. The second part is authentication against third-party services. The most common way to handle this is with API keys configured during the setup of the MCP-Server. But this approach does not fit use cases that require different user permissions or multitenancy. In an ideal scenario, the MCP-Server would use the user’s credentials to authenticate with third-party services and the user’s permissions to retrieve data and call tools. This is a known issue and an ongoing topic in the protocol. On a high level, it will be addressed via an MCP feature called “elicitation”, which enables the MCP-Server to request additional input or information from the MCP-Client; in our case, this could be used for user-generated API keys or tokens. There is even an ongoing discussion about URL mode elicitation, which would further improve the security of this workflow. The elicitation feature is quite new as well, meaning that most MCP-Clients do not support it yet, but adoption is growing.
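To illustrate, a server-initiated elicitation request could look roughly like the following JSON-RPC message. The message text and schema are made up for this example, and the exact field names should be verified against the MCP specification revision your client supports:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "elicitation/create",
  "params": {
    "message": "Please provide a GitHub API token for your account",
    "requestedSchema": {
      "type": "object",
      "properties": {
        "token": { "type": "string", "description": "A user-generated API token" }
      },
      "required": ["token"]
    }
  }
}
```

The client then prompts the user for the requested value and returns it (or a decline) to the server, so per-user credentials never need to be baked into the server’s configuration.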
Other MCP gateways are trying to address this issue as well. LiteLLM, a tool discussed in another blog article, has been updated to support MCP-Servers and enhance them with different authentication mechanisms.
Summary
The Model Context Protocol (MCP) is rapidly becoming a foundational technology for integrating large language models with external systems in a standardized and provider-agnostic way. While MCP-Servers are naturally well-suited for containerized environments, hosting them in Kubernetes introduces considerations for networking, security, and operational complexity, especially when using stdio-based servers or deployments that lack enterprise features such as authentication, authorization, and observability. Solutions like AgentGateway, kgateway, LiteLLM, and kmcp bridge these gaps by adding robust transport layers, OAuth2-based protection, RBAC capabilities, and simplified deployment workflows. Together, they enable running both native MCP-Servers and stdio-only implementations in production-grade Kubernetes environments.
As the MCP ecosystem continues to evolve, upcoming features such as URL mode elicitation will further strengthen secure, multi-tenant, and user-aware integrations. As tools, MCP-Clients, gateways, and the protocol mature, they become a core building block for enterprise-grade AI agent architectures. Hosting MCP-Servers on Kubernetes, supported by these emerging tools and standards, provides a scalable, secure, and future-proof foundation for building LLM-powered applications.
Further readings
- https://liquidreply.net/news/manage-a-unified-llm-api-platform-with-litellm
- https://github.com/modelcontextprotocol/modelcontextprotocol/pull/887
- https://kgateway.dev/docs/agentgateway/latest/
- https://www.keycloak.org/documentation
- https://www.anthropic.com/news/model-context-protocol
- https://kagent.dev/docs/kmcp