
Solving Kubernetes Multi-tenancy Challenges with vCluster

Published on 30.05.2025 by Fabian Brundke

Discover how vCluster resolves Kubernetes multi-tenancy limitations for Internal Developer Platforms by creating isolated virtual clusters within host environments. This technical deep dive explores how platform teams can empower users with full administrative control over their environments while maintaining proper isolation—solving the namespace-level resource limitations that typically challenge multi-tenant architectures. Learn how vCluster enables teams to deploy cluster-scoped resources like CRDs while preserving security and governance through seamless integration with host-level security tools.

Understanding Multi-tenancy 

When we are building Internal Developer Platforms (IDP) for our customers, Kubernetes is often a solid choice as the robust core of the platform. This is due to its technical capabilities and the strong community that is constantly expanding the surrounding ecosystem. One common IDP use case is to support the software development lifecycle (SDLC) of multiple tenants (e.g. multiple applications or software engineering teams). Adopting Kubernetes helps to share resources among these different tenants, and while this helps to - among other benefits - optimize costs and enforce standards, it also introduces the challenge of isolating workloads from each other. This is known as multi-tenancy, and for an IDP we have to ensure that this isolation is fair and secure.

Multi-tenancy with native Kubernetes features

Thankfully, Kubernetes provides a few out-of-the-box features to support isolation in a multi-tenant setup. This isolation applies to both the control plane and the data plane. Let's briefly describe these features.

Control Plane Isolation

  • Namespaces are used to group a tenant's resources into logical units. While this allows you to effectively apply e.g. security policies, it does not provide any isolation per se.
  • Role-based access control (RBAC) plays a crucial role in enforcing authorization. It limits which API resources each individual tenant can access and can be combined with namespaces.
  • Resource Quotas can be used to set limits on the resource consumption of individual namespaces. This is not limited to compute resources like CPU and memory but can also be defined for e.g. storage resources or object counts. A minimal sketch combining these three features is shown after this list.
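
To make these building blocks more concrete, here is a minimal sketch of how they are typically combined for a single tenant (names such as tenant-a, the group name and the concrete limits are placeholders, not taken from a real setup):

apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
---
# Grant the tenant's group edit rights, scoped to its namespace only
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-a-edit
  namespace: tenant-a
subjects:
  - kind: Group
    name: tenant-a-developers        # placeholder group name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                         # built-in role with namespaced permissions only
  apiGroup: rbac.authorization.k8s.io
---
# Cap the tenant's resource consumption in that namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    pods: "20"
    persistentvolumeclaims: "5"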

Data Plane Isolation

  • Network Policies are used to limit ingress and egress traffic, as by default all network communication is allowed. This helps to enforce network isolation between tenants. A minimal sketch is shown after this list.
  • Storage isolation can be achieved with dynamic volume provisioning.
  • Node isolation is supported by provisioning dedicated nodes for individual tenants and only allowing tenant-specific workloads on them, e.g. by using taints and tolerations.
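
As an illustration of the first point, a common baseline is a policy per tenant namespace that only allows ingress traffic from within that namespace. A minimal sketch (the namespace name is a placeholder):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only
  namespace: tenant-a
spec:
  podSelector: {}            # applies to all pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}    # only traffic from pods in the same namespace is allowed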
 

While these features provide a robust foundation for multi-tenancy, they are oftentimes not sufficient in a mature IDP setup. The namespace concept in particular becomes a limiting factor for isolation at the control plane level. Consider the situation where one team (going forward we consider individual teams as individual tenants) needs to deploy a specific Custom Resource Definition (CRD) that is needed to run a tool only their application uses. As CRDs are cluster-scoped, they cannot be confined to a namespace. Hence the team cannot deploy a CRD on their own (as their access is limited to their namespace as per the isolation requirements). This means they will reach out to the platform engineering team for support, which leaves the platform engineers with essentially these options:

  • Deny the request and potentially lose the team as a platform user, which would probably also result in a negative reputation for the platform as a product.
  • Give the team more rights so they can deploy cluster-scoped resources which contradicts the idea of tenant isolation.
  • Deploy the CRD for the team, which also means becoming responsible for managing tenant resources. This contradicts the IDP idea and can quickly become an unmanageable burden (on the other hand, this option can be the right choice if multiple teams need the CRD, in which case it can become a platform offering).
  • Create a dedicated cluster just for this team where they get full access which significantly increases costs, operational burden and configuration sprawl.

None of these options is really compelling in the described use case, neither for the platform users nor for the platform engineers. How can we solve this dilemma? Enter vCluster!

vCluster

vCluster is a tool by LoftLabs that allows us to spin up virtual clusters running on a physical host cluster. These virtual clusters are fully functional Kubernetes clusters and provide their own API server endpoint. As such they are a compelling option for supporting isolation in a multi-tenancy setup. How does that work?

Concept

From the physical host cluster's perspective, vCluster is simply an application running in a namespace. This application consists of two important components.
First there is the virtual control plane, which includes components that you find in every regular cluster as well:
  • Kubernetes API server to handle all API requests within the virtual cluster.
  • Controller manager which is responsible for ensuring a consistent resource state.
  • Data store to store the states of the resources in the virtual cluster.
  • Scheduler, which is optional and can be used instead of the host's default scheduler.
Second there is the Syncer. This component is responsible for syncing resources from the virtual cluster to the host namespace where vCluster is running. This is necessary as vCluster itself does not have a network or nodes where workload can be scheduled. By default only low-level resources like Pods, ConfigMaps, Secrets and Services are synced to the host.
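
Which resources are synced can be configured. The following vcluster.yaml snippet is a sketch of what such a configuration can look like, assuming the v0.20+ configuration format (check the documentation of your vCluster version for the exact fields):

# vcluster.yaml (sketch)
sync:
  toHost:
    pods:
      enabled: true
    configMaps:
      enabled: true
    secrets:
      enabled: true
    services:
      enabled: true
    ingresses:
      enabled: false      # further resource types can be enabled if needed
  fromHost:
    nodes:
      enabled: false      # by default, pseudo nodes are created instead of syncing real host nodes

A configuration like this can be passed to the CLI when creating the virtual cluster, e.g. with vcluster create tenant-a-vcluster --namespace tenant-a --values vcluster.yaml.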
The following diagram, taken from the official documentation, depicts the above-described architecture. Reading the documentation is highly recommended if you are interested in more details.
Figure: vCluster architecture with syncer and virtual control plane running on the host (source: https://www.vcluster.com/docs/vcluster/0.24.0/introduction/architecture)

So what can you do with a virtual cluster and how does it help with multi-tenancy? Teams can request a virtual cluster e.g. via a self-service IDP offering. The IDP spins up the virtual cluster in a host namespace that is dedicated to the team and provides the required connection details (e.g. via kubeconfig) back to the requesting team. The team can then use the virtual cluster to deploy e.g. the required CRDs alongside their applications. They can also use the virtual cluster to create as many namespaces as they need. From the host's perspective, all virtual cluster workload is still restricted to the assigned host namespace. In this way, vCluster lifts the namespace-level multi-tenancy limitations for teams while the IDP still maintains namespace-level isolation on the host. Let's spin up a virtual cluster based on the free vCluster core offering to see it in action.

Hands-on

Prerequisites

  1. Access to a running Kubernetes cluster which will serve as the host (for this demo a local colima cluster with Kubernetes v1.32 is being used).
  2. kubectl to interact with the host and virtual cluster.
  3. vCluster CLI to install a virtual cluster (note: other options like Helm, Terraform, ArgoCD and Cluster API are also supported; this demo uses vCluster 0.24.1).
  4. Optional: as you will need to switch context between the host and the virtual cluster regularly you might want to consider installing kubectx.

Deploy a virtual cluster

With access to a Kubernetes cluster we use the vCluster CLI to spin up a virtual cluster:
  1. Run
    vcluster create tenant-a-vcluster --namespace tenant-a
    to create the virtual cluster. This automatically adds an entry to your kubeconfig and switches your current context to the virtual cluster (see the note after this list on how to switch back and forth). By running
    kubectl get namespace
    you should see the following output which looks comparable to what you find in every physical cluster as well:

    NAME              STATUS   AGE
    default           Active   1m05s
    kube-node-lease   Active   1m05s
    kube-public       Active   1m05s
    kube-system       Active   1m05s

  2. In your host cluster you will see a new namespace called tenant-a that includes two pods:
    • vCluster pod which includes the control plane and syncer containers
    • core-dns to ensure that pods and services can locate each other by hostnames in the virtual cluster. This core-dns pod was synced from the virtual cluster to the host cluster.
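
Since the following steps switch between the host and the virtual cluster repeatedly, the vCluster CLI commands for switching contexts are worth knowing (in addition to kubectx mentioned in the prerequisites):

# Switch your kubeconfig context back to the host cluster
vcluster disconnect

# Connect to the virtual cluster again (this also switches the context)
vcluster connect tenant-a-vcluster --namespace tenant-a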

Deploy workload

As a next step we deploy some sample workload to see what happens in the virtual and the host cluster:
  1. Within the virtual cluster spin up a simple nginx pod in the default namespace by running
    kubectl run nginx --image=nginx
  2. In the host cluster you will see that the syncer placed the nginx pod in the host namespace tenant-a with an adjusted name consisting of the pod name, the virtual namespace and the virtual cluster name: nginx-x-default-x-tenant-a-vcluster
  3. Since the nginx pod is running on the host, it can be reached by other workloads running on the host just like any other regular workload, provided there are no restrictions like network policies in place. To test this you can e.g. get the IP of the nginx pod in the host cluster:
    NGINX_IP=$(kubectl get pod nginx-x-default-x-tenant-a-vcluster -n tenant-a --template '{{.status.podIP}}')
    and then run:
    kubectl run temp-pod --rm -it --restart=Never --image=curlimages/curl -- curl $NGINX_IP
    This should output some html from nginx indicating that the request was successful.

Deploy a CRD

We claimed that deploying CRDs, which are non-namespaced resources, poses challenges in Kubernetes multi-tenancy scenarios and that vCluster helps by allowing tenants to manage CRDs themselves, so let's test this as well.
  1. Deploy the following CRD on the virtual cluster (a sketch of what such a CRD can look like is shown after this list):

    kubectl apply -f https://raw.githubusercontent.com/Liquid-Reply/vcluster-idp-tools/refs/heads/main/crds/crontabs.yaml

  2. Then deploy a corresponding custom resource (CR) on the virtual cluster:

    kubectl apply -f https://raw.githubusercontent.com/Liquid-Reply/vcluster-idp-tools/refs/heads/main/resources/crontab.yaml

    You will see a success message that the object was created: crontab.stable.example.com/my-new-cron-object created
  3. Try to deploy the same CR (not CRD) on the host cluster. You will see an error message indicating that the corresponding CRD is not installed as it is only available in the virtual cluster.
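
For reference, the applied CRD is presumably along the lines of the well-known CronTab example from the Kubernetes documentation. A sketch of such a CRD (not necessarily identical to the file in the repository above) also illustrates why it cannot be confined to a namespace - the definition itself is cluster-scoped even though the custom resources it defines are namespaced:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: crontabs.stable.example.com   # CRDs are cluster-scoped: no namespace field
spec:
  group: stable.example.com
  scope: Namespaced                   # the custom resources themselves live in namespaces
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                cronSpec:
                  type: string
                image:
                  type: string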

Hands-on summary

From this very basic hands-on we can see the simplicity of spinning up virtual clusters which look and behave like regular clusters. The workload that was deployed to the virtual cluster is synced to the host cluster where it behaves like regular cluster workload. As the virtual cluster brings its own API server and datastore we can deploy e.g. CRDs that are only available there and not on the host. This isolation on the control plane level is a very valuable asset in a multi-tenant environment and can solve the challenge we described initially.

Interaction with host applications

By now it is clear that vCluster is a great fit for platform users who want to manage non-namespaced resources, but what about the platform engineering team? Usually this team deploys an application stack that helps them and the users to ensure, among other things, security, cost efficiency and compliance. Do the tools in this stack still work as expected for the virtual clusters although they are deployed on the host cluster? As an example we will take a look at Falco (detects security threats at runtime) and Kyverno (defines policies that act as guardrails for cluster workload) and see how they interact with the virtual cluster workload.

Falco

As Falco is part of the platform stack, we first need to install it on the host cluster (see official documentation):
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm install --replace falco --namespace falco --create-namespace --set tty=true falcosecurity/falco
 
This deploys Falco as a DaemonSet in the falco namespace (the demo uses chart version 4.21.3). If the installation was successful, you should see one Falco pod per node in the host cluster: kubectl get pods -n falco

Falco tracks suspicious activity like spawning a shell in a container or opening a sensitive file. To test the functionality and the interaction with the workload from the virtual cluster, we will reuse the nginx pod that we deployed as part of the demo:

  1. Observe the Falco logs on the host cluster: kubectl logs ds/falco -n falco -f
  2. Extract the content of the /etc/shadow file in the nginx pod (this file is considered sensitive as it contains password information): kubectl exec -it nginx -- cat /etc/shadow
  3. In the Falco logs you should now see a warning for that recorded suspicious activity: Warning Sensitive file opened for reading by non-trusted program ...

Since vCluster synchronizes workloads from the virtual cluster to the host, Falco can detect this kind of activity as it appears to Falco as regular host workload. This is good news for both the platform users and platform engineers as it means no reconfiguration of Falco is necessary when using vCluster.

Kyverno

Just like Falco, Kyverno first needs to be installed on the host cluster (see official documentation):
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno -n kyverno --create-namespace

After a successful installation, Kyverno is running in the kyverno namespace (the demo uses chart version 3.4.1): kubectl get pods -n kyverno

Validate Rules

To test if Kyverno policies are properly applied to the virtual cluster workload and if violations are surfaced to tenants, we will deploy a sample policy on the host cluster. This policy includes a rule that validates that pods are labeled with app.kubernetes.io/name:

kubectl apply -f https://raw.githubusercontent.com/Liquid-Reply/vcluster-idp-tools/refs/heads/main/kyverno-policies/require-labels.yaml
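
The policy is presumably similar to the well-known require-labels example from the Kyverno documentation; a sketch of what it can look like (the file in the repository may differ in details):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
spec:
  validationFailureAction: Audit   # report violations but do not block workloads
  background: true
  rules:
    - name: check-for-labels
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "The label app.kubernetes.io/name is required."
        pattern:
          metadata:
            labels:
              app.kubernetes.io/name: "?*"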

You can check if the policy was properly applied to the workload by checking the events for both the host and the virtual cluster:
  1. For the virtual cluster namespace on the host cluster execute: kubectl events -n tenant-a | grep PolicyViolation
    Since none of the workloads in that namespace have the required label we see a lot of validation errors as we would expect from Kyverno:
    policy require-labels/check-for-labels fail: validation error: The label app.kubernetes.io/name is required.
  2. Now for the virtual cluster run: kubectl events | grep PolicyViolation
    For the nginx pod from the demo on the virtual cluster you will see the same validation error message as for the synced nginx pod on the host cluster. This demonstrates that violations of policies that are managed by the platform team on the host cluster are transparently surfaced to users of the virtual cluster. The reason this works is that events from the host are synced to the virtual cluster. It is important to keep in mind, though, that this process is asynchronous, meaning events will first show up on the host and only then on the virtual cluster.

The policy deployed above is set to Audit mode, which means policy violations produce warnings but workloads are not blocked from being deployed. Let's see what happens if we deploy the same policy in Enforce mode:

  1. Deploy the new policy on the host cluster:
    kubectl apply -f https://raw.githubusercontent.com/Liquid-Reply/vcluster-idp-tools/refs/heads/main/kyverno-policies/require-labels-enforce.yaml
  2. Deploy some sample workload on the virtual cluster without the required label:
    kubectl run nginx-enforce --image=nginx
  3. Take a look at the pod:
    kubectl describe pod nginx-enforce
    You will see that the pod is stuck in a pending state on the virtual cluster. This indicates that the manifest has been saved to the vCluster data store, but the pod stays pending because it cannot be synced to the host cluster: the policy there enforces the app.kubernetes.io/name label for workloads.
    If you tried to deploy the same workload directly on the host cluster, Kyverno would block the deployment immediately with an error message, and in contrast to the virtual cluster nothing would be saved to the host cluster's data store.

Mutate rules

Besides rules that validate resources, Kyverno policies support another important rule type: mutate rules, which can be used to modify resources, and we want to test their behavior as well. For that we will use a policy with rules for the following mutations:

  • add the label foo: bar
  • add the annotation foo: bar
  • add resource requests for CPU and memory if not present
You can take a look at the full policy via the URL used in step 1 below.
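As an illustration, the following sketch shows how such mutate rules can typically be expressed in Kyverno (the actual policy in the repository may differ in details; the +() anchor only adds a field if it is not already present, and (name): "*" matches every container):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-labels-annotations-resources
spec:
  rules:
    - name: add-label-and-annotation
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              foo: bar
            annotations:
              foo: bar
    - name: add-default-resource-requests
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              - (name): "*"            # conditional anchor: apply to every container
                resources:
                  requests:
                    +(cpu): "100m"     # only added if no CPU request is set
                    +(memory): "128Mi" # only added if no memory request is set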
  1. Install the policy on the host cluster:
    kubectl apply -f https://raw.githubusercontent.com/Liquid-Reply/vcluster-idp-tools/refs/heads/main/kyverno-policies/add-labels-annotations-resources.yaml
  2. Remove the Enforce policy from above to not block new workload from being deployed:
    kubectl delete -f https://raw.githubusercontent.com/Liquid-Reply/vcluster-idp-tools/refs/heads/main/kyverno-policies/require-labels-enforce.yaml
  3. Since the mutations are only applied to newly created pods we will deploy another nginx pod on the virtual cluster:
    kubectl run nginx-mutate --image=nginx
  4. To see if the mutations were applied correctly you can take a look at the pod:
    kubectl describe pod nginx-mutate
    You will see that both the foo: bar label and annotation have been applied to the pod in the virtual cluster, but the resource requests have not been set. In contrast, the synced host pod not only has the label and annotation set, but its resources have also been adjusted according to the policy - which is expected, as Kyverno is running on the host. The reason is that, just like the events described earlier, labels and annotations are synced from host to virtual cluster, but this does not apply to all object properties; in fact, most of the pod spec is not synced back. This bi-directional sync is described in more detail in the official documentation, which includes an overview of what is actually being synced. You can verify the difference yourself as shown below.
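
To compare both views directly, you can for example inspect the pod on the virtual cluster and its synced counterpart on the host (the synced pod name follows the naming pattern shown earlier and may differ slightly in your setup):

# On the virtual cluster: label and annotation are present, resource requests are not
kubectl get pod nginx-mutate -o yaml

# On the host cluster: label, annotation and the defaulted resource requests are present
kubectl get pod nginx-mutate-x-default-x-tenant-a-vcluster -n tenant-a -o yaml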

Implications

Looking at pods purely from a functionality point of view, it might be fine to use Kyverno solely on the host cluster, as the policies are applied to the synced pods there. These synced pods are the actual workload running on the underlying nodes, so they will adhere to the policies defined by the internal developer platform. The drawback of this approach is that the tenant using the virtual cluster does not have insight into all changes applied to their workload, as these are not synced back from the host. This can lead to a poor developer experience on the platform.
It is even more important to keep in mind that in this setup the Kyverno policies can only be applied to objects that are actually synced from vCluster to the host (by default only pods, secrets, configmaps and services are synced). For example, a Deployment resource in vCluster will not be synced to the host. The consequence is that host Kyverno policies targeting Deployments (e.g. to ensure that a minimum number of replicas is defined) will not take effect.

Good news is that there are some ways to improve this setup:

  • Additionally install Kyverno and the policies in the virtual cluster: This makes all mutations and validations transparent to the team using the virtual cluster. The team can now also deploy their own policies and use all available Kyverno features. The drawback is that the additional Kyverno installation requires resources and maintenance effort (a sketch of this setup follows this list).
  • Use the Kyverno integration, which is a vCluster enterprise feature: This helps to enforce policies inside the virtual clusters with only a single host Kyverno installation. One noteworthy limitation, though, is that Kyverno's resource library feature can only look up resources on the host cluster.
  • Extend vCluster with plugins that perform custom logic: It's important to keep in mind that this custom logic has to be developed and maintained, which might be a significant effort. Additionally, the wider adoption of this feature seems to be limited, so continued support in the future might not be guaranteed.
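
For the first option, the installation inside the virtual cluster looks just like the one on the host, only executed against the virtual cluster context. A sketch, assuming your current kubectl context points at the virtual cluster (e.g. after vcluster connect tenant-a-vcluster --namespace tenant-a):

# Install Kyverno inside the virtual cluster
helm repo add kyverno https://kyverno.github.io/kyverno/
helm install kyverno kyverno/kyverno -n kyverno --create-namespace

# Apply the policies inside the virtual cluster as well, e.g.:
kubectl apply -f https://raw.githubusercontent.com/Liquid-Reply/vcluster-idp-tools/refs/heads/main/kyverno-policies/require-labels.yaml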

Summary

We have explored how vCluster addresses the limitations of native Kubernetes multi-tenancy for Internal Developer Platforms (IDPs) on the control plane level. While Kubernetes offers baseline isolation through namespaces, RBAC, and other features, these become insufficient when teams need to deploy cluster-scoped resources like CRDs.

vCluster solves this by creating virtual Kubernetes clusters within a host cluster. This allows teams to have full administrative control within their isolated environment while maintaining proper constraints from the platform's perspective.

We have demonstrated how workloads sync between virtual and host clusters, and examined the interaction and compatibility with Falco and Kyverno. While some synchronization challenges exist, especially with Kyverno, we have learned that the actual workload runs on the host cluster and as such is properly managed by host applications, which is an important aspect for platform engineers and users alike.

Outlook

While we have focused on improving control plane isolation above, we have not discussed how data plane isolation can be enhanced. Not long ago, LoftLabs announced vNode to address exactly this challenge. It will be interesting to see how exactly this tool fits into a mature multi-tenancy solution and how vCluster and vNode will work together. As a next step we at Liquid Reply will test this combination to assess the possibilities.
