
Building a K8S Operator to manipulate your Managed Cluster Nodes

Published on 11.01.2023 by Max Schmidt

For our WebAssembly Special Interest Group at Liquid Reply we wanted to build a proof of concept that allows us to run WebAssembly workloads on regular Kubernetes nodes that are managed by a cloud provider like AWS, Azure or GCP. WebAssembly images require a modified OCI runtime, in our case crun, with support for running WebAssembly, so crun either needs to be compiled on the node or replaced with a version that already has WebAssembly support compiled in.

Introduction
We first came up with a solution that uses a privileged Kubernetes DaemonSet that is able to execute commands on your cluster nodes. But this was not flexible enough, so we built an operator which automatically provisions your Kubernetes nodes based on an annotation. You can find the operator here.

Simply put, the goal of this project is to provide an easy and uncomplicated way of trying out WebAssembly on Kubernetes. If you want to try this in a production use-case, contact us at Liquid Reply! We’d be happy to collaborate.

The Installer

The operator deploys a Job on the nodes which uses the Docker image from https://github.com/KWasm/kwasm-node-installer. This Docker image installs crun with WasmEdge support on the node. It is also in charge of modifying the containerd config file to add a new WasmEdge plugin, which is then used by the RuntimeClass. To learn more about the installer, check out the link above.
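To give an idea of how workloads end up on the modified runtime: once a node is provisioned, a RuntimeClass along the following lines can be applied, and Pods that set spec.runtimeClassName accordingly are started through the WasmEdge-enabled crun. The handler name "crun" is used here purely as an illustrative value and has to match whatever runtime handler the installer configures in containerd.

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: crun    # referenced by Pods via spec.runtimeClassName
handler: crun   # must match the runtime handler added to the containerd config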

DaemonSet vs. Operator

- DaemonSet: the installer runs in an InitContainer, and an always-running pause container is needed. Operator: the Job created by the operator runs one time, then terminates when it succeeds.
- DaemonSet: all nodes are provisioned at the same time, which may break your whole cluster. Operator: annotation-based, so provisioning one node at a time is possible, and provisioning all nodes automatically is also possible.
- DaemonSet: an Affinity/NodeSelector is needed to only provision certain nodes, which means manual scheduling of workloads. Operator: automatic workload scheduling based on a label set on successful operator provisioning.

We recommend installing the operator on your cluster and annotating one node at a time to prevent failures.
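To provision a single node manually, you can set the annotation yourself, for example (the node name is a placeholder):

kubectl annotate node <node-name> kwasm.sh/kwasm-node=true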

Which Operator Type is the right choice?

There are a few different operator frameworks you can choose from, and choosing the right one for your specific use case is key.

There are operator frameworks for building an operator with Python (Kopf), Java (java-operator-sdk), Rust (kube-rs), DotNet (KubeOps), GoLang (Operator-SDK), and even simpler frameworks for “just” deploying applications. With Operator-SDK you are able to create your operator, according to your use case, either in GoLang, with Helm, or even with Ansible.

Because our use case is rather complex, we chose to use Operator-SDK with a GoLang-based operator. The Operator-SDK community is also rather large, and the project is well documented.

How our Operator works

Since we wanted to get rid of a DaemonSet which keeps on running even though the node is already provisioned for WebAssembly support, we decided on using Kubernetes Jobs to provision the nodes.
This means that our operator needs to implement logic that creates a Kubernetes Job when a node in the cluster has the annotation “kwasm.sh/kwasm-node=true”. In order to get this working, the “SetupWithManager” function of Operator-SDK needs to be modified to listen to node events. This can be achieved with a simple “For(&corev1.Node{})”. Then, on every node action (update, delete, create) the Reconcile function will be triggered for the node. This function checks whether the annotation is set, and if it is, a Job is created with a NodeSelector pointing to the node for which the reconcile is being executed. After the Job has been created, a label is also set on this node. If a node already has both the annotation and the label set, a future reconcile will be skipped, since the node is already provisioned. The Job that is deployed to the annotated node is privileged, runs under HostPID and even mounts the root folder of the node. Remember that this is just a proof of concept of what can be achieved.
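As a minimal sketch of that wiring, assuming the reconciler type is called ProvisionerReconciler (the actual operator code may differ in detail), registering the controller-runtime controller to watch Node objects looks roughly like this:

import (
    corev1 "k8s.io/api/core/v1"
    ctrl "sigs.k8s.io/controller-runtime"
)

// SetupWithManager registers the provisioner controller so that it
// reconciles Node objects instead of a custom resource.
func (r *ProvisionerReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&corev1.Node{}).
        Complete(r)
}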

The Provisioner Reconcile Function

As stated above, we decided on using two different reconcilers: one for handling the Job deployment and one reconcile controller to watch the deployed Job objects. The provisioner reconciler is the heart of the operator. It handles the node annotations and listens for any node change. If we annotate the node with our kwasm.sh/kwasm-node=true annotation, the reconcile function will be triggered. The first thing we need is the annotation key and the label key defined as constants.

const (
    addKWasmNodeLabelAnnotation = "kwasm.sh/kwasm-node"        // annotation set on a node to request provisioning
    nodeNameLabel               = "kwasm.sh/kwasm-provisioned" // label set by the operator once the provisioning Job was created
)

Then, in the reconcile, we check whether the annotation and the label match, i.e. whether the node is already in the state the annotation asks for. If they match and auto-provisioning is disabled, there is nothing to do and the node can be skipped.

labelShouldBePresent := node.Annotations[addKWasmNodeLabelAnnotation] == "true"
labelIsPresent := node.Labels[nodeNameLabel] == node.Name

// The desired state already matches the actual state and auto-provisioning
// is disabled: nothing to do for this node.
if labelShouldBePresent == labelIsPresent && !r.AutoProvision {
    return ctrl.Result{}, nil
}

If a node gets the annotation set for the first time, it should get a label set from the constant nodeNameLabel defined previously. In addition, a function is executed that generates the Kubernetes manifest for the Job to be deployed. The function for deploying the Job is out of scope for this post; however, you can find it here.

if labelShouldBePresent || r.AutoProvision && !labelIsPresent {
    // If the label should be set but is not, set it.
    if node.Labels == nil {
        node.Labels = make(map[string]string)
    }
    node.Labels[nodeNameLabel] = node.Name
    log.Info().Msgf("Trying to Deploy on %s", node.Name)

    dep := r.deployJob(node, req)
    err := r.Create(ctx, dep)
    if err != nil {
        log.Err(err).Msg("Failed to create new Job " + req.Namespace + " Job.Name " + req.Name)
    }
}
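To give an idea of what deployJob produces, here is a stripped-down sketch of such a privileged provisioning Job. This is not the operator's actual implementation (see the linked repository for that); the image reference and mount path are illustrative values.

import (
    "os"

    batchv1 "k8s.io/api/batch/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    ctrl "sigs.k8s.io/controller-runtime"
)

// deployJob builds a one-shot Job that runs the node installer image on the
// annotated node, privileged, with HostPID and the node's root filesystem mounted.
func (r *ProvisionerReconciler) deployJob(node *corev1.Node, req ctrl.Request) *batchv1.Job {
    privileged := true
    return &batchv1.Job{
        ObjectMeta: metav1.ObjectMeta{
            Name:      req.Name + "-provision-kwasm",
            Namespace: os.Getenv("CONTROLLER_NAMESPACE"),
        },
        Spec: batchv1.JobSpec{
            Template: corev1.PodTemplateSpec{
                Spec: corev1.PodSpec{
                    // Pin the Pod to the annotated node.
                    NodeSelector:  map[string]string{"kubernetes.io/hostname": node.Name},
                    HostPID:       true,
                    RestartPolicy: corev1.RestartPolicyNever,
                    Containers: []corev1.Container{{
                        Name:  "kwasm-provision",
                        Image: "ghcr.io/kwasm/kwasm-node-installer:latest", // illustrative image reference
                        SecurityContext: &corev1.SecurityContext{
                            Privileged: &privileged,
                        },
                        VolumeMounts: []corev1.VolumeMount{{
                            Name:      "node-root",
                            MountPath: "/mnt/node-root", // illustrative mount path
                        }},
                    }},
                    Volumes: []corev1.Volume{{
                        Name: "node-root",
                        VolumeSource: corev1.VolumeSource{
                            // Mount the node's root filesystem into the installer container.
                            HostPath: &corev1.HostPathVolumeSource{Path: "/"},
                        },
                    }},
                },
            },
        },
    }
}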

Of course, we also need logic to remove the Job if the annotation gets removed from the node. In that case the label is deleted again and the Job is deleted with a background propagation policy, so its Pods are garbage-collected asynchronously.

// If the label should not be set but is, remove it. 
delete(node.Labels, nodeNameLabel) // Delete the nodeNameLabel from the Node 
err := r.Delete(ctx, &batchv1.Job{ 
    ObjectMeta: metav1.ObjectMeta{ 
        Name: req.Name + "-provision-kwasm", 
        Namespace: os.Getenv("CONTROLLER_NAMESPACE"), 
    }, 
}, client.PropagationPolicy(metav1.DeletePropagationBackground)) 

The Job Reconcile Function

Currently, the reconcile controller for the Job(s) is only used for tracking the status of the Job deployed by the provisioner controller. We created a function isJobFinished which returns the status condition of the Job to the reconcile function. The reconcile function then checks the return value of isJobFinished and logs “Ongoing”, “Failing”, or, once the Job has finished successfully, “Completed”.
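The helper follows the usual pattern for inspecting Job conditions; a sketch along these lines (the exact signature in the operator may differ) could look like:

import (
    batchv1 "k8s.io/api/batch/v1"
    corev1 "k8s.io/api/core/v1"
)

// isJobFinished reports whether the Job has reached a terminal condition
// and, if so, which one (Complete or Failed).
func isJobFinished(job *batchv1.Job) (bool, batchv1.JobConditionType) {
    for _, c := range job.Status.Conditions {
        if (c.Type == batchv1.JobComplete || c.Type == batchv1.JobFailed) && c.Status == corev1.ConditionTrue {
            return true, c.Type
        }
    }
    return false, ""
}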

Automatic provisioning

We added functionality so that you are not required to annotate your nodes manually. To use this feature, you can install the Helm chart with the value kwasmOperator.autoProvision set to “true”. But keep in mind that this will immediately install WebAssembly support on all nodes in your cluster!
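An installation with this value enabled could look roughly like the following; the release name and chart reference are illustrative, see the operator repository for the actual chart location:

helm install kwasm-operator kwasm/kwasm-operator --set kwasmOperator.autoProvision=true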