Kubernetes Operator
Warning
This is not authoritative documentation. These features are not currently available in Zuul. They may change significantly before final implementation, or may never be fully completed.
While Zuul can be happily deployed in a Kubernetes environment, it is a complex enough system that a Kubernetes Operator could provide value to deployers. A Zuul Operator would allow a deployer to create, manage and operate “A Zuul” in their Kubernetes and leave the details of how that works to the Operator.
To that end, the Zuul Project should create and maintain a Kubernetes Operator for running Zuul. Given the close ties between Zuul and Ansible, we should use Ansible Operator to implement the Operator. Our existing community is already running Zuul in both Kubernetes and OpenShift, so we should ensure our Operator works in both. When we’re happy with it, we should publish it to OperatorHub.
That’s the easy part. The remainder of the document is for hammering out some of the finer details.
Custom Resource Definitions
One of the key parts of making an Operator is defining one or more Custom Resource Definitions (CRDs). These allow a user to say “hey k8s, please give me a Thing”. It is then the Operator’s job to take the appropriate actions to make sure the Thing exists.
For Zuul, there should definitely be a Zuul CRD. It should be namespaced with zuul-ci.org. There should be a section for each service for managing service config as well as capacity:
apiVersion: zuul-ci.org/v1alpha1
kind: Zuul
spec:
  merger:
    count: 5
  executor:
    count: 5
  web:
    count: 1
  fingergw:
    count: 1
  scheduler:
    count: 1
Note
Until the distributed scheduler exists in the underlying Zuul implementation, the count parameter for the scheduler service cannot be set to anything greater than 1.
Zuul requires Nodepool to operate. While there are friendly people using Nodepool without Zuul, from the context of the Operator, the Nodepool services should just be considered part of Zuul.
apiVersion: zuul-ci.org/v1alpha1
kind: Zuul
spec:
  merger:
    count: 5
  executor:
    count: 5
  web:
    count: 1
  fingergw:
    count: 1
  scheduler:
    count: 1
  # Because of nodepool config sharding, count is not valid for launcher.
  launcher:
  builder:
    count: 2
Images
The Operator should, by default, use the docker.io/zuul images that are published. To support locally built or overridden images, the Operator should have optional config settings for each image.
apiVersion: zuul-ci.org/v1alpha1
kind: Zuul
spec:
  merger:
    count: 5
    image: docker.io/example/zuul-merger
  executor:
    count: 5
  web:
    count: 1
  fingergw:
    count: 1
  scheduler:
    count: 1
  launcher:
  builder:
    count: 2
External Dependencies
Zuul needs some services, such as an RDBMS and a Zookeeper, that are themselves resources that should or could be managed by an Operator. It is out of scope (and inappropriate) for Zuul to provide these itself. Instead, the Zuul Operator should use CRDs provided by other Operators.
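For example, rather than running ZooKeeper pods directly, the Zuul Operator could create a resource owned by a ZooKeeper Operator and let that Operator do the work. The sketch below assumes the Pravega ZooKeeper Operator’s ZookeeperCluster CRD; it is illustrative only, and any Operator exposing an equivalent resource would be used the same way.

apiVersion: zookeeper.pravega.io/v1beta1
kind: ZookeeperCluster
metadata:
  name: zuul-zookeeper
spec:
  # Three members for quorum; the size is purely illustrative.
  replicas: 3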
On Kubernetes installs that support the Operator Lifecycle Manager, external dependencies can be declared in the Zuul Operator’s OLM metadata. However, not all Kubernetes installs can handle this, so it should also be possible for a deployer to manually install a list of documented operators and CRD definitions before installing the Zuul Operator.
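As a sketch of the OLM route, the Zuul Operator’s ClusterServiceVersion could declare the CRDs it depends on as required, so that OLM will not install it until an Operator providing them is present. The entry below is illustrative, reusing the ZooKeeper CRD from the previous example.

apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: zuul-operator.v0.0.1
spec:
  customresourcedefinitions:
    required:
      # Illustrative dependency on a CRD provided by another Operator.
      - name: zookeeperclusters.zookeeper.pravega.io
        version: v1beta1
        kind: ZookeeperCluster
        displayName: ZooKeeper Cluster
        description: A ZooKeeper cluster managed by an external Operator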
For each external service dependency where the Zuul Operator would rely on another Operator to create and manage the given service, there should be a config override setting that allows a deployer to say “I already have one of these that’s located at Location, please don’t create one.” The config setting should hold the location and connection information for the externally managed version of the service; omitting it should be taken to mean the Zuul Operator should create and manage the resource itself.
---
apiVersion: v1
kind: Secret
metadata:
  name: externalDatabase
type: Opaque
stringData:
  dburi: mysql+pymysql://zuul:password@db.example.com/zuul
---
apiVersion: zuul-ci.org/v1alpha1
kind: Zuul
spec:
  # If the database section is omitted, the Zuul Operator will create
  # and manage the database.
  database:
    secretName: externalDatabase
    key: dburi
While Zuul supports multiple backends for RDBMS, the Zuul Operator should not attempt to support managing both. If the user chooses to let the Zuul Operator create and manage the RDBMS, the Percona XtraDB Cluster Operator should be used. Deployers who wish to use a different one should use the config override setting pointing to the DB location.
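Purely as an illustration of that choice, an Operator-managed database could be expressed as a resource handled by the Percona XtraDB Cluster Operator, roughly as below; the apiVersion and field names are assumptions based on the PXC Operator and may differ between its releases.

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: zuul-db
spec:
  # Field names follow the PXC Operator's CRD; sizes and images are examples.
  secretsName: zuul-db-secrets
  pxc:
    size: 3
    image: percona/percona-xtradb-cluster:8.0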
Zuul Config
Zuul config files that do not contain information that the Operator needs to do its job, or that do not contain information into which the Operator might need to add data, should be handled by ConfigMap resources and not as parts of the CRD. The CRD should take references to the ConfigMap objects.
Completely external files like clouds.yaml and kube/config should be in Secrets referenced in the config. Zuul files like nodepool.yaml and main.yaml that contain no information the Operator needs should be in ConfigMaps and referenced.
apiVersion: zuul-ci.org/v1alpha1
kind: Zuul
spec:
  merger:
    count: 5
  executor:
    count: 5
  web:
    count: 1
  fingergw:
    count: 1
  scheduler:
    count: 1
    config: zuulYamlConfig
  launcher:
    config: nodepoolYamlConfig
  builder:
    config: nodepoolYamlConfig
  externalConfig:
    openstack:
      secretName: cloudsYaml
    kubernetes:
      secretName: kubeConfig
    amazon:
      secretName: botoConfig
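The ConfigMaps referenced above would simply carry the Zuul and Nodepool config files keyed by filename. A minimal sketch of what the zuulYamlConfig ConfigMap could contain follows; the tenant and project names are made up for illustration.

apiVersion: v1
kind: ConfigMap
metadata:
  name: zuulYamlConfig
data:
  # Standard Zuul tenant config, opaque to the Operator.
  main.yaml: |
    - tenant:
        name: example-tenant
        source:
          gerrit:
            config-projects:
              - example/zuul-config
            untrusted-projects:
              - example/project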
Zuul files like /etc/nodepool/secure.conf and /etc/zuul/zuul.conf should be managed by the Operator and their options should be represented in the CRD.
The Operator will shard the Nodepool config by provider-region using a utility pod and create a new ConfigMap for each provider-region with only the subset of config needed for that provider-region. It will then create a pod for each provider-region.
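To sketch the sharding (provider and region names are made up), a deployer-supplied nodepool.yaml with two provider-regions would produce one ConfigMap, and one launcher pod, per provider-region, each holding only its own provider entry:

# Deployer-supplied nodepool.yaml (excerpt) listing two provider-regions.
providers:
  - name: cloud-a-region-one
    driver: openstack
    cloud: cloud-a
    region-name: region-one
  - name: cloud-a-region-two
    driver: openstack
    cloud: cloud-a
    region-name: region-two

# Operator-generated shard for the first provider-region only.
providers:
  - name: cloud-a-region-one
    driver: openstack
    cloud: cloud-a
    region-name: region-one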
Because the Operator needs to make decisions based on what’s going on with the zuul.conf, or needs to directly manage some of it on behalf of the deployer (such as RDBMS and Zookeeper connection info), the zuul.conf file should be managed by and expressed in the CRD.
Connections should each have a stanza that is mostly a passthrough representation of what would go in the corresponding section of zuul.conf.
Due to the nature of secrets in Kubernetes, fields that would normally contain either a secret string or a path to a file containing secret information should instead take the name of a Kubernetes secret, and the key name of the data within it, that the deployer will have previously defined. The Operator will use this information to mount the appropriate secrets into a utility container and construct appropriate config files for each service. It will then reupload those config files into Kubernetes as additional secrets, and mount the config secrets, along with the secrets containing needed file content, only in the pods that need them.
---
apiVersion: v1
kind: Secret
metadata:
  name: gerritSecrets
type: Opaque
data:
  sshkey: YWRtaW4=
  http_password: c2VjcmV0Cg==
---
apiVersion: v1
kind: Secret
metadata:
  name: githubSecrets
type: Opaque
data:
  app_key: aRnwpen=
  webhook_token: an5PnoMrlw==
---
apiVersion: v1
kind: Secret
metadata:
  name: pagureSecrets
type: Opaque
data:
  api_token: Tmf9fic=
---
apiVersion: v1
kind: Secret
metadata:
  name: smtpSecrets
type: Opaque
data:
  password: orRn3V0Gwm==
---
apiVersion: v1
kind: Secret
metadata:
  name: mqttSecrets
type: Opaque
data:
  password: YWQ4QTlPO2FpCg==
  ca_certs: PVdweTgzT3l5Cg==
  certfile: M21hWF95eTRXCg==
  keyfile: JnhlMElpNFVsCg==
---
apiVersion: zuul-ci.org/v1alpha1
kind: Zuul
spec:
  merger:
    count: 5
    git_user_email: zuul@example.org
    git_user_name: Example Zuul
  executor:
    count: 5
    manage_ansible: false
  web:
    count: 1
    status_url: https://zuul.example.org
  fingergw:
    count: 1
  scheduler:
    count: 1
  connections:
    gerrit:
      driver: gerrit
      server: gerrit.example.com
      sshkey:
        # If the key name in the secret matches the connection key name,
        # it can be omitted.
        secretName: gerritSecrets
      password:
        secretName: gerritSecrets
        # If they do not match, the key must be specified.
        key: http_password
      user: zuul
      baseurl: http://gerrit.example.com:8080
      auth_type: basic
    github:
      driver: github
      app_key:
        secretName: githubSecrets
        key: app_key
      webhook_token:
        secretName: githubSecrets
        key: webhook_token
      rate_limit_logging: false
      app_id: 1234
    pagure:
      driver: pagure
      api_token:
        secretName: pagureSecrets
        key: api_token
    smtp:
      driver: smtp
      server: smtp.example.com
      port: 25
      default_from: zuul@example.com
      default_to: zuul.reports@example.com
      user: zuul
      password:
        secretName: smtpSecrets
    mqtt:
      driver: mqtt
      server: mqtt.example.com
      user: zuul
      password:
        secretName: mqttSecrets
      ca_certs:
        secretName: mqttSecrets
      certfile:
        secretName: mqttSecrets
      keyfile:
        secretName: mqttSecrets
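As a sketch of that passthrough, the Operator might render the gerrit connection above into /etc/zuul/zuul.conf roughly as follows; the mounted secret path and the inlined password are illustrative, not prescribed by this spec.

[connection gerrit]
driver=gerrit
server=gerrit.example.com
user=zuul
baseurl=http://gerrit.example.com:8080
auth_type=basic
# sshkey becomes a path to the secret file mounted by the Operator (path illustrative).
sshkey=/etc/zuul/secrets/gerrit/sshkey
# password is filled in from the http_password key of the gerritSecrets secret.
password=<decoded value of gerritSecrets http_password>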
Executor job volume
To manage the executor job volumes, the CR also accepts a list of volumes to be bind mounted in the job bubblewrap contexts:
- name: Text
  context: <trusted | untrusted>
  access: <ro | rw>
  path: /path
  volume: Kubernetes.Volume
For example, to expose a GCP authdaemon token, the Zuul CR can be defined as:
apiVersion: zuul-ci.org/v1alpha1
kind: Zuul
spec:
  ...
  jobVolumes:
    - context: trusted
      access: ro
      path: /authdaemon/token
      volume:
        name: gcp-auth
        hostPath:
          path: /var/authdaemon/executor
          type: DirectoryOrCreate
This would result in a new executor mount path along with this zuul.conf change:
trusted_ro_paths=/authdaemon/token
Logging
By default, the Zuul Operator should perform no logging config, which should result in Zuul using its default of logging at INFO. There should be a simple config option to switch that to enable DEBUG logging. There should also be an option to allow specifying a named ConfigMap with a logging config. If a logging config ConfigMap is given, it should override the DEBUG flag.
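One possible shape for those options is sketched below; the field names (debug and loggingConfigMap) are hypothetical, since this spec only requires that such options exist, not what they are called.

apiVersion: zuul-ci.org/v1alpha1
kind: Zuul
spec:
  scheduler:
    count: 1
    # Hypothetical flag: switch this service from INFO to DEBUG logging.
    debug: true
  web:
    count: 1
    # Hypothetical reference to a ConfigMap holding a full logging config;
    # if present, it overrides the debug flag.
    loggingConfigMap: webLoggingConfig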