Cloud Native CI/CD with Tekton - Building Custom Tasks | Martin Heinz

In this article we will pick up where we left off in the previous article, in which we deployed our Tekton Pipelines environment, and explore in detail how to find, build and customize Tekton Tasks to create all the necessary building blocks for our pipelines. On top of that, we will also look at how to maintain and test our newly built Tasks, while using the best practices for creating reusable, testable, well-structured and simple Tasks.

If you haven't done so yet, go check out the previous article to get your Tekton development environment up and running, so you can follow along with the examples in this one.

Note: All the code and resources used in this article are available in the tekton-kickstarter repository.

What Are Tasks?

Tekton Tasks are the basic building blocks of Pipelines. A Task is a sequence of steps that performs some particular, well... task. Each step in a Task runs as a container inside the Task's Pod. Isolating a sequence of related steps into a single reusable Task gives Tekton a lot of versatility and flexibility. Tasks can be as simple as running a single echo command or as complex as a Docker build followed by a push to a registry and finished by an image digest output.

Apart from Tasks, ClusterTasks are also available. They're not much different from basic Tasks, as they're just cluster-scoped Tasks. These are useful for general purpose Tasks that perform basic operations such as cloning a repository or running kubectl commands. Using ClusterTasks helps avoid duplication of code and helps with reusability. Be mindful of modifications to ClusterTasks though, as any changes to them might impact many other pipelines in all the other namespaces in your cluster.

When we want to execute a Task or ClusterTask, we create a TaskRun. In programming terms, you could think of a Task as a class and a TaskRun as its instance.

If the above explanation isn't clear enough, then a little example might help. Here's the simplest possible Task we can create:


# echo.yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: echo
spec:
  steps:
  - name: hello
    image: ubuntu
    script: echo 'Hello world!'

This simple Task called echo really does just that - it runs an ubuntu container and injects a script into it that executes echo 'Hello world!'. Now that we have a Task, we can also run it, or in other words create a TaskRun. We can create a YAML file for that and apply it, or we can use the tkn CLI:


~ $ tkn task start --filename echo.yaml 
TaskRun started: echo-run-jqn9l

In order to track the TaskRun progress run:
tkn taskrun logs echo-run-jqn9l -f -n default

~ $ tkn taskrun logs echo-run-jqn9l -f -n default
[hello] + echo Hello world!
[hello] Hello world!
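
For comparison, the YAML route mentioned earlier could look roughly like the sketch below. The TaskRun name echo-run is just a placeholder I picked for illustration, and the sketch assumes the echo Task from echo.yaml was already applied to the same namespace:

```yaml
# echo-run.yaml (hypothetical file name)
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: echo-run   # placeholder name, not taken from the output above
spec:
  taskRef:
    name: echo     # the Task defined in echo.yaml
```

Applying this file with kubectl apply -f should start the run, and its logs can then be followed with tkn taskrun logs just like above.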

And it's as simple as that! We've run our first Task, so let's now move on to something a bit more useful and explore what Tasks are already out there...

Don't Reinvent a Wheel

This article is about creating and customizing Tekton Tasks, but let's not try to reinvent the wheel here. Instead, let's use what the Tekton community has already created. The main source of existing, ready-to-use Tasks is the Tekton Catalog. It's a repository of reliable, curated Tasks reviewed by Tekton maintainers. In addition to the Tekton Catalog repository, you can also use Tekton Hub, which lists all the same Tasks as the catalog but in a view that's a bit easier to navigate. It also lists the rating of each Task, which might be a helpful indicator of quality.

In this catalog, you should be able to find all the basic stuff like Tasks for fetching a repository (git-clone), building and pushing Docker images (kaniko or buildah) or sending Slack notifications (send-to-webhook-slack). So, before you decide to build custom Tasks, try checking the catalog for existing solutions to common problems.

If you browsed through the Tekton Catalog or Tekton Hub, you probably noticed that installing each Task takes just a single kubectl apply -f .... That's easy enough, but if you rely on many of these Tasks and want to track them in version control without copy-pasting all of their YAMLs, then you can use a convenient script in the tekton-kickstarter repository, which takes a list of remote YAML URLs, such as:


# catalog.yaml
git-clone: 'https://raw.githubusercontent.com/tektoncd/catalog/master/task/git-clone/0.2/git-clone.yaml'
send-to-webhook-slack: 'https://raw.githubusercontent.com/.../send-to-webhook-slack/0.1/send-to-webhook-slack.yaml'
skopeo-copy: 'https://raw.githubusercontent.com/tektoncd/catalog/master/task/skopeo-copy/0.1/skopeo-copy.yaml'
buildid: 'https://raw.githubusercontent.com/tektoncd/catalog/master/task/generate-build-id/0.1/generate-build-id.yaml'

And with an invocation of make catalog, it applies them to your cluster.

Layout

Before we start making custom Tasks, it's a good idea to decide on a layout that makes them easy to navigate, test and deploy. We can take some inspiration from the Tekton Catalog repository and use the following directory structure:


...
├── tasks                      - Custom or remotely retrieved Tasks and ClusterTasks
│   ├── catalog.yaml           - List of Tasks retrieved from remote registries (e.g. Tekton catalog)
│   └── task-name              - Some custom Task
│       ├── task-name.yaml     - File containing Task or ClusterTask
│       └── tests              - Directory with files for testing
│           ├── resources.yaml - Resources required for testing, e.g. PVC, Deployment
│           └── run.yaml       - TaskRun(s) that performs the test

We store all the Tasks in a single directory called tasks. In there we create one directory for each Task, which contains one YAML file with the Task itself and one more directory (tests) with the resources required for testing. Those are the TaskRun(s) in run.yaml and any additional resources needed to perform the test in resources.yaml, for example a PVC for a Task performing a DB backup or a Deployment for a Task that performs application scaling.

One more file shown in the structure above, which we already mentioned in the previous section, is catalog.yaml, which holds the list of Tasks to be installed from remote sources.

For convenience (if using tekton-kickstarter), all of these can be installed with a single command, make deploy-tasks, which traverses the tasks directory and applies all the Tasks to your cluster while omitting all the testing resources.

Building Custom Ones

If you can't find an appropriate Task for the job in the catalog, then it's time to write your own. At the beginning of the article I showed a very simple "Hello world" example, but Tekton Tasks can get much more complex, so let's go over all the configuration options and features that we can leverage.

Let's start simple and introduce the basics that we will need in pretty much any Task that we build. Those are Task parameters and scripts for Task steps:


# https://github.com/MartinHeinz/tekton-kickstarter/blob/master/tasks/deploy/deploy.yaml
apiVersion: tekton.dev/v1beta1
kind: ClusterTask
metadata:
  name: deploy
spec:
  params:
    - name: name
      description: Deployment name.
    - name: namespace
      default: default
  steps:
    - name: rollout
      image: 'bitnami/kubectl:1.20.2'
      script: |
        #!/usr/bin/env bash
        set -xe
        kubectl rollout restart deployment/$(params.name) -n $(params.namespace)

With the above deploy Task we can perform a simple rollout of a Kubernetes Deployment by supplying it with the name of the Deployment in the name parameter and, optionally, the namespace in which it resides. The parameters that we pass to the Task are then used in the script section, where they're expanded before the script is executed. To tell Tekton to expand a parameter we use the $(params.name) notation. This can also be used in other parts of the spec, not just in the script.
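
To illustrate parameters being expanded outside of script, here's a minimal sketch. The Task name rollout-status and the kubectl-version parameter are made up for this example and are not part of tekton-kickstarter; the sketch only assumes that image and args accept the same $(params...) notation:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: rollout-status          # hypothetical Task, for illustration only
spec:
  params:
    - name: name
      description: Deployment name.
    - name: namespace
      default: default
    - name: kubectl-version     # hypothetical parameter
      default: '1.20.2'
  steps:
    - name: status
      # Parameter expanded in the image reference
      image: 'bitnami/kubectl:$(params.kubectl-version)'
      command:
        - kubectl
      # Parameters expanded in individual arguments
      args:
        - rollout
        - status
        - deployment/$(params.name)
        - -n
        - $(params.namespace)
```

Making the image tag a parameter with a default keeps the Task usable as-is, while still letting a TaskRun override the kubectl version when needed.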

Now, let's take a closer look at the script section - we begin it with a shebang to make sure that we will be using bash. However, that doesn't mean that you always have to use bash. In the same way, you could use for example Python with #!/usr/bin/env python; it all depends on your preference and on what's available in the image being used. After the shebang we also use set -xe, where -x tells the shell to echo each command being executed and -e makes the script exit as soon as any command fails - you don't have to do this, but it can be very helpful during debugging.

Alternatively, if you don't need a whole script, but just a single command, then you can replace the script section with command. This is how it would look for a simple Task that performs an application health check using kubectl wait (note: omitting obvious/non-relevant parts of the Task body for the remaining examples):
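
As a small sketch of the non-bash option, a step with a Python script could look something like this. The Task name, the who parameter and the choice of the python:3.9-slim image are all assumptions made for the sake of the example:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: python-hello        # hypothetical Task, for illustration only
spec:
  params:
    - name: who             # hypothetical parameter
      default: world
  steps:
    - name: greet
      image: 'python:3.9-slim'   # any image with a Python interpreter should do
      script: |
        #!/usr/bin/env python3
        # $(params.who) is replaced by Tekton before the interpreter ever sees the script
        print("Hello, $(params.who)!")
```

Keep in mind that the parameter is pasted into the script as plain text, so values containing quotes could break the resulting Python code.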


# https://github.com/MartinHeinz/tekton-kickstarter/blob/master/tasks/healthcheck/healthcheck.yaml
...
  steps:
    - name: wait
      image: 'bitnami/kubectl:1.20.2'
      command:
        - kubectl
        - wait
        - --for=condition=available
        - --timeout=600s
        - deployment/$(params.name) 
        - -n
        - $(params.namespace)

This works the same way as the command directive in Pods, so it can get verbose and hard to read if you have many arguments, as in the example above. For this reason I prefer to use script for almost everything, as it's more readable and easier to update/change.

Another common thing that you might need in your Tasks is some kind of storage where you can write data that can be used by subsequent steps in the Task or by other Tasks in the pipeline. The most common use case for this would be a place to fetch a git repo into. This kind of storage is called a workspace in Tekton, and the following example shows a Task that mounts a workspace and clears it using rm -rf:
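
For comparison, the same health check written with script might look roughly like this. It's a sketch that assumes the name and namespace parameters from the omitted part of the Task and that bash is available in the image, as it was in the deploy Task:

```yaml
  steps:
    - name: wait
      image: 'bitnami/kubectl:1.20.2'
      script: |
        #!/usr/bin/env bash
        set -xe
        # Same arguments as in the command variant, just on fewer lines
        kubectl wait --for=condition=available --timeout=600s \
          deployment/$(params.name) -n $(params.namespace)
```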


# https://github.com/MartinHeinz/tekton-kickstarter/blob/master/tasks/clean/clean.yaml
...
spec:
  params:
    - name: path
      description: Path to directory being deleted.
      default: '.'
  workspaces:
    - name: source
      mountPath: /workspace
  steps:
    - image: ubuntu
      name: rmdir
      script: |
        #!/usr/bin/env bash
        set -xe
        rm -rf "$(workspaces.source.path)/$(params.path)"

The above clean Task includes a workspaces section that defines the name of the workspace and the path where it should be mounted. To make it easier to update the mountPath, Tekton provides a variable in the format $(workspaces.ws-name.path), which can be used in scripts to reference the path.

So, defining the workspace is pretty simple, but the disk backing the workspace won't appear out of thin air. Therefore, when we execute a Task that asks for a workspace, we also need to create a PVC for it. The TaskRun that creates the needed PVC for the above Task would look like so:


# https://github.com/MartinHeinz/tekton-kickstarter/blob/master/tasks/clean/tests/run.yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: clean
  namespace: default
spec:
  params:
    - name: path
      value: 'somedir'
  taskRef:
    name: clean        # Reference to the task
    kind: ClusterTask  # In case of ClusterTask we need to explicitly specify `kind`
  workspaces:
    - name: source
      volumeClaimTemplate:  # PVC created this way will be cleaned-up after TaskRun/PipelineRun completes!
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi

Workspaces are very generic, so they can be used not just to store ephemeral data during a Task/Pipeline run, but also as long-term storage, like in the following example:


# https://github.com/MartinHeinz/tekton-kickstarter/blob/master/tasks/pg-dump/pg-dump.yaml
apiVersion: tekton.dev/v1beta1
kind: ClusterTask
metadata:
  name: pg-dump
spec:
  params:
    - name: HOST
      type: string
    - name: DATABASE
      type: string
    - name: DEST
      type: string
  workspaces:
    - name: backup
      mountPath: /backup
  steps:
    - name: pg-dump
      image: 'postgres:13.2-alpine'
      env:
        - name: USERNAME
          valueFrom:
            secretKeyRef:
              name: postgres-config
              key: POSTGRES_USER
        - name: PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-config
              key: POSTGRES_PASSWORD
      script: |
        #!/usr/bin/env sh
        set -xe
        PGPASSWORD="$PASSWORD" pg_dump -h $(params.HOST) -Fc -U $USERNAME $(params.DATABASE) > $(workspaces.backup.path)/$(params.DEST)

This Task performs a PostgreSQL database backup using the pg_dump utility. In this example, the script grabs the database data from the host and DB specified in the parameters and streams it into the workspace, which is backed by a PVC.

Another feature that I snuck into this example, and that you will encounter quite often, is the ability to inject environment variables from ConfigMaps or Secrets. This is done in the env section. It works exactly the same way as with Pods, so you can refer to that part of the Kubernetes API for details.

Back to the main topic of this example, the PVC for the database backup. Considering that we want this data to be persistent, we cannot use a PVC created using volumeClaimTemplate as shown previously, as that would get cleaned up after the Task finishes. Instead, we need to create the PVC separately and pass it to the Task this way:
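
The example above only pulls values from a Secret, so here's a sketch of the ConfigMap variant. The ConfigMap postgres-settings and its POSTGRES_PORT key are hypothetical and would have to exist in the namespace where the TaskRun executes:

```yaml
  steps:
    - name: pg-dump
      image: 'postgres:13.2-alpine'
      env:
        - name: PGPORT                  # standard libpq variable, picked up by pg_dump
          valueFrom:
            configMapKeyRef:
              name: postgres-settings   # hypothetical ConfigMap
              key: POSTGRES_PORT        # hypothetical key
```

The only difference from the Secret case is configMapKeyRef in place of secretKeyRef; the rest of the step stays the same.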


# https://github.com/MartinHeinz/tekton-kickstarter/tree/master/tasks/pg-dump/tests
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-backup
spec:
  resources:
    requests:
      storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: pg-dump
spec:
  taskRef:
    name: pg-dump
    kind: ClusterTask
  params:
    - name: HOST
      value: 'postgres.default.svc.cluster.local'
    - name: DATABASE
      value: postgres
    - name: DEST
      value: 'test.bak'
  workspaces:
    - name: backup
      persistentVolumeClaim:
        claimName: postgres-backup

Here we use persistentVolumeClaim instead of volumeClaimTemplate and specify the name of an existing PVC, which is also defined in the above snippet. This example also assumes that there's a PostgreSQL database running at the specified host - for the full code, including the PostgreSQL deployment, check out the files here.

Similarly to the injection of environment variables, we can also use workspaces to inject whole ConfigMaps or Secrets (or some keys in them) as files. This can be useful, for example, when you want a whole .pem certificate from a Secret in the Task, or a config file that maps a GitHub repository to an application name, as in the following example:


# https://github.com/MartinHeinz/tekton-kickstarter/blob/master/misc/config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: repo-app-mapping
data:
  repo-app-mapping.yaml: |
    'git@github.com:kelseyhightower/nocode.git': 'nocode'
    'git@github.com:MartinHeinz/blog-frontend.git': 'blog-backend'
    'git@github.com:MartinHeinz/game-server-operator.git': 'game-server-operator'
    'git@github.com:MartinHeinz/python-project-blueprint.git': 'sample-python-app'
---
# https://github.com/MartinHeinz/tekton-kickstarter/blob/master/tasks/get-application-name/get-application-name.yaml
apiVersion: tekton.dev/v1beta1
kind: ClusterTask
metadata:
  name: get-application-name
spec:
  params:
    - name: repository-url
      type: string
    - name: mapping-file
      type: string
  steps:
    - name: get-application-name
      image: mikefarah/yq
      script: |
        #!/usr/bin/env sh
        set -xe
        yq e '."$(params.repository-url)"' /config/$(params.mapping-file) | tr -d '\012\015' > /tekton/results/application-name
  results:
    - name: application-name  # Can be accessed by other Tasks with $(tasks.get-application-name.results.application-name)
  workspaces:
    - name: config
      mountPath: /config

This Task takes a repository URL and uses it to look up the matching application name in the file which is part of the ConfigMap shown above. This is done using the yq utility, followed by output to a file in a special directory called /tekton/results/.... This directory stores the results of Tasks, which is a feature (and YAML section) we haven't mentioned yet.

Task results are small pieces of data that a Task can output and that subsequent Tasks can then use. To use them, one has to specify the name of the result variable in the results section and then write something to /tekton/results/result-var-name. Also, as you surely noticed in the above script, we strip the newline from the result using tr before writing it into the file. That's because the result should be simple output - ideally just a single word - and not a big chunk of text. If you decide to write something longer (with multiple lines) to the result, you might end up losing part of the value, or seeing only an empty string if you don't strip the newline character.

To then use a Task such as this one, which takes its workspace from a ConfigMap, we have to specify the ConfigMap in the workspaces section in the following way:
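
To give an idea of how a result gets consumed, below is a rough sketch of a Pipeline that feeds the application name into the deploy Task from earlier. The Pipeline name is made up and the wiring is only an illustration under those assumptions, not a pipeline from tekton-kickstarter:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: name-then-deploy             # hypothetical Pipeline
spec:
  params:
    - name: repository-url
      type: string
  workspaces:
    - name: config                   # to be bound to the ConfigMap by the PipelineRun
  tasks:
    - name: get-application-name
      taskRef:
        name: get-application-name
        kind: ClusterTask
      params:
        - name: repository-url
          value: $(params.repository-url)
        - name: mapping-file
          value: repo-app-mapping.yaml
      workspaces:
        - name: config
          workspace: config
    - name: deploy
      taskRef:
        name: deploy
        kind: ClusterTask
      params:
        - name: name
          # Referencing the result also makes this task wait for the one above
          value: $(tasks.get-application-name.results.application-name)
```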


# Full example at https://github.com/MartinHeinz/tekton-kickstarter/blob/master/tasks/get-application-name/tests/run.yaml
  workspaces:
    - name: config
      configmap:
        name: repo-app-mapping
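
The Secret variant looks very similar. Here's a sketch that mounts a single key of a Secret as a file; the workspace name certs, the Secret tls-certs and its tls.crt key are all hypothetical:

```yaml
  workspaces:
    - name: certs                 # hypothetical workspace declared by the Task
      secret:
        secretName: tls-certs     # hypothetical Secret
        items:                    # optional, mounts only the listed keys
          - key: tls.crt
            path: cert.pem        # file name inside the workspace directory
```

Without the items list, every key in the Secret should show up as a separate file in the workspace.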

The final thing I want to show here is the usage of sidecar containers. These are not so common, but can be useful when you need to run some service (which your Task depends on) for the duration of the Task's execution. One such service can be a Docker daemon sidecar with an exposed socket. To demonstrate this, we can create a Task that performs Docker image efficiency analysis using a tool called Dive:


# https://github.com/MartinHeinz/tekton-kickstarter/blob/master/tasks/dive/dive.yaml
apiVersion: tekton.dev/v1beta1
kind: ClusterTask
metadata:
  name: dive
spec:
  params:
    - name: IMAGE
      description: Name (reference) of the image to analyze.
  steps:
    - image: 'wagoodman/dive:latest'
      name: dive
      env:
        - name: CI
          value: 'true'
      command:
        - dive
        - $(params.IMAGE)
      volumeMounts:
        - mountPath: /var/run/
          name: dind-socket
  sidecars:
    - image: 'docker:18.05-dind'
      name: server
      securityContext:
        privileged: true
      volumeMounts:
        - mountPath: /var/lib/docker
          name: dind-storage
        - mountPath: /var/run/
          name: dind-socket
  volumes:
    - name: dind-storage
      emptyDir: {}
    - name: dind-socket
      emptyDir: {}

As you can see here, the sidecars section is pretty much the same as the definition of any container in a Pod spec. In addition to the usual stuff we've seen in previous examples, here we also specify volumes which are shared between the sidecar and the containers in the Task's steps - in this case one of them being the Docker socket in dind-socket, which the Dive container attaches to.

This covers most of the features of Tekton Tasks, but one thing I haven't mentioned so far, and which you will surely run into when reading the Tekton docs, is the PipelineResource object, which can be used as a Task's input or output - for example GitHub sources as input or a Docker image as output. So, why haven't I mentioned it yet? Well, PipelineResource is a part of Tekton that I prefer not to use, for a couple of reasons:

It's still in alpha unlike all the other resource types we used so far
There are very few PipelineResources. It's mostly just Git, Pull request and Image resources.
It's hard to troubleshoot them.

If you need more reason not to use them (for now), then take a look at docs section here.

In all the examples that we went over, we've seen a lot of YAML sections, options and features, which shows how flexible Tekton is. This however makes its API spec - naturally - very complex. Unfortunately, kubectl explain doesn't help with exploring the API, and the API spec available in the docs is lacking at best. So, in case you have trouble finding what you can put in which part of the YAML, your best bet is to rely on the fields listed at the beginning of the Tasks doc or on the examples here. Just make sure you're on the right branch for your version of Tekton, otherwise you might spend long hours debugging why your seemingly correct Task cannot be validated by the Tekton controller.

Running and Testing

So far, we mostly just talked about Tasks and not much was said about TaskRuns. That's because - in my opinion - individual TaskRuns are best suited for testing and not really for running the Tasks regularly. For that you should use pipelines, which will be the topic of the next article.

Speaking of testing - when we're done implementing the Task, it's time to run some tests. For straightforward and easy testing, I recommend using the layout mentioned earlier in the article. Using it should help you encapsulate the Task in a way that allows you to test it independently of any resources outside of its directory.

To then perform the actual test, it's enough to apply the resources/dependencies in .../tests/resources.yaml (if any) and then apply the actual test(s) inside .../tests/run.yaml. The tests really are just a set of TaskRuns that use your custom Task, so for this basic testing approach there's no need for any setup/teardown or additional scripts - just kubectl apply -f resources.yaml and kubectl apply -f run.yaml. Examples of simple tests can be found in each Task's directory in tekton-kickstarter or in the Tekton Catalog repository.

In general though, for any particular Task in repositories of both of these projects, you can run the following commands to perform the test:


kubectl apply -f tests/resources.yaml  # Deploy dependencies
kubectl apply -f tests/run.yaml        # Start tests
kubectl get tr  # List the TaskRuns (tests)
NAME           SUCCEEDED   REASON      STARTTIME   COMPLETIONTIME
some-taskrun   True        Succeeded   106s        4s

tkn tr logs some-taskrun  # View logs from TaskRun
...

For me personally - when it comes to testing Tasks - it's sufficient to use the above basic testing approach for validation and ad-hoc testing. If you however end up creating a large number of custom Tasks and want to go all out, then you can adopt the approach used in Tekton Catalog and leverage the testing scripts in its repository. If you decide to go that route and strictly follow the layout and testing, you might also want to try contributing the Tasks to the Tekton Catalog, so that the whole community can benefit from more high-quality Tasks. 😉

As for the scripts that you'll need for this - you'll need to include the test directory as well as the Go dependencies (vendor directory) from Tekton Catalog in your code and then follow the E2E testing guide in the docs here.

Regardless of whether you choose the basic or the "all-out" approach for your testing though, try to make sure that you test more than just the happy paths in your Tasks and Pipelines, otherwise you might end up seeing a lot of bugs when they get deployed "in the wild".
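
As an example of a non-happy path, a test for the deploy Task could point it at a Deployment that doesn't exist and check that the TaskRun fails instead of silently succeeding. This is only a sketch; the TaskRun name and the does-not-exist value are made up:

```yaml
# Could live next to the happy-path TaskRun in tests/run.yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: deploy-missing-deployment   # hypothetical test name
spec:
  taskRef:
    name: deploy
    kind: ClusterTask
  params:
    - name: name
      value: does-not-exist         # no such Deployment, so kubectl should exit non-zero
```

With a run like this, kubectl get tr should eventually report False in the SUCCEEDED column, which for this particular test is the outcome we want. It's worth checking the logs too, to confirm that the failure comes from the missing Deployment and not from something unrelated like missing permissions.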

Best Practices

After you have implemented and tested your custom Tasks, it's a good idea to go back and make sure your Tasks follow the best development practices, which will make them more reusable and maintainable in the long run.

The simplest thing that you can do, and one that will also give the most benefit, is to use yamllint - a linter for YAML files. This tip doesn't apply only to Tekton Tasks, but rather to all YAML files, as they can be tricky to get right with all the indentation. It's especially important with Task definitions though, as they can get very long and complex, with many levels of indentation, so keeping them readable and validated can save you some unnecessary debugging and help keep them maintainable. You can find a custom .yamllint config in my repository which I like to use, but you should customize it to suit your code style and formatting. Just make sure you run yamllint from time to time (ideally in CI/CD) to keep things linted and validated.
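
If you want a starting point for your own config, a minimal .yamllint could look something like the sketch below. To be clear, this is not the config from my repository; the rule values are just assumptions that happen to fit the indentation style used in the Task examples above:

```yaml
# .yamllint
extends: default

rules:
  line-length:
    max: 120
    level: warning          # long URLs and commands only warn
  document-start: disable   # don't require '---' at the top of each file
  indentation:
    spaces: 2
    indent-sequences: consistent
```

With this file in the repository root, running yamllint tasks/ should lint all the Task definitions in one go.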

As for the actual Tekton best practices - I could give you a huge list, but it would be mostly just things recommended by Tekton maintainers, so instead of copy-pasting it here, I will just point you to the relevant resources:

Closing Thoughts

In this article we saw how flexible and versatile Tekton is, and that with this flexibility also comes the complexity of building and testing Tasks. Therefore, it's very much preferable to use existing Tasks created by the community, rather than trying to reinvent the wheel yourself. If there's however no suitable Task available and you have to build your own, make sure you write tests for your Tasks and follow the best practices mentioned above to keep your Tasks maintainable and reliable.

After this article we should have enough experience, as well as a bunch of individual custom Tasks, to start composing our pipelines. And that's exactly what we're going to do in the next article in this series, where we will explore how to build fully-featured pipelines to build, deploy and test your applications and much more.

Also, if you haven't done so yet, make sure you check out the tekton-kickstarter repository, where you can find all the Tasks and examples from this article as well as the pipelines you will see in the next one. 😉