Getting started using cert-manager with the sig-network Gateway API

Written by Maël Valais


Published on our Cloud Native Blog.
The Ingress API is a good example of the API standardization that Kubernetes offers. Many cloud-native components, such as ExternalDNS, Traefik and cert-manager, integrate with the Ingress API, leading to a consistent experience.

Over time, the limitations of the Ingress API have led to the creation of various ad-hoc CRDs that aim to offer a better abstraction. Istio’s VirtualService CRD is one such example, and each proxy or service mesh creates its own.

This is a challenge for tools such as cert-manager and ExternalDNS, which may not be able to support all of these different CRDs. That is where the sig-network Gateway API comes in. The Gateway API project is managed by the sig-network community and aims to provide a universal API for modelling service networking in Kubernetes.

Gateway diagram

While the Gateway API is still in its alpha stages, it is already gaining wide adoption. Today, you can use the Gateway API with Ambassador, Contour, Gloo, HAProxy, Istio, Kong and Traefik.

cert-manager v1.5 added experimental support for the Gateway API. cert-manager can solve HTTP-01 challenges using HTTPRoute resources, and it can automatically create Certificate resources from annotations on Gateway resources. The documentation for using the Gateway API with cert-manager is available on the page Securing Gateway Resources.

This blog post will walk you through the process of using cert-manager with ExternalDNS and Traefik using the Gateway API.

Getting started with Traefik

Prerequisites

To get ACME certificates using one of the implementations of the Gateway API, you will need a cluster with:

  1. A Service type=LoadBalancer controller installed (if you are running on GKE, then you are good to go).
  2. A DNS zone in some DNS provider that ExternalDNS supports (e.g., CloudDNS or Scaleway). You can also use the IP addresses directly, but this guide will focus on host names instead.

In the remainder of this guide, we will be using:

  • a pre-existing GKE cluster,
  • a pre-existing CloudDNS zone (domain mael-valais.example.com.).

Before installing the Gateway API implementation, we will need to install ExternalDNS.

Note that as of August 6, 2021, ExternalDNS does not support the Gateway API natively. Andy Bursavich is working on implementing it and you can follow the development on the issue external-dns#2045.

Let us set a couple of variables:

# The following project must contain the CloudDNS zone you will be using.
PROJECT=my-gcp-project

# This is the `DNS_NAME` that you can see when running
# `gcloud dns managed-zones list`:
DOMAIN=mael-valais.example.com
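
Note that the `DNS_NAME` column printed by `gcloud dns managed-zones list` ends with a trailing dot, whereas `DOMAIN` above is written without one. Here is a minimal sketch of how the value could be normalized; the helper name `strip_trailing_dot` is hypothetical and not part of any tool used in this guide.

```shell
# Hypothetical helper: remove a single trailing dot from a DNS name.
strip_trailing_dot() {
  printf '%s\n' "${1%.}"
}

# Example: derive DOMAIN from the zone's DNS_NAME as shown by gcloud.
DOMAIN=$(strip_trailing_dot "mael-valais.example.com.")
echo "$DOMAIN"
```

A name that has no trailing dot is returned unchanged, so the helper is safe to apply to either form.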

ExternalDNS needs a service account key so that it can configure DNS records on the zone:

gcloud iam service-accounts create external-dns --display-name "For ExternalDNS" --project "$PROJECT"
gcloud projects add-iam-policy-binding "$PROJECT" --role=roles/dns.admin \
  --member="serviceAccount:external-dns@$PROJECT.iam.gserviceaccount.com"
kubectl -n external-dns apply -f- >/dev/null <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: external-dns
---
apiVersion: v1
kind: Secret
metadata:
  name: jsonkey
stringData:
  jsonkey: |
    $(gcloud iam service-accounts keys create /dev/stdout --iam-account "external-dns@$PROJECT.iam.gserviceaccount.com" --project "$PROJECT" | jq -c)
EOF

Finally, we can install ExternalDNS:

helm repo add bitnami https://charts.bitnami.com/bitnami >/dev/null
helm upgrade --install external-dns bitnami/external-dns --namespace external-dns --create-namespace \
    --set provider=google --set google.project="$PROJECT" --set google.serviceAccountSecret=jsonkey --set google.serviceAccountSecretKey=jsonkey \
    --set sources='{ingress,service}' >/dev/null
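
Before moving on, it can be useful to wait for ExternalDNS to be up. The sketch below wraps `kubectl rollout status` in a small function; it assumes that the chart names the Deployment after the Helm release (`external-dns`), and the function name `wait_for_external_dns` is hypothetical.

```shell
# Wait until the ExternalDNS Deployment has finished rolling out.
# Assumption: the Deployment is called "external-dns", like the Helm release.
# $1 is an optional timeout (defaults to 120s).
wait_for_external_dns() {
  kubectl -n external-dns rollout status deployment/external-dns \
    --timeout="${1:-120s}"
}
```

You would call it with `wait_for_external_dns` or, for a longer timeout, `wait_for_external_dns 300s`.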

Installation

Before starting, make sure to follow the instructions in Prerequisites.

Support for the Gateway API was introduced with Traefik 2.4.8. Let us install Traefik:

helm repo add traefik --force-update https://helm.traefik.io/traefik
helm upgrade --install traefik traefik/traefik --namespace traefik --create-namespace \
    --set additionalArguments='{--providers.kubernetesingress,--providers.kubernetesingress.ingressendpoint.publishedservice=traefik/traefik,--experimental.kubernetesgateway=true,--providers.kubernetesgateway=true}' \
    --set ssl.enforced=true --set dashboard.ingressRoute=true

Let us configure the Service type=LoadBalancer created by Traefik with a DNS name:

kubectl annotate svc -n traefik traefik --overwrite "external-dns.alpha.kubernetes.io/hostname=traefik.$DOMAIN"
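
To confirm that the annotation was recorded, it can be read back with a jsonpath query. The following is a sketch only; the function name `hostname_annotation_matches` is hypothetical, and the dots in the annotation key are escaped with backslashes so that kubectl treats the key as a single field.

```shell
# Succeeds when the ExternalDNS hostname annotation on the Traefik Service
# equals the expected host name given as $1.
hostname_annotation_matches() {
  expected=$1
  actual=$(kubectl -n traefik get svc traefik \
    -o jsonpath='{.metadata.annotations.external-dns\.alpha\.kubernetes\.io/hostname}')
  echo "annotation: ${actual:-<unset>}"
  [ "$actual" = "$expected" ]
}
```

For example, `hostname_annotation_matches "traefik.$DOMAIN"` prints the current value and returns a non-zero status if it differs.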

After some time, you should see:

# Check 1: the EXTERNAL-IP should have been set.
$ kubectl get svc -n traefik traefik
NAME      TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)
traefik   LoadBalancer   10.43.192.130   34.78.153.43   80:30137/TCP,443:30804/TCP

# Check 2: after a few minutes, you should be able to query the domain.
$ nslookup traefik.$DOMAIN
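
Since the DNS record can take a few minutes to appear, the second check can be turned into a polling loop. This is a minimal sketch: the function name `wait_for_dns` and the `RESOLVE_CMD` variable are hypothetical, and it assumes that your `nslookup` exits with a non-zero status when the name does not resolve.

```shell
# Poll until a host name resolves.
#   $1: host name, $2: number of attempts (default 30), $3: delay in seconds (default 10)
# RESOLVE_CMD can be set to another resolver command (e.g. "getent hosts").
wait_for_dns() {
  host_name=$1
  attempts=${2:-30}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Intentionally unquoted so that RESOLVE_CMD may contain arguments.
    if ${RESOLVE_CMD:-nslookup} "$host_name" >/dev/null 2>&1; then
      echo "resolved: $host_name"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $host_name" >&2
  return 1
}
```

With the variables from this guide, the call would be `wait_for_dns "traefik.$DOMAIN"`.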

As detailed on the page Traefik & Kubernetes, Traefik needs some extra RBAC rules:

kubectl apply -f- <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gateway-role
rules:
  - apiGroups:
      - ""
    resources:
      - services
      - endpoints
      - secrets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - networking.x-k8s.io
    resources:
      - gatewayclasses
      - gateways
      - httproutes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - networking.x-k8s.io
    resources:
      - gatewayclasses/status
      - gateways/status
      - httproutes/status
    verbs:
      - update
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: gateway-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gateway-role
subjects:
  - kind: ServiceAccount
    name: traefik
    namespace: traefik
EOF

Let us install the Gateway API CRDs:

kubectl kustomize "github.com/kubernetes-sigs/gateway-api/config/crd?ref=v0.3.0" | kubectl apply -f -
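
A quick way to check that the CRDs are in place is to look them up by name. The sketch below covers only the three v1alpha1 kinds used in this guide; the function name `check_gateway_crds` is hypothetical.

```shell
# Report which of the Gateway API CRDs used in this guide are installed.
# Returns a non-zero status if at least one of them is missing.
check_gateway_crds() {
  missing=0
  for crd in gatewayclasses gateways httproutes; do
    if kubectl get crd "$crd.networking.x-k8s.io" >/dev/null 2>&1; then
      echo "found: $crd.networking.x-k8s.io"
    else
      echo "missing: $crd.networking.x-k8s.io"
      missing=1
    fi
  done
  return "$missing"
}
```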

The next step is to install cert-manager. The Gateway API has been supported since v1.5.0:

helm repo add jetstack https://charts.jetstack.io
helm upgrade --install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace \
  --set installCRDs=true --set "extraArgs={--controllers=*\,gateway-shim}" --version v1.5.0
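
Because the `gateway-shim` controller has to be enabled explicitly, it is worth checking that the flag reached the controller. The following sketch assumes that the chart names the controller Deployment `cert-manager` (after the Helm release) and that the controller is the first container of the Pod; the function name `gateway_shim_enabled` is hypothetical.

```shell
# Succeeds if the cert-manager controller arguments mention gateway-shim.
# Assumptions: Deployment "cert-manager", controller is containers[0].
gateway_shim_enabled() {
  kubectl -n cert-manager get deployment cert-manager \
    -o jsonpath='{.spec.template.spec.containers[0].args}' |
    grep -q 'gateway-shim'
}
```

For example: `gateway_shim_enabled && echo "gateway-shim is enabled"`.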

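Now we can create an ACME Issuer and two Gateways: one for solving HTTP-01 challenges and one for serving HTTPS traffic on port 443. The reason for using two Gateways is explained further down, in the section on known bugs and limitations.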

kubectl apply -f- <<EOF
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt
spec:
  acme:
    email: your-email@example.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt
    solvers:
      - http01:
          gatewayHTTPRoute:
            labels:
              gateway: http01-solver-traefik
---
apiVersion: networking.x-k8s.io/v1alpha1
kind: GatewayClass
metadata:
  name: traefik
spec:
  controller: traefik.io/gateway-controller
---
apiVersion: networking.x-k8s.io/v1alpha1
kind: Gateway
metadata:
  name: http01-solver
spec:
  gatewayClassName: traefik
  listeners:
  - protocol: HTTP
    port: 8000
    routes:
      kind: HTTPRoute
      selector:
        matchLabels:
          gateway: http01-solver-traefik
---
apiVersion: networking.x-k8s.io/v1alpha1
kind: Gateway
metadata:
  name: traefik
  annotations:
    cert-manager.io/issuer: letsencrypt
spec:
  gatewayClassName: traefik
  listeners:
  - hostname: traefik.mael-valais.example.com
    protocol: HTTPS
    port: 8443
    routes:
      kind: HTTPRoute
      selector:
        matchLabels:
          gateway: traefik
    tls:
      mode: Terminate
      certificateRef:
        name: traefik-tls
        kind: Secret
        group: core
EOF
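
Once these resources are applied, cert-manager should create a Certificate for the annotated Gateway. Here is a sketch of how you could wait for it; it assumes that the Certificate is named after the Secret referenced in `certificateRef` (`traefik-tls`) and that it lives in the current namespace. The function name `wait_for_certificate` is hypothetical.

```shell
# Wait for a Certificate to become Ready.
#   $1: Certificate name (default: traefik-tls, an assumption, see above)
#   $2: timeout (default: 300s)
wait_for_certificate() {
  name=${1:-traefik-tls}
  # Fail early if the Certificate has not been created at all.
  kubectl get certificate "$name" || return 1
  kubectl wait --for=condition=Ready "certificate/$name" --timeout="${2:-300s}"
}
```

If the wait times out, `kubectl describe certificate traefik-tls` is usually a good starting point for debugging.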

And finally, let us create a Deployment to test that out:

kubectl create deployment echoserver --image k8s.gcr.io/echoserver:1.3 --dry-run=client -oyaml | kubectl apply -f-
kubectl expose deployment echoserver --port=8080 --dry-run=client -oyaml | kubectl apply -f-
kubectl apply -f- <<EOF
apiVersion: networking.x-k8s.io/v1alpha1
kind: HTTPRoute
metadata:
  labels:
    gateway: traefik
  name: echoserver
spec:
  hostnames:
  - traefik.$DOMAIN
  rules:
  - forwardTo:
    - serviceName: echoserver
      port: 8080
EOF

You should be able to access the service with the following command:

$ curl https://traefik.mael-valais.example.com
CLIENT VALUES:
client_address=10.42.0.34
command=GET
real path=/
query=nil
request_version=1.1
request_uri=http://traefik.mael-valais.example.com:8080/

SERVER VALUES:
server_version=nginx: 1.9.11 - lua: 10001

HEADERS RECEIVED:
accept=*/*
accept-encoding=gzip
host=traefik.mael-valais.example.com
user-agent=curl/7.78.0
x-forwarded-for=10.42.0.36
x-forwarded-host=traefik.mael-valais.example.com
x-forwarded-port=443
x-forwarded-proto=https
x-forwarded-server=traefik-86bf649487-jdvxm
x-real-ip=10.42.0.36
BODY:
-no body in request-%
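
To see which certificate is actually being served, you can also inspect the TLS handshake with openssl. This is a sketch under the assumption that the Gateway is reachable on port 443; the function name `show_served_certificate` is hypothetical.

```shell
# Print the issuer, subject and expiry date of the certificate served by a host.
#   $1: host name (also sent as the SNI server name)
show_served_certificate() {
  host=$1
  openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null |
    openssl x509 -noout -issuer -subject -enddate
}
```

With the host name used in this guide, the call would be `show_served_certificate "traefik.$DOMAIN"`.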

Known bugs and limitations in Gateway API implementations

While testing this feature with various Gateway API implementations, we found some issues that will cause problems in some circumstances.

[HAProxy Ingress] certificateRef.group cannot be set to core

As of haproxy-ingress v0.13.0-beta.2, haproxy-ingress expects certificateRef.group to be empty, but the Gateway API v1alpha1 CRD requires a non-empty group (issue haproxy-ingress#830).

This issue has been fixed for v1alpha2 in PR 562, but will not be back-ported to v1alpha1.

Until PR 833 is merged, the only workaround I know about is to manually remove the non-empty requirement from the CRD:

kubectl get crd gateways.networking.x-k8s.io -oyaml | grep -v -- '- group' | kubectl apply -f-

[Traefik] HTTPRoute and Gateway must be in the same namespace

As of Traefik 2.4.9, Traefik only watches for HTTPRoutes that are in the same namespace as the Gateway and does not honor from: All (issue traefik#8246).

For example, the following won’t work:

apiVersion: networking.x-k8s.io/v1alpha1
kind: Gateway
metadata:
  name: traefik
  namespace: traefik # 🔥
  annotations:
    cert-manager.io/issuer: letsencrypt
spec:
  gatewayClassName: traefik
  listeners:
    - protocol: HTTP
      port: 8000
      routes:
        kind: HTTPRoute
        selector:
          matchLabels:
            gateway: http01-solver-traefik
        namespaces:
          from: All

At this point, you would expect to be able to create a Certificate in any namespace with an ACME Issuer. Let’s imagine that you have an ACME Issuer and Certificate in the namespace default:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt
  namespace: default
spec:
  acme:
    solvers:
      - http01:
          gatewayHTTPRoute:
            labels:
              gateway: http01-solver-traefik
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-tls
  namespace: default
spec:
  issuerRef:
    name: letsencrypt
  dnsNames:
    - example.com

cert-manager will create, as expected, an HTTPRoute in the namespace default:

kind: HTTPRoute
metadata:
  name: cm-acme-http-solver-gdhvg
  namespace: default
  labels:
    gateway: http01-solver-traefik
spec:
  gateways:
    allow: All
  hostnames:
    - example.com
  rules:
    - forwardTo:
        - port: 8089
          serviceName: cm-acme-http-solver-gdhvg
          weight: 1
      matches:
        - path:
            type: Exact
            value: /.well-known/acme-challenge/YadC4gaAzqEPU1Yea0D2MrzvNRWiBCtUizCtpiRQZqI

But Traefik won’t do anything with it:

time="2021-08-05T16:28:32Z" level=error msg="an error occurred while creating gateway status: 1 error occurred:
Cannot fetch HTTPRoutes for namespace "traefik" and matchLabels map[gateway:http01-solver-traefik]
gateway=http01-solver-traefik namespace=traefik
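
To spot this situation, you can list the HTTPRoutes that carry the solver label but live outside the Gateway's namespace. The sketch below is only an illustration; the function name `routes_outside_gateway_namespace` is hypothetical.

```shell
# Print "namespace/name" for every HTTPRoute that matches a label selector
# but is not in the Gateway's namespace.
#   $1: namespace of the Gateway, $2: label selector
# Returns a non-zero status when no such HTTPRoute is found.
routes_outside_gateway_namespace() {
  gateway_ns=$1
  label=$2
  kubectl get httproutes.networking.x-k8s.io --all-namespaces -l "$label" \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\n"}{end}' |
    grep -v "^$gateway_ns/"
}
```

For the example above, `routes_outside_gateway_namespace traefik gateway=http01-solver-traefik` would list the HTTPRoute that cert-manager created in the namespace default.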

[Traefik] One faulty listener breaks the entire Gateway

Traefik requires all listeners to be valid before configuring itself. This means we can’t use a single Gateway for both the HTTP-01 challenge listener on port 80 and the HTTPS listener on port 443, since the HTTPS listener is configured with a certificate that is itself meant to be obtained through the HTTP-01 challenge.

For example:

apiVersion: networking.x-k8s.io/v1alpha1
kind: Gateway
metadata:
  name: traefik
  annotations:
    cert-manager.io/issuer: letsencrypt
spec:
  gatewayClassName: traefik
  listeners:
    # ✅ This listener is valid as per Traefik.
    - protocol: HTTP
      port: 8000
      routes:
        kind: HTTPRoute
        selector:
          matchLabels:
            gateway: http01-solver-traefik
    # ❌ This listener is invalid as per Traefik since the Secret can't be found.
    - hostname: traefik.mael-valais.example.com
      protocol: HTTPS
      port: 8443
      routes:
        kind: HTTPRoute
        selector:
          matchLabels:
            gateway: traefik
      tls:
        mode: Terminate
        certificateRef:
          name: traefik-tls
          kind: Secret
          group: core

Even though the first listener is valid, none of them will be configured in Traefik; Traefik shows the following error:

An error occurred while creating gateway status: 1 error occurred: Error while retrieving certificate:
secret default/traefik-tls does not exist

Even if the Secret traefik-tls existed, cert-manager would remove the temporary HTTPRoute that it created once the challenge is complete, which means the first listener (on port 80) would start erroring, preventing the second listener (on port 443) from being configured:

time="2021-08-05T16:28:32Z" level=error msg="an error occurred while creating gateway status: 1 error occurred:
Cannot fetch HTTPRoutes for namespace "default" and matchLabels map[gateway:http01-solver]

The only workaround is to create a separate Gateway to prevent one listener from crashing the other listeners:

apiVersion: networking.x-k8s.io/v1alpha1
kind: Gateway
metadata:
  name: http01-solver
  annotations:
    cert-manager.io/issuer: letsencrypt
spec:
  gatewayClassName: traefik
  listeners:
    - protocol: HTTP
      port: 8000
      routes:
        kind: HTTPRoute
        selector:
          matchLabels:
            gateway: http01-solver-traefik
---
apiVersion: networking.x-k8s.io/v1alpha1
kind: Gateway
metadata:
  name: traefik
  annotations:
    cert-manager.io/issuer: letsencrypt
spec:
  gatewayClassName: traefik
  listeners:
    - hostname: traefik.mael-valais.example.com
      protocol: HTTPS
      port: 8443
      routes:
        kind: HTTPRoute
        selector:
          matchLabels:
            gateway: traefik
      tls:
        mode: Terminate
        certificateRef:
          name: traefik-tls
          kind: Secret
          group: core
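
To check whether Traefik still rejects one of the Gateways, you can search its logs for the error shown above. This sketch assumes that the Traefik Deployment is called `traefik` in the namespace `traefik`, as installed earlier; the function name `gateway_status_errors` is hypothetical.

```shell
# Print recent Traefik log lines that report a Gateway status error.
#   $1: how far back to look (default: 10m)
# Returns a non-zero status when no such line is found.
gateway_status_errors() {
  kubectl -n traefik logs deployment/traefik --since="${1:-10m}" |
    grep 'error occurred while creating gateway status'
}
```

An empty output (and a non-zero status) suggests that both Gateways have been accepted.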

[ExternalDNS] hostnames in Gateway and HTTPRoutes are not supported

One great feature of ExternalDNS is that it picks up the hostnames on your Ingress resources and creates A records for them. Unfortunately, ExternalDNS does not support the Gateway API.

A conversation is currently ongoing in external-dns#2045 to add support for the Gateway API.

The workaround used in this guide was to set the external-dns.alpha.kubernetes.io/hostname annotation on the Traefik Service:

kubectl annotate svc -n traefik traefik \
    "external-dns.alpha.kubernetes.io/hostname=traefik.$DOMAIN"

Closing remarks

The Gateway API, although still in its early stages, brings a set of features that better abstract how gateway proxies are operated nowadays. Like the Ingress API, we can foresee that it will become a de facto standard, with many cloud-native components supporting it.

With this experimental support in cert-manager, we are helping the project to grow and finding issues that can be fixed to improve the experience for users.

As demonstrated here, you can try out the Gateway API today, provide feedback to the project about whether it works well for you, and help guide it to production readiness.
