I joined Jetstack in May 2018 as a Solutions Engineer, and since then I have helped customers with their Kubernetes journey. At KubeCon in Copenhagen we launched our Jetstack Kubernetes Subscription, and alongside it we created our CRE role.
CRE stands for Customer Reliability Engineer, a role conceived by Google with the mission of reducing customer anxiety by sharing operational responsibilities and generally being closer to your customers. Just like Google, we care a lot about the reliability of our customers' Kubernetes platforms, and we want to build a relationship in which we can help them on an ongoing basis with their production workloads. We take a proactive approach, identifying issues ahead of time and working together on solutions to minimise toil, instead of the usual reactive support model where we sit and wait for things to break and questions to come in. With this blog post we hope to dispel some of the preconceptions about support and help readers better understand this role.
I have a lot of prior experience working on long-running technical engagements with customers, so I decided to take on the role of CRE at Jetstack. It’s a worthwhile challenge, and it allows me to shape the role, create the necessary processes and help our first subscription customers.
A day in my life
My typical morning starts with working from home and checking in with my colleagues over Slack. In my daily stand-up I cover the issues I completed yesterday and what I am going to do today. Checking our open support tickets for updates is another big part of my morning, and with no new tickets today, it’s already a good start to the day!
The day progresses with further duties. For one client, we decided to build a small Prometheus proof-of-concept to demonstrate its capabilities. I work on this for an hour before I leave the warmth of my apartment to take the tube to one of our clients.
Once on-site, we have our monthly meeting to talk about their Kubernetes usage. They bring me up to speed on what has happened since our last meeting, and we talk about CI/CD, network routing and the new projects they are taking on. They are interested in Istio, both for securing all their internal traffic and for doing canary deployments. We decide to tackle the transition together: we will scope out the technical requirements and create a joint project out of it.
After a good meeting with the customer, I head to the office to grab lunch with my colleagues, and I start my afternoon by finishing the notes from my morning meeting. I continue working on the Prometheus proof-of-concept and make good progress on it. We will be ready to demo it to the customer later in the week. I get paged once in the afternoon for a new incoming support ticket: the authentication system of Kubernetes is causing some problems. Luckily we are able to reproduce the issue and find a fix quickly. I reply to the customer, and they are able to fix it in their cluster based on the answer.
The problem we found is an ideal case to add to the playbooks that we share with our subscription customers. Playbooks document the operational actions to follow to debug and fix common problems in Kubernetes; they have proven to be lifesavers and have reduced cluster MTTR. With our portal, Flightdeck, customers can even practise these scenarios in a simulated, risk-free environment, so they are drilled on how to respond to incidents should they occur.
At the end of the day, my colleagues and I decide to head to the pub for some drinks. As you can see, the life of a CRE has a lot of different, challenging aspects. You get to talk with a lot of clients about their Kubernetes challenges, work closely with your colleagues and solve interesting technical problems.
And then there were more
We are always looking for new CREs, so if you’re interested in the role and in working with our team of Kubernetes and cloud native experts, head over to our jobs page and tell us more about yourself.