KubeCon Europe 2025: day 1
2 April 2025

This is a fairly non-structured log of my thoughts and notes from attending KubeCon Europe 2025. I was lucky to be sponsored by work, so I made at least some attempt to get to talks and sessions which seemed relevant to that. Unfortunately, getting to the ExCeL venue from Lancaster on the Wednesday morning meant that I missed the morning keynotes.
I wasn’t sure what to expect from the event really: it was the same week that I returned from a month of leave (my attendance had been arranged while I was away by a “do you want to go” text), and it’s by far the largest tech conference I’ve been to. So my preparation (going through the schedule and saving all the talks with a vaguely interesting title and/or abstract) was a bit limited, and I didn’t have as long as I’d have liked to consider what I really wanted to get out of it. The rough list of things I was keen to learn more about was:
- GitOps tooling around Flux, including secret management and the best way of managing both a production and staging cluster in a reasonably harmonious way
- The experience of other scientific institutions in using Kubernetes (for example some of the many talks on using GPUs in a cluster)
- Any other useful tools and knowledge I can pick up
- Having a go at the CTF as I’ve never taken part in one before
I was less interested in some of the talks about scaling Kubernetes to huge deployments, as all the stuff we run is fairly small-scale, and didn’t really have much time for the LLM/AI stuff that filled up a huge portion of the schedule. It’s probably going to be pretty unavoidable throughout the event though.
The rest of this post consists of mildly edited thoughts from my handwritten notes taken during the sessions I was in.
Explain How Kubernetes Works With GPU Like I’m 5
Carlos Santana, AWS
The first talk we made it to was an introduction to the various layers involved in running GPUs on Kubernetes. The speaker broke down the layers from device driver, CUDA runtime, container toolkit to node and GPU feature discovery and the device plugin to handle scheduling and access to the compute resources. It was a good introduction to the various things that are needed to set up GPU-enabled workflows, but what caught me the most was a passing comment about how EKS hybrid nodes allowed an AWS-managed EKS cluster to include nodes running remotely (for example in a homelab) over a VPN like Wireguard.
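As a note for future reference (this is the standard pattern as I understand it, not slide content from the talk): once the driver, container toolkit and device plugin layers are in place, a workload asks for a GPU with an extended resource limit in its pod spec, something like this sketch (names and image tag are my own choices):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test   # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]   # print the GPU the container was given
      resources:
        limits:
          nvidia.com/gpu: 1     # extended resource advertised by the device plugin
```

The scheduler will only place the pod on a node where the device plugin has advertised `nvidia.com/gpu` capacity, which is the scheduling layer the talk was describing.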
Bringing Agentic AI to Cloud Native - Introducing kagent
Christian Posta, Solo.io
I missed the start of this one as I was wandering around trying to find the room for the CTF intro, but after giving up on that until the later session I ended up watching this “sponsored demo” of an LLM attached to a Kubernetes cluster. The introduction that I missed presumably explained exactly what I was looking at, but it seemed to be a web interface for a chatbot connected to a Kubernetes cluster, which could explain the state of resources in a namespace, preview and apply changes (yes, it did have kubectl and working credentials), and read and summarise online documentation for the user. The speaker also made a big deal of its support for MCP, which stands for the “Model Context Protocol”, apparently a standard way of attaching domain-specific tools to a model. I have no idea how that works, but I’m sure if you are into LLMs it is a good thing.
Booths - Clickhouse & Wiz
Over the lunch break, we wandered around some of the booths in the exhibitor area. We had a look at the Clickhouse one, as it seems like it could be a better way to do some of the columnar querying needed in one of our projects than the current solution of a hand-rolled connection pool to a load of Parquet files in S3. They said their server was open-source, so perhaps it’s worth a bit of an experiment, especially as it can ingest Parquet files and query them from a client library.
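As a note to self, the Parquet querying they mentioned looks something like this in ClickHouse SQL. This is a sketch from memory with a hypothetical bucket URL and column names, so the details want checking against the docs:

```sql
-- Query Parquet files sitting in S3 directly, without importing them first.
-- The s3() table function takes a URL (globs allowed) and a format name.
SELECT station_id, avg(temperature)
FROM s3('https://example-bucket.s3.amazonaws.com/data/*.parquet', 'Parquet')
GROUP BY station_id;
```

If that holds up in an experiment, it could replace the hand-rolled connection pool entirely, or feed a one-off import into a proper ClickHouse table.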
We also chatted to the people on the Wiz stand, mostly because we’d heard they had recently been acquired for an ungodly amount of money but didn’t really know much more than that. I’m not sure how much need or budget we have for compliance and security scanning, but they didn’t look completely horrified when we said “non-profit” or “scientific research” so perhaps they have pricing options that might be compatible with what we do.
Poster - Enhancing Research and Data Delivery With the Data Delivery System (DDS)
Álvaro Revuelta, SciLifeLab Data Centre & Valentin Georgiev, Uppsala University
This was one of the things I thought would be the closest thing to the work we do in providing scientific data to other researchers. It turned out that it wasn’t really the same thing, more a tool for short-term sharing of datasets with known collaborators who have requested it specifically, rather than publishing ongoing data publicly for anyone to access.
An Introduction to Capture The Flag
Andy Martin & Kevin Ward, ControlPlane
Having successfully found the right room this time, I caught a short introduction from the team running the CTF at KubeCon this year, in which they provided an overview of how it worked and one hint, followed by some background music for a roomful of people trying to break into an imaginary Kubernetes cluster in a scenario used at a previous conference. I initially felt like I wasn’t making much headway, and was aware of the time ticking away until the next talk I wanted to go to, before I suddenly found the first flag just before I had to pack up and go back downstairs. I guess that’s how these things often go. I learned a lot more about HashiCorp Vault than I was expecting to, and I look forward to having a bash at the “real thing” tomorrow.
The Life (or Death) of a Kubernetes Request, 2025 Edition
Abu Kashem, Red Hat Inc. & Stefan Schimanski, Upbound
This talk was framed as the answer to a hypothetical interview question of “what happens when you create a new resource with kubectl apply -f job.yaml?”. It gave a good tour of what happens inside the request handler in the apiserver, mostly covering the various validations, timeouts and audit logs that are added, as well as what “creating a new resource” actually entails in the registry and etcd. There were a lot of details that I’m unlikely to remember, but it’s almost certainly useful to have a sense of what’s going on in there, as well as some trivia like the differences between kinds and resources and what is going on with different apiVersions.
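For my own memory, the rough shape of the path as I noted it down (a simplified sketch; the real handler chain has many more filters, and the names here are descriptive rather than the actual function names):

```
kubectl apply -f job.yaml
  └─> apiserver handler chain:
        timeout filter -> authentication -> audit logging
        -> authorization -> admission (mutating, then validating)
        -> registry strategy (defaulting + schema validation)
        -> write to etcd -> watch events notify controllers
```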
Flux Ecosystem Evolution
Stefan Prodan, ControlPlane & Sanskar Jaiswal, Kong
Again I missed the start of this talk, having accidentally walked to the wrong end of the conference centre to find the room. Luckily, I don’t think I missed too much and was able to figure out that Flagger is a system for doing canary rollouts that we are unlikely to ever use at our scale. It’s not something I’ve ever looked into in detail, and while I’m sure it’s obvious to people who do do this sort of thing, the idea of progressively rolling a new version out automatically as long as the metrics look good is not something I’d considered before.
The main thing I was interested in from this talk was Flux, something we definitely do use. There were a lot of exciting-sounding new features discussed, mostly enabled by the Flux Operator. Ephemeral environments for PRs/MRs are something I’ve thought about before for when we are reviewing changes, and it seems like these should be fairly straightforward to set up with the operator. It also makes Flux component upgrades a lot easier than re-running the bootstrap to update the component manifests in the git repository; even the presenter said the old way was scary and made it easy to blow up your own cluster!
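From a quick look afterwards, the easier upgrade story seems to hinge on the operator’s FluxInstance custom resource, which manages the component manifests for you. A sketch as I understand it from the docs (field names worth double-checking before relying on this):

```yaml
apiVersion: fluxcd.controlplane.io/v1
kind: FluxInstance
metadata:
  name: flux
  namespace: flux-system
spec:
  distribution:
    # "2.x" tracks the latest stable Flux minor release, so upgrades no
    # longer mean re-running bootstrap and committing regenerated manifests
    version: "2.x"
    registry: "ghcr.io/fluxcd"
  components:
    - source-controller
    - kustomize-controller
    - helm-controller
    - notification-controller
```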
The Ultimate Container Challenge: An Interactive Trivia Game on OCI, Podman, Docker…
Aurélie Vache, OVHCloud & Sherine Khoury, Red Hat
It was definitely by now time for the “fun” talks, starting with this interactive quiz about Docker and OCI containers. For a moment, before the questions got too hard, I made it onto the top-5 leaderboard, but then we got onto the things I’d actually come along to learn about. It was a good format to have the audience answer a question, then reveal the answer with a live demo to explain it further. I was also very impressed with whatever technology they were using to handle typing the demo commands into the terminal, as it was clearly doing the work live but also seemed to grab the hashes out of the command output for use in later commands.
Museum of Weird Bugs: Our Favorites From 8 Years of Service Mesh Debugging
Alex Leong, Buoyant
This one was a bit more of a punt, as I didn’t really know what a service mesh was or how you’d debug one, but I always enjoy hearing war stories about this sort of thing. The morals of the two bugs presented are just about relevant to some of the things I do, and are probably general enough to think about: make sure you aren’t calling blocking functions in places where blocking would lead to deadlock or denial of service to clients, and be careful with different versions of CRDs. I’d also not heard of HTTP/2 flow control before, which is something good to be aware of before I encounter some weird bug caused by it in the future.
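The first of those morals is easy to demonstrate outside a service mesh. This is my own sketch (not from the talk) of how a single synchronous call starves every other task sharing an async event loop, and how handing it to a worker thread avoids that:

```python
import asyncio
import time

def blocking_io():
    time.sleep(0.2)  # stands in for any blocking call (disk, DNS, a held lock...)

async def heartbeat(ticks):
    # Represents the other clients being served: should tick roughly every 10ms.
    for _ in range(10):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)

async def bad():
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)   # let the heartbeat start ticking
    blocking_io()            # blocks the event loop: no ticks for ~0.2s
    await hb
    return ticks

async def good():
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    # Same blocking call, but run in a thread so the loop stays responsive.
    await asyncio.get_running_loop().run_in_executor(None, blocking_io)
    await hb
    return ticks

def max_gap(ticks):
    return max(b - a for a, b in zip(ticks, ticks[1:]))

bad_ticks = asyncio.run(bad())
good_ticks = asyncio.run(good())
print(f"worst gap when blocking the loop: {max_gap(bad_ticks):.3f}s")
print(f"worst gap when using a thread:    {max_gap(good_ticks):.3f}s")
```

The “bad” run shows one heartbeat gap of roughly the full 0.2 seconds, which in a proxy data plane translates directly into stalled client requests.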
Clash Loop Back Off
The final session of the day was a fun game-show-type event, which challenged two Kubernetes experts to solve a problem (from a shortlist of three, without knowing which would be picked) competitively live on stage. They had to provision a cluster, install a stateful workload, back it up, delete it and then restore from the backup, all within 25 minutes while being entertaining on stage. I had grabbed some dinner to eat while it was going on, and I really should have cashed in my first beer token as it was very light-hearted. A fun way to finish off the day.
That was my first day at KubeCon. I feel like I made it through more talks than I was expecting: I often find that regardless of how interesting the content of a talk is, unless the speaker is extremely engaging (by which I really just mean upbeat and hyperactive) it is hard not to drift off while sitting and listening. Perhaps having a notebook and pen to hand, even if I’m not compulsively taking full notes, wards off that sort of drowsiness, or maybe I just had enough coffee to keep me going.
Tomorrow we’ll be there in good time to see all the morning keynotes. The container quiz I saw towards the end was in the main auditorium, which was frankly outrageously large. If it’s close to being full then that will be far too many people in one place. Now, time to press publish and get some sleep…