footer: Oliver Leaver-Smith // SBG TechEdge // 2019-08-22 // ols.wtf // @heyitsols
# [fit] You Did What?
^ My name is ols, and I work as a Senior Devops Engineer in the Core Tribe at Sky Betting and Gaming.
![filtered](/Users/ole09/Desktop/Screenshot\ 2019-08-06\ at\ 13.21.20.png)
![filtered](/Users/ole09/Desktop/Screenshot\ 2019-08-06\ at\ 13.20.20.png)
^ In Core, we look after many business-critical applications such as customer onboarding, login, deposits, withdrawals, and safer gambling tools. I'm going to tell you the tale of a band of brave knights who boldly went where many sensibly did not tread, and learned why choosing the wrong tool for the job can sometimes be the right thing to do.
^ Once upon a time, in the kingdom of SBG, lived a small tribe called Core. Squad after squad after squad of engineers, all working on their applications, all deploying their apps in mostly the same way, with help from a powerful wizard called Jenkins and, for the purposes of continuing a metaphor, the chef called Ruby. The estate was Virtual Machines as far as the eye could see. Here is how we deployed a change.
# Introducing `app`
It doesn't do anything, but it's a nice example of how it all hangs together in theory
# Merge to release
# Build a tag
^ Pretty self-explanatory. A pull request is raised from feature/bugfix to release, and this gets pushed after review. There is then a (sometimes) automated build of the app, which spits out a tag at the end.
# What's on the box?
# ls -l /local/app/code/
lrwxrwxrwx. 1 deploy deploy 25 Jul 15 09:31 current -> /local/app/code/v3.0.6
lrwxrwxrwx. 1 deploy deploy 25 Jul 15 09:31 previous -> /local/app/code/v3.0.5
lrwxrwxrwx. 1 deploy deploy 6 Jul 15 09:29 revision-to-deploy -> v3.0.6
drwxr-xr-x. 6 deploy deploy 4096 Jul 2 10:45 v3.0.4
drwxr-xr-x. 6 deploy deploy 4096 Jul 12 14:04 v3.0.5
drwxr-xr-x. 6 deploy deploy 4096 Jul 15 09:30 v3.0.6
^ This is what lives on each server in the `/local/app/code` directory. Each versioned directory holds the application code that can be run. To deploy, we symlink `revision-to-deploy` to the tag, run Chef to pull the tag from git, then symlink `current` to the new tag and `previous` to the previous tag.
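The symlink rotation described in the notes can be sketched roughly as below. This is a minimal illustration, not the actual Chef recipe: the `promote` and `point_symlink` helpers are hypothetical, the tag directory is assumed to have been pulled already, and a POSIX filesystem is assumed.

```python
import os
import tempfile


def point_symlink(link_path, target):
    """Repoint link_path at target by creating a temp link and renaming it over."""
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    # rename(2) replaces the symlink itself rather than following it.
    os.replace(tmp, link_path)


def promote(code_dir, new_tag):
    """Rotate the links: previous takes the old current, current takes new_tag."""
    current = os.path.join(code_dir, "current")
    previous = os.path.join(code_dir, "previous")
    # Record which tag should be deployed (a relative link, as in the listing).
    point_symlink(os.path.join(code_dir, "revision-to-deploy"), new_tag)
    # In the talk, Chef pulls the tag from git at this point; this sketch
    # assumes the directory already exists.
    if os.path.islink(current):
        point_symlink(previous, os.readlink(current))
    point_symlink(current, os.path.join(code_dir, new_tag))


if __name__ == "__main__":
    code_dir = tempfile.mkdtemp()
    for tag in ("v3.0.5", "v3.0.6"):
        os.mkdir(os.path.join(code_dir, tag))
        promote(code_dir, tag)
    print(os.readlink(os.path.join(code_dir, "current")))
    print(os.readlink(os.path.join(code_dir, "previous")))
```

Because `previous` always holds the last `current`, a rollback is just the same rotation pointed back at the older tag.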
# 12 factors, what?
# ls /local/app/etc/
# ls /local/app/logs/
app.log app.nginx.access.log app.nginx.error.log
^ Config is stored in the `/local/app/etc` directory. This is also controlled by Chef.
This oft-used naming convention has made it out of the deployment pipeline and can be found everywhere.
From `ngctl` for managing Nagios downtime and acknowledgements, to `fdctl` for managing Slack integrations for running Chaos Engineering Drills.
^ Restart the apps using a custom `ctl` command. This will stop whatever is running and then start `current`
^ It was a little rough around the edges, but it worked well.
^ Then one day King CTO and his court of Heads of Tech decreed that henceforth we would be working to a cloud-native, container-first approach.
^ Now the good people of Core weren't used to this approach, this way of working, and so they were scared. How could they symlink a local file on disk if there was no disk? How could they log on to a box to read logs if there was no box to log on to? Side note: we have a centralised logging solution, but it's nice to be able to see the logs for just one server on the console like our ancestors did before us.
^ We'd heard rumours of a magical Kubernetes cluster being built by a crack team north of the Core Tribe, but no one was sure whether it was ready to handle the amount of traffic we wanted to throw at it
# 1000s of requests per second
^ For context, just on our login stack alone we scale to, and often hit, 1000s of requests per second
^ We needed a way for developers to write their applications with a cloud-native approach, without having to then unpick all that work to get the application live. And we were under no illusions: all this hard work would be turned off once we had the opportunity to move to Kubernetes.
# What do we need?
* Handle the release and control of _n_ containers
* Handle `stdout` logs
* Handle application metrics
* Handle TLS-termination
* Handle load balancing
^ If we want to make it as seamless as possible for developers to onboard to (and move away from) this platform, we need to make it as simple as possible. That means allowing them to just write their logs to `stdout` and we handle that. It means allowing them to create a metrics endpoint that we take responsibility for polling and passing to a time series database. They should just need to do the pretty dashboards.
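From the developer's side, "just write your logs to `stdout`" might look something like the sketch below. The JSON line format, field names, and the `build_logger` helper are assumptions for illustration; the talk does not specify what the applications actually emit.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(name, stream=None):
    """Return a logger that writes JSON lines to stdout (or the given stream)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    # Avoid duplicate lines via the root logger.
    logger.propagate = False
    return logger


if __name__ == "__main__":
    build_logger("app").info("login succeeded")
```

The application never opens a log file or knows where its logs end up; whatever collects the container's `stdout` takes it from there.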
# `docker` of course
Not swarm, mesos, or anything like that
We didn't need anything fancy
# `stdout` onwards
We used `fluentd`
Integrated with existing Elastic stack
^ We threw the logs onto a Kafka topic which was then ingested by our Elastic stack and displayed in Kibana. This was a sidecar docker container
# Application metrics
We went through a few iterations of this requirement
`stasty` and `cAdvisor` were used; we even considered running an additional `collectd` container to gather metrics, alongside the `collectd` daemon that currently runs on our hosts
Eventually settled on `moby`
^ This was also a sidecar docker container
# TLS and load balancing
We went with `traefik`
It's cloudy and containery, perfect for this task
^ The app shouldn't have to worry about TLS, and it certainly shouldn't be responsible for balancing load. This was _also_ a sidecar container
# Frankenstein's monster
What had we created?
^ This collection of containers runs on virtual machines, similar to how our existing applications do. Because it's what we know well, it is managed by Chef.
# More importantly
What do we call it?
^ It started out as the tactical container platform, but that is boring. A few names were bandied about, including the terms "artisanal container orchestration" and "if Kelham Island did docker". But given we in Core were using this as a stepping stone to Kubernetes, we settled on...
# How do we deploy a container?
# ls -l /local/code/
lrwxrwxrwx. 1 deploy deploy 25 Jul 15 09:31 current -> /local/code/2019-07-15-r5
lrwxrwxrwx. 1 deploy deploy 25 Jul 15 09:31 previous -> /local/code/2019-07-12-r2
lrwxrwxrwx. 1 deploy deploy 6 Jul 15 09:29 revision-to-deploy -> 2019-07-15-r5
drwxr-xr-x. 6 deploy deploy 4096 Jul 2 10:45 2019-07-02-r14
drwxr-xr-x. 6 deploy deploy 4096 Jul 12 14:04 2019-07-12-r2
drwxr-xr-x. 6 deploy deploy 4096 Jul 15 09:30 2019-07-15-r5
^ The same merge/build job happens, except this time the "tag" is actually a tagged container that has been built and pushed to a central repository. On our VMs, you see we've changed from /local/app/code to just /local/code, and the symlinks still exist. They aren't directories of code anymore though, just a reference to a tag of the containerised application. Chef pulls down the container.
The contents of `/local/app/etc/` now live in `/opt/app.env`
^ Using an environment file which gets pulled in to the container when it's built, rather than a config directory. Chef populates this file.
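A rough sketch of what moving from a config directory to an environment file means for the application. It assumes the file holds plain `KEY=VALUE` lines with `#` comments; the variable names `APP_PORT` and `APP_LOG_LEVEL` and their defaults are hypothetical, not taken from the real `/opt/app.env`.

```python
import os


def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            env[key.strip()] = value
    return env


def load_config(environ=os.environ):
    """Read settings from the environment (names and defaults are hypothetical)."""
    return {
        "port": int(environ.get("APP_PORT", "8080")),
        "log_level": environ.get("APP_LOG_LEVEL", "info"),
    }


if __name__ == "__main__":
    sample = "# written by Chef\nAPP_PORT=9000\nAPP_LOG_LEVEL=debug\n"
    print(load_config(parse_env_file(sample)))
```

The application only ever calls something like `load_config()`; whether the values arrive from a Chef-managed file or from another platform later is not its concern.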
# Dare I ask about the ctl script...?
`dockerctl` of course! This runs different docker commands based on the control command it receives (stop, start, restart, status, etc.)
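The talk does not show the internals of `dockerctl`, so the following is only a guess at its shape: a table mapping each control command to the docker invocations it would run. The container name, image repository, and the choice to remove the container on stop are all assumptions.

```python
import os

# Hypothetical values: none of these names come from the real dockerctl.
CONTAINER = "app"
IMAGE = "registry.example.com/app"
ENV_FILE = "/opt/app.env"


def current_tag(code_dir="/local/code"):
    """Resolve the tag that the `current` symlink points at."""
    return os.path.basename(os.readlink(os.path.join(code_dir, "current")))


def docker_commands(action, tag):
    """Return the docker invocations (as argv lists) for one control command."""
    stop = [["docker", "stop", CONTAINER], ["docker", "rm", CONTAINER]]
    start = [["docker", "run", "--detach", "--name", CONTAINER,
              "--env-file", ENV_FILE, f"{IMAGE}:{tag}"]]
    table = {
        "stop": stop,
        "start": start,
        "restart": stop + start,
        "status": [["docker", "ps", "--filter", f"name={CONTAINER}"]],
    }
    if action not in table:
        raise ValueError(f"unknown command: {action}")
    return table[action]


if __name__ == "__main__":
    # Print rather than execute, so the sketch is safe to run anywhere.
    for argv in docker_commands("restart", "2019-07-15-r5"):
        print(" ".join(argv))
```

A real script would hand each argv list to something like `subprocess.run`, with `current_tag()` supplying the tag so that `restart` always starts whatever `current` points at.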
# The big win here?
Nothing. Has. Changed
^ Because we were using the same deploy/current/previous symlink concept, and using a ctl command that matched the behaviour of other apps, we could easily integrate this into existing deployment workflows without much additional work.
# What did this give us?
* Developers are thinking cloud-first
    * containers vs. apps
    * environment variables vs. config files
    * `stdout` vs. log files
* Ops are thinking cloud-first
    * Alternative log aggregation platforms
    * New and interesting ways of monitoring application performance
# The proof of the pudding
## New login service
* Built on our tactical docker platform
* Running in production
* We're so close to running public traffic through Kubernetes, I can smell it
* With *minimal* resource needed from the squad
^ The team managing the shared Kubernetes platform is ready for us to migrate our applications over. The developers in that squad have had to do *zero* work to support this. Minimal resource means things like "can you add an elastic-index field to the application logs please?"
^ So in conclusion. Is it demoralising that we built a half-baked wacky container orchestration platform that's just going to get decommissioned and forgotten? Kinda, but we always knew that was going to be the case. We should be, and are, proud that we have engineered a culture shift that is changing the way that people in the tribe architect and create their applications.
# [fit] Questions? [^1]
Slides/notes at https://ols.wtf/talks/you-did-what
[^1]: Not including statements, boasting, or requests to fix your problem