~ols/docs

b916460dc187cefd6ef92d3ed5be1be9da511c51 — Oliver Leaver-Smith 2 years ago
Initial commit with first talk and readme
2 files changed, 171 insertions(+), 0 deletions(-)

A README.md
A talks/get-paid-to-sleep.md
A  => README.md +1 -0
@@ 1,1 @@
Papers, talks, and speaker notes

A  => talks/get-paid-to-sleep.md +170 -0
@@ 1,170 @@
footer: Oliver Leaver-Smith // {{EVENT}} // {{DATE}}
build-lists: true
slidenumbers: true

![](https://static.independent.co.uk/s3fs-public/thumbnails/image/2016/02/02/13/asleep-at-desk-REX.jpg)
# [fit] Get Paid to Sleep

---

[.hide-footer]
![](https://i.redd.it/9uszrza1tt001.jpg)
# It's not a pyramid scheme

^ This is not a pyramid scheme, which I'm aware is exactly what someone selling you a pyramid scheme would say. I'm going to talk about not only how to get more sleep, but how to do it while you are being paid

---

[.hide-footer]
![](https://nothinginbiology.files.wordpress.com/2015/10/doc.jpg)
# First off, the science bit

---

# Extended work availability and its relation with start-of-day mood and cortisol[^1]

> The results demonstrate that nonwork hours during which employees are required to remain available for work cannot be considered leisure time because employees' control over their activities is constrained and their recovery from work is restricted.

[^1]: https://www.ncbi.nlm.nih.gov/pubmed/26236956

^ In fact, there have been many studies into the effects of being "available for work" outside of normal work hours. This study costs $12, so grab a copy. It reinforces what everyone who has done an on-call shift knows: it's very hard to switch off.

---

# Me?

* 8 years in *"the industry"*
* 3 of those spent calling people out
* 5 of those spent on call
* As often as one in two
* From several calls per night, to none for weeks

---

[.hide-footer]
![filtered](/Users/ole09/Downloads/34293_410229663873_2887552_n.jpg)
![filtered](/Users/ole09/Downloads/69300_451492223873_5500681_n.jpg)
![filtered](/Users/ole09/Downloads/26889_379291843873_1322566_n.jpg)

# #publife

^ Before that I managed a bar, so I didn't get much sleep there either

---

[.hide-footer]
![filtered](/Users/ole09/Downloads/IMG_9265.JPG)

# #dadlife

---

[.hide-footer]
![filtered](https://www.saga.co.uk/contentlibrary/saga/publishing/verticals/health-and-wellbeing/conditions/doctor-roche/tired-eyes-shutterstock_126608381-1280x960.jpg)

# tired af

---

# Three Simple Things

* ~~Stable~~ ~~Reliable~~ Predictable systems
* Sane alerting and monitoring
* Luck

---

# Predictable Systems

I've previously given a talk on this, titled *Engineered Chaos: Breaking Prod and Getting Away With it*, and written a *blog post*[^2] with a lot of further reading recommendations

[^2]: https://engineering.skybettingandgaming.com/2018/05/04/firedrills-in-core/

---

# Predictable Systems

This is a presentation on its own

* *tl;dr* You don't want your systems to go down, but if they do then they need to do so in a sensible way
* Queues, retries, failover pools, etc.
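
A minimal sketch of what "retries" and "failover pools" could look like in Python. The name `call_with_failover`, its parameters, and the backoff timings are assumptions made up for illustration; they are not taken from any system mentioned in this talk.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(
    backends: Sequence[Callable[[], T]],
    attempts_per_backend: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each backend in order, retrying each one with exponential backoff."""
    last_error: Optional[Exception] = None
    for backend in backends:
        for attempt in range(attempts_per_backend):
            try:
                return backend()
            except Exception as exc:  # a real system would catch narrower errors
                last_error = exc
                # Wait before retrying the same backend: base, 2x base, 4x base...
                if attempt < attempts_per_backend - 1:
                    sleep(base_delay * 2 ** attempt)
        # This backend is out of attempts, so fall through to the next in the pool.
    raise RuntimeError("all backends failed") from last_error
```

The point is that failure follows a known path: a bounded number of retries, then the next backend, then one clear error.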

---

# Alerting & monitoring

* Make sure the right people know and respond

---

The right people for the job coredb replication but also site down ask4

---

# Eager

* Middle of the night, the phone rings
* *We've got a critical alert for $coreDB*
* Oh dear, what exactly is the alert saying?
* *It says "MySQL replication lag, do not call out between 01\:00 and 05\:00"*
* It's 3am...
* *Yes but it had been there a while and I thought I'd better check*
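
If the "do not call out" window lived in the alerting tool rather than in the alert text, nobody would have to make that judgement at 3am. A minimal sketch, assuming a hypothetical `should_call_out` helper and treating the 01:00 to 05:00 window as including 01:00 and excluding 05:00 (the alert text does not say which):

```python
from datetime import time


def should_call_out(
    now: time,
    quiet_start: time = time(1, 0),
    quiet_end: time = time(5, 0),
) -> bool:
    """Return False when `now` falls inside the do-not-call-out window."""
    if quiet_start <= quiet_end:
        in_quiet = quiet_start <= now < quiet_end
    else:
        # The window wraps past midnight, e.g. 22:00 to 06:00.
        in_quiet = now >= quiet_start or now < quiet_end
    return not in_quiet
```

With the defaults, `should_call_out(time(3, 0))` is `False`, so the 3am call never happens.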

---

# Not So Eager

Working for an ISP that mainly deals with student accommodation

* Arrive at the office 8am Monday, a site has been completely down since Sunday lunch
* *"Oh must have missed that"*
* We have a 24 hour SLA with the company, it's been down 19 hours
* And it's in Aberdeen

---

# [fit] I don't blame the service 
# [fit] desk for calling out (or not)

---

[.hide-footer]
![](https://www.sciencemag.org/sites/default/files/styles/article_main_large/public/images/sn-scream.jpg)

# Time for an anecdote

---

# Monolithic 3rd Party App

* Notice that memory levels are clipping *"critical"* levels during peak
* There's quite a bit of headroom on the nodes, let's up the threshold
* Still hitting the threshold, let's ask the vendor what's safe
* The vendor pulled the thresholds out of thin air
* We set our own thresholds now :sunglasses:
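
One way to set your own thresholds is to base them on what the box actually does at peak. This is a sketch under assumptions, not how we did it: `suggest_thresholds`, the margins, and the ceiling are all hypothetical values chosen for illustration.

```python
from typing import Sequence, Tuple


def suggest_thresholds(
    peak_samples: Sequence[float],
    warn_margin: float = 5.0,
    crit_margin: float = 10.0,
    ceiling: float = 95.0,
) -> Tuple[float, float]:
    """Suggest (warning, critical) memory thresholds, as percentages."""
    if not peak_samples:
        raise ValueError("need at least one peak sample")
    observed_peak = max(peak_samples)
    # Sit a fixed margin above the busiest observed peak, capped at the ceiling.
    warning = min(observed_peak + warn_margin, ceiling)
    critical = min(observed_peak + crit_margin, ceiling)
    return warning, critical
```

For example, peak readings of 70%, 82% and 78% give a warning at 87% and a critical at 92%.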

^ This is about a third-party monolithic app that we run. As our services got busier and busier, the memory levels got higher and higher until they were often clipping critical levels. We increased the thresholds. They were still clipping critical levels, but we didn't want to go much higher, so we asked the vendor how high we could go. They didn't know: they had plucked the recommended thresholds out of thin air. So we went higher and, amazingly, there have been no crits since and the box is still stable

---

# Luck

You can have the most predictable systems in the world and alerting tuned to perfection. You still need luck to be on your side, though. And sometimes it just isn't

* It's just my luck that on my first night on call we dropped *10%* of users' broadband connections and couldn't get them back online
* It's just my luck I worked for an ISP during the TalkTalk hack, *#RIPhalloween2015*

---

# [fit] Questions?

:bird: @muggahtee
:elephant: @ols@mastodon.social
:computer: git.sr.ht/ols
:notebook_with_decorative_cover: ols.wtf
:incoming_envelope: sup@ols.wtf
:key: BD3C73DF33FF729AB4B72C0BE7BF269916503BFB