Trouble starting an SRE team?

Jul 26, 2023

It’s not you, it’s them!

6 Comments

Jul 27, 2023

It's also related to scale. The Google-style approach to SRE makes the amount of work needed to run a reliable service scale with service complexity, not service size. It's 10x as many different binaries that hurts, not 10x as many servers. Old-school manual system administration scales with service size, so it's not so much the 10x as many different binaries that hurts, it's the 10x as many servers to run them on.

Companies in their early stages have high service complexity relative to scale, so burning dedicated headcount trying to automate how it's run is more effort than just running it manually. At some point you reach a scale vs complexity tipping point when SRE makes more sense.

Jos Visser

Jul 28, 2023

Scale is definitely important and factors into the "is it worth it" equation. As you point out: Automating a manual process that is not that much work to begin with is not worth it. https://xkcd.com/1319/ comes to mind.

@everyone: You should _always_ listen to Donovan. He was one of the people who interviewed me at Google back in 2006 :-)

Cody Ray

Aug 5, 2023

nice, I literally used that xkcd yesterday when someone was talking about automating a one-time manual task.

Zhe Yao

Jul 27, 2023

I probably missed or misinterpreted something...

I read the first graph as "Google style SRE shines when the complexity-to-scale ratio is high".

The second graph mentioned "many startups do have this ratio high".

Doesn't that mean Google style SRE should be used rather than avoided in startups?

Donovan Baarda

Jul 28, 2023 (Edited)

You misread it... Work scaling with complexity means that when the complexity doubles or quadruples (i.e., 2x as many binaries and 2^2 = 4x as many binary interactions), the work doubles or quadruples, but the amount of work doesn't change if you double the scale. So a high complexity-to-scale ratio is bad for SRE. This is why one of the things SRE likes to do is simplify and standardize services: to reduce the complexity.
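
The arithmetic in the comment above can be sketched as a toy model. This is only an illustration, not the commenter's own formula: the function names `sre_work` and `manual_work` are hypothetical, and treating SRE effort as proportional to the square of the number of binaries is an assumption taken from the "2^2 = 4x as many binary interactions" remark.

```python
def sre_work(binaries: int, servers: int) -> int:
    """Toy model (assumption): SRE-style effort tracks complexity only.

    Effort is taken as proportional to the number of binary-to-binary
    interactions, approximated as binaries ** 2. The server count is
    deliberately ignored.
    """
    return binaries ** 2


def manual_work(binaries: int, servers: int) -> int:
    """Toy model (assumption): manual sysadmin effort tracks server count."""
    return servers


# Hypothetical services, given as (binaries, servers).
scenarios = {
    "baseline": (10, 100),
    "2x binaries": (20, 100),
    "2x servers": (10, 200),
}

for label, (binaries, servers) in scenarios.items():
    print(
        f"{label}: sre_work={sre_work(binaries, servers)}, "
        f"manual_work={manual_work(binaries, servers)}"
    )
```

In this sketch, doubling the binaries quadruples `sre_work` and leaves `manual_work` unchanged, while doubling the servers does the opposite: `manual_work` doubles and `sre_work` stays flat.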

Grzegorz Wierzowiecki

Aug 21, 2023

Here's a 3 min audio version of "Trouble starting an SRE team?" from Wednesday Wisdom converted using recast app.

https://app.letsrecast.ai/r/a5b0aae3-bbc9-48d2-9b36-39596f5ae587
