It's also related to scale. The Google-style approach to SRE makes the amount of work needed to run a reliable service scale with service complexity, not service size. It's 10x as many different binaries that hurts, not 10x as many servers. Old-school manual system administration scales with service size, so it's not so much the 10x as many different binaries that hurts, it's the 10x as many servers to run them on.
Companies in their early stages have high service complexity relative to scale, so burning dedicated headcount on automating how the service is run is more effort than just running it manually. At some point you reach a scale-vs-complexity tipping point where SRE makes more sense.
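A rough way to picture that tipping point is a toy cost model: manual toil grows with the number of servers, SRE effort grows with the number of distinct binaries. The function names and the per-server and per-binary hour constants below are made up for illustration, not measurements from anyone's team.

```python
# Toy cost model. All constants are hypothetical, chosen only to show the shape.
# Manual sysadmin work grows with the number of servers;
# Google-style SRE work grows with the number of distinct binaries.

def manual_work(servers: int, hours_per_server: float = 1.0) -> float:
    """Ops hours per week when every server is tended by hand."""
    return servers * hours_per_server

def sre_work(binaries: int, hours_per_binary: float = 20.0) -> float:
    """Ops hours per week when each distinct binary is automated once;
    adding servers is assumed to cost (approximately) nothing extra."""
    return binaries * hours_per_binary

def sre_pays_off(servers: int, binaries: int) -> bool:
    """True once manual toil would exceed the cost of the SRE approach."""
    return manual_work(servers) > sre_work(binaries)

# Same complexity (5 binaries), growing scale.
for servers in (10, 100, 1000):
    print(servers, sre_pays_off(servers, binaries=5))
```

With these made-up numbers, only the 1000-server case tips over; raise the binary count (complexity) and the tipping point moves further out.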
Scale is definitely important and factors into the "is it worth it" equation. As you point out, automating a manual process that is not that much work to begin with is not worth it. https://xkcd.com/1319/ comes to mind.
@everyone: You should _always_ listen to Donovan. He was one of the people who interviewed me at Google back in 2006 :-)
nice, I literally used that xkcd yesterday when someone was talking about automating a one-time manual task.
I probably missed or misinterpreted something...
I read the first graph as "Google style SRE shines when the complexity-to-scale ratio is high".
The second graph mentioned "many startups do have this ratio high".
Doesn't that mean Google style SRE should be used rather than avoided in startups?
:)
You misread it... work scaling with complexity means that when the complexity doubles or quadruples (i.e., 2x as many binaries and 2^2 = 4x as many binary interactions), the work doubles or quadruples, but the amount of work doesn't change if you double the scale. So a high complexity-to-scale ratio is bad for SRE. This is why one of the things SRE likes to do is simplify and standardize services: to reduce the complexity.
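The "2x binaries, 4x interactions" arithmetic can be checked with a quick sketch, assuming (as a simplification) that any pair of binaries might interact, so interactions are counted as unordered pairs. The function name is hypothetical.

```python
from math import comb

def pairwise_interactions(binaries: int) -> int:
    """Upper bound on distinct binary-to-binary interactions,
    assuming every pair of binaries may talk to each other."""
    return comb(binaries, 2)

# Doubling the number of binaries roughly quadruples the interactions.
for n in (10, 20, 40):
    print(n, pairwise_interactions(n))  # 10 45, 20 190, 40 780
```

Note that the number of servers never appears: running each binary on 10x as many machines leaves these counts unchanged.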
Here's a 3-minute audio version of "Trouble starting an SRE team?" from Wednesday Wisdom, converted using the Recast app.
https://app.letsrecast.ai/r/a5b0aae3-bbc9-48d2-9b36-39596f5ae587