Site Reliability Engineering is sometimes described as the practice of treating operations as a software problem. It involves automating everything to achieve scale and resiliency while minimizing manual intervention. As organizations grow and mature, systems become more complex and reaction times get slower. Expectations change from "keep us up during peak hours" to "how many nines?" seemingly overnight. By the end of this talk, you'll know what an SRE team looks like, what my experience has been like, and how your organization might benefit from a dedicated SRE team.