Software Engineer, Site Reliability Engineering
Mountain View, California
Site Reliability Engineering (SRE) is a discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Chronicle's services, both internally critical and externally visible, have reliability and uptime appropriate to users' needs and a fast rate of improvement, while keeping an ever-watchful eye on capacity and performance.
SRE is also a mindset and a set of engineering approaches to running better production systems—we build our own creative engineering solutions to operations problems. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. As SREs are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approaches to solve a broad spectrum of problems. Practices such as limiting time spent on operational work, blameless postmortems and proactive identification of potential outages factor into iterative improvement that is key to both product quality and interesting and dynamic day-to-day work.
Born from X, Alphabet's moonshot factory, Chronicle is advancing cybersecurity for enterprises of all sizes. We are dedicated to helping companies find and stop cyber attacks before they cause harm. We work with the entire security industry to give good the advantage in the fight against cybercrime. Joining experts in large-scale cloud computing, big data, machine learning, and cybersecurity, you'll help build out the next generation of security intelligence solutions.
- Design, write and deliver software to measure, monitor and improve the availability, latency and efficiency of planet-scale distributed services.
- Participate in and analyze periodic on-call duties to prevent, solve and automate the response to problems in mission-critical services and automated deployments.
- Engage in service capacity-planning and demand-forecasting, software performance analysis and system tuning.
- Scale and evolve systems sustainably by pushing for changes that improve reliability and velocity.
- Champion practices such as sustainable incident response and blameless postmortems.
- BA/BS degree in Computer Science or a related technical field, or equivalent practical experience with algorithms, data structures, complexity analysis and software design.
- 4 years of coding experience in one or more of the following programming languages: Python, Go, Java, C++.
- Experience with software performance analysis and tuning.
- Experience with Unix/Linux operating systems, tools and shell scripting.
- Experience with SQL.
- Experience with one or more of the following: virtualization, containers, cloud storage and cloud computing.
- Expertise in designing, analyzing and troubleshooting large-scale distributed systems.
- Systematic problem-solving approach, coupled with effective communication skills and a sense of ownership and drive.
- Knowledge of load balancing and of networking protocols from ICMP and TCP/IP to HTTP.