Senior Site Reliability Engineer

Located in San Francisco, United States

December 13th, 2017

Slack Operations is looking for Senior Site Reliability Engineers to improve the reliability of our systems as we rapidly scale our product and organization.

We build our app using reliable tools that our team knows and trusts, including PHP, MySQL, and Linux. Expertise in these areas is a huge plus, but familiarity with other common web languages (such as Python or Ruby) and other relational databases is a fine substitute. We’re a collaborative team who genuinely enjoys working together to make Slack a better product. We are looking for engineers who understand that simplicity and reliability are aspects of a system to be carefully calculated with every decision made.

This position is on Slack's Reliability Engineering team, which is already well established in San Francisco. In addition to software engineering, this role requires managing a large and varied infrastructure as it grows. The position also includes building out the appropriate tools to make your own job easier.

If you were to join Slack, here are the kinds of things you would do over the course of a typical week:

  • Design and develop new highly-available infrastructure to meet the needs of our growing and evolving product.
  • Whiteboard a fix to a scaling problem -- and then make it happen.
  • Join a development team, on a rotation, to help them reduce service latency and increase availability.
  • Talk with our frontend team to debug a performance problem with an API method.
  • Help our skilled support team triage and solve bugs.
  • Participate in the operations on-call rotation, triaging and addressing production issues as they arise.
  • Contribute to internal tools that help us improve our operations processes, manage our infrastructure, and scale our systems.

Here are things that we consider critical to being a Senior Site Reliability Engineer:

  • You have curiosity about how things work.
  • You've been developing and operating web sites professionally and can point to things you’ve worked on.
  • You have experience with functional or imperative programming languages -- e.g., PHP, Python, Ruby, Go, C, or Java (used without frameworks).
  • You've deployed server software on Linux, and then operated it at scale (and debugged it too).
  • You are able to analyze and optimize performance in high-traffic internet applications.
  • You are a strong communicator. Explaining complex technical concepts to designers, support, and other engineers is no problem for you.
  • You enjoy helping onboard new team members, mentoring, and teaching others.
  • You know how the web works. You know HTTP and TCP/IP and what a good API looks like.
  • You possess strong computer science fundamentals: data structures, algorithms, programming languages, distributed systems, and information retrieval.


  • Professional experience in web application engineering, a large portion of which was in a team environment.
  • Bachelor's degree in Computer Science, Engineering or related field, or equivalent training, fellowship, or work experience.

Bonus Points:

  • Experience using PHP without a framework.
  • Solid competency with SQL (ideally in a federated database environment; MySQL a plus).
  • Experience using deployment automation/configuration management, especially Chef.
  • Experience with virtualized environments (AWS experience a plus).
  • Prior experience with or knowledge of large scale, high volume systems.
  • Experience in a startup environment.

About Slack

At Slack, we’re building the platform that connects teams with the apps, services, and resources they need to get work done. Launched in 2014, Slack is the fastest-growing business application in history.
