Overview
Do you get excited by taking responsibility for mission-critical infrastructure and making sure it never, ever goes down? Do you scoff at anything less than tens of millions of rows in your database? When you’re faced with a problem, is “automate it out of existence” your first response? Can you solve transaction deadlocks in your sleep? Are you pretty sure your site runs better than just about anything else out there?
If so, you might just be what we’re looking for at Couchsurfing. Our site is built by a crack team of amazing engineers who add new functionality to it on a daily basis, and we’re looking for someone with the same passion for site operations, reliability, and performance as some of our JavaScript whizzes have for UI. You’ll have free rein over every system, from databases and Memcached instances up through the entire application stack to the UI itself, with the charter to make Couchsurfing more scalable, reliable, robust, and better performing than ever.
Responsibilities
We believe that in order to make the biggest impact, our operations engineers need to be able to work on any part of the system that affects their job. As a result, the entirety of Couchsurfing will be your domain; as a member of an amazing team of operations engineers supporting a world-class team of programmers, you’ll be able to help determine the infrastructure, architecture, storage, and processing of every component of the site. We’ll ask you for input on problems large and small, listen to your hard-earned expertise, and give you full authority to go solve problems to the best of your ability.
Specific things you just might get to attack:
Scaling our site to thousands of Web requests per second;
Expanding our database storage — via MySQL and/or new tools you may come up with — to handle the huge increase in volume required;
Figuring out how to get email into the hands of millions of members, when they want it, while avoiding spam traps and other pitfalls;
Keeping our servers responding to requests in fractions of a second, optimizing throughput and latency at the same time;
Helping our engineers keep our site reliable and robust for our millions of users.
Experience
We’re looking for folks who have at least a few years’ experience running large, scalable, real-world websites; exactly what kind matters less than that it worked, all the time. We’re looking for amazing operations engineers, so whether you’re used to MySQL, PostgreSQL, Oracle, or the like is less important to us than whether you know your stuff cold. We’re also looking for folks for whom teamwork is the norm, not the exception; we communicate constantly and work together to make sure things stay up all the time.
Skills
Particular skills that we’re interested in:
Unix/Linux where you don’t even think about it;
Web tools and techniques, from HTTP to proxies to caching and so forth;
Memcached, Redis, Hadoop or MapReduce, NoSQL;
Relational databases (particular flavor isn’t important, although MySQL is a bonus);
Email/SMTP infrastructure (sending millions of messages per day when required);
Dynamic programming languages (like Python, Perl, PHP, Ruby, etc.);
Firewalls, load balancers, and all other varieties of network infrastructure.
Education
A bachelor’s degree in engineering is strongly preferred.