Overview
Reports to: Chief Technology Officer
Description:
The Senior IT Security & Operations Manager is responsible for keeping the internal and production systems of LearnVest humming. The ideal candidate loves to play with computers, pore over yards of log files, and automate tasks via shell and Perl scripting; is cool with carrying a pager 24/7 and responding to production emergencies off-hours and on weekends; and has a keen eye for troubleshooting, smart ideas for “how to avoid this in the future,” and, in general, a passion for building and managing complex infrastructures. You’re a benevolent hacker at heart, and any issue that cannot be resolved becomes an ego problem. Linux (and, in general, *nix) systems, networks, firewalls, kernel parameters, databases, I/O pipes, memory/CPU histograms and sockets are good friends that you’ve tamed to the point that they do what you want them to. Finally, you don’t see yourself as ‘king of the systems’ but rather as the benevolent guru of the command line, happily sharing your boxes with developers who are worth their salt and care about performance and uptime as much as you do.
Responsibilities
Maintain the integrity of IT infrastructure – both internal and production (in the data center and the cloud)
Configure firewalls, load balancers and other production equipment
Script, automate and troubleshoot processes and systems
Design, configure and report on server and systems monitoring, backups, and utilization
Perform daily system monitoring for load, availability, and capacity
Perform regular security monitoring – be the point person for security for the enterprise
Diagnose and troubleshoot system failures quickly and produce post-mortems
Deliver reports on system uptime, health, and issues to senior management
Plan new build-outs and infrastructure as LearnVest scales
Production deployment experience with TCP, NTP, SNMP, SMTP, ARP, HTTP, BIND, SSH, SSL
Maintain and support existing applications
Develop automated approaches for system administration tasks
Apply software upgrades and patches, and add storage and processor capacity as necessary
Actively tune servers to handle maximum load and deliver maximum throughput
Document all processes, procedures, and passwords
Experience
5+ years of experience in a 24/7, high-traffic production web environment based on open source technologies
Deep experience with Nginx, Apache, Tomcat, and MySQL administration
Skills
Linux and/or BSD Unix expert with strong debugging skills
Ability to build a box from parts and configure a system in the cloud
Strong understanding of best practices for software engineering, secure systems design and scalable, fault-tolerant web architectures
Systems programming, performance optimization, and network design experience a must
Ability to program fluently in one or more scripting languages – Perl, Shell, and/or Python preferred
Working knowledge of PHP, Java, MySQL, Perl, and shell scripting
Expertise in TCP/IP networking, router and switch hardware
Ability to juggle multiple tasks in a fast-paced work environment
Ability to work both independently and within a team
Comfortable in a startup environment with flexible work schedules and 24/7 availability
Preferred: Experience in a startup environment
Preferred: Experience with Amazon EC2 configuration and administration, and with cloud deployments in general
Preferred: Experience with clustering
Education
BS in Computer Science or related field