Headquarters: New York
URL: https://shiphero.com
Hello. We are ShipHero (https://shiphero.com). We have built a software platform trusted by hundreds of eCommerce companies, large and small, to run their operations, and we continue to grow. About US$5 billion of eCommerce orders are shipped a year via ShipHero. Our customers sell on Shopify, Amazon, Etsy, eBay, WooCommerce, BigCommerce, and many other platforms. We’re driven to help our customers grow their businesses by providing a platform that solves complex problems and is engineered to be reliable and fast. We are obsessed with building great technology that is beautiful, easy to use, and loved by our customers. Our culture also reflects our ethos and belief that by bringing passionate, talented, and great people together, you can do great things.
Our team is fully remote, and the company has always been remote. We communicate regularly using video chat and Slack and put a strong emphasis on asynchronous work so people have large chunks of uninterrupted time to focus and do deep work.
We are seeking an experienced Site Reliability Engineer to join our growing team. We are looking for someone with a recent track record of building and maintaining complex infrastructure within AWS (Amazon Web Services). You would be a fundamental team member, focusing on building a solid foundation for the platform. We seek excited, driven people who want to keep growing by working with talented engineers and helping others improve.
About You
- You understand that great things are accomplished when people and teams work together.
- You feel comfortable owning processes and tools for deploying to production and scaling.
- You understand modern web architectures and tiers.
- You have a solid understanding of security best practices.
- You take pride in your craft.
- You have made (a lot of) mistakes and, most importantly, have learned from them.
- You are comfortable with, and even enjoy, mentoring others in different skill sets.
- You have worked on medium and large projects that have gone to production and lived there for a while.
- You have a passion for automating, developing, and improving complex workflows.
- You have strong scripting skills.
Tech Knowledge
We are looking for 6+ years of production experience with AWS and:
- Aurora RDS (MySQL), Lambda, S3, SQS (Simple Queue Service).
- Practical experience with infrastructure and application monitoring (we use Sentry, Honeycomb, and CloudWatch).
- Comfortable debugging running applications for memory leaks and CPU usage, especially under Apache, mod_wsgi, Nginx, and Gunicorn.
- Broad knowledge of AWS cloud security (AWS Inspector, GuardDuty, WAF, and Security Hub) and infrastructure-as-code.
- The skills to write infrastructure-as-code and automate routine activities.
- A record of working with distributed teams across an organization to achieve goals.
- Python (preferably 3.6+).
- Terraform, including authoring modules.
- Docker and building images, including multi-stage builds with secrets.
- CI/CD automation (we use GitHub Actions, AWS CodeBuild, and CodePipeline).
The Role
- Provide hands-on configuration, setup, and maintenance of our development and production environments.
- Collaborate with other teams on monitoring and debugging solutions.
- Develop, automate, and operate our cloud infrastructure platform.
- Respond to incidents, ensuring the restoration of services when required.
- Contribute to the team's backlog of activities.
- Be part of on-call support.
- Automate yourself and others out of everyday tasks.
- Estimate effort and ship on an agreed schedule. Be comfortable pushing yourself and your team members when challenges pop up.
- Learn and push those around you to do the same; this is a craft that you're constantly improving upon.
- Implement pragmatic solutions to get the platform built.
- Have the confidence to work with experienced and talented people to build great things; you are not a 'rock star' but a team player who takes the initiative.
We want everyone to be self-sufficient, and we firmly believe that how we collaborate and communicate is of significant importance. Here is a glimpse of how we roll: https://shiphero.com/careers/communication-guidelines/
To apply: https://weworkremotely.com/remote-jobs/shiphero-senior-site-reliability-engineer