Description
We’re Salesforce, the Customer Company, inspiring the future of business with AI + Data + CRM. Leading with our core values, we help companies across every industry blaze new trails and connect with customers in a whole new way. And we empower you to be a Trailblazer, too: driving your performance and career growth, charting new paths, and improving the state of the world. If you believe in business as the greatest platform for change and in companies doing well and doing good, you’ve come to the right place.
Our Availability Engineering teams are responsible for driving ‘best in class’ availability through the software development process. You will work with delivery teams deploying customer-facing software across a multi-substrate engineering platform that collectively ships hundreds of features to production for tens of millions of users across all industries every day. Our users count on our applications and platforms to be highly reliable, lightning fast, supremely secure, and to preserve all of their customizations and integrations every time we ship.
Job Details
(Lead/Principal/Architect) Software Engineer - Availability Engineering
Our Availability Engineering teams are responsible for driving ‘best in class’ availability. You will work with delivery teams deploying customer-facing and supporting software across a multi-substrate engineering platform that collectively ships hundreds of features to production for tens of millions of users across all industries every day. You will need deep experience with concurrency and large-scale systems, proficiency in solving real-world data management challenges, a strong understanding of how to craft highly available solutions, and a proven ability to design, develop, and optimize core back-end systems.
What you’ll be doing:
As part of a specialist unit focused on availability and resilience, you will embed with delivery teams in a lead capacity, creating bandwidth for and prioritizing corrective and proactive availability measures.
You will contribute to designing, developing, debugging, and operating resilient applications and platforms deployed on distributed systems that run on thousands of compute nodes across multiple data centers.
You will champion resiliency standard processes: observability tool integration, horizontal/vertical sizing and auto-scaling, release rollback and recovery workflows, and integration tests and validation procedures for applications running on self-hosted infrastructure as well as public cloud platforms such as AWS, GCP, Azure, and Alibaba.
Using and contributing to open source technology (Spinnaker, Zookeeper, etc.)
Developing and demonstrating Infrastructure-as-Code using Terraform
Building and integrating with APIs and microservices deployed on containerization frameworks such as Kubernetes, Docker, Mesos, etc.
Resolving complex technical issues and driving innovations that improve system availability, resilience, and performance
Balancing live runtime management, feature delivery, and retirement of technical debt
Participate in the team’s on-call rotation to address complex problems in real-time and keep services operational and highly available
Required Skills:
15+ years of hands-on software development experience
5+ years in a Tech Lead, Principal, or Architect capacity
Ability to reverse engineer solutions via independent code and architecture review, and to envision, define, and then see through the delivery of availability improvement refactoring projects
Mastery of one or more object-oriented languages such as Java, Golang, APEX, or Python
Deep experience working with core web technologies: HTTP, JSON, REST, XML
Proficiency with databases including Oracle or other relational and/or NoSQL solutions
Experience owning and operating multiple instances of a critical service
Experience running critical infrastructure services, including monitoring, alerting, logging, tracing, and reporting
Domain expertise in service ownership standard processes, SLO/SLI/SLA definition, and driving proactive operational awareness, plus experience with Incident and Problem management
Thorough knowledge of Agile development methodology, with experience in both Test-Driven and Behavior-Driven Development practices