Senior Site Reliability Engineer (Remote First)
Zensurance
This job is no longer accepting applications
Responsibilities:
- Work with the product teams to assess departmental scalability, reliability, observability, and security requirements, and implement solutions as needed.
- Guide and mentor development teams as they plan and execute development of team-specific microservice architecture.
- Explore new SRE tech and best practices (e.g., PagerDuty).
- Review and manage access to our tools and infrastructure.
- Manage and continuously improve CI/CD systems.
- Manage backup and recovery systems and processes.
- Review and add tools to accomplish business needs.
- Participate in team discussions and decision-making processes.
- Be available and participate in an on-call rotation schedule.
- Perform other duties as assigned.
Requirements:
- University degree or college diploma in a recognized technical, vocational or academic program (preferably in Engineering or Computer Science) or equivalent work experience.
- 5+ years of experience in an SRE, DevOps, or related position.
- Experience setting up and educating development teams in observability processes and best practices.
- Experience with a cloud service provider (AWS preferred) and CI/CD systems (GitHub Actions preferred).
- Experience with monitoring and logging systems (AWS CloudWatch and Datadog preferred) and alerting systems (Sentry preferred).
- Experience with distributed systems, ensuring that services meet scalability, reliability, and uptime goals through strategies such as redundancy, failover, and monitoring.
- Experience in infrastructure and application security (e.g., Docker, dependencies, AWS IAM, Security Hub, Systems Manager, Audit Manager).
- Experience in health and performance monitoring of internal infrastructure as well as third-party dependencies such as Salesforce and SAP (Datadog and Sentry preferred).
- Experience in release engineering (e.g., CI/CD, Kubernetes).
- Experience in backup and recovery scenarios (e.g., Salesforce outage, Git reversion, Mongo restore).
- Ability to communicate effectively and work collaboratively.
- A commitment to continuous improvement, continuous learning and knowledge sharing.
Nice to have:
- Prior experience in the insurance industry.
- AWS and Datadog certifications, such as DevOps Engineer, Solutions Architect, or SysOps Administrator.