- Published: December 12, 2023
Description
About Atera
Atera is inventing a new way of managing IT end-to-end for IT professionals and teams worldwide.
By creating an AI-powered IT platform, Atera's all-in-one Remote Monitoring and Management (RMM), Helpdesk, Ticketing, and Reporting solution helps more than 23,000 IT pros achieve 10X operational efficiency, cut down time-to-resolution, and deliver better outcomes faster. Located in the heart of Tel Aviv, our team of passionate, like-minded individuals is driven by a shared mission to unleash everyone's potential and constantly innovate. We create an open, transparent, and supportive environment that gives our teams the autonomy, resources, and freedom to thrive.
This is a full-time and onsite (hybrid-remote) role at our Tel Aviv office.
Atera is looking for a motivated senior site reliability engineer to join us and build the framework that lets our engineering operations scale.
Responsibilities:
- Build tools and automation to monitor system health, performance, and reliability, ensuring quick detection and resolution of any anomalies or issues.
- Write high-quality infrastructure-as-code that automates provisioning, deployment, and scaling, and build effective monitoring, alerting, and logging solutions.
- Work with other engineers to ensure that new services are well-designed, properly monitored, and have well-defined SLIs and achievable SLOs.
- Maintain runbooks for manual tasks and replace those runbooks with automation whenever possible.
- Proactively track our capacity, quotas, and other performance limits to plan for growth.
- Participate in a 24x7 on-call rotation to handle product availability issues as well as urgent customer support escalations.
- Investigate and resolve incidents and outages, performing root cause analysis to identify systemic issues and implement preventive measures.
- Develop and maintain disaster recovery plans and perform regular testing to ensure data integrity and business continuity.
Requirements:
- 3+ years of experience as an SRE in large-scale production environments.
- Strong experience in designing, implementing, and managing monitoring processes.
- Experience in at least one scripting language (Python, Ruby, Perl, Bash) and infrastructure as code technologies (e.g., Terraform, CloudFormation).
- Strong abilities to lead, design, and execute cross-organization projects.
- Experience in managing container and infrastructure orchestration tools (e.g., Kubernetes, Terraform).
- Hands-on experience administering public clouds.
- Experience with CI/CD pipelines for applications and microservices.
- Excellent English communication skills.
Advantages:
- Knowledge of advanced monitoring and observability tools beyond basic logging and alerting.
- Experience with tools like Prometheus, Grafana, ELK stack, or similar.
- Previous experience as a DevOps Engineer is a big plus.
A bit about our benefits
Atera is highly collaborative and, yes, fun! To support you at work (and play), we offer some fantastic perks: ample time to learn from your teammates and contemporaries, time off to relax and recharge, community volunteer days, an annual budget to support your learning & growth, a company-paid trip, and lots more.