Senior Site Reliability Engineer
Viz.ai
Software Engineering
Tel Aviv-Yafo, Israel
Location: Tel Aviv, Israel
Employment Type: Full time
Location Type: Hybrid
Department: Infrastructure
About Viz.ai
Viz.ai is the leader in building and deploying AI-powered Care Pathways and helping doctors do their work. The Viz Platform is deployed in 2,000 hospitals across the United States and trusted by many of the leading life sciences companies. The platform uniquely combines real-time, multimodal clinical data with deep clinician engagement to detect disease earlier, coordinate care teams, and help ensure patients receive the right treatment faster. Viz.ai was the first company to be awarded CMS reimbursement for AI and is ranked the #1 Healthcare AI Platform by hospitals and health systems in the Black Book Research survey. For more information, visit Viz.ai.
About the role:
We are seeking a skilled Site Reliability Engineer (SRE) to join our team and help build, maintain, and improve the reliability, scalability, and performance of our systems. As an SRE, you will be responsible for owning and evolving our observability tooling, using real-time insights to make data-driven decisions about system behavior and performance at runtime, and implementing automation to enhance our infrastructure. This role involves collaborating across teams to ensure a robust and efficient technology stack supporting mission-critical systems.
You will:
Proactively enhance system reliability, scalability, and performance through automation, monitoring, and capacity planning.
Develop and maintain observability systems, including distributed tracing, logging, and metrics platforms.
Establish and maintain organizational standards for monitoring, leveraging tools like Prometheus, Grafana, and OpenTelemetry.
Use observability tools to analyze runtime behavior and make data-driven decisions that improve system performance and reliability.
Partner with development teams to integrate reliability best practices into the software development lifecycle.
Manage infrastructure at scale on cloud services (AWS experience is an advantage) and platforms such as Kubernetes.
Optimize resource utilization to reduce costs while maintaining service quality.
Contribute to the development and adoption of AI-driven tools and practices for engineering and observability.
What success looks like:
You are a trusted technical leader within the organization, mentoring others and helping shape the evolution of our SRE and observability practices.
You reduce the frequency and impact of production incidents by building resilient systems and using observability insights to address issues before they escalate.
You significantly improve observability: key metrics, logs, and traces are consistently available, well instrumented, and actionable across all critical services, enabling fast, informed decisions and rapid resolution of issues.
You are actively engaged in proactive problem solving: you identify and resolve systemic issues before they impact customers, and continuously refine SLOs and SLIs to reflect evolving business needs.
We are looking for:
At least 6 years of experience as an SRE or DevOps engineer.
Strong experience with observability tools such as OpenTelemetry, Grafana, Prometheus, and the ELK stack (Elasticsearch, Logstash, Kibana).
In-depth experience with cloud platforms, particularly AWS services such as EC2, S3, and RDS, along with CloudFormation or Terraform for infrastructure as code.
Strong experience working in Kubernetes environments, with a focus on Helm for deployment and configuration management.
Experience working with AI and LLM tools such as Cursor, Claude Code, or similar.
Proficiency in scripting and/or development languages such as Bash or Python.
Thorough understanding of CI/CD pipelines and automation tools.
Strong experience with automation tools like Terraform and/or Ansible, and understanding of Infrastructure as Code.
Solid troubleshooting and debugging skills.
A team player with a strong can-do mentality.
Why should you join us?
If you are looking to make an impact, we are mission-driven and are making a difference in people’s lives every day.
If you want to be a part of an amazing team, our people are the heart of everything we do.
If you are a self-starter and naturally motivated, our work is driven by curiosity, innovation, and team collaboration, which lets us make the most of our skills.
We are a remote-first company across the U.S. and EU, with a team in Tel Aviv operating in a flexible hybrid model, conveniently located near a train line.
Viz.ai is committed to providing highly competitive cash compensation, equity, and benefits. The compensation offered for this role will be based on multiple factors such as location, the role’s scope and complexity, and the candidate’s experience and expertise, and may vary from the range provided.
In the U.S., Viz offers competitive benefits, including medical, dental, vision, 401(k), generous vacation, and additional benefits to full-time employees. Viz.ai is an Equal Opportunity Employer and considers applicants for employment without regard to race, color, religion, sex, orientation, national origin, age, disability, genetics, or any other basis prohibited by federal, state, or local law.
Employees in Israel are offered a comprehensive benefits package, including, among others: dental insurance, performance-based bonuses, a Cibus meal allowance, meals at the office, and more.
If you’re applying for a position in San Francisco, please review the San Francisco Fair Chance Ordinance guidelines applicable in your area.