Sr. Software Engineer (Agentic Runtime)
Dialpad
About Dialpad
Dialpad is the leading AI-powered customer communications platform, transforming how businesses communicate with their customers. More than 50,000 companies around the globe — including Netflix, RE/MAX, Uber, Randstad, and Tractor Supply — rely on Dialpad to build stronger customer connections using real-time, AI-driven insights. Visit dialpad.com to learn more.
Being a Dialer
At Dialpad, you’ll be part of a collaborative team working toward our shared mission of making our customers and their employees wildly successful. We believe that every conversation matters, and we're elevating each one with a platform that drives real-time insights and automation for our customers.
We thrive on continuous evolution, where every employee leverages industry-leading AI to constantly refine our platform and our own skills. We seek individuals who not only meet our high standards but go beyond them. Our ambition is significant, and achieving it requires a team that operates at the highest level. We look for individuals who are not just ambitious but who also possess the traits that are fundamental to our success: Scrappy, Curious, Optimistic, Persistent, and Empathetic.
Your role
Dialpad’s AI Engineering organization is responsible for building and maintaining customer-facing AI features at scale across all of our cloud-native products and services. Every day, millions of users worldwide leverage our technology to communicate effectively and efficiently.
Dialpad's Agentic Runtime team owns the infrastructure and execution engine that runs AI agents at scale across Dialpad's core product modalities — including voice, messaging, video, and digital engagement. From multi-step task orchestration and tool execution to real-time context management and agent memory, our team builds the foundational platform that powers Dialpad's next-generation intelligent, autonomous experiences. Our teams are highly collaborative and comprise cross-disciplinary professionals, including Product Managers, QA Specialists, and Engineers specializing in Distributed Systems, ML Infrastructure, and Platform Engineering.
This position reports to the Engineering Manager, who is based in Kitchener, Canada. The role has the opportunity to be based in our Buenos Aires, Argentina, office.
What you’ll do
- Contribute to the design, development, and maintenance of agentic runtime systems, including agent orchestration, tool execution pipelines, and multi-step reasoning loops.
- Build and optimize core runtime components, including task planners, action dispatchers, memory managers, and context window management systems.
- Work on agent coordination techniques, including dynamic tool selection, parallel agent execution, state management, and result aggregation across multi-agent workflows.
- Maintain and enhance highly scalable agentic platforms with a focus on low-latency execution, cost efficiency, and deterministic behavior.
- Ensure high availability, reliability, and fault tolerance in agent runtime services, including graceful degradation when LLM or tool calls fail.
- Collaborate with cross-functional teams — including ML researchers, product, and platform engineers — to translate agentic product requirements into robust runtime infrastructure.
- Develop and optimize real-time distributed systems, microservices, and event-driven architectures powering agentic task execution.
- Design and implement sandboxed execution environments for safe agent use of tools, code execution, and external API calls.
- Implement and maintain monitoring, alerting, and performance metrics covering agent run success rates, token consumption, latency, and cost attribution.
- Evaluate and integrate emerging agentic frameworks, LLM APIs, and tooling ecosystems to continuously improve platform capabilities.
- Write clean, modular, and well-tested code while following best engineering practices in a rapidly evolving problem space.
- Participate in code reviews to ensure the quality, maintainability, and scalability of runtime components.
- Provide mentorship and technical guidance to junior engineers navigating the unique challenges of agentic systems.
Skills you’ll bring
- 3–6 years of experience in distributed systems, platform engineering, or ML infrastructure, with exposure to LLM-based or agentic systems strongly preferred.
- Strong understanding of agent architectures, including ReAct, plan-and-execute, and multi-agent coordination patterns.
- Deep knowledge of context management, prompt lifecycle, tool-call protocols (e.g., function calling, MCP), and agent memory strategies (short-term, episodic, and long-term).
- Experience integrating and managing external tool ecosystems, including web search, code interpreters, databases, and third-party APIs.
- Familiarity with retrieval-augmented generation (RAG) and how retrieval fits into broader agentic pipelines.
- Understanding of LLM output reliability challenges — hallucination, non-determinism, and retry/fallback strategies at runtime.
- Proficiency in Go and Python 3 (experience with Rust or TypeScript is a plus).
- Strong understanding of distributed systems, microservices, and event-driven architectures suited to long-running agent tasks.
- Passion for real-time performance optimization, including streaming responses, async execution, and parallel tool invocation.
- Experience with API design using OpenAPI, Swagger, or equivalent, with an eye toward agentic interaction patterns.
- Knowledge of gRPC or equivalent RPC protocols for inter-service communication within agent runtimes.
- Experience with Docker and Kubernetes, including managing long-running or stateful agent workloads in containerized environments.
- Familiarity with cloud platforms (GCP preferred, AWS/Azure optional), including managed services relevant to agentic workloads such as queuing, secrets management, and compute autoscaling.
- Hands-on experience with Infrastructure as Code tools like Terraform or Ansible.
- Knowledge of CI/CD frameworks and continuous delivery practices, with comfort shipping infrastructure in a fast-moving research-adjacent environment.
We believe in investing in our people. Dialpad offers competitive benefits and perks, alongside a robust training program that helps you reach your full potential. We have designed our offices to be inclusive, offering a vibrant environment to cultivate collaboration and connection. Our exceptional culture, recognized repeatedly as a certified Great Place to Work, ensures every employee feels valued and empowered to contribute to our collective success.
Don’t meet every single requirement? If you’re excited about this role and you possess the fundamental traits, the drive, and strong ambition we seek, but your experience doesn’t satisfy every qualification, we encourage you to apply.
Dialpad is an equal-opportunity employer. We are dedicated to creating an inclusive environment, free of discrimination and harassment.