KeyStep

Lead Software Engineer, Reliability

Klaviyo
Dublin, Ireland
3 days ago
Full-time, Engineering

Skills & Technologies

Python, Go, Software Engineering, Quantitative, APIs, SRE, Scalability, Cloud, Strategy, Klaviyo, Implementation, Make, AI, Automation, Documentation, Decision Making, Leadership, Capacity Planning, Sustainability

Job Description

At Klaviyo, we value the unique backgrounds, experiences and perspectives each Klaviyo (we call ourselves Klaviyos) brings to our workplace each and every day. We believe everyone deserves a fair shot at success and appreciate the experiences each person brings beyond the traditional job requirements. If you’re a close but not exact match with the description, we hope you’ll still consider applying. Want to learn more about life at Klaviyo? Visit klaviyo.com/careers to see how we empower creators to own their own destiny.

Lead Software Engineer, Reliability (Dublin)

Team Overview

As a Lead Software Engineer, Reliability, you will set technical direction and lead reliability strategy for Klaviyo’s most critical platforms. You’ll ensure our systems are reliable, scalable, and sustainable while enabling rapid product development across the company.

We treat reliability as a core product feature. Our work spans security, infrastructure, and software engineering, requiring deep systems thinking and strong technical leadership. We build foundational services that must be extremely reliable, secure, and performant at global scale.

The SRE team’s charter is to design, build, and operate foundational infrastructure and services, define reliability standards, reduce operational toil through automation, and continuously improve systems based on production learnings. As a lead, your work will be highly visible and will directly influence how Klaviyo builds software and how customers experience our platform every day.

How you’ll make an impact

As a Lead Software Engineer, Reliability, you will provide technical leadership while remaining hands-on with the systems that underpin Klaviyo’s reliability and operational excellence. You will:

Set the technical vision and long-term strategy for reliability, availability, and operational excellence across critical platforms

Lead the design, implementation, and evolution of foundational, security-critical services with strong guarantees around availability, scalability, latency, and fault tolerance

Drive adoption of SRE best practices across engineering teams, including SLIs, SLOs, error budgets, and reliability-based decision making

Identify systemic reliability risks and architectural bottlenecks, and lead cross-team initiatives to address them with durable, preventative solutions

Apply software engineering principles to automate infrastructure, eliminate operational toil, and improve system reliability at scale

Own and continuously improve observability, alerting, and incident response practices to reduce mean time to detection and recovery

Guide on-call strategy and operational processes to ensure sustainability, automation, and healthy operational load

Perform and lead quantitative analysis around system behavior, capacity planning, scaling limits, and performance characteristics

Partner closely with product, platform, and security leaders to influence system architecture early and ensure reliability is built in from the start

Lead incident response for high-severity events, driving effective mitigation, communication, and follow-up

Mentor senior and mid-level engineers, raising the bar for technical quality, operational maturity, and reliability culture across the organization

Review and influence technical designs, platform APIs, operational runbooks, and system documentation at an organizational level

You’ve already experimented with AI in work or personal projects, and you’re excited to dive in and learn fast. You’re hungry to responsibly explore new AI tools and workflows, finding ways to make your work smarter and more efficient.

Who you are

You are a senior technical leader who combines deep systems expertise with strong judgment and influence. You:

Are a cloud-native, platform-focused SRE who uses software to design and operate highly reliable production systems at scale

Write and maintain production-quality code (e.g. Python, Go, or similar) to build internal pl

Company & Role Analysis

Likely perks
Private Medical, Pension, 25+ Days Holiday, Stock Options, Learning Budget, Flexible Hours
Market salary range

£45,000 – £60,000 (Glassdoor, Levels.fyi, 2025)
