Shayan Saha

Software Engineer • Site Reliability Engineer (SRE)

Summary

Software Engineer with a strong foundation in Site Reliability Engineering (SRE), specializing in observability for large-scale, business-critical applications. Experienced in designing, implementing, and operating end-to-end observability solutions that provide deep visibility into system performance, reliability, and user experience. Proven ability to optimize application health through metrics, logs, traces, and proactive alerting, enabling faster incident detection and resolution.

Skilled in building custom automation and reliability tooling, conducting scalability and resilience testing, and applying DevOps best practices to improve operational efficiency. Proficient in Python and JavaScript, with hands-on experience leveraging observability platforms such as New Relic, Datadog, and ThousandEyes, as well as chaos engineering tools including Chaos Monkey and Gremlin. Demonstrated success in improving system resilience, reducing service disruptions, and supporting data-driven operational decisions.

Experience

Software Engineer (SRE)
Tata Consultancy Services
Aug 2024 – Present
  • Configured infrastructure monitoring for 90% of hosts across 96 critical business applications, significantly improving operational visibility.
  • Built custom New Relic analytics dashboards for 96 key applications, improving access to performance data and team efficiency.
  • Implemented synthetic monitoring for 150+ business scenarios, reducing downtime by 90% and strengthening reliability.
  • Developed Python-based agents to stream granular API performance metrics into New Relic.
  • Designed and shipped end-to-end automation tools that cut manual operational effort by 95%.

Analyst Programmer
Wipro Technologies
Sep 2020 – Jul 2024
  • Led scalability testing for Login and Money Transfer systems at Charles Schwab, improving performance and stability by 40%.
  • Executed Python-driven chaos experiments, improving the robustness of Charles Schwab systems by 30%.
  • Implemented New Relic-based observability for 150+ critical HPE applications, cutting downtime by 93%.
  • Tuned monitoring agents on 2,000+ on-prem hosts at HPE to ensure full-stack infrastructure visibility.
  • Developed synthetic scripts for 329 HPE business-critical scenarios, reducing downtime by 85%.
  • Led design and rollout of a Global Command Center to unify monitoring tools and processes for 100+ applications.
  • Contributed to performance engineering for the on-prem-to-cloud migration of 78 Enbridge applications, reducing latency by 25%.

Skills

Programming Languages
Python · Golang · Rust
Web & Backend
Flask · Express · Node.js · HTML · CSS · JavaScript
Observability & APM
New Relic · Datadog · ThousandEyes
DevOps
Jenkins · Bamboo · GitLab · GitHub · Git · Docker · Terraform · Ansible
Databases
MySQL · MongoDB
Performance Testing
LoadRunner · JMeter · BlazeMeter
Software Proficiencies
Microsoft Word / Google Docs · Microsoft Excel / Google Sheets · Microsoft PowerPoint / Google Slides · Jira · Confluence · Notion · Visual Studio Code · Visual Studio · PyCharm

Education

National Institute of Technology Agartala
Bachelor of Technology · Civil Engineering
GPA: 7.8 · Graduated 2020
Tripura Board of Secondary Education
Higher Secondary (12th)
Percentage: 76.8% · 2016
Tripura Board of Secondary Education
Secondary / Madhyamik (10th)
Percentage: 86.43% · 2014

Certifications

Full-Stack Observability Practitioner · New Relic
Chaos Engineering Practitioner · Gremlin
Chaos Engineering Professional · Gremlin

Contact

Open to conversations around SRE, observability, reliability engineering, and performance engineering.

Best way to reach out: shayansaha.con@gmail.com or via LinkedIn.