Summary
Software Engineer with a strong foundation in Site Reliability Engineering (SRE), specializing in observability for large-scale, business-critical applications. Experienced in designing, implementing, and operating end-to-end observability solutions that provide deep visibility into system performance, reliability, and user experience. Proven ability to optimize application health through metrics, logs, traces, and proactive alerting, enabling faster incident detection and resolution.
Skilled in building custom automation and reliability tooling, conducting scalability and resilience testing, and applying DevOps best practices to improve operational efficiency. Proficient in Python and JavaScript, with hands-on experience leveraging observability platforms such as New Relic, Datadog, and ThousandEyes, as well as chaos engineering tools including Chaos Monkey and Gremlin. Demonstrated success in improving system resilience, reducing service disruptions, and supporting data-driven operational decisions.
Experience
Software Engineer (SRE)
Tata Consultancy Services
Aug 2024 – Present
- Configured infrastructure monitoring for 90% of hosts across 96 critical business applications, significantly improving operational visibility.
- Built custom New Relic analytics dashboards for 96 key applications, improving access to performance data and team efficiency.
- Implemented synthetic monitoring for 150+ business scenarios, reducing downtime by 90% and strengthening reliability.
- Developed Python-based agents to stream granular API performance metrics into New Relic.
- Designed and shipped end-to-end automation tools that cut manual operational effort by 95%.
Analyst Programmer
Wipro Technologies
Sep 2020 – Jul 2024
- Led scalability testing for Login and Money Transfer systems at Charles Schwab, improving performance and stability by 40%.
- Executed Python-driven chaos experiments, improving robustness of Charles Schwab systems by 30%.
- Implemented New Relic-based observability for 150+ critical HPE applications, cutting downtime by 93%.
- Tuned agents for 2000+ on-prem hosts at HPE to ensure full-stack infrastructure visibility.
- Developed synthetic scripts for 329 HPE business-critical scenarios, reducing downtime by 85%.
- Led design and rollout of a Global Command Center to unify monitoring tools and processes for 100+ applications.
- Contributed to performance engineering for the on-prem to cloud migration of 78 Enbridge applications, reducing latency by 25%.
Skills
Programming Languages
Python
Golang
Rust
Web & Backend
Flask
Express
Node.js
HTML
CSS
JavaScript
Observability & APM
New Relic
Datadog
ThousandEyes
DevOps
Jenkins
Bamboo
GitLab
GitHub
Git
Docker
Terraform
Ansible
Databases
MySQL
MongoDB
Performance Testing
LoadRunner
JMeter
BlazeMeter
Software Proficiencies
Microsoft Word / Google Docs
Microsoft Excel / Google Sheets
Microsoft PowerPoint / Google Slides
Jira
Confluence
Notion
Visual Studio Code
Visual Studio
PyCharm
Education
National Institute of Technology Agartala
Bachelor of Technology · Civil Engineering
Tripura Board of Secondary Education
Higher Secondary (12th)
Tripura Board of Secondary Education
Secondary / Madhyamik (10th)
Certifications
Contact
Open to conversations around SRE, observability, reliability engineering, and performance engineering.
Best way to reach out: shayansaha.con@gmail.com or via LinkedIn.