Cloud, DevOps, and Site Reliability Engineer
I design, operate, and troubleshoot large-scale infrastructure platforms with a focus on reliability, scalability, automation, and security.
I work primarily with cloud-native and distributed systems and have been deeply involved in production environments spanning AWS, Kubernetes, OpenStack, Linux, networking, and observability, including incident handling, capacity planning, and platform improvements.
This blog is where I document practical learnings, deep dives, and real operational scenarios rather than theoretical explanations. The goal is to share knowledge that actually helps engineers in day-to-day work.
"What would I want to read if I were debugging this at 3 AM?"
Most of my experience comes from telecom-grade and enterprise environments, where availability, correctness, and predictability matter more than experimentation.
I specialize in building and operating infrastructure across the full lifecycle:
AWS (EKS, EC2, S3, IAM, VPC networking, security)
Kubernetes, Helm, Ingress controllers, production-scale platforms
OpenStack, Ceph storage (OSD, PGs, recovery, failure handling)
Terraform, CloudFormation, automated provisioning
Linux (RHEL, Ubuntu), systemd, performance tuning
VPC design, routing, load balancers, DNS, TLS
Prometheus, metrics exporters, monitoring, alerting
Python, Bash, operational automation
CI/CD, reliability engineering, incident response
I created cloudinfrasre.in to cover:
Practical AWS, networking, and infrastructure patterns from production
Real-world automation, IaC, and CI/CD lessons
Incident response, reliability engineering, and observability
Kubernetes, OpenStack, and large-scale operations
If you like practical explanations, command-level detail, and real failure scenarios, you're in the right place.
Whether you're debugging a production issue or designing a new platform, I hope these articles provide the practical guidance you need.