Enterprise AI Infra Monitoring & Automation
Cloud-native observability for Kubernetes with centralized logging and proactive alerting. Automated provisioning cut configuration effort by 45%, and auto-remediation reduced incident response time by 38%.
I build resilient, scalable cloud platforms and reliable infrastructure for GenAI workloads — with Kubernetes, AWS and Terraform. Site Reliability Engineer based in Southfield, Michigan.


I'm an AI Infrastructure and Site Reliability Engineer with 3+ years of experience designing and operating large-scale distributed systems, cloud-native platforms and Kubernetes-based AI/ML infrastructure. At Coinbase I work on AI infrastructure reliability; previously at CirrusLabs I built GenAI and cloud-optimization infrastructure.
Day to day I work with Kubernetes, AWS and Terraform — automating infrastructure as code and designing systems that stay up under pressure. I hold a Master's in Data Analytics and a Bachelor's in Information Technology, and I'm the founder of Hamuzair.
My core infrastructure toolkit and the path that got me here.
Coinbase · Full-time
CirrusLabs · Full-time
Hamuzair
Indiana Wesleyan University
Shadan College of Engineering and Technology
Designing and operating scalable, secure cloud platforms on AWS — built for performance across multiple regions.
Observability, incident response and automation that keep production systems fast, stable and highly available.
Container orchestration, autoscaling and zero-downtime deployments with Kubernetes for resilient workloads.
Reliable, optimized infrastructure for GenAI and ML workloads — from inference pipelines to cost optimization.
Representative work across cloud reliability, GenAI infrastructure and automation. Filter by focus area.
Cloud-native observability for Kubernetes with centralized logging and proactive alerting. Automated provisioning cut configuration effort by 45%, and auto-remediation reduced incident response time by 38%.
Production-grade EKS clusters with autoscaling and zero-downtime rollouts; reduced provisioning time by 35% and improved deployment reliability.
GenAI-driven system that detects idle resources and optimizes compute, improving efficiency by 25% with real-time CloudWatch monitoring and automated dashboards.
Automated model-serving and deployment pipelines for AI/ML workloads, with rollout automation, monitoring and feature-store concepts.
Secure, automated CI/CD that cut manual deployment effort by 45%, with DevSecOps compliance checks built directly into the pipeline.
Reusable Infrastructure-as-Code templates provisioning multi-account AWS infrastructure, reducing provisioning time by 50%.
"It's great getting connected with you. The website you created helped a lot to boost up my profile."

"We increased the functionality of our website dramatically while cutting costs. It's far easier to use and maintain — we couldn't be happier."

"Great experience. Helped build a professional, reliable setup and always happy to assist. Would highly recommend — 10/10."

Have a project, role or idea in mind? Drop me a message.