Senior/Lead OpenShift Engineer

Location: Pune

Work Experience: 10+ years of IT experience with 5+ years in OpenShift platform engineering.

Requirements:

  • Extensive hands-on experience with multi-cluster OpenShift environments.
  • Proven experience handling large-scale, mission-critical production platforms.
  • Strong troubleshooting experience across:
    • Kubernetes core components
    • OpenShift control plane
    • Networking (SDN/OVN, ingress, load balancers)
    • Persistent storage solutions
  • Experience leading critical incident management calls.
  • Strong scripting and automation skills (Bash, Python, etc.).
  • Experience with cloud platforms (AWS, Azure, GCP) and hybrid environments.
  • Deep understanding of container security and cluster hardening.
  • Experience with OpenShift upgrades across major versions (Preferred).
  • Experience with service mesh and advanced networking configurations (Preferred).
  • Knowledge of GitOps frameworks (Preferred).
  • Experience designing DR and multi-region failover strategies (Preferred).
  • Relevant certifications in OpenShift, Kubernetes, or cloud platforms (Preferred).
  • Key Competencies: Leadership under pressure, strategic thinking, strong decision-making ability, high ownership and accountability, excellent communication skills, proactive risk management mindset.

Qualifications: Bachelor’s degree in Technology, Engineering, Computer Science, Computer Applications, or equivalent work experience.

Job Description:

Platform Architecture & Cluster Management

  • Design, deploy, and manage multiple OpenShift clusters (production, DR, staging) across hybrid/cloud/on-prem environments.
  • Architect scalable, resilient, and secure OpenShift platforms supporting heavy workloads and large resource counts (thousands of pods, services, namespaces, etc.).
  • Lead cluster lifecycle management including upgrades, patching, capacity planning, and performance optimization.
  • Ensure high availability, disaster recovery readiness, and zero/near-zero downtime deployments.

Operations & Critical Incident Leadership

  • Act as technical lead during P1/P0 critical production incidents, driving war-room calls and root cause analysis.
  • Resolve cluster-wide outages, control plane failures, networking breakdowns, storage issues, and performance bottlenecks.
  • Provide leadership-level decision-making during high-impact incidents.
  • Develop and implement preventive measures and post-incident action plans.
  • Establish operational runbooks and SRE best practices.

Performance & Scale Management

  • Manage environments with high resource volume (large node counts, high pod density, heavy traffic workloads).
  • Optimize resource utilization (CPU, memory, storage, network).
  • Design cluster scaling strategies including autoscaling, node pools, and workload balancing.
  • Conduct stress testing and performance benchmarking.

Security & Compliance

  • Implement enterprise security controls (RBAC, SCC, network policies, TLS, IAM integration).
  • Manage vulnerability remediation and cluster hardening.
  • Ensure compliance with regulatory and organizational security standards.

Monitoring & Observability

  • Implement cluster monitoring and logging solutions.
  • Proactively monitor cluster health and application performance.
  • Establish SLOs, SLAs, and error budgets.

Leadership & Mentorship

  • Lead and mentor platform engineers and application teams.
  • Provide technical governance and architectural direction.
  • Collaborate with application teams, security teams, and infrastructure teams.
  • Participate in capacity planning and executive-level technical discussions.
  • Drive continuous improvement initiatives across platform engineering.