Digital Platform Ops Lead
Job Description
In this role at CelcomDigi, you will be responsible for the end-to-end stability, performance, and scalability of digital platforms that support our consumer and enterprise services. You will drive operational excellence, vendor governance, incident management and automation to ensure seamless service continuity across our digital ecosystem.
You are:
- A resilient operational leader with deep expertise in telecom platforms, uptime management and digital service infrastructure.
- Passionate about automation, efficiency and service excellence, with a mindset of continuous improvement.
- Experienced in leading large BAU and incident resolution teams, with the ability to work across IT, Product and Network divisions.
- Adept at vendor and SLA management, ensuring performance and accountability across a complex technical ecosystem.
- Calm under pressure and structured in responding to service outages or P1-level incidents, ensuring business continuity and transparency.
- Collaborative, data-driven and solution-focused, able to communicate both technically and strategically across stakeholder groups.
Responsibilities
- Lead the end-to-end operations of platform systems, including Core and Yoodo environments, ensuring uptime, availability, and seamless customer experience.
- Manage BAU operations across digital product infrastructure, driving issue resolution, backlog management and SLA compliance.
- Oversee incident management processes, ensuring timely triage, escalation and resolution of P1–P3 incidents, including escalation to Crisis One Team when required.
- Govern relationships with strategic vendors, overseeing performance dashboards, escalations, RCAs and operational meetings.
- Drive automation initiatives across monitoring, alerting, resolution and reporting systems to reduce manual work and increase responsiveness.
- Support product and service launches, ensuring readiness of activation, billing, provisioning and platform health checks.
- Collaborate with Product, IT, Network, GTM, and CXD teams to ensure operational alignment with business strategies.
- Identify and mitigate operational risks proactively, embedding resilience and scalability into platform operations.
- Deliver structured monthly operations reports and dashboards to leadership on system health, incident trends, vendor performance and risk exposure.
Requirements
- 10+ years of experience in telecom platform operations, digital service management or large-scale IT systems support.
- Proven experience managing high-availability platforms, incident response protocols and vendor performance reviews.
- Bachelor’s Degree in Telecommunications, IT or a related field. A Master’s Degree is a plus.
- Familiarity with tools such as Dynatrace, Firebase, AWS CloudWatch, Datadog, PRTG, Sentry or similar platforms for monitoring and incident management.
- Demonstrated ability to manage both technical operations and business priorities in a fast-paced environment.
- Strong people leadership skills, including rotational team planning, workload distribution, and team upskilling.
- Experience in crisis escalation models and integration with broader enterprise-level incident protocols.
- Excellent stakeholder communication and executive reporting skills.
- Passion for platform modernization, automation and a proactive ops culture.
Job Segment:
Telecom, Telecommunications, Operations, Technology