Interview with Tim Armandpour, Chief Technology Officer at PagerDuty

Automation and Advances in ML and AI Massively Reduce Workloads for IT and DevOps Teams


devmio: Thanks for taking the time to answer our questions! First of all, could you please introduce yourself to our readers and tell us more about your work.

Tim Armandpour: I’m Tim Armandpour, Chief Technology Officer at PagerDuty. I’m responsible for leading the company's long-term technology strategy for the PagerDuty Operations Cloud, architecture vision, security, and technology research to support corporate development. A major challenge for businesses today is how to rapidly transition to being digital first, while retaining operational resiliency and supporting customer experience. I’m excited about the opportunity to address this challenge through innovation and through finding the best possible solutions for customers.

devmio: All companies are affected by downtime. This affects revenue, trust, and the bottom line. Where can AI help with downtime? How would that work?

Tim Armandpour: GenAI promises to be the missing link between IT and customer success, enabling stakeholder communications and process automation for faster resolution. AI and automation can support operational and customer service teams and alleviate crowded legacy technology stacks to minimise downtime. AI can also help manage the risks from complex IT, cybersecurity threats, and human error as firms use new technology.

Setting automation as the first line of defence means humans no longer have to spend time analysing events to determine what is important. This frees up engineering time to work on value-added initiatives. It also reduces downtime and customer impact because incidents are identified and triaged faster. Automation also supports healthier working environments, reducing staff burnout as responders no longer have to pivot between tools, sift through noise, and troubleshoot irrelevant incidents before they can do what matters most.

If AI is deployed correctly, it can reduce downtime. According to IDC, more than 53% of organisations say an hour of downtime on a revenue-generating service costs a minimum of $100,000.
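
Editor's illustration: the arithmetic behind a per-hour downtime figure can be sketched in a few lines of Python. The $100,000 per hour default is the minimum cited above; the function name and the incident durations are hypothetical and chosen only to show the calculation.

```python
def downtime_cost(minutes: float, cost_per_hour: float = 100_000) -> float:
    """Estimate the cost of an outage, assuming cost scales linearly with time."""
    return cost_per_hour * minutes / 60


# Hypothetical incident durations, in minutes (not taken from the interview).
slow_response = downtime_cost(45)
fast_response = downtime_cost(15)

print(f"45-minute outage: ${slow_response:,.0f}")
print(f"15-minute outage: ${fast_response:,.0f}")
print(f"Saved by faster triage: ${slow_response - fast_response:,.0f}")
```

Real costs rarely scale linearly with time, so this is a rough lower-bound estimate rather than a forecast.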

devmio: In what ways can implementing AI solutions improve productivity and create better capacity? Where do you see the greatest benefit?

Tim Armandpour: Automation and advances in machine learning and AI techniques massively reduce the workload for IT and DevOps teams. Teams are burdened by huge data lakes and the manual work required to sift through system noise. They are looking to real-time AIOps and intelligent automation to prioritise work and keep digital operations functioning smoothly. With AI, teams can readily transition to DevOps best practices and proactively manage their unplanned work. AIOps can also correlate signals and reduce noise to help teams stay focussed during incidents for faster resolution.
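
Editor's illustration: the idea of correlating signals to reduce noise can be sketched in Python. This is a minimal, hypothetical example and not PagerDuty's AIOps implementation: the `Alert` type, the `correlate` function, and the five-minute window are assumptions chosen for illustration. It uses a simple time-window rule rather than machine learning.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    service: str      # which service raised the alert
    timestamp: float  # seconds since some reference point
    message: str


def correlate(alerts, window_seconds=300.0):
    """Group alerts from the same service that arrive close together in time.

    An alert joins the most recent group for its service if it arrives within
    window_seconds of that group's latest alert; otherwise it starts a new group.
    """
    groups = []
    latest_group = {}  # service name -> most recent group for that service
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        group = latest_group.get(alert.service)
        if group is not None and alert.timestamp - group[-1].timestamp <= window_seconds:
            group.append(alert)
        else:
            group = [alert]
            groups.append(group)
            latest_group[alert.service] = group
    return groups


alerts = [
    Alert("db", 0, "connection pool exhausted"),
    Alert("db", 60, "query latency high"),
    Alert("web", 100, "5xx rate elevated"),
    Alert("db", 1000, "replica lag"),
]
groups = correlate(alerts)
print(f"{len(alerts)} alerts collapsed into {len(groups)} groups")
```

Responders would then be notified once per group instead of once per alert, which is the noise reduction the answer describes.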

AI can create the automation rules or scripts that reduce toil on developers and engineers. By leveraging AI to make sense of streams of data coming from a variety of monitoring tools, teams can return services to normal faster. Another AI solution is automated diagnostics. When an incident that requires human intervention happens, engineers need context to understand the cause of an issue. Event-driven automated diagnostics proactively collects relevant information from systems and services and saves valuable time.

Teams can use event-driven automation to remediate redundant issues that don’t need developer involvement. Event-driven automation can span from real-time event ingestion, through complex multi-step logic, to the action being taken. Well-defined automation workflows and processes lead to greater stability and business resilience. Organisations can leverage modern AI technologies (e.g., ChatGPT) to generate automation workflows. AI-generated automation workflows accurately determine the right path to resolution and act as a level zero responder.
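
Editor's illustration: the flow from event ingestion, through rule logic, to an action can be sketched as follows. Everything here is hypothetical and for illustration only: the event fields, the rule list, and the `restart_service` and `clear_temp_files` actions are assumptions, not part of any PagerDuty API. Events that no rule matches are escalated to a human.

```python
def restart_service(event):
    # Placeholder action: a real system would call an orchestration tool here.
    return f"restarted {event['service']}"


def clear_temp_files(event):
    # Placeholder action for a well-understood, low-risk remediation.
    return f"cleared temp files on {event['host']}"


# Each rule pairs a condition with the action to run when it matches.
RULES = [
    (lambda e: e["type"] == "process_down", restart_service),
    (lambda e: e["type"] == "disk_full" and e.get("path") == "/tmp", clear_temp_files),
]


def handle_event(event, rules=RULES):
    """Run the first matching automated action, or escalate to a human."""
    for matches, action in rules:
        if matches(event):
            return {"handled_by": "automation", "result": action(event)}
    return {"handled_by": "human", "result": "escalated to on-call responder"}


print(handle_event({"type": "process_down", "service": "checkout"}))
print(handle_event({"type": "disk_full", "path": "/tmp", "host": "web-1"}))
print(handle_event({"type": "data_corruption", "service": "orders"}))
```

Keeping the rules as plain data makes it straightforward to review them, and to review any rules an AI model proposes before they are allowed to act.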

Furthermore, some companies designate as many as three responders to handle stakeholder communication. AI-Generated Status Updates can significantly expedite this process by making it easier for a responder to review updates. This reduces the number of responders required, saving costs and helping workforces scale.

devmio: Do you have any real world examples or case studies to share regarding how AI helps prevent downtime?

Tim Armandpour: The travel company TUI has grown through acquisitions, which resulted in separate IT functions using different technologies and operating processes. This led to siloed teams and paper-based processes that were difficult to manage, particularly when TUI wanted to consolidate everything into a single IT operation and unified technology stack.

Using AI, PagerDuty Operations Cloud provided TUI with a single source of real-time truth, resulting in quicker response times. If TUI already knows a scenario, PagerDuty learns and responds by executing automated scripts to recover from service disruptions.

Consequently, the time to recover from an incident with automated recovery can now be 90% quicker. Using automated scripts to recover from service disruptions saves millions of dollars in business impact.

devmio: How will AI interact with cloud computing in particular?

Tim Armandpour: Automation orchestrations from the cloud offer value by removing engineering toil from management and routine incident response. This enhances the working experience, reducing burnout and helping overall business value. Without cloud-backed automation, many enterprises will struggle to use automation as a precursor to an AIOps approach. This is where automation takes on routine tasks, with AIOps identifying system performance and reliability improvements and recommending actions.

devmio: Are there any reasons to hold back on using AI in the enterprise? What are the concerns, and are there any tasks that should avoid automation?

Tim Armandpour: Most automation wasn’t designed for the environment it now runs in. Engineers still find themselves manually running operations which separate sensitive networks from public access. Teams lack visibility across an entire IT infrastructure. Process automation, linked to smart use of AI support with AIOps, is the answer, lending visibility, security, and automation management to proceedings.

AI and automation must also be particularly sensitive to creative industries. The recent AI “win” by Hollywood writers will spur more groups to protect themselves from the threat of GenAI taking over or impinging on creative and copyrighted material. Media pundits will continue to warn of its dangers and governments will remain behind in their attempts to regulate the technology, while leaning on the large language model (LLM) providers to guide the process by shaping the future.

devmio: Do you have any advice for developers looking to expand their knowledge about automation and AI? Where should they be focusing their efforts right now?

Tim Armandpour: Major investments in GenAI (Amazon/Anthropic, Microsoft/OpenAI, Databricks/MosaicML) show that data science will be key. Also needed are skills pertaining to new areas in natural language processing, machine learning, data engineering, data visualisation and strategic data analysis. Education suited for GenAI includes proficiency in prompt engineering. This is learning how to ask questions to extract meaningful responses from AI. Prompt engineers must also collaborate with cross-disciplinary teams, fulfilling consulting and quality control roles. Developers should therefore also focus on improving their soft skills to help them communicate with multidisciplinary human teams.

devmio: And lastly, do you have any tools that you would like to suggest for AI solutions? What has worked for you and your teams?

Tim Armandpour: PagerDuty AIOps reduces manual work, complexity, and noise. The PagerDuty Operations Cloud Fall 2023 release includes event orchestration, runbook automation and updates around GenAI use cases, helping organisations cut operating costs, accelerate innovation, and mitigate the risk of operational failures.

Event-driven automation and intelligent remediation enable organisations to build intelligent automation that helps inform other tools and processes for faster, more targeted incident response that can be standardised across an organisation.

The Runbook Automation Add-On lets customers who want to automate actions triggered by AIOps, incident responders, or customer service representatives use the capabilities of both Automation Actions and Runbook Automation under a single SKU.

Combining the Runbook Automation Add-On with AIOps helps resolve incidents 95% faster by allowing repetitive tasks to be delegated to incident responders while also freeing up specialists’ time. With the Runbook Automation Add-On, customers can also reduce planned downtime by 85% and support costs by 55%. In addition, AI-Generated Incident Postmortems enable teams to save hours of effort.