When enterprise IT teams are on the front lines of a major incident, the last thing they need is to consult an org chart to find the right participants across several functional areas. Whether delivering new services or restoring services after a disruption, the core teams of the development and production squads (DevOps, IT infrastructure, operations, application support and security operations) should be identified in advance and work in concert at all times. Unfortunately, organizations have invested in service technology that, while effective at its basic functions, has only deepened the silos between IT teams. Early detection tools such as application performance monitoring (APM), network performance monitoring (NPM) and application-aware network performance management (AANPM), as well as IT monitoring tools, IT service management (ITSM) and ticketing systems, are all worthy investments. However, if these tools are not integrated in an open, flexible way that all the key stakeholders responding to an IT issue can access, the anticipated gains in efficiency, coordination and response time may never materialize.
Per Gartner: “Coordinating incident response across the organization is the biggest challenge for most enterprises”. Today’s IT challenges therefore call for a new generation of IT response solutions that use automation to guide key stakeholders efficiently through the entire end-to-end incident resolution process, from detection through full restoration, while keeping impacted customers informed. That includes the communication, collaboration and orchestration processes required to resolve IT incidents: ensuring IT responders have the right contextual information, and automating the required human and digital steps based on preference and accessibility. In today’s fast-paced, agile development environments, application, DevOps and other IT teams need to be able to design their own integrated cross-application, cross-tool workflows as quickly as possible. That requires a flexible self-service integration platform that readily shares the needed data, mixes and matches the required tools and interoperates seamlessly with associated functions ranging from patch management to security, ticketing, remediation and validation.
Unfortunately, most enterprises rely on vendor-built and vendor-supported integrations that are either “TBDs” on a vendor roadmap, “one-size-fits-all” in their approach, or require costly, time-consuming customization before they meet a team’s specific requirements. In the main, this approach drives up integration costs and causes frustration and delays that hurt both response time and time-to-market.
Self-service integration: Empowering IT teams
Increasingly, enterprise organizations are turning to a “low code” or “no code” self-service approach to incident response, in which IT teams design and build integrations through point-and-click actions. This integration-platform-as-a-service (iPaaS) architecture lets global enterprises quickly take ownership of the integrations in their response process while keeping their choice of the management tools that participate in it. Effectively integrating those tools enables an enterprise to build a closed-loop, automated response to any critical IT incident, change or request. In this scenario, “real people,” rather than a ticketing system, are contacted to resolve issues in a timely manner, the required tools are mixed and matched as needed, and the necessary data is easily shared with the right response teams.
Core benefits of an iPaaS approach to incident response automation include:
- Flexibility for enterprises – freedom from vendors: An open platform can ingest events from current and future IT tools, no matter where they are currently siloed, including IT monitoring, APM, NPM, SIEM, event correlation, ticketing systems, DevOps, security, configuration management databases (CMDBs), patch management, and release and change management. Ideally, the platform gives users self-service access to data and receives alerts and critical event feeds from any point solution. It can then process and analyze the inbound data, assess the criticality of events and trigger appropriate rules-based responses with no development required on the endpoint solution. All this gives users the flexibility to choose and deploy tools without relying on vendors to deliver integrations or extensions.
- Engage the right responders at the right time: Where traditional targeted alert systems simply notify stakeholders when something goes wrong, an integration platform automatically alerts and engages all key players through global, multi-modal targeted notifications. If there is no response, automatic escalation kicks in, engaging the next person in line based on the best-matching profile. Some of these solutions also reach beyond the walls of the enterprise, so that partners and affected business users can be notified of the IT issue and kept updated on its resolution.
- Duh… Solve the IT issue!: To address complex IT issues, teams must communicate, collaborate and orchestrate from the moment an incident is identified. An integration platform gives everyone on an organization’s response team one-click sign-in to collaboration tools such as virtual war rooms, conference bridges and ChatOps channels. For a known issue with an existing runbook of predefined steps to provision or repair critical systems, the runbook can be executed directly from the notification itself. The IT task automation process can also embed human decision points within workflows. Using bi-directional communication, users can remotely control the advancement or execution of any step within a process, such as restarting a server or backing up a database, and can initiate those steps via SMS, IM, email or phone.
- Analyze the issue and learn from it: After an incident, an integration platform can store the audit trail and key compliance data that auditors need, including full conference bridge recordings and a copy of the ChatOps conversation from that incident. Detailed information on the IT teams’ response performance is critical for measuring and improving that performance over time.
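The rules-based triage and automatic escalation described in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular iPaaS product works: the `Event` shape, the 1-to-5 severity scale, the rule and action names, and the `acknowledges` callback (which stands in for sending a notification and waiting for a reply) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical normalized event: each source tool (APM, NPM, SIEM, ticketing...)
# is assumed to be mapped into this one shape before any rule runs.
@dataclass
class Event:
    source: str    # originating tool, e.g. "apm" or "siem" (illustrative labels)
    service: str   # affected business service
    severity: int  # assumed scale: 1 = most severe, 5 = informational
    message: str


# A rule pairs a predicate over an event with the response it triggers.
@dataclass
class Rule:
    name: str
    matches: Callable[[Event], bool]
    action: str    # hypothetical action label, e.g. "page_oncall"


def triage(event: Event, rules: list[Rule]) -> Optional[str]:
    """Return the action of the first rule that matches, or None if none fire."""
    for rule in rules:
        if rule.matches(event):
            return rule.action
    return None


def engage(responders: list[str],
           acknowledges: Callable[[str], bool]) -> tuple[Optional[str], list[str]]:
    """Notify responders in priority order until one acknowledges.

    `acknowledges(person)` stands in for sending a notification and waiting
    (up to some timeout) for a reply; it returns True if that person accepts.
    Returns (who_acknowledged, everyone_notified_so_far).
    """
    notified: list[str] = []
    for person in responders:
        notified.append(person)
        if acknowledges(person):
            return person, notified
    return None, notified  # escalation chain exhausted: nobody accepted


# --- Usage with made-up rules, events and an on-call list ---
rules = [
    Rule("payments-down",
         lambda e: e.severity == 1 and e.service == "payments",
         "page_oncall"),
    Rule("security-alert", lambda e: e.source == "siem", "open_war_room"),
]

event = Event("apm", "payments", 1, "p99 latency above threshold")
action = triage(event, rules)  # first rule matches -> "page_oncall"

# Simulate escalation: the primary on-call does not answer, the secondary does.
on_call = ["primary", "secondary", "manager"]
owner, notified = engage(on_call, lambda person: person == "secondary")
```

Keeping rules as data (a predicate plus an action) means a new event source can be handled by adding rules on the platform side, which is one way to read the “no development required on the endpoint solution” point above. A production platform would wrap `engage` with real timeouts, multi-channel delivery and audit logging.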
The current IT environment is rife with potential threats, many of which arise outside the bounds of a typical release schedule. When mission-critical business applications are unavailable, each minute of downtime from an IT incident can cost tens or even hundreds of thousands of dollars. In those scenarios, open communication and cooperation among the traditionally separate Dev, Ops and security teams is absolutely essential. Rather than spending money to customize existing tools that may never mesh effectively, an iPaaS approach is the most powerful, cost-effective and flexible way to ensure all hands are truly on deck to minimize the impact of an IT incident or outage.