This article provides an introductory overview of the processes EmpowerID follows in monitoring SaaS environments. As an introduction, it does not cover every aspect of the Site Reliability Engineering or Security Information and Event Management work performed by EmpowerID. For instance, performance monitoring is not discussed in detail here. Instead, this article focuses on availability monitoring and the processes followed by the EmpowerID DevOps team to ensure a base level of service with minimal impact on end-users. The information provided here is intended to help SaaS customers understand what EmpowerID does in this area and to help the in-house operations teams of non-SaaS customers know what they need to monitor to achieve parity.

For the purposes of availability monitoring, the EmpowerID solution can be divided into three broad areas:

- Front-end services, which include any EmpowerID UI and API that users or systems interact with,
- Back-end services, which include the various EmpowerID jobs, processes, and automation that occur without user interaction, and
- The underlying infrastructure monitored by EmpowerID DevOps.

Front-End Monitoring

EmpowerID DevOps primarily monitors site availability by ensuring the main web applications load. Azure Monitor is used for this purpose, and the following URLs are loaded:

Core Login: https://<core-domain>/WebIdpForms/Login/Portal
IAM-Shop (if applicable): https://<iamshop-domain>
My-Identity (if applicable): https://<myid-domain>

These URLs are checked every two minutes from each Azure region, and two primary regions are usually configured for monitoring. Both the primary request and its child requests are checked to ensure that all of them succeed. Three consecutive failures result in a High-Priority alert being raised, which is then handled by the EmpowerID DevOps team.
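
The three-consecutive-failures rule can be illustrated with a minimal sketch. This is not EmpowerID's or Azure Monitor's implementation; the class name `AvailabilityTracker` and its fields are hypothetical, and only the threshold of three comes from the text above.

```python
from dataclasses import dataclass

# Threshold taken from the article: three consecutive failures raise an alert.
FAILURE_THRESHOLD = 3


@dataclass
class AvailabilityTracker:
    """Hypothetical helper that counts consecutive failed probes for one URL."""

    threshold: int = FAILURE_THRESHOLD
    consecutive_failures: int = 0

    def record(self, primary_ok: bool, child_results: list[bool]) -> bool:
        """Record one probe; return True when a High-Priority alert is due.

        A probe succeeds only if the primary request and every child
        request succeeded.
        """
        if primary_ok and all(child_results):
            self.consecutive_failures = 0  # any success resets the streak
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold
```

In this sketch, one tracker would be kept per URL and per region, so that a failure seen from one region does not reset or inflate the streak seen from another.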

In addition to the active front-end monitoring described above, passive error-rate monitoring is optionally performed for environments with a large user base whose end-users use the EmpowerID UI frequently and regularly. Azure Application Gateway provides a failed-requests metric for this purpose. A High-Priority alert is raised if the error rate exceeds the 5% threshold for more than five minutes.
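
As a rough illustration of the sustained error-rate rule, the sketch below evaluates a series of samples. The sample shape `(timestamp_s, failed_requests, total_requests)` and the function name `sustained_breach` are assumptions made for this example; in practice the evaluation is done by an alert rule in Azure Monitor, not by custom code.

```python
def sustained_breach(samples, threshold=0.05, min_duration_s=300):
    """Return True if the error rate stays above `threshold` for longer
    than `min_duration_s` seconds.

    `samples` is a time-ordered list of hypothetical tuples:
    (timestamp_s, failed_requests, total_requests).
    """
    breach_start = None
    for timestamp_s, failed, total in samples:
        rate = failed / total if total else 0.0  # no traffic counts as healthy
        if rate > threshold:
            if breach_start is None:
                breach_start = timestamp_s  # first sample of this breach
            if timestamp_s - breach_start > min_duration_s:
                return True
        else:
            breach_start = None  # the rate recovered, so start over
    return False
```

A brief spike that recovers within five minutes resets the window, which is what keeps short bursts of errors from paging anyone.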

Back-End Monitoring

Most clients utilize EmpowerID partly or solely for identity lifecycle automation, sometimes more so than UI-based functionality. Even if the primary use cases involve UI-driven processes, EmpowerID’s backend processes are invaluable to overall system functionality. Therefore, much effort has been spent to monitor all of EmpowerID’s various backend processes.

Because EmpowerID persists all vital information, including process state information, in one database, EmpowerID has implemented a simple, effective mechanism to report process health. A stored procedure named Z_EmpowerID_Health checks process state information against predefined criteria and outputs a list of problematic conditions that require attention. The configuration of these health checks and a complete listing of the checks performed are detailed in EmpowerID HealthCheck: SQL Procedure Z_EmpowerID_Health.

EmpowerID DevOps deploys a dedicated monitoring container that invokes this health-check procedure every five minutes and submits any reported problem conditions to Azure Monitor. If the same problem condition is reported in consecutive polling intervals, a medium-priority alert is raised.
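
The consecutive-interval logic can be sketched as follows. This is an illustration only: the output format of Z_EmpowerID_Health is not described here, so the sketch assumes each poll yields a collection of condition names, and the number of consecutive polls required (`required_polls=2`) is an assumption, not a documented value. The class name `HealthConditionTracker` is hypothetical.

```python
class HealthConditionTracker:
    """Hypothetical tracker for problem conditions reported by successive polls."""

    def __init__(self, required_polls=2):
        # Assumption: the article does not state how many consecutive polls
        # are required before an alert is raised.
        self.required_polls = required_polls
        self.streaks = {}  # condition name -> consecutive polls reported

    def observe(self, reported):
        """Record one poll result; return the conditions that should alert.

        Conditions missing from this poll are dropped, so their streak
        starts again from zero if they reappear later.
        """
        self.streaks = {
            name: self.streaks.get(name, 0) + 1 for name in set(reported)
        }
        return sorted(
            name
            for name, count in self.streaks.items()
            if count >= self.required_polls
        )
```

Requiring a condition to persist across polls filters out transient states, such as a job that is briefly delayed but completes before the next check.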

Infrastructure Monitoring

EmpowerID SaaS runs in Azure and uses various Azure services, such as Azure Kubernetes Service (AKS) and Azure SQL Database (database as a service). To stay ahead of issues before they affect front-end and back-end services, EmpowerID DevOps monitors specific metrics for each service. A medium-severity or high-severity alert is generated depending on the metric and threshold. Some of the metrics monitored include:

SQL Database:
- Free Space Remaining: <15% raises a medium-severity alert
- Deadlocks: >3 deadlocks within 10 minutes raises a high-severity alert
- Average CPU: >90% utilization raises a medium-severity alert
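
The SQL Database thresholds above can be expressed as a small rule table. The sketch below is illustrative only: the metric keys (`free_space_pct`, `deadlocks_10min`, `avg_cpu_pct`) and the function `evaluate` are hypothetical names, not Azure Monitor metric identifiers, and the real rules are configured as alert rules in Azure Monitor.

```python
import operator

# (metric key, comparison, limit, severity); limits are taken from the list above.
# The metric keys are hypothetical names chosen for this example.
RULES = [
    ("free_space_pct", operator.lt, 15, "medium"),
    ("deadlocks_10min", operator.gt, 3, "high"),
    ("avg_cpu_pct", operator.gt, 90, "medium"),
]


def evaluate(metrics):
    """Return (metric, severity) pairs for every rule the metrics violate.

    Metrics missing from the input are skipped rather than treated as failures.
    """
    return [
        (name, severity)
        for name, compare, limit, severity in RULES
        if name in metrics and compare(metrics[name], limit)
    ]
```

For example, `evaluate({"free_space_pct": 10, "avg_cpu_pct": 50})` reports only the free-space rule, because the CPU value is below its limit and no deadlock count was supplied.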

Alert Handling

EmpowerID uses Azure Monitor to aggregate metrics, evaluate rules, and raise alerts. Actions are configured in Azure Monitor to trigger alerts in Atlassian Opsgenie, which then pages EmpowerID DevOps personnel. Depending on the severity, EmpowerID manages these alerts in the following way:

- High-severity alerts page the on-call personnel regardless of the time of day and follow up with escalations if not acknowledged.
- Medium-severity alerts page personnel only during waking hours so that they can be followed up accordingly.
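
The severity-based paging rules can be sketched as a simple decision function. This is not how Opsgenie is configured; it only illustrates the policy. The waking-hours window (08:00 to 20:00 local time) is an assumption, since the article does not define it, and `should_page_now` is a hypothetical name.

```python
from datetime import time

# Assumption: "waking hours" are not defined in the article; 08:00-20:00
# local time is used here purely for illustration.
WAKING_START = time(8, 0)
WAKING_END = time(20, 0)


def should_page_now(severity, local_time):
    """Decide whether an alert of the given severity pages someone right now."""
    if severity == "high":
        return True  # high-severity alerts page at any time of day
    if severity == "medium":
        return WAKING_START <= local_time < WAKING_END
    raise ValueError(f"unknown severity: {severity!r}")
```

Escalation of unacknowledged high-severity alerts is left out of this sketch; it would typically be handled by the paging tool's escalation policy rather than by custom code.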
