Sign in to your workshop cloud instance. Your login credentials should have been emailed to you.
In the left sidebar, go to Alerts & IRM --> Alerting --> Alert rules
Click + New Alert Rule
Enter an alert rule name. To differentiate yours from others', use the format {{initials}}-TestAlert, e.g. MC-TestAlert.
2.1 Define query and Alert Condition.
Switch from Builder to Code mode.
Copy and paste in the query:
histogram_quantile(0.95, sum(rate(traces_spanmetrics_latency_bucket{span_kind=~"SPAN_KIND_SERVER|SPAN_KIND_CONSUMER", job="ditl-demo-prod/checkoutservice", deployment_environment=~".*"} [$__rate_interval])) by (le,job)) * 1000
This query returns the 95th-percentile latency, in milliseconds, of a service called checkoutservice.
Click Run Query after you've added it in so you can see the current latency.
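To unpack what that query does, here is the same PromQL annotated piece by piece (a sketch; the * 1000 assumes the underlying histogram records latency in seconds):

```promql
histogram_quantile(
  0.95,  # estimate the 95th-percentile latency
  sum(
    # per-second rate of each latency histogram bucket for server/consumer
    # spans of the checkoutservice job
    rate(traces_spanmetrics_latency_bucket{span_kind=~"SPAN_KIND_SERVER|SPAN_KIND_CONSUMER", job="ditl-demo-prod/checkoutservice", deployment_environment=~".*"}[$__rate_interval])
  ) by (le, job)  # keep the bucket-boundary label (le) so the quantile can be computed
) * 1000  # convert seconds to milliseconds
```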
2.2 Let's play around with the alert condition. By default it says to fire the alert when the value is above 0, which in this case means the alert will always be firing.
Set it to 800 and click Preview alert rule condition; the preview no longer shows the alert firing, since the current latency is below 800 ms.
Then set it to a lower number, like 100, so we can see the alert actually fire.
2.3 Let's also look at how grouping works in alerts. Temporarily replace your query with the one below, where we have added cloud_region to the sum, breaking the results down by each cloud region we are deployed to.
histogram_quantile(0.95, sum(rate(traces_spanmetrics_latency_bucket{span_kind=~"SPAN_KIND_SERVER|SPAN_KIND_CONSUMER", job="ditl-demo-prod/checkoutservice", deployment_environment=~".*"} [$__rate_interval])) by (le,job,cloud_region)) * 1000
Put the original query back to keep things simple, but keep in mind that a single alert query can return multiple series. This can provide more context in the alert and reduce the number of alert rules you have to create.
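With cloud_region in the by() clause, the query returns one series per region, and a separate alert instance is evaluated for each. The preview might show something like this (region names and values are hypothetical):

```
{job="ditl-demo-prod/checkoutservice", cloud_region="us-east-1"}   412
{job="ditl-demo-prod/checkoutservice", cloud_region="eu-west-2"}   973
```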
3.1 Create a new alert folder with your name. Folders are used purely for organization in the UI.
3.2 Optionally, add some Labels.
Labels are helpful for identifying an alert instance, routing the alert to the appropriate contact point, and including additional information in the alert notification. See the Grafana Alerting documentation for more detail.
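For example, labels like these (the key/value pairs are hypothetical) could later be matched by a notification policy or included in the notification text:

```
severity = critical
team = checkout
```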
4.1 Create a new Evaluation Group, name it {{initials}}-Eval Group.
4.2 Set your pending period to None. For testing purposes this helps us see our alert quickly; in a production scenario you would likely set it higher to reduce false positives.
5.1 Create a new contact point by clicking View or Create Contact Points.
5.2 Click Create Contact Point.
Give it a name.
Select Email as the integration.
Enter your email address.
Leave everything else at its defaults.
Save.
5.3 Go back to the alert page and select this newly created contact point.
Note - In more advanced scenarios, you can use Notification Policies to route to contact points based on labels. Contact Points and Notification Policies can be re-used across alerts.
6.1 Add in the summary:
Service latency for {{ index $labels "job" }} has exceeded threshold at {{ index $values "A" }} ms for the last minute.
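In this template, $labels holds the label set of the firing series and $values holds the evaluated results keyed by ref ID ("A" is the query here). With hypothetical values of job="ditl-demo-prod/checkoutservice" and A=973, the summary would render roughly as:

```
Service latency for ditl-demo-prod/checkoutservice has exceeded threshold at 973 ms for the last minute.
```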
6.2 Add in the runbook URL: https://github.com/mcove11/Alerting-Hands-On
6.3 Select the Dashboard and Panel: Systems Overview / Durations by Service
At the top, select Save Rule and Exit.
You will be brought to the alerts overview page. Find the folder you created and expand it to show your alert group and rule. Watch in real time as it evaluates within the minute and turns to Firing.