Uptime Escalation Matrix #71

opened 2024-07-02 19:08:39 +00:00 by mik-tf · 4 comments


Todo

We need an escalation matrix to ensure 99.9% uptime for TF.

Parameters

  • We need 24/7 escalation coverage from the team, every day of the week (this requires setting up shifts)

References

  • We already have an escalation matrix defined in the private tfgrid repo: https://git.ourworld.tf/tfgrid/info_tfgrid_private/src/branch/development/collections/procs_and_docs/escalation_matrix.md
  • Adapt it into an escalation matrix for 99.9% uptime

How to Use the Matrix

  • If something is urgent enough -> escalate to the people in the matrix
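
To make the 99.9% target concrete, the allowed downtime per period can be worked out directly. The following is a minimal sketch, not part of the existing escalation matrix; the function name `downtime_budget` and the chosen periods are assumptions made for illustration.

```python
from datetime import timedelta


def downtime_budget(availability: float, period: timedelta) -> timedelta:
    """Return the maximum downtime allowed in `period` for an availability target.

    `availability` is a fraction, e.g. 0.999 for 99.9%.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return period * (1.0 - availability)


if __name__ == "__main__":
    target = 0.999  # the 99.9% uptime goal from this issue
    periods = [
        ("day", timedelta(days=1)),
        ("30-day month", timedelta(days=30)),
        ("365-day year", timedelta(days=365)),
    ]
    for label, period in periods:
        minutes = downtime_budget(target, period).total_seconds() / 60
        print(f"{label}: {minutes:.1f} minutes of downtime allowed")
```

At 99.9% this leaves roughly 43 minutes per 30-day month, which is why detection and escalation need to work around the clock rather than only during office hours.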
mik-tf changed title from TF 99.9% Uptime to Uptime Escalation Matrix 2024-07-02 19:09:10 +00:00
mik-tf added this to the (deleted) project 2024-07-02 19:09:14 +00:00
despiegk modified the project from (deleted) to tfgrid_3_17 2024-07-03 12:52:06 +00:00
despiegk modified the project from tfgrid_3_17 to (deleted) 2024-07-03 12:52:11 +00:00
despiegk modified the project from (deleted) to tfgrid_3_17 2024-07-03 12:52:29 +00:00

We could train the support team and add them to the monitoring group (which needs to be optimized). The support team could be split into shifts to provide 24/7 coverage. Once an issue is confirmed, they can get in touch with the person on call. Let's discuss.


That's perfect. I like that plan. Yes let's discuss and make it work. We can coordinate with Sherwin for this phase.

At this point we will have clear documentation on the monitoring system (e.g. alerta.io, with text + video guides).

despiegk added the
label 2024-07-28 07:58:11 +00:00




Update

  • We don't have much beyond the escalation matrix already provided
  • We should discuss further in the engineering calls how to proceed.
    • Basically we have the basics, but we must set up an infrastructure/process so that we have 24/7 escalation
    • One option is to check the support team's working hours and assign someone else to escalate issues the rest of the time, when support isn't there
    • I know @scott discussed with Marie (@itsathreefoldworld) about how we could have some community members filling this gap.
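
The idea of checking support working hours and handing escalation to someone else outside them can be sketched as a simple routing rule. This is only an illustration under stated assumptions: the support window (Monday to Friday, 08:00 to 18:00 UTC) and the names `support_team` and `off_hours_rota` are hypothetical placeholders, not the team's actual schedule.

```python
from datetime import datetime, time, timezone

# Hypothetical support window; replace with the real working hours.
SUPPORT_DAYS = {0, 1, 2, 3, 4}  # Monday=0 ... Friday=4
SUPPORT_START = time(8, 0)      # 08:00 UTC
SUPPORT_END = time(18, 0)       # 18:00 UTC


def escalation_target(now: datetime) -> str:
    """Return who should receive an escalation at the given moment.

    `now` must be timezone-aware so it can be converted to UTC reliably.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    in_hours = (
        now_utc.weekday() in SUPPORT_DAYS
        and SUPPORT_START <= now_utc.time() < SUPPORT_END
    )
    # Outside support hours, route to whoever fills the gap
    # (for example an on-call engineer or community volunteers).
    return "support_team" if in_hours else "off_hours_rota"


if __name__ == "__main__":
    print(escalation_target(datetime.now(timezone.utc)))
```

Keeping the window in a few constants makes it easy to see which hours are still uncovered before deciding how shifts or community members should fill them.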
Reference: tfgrid/circle_product_management#71