Cloud provider VM outage

Incident Report for SEKOIA.IO

Resolved

The situation is back to normal, even though we are still awaiting incident updates from our cloud provider. All machines are now online and event processing is back to real time. Neither our external TCP and HTTP probes nor our APIs are returning any errors.

Posted Apr 03, 2024 - 03:38 CEST

Monitoring

The web application, API and ingestion endpoints have been fully accessible for a few minutes now. We are still monitoring the situation, as a processing backlog was buffered internally while the VMs were offline.

Posted Apr 03, 2024 - 02:59 CEST

Identified

The issue has been confirmed by our cloud provider. A generalized incident is ongoing in their Gravelines datacenter. For more information from their side, see https://public-cloud.status-ovhcloud.com/incidents/897ngd9y00sq

Our hosts appear to be gradually coming back online. We are monitoring the recovery of the platform.

The error rate on the API and event ingestion endpoints is currently decreasing.

Posted Apr 03, 2024 - 02:53 CEST

Investigating

Our monitoring system indicates that we lost connectivity to several virtual machines at the same time. This is most likely caused by an incident at our cloud provider. We are investigating.

Events are still being processed, but a number of API and intake endpoints are currently returning 5xx errors.

Posted Apr 03, 2024 - 02:37 CEST