The situation is back to normal, although we have not yet received incident updates from our cloud provider. All machines are now online and event processing is back in real-time. Neither our external TCP and HTTP probes nor our APIs are returning any errors.
Posted Apr 03, 2024 - 03:38 CEST
Monitoring
The web application, API and ingestion endpoints have been fully accessible for a few minutes now. We are still monitoring the situation, as some processing backlog was buffered internally while the VMs were offline.
It seems that our hosts are gradually coming back online. We are monitoring the recovery of the platform.
The error rate on the API and event ingestion endpoints is currently going down.
Posted Apr 03, 2024 - 02:53 CEST
Investigating
Our monitoring system indicates that we lost connectivity to several virtual machines at the same time. This is most likely an issue caused by an incident at our cloud provider. We are investigating.
Events are still being processed but a number of API and intake endpoints are currently returning 50x errors.
Posted Apr 03, 2024 - 02:37 CEST
This incident affected: FRA1 - XDR (Ingestion, Web application) and FRA1 - CTI (Search, API consumption, TAXII consumption, MISP consumption, Web application).