Provider outage

Incident Report for SEKOIA.IO

Resolved

All delayed events were fully processed by 08:50 CEST.
Posted Apr 19, 2025 - 09:28 CEST

Update

We're continuing to monitor the processing closely and are seeing steady progress in reducing the backlog. While it may still take some time to fully catch up, we’re doing everything we can to maintain stability and ensure no data is missed.
Posted Apr 18, 2025 - 18:50 CEST

Update

The backlog of queued events is still being processed at maximum capacity. Our team is dedicated to clearing this backlog as efficiently as possible, ensuring that all events are handled promptly.
Posted Apr 18, 2025 - 16:37 CEST

Update

Cluster recovery is complete, with no data loss. We are still processing the backlog of queued events at maximum capacity.
Posted Apr 18, 2025 - 01:40 CEST

Update

We are making great progress on fixing the event storage cluster.
All "cold" data is progressively becoming available.
For all events ingested after 03:30 CEST, we are processing the delayed events.
We found a way to speed up the recovery process. The event storage cluster is steadily recovering.
Posted Apr 17, 2025 - 18:13 CEST

Update

We are still making progress on the event storage cluster.
All "cold" data is progressively becoming available.
For all events ingested after 03:30 CEST, we are processing the delayed events.
The situation is progressively improving; however, we had to slow down the process for the moment due to a very high number of parallel tasks putting the cluster at risk. We are looking for ways to speed up the recovery.
Posted Apr 17, 2025 - 16:25 CEST

Update

We are still making progress on the event storage cluster.
All "cold" data is progressively becoming available.
For all events ingested after 03:30 CEST, we are processing the delayed events.
The situation is progressively improving; however, we had to slow down the process for the moment due to a very high number of parallel tasks putting the cluster at risk. We are looking for ways to speed up the recovery.
Posted Apr 17, 2025 - 15:22 CEST

Update

We are still making progress on the event storage cluster.
All "cold" data is progressively becoming available.
For all events ingested after 03:30 CEST, we are processing the delayed events.
The situation is progressively recovering.
Posted Apr 17, 2025 - 12:00 CEST

Update

We are still making progress on the event storage cluster.
All "cold" data is progressively becoming available.
For all events ingested after 03:30 CEST, we are currently fixing the situation.
Posted Apr 17, 2025 - 11:01 CEST

Monitoring

We are still making progress on the event storage cluster.
All "cold" data is progressively becoming available.
For all events ingested after 03:30 CEST, we are currently fixing the situation.
Posted Apr 17, 2025 - 09:53 CEST

Update

We are still working on the event storage cluster.
We are also making progress on alerts that have no associated events. Events linked to alerts are progressively becoming available in the event storage cluster.
Posted Apr 17, 2025 - 08:55 CEST

Update

We are still working on the event storage cluster.
So far, event search is working, but the oldest events are not yet available for search.
We are fixing this progressively, so more and more older events will become available over time.
On the other hand, none of the events ingested since 03:30 CEST this morning are available in the event storage cluster yet.
Posted Apr 17, 2025 - 07:53 CEST

Update

We are still working to stabilize the event storage cluster.
So far, some event queries and searches are working; however, not all data is available for the moment.
Posted Apr 17, 2025 - 06:37 CEST

Update

We are still preparing a fix to roll out across our whole event storage cluster.
In the meantime, we have fixed the automation cluster.
Posted Apr 17, 2025 - 05:28 CEST

Identified

Most services are up.
There are still some issues with our event storage cluster, which make events and event search unavailable.
All events are still received and properly processed.
Separately, automation (playbooks) is also having issues.
We are working quickly to fix these situations.
Posted Apr 17, 2025 - 05:06 CEST

Investigating

Our main provider had an outage, and the network went down.
We are currently recovering access to the platform and fixing the different issues.
Posted Apr 17, 2025 - 04:22 CEST

This incident affected: FRA1 (Web application, Event ingestion, Event storage, Detection, Hunting, Case management, Automation, AI, CTI Search, CTI Feed (API), CTI Feed (TAXII), CTI Feed (MISP)).