[FRA1] Lag on event processing

Incident Report for SEKOIA.IO

Resolved

Everything is stable, and the processing delay is now under 30 minutes.
At the current rate, we expect to fully catch up on the lag in about 2 hours.
Posted Jul 09, 2024 - 18:39 CEST

Monitoring

The platform is consuming events at its nominal rate again and is catching up on the lag.
You can currently expect a lag of around 30 minutes.
We'll keep monitoring for a while.
Posted Jul 09, 2024 - 18:03 CEST

Identified

A deployment caused our ingestion service to run out of memory (OOM).
Our team had to stop the service and roll back the deployment before restarting it.
Event processing is lagging in the meantime, but no events have been lost.
We will come back to you shortly.
Posted Jul 09, 2024 - 17:14 CEST