Hyonix - LON1 Network becoming unstable – Incident details


LON1 Network becoming unstable

Resolved
Partial outage
Started 2 months ago
Lasted 9 days

Affected

Core Networks

Partial outage from 10:23 PM to 7:37 AM, Operational from 7:37 AM to 5:17 PM

London, UK (LON1)

Partial outage from 10:23 PM to 7:37 AM, Operational from 7:37 AM to 5:17 PM

Updates
  • Resolved
    Resolved
    This incident has been resolved.
  • Monitoring
    Monitoring

    After multiple tests from multiple sources, we have confirmed that the issue is resolved.

    Our team will schedule a core router upgrade in the coming days to fix this issue permanently. For now, no further impact is anticipated.

    We sincerely apologize for any inconvenience caused.

    Please let us know if you are still experiencing any issues.

  • Identified
    Update

    We've completed the rollback and everything appears to be returning to normal. We are still monitoring the issue closely.

  • Identified
    Identified

    After extensive troubleshooting and gathering data from other sources (upstream, partners), we've narrowed the issue down to one of our core routers not forwarding packets properly, which caused an internal ARP flood. The issue was caused by an Arista software bug that surfaced after we converted our network to a VXLAN-EVPN based design.

    Our team has begun rolling back all the changes.

  • Investigating
    Investigating

    This is a snapshot of the incident timeframe. The status was posted after the incident concluded because of a lack of human resources.

    We are aware of an issue where LON1 is randomly dropping packets. Our team is currently waiting for responses from all related parties.