Our team has received the final Reason for Outage (RFO) report from the datacenter and has summarized the key details below. We are also discussing internally several options to enhance service reliability and prevent similar incidents in the future.
We sincerely apologize for the service disruption and any inconvenience this incident may have caused your operations. We fully understand the impact such outages can have on your business and take this matter very seriously.
If you have any questions about this incident or need additional information for your records, please don't hesitate to contact us by opening a support ticket.
----
Data Center Power Outage - May 6, 2025
West 7 Center - Data Halls L1-130 & L1-135
Incident Overview
On May 6, 2025, West 7 Center experienced a critical power and cooling outage affecting data halls L1-130 and L1-135. The incident began at 3:45 PM and was fully resolved by 9:00 PM, resulting in a service outage of approximately 5 hours and 15 minutes.
Detailed Timeline of Events
3:45 PM - Initial Detection
Smoke alarm activated in the main power switchgear room
Building engineering team responded immediately to investigate the source
Backup generators automatically started due to utility power loss
Fire safety protocols took priority, preventing immediate generator troubleshooting
4:15 PM - Smoke Alarm Reset
After 30 minutes of investigation, the smoke alarm was reset within the Fire Life Safety system
The source of smoke remained undetermined at this time
Engineering team suspected the smoke originated from the utility company's transformer area (inaccessible to facility staff)
4:15 PM - 4:30 PM - Power Investigation
4:30 PM - UPS System Failure
The Uninterruptible Power Supply (UPS) system in L1-135 shut down due to battery depletion
The UPS had been cycling between battery and unstable generator power, preventing proper load transfer
Generator instability caused the system to enter a "Lock-Out" condition
5:20 PM - Root Cause Identified
5:33 PM - Utility Power Restored
The 4000-amp utility breaker was successfully reset
Computer Room Air Handler (CRAH) units in L1-135 automatically restarted
UPS system in L1-135 failed to restart automatically
7:45 PM - UPS Technician Arrival
8:30 PM - UPS System Restart
9:00 PM - Full Service Restoration
Root Cause Analysis
Initial Investigation (May 2025)
The facility's engineering team, along with senior electrical consultants from Fakouri Electrical Engineering and Ramboll Electrical Engineering Group, conducted comprehensive testing of all electrical infrastructure. The investigation revealed that the incident was caused by an intermittent ground fault originating from customer equipment: specifically, a faulty power distribution unit (PDU) located within a customer cabinet in L1-130.
Updated Root Cause (May 26, 2025)
Further investigation revealed the complete picture: the ground fault from the customer equipment triggered an arc flash in the 4000-amp bus duct located in the ceiling of the main switchgear room. This arc flash caused an explosion that created a puncture approximately six inches from the adjacent wall, making it difficult to detect during initial inspections. The damaged bus duct connected the utility feed to the affected data halls.
Corrective Actions Taken
Immediate Actions:
May 14, 2025: Removed the identified faulty customer PDU from L1-130
Identified and began monitoring a second cabinet in L1-130 that was activated less than 12 hours before the incident
Scheduled an additional inspection with a third-party consultant (Hawthorne Power System)
Infrastructure Repair:
May 22, 2025: Completed emergency replacement of the damaged 4000-amp bus duct
Collaborated with three engineering consultants and preferred electrical contractors
Successfully rerouted power to backup utility source during replacement
Replacement completed without service interruption (7:00 AM - 2:30 PM)
Policy Changes:
Implemented new mandatory testing and inspection policy for all customer-provided equipment
All customer equipment must now undergo facility engineering review before installation
Enhanced monitoring procedures for recently installed customer equipment
Preventive Measures
The data center has implemented several measures to prevent similar incidents:
Equipment Inspection Protocol: All customer-provided equipment must undergo comprehensive testing and inspection before installation and use within the facility
Enhanced Monitoring: Increased surveillance of customer equipment, particularly newly installed systems
Third-Party Validation: Engaging additional senior power equipment consultants for independent facility assessments
Infrastructure Improvements: Complete replacement of damaged electrical infrastructure with new, properly tested components
Current Status
All electrical infrastructure has been fully restored and upgraded. The facility is operating normally with enhanced safety protocols in place. The emergency bus duct replacement was completed without any service interruption, and all systems have undergone comprehensive testing to ensure reliability.
The data center management has emphasized that this incident represents an unusual and isolated occurrence in its three decades of critical facility operations, and it remains committed to maintaining the highest standards of reliability and safety.