During the morning hours of August 17th, our redundant pair of load balancers failed and brought down all services on the 241 network. These services included JES email, Wiki, WebPublish, and many others.
The outage began when one of the load balancers lost its primary network interface. Normally, traffic would automatically fail over to the secondary unit and most services would continue to operate, with a few needing a restart to reconnect lost database connections. On Wednesday morning, however, the secondary load balancer also had an issue and failed to respond, leaving us with neither load balancer operating.
We had already scheduled an upgrade to a new pair of load balancers for August 23rd. Because the equipment was ready, we were able to shorten the outage by installing the new pair early. The one exception was QShare, which was affected slightly longer because of a configuration issue.
Services were back to normal by 10:00 a.m., and the new load balancers are now in place. Some operational cleanup is still required, but none of it will affect services.
We apologize for the inconvenience this outage caused.