Fonality Suffers Four-Hour Outage

September 21, 2016 Jeff Ferry

Fonality Customer Service - “Transparency Breeds Trust”

Fonality is a leading hosted VoIP and PBX provider offering unified communications solutions for small to enterprise, multi-location companies.

Unified communications vendor Fonality suffered a network-wide outage on Friday that left most of its 80,000 subscribers unable to use voice or other Fonality services for four hours. Service was restored around noon Pacific time on Friday. In an effort to be transparent with customers and other industry participants, Fonality Chief Marketing Officer Jeff Valentine agreed to explain the situation to the Daily Cloud.

Early on Friday morning, a software engineer at Fonality’s main facility in Dallas, Texas was working on a future version of the company’s software running in a Fonality test network. Inadvertently, he connected that network to Fonality’s production network. The effect was to bring down Fonality’s domain name system (DNS), the system that translates a domain name (such as fonality.com or dailycloud.info) into an IP address (such as 74.115.96.36 or 173.236.245.124). According to Valentine, Fonality quickly saw that customers could not use their voice or other services, but also found that the voice servers and other servers were up and running. The problem, they discovered, was that nobody could get to those servers. It affected not just voice, but video, chat, and Fonality’s trademark HUD (heads-up display) system. In addition, the website, the help desk, and even the “trust” page (trust.fonality.com), which shows the status of Fonality services, were unreachable. “He created a network bridge between two networks and it blackholed every single request to the name servers,” Valentine says.
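The distinction Valentine describes, servers that are running but cannot be found, can be sketched in a few lines of Python. This is only an illustration, not Fonality's tooling: the function name classify_outage, the hostname voice.example.com, and the address 192.0.2.10 (a reserved documentation address) are all assumptions made for the example.

```python
import socket


def classify_outage(hostname, resolve=socket.gethostbyname, probe=None):
    """Classify why a service cannot be reached.

    resolve: callable mapping a hostname to an IP string; raises OSError on failure.
    probe:   optional callable taking an IP string; returns True if the server answers.
    """
    try:
        ip_address = resolve(hostname)
    except OSError:
        # Name lookup failed: the server may be healthy, but nobody can find it.
        return "dns_failure"
    if probe is not None and not probe(ip_address):
        # The name resolved, but the machine behind it did not answer.
        return "server_unreachable"
    return "ok"


def broken_resolver(hostname):
    # Hypothetical stand-in for name servers that never answer.
    raise socket.gaierror("name servers not responding")


# The voice server is "up" (the probe succeeds), yet the service is unreachable.
print(classify_outage("voice.example.com", resolve=broken_resolver, probe=lambda ip: True))
```

With the resolver failing, the healthy server never gets probed at all, which is roughly the situation the article describes.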

Once the team identified the problem, it was not hard to reverse it. But then every customer had to be notified to reboot their system, a process that took significant time too. “It was an enormous mess as a result of one engineer’s mistake,” Valentine says. It seems that DNS issues are not that rare. Only last week, Microsoft Azure experienced a DNS issue, as we reported here on Daily Cloud. And in March 2015, Apple experienced a DNS issue that brought down iTunes and iCloud for 12 hours, with significant revenue implications. Apple issued a public apology to customers and investors. “DNS is one of those things that gets overlooked,” Valentine says. “You make your voice servers super-redundant, but you take it for granted that DNS will always work.”

Fonality has already begun implementing two major changes to ensure this problem does not recur. The first involves the process by which changes are made to the production network. “We have 280 people and a very good change management policy,” Valentine explains. What happened this time was that a software development engineer made changes that affected the production network. “We have disabled their [development engineers’] access to the production network. From now on, every change to our production network must go through our network team.” The second change is to add a second backup to the DNS system. “DNS is too important. We cannot let it go down in the future. We already had a backup, but that didn’t help in this case. So we need a backup for the backup. We are putting in place an offsite system with DNS and name servers at a different location.”
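The “backup for the backup” idea amounts to trying each name server in turn until one answers. The Python sketch below models that under stated assumptions: the name servers are represented as plain functions rather than real DNS queries, and the labels, the hostname trust.example.com, and the address 192.0.2.53 are hypothetical. It does not describe how Fonality actually built its offsite system.

```python
def resolve_with_fallback(hostname, name_servers):
    """Try each (label, lookup) pair in order; return the first answer.

    lookup is a callable that returns an IP string or raises OSError.
    """
    failures = []
    for label, lookup in name_servers:
        try:
            return label, lookup(hostname)
        except OSError as error:
            # Record the failure and move on to the next name server.
            failures.append(f"{label}: {error}")
    raise LookupError(f"all name servers failed for {hostname}: {failures}")


def unreachable(hostname):
    # Hypothetical name server caught in the same network fault.
    raise OSError("no response")


def offsite(hostname):
    # Hypothetical name server at a different location, unaffected by the fault.
    return "192.0.2.53"


servers = [("primary", unreachable), ("backup", unreachable), ("offsite", offsite)]
print(resolve_with_fallback("trust.example.com", servers))
```

The point of the third entry is location: a backup on the same network as the primary can fail for the same reason, as happened here.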

Fonality senior management also developed a communications plan to reach out to customers, explain and apologize, and address any customer questions, issues, or anger. “The entire executive team worked all weekend on a plan,” says Valentine. One plank in the plan was an email from CEO David Scult explaining the situation, which went out to every customer on Monday. Another plank was a social media campaign to address customers’ questions and issues, led by Kristen Cruz, Fonality’s Senior Director for Digital Marketing. Cruz led a team of three people who responded to questions and complaints from Friday morning through the weekend and into this week. Cruz told us they had 70,000 impressions and more than 1,200 “engagements,” primarily on Facebook and Twitter.

“We have almost 2,000 followers on Facebook,” Cruz said. “For many customers, that’s their primary way of communicating. The fact that we responded personally to those customers, showed them that we were aware of the problem and on top of it, and their comments weren’t going unnoticed, that made a difference.”

Ironically, the “trust” page, which should have explained the reasons for the outage and the expected recovery time, was unreachable even though it was hosted not at fonality.com but at a separate service, statuspage.io, precisely to avoid a situation where it would go down when fonality.com went down. But the name trust.fonality.com depended on Fonality’s DNS system, which was hit on Friday. The company has begun publicizing an alternate URL as a second way to reach the trust page.
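The trust-page problem comes down to a simple rule: any name under fonality.com needs Fonality’s DNS to be found, wherever the page itself is hosted. A minimal Python sketch of that check follows. The function name depends_on_zone and the alternate hostname status.example.net are assumptions for illustration, and the check looks only at the name hierarchy, not at how any real zone is actually delegated.

```python
def depends_on_zone(hostname, zone):
    """Return True if hostname is the zone itself or a name beneath it."""
    host = hostname.lower().rstrip(".")
    zone = zone.lower().rstrip(".")
    return host == zone or host.endswith("." + zone)


# The status page is hosted elsewhere, but its name still lives under fonality.com.
print(depends_on_zone("trust.fonality.com", "fonality.com"))

# A hypothetical alternate name outside the zone does not share that dependency.
print(depends_on_zone("status.example.net", "fonality.com"))
```

An alternate URL helps only if its name sits outside the affected zone, which is presumably why the company is publicizing one.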

But Valentine said that although there were some angry, vitriolic comments, in general customer and partner reactions to the outage were not as negative as management expected. “People were less passionate than we were expecting,” Valentine says. “Our partners were surprisingly understanding. People said: it happens, nobody is perfect.”

One place where the pain will be felt is the revenue line. Valentine said Fonality cancelled all marketing activities for two days, which should reduce revenue by some 10% this month, and there could be further fallout from lost business in the days ahead.

In fact, cloud outages are a common problem. In the past week alone, Fonality, Microsoft Azure, and Google all suffered serious outages. Vendors and their surrogates at the leading analyst firms work hard to focus customer attention on the positive features of cloud technology. But in Daily Cloud discussions with CIOs and their teams, we find customers are well aware of, and deeply concerned about, reliability records. The growing complexity of major public clouds, with new technologies deployed regularly, almost guarantees that there will be occasional outages. Meanwhile, the culture of many startups, a relentless dash for growth, often without the recruitment of enough senior, experienced management to manage that growth, and with secrecy and self-promotion layered on top, also makes problems all too likely. In that context, we’ve always found Fonality to be surprisingly open and straightforward compared with many other startups.

Transparency, says Jeff Valentine, is an important part of Fonality’s culture. “Everybody has problems, but with Fonality, the customer will always know. We’re being transparent and that makes us more trustworthy.”