Service Disruption

Cloud Computing: AWS glitch strikes Netflix and Tinder, offering a wake-up call for others

Grazed from NetworkWorld. Author: Katherine Noyes.

Netflix, Tinder and other major websites were affected for a time Sunday by glitches in Amazon Web Services' Northern Virginia facility, offering a cautionary lesson to other companies that rely on the cloud service for mission-critical capabilities. The problem manifested itself primarily in the form of higher-than-normal error rates. Sites affected reportedly also included IMDb and Amazon's Instant Video and Books websites.

At the heart of the snafu were issues with AWS's DynamoDB database, but it spread to include other services such as EC2, the mobile-focused Cognito service and the CloudWatch monitoring service, according to the AWS Service Health Dashboard. "The root cause began with a portion of our metadata service within DynamoDB," AWS explained in a dashboard update posted at 4:52 a.m. PDT on Sunday...

How to move your business to the cloud with minimum down time and disruption

Grazed from ManufacturingGlobal. Author: Abigail Phillips.

As with any new technology, cloud computing has been through the hype cycle, with inflated expectations giving way to disillusionment and moving further along the maturity curve toward realistic, practical and beneficial solutions. However, for many organisations, whether they have taken their first steps toward the cloud or are still contemplating a move, the journey can seem like an overwhelming task. In order to leverage maximum cloud benefit, whether organisations are just starting or are well on the way, a pragmatic approach is needed for the journey to the cloud.

The need for digital transformation within organisations is one of the biggest drivers for the cloud from the perspective of the CEO, along with the desire to transform the business and enhance the customer journey. However, the IT manager and the business typically have different cloud priorities, as the IT manager is tasked with improving processes and cutting costs...

Cloud Computing: Expired Google certificate temporarily disrupts Gmail service

Grazed from PCWorld. Author: Lucian Constantin.

Google forgot to renew one of its TLS certificates, leading to service disruption Saturday for people using Gmail through third-party email clients. The problem was fixed in a matter of hours, but should serve as a reminder to online service operators that keeping track of digital certificate expiration dates is important and should be planned for in advance.

Some users reported Saturday on Twitter and other sites that email clients like Microsoft Outlook and OS X Mail were displaying certificate errors when trying to send email messages through smtp.gmail.com. It seems that it wasn’t the SMTP (Simple Mail Transfer Protocol) server’s certificate that expired, but one higher up in the chain that corresponded to Google Internet Certificate Authority G2—an intermediate certificate authority operated by Google...
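The advice to track certificate expiration dates can be illustrated with a small check. This is a minimal sketch, not a description of Google's tooling: the function names, the 30-day warning window and the sample date are assumptions made for illustration. It takes an expiry string in the format that Python's `ssl` module reports in a certificate's `notAfter` field and works out how many days remain.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the days left before a certificate expires (negative if expired).

    `not_after` uses the "Mon DD HH:MM:SS YYYY GMT" format found in the
    `notAfter` field of the dict returned by SSLSocket.getpeercert().
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """Flag a certificate whose remaining lifetime is within the warning window.

    The 30-day default is an arbitrary illustrative threshold.
    """
    return days_until_expiry(not_after, now) <= warn_days


# Hypothetical expiry date, used only to show the calculation.
sample_not_after = "May 10 00:00:00 2015 GMT"
check_time = datetime(2015, 4, 10, tzinfo=timezone.utc)
print(days_until_expiry(sample_not_after, check_time))
print(needs_renewal(sample_not_after, check_time))
```

A check like this would need to run against every certificate in the chain, since the incident described here involved an intermediate certificate rather than the server's own. `getpeercert()` returns only the server's certificate, so intermediates would have to be obtained and inspected separately.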

Read more from the source @ http://www.pcworld.com/article/2906216/expired-google-certificate-temporarily-disrupts-gmail-service.html

Google Blames Software Glitch for Two-Hour Cloud Service Disruption

Grazed from eWeek. Author: Jaikumar Vijayan.

Google’s Compute Engine cloud infrastructure hosting service suffered a nearly two-hour outage between late Feb. 18 and early Feb. 19, affecting customers on a global scale. The disruption started at 10:59 p.m. Pacific Time on Feb. 18 and was resolved shortly before 1 a.m. on Feb. 19, a Google incident report noted.

The company blamed the outage on a glitch in an internal software system used to manage virtual machine egress traffic on the Google Compute Engine. According to Google, the software stopped issuing updated routing information, bringing outbound traffic to a halt...

Fukushima sends Japanese IT to the cloud

Grazed from The Register. Author: Phil Muncaster.

Analysis: The devastating triple whammy of earthquake, tsunami and nuclear meltdown that struck Japan in March 2011 has led many IT managers to rebuild their infrastructure with a key focus on disaster recovery and business continuity, according to experts.

It’s an effort which has had obvious knock-on benefits for cloud computing, virtualisation and mobile vendors touting their wares in the land of the rising sun but also teaches some important lessons about IT best practice. Now the dust has settled on one of the world’s worst disasters in recorded history, the short and longer term impact on IT operations becomes clear...

Despite recent cloud service outages, security a bigger concern than availability

Grazed from PCWorld. Author: Tony Bradley.

Wow. No sooner did I finish writing about how the Google and Microsoft outages were not a reason to lose confidence in the cloud, than Amazon went down. The online retail site—and its associated cloud services—were down for just under half an hour Monday afternoon. I stand by my assertion that the sky is not falling, but there’s more to using the cloud than just availability. Amazon.com was the third major cloud service to suffer an outage in the last week.

Over on WindowsITPro.com, Paul Thurrott summed up the hysteria over cloud outages nicely. “And of course, the cloud computing doubters—who, like global warming doubters are increasingly at odds with reality—will argue that such outages prove that our move away from on-premises hardware and local storage is nothing but a temporary trend.”

Let’s start with some perspective, breaking down the math like I did yesterday for Google and Microsoft. Amazon was down for about 25 minutes (although I’ve seen reports from 15 minutes to 40 minutes)...
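The downtime figures quoted above can be put in perspective as a yearly availability percentage. The sketch below is not the author's own calculation, which is cut off in this excerpt. It is a rough illustration that assumes a 365-day year and treats this single outage as the only downtime in that year. The function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def availability_percent(downtime_minutes: float) -> float:
    """Percentage of a year a service was up, given its total downtime in minutes."""
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_YEAR)


# The three durations mentioned in the article: the low report, the
# author's estimate, and the high report.
for minutes in (15, 25, 40):
    print(f"{minutes} min down -> {availability_percent(minutes):.4f}% available")
```

Under those assumptions, even the longest reported duration leaves yearly availability above 99.99 percent.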

No Reason to Panic Over Periodic Cloud Outages

Grazed from ChannelNomics. Author: Larry Walsh.

Amazon.com became the fourth major site and/or Internet service to go dark in the past week. The sudden outage that lasted 15 minutes meant millions of online consumers couldn’t order “50 Shades of Gray” or the latest John Mayer CD.

More importantly, though, this string of service outages is drawing attention to the fragility of the Internet and cloud-based services. While cloud computing is still evolving, it has become an indispensable part of our daily work and personal life. Consider what’s happened in the past week.

  • Microsoft’s Outlook.com – the recently rebranded cloud email service – was dark for many users for days. Microsoft has issued an apology to users and has restored service. However, the outage comes as Microsoft is touting the high uptime for Office 365 and other cloud services.
  • The New York Times – the gray old lady and bastion of traditional journalism – was offline for several hours last Wednesday due to technical difficulties. The Washington Post described the scene as people “surging out of their offices in a blind panic” because they couldn’t catch up on the latest news trends...

Cloud Computing: Google Investigates Google Drive Disruption

Grazed from InformationWeek. Author: Eric Zeman.

Google admitted Monday morning that its cloud-based Google Drive service is experiencing a service disruption. According to a statement made on Google's service dashboard, "We're investigating reports of an issue with Google Drive. We will provide more information shortly." Reports of trouble accessing Google Drive spread across social networks such as Twitter Monday. Users from around the globe reported trouble connecting to the service, as well as service crashes.

Google Drive is Google's cloud-based storage and productivity service. It encompasses Google Docs, Sheets and Presentations, in addition to providing document and file storage. Google Drive is accessible from various avenues, including Web browsers, desktop apps and apps for Android and iOS devices. The standalone mobile apps appear to be working more consistently at the moment when compared to the browser-based versions of the service...

Cloud Computing: Amazon E-Retail Down For Hour On Thursday

Grazed from InformationWeek. Author: Charles Babcock.

For about an hour starting around 11:30 a.m. Pacific, the Amazon.com retail site was inaccessible to customers. Attempts to access the site were met with an "Http/1.1 Service Unavailable" message on an otherwise white and unadorned screen. The site appeared to be operating normally again about 12:45 p.m. Pacific. Several customers noted the outage on Facebook and Twitter. At 12:24 p.m. Pacific, the San Francisco technology reporting site GigaOm noted: "Amazon is down; Yes, you read that right."

Security blogger George V. Hulme noted about 12:30 p.m. on Facebook: "Http/1.1 Service Unavailable." Queries submitted to Amazon Web Services and Amazon.com public relations personnel did not result in any immediate response. Amazon Web Services appeared to be up and running during that time. The AWS home page appeared on cue and key services, such as the Heroku cloud running on AWS, were still up during the period Amazon.com was inaccessible.