免费网络电话 (5icall.cn): introducing and recommending only the best Internet phone services!

Skype Outage Caused by a Bug in the Windows Client

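Author: 免费网络电话 (this site)  Source: original to this site  Published: 2010-12-30 18:53:18
      On December 22, the Skype service suffered a global outage. The company said afterwards that the failure was caused by a “software problem.” Lars Rabbe, Skype's chief information officer, has now given a more detailed explanation: a cluster of servers responsible for offline instant messages became overloaded, and a bug in the Skype for Windows client (version 5.0.0.152) meant that the delayed responses from those servers were not processed properly, which crashed the affected clients.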
 
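      According to foreign media reports, Skype Chief Information Officer Lars Rabbe said today on the company's official blog that a bug in the Windows client triggered last week's global outage of the Skype service.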
 
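      Once the servers were overloaded, some Skype clients began receiving delayed responses from them. The affected Windows client did not process these responses correctly, and this ultimately triggered the global outage. Rabbe said that because the bug was in the Windows client, users of the clients for Mac, iPhone and other devices were not affected in the initial phase of the failure.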
 
Skype outage
 
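      Rabbe wrote in the post: “A supernode is important to the P2P (peer-to-peer) network because it takes on additional responsibilities compared to regular nodes, acting like a directory, supporting other Skype clients, and helping to establish connections between them. Once a supernode has failed, even when restarted, it takes some time to become available as a resource to the P2P network again. As a result, the P2P network was left with 25–30% fewer supernodes than normal. This caused a disproportionate load on the remaining available supernodes.”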
 
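      Because the Windows client failures were not contained in time, the entire Skype service eventually collapsed. About 50% of all Skype users were running the affected version of the Windows client, and roughly 40% of those clients failed. As a result, Skype's “supernodes” also began to fail: the crashed Windows clients included 25–30% of the publicly available supernodes.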
 
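      Rabbe also said that after large numbers of Windows clients crashed, users restarted the software, which further increased the load on Skype's P2P network. He said that on the Wednesday of the outage, traffic to the supernodes was about 100 times the normal level.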
 
      Skype says it will take several measures in parallel to prevent this kind of global outage from happening again.
 
 
English version of the report:
CIO update: Post-mortem on the Skype outage
      As a follow-up to last week’s outage, here is a detailed explanation of what transpired, the root cause, and our plans to prevent this from happening again in the future. For starters, it helps to understand that Skype is based on a peer-to-peer (P2P) network. Last week, the P2P network became unstable and suffered a critical failure. The failure lasted approximately 24 hours from December 22, 0800 PST/1600 GMT to December 23, 0800 PST/1600 GMT.
 
What was the cause for the failure?
On Wednesday, December 22, a cluster of support servers responsible for offline instant messaging became overloaded. As a result of this overload, some Skype clients received delayed responses from the overloaded servers. In a version of the Skype for Windows client (version 5.0.0.152), the delayed responses from the overloaded servers were not properly processed, causing Windows clients running the affected version to crash.
 
Users running the latest Skype for Windows (version 5.0.0.156), older versions of Skype for Windows (4.0 versions), Skype for Mac, Skype for iPhone, Skype on your TV, or Skype Connect or Skype Manager for enterprises were not affected by this initial problem.
 
However, around 50% of all Skype users globally were running the 5.0.0.152 version of Skype for Windows, and the crashes caused approximately 40% of those clients to fail. These clients included 25–30% of the publicly available supernodes, which also failed as a result of this problem.
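Taken together, the two percentages above imply that roughly one in five of all Skype clients crashed. The short sketch below only restates that arithmetic; the two input figures come from the post, and the variable names are illustrative.

```python
# Shares quoted in the post-mortem, expressed as fractions.
share_on_affected_version = 0.50  # users running Skype for Windows 5.0.0.152
crash_rate_within_version = 0.40  # portion of those clients that failed

# Fraction of all Skype clients that failed.
failed_share_of_all_clients = share_on_affected_version * crash_rate_within_version

print(f"{failed_share_of_all_clients:.0%} of all Skype clients failed")
```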
 
If approximately 20% of total Skype clients failed, why was there a much bigger disruption to Skype functionality?
Although Skype staff responded quickly to disable the overloaded servers and to eliminate client requests to them, a significant number of supernodes had already failed. A supernode is important to the P2P network because it takes on additional responsibilities compared to regular nodes, acting like a directory, supporting other Skype clients, helping to establish connections between them and creating local clusters, typically of several hundred peer nodes per supernode.
 
Once a supernode has failed, even when restarted, it takes some time to become available as a resource to the P2P network again. As a result, the P2P network was left with 25–30% fewer supernodes than normal. This caused a disproportionate load on the remaining available supernodes.
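To get a feel for what losing 25–30% of supernodes means for the survivors, here is a minimal sketch. It assumes, purely for illustration, that total load stays the same and is shared evenly among the remaining supernodes; the post does not describe how Skype actually distributes load, and `load_multiplier` is a hypothetical helper.

```python
def load_multiplier(lost_fraction: float) -> float:
    """Per-node load factor after losing `lost_fraction` of the supernodes.

    Assumption (not from the post): total load is unchanged and is spread
    evenly over the surviving supernodes.
    """
    if not 0.0 <= lost_fraction < 1.0:
        raise ValueError("lost_fraction must be in [0, 1)")
    return 1.0 / (1.0 - lost_fraction)


for lost in (0.25, 0.30):
    factor = load_multiplier(lost)
    print(f"{lost:.0%} lost -> {factor:.2f}x load per remaining supernode")
```

Under these assumptions, the loss of supernodes alone raises the load on each survivor by roughly 33% to 43%.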
 
Why weren’t the other supernodes available to help?
The failure of 25–30% of supernodes in the P2P network resulted in an increased load on the remaining supernodes. While we expect this kind of increase in the instance of a failure, a significant proportion of users were also restarting crashed Windows clients at this time. This massively increased the load as they reconnected to the peer-to-peer cloud. The initial crashes happened just before our usual daily peak hour (1000 PST/1800 GMT), and the restarts followed very shortly after the initial crash, which resulted in traffic to the supernodes that was about 100 times what would normally be expected at that time of day.
 
Supernodes have a built-in mechanism to protect themselves and to avoid adverse impact on the systems hosting them when operational parameters do not fall into expected ranges. We believe that increased load in supernode traffic led to some of these parameters exceeding normal limits, and as a result, more supernodes started to shut down. This further increased the load on remaining supernodes and caused a positive feedback loop, which led to the near-complete failure that occurred a few hours after the triggering event.
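The feedback loop described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Skype's actual mechanism: each hypothetical supernode has a single made-up load limit, total load is shared evenly among live nodes, and any node pushed past its limit shuts itself down at the end of the round.

```python
def simulate_cascade(limits, total_load, max_rounds=100):
    """Return the number of live supernodes after each round.

    limits: per-node load limit; a node shuts down when its share of the
            load exceeds this value (a stand-in for the self-protection
            mechanism, with made-up numbers).
    total_load: total traffic, split evenly among live nodes (assumption).
    """
    live = sorted(limits)
    history = [len(live)]
    for _ in range(max_rounds):
        if not live:
            break  # nothing left to fail
        per_node = total_load / len(live)
        survivors = [limit for limit in live if limit >= per_node]
        if len(survivors) == len(live):
            break  # stable: no node is over its limit
        live = survivors
        history.append(len(live))
    return history


# 100 hypothetical supernodes with limits of 100, 101, ..., 199 load units.
limits = list(range(100, 200))

print(simulate_cascade(limits, total_load=8000))   # normal load
print(simulate_cascade(limits, total_load=12000))  # elevated load
```

With the normal load, every node stays within its limit and the network is stable. With the elevated load, the weakest nodes shut down first, their share moves to the rest, and each round pushes more nodes over their limits until none remain.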
 
Regrettably, as a result of the confluence of events – server overload, a bug in Skype for Windows clients (version 5.0.0.152), and the decline in available supernodes – Skype’s functionality became unavailable to many of our users for approximately 24 hours.
 
How did Skype help support supernode recovery?
In order to restore Skype functionality, the Skype engineering and operations team introduced hundreds of instances of the Skype software into the P2P network to act as dedicated supernodes, which we nick-named “mega-supernodes,” to provide enough temporary supernode capacity to accelerate the recovery of the peer-to-peer cloud.
 
By late Wednesday night (PST) it was evident that only a proportion (about 15–20%) of Skype users’ connections were ‘healing’ and the volume of load on the supernodes continued to be unusually high. In response, our team introduced several thousand more mega-supernodes through the night. During Wednesday night, full recovery of the P2P network was underway and the majority of users were able to connect to the P2P network normally by early morning (California-PST) on December 23rd.
 
As we reported during the incident, in order to recover the core Skype functionality as quickly as possible, we utilized resources normally used to support Group Video Calling, to deploy supernodes, and over the course of Thursday night and Friday morning we returned these to their normal use and restored Group Video Calling functionality in time for Christmas.
 
The supernodes stabilized overnight on Thursday and by Friday, several tens of thousands of supernodes were supporting the P2P network. During Friday, we withdrew a significant proportion of the mega-supernodes from service, leaving some in operation to ensure stability of the P2P network over Christmas and New Year.
 
What is Skype doing to prevent this from happening again?
We understand how important the reliability, security and quality of our software are to Skype users around the world, and we work hard to maintain high standards, as well as develop new features and products.
 
First, we will continue to examine our software for potential issues, and provide ‘hotfixes’ where appropriate, for download or automatic delivery to our users. Since a bug was identified in Skype for Windows (version 5.0.0.152), we had provided a fix to v5.0 of our Windows software prior to the incident, and we will provide further updates for download this week. We will also be reviewing our processes for providing ‘automatic’ updates to our users so that we can help keep everyone on the latest Skype software. We believe these measures will reduce the possibility of this type of failure occurring again.
 
Second, we are learning the lessons we can from this incident and reviewing our processes and procedures, looking in particular for ways in which we can detect problems more quickly to potentially avoid such outages altogether, and ways to recover the system more rapidly after a failure.
 
Third, while our Windows v5 software release was subject to extensive internal testing and months of Beta testing with hundreds of thousands of users, we will be reviewing our testing processes to determine better ways of detecting and avoiding bugs which could affect the system.
 
Finally, as we continue to grow, we will keep under constant review the capacity of our core systems that support the Skype user base, and continue to invest in both capacity and resilience of these systems. An investment program we initiated a year ago has significantly increased our capacity already and more investment is planned for 2011 both to support the ongoing roll out of our paid and enterprise products, and to continue to support the growth of our core Skype software that we know millions of users rely on every day.
 
We are truly grateful to all of our users and humbled by your continued support. We know how much you rely on Skype, and we know that we fell short in both fulfilling your expectations and communicating with you during this incident. Lessons will be learned and we will use this as an opportunity to identify and introduce areas of improvement to our software, further assess and invest in capacity and stability, and develop better processes for outage recovery and communications to our user base. Thank you to everyone.
 


Copyright © 免费网络电话 All Rights Reserved.