Gnutella: To the Bandwidth Barrier and Beyond November 6, 2000
Overview
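In this report, Clip2 DSS describes the evolution and current state of Gnutella, a peer-to-peer file-sharing network, based on extensive data gathered over more than five months. The significant finding is that since the network grew to regularly exceed the bandwidth of dial-up modems in August 2000, it has neither scaled smoothly nor collapsed catastrophically. Instead, it survives in a fragmented state made up of numerous continuously evolving, responsive segments, the largest of which typically contains several hundred hosts. We currently estimate the number of unique daily Gnutella users to be at least 10,000 and possibly as many as 30,000. We show here that for the Gnutella network to grow beyond its present state, further technical innovation, and widespread adoption of that innovation, will be required.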

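The fraction of hosts offering files for download swung between 15% and 40%, with a substantial slide in September followed by a rebound in October. (Translator's note: the September slide coincides with the period when Napster users poured into Gnutella as a result of the Napster lawsuit.) Most hosts stayed online for minutes to hours; only a minority (30% in early August) remained connected for a full day or longer. Users mainly issued queries for audio, video, image, and program files, and automated indexing mechanisms regularly accounted for a few percent of query traffic.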


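In a large-scale survey, one in three network hosts was found outside the US-centric top-level domains, with Germany and Japan dominating the non-US population. Second-level domains belonging to Internet service providers accounted for most of the top sources of hosts in the COM and NET domains, and the broadband services @Home and Road Runner were especially prominent. Gnutella hosts were found at more than 550 institutions in the EDU domain, led by MIT and Virginia Tech.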
Introduction
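The history of Gnutella can be divided into two eras, "pre-barrier" and "post-barrier," where the terms refer to the level of traffic on the public Gnutella network relative to the bandwidth of dial-up modems. The dividing line between the two eras falls in August 2000. In this article, Clip2 DSS publishes previously unreleased data and provides additional information on the state of the pre-barrier network. We describe the barrier transition, which we first reported on September 8 <http://dss.clip2.com/dss_barrier.html>. In addition, we shed light on the poorly understood state of the post-barrier network and survey the locations of network hosts. Finally, we draw an analogy to depict the current state of the network and to outline future scenarios for its development.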

Gnutella Genesis
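On March 14, 2000, at 8:31 a.m. Pacific Standard Time, a message <http://slashdot.org/articles/00/03/14/0949234.shtml> posted to Slashdot <http://slashdot.org/> spread the news that AOL's Nullsoft division <http://www.nullsoft.com/> had released an "open source Napster clone" named "Gnutella." On March 15, at 1:25 p.m. Pacific Standard Time, Wired News reported <http://www.wired.com/news/technology/0,1282,34978,00.html> that the release of Nullsoft's "file-sharing software tool that might have the potential to surpass Napster" had been halted. Nevertheless, third parties redistributed copies of Gnutella that had been downloaded from Nullsoft, and with remarkable speed a number of "Gnutella clones" appeared to augment the "official" Nullsoft version. Like the Nullsoft application, the clones spoke the same language, the Gnutella protocol, and could therefore communicate with the Nullsoft version and with one another. Users ran these programs and connected to each other over the Internet. The network consisted of a mix of applications speaking the Gnutella language. These applications are known as "servents," because in a fully distributed system (one with no center) they play the roles of server and client at the same time. As a result, "Gnutella" has come to carry several meanings: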

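  • Gnutella = the servent created by Nullsoft. The program recognized as "Gnutella" (often "Gnutella v0.56") is widely distributed; at Download.com <http://www.download.com> alone it has been downloaded roughly 250,000 times to date.
  • Gnutella = the protocol. The widely used version of the protocol is 0.4. Clip2 DSS has published a specification of the protocol <http://dss.clip2.com/GnutellaProtocol04.pdf>. (Translator's note: Jnutella.org provides a Japanese translation of this specification; see <http://www.jnutella.org/docs/gnutellang/gnutella_protocolv4.shtml>.)
  • Gnutella = the network. There is a single larger, publicly accessible network of applications speaking the Gnutella language, along with several smaller, entirely disconnected (and often private) networks. Clip2 DSS tracks the main network and reports related information on its home page <http://dss.clip2.com/>.
  • Gnutella = any combination of the above, or the system as a whole.

This report discusses Gnutella and the development of its network.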
Pre-Barrier Gnutella
Clip2 DSS began conducting systematic studies of the Gnutella network in June 2000. Most Gnutella servents provide a "host count" feature in which they report to the user a dynamically updated count of the hosts they have identified on the network. During the period from June through mid-August, servents connected to the public Gnutella network for at least a matter of minutes typically reported host counts on the order of 1,000 to 4,000 hosts. Clip2 DSS's Gnutella network crawler regularly found 1,000 to 8,000 hosts online during the course of a sub-1-hour full-network traversal. Were these hosts connected in a tight concentration or in a loosely knit and far-flung configuration? How many connections did a typical host have? We were able to answer these questions using the network graphs generated by our crawler.

The "concentration" of the network is a particularly interesting question because Gnutella servents typically issue queries with a "TTL" of 7, meaning a user's query will travel up to 7 hosts away on the Gnutella network from the originating computer. There is always a (not necessarily unique) shortest path between any two hosts on the network, and the longest such path is known in mathematical graph theory as the "diameter" of the network. Clearly, if the diameter were greater than (7*2+1)=15, there would be hosts from which a query could be launched and not reach the entire network before expiring. We saw such high-diameter networks in early July. Below, we show data for a crawl of 1,959 hosts on July 7. Here, we plot on a semi-log scale the number of pairs of hosts that had a given shortest-path distance (or "separation") between them. The diameter of the network discovered by this crawl was 22, indicating some regions were not in communication with others (assuming messages used the common TTL value). We also note that most pairs of hosts were separated by 7 hops.

[Figure: number of host pairs at each shortest-path separation, July 7 crawl (semi-log scale)]
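
The separation histogram and diameter described above can be illustrated with a breadth-first search from every host. This is a minimal sketch, not the Clip2 DSS crawler: the adjacency-dictionary input, the function names, and the five-host toy graph are all assumptions made for illustration.

```python
from collections import Counter, deque

def separations(graph):
    """Map each hop distance to the number of unordered host pairs at that distance."""
    counts = Counter()
    for source in graph:
        # Breadth-first search gives shortest-path hop counts from this source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        for target, hops in dist.items():
            if source < target:  # count each unordered pair once
                counts[hops] += 1
    return counts

def diameter(graph):
    """Longest shortest path among pairs of hosts that can reach each other."""
    return max(separations(graph), default=0)

# Hypothetical toy crawl: a chain of five hosts, a-b-c-d-e.
chain = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
hist = separations(chain)
```

On a real crawl, `hist` would supply the values for the semi-log plot, and comparing `diameter(graph)` with the TTL shows whether a query can span the network.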

In the latter days of the pre-barrier period, the network diameter fell to smaller values, typically 8 or 9. Below, we show network diameters for crawls made between July 25 and August 18. Note that while the period of larger diameters around July 28 corresponds to the "Napster Flood", a larger diameter network is not necessarily a direct result of more hosts connecting to the network. A network with few hosts can have a large diameter and vice versa; the diameter is purely a function of how the hosts are connected, not the number of hosts. The smaller diameters are significant in that they imply any host on the network could easily reach all other hosts using TTL=7.


[Figure: network diameter for crawls made between July 25 and August 18]

Further examining host connectivity, we found a host was most likely to have a single connection, and hosts with higher numbers of connections were increasingly uncommon. In more technical terms, we saw roughly power-law degree distributions, where "power-law" means the number of hosts having a given degree varied as a power of the degree, and "degree" is shorthand for the total number of incoming and outgoing connections a given host has open. Alongside host separations, the degree distribution is another means of quantitatively assessing network connectivity, and we present below the degree distribution based on a 1,813-host crawl made on July 7. One particularly interesting coincidence is that other researchers (e.g., Broder et al. 1999) have found power-law degree distributions for graphs in which the nodes are Web pages and the connections are Web links. In the plot, we compare measured data with a least-squares best fit of slope (power-law index) = -2.3.

[Figure: host degree distribution from the July 7 crawl, with least-squares power-law fit]
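
A power-law index such as the one quoted above is commonly estimated as the slope of a straight line fitted to the data on log-log axes. The sketch below assumes that approach; the report does not say how the data were binned or weighted, and the synthetic counts are invented so that the expected slope is known.

```python
import math

def power_law_index(degree_counts):
    """Least-squares slope of log10(host count) against log10(degree)."""
    points = [
        (math.log10(degree), math.log10(count))
        for degree, count in degree_counts.items()
        if degree > 0 and count > 0  # logarithms need positive values
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Hypothetical data following count = 1000 * degree**-2 exactly.
synthetic = {degree: 1000 * degree ** -2.0 for degree in (1, 2, 4, 8, 16)}
slope = power_law_index(synthetic)
```

For measured degree counts, a slope near -2.3 would correspond to the fit reported in the text.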

As the above data show, the structure of the Gnutella network is in a continuous state of flux. Hosts come and go as quickly as users open and close Gnutella applications, and connections between hosts may only last for seconds. We found that half of an initial host population persisted after five hours, and that approximately 30% of the initial host population was stable on the timescale of 24 hours. We determined this result by comparing host populations found in a succession of crawls to an initial reference crawl and by repeating this analysis for multiple reference crawls; the plot below illustrates our findings.

[Figure: fraction of an initial host population persisting over time]
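
The comparison of later crawls against a reference crawl amounts to a set intersection per crawl. The following is a minimal sketch under that reading; the host addresses and the function name are hypothetical, and averaging over multiple reference crawls is left out.

```python
def persistence(reference, later_crawls):
    """Fraction of the reference host set still present in each later crawl."""
    reference = set(reference)
    return [len(reference & set(crawl)) / len(reference) for crawl in later_crawls]

# Hypothetical crawls, identified by host address.
reference_crawl = {"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"}
later = [
    {"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.9"},
    {"10.0.0.1", "10.0.0.2", "10.0.0.8"},
    {"10.0.0.1"},
]
curve = persistence(reference_crawl, later)
```

Plotting `curve` against the time elapsed since the reference crawl gives a persistence curve of the kind described above.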

Gnutella Hits the Wall
As we first reported in "Bandwidth Barriers to Gnutella Network Scalability" (September 8), average Gnutella network traffic began to regularly exceed the throughput capacity of dial-up modems in August. Hosts connected to the Internet via dial-up modems ceased to be able to effectively participate as peers on the Gnutella network. These hosts essentially became dead-ends, resulting in a widespread fragmentation of the Gnutella network into effectively disconnected components composed of hosts with higher-speed Internet connections. The animation below illustrates the evolution of the network as traffic passed the dial-up barrier.

[Animation: fragmentation of the network as traffic passed the dial-up barrier]
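
One simple way to picture this fragmentation is to treat saturated dial-up hosts as unable to relay and then find the connected components that remain. This is an idealized sketch: the all-or-nothing relay model, the function name, and the six-host topology are assumptions, not a description of how Clip2 DSS identified segments.

```python
def responsive_segments(graph, dialup):
    """Connected components among hosts that can still relay traffic, largest first."""
    seen = set()
    segments = []
    for start in graph:
        if start in dialup or start in seen:
            continue
        segment = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            segment.add(node)
            for neighbor in graph[node]:
                # Dial-up hosts act as dead ends, so the search never passes through them.
                if neighbor not in dialup and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        segments.append(segment)
    return sorted(segments, key=len, reverse=True)

# Hypothetical topology: modem host "m" is the only bridge between two broadband groups.
toy = {
    "a": {"b", "m"},
    "b": {"a"},
    "m": {"a", "c"},
    "c": {"m", "d", "e"},
    "d": {"c"},
    "e": {"c"},
}
before = responsive_segments(toy, dialup=set())
after = responsive_segments(toy, dialup={"m"})
```

Before the barrier the toy network forms a single segment; once "m" stops relaying, it splits into two segments that cannot hear each other.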

Data gathered by Clip2 DSS clearly illustrate the effective loss of dial-up hosts from the responsive portion of the Gnutella network. In one long-term experiment, we regularly visited hosts and issued probe-like Gnutella "ping" messages of TTL=2 to discover their neighbors. As shown below, unique hosts sending Gnutella "pong" messages in response to these small-TTL pings declined substantially and permanently in mid-to-late August.

[Figure: unique hosts responding with pongs to TTL=2 pings over time]
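
For readers unfamiliar with the message format, a small-TTL ping can be sketched as a bare 23-byte descriptor header, following the layout in the Gnutella protocol v0.4 specification (16-byte descriptor ID, payload type, TTL, hops, payload length). The helper names are hypothetical, the little-endian length field is an assumption about byte order, and nothing here reflects the actual probing tool used by Clip2 DSS.

```python
import os
import struct

PING = 0x00  # payload descriptor values from the v0.4 specification
PONG = 0x01

# Descriptor ID, payload type, TTL, hops, payload length.
# Assumption: the length field is packed little-endian.
HEADER = struct.Struct("<16sBBBI")

def build_ping(ttl=2, descriptor_id=None):
    """Return the 23-byte header of a ping; a ping carries no payload."""
    if descriptor_id is None:
        descriptor_id = os.urandom(16)
    return HEADER.pack(descriptor_id, PING, ttl, 0, 0)

def parse_header(data):
    """Decode the first 23 bytes of a descriptor into a dictionary."""
    descriptor_id, kind, ttl, hops, length = HEADER.unpack(data[:HEADER.size])
    return {"id": descriptor_id, "type": kind, "ttl": ttl, "hops": hops, "length": length}

probe = build_ping(ttl=2, descriptor_id=bytes(16))
fields = parse_header(probe)
```

With TTL=2, such a ping is answered by the visited host and forwarded one hop further, so the pongs that return identify that host's immediate neighbors.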

As noted earlier, most Gnutella servents display a host count based on the number of pongs received since the application began running. As the barrier was passed, numerous postings on public user forums noted servents were displaying lower-than-usual host counts, often only in the tens or hundreds. Many users interpreted the decrease in host counts as meaning the total number of hosts on the network had decreased. However, as the above data show, the change in network responsiveness was rather abrupt. Such an abrupt transition is much more plausibly explained by the reaching of a technical barrier than a mass change in user behavior. Had users departed the network, we would have expected to see a decrease in the usage rate of the Clip2 DSS host list service; instead, we saw an unabated rise in usage. In addition, even though responses in the ping experiment had dropped, this experiment continued to find ten to twenty times the number of hosts that would be reported in a servent's host counter over a similar time interval. In sum, we found no evidence supporting a user exodus and multiple indications that the host population remained sizable but fragmented. The total number of hosts online can be substantially larger than the number reported in a servent's host counter, since a servent only sees out as far as the boundary of the responsive region to which the servent is connected.


The preceding discussion raises the question of the source of the traffic that caused the network to reach the dial-up modem bandwidth barrier. Was it the result of continued growth in the number of Gnutella users, or was it the result of the introduction of programmatic sources of traffic, such as machine-generated spam? The question is difficult to answer due to a lack of comprehensive data. Clip2 DSS reported on one anomalous form of traffic seen on the network in early September that may have existed for some time prior. Since that report, we have regularly observed other forms of apparently automated messages on the network, including repeated series such as {a.mp3, b.mp3, c.mp3, ...} that appear to be network-indexing attempts. While this is an unresolved question, it is moot in a sense, because with continued growth in the user base, user-generated network traffic would have eventually reached the barrier level of its own accord.
Gnutella Beyond the Barrier
What is the state of the post-barrier Gnutella network? Among other sources, we can find some clues in data from the Clip2 DSS-operated "gnutellahosts.com" host list service, which publishes IP addresses of live Gnutella hosts. Approximately 10% of users access this list by visiting the Clip2 DSS home page; the remainder retrieve addresses by connecting their servents to a special-purpose Gnutella server operated by Clip2 DSS at gnutellahosts.com, port 6346. After responding to the incoming Gnutella servent with multiple IP addresses, the gnutellahosts.com server disconnects, and the servent proceeds to connect directly to the hosts at the provided addresses. As noted in the previous section, we have not observed any decrease in the traffic to gnutellahosts.com due to Gnutella having hit the barrier. On the contrary, traffic has continued to grow in the post-barrier period. Below, we show the long-term (3-month) straight-line trend in gnutellahosts.com usage.

[Figure: three-month straight-line trend in gnutellahosts.com usage]

How many users are there on the post-barrier network? How are they connected? From the number of callers to gnutellahosts.com and our probing experiments, we found no evidence of a sudden population collapse as the barrier was passed. Instead, we found evidence that the population had fragmented into multiple dynamically changing responsive and unresponsive segments. The sum of all data sources leads us to estimate the total number of daily Gnutella users at between 10,000 and 30,000, where the lower bound is a much better approximation than the upper bound. Below, we show the numbers of hosts in the largest responsive network segments we were able to identify over a range of post-barrier dates.

[Figure: number of hosts in the largest responsive segments on post-barrier dates]

Since the fragmentation occurred, it has been a matter of chance whether or not a user manages to find and remain connected to a responsive segment. Typically, the gnutellahosts.com host list service has provided addresses of hosts in the largest identifiable responsive segment, although this region is a moving target. In order to track it, Clip2 DSS has refined its crawling strategy in recent weeks and regularly crawls the network on the timescale of every 15 minutes.

On the post-barrier network, dial-up modem users cannot effectively participate as peers throughout the network. What can be done to alleviate this situation? One solution is to connect these users to high-speed proxies that handle network traffic on their behalf. This is an underlying concept of the Clip2 Reflector(TM), a special-purpose Gnutella server, and the network architecture that results is illustrated below:

[Figure: network architecture with dial-up users connected through Reflectors]

Reflectors are programmed to maintain connectivity to the most responsive segment of the network by calling gnutellahosts.com (by default). Dial-up users singly connect to Reflectors that in turn maintain multiple outgoing network connections on their behalf. A list of running public-access Reflectors can be found on the Clip2 DSS home page.

How different is user behavior in the post-barrier period relative to the pre-barrier period? One measure of behavior is the fraction of hosts serving a non-zero number of files. In a 24-hour period in early August, during the late pre-barrier era, researchers at Xerox PARC found 30% of hosts made available one or more files for download (Adar & Huberman 2000). Using a different methodology, Clip2 DSS independently measured the "serving fraction" before, during, and since the period of the PARC study. Our results confirm theirs during the period of their study. In the final days of the pre-barrier era, we observed an increase in the serving fraction to a maximum in excess of 40%. However, as the network evolved into the post-barrier period, we saw a substantial decrease in the serving fraction, down to a low of less than 15%. Notably, since early October we have observed a general rise in the serving fraction back to near pre-barrier levels. Below, we plot the serving fraction over time.

[Figure: serving fraction over time]
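
As defined above, the serving fraction is the share of hosts offering one or more files. The sketch below assumes a per-host count of shared files is already available; how Clip2 DSS collected such counts is not described in the report, and the host names and numbers are invented.

```python
def serving_fraction(files_shared_by_host):
    """Fraction of hosts that share one or more files."""
    if not files_shared_by_host:
        return 0.0
    serving = sum(1 for count in files_shared_by_host.values() if count > 0)
    return serving / len(files_shared_by_host)

# Hypothetical snapshot: host -> number of files shared.
snapshot = {"host-a": 0, "host-b": 120, "host-c": 0, "host-d": 3, "host-e": 0}
fraction = serving_fraction(snapshot)
```

Here two of five hosts serve files, giving a serving fraction of 0.4.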

What are users searching for in the post-barrier period? Clip2 DSS analyzed three query stream samples of varying sizes taken on three different dates. Applying a subjective analysis to categorize 2,000 queries heard on September 19, we found the following breakdown:

[Figure: breakdown by category of 2,000 queries heard on September 19]

Notes on these categories: The "gibberish" category includes queries consisting of non-alphanumeric characters; among other sources, we have seen such queries generated by poorly programmed clients that do not properly read and forward Gnutella query messages. The "automated indexing" category includes queries that appeared to be programmatically generated with the intent of indexing network content, such as "a.mp3", "b.mp3", etc. "File extension only" queries are just that, containing extensions such as "mp3" or "mpg" (possibly with accompanying periods, asterisks, or both) but no other content. "Song and artist+song" counts queries either containing only a song title or a song title along with an artist name. "Artist" counts queries containing just an artist name.
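
The three mechanically recognizable categories above can be approximated with simple rules. This is only a rough sketch: the original categorization was subjective, the regular expressions and extension list are assumptions, and the song and artist categories, which need human judgment, are lumped into "other".

```python
import re

# Assumed patterns; the report does not give its exact rules.
AUTOMATED = re.compile(r"^[a-z0-9]\.mp3$", re.IGNORECASE)
EXTENSION_ONLY = re.compile(
    r"^[.*\s]*(mp3|mpg|mpeg|avi|jpg|gif|exe|zip)[.*\s]*$", re.IGNORECASE
)

def categorize(query):
    """Assign a query to one of the rule-based categories, or to 'other'."""
    text = query.strip()
    if not any(ch.isalnum() for ch in text):
        return "gibberish"
    if AUTOMATED.match(text):
        return "automated indexing"
    if EXTENSION_ONLY.match(text):
        return "file extension only"
    return "other"

labels = [categorize(q) for q in ["\x01\x02##", "b.mp3", "*.mpg", "some song title"]]
```

The order of the checks matters: "b.mp3" would otherwise fall through to "other", since the extension-only pattern allows nothing but periods, asterisks, and whitespace around the extension.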

By objectively analyzing only those queries containing a file extension anywhere in the query string, we are able to analyze much larger data sets. We found from 20% to 40% of queries in three separate stream samples of 2,000, 30,000, and 150,000 queries contained a popular file extension somewhere in the query string. Below, among the set of queries that contained an extension, we show the frequencies of specific types of extensions.

[Figure: frequencies of specific file extensions among queries containing an extension]
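
Counting extensions anywhere in the query string can be automated along the following lines. The list of "popular" extensions and the tokenization rule are assumptions, since the report does not list which extensions it searched for.

```python
import re
from collections import Counter

POPULAR = {"mp3", "mpg", "mpeg", "avi", "jpg", "gif", "exe", "zip"}  # hypothetical list

def extension_frequencies(queries):
    """Return the share of queries containing a popular extension, and per-extension counts."""
    counts = Counter()
    with_extension = 0
    for query in queries:
        # Split on anything that is not a letter or digit, then look for known extensions.
        tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
        hits = tokens & POPULAR
        if hits:
            with_extension += 1
            counts.update(hits)
    share = with_extension / len(queries) if queries else 0.0
    return share, counts

# Hypothetical query stream.
stream = ["a.mp3", "holiday video mpg", "some artist", "*.MP3", "readme"]
share, counts = extension_frequencies(stream)
```

Because the rule is purely mechanical, it applies equally to samples of 2,000 or 150,000 queries.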

Note the programmatically generated queries of the form "a.mp3", "b.mp3", etc. were a recurring feature in every sample, and the samples were spread over a 21-day period. These queries likely originated from a single source and amounted to a few percent of total network query traffic.

Global Gnutella
We complete our survey of the Gnutella network by examining the access points of Gnutella hosts. Where are Gnutella hosts? To arrive at an answer, we utilized 3.3 million non-unique IP addresses gathered continuously by Clip2 DSS between July 27 and November 3, spanning both Gnutella eras. Of these addresses, 1.3 million (39%) were resolvable to non-numeric hostnames, and our reports below are on this resolvable subset. The populations we report below should be interpreted in probabilistic terms; the data have not been de-duplicated, so that the relative populations represent relative probabilities of host discovery within a given domain.

We divided top-level domains into US-Centric (COM, NET, EDU, MIL, GOV, US) and Non-US-Centric (all others) categories, although we note a number of non-US-based organizations operate domains in the US-Centric set. Our first finding is that Gnutella is a truly international phenomenon, with one out of three hosts located on a non-US-centric domain.

[Figure: share of hosts on US-Centric versus Non-US-Centric top-level domains]
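
The split into US-Centric and Non-US-Centric domains can be sketched as a lookup on the last label of each resolved hostname. The hostnames below are placeholders, and the reverse lookup itself (for instance with `socket.gethostbyaddr`) is omitted because it requires network access.

```python
from collections import Counter

US_CENTRIC = {"com", "net", "edu", "mil", "gov", "us"}  # the set named in the text

def tld_category(hostname):
    """Classify a resolved hostname by its top-level domain."""
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    return "US-Centric" if tld in US_CENTRIC else "Non-US-Centric"

def category_shares(hostnames):
    """Relative frequency of each category among the given (non-unique) hostnames."""
    counts = Counter(tld_category(name) for name in hostnames)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# Hypothetical resolved hostnames.
resolved = ["dsl-1.example.com", "cable-7.example.net", "host-3.example.de"]
shares = category_shares(resolved)
```

Since the input is not de-duplicated, the shares are best read as relative probabilities of discovering a host in each category, as the text notes.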

Among Non-US-Centric domains, 95% of hosts were found in just 17 country-specific top-level domains. European domains accounted for 67% and Asia-Pacific domains for 20%.

[Figure: hosts by country-specific top-level domain among Non-US-Centric domains]

Among US-Centric hosts, COM, NET, and EDU domains predictably dominated, although non-zero numbers of hosts were found on each of ORG, US, GOV, and MIL domains (in ratios 19:8:2:1, respectively).

[Figure: hosts by top-level domain among US-Centric domains]

In the case of the COM, NET, and EDU domains, we dug deeper to examine populations at the level of second-level domain names. The popular second-level COM domains were primarily broadband Internet service providers. The ISP @Home accounted for half of all Gnutella hosts in the COM domain, and Road Runner trailed in second place at nearly one quarter of hosts. Gnutella hosts with resolvable host names in the COM domain were therefore strongly concentrated in a small number of second-level domains, with 87% of hosts residing in the top 25. Among the top 25 were six second-level domains either representing non-ISP companies or organizations whose nature we could not determine. In total, Gnutella hosts were found on 1750 unique second-level domains within the COM domain.

[Figure: hosts by second-level domain within the COM domain]
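
Concentration figures such as "87% of hosts residing in the top 25" can be derived by grouping hostnames by second-level domain and summing the largest groups. The following is a minimal sketch; the domain names are made up for illustration and the function names are hypothetical.

```python
from collections import Counter

def second_level_domain(hostname):
    """Return the last two labels of a hostname, e.g. 'alpha-isp.com'."""
    parts = hostname.rstrip(".").lower().split(".")
    return ".".join(parts[-2:])

def top_share(hostnames, top_n):
    """Share of hosts falling in the top_n second-level domains, plus those domains."""
    if not hostnames:
        return 0.0, []
    counts = Counter(second_level_domain(name) for name in hostnames)
    top = counts.most_common(top_n)
    return sum(n for _, n in top) / sum(counts.values()), top

# Hypothetical resolved hostnames in the COM domain.
com_hosts = [
    "c1.alpha-isp.com",
    "c2.alpha-isp.com",
    "d1.beta-isp.com",
    "pc.other-co.com",
]
share, leaders = top_share(com_hosts, top_n=1)
```

The same grouping applies unchanged to the NET and EDU domains.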

The popular second-level NET domains were exclusively Internet service providers, both broadband and dial-up. The distribution among NET domains was less concentrated than among COM domains, with the leader, Road Runner, claiming less than 10% of Gnutella hosts on the NET domain. Note the second-place second-level NET domain was a German company, illustrating that while the NET domain is US-centric, it is not US-exclusive. Over 2000 unique second-level domain names within the NET domain were represented in the Gnutella host population.

[Figure: hosts by second-level domain within the NET domain]

The distribution of hosts among EDU domains was even less concentrated than among NET domains. The Massachusetts Institute of Technology led the pack at 3.8% by a sizable margin. Virginia Tech made a strong showing at slightly less than 3%, and the distribution declined smoothly from 3rd-place University of Southern California through the remainder of the list. In all, Gnutella hosts were discovered on over 550 second-level domains within the EDU domain.

[Figure: hosts by second-level domain within the EDU domain]

In summary, major conclusions are that (1) the Gnutella network is an international phenomenon led by the US, Germany, and Japan; (2) substantial populations of hosts in COM and NET domains are on ISP second-level domains; and (3) hosts are widely distributed among EDU domains.

Gnutella Tomorrow
The Gnutella network is analogous to a continuous global rave: an informal, decentralized, unregulated gathering without a permanent location. Network hosts, like rave attendees, come and go unpredictably, connecting and disconnecting as fast as ravers switch dance partners. In the pre-barrier era, the Gnutella rave could accommodate all comers. The effect of the network hitting the dial-up modem barrier is analogous to a rave venue reaching capacity, with many would-be revelers being crammed shoulder-to-shoulder and left unable to dance, and still more spilling out the doors. Small regions within the crowd, corresponding to responsive segments of the Gnutella network, remain sufficiently open to enable movement. These pockets form and vanish, grow and shrink, and merge and split on a variety of timescales. In the post-barrier era, the Gnutella rave remains more popular than the typical raver, pressed in among the crowd, might realize. The crowding results in the potential of the gathering being released in isolated bursts rather than in a continuous widespread discharge. While Gnutella has not scaled, we call attention to the fact that it has also not collapsed. Like many simple decentralized systems, it has remained remarkably robust in the face of technical adversity. In the present post-barrier period, Gnutella exists in an intermediate state between scaling and collapsing.

In the opinion of Clip2 DSS, this situation is likely to persist until either (1) the user population collapses and traffic falls to pre-barrier levels or (2) an "organizing principle" for connectivity that enables scaling takes root. In the former case, the improved performance that would result could potentially drive alienated users to return to the network, driving resurgence in traffic, another barrier crossing, and repetition of the entire cycle. In the latter case, examples of organizing connectivity principles include (1) dial-up users regularly connecting to broadband Reflectors and (2) widespread adoption of servents with sophisticated and consistently implemented connection management rules. However, to be widely and rapidly successful, any organizing principle must require no user action or change in behavior that is not immediately and powerfully rewarded, and it must not involve a change to the protocol that breaks the considerable installed application base. In the post-barrier era, there have been various initiatives to create new and smaller networks - spin-off raves - enabled by user adjustment of the "Gnutella handshake" mechanism in servents that support this feature. However, because the underlying technology is no different, if traffic on these networks were to grow sufficiently large, they would be subject to bandwidth barriers as well. These attempts to re-create pre-barrier conditions do so at the cost of sacrificing the relatively large user base on the main network and do not directly address the problem. To move beyond its present state, Gnutella awaits widely adopted technical innovation.

© 2000 Ian Hall-Beyer. All Rights Reserved.
<manuka@nerdherd.net>
