Eytan Adar and Bernardo A. Huberman
Internet Ecologies Area
Xerox Palo Alto Research Center
Palo Alto, CA 94304
Japanese translation: Kawasaki, kawasaki@jnutella.org
Translation status: in progress (00/00/00)
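●●Abstract●●
An extensive analysis of user traffic on Gnutella shows a significant
amount of free riding in the system. By sampling messages on the
Gnutella network over a 24-hour period, we established that 70% of
Gnutella users share no files and that 90% of users answer no queries.
Furthermore, we found that free riding is distributed evenly across
domains, so that no one group contributes significantly more than
others, and that peers that volunteer to share files are not
necessarily those who have desirable ones. We argue that free riding
leads to degradation of system performance and adds vulnerability to
the system. If this trend continues, copyright issues might become
moot compared to the possible collapse of such systems.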
●●1. Introduction●●
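The sudden appearance of new forms of network applications such as
Gnutella [Gn00a], FreeNet [Fr00], and Napster [Na00] holds promise
for the emergence of fully distributed information sharing systems.
Privacy concerns notwithstanding, these systems allow users worldwide
to access and provide information in a way that was not possible
under the present client-server architecture of the web.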
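While a lot of attention has been focused on free access to music and
the violation of copyright laws through these systems, there remains
an additional problem: securing enough cooperation in such large and
anonymous systems for them to become truly useful. Users are not
monitored as to who makes files available to the rest of the network
(produce) and who downloads remote files (consume), and no such
statistics are kept. As the user community of such a network grows,
it is therefore possible that users will stop producing and only
consume. Users may not be aware of it, but this free riding behavior
is the result of a social dilemma that all users of such systems
confront.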
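In a general social dilemma, a group of people attempts to utilize a
common good in the absence of central authority. In the case of a
system like Gnutella, one common good is the provision of a very
large library of files, such as music and other documents, to the
user community. Another might be the shared bandwidth in the system.
The dilemma for each individual is then either to contribute to the
common good, or to shirk and free ride on the work of others.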
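Since files on Gnutella are treated like a public good and users are
not charged for their use, it appears rational for people to download
music files without contributing to Gnutella by making their own
files accessible to other users. Because every individual can reason
this way and free ride on the efforts of others, the performance of
the whole system can degrade considerably, making everyone worse off.
This is the tragedy of the digital commons [Ha68].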
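The second problem caused by free riding is that it creates
vulnerabilities in the system that put individuals at risk. If only a
few individuals contribute to the public good, these few peers
effectively act as centralized servers. Users in such an environment
thus become vulnerable to lawsuits, denial of service attacks, and
potential loss of privacy. This is relevant in light of the fact that
systems such as Gnutella, Napster, and FreeNet are depicted as a
means for individuals to rally around certain community goals and to
"hide" among others with the same goals. These goals may include
providing a forum for free speech, changing copyright laws, and
protecting the privacy of individuals.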
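Given these concerns, we decided to conduct a series of experiments
to determine the amount of free riding present in the Gnutella
system. As we show below, a large proportion of the user population,
upwards of 70%, enjoys the benefits of the system without
contributing to its content.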
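In what follows, we describe the basic architecture of Gnutella and
the experiments we conducted. We then provide an analysis of the data
and show how such rampant free riding can impact a distributed
system. Finally, we suggest some mechanisms for dealing with free
riders.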
●●2. Gnutella●●
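Individuals wishing to use the network download [Gn00a] or develop
[Gn00b] an application that adheres to the Gnutella protocol. This
application acts as a client (an information consumer) and as a
server (an information provider), as well as a high-level network
that connects clients and servers and routes information between
them. A single instance of the application is called a peer. In the
following discussion we use the terms peer and host interchangeably.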
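Gnutella boasts a number of features that make it attractive to
certain users. For example, Gnutella provides anonymity by masking
the identity of the peer that issued a query. Additionally, Gnutella
provides a mechanism by which ad-hoc networks can be formed without
central control.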
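Since there are no central servers in the Gnutella network, in order
to join the system a user initially connects to one of several known
hosts that are almost always available (although these hosts
generally do not provide shared files). These hosts then forward the
IP and port address information to other Gnutella peers.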
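Once attached to the network, peers interact with each other by means
of messages. Peers create and initiate broadcasts of messages, and
also re-broadcast the messages of others (receiving a broadcast and
forwarding it to their neighbors). The messages seen on the network
are: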
- Ping Messages - Essentially, an "are you there?" message directed
  at a host.
- Pong Messages - A reply to a ping ("yes, I'm here"). The pong
  message contains information about the peer such as their IP
  address and port as well as the number of files shared and the
  total size of those files. Peers forward this kind of message to
  their neighbors so that it is possible to later find other peers.
  This is needed in case there is a disconnect in the network.
- Query Messages - These are messages stating, "I am looking for x"
  and can get forwarded throughout the entire network (at least
  theoretically). Query messages are uniquely identified, but their
  source is unknown.
- Query Response Messages - These are replies to query messages, and
  they include the information necessary to download the file (IP,
  port, and other location information). Responses also contain a
  unique client ID associated with the replying peer. These messages
  are propagated backwards along the path that the query message
  originally took. Since these messages are not broadcast, it becomes
  impossible to trace all query responses in the system.
Several features of Gnutella's protocol prevent messages from being
re-broadcast indefinitely through the network. One such feature
includes a short memory of messages that have been routed through a
peer (thus preventing re-broadcasting). Additionally, messages are
flagged with a time-to-live (TTL) field. At each hop (re-broadcast)
the TTL is decremented. As soon as a peer sees a message with a TTL
of zero, the message is dropped (i.e. it is not re-broadcast).
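The two safeguards just described (a short memory of routed messages and a TTL field) can be illustrated with a minimal Python sketch. The class and function names, the memory size, and the ring topology are illustrative assumptions, not part of the Gnutella protocol. Real peers exchange messages asynchronously over network connections, whereas this sketch delivers them through direct method calls.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    msg_id: str  # unique message identifier
    ttl: int     # remaining hops before the message is dropped


class Peer:
    def __init__(self, name, memory_size=1000):
        self.name = name
        self.neighbors = []
        self.received = 0  # number of distinct messages this peer has seen
        # Short memory of recently routed message IDs (size is an assumption).
        self._recent = deque(maxlen=memory_size)

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)

    def receive(self, msg, sender=None):
        if msg.msg_id in self._recent:
            return  # already routed through this peer: do not re-broadcast
        self._recent.append(msg.msg_id)
        self.received += 1
        if msg.ttl <= 0:
            return  # TTL of zero: the message is dropped here
        forwarded = Message(msg.msg_id, msg.ttl - 1)  # decrement at each hop
        for peer in self.neighbors:
            if peer is not sender:
                peer.receive(forwarded, sender=self)


def build_ring(n):
    """Hypothetical topology: n peers, each connected to two neighbors."""
    peers = [Peer(f"p{i}") for i in range(n)]
    for i in range(n):
        peers[i].connect(peers[(i + 1) % n])
    return peers


peers = build_ring(8)
peers[0].receive(Message("q1", ttl=2))
reached = [p.name for p in peers if p.received]
```

In this toy ring, a message issued with a TTL of 2 is seen only by the originating peer and by the peers within two hops of it, and a second copy of the same message is ignored by every peer that has already routed it.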
●●2.1 Free Riding on Gnutella●●
In our analysis we consider two types of free riding. In the
first type, peers that free ride on Gnutella are those that
only download files for themselves without ever providing files
for download by others. The second definition of free riding
considers not only the amount of downloadable content a producer
has, but how much of that content is actually desirable content.
This is essentially a quantity versus quality argument that
also poses a social dilemma when there is a cost to the provider
to make desirable files available to others. In the "old days"
of the modem based bulletin board services (BBS), users were
required to upload files to the bulletin board before they were
able to download. In response to this requirement users would
upload their own bad artwork or randomly generated text files
and would be able to download high quality content generated
by others. In the experiments described below we address both
kinds of free riding.
●●3. Experiments●●
In the following section we describe the experiments used to
test the following three hypotheses:
- Hypothesis 1: A significant portion of Gnutella peers are free
  riders.
- Hypothesis 2: Free riders are distributed evenly across different
  domains.
- Hypothesis 3: Peers that provide files for download are not
  necessarily those from which files are downloaded.
●●3.1 Measuring Downloads●●
One of the features that attract users to Gnutella is the difficulty
in associating queries to any particular peer/user. Given a
query message it is virtually impossible (unless some large
percentage of peers collude) to find the peer that originated
the query. The unfortunate side effect of this property is to
make it impossible to experimentally measure the number of queries
and files downloaded by each client. This forces us to make
assumptions about downloads in order to measure them.
One possible assumption is that users that share a high number
of files had to have downloaded them, so those that share more
also download more. In this case, there is no free riding. The
other possible assumption is that users who have no files are
those that will try to access them. Therefore the fewer files
a user has the more likely he is to download them, resulting
in rampant free riding.
Since we unfortunately have no way of knowing which of these
two extremes is closest to reality, we assume that the truth
is somewhere in between, and that therefore all peers generate
a fairly uniform number of queries.
●●3.2 Experimental Setup●●
In order to perform monitoring experiments on the Gnutella network
it was necessary to modify a Gnutella client to log messages
flowing through the system. We elected to use the Java based
Furi client [Fu00]
which was a full featured implementation, with numerous hooks
for logging.
The Furi client was then executed for a 24-hour period over
a weekend in August of 2000 (Saturday 1pm to Sunday 1pm) under
the assumption that there would be more traffic when most people
were not at work. During this time period we collected both
pong and query response messages.
In the 24-hour period we observed 34,902 hosts issuing ping
messages, which shared a total of 4,248,875 files. Unfortunately,
a portion of those hosts (3,507 hosts specifically) were using
Network Address Translation (NAT) addresses [Nat00].
NAT allows multiple computers on a local network to connect
to the Internet, but renders them unreachable from the outside,
and their files cannot be accessed. These hosts, representing
1,229,470 shared files, were removed from the sample. Additional
peers that were misconfigured with invalid/unreachable addresses
were also removed [1].
The final count used was 31,395 hosts representing 3,019,405
shared files. The loss of hosts with NAT addresses is very significant
as it already indicates a minimum 10% of peers that are free
riding.
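The text does not say how NAT and misconfigured addresses were recognized. As a sketch only, the following Python snippet assumes that such hosts advertise private, loopback, unspecified, or malformed addresses, and uses the standard `ipaddress` module to drop them. The function names and the sample host table are hypothetical.

```python
import ipaddress


def is_unreachable(address):
    """True if the advertised address cannot be reached from the public Internet."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return True  # malformed address: treat as misconfigured
    return ip.is_private or ip.is_loopback or ip.is_unspecified


def filter_hosts(hosts):
    """hosts maps an advertised address to the number of files it reports sharing."""
    return {addr: files for addr, files in hosts.items()
            if not is_unreachable(addr)}


# Hypothetical sample, not data from the experiment.
sample = {"10.0.0.5": 120, "127.0.0.1": 3, "18.26.0.1": 40, "not-an-ip": 7}
kept = filter_hosts(sample)
```

Only the publicly routable entry survives the filter in this sample; the files reported by the other entries are discarded together with their hosts, mirroring the removal described above.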
Although we could not capture all query response messages it
was nonetheless possible to sample a wide selection by shifting
locations (i.e., by reattaching to different hosts) within the
Gnutella network. Over the 24-hour period, we were thus able
to capture 87,668 query response messages. Filtering these messages
as was done above we obtained a final count of 67,123 valid
query response messages.
●●3.3 Results●●
Figure 1 illustrates the number of files shared by each of the
31,395 peers we counted in our measurement. The sites are rank
ordered (i.e., sorted by the number of files they offer) from
left to right. These results indicate that 20,845, or approximately
66%, of the peers share no files, and that 22,888, or 73%, share
ten or fewer files. Counting the NAT hosts, these totals jump to
70% for no files and 76% for ten or fewer. Again, these figures
are a minimum, as there can be multiple peers behind the same NAT
address.
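The percentages in this paragraph follow from the host counts reported in the setup. The short Python check below reproduces them, on the assumption that every NAT host is counted as sharing no reachable files; the variable names are ours.

```python
observed_hosts = 34_902                         # all hosts seen in 24 hours
nat_hosts = 3_507                               # removed: NAT addresses
reachable_hosts = observed_hosts - nat_hosts    # 31,395 hosts in the sample
no_files = 20_845                               # reachable hosts sharing nothing
ten_or_fewer = 22_888                           # reachable hosts sharing <= 10 files


def pct(part, whole):
    return 100 * part / whole


share_none = pct(no_files, reachable_hosts)
share_few = pct(ten_or_fewer, reachable_hosts)
# Assumption: NAT hosts are unreachable, so they count as sharing nothing.
share_none_with_nat = pct(no_files + nat_hosts, observed_hosts)
share_few_with_nat = pct(ten_or_fewer + nat_hosts, observed_hosts)

for label, value in [("no files", share_none),
                     ("ten or fewer", share_few),
                     ("no files, with NAT", share_none_with_nat),
                     ("ten or fewer, with NAT", share_few_with_nat)]:
    print(f"{label}: {value:.1f}%")
```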
The data also shows that the top 1 percent (314 hosts) represent
approximately 40 percent of the total files shared. This quickly
escalates to the top 20 percent (6,280 hosts) sharing 98% of
the files. Table 1 shows the values of the in-between data
points.
| Top hosts         | Files shared | As percent of the whole |
| 314 hosts (1%)    | 1,189,345    | 40%                     |
| 1,570 hosts (5%)  | 2,158,630    | 71%                     |
| 3,140 hosts (10%) | 2,631,293    | 87%                     |
| 4,710 hosts (15%) | 2,854,824    | 95%                     |
| 6,280 hosts (20%) | 2,958,544    | 98%                     |
| 7,850 hosts (25%) | 3,002,222    | 99%                     |
Table 1
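A table of this kind can be derived from a list of per-host file counts by ranking the hosts and summing the largest entries. The following is a minimal Python sketch of that calculation; the function name and the toy counts are illustrative assumptions, not the measured data.

```python
def top_share(file_counts, fraction):
    """Share of all files held by the top `fraction` of hosts, ranked by files shared."""
    ranked = sorted(file_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    top_n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:top_n]) / total


# Toy population of ten hosts: six share nothing, one shares most files.
counts = [0, 0, 0, 0, 0, 0, 1, 2, 7, 90]
share_top_10_percent = top_share(counts, 0.10)
share_top_20_percent = top_share(counts, 0.20)
```

Calling `top_share` with fractions 0.01, 0.05, and so on against the real per-host counts would yield the rows of such a table.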
As per our second definition of free riding we determined which
hosts provide files and which hosts provide files that are actually
downloaded. We attempted to capture this by analyzing the query
response traffic. The difficulty with analyzing this data is
that it is unclear for how long each peer was actually connected
to the network. However, we can assume again that due to the
large sample, network connectivity averages out to some degree.
After eliminating hosts that provide no downloadable files we
were left with a set of 10,510 hosts.
Again, we measured a considerable amount of free riding on the
Gnutella network. Out of the sample set, 6,513 peers, or approximately
61%, never provided a query response. These were hosts that
in theory had files to share but never responded to queries
(most likely because they didn't provide "desirable" files).
Incorporating those hosts that have no files or are NAT hosts
we see that almost 90% of hosts never answer queries!
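The "almost 90%" figure can be reproduced from the counts reported earlier. This assumes (our reading, not stated explicitly in the text) that the three groups are disjoint and that the total is the full set of 34,902 observed hosts: 6,513 hosts with files but no responses, 20,845 hosts with no files, and 3,507 NAT hosts.

```latex
\[
\frac{6{,}513 + 20{,}845 + 3{,}507}{34{,}902}
  = \frac{30{,}865}{34{,}902}
  \approx 0.88
\]
```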
Figure 2 illustrates the data by depicting the rank ordering
of these sites versus the number of query responses each host
provided. We again see a rapid decline in the responses as a
function of the rank, indicating that very few sites do the
bulk of the work. The top 1 percent of sites provides nearly
50% of all answers, and the top 25 percent provide 98%.
●●3.4 Who Shares Files?●●
In our second experiment we verified the hypothesis that files
and query responses (and therefore free riders) are shared equally
across different domains. The implication is that hosts based
in domain a do not contribute more than hosts in domain b in
terms of the ratio of peers on the network to files and responses
offered. This does not imply that certain domains contribute
more or less total hosts to the network, but simply that free
riders are distributed equally.
In order to do this analysis we filtered our initial test set
to 26,014 peers. These were hosts with IP addresses that were
readily converted to host names. We then counted the number
of hosts in each domain (mit.edu, home.com, etc.) as well as
the number of hosts in each top-level domain, or TLD (.edu,
.com, .net, etc.).
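The per-domain and per-TLD counts can be sketched in Python as follows. The text does not describe how host names were parsed, so the rule used here (the last two labels form the domain, the last label forms the TLD) is a simplifying assumption that would misclassify names under suffixes such as co.uk. The host names are made up.

```python
from collections import Counter


def domain_of(hostname):
    """Naive domain: the last two dot-separated labels (e.g. mit.edu)."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def tld_of(hostname):
    """Top-level domain: the last label (e.g. edu)."""
    return hostname.lower().rstrip(".").split(".")[-1]


def count_by(hostnames, key):
    return Counter(key(h) for h in hostnames)


# Hypothetical host names, not taken from the experiment.
hosts = ["dorm12.mit.edu", "lab.MIT.edu", "cable-34.home.com"]
per_domain = count_by(hosts, domain_of)
per_tld = count_by(hosts, tld_of)
```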
In our set of hostnames there were 2,538 unique domains. The
number of peers in each domain ranged from 1 to a maximum of 2,951.
Figure 3a illustrates this data. Each of the points in
the figure represents a domain in terms of the number of peers
(the x-axis) and the total number of files shared (the y-axis).
The dashed line is the trend line for this data. A regression
of the two dimensions yields an r-squared value of 0.927, indicating
that peer count is linearly related to the number of files shared
independent of the domain.
Figure 3b depicts the relationship between query responses and
peer count. Again, a regression on this sample of 1,276 domains
reveals a fairly linear relationship between the two dimensions
(with an r-squared of 0.922). We consider this evidence of an
even distribution of free riders [2].

Figures 4a and 4b display the equivalent data sets for TLDs
(edu, net, org, etc.). Figure 4a represents the 77 top-level
domains in terms of peer count to the number of files shared.
Figure 4b represents 61 top-level domains in terms of peer count
to query responses. Again, there appears to be a linear relationship
in both figures, with the regressions yielding r-squared values
of 0.953 and 0.958 for Figures 4a and 4b respectively [3].
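For readers unfamiliar with the statistic, the r-squared value of a simple linear regression can be computed as in the Python sketch below. This is the textbook least-squares formulation, offered as an illustration only; the text does not say which tool was used, and the sample points are invented.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least-squares line fitted to (xs, ys).

    Assumes xs and ys have equal length and that neither is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented points: x = peers in a domain, y = files shared by that domain.
perfectly_linear = r_squared([1, 2, 3, 4], [10, 20, 30, 40])
unrelated = r_squared([1, 2, 3, 4], [1, -1, -1, 1])
```

A value near 1 means that peer count explains almost all of the variation in the second quantity, while a value near 0 means that it explains almost none.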
●●3.5 Quality vs. Quantity●●
In the final experiment we tested our hypothesis that the number
of queries answered is not necessarily proportional to the number
of files offered. This provides a test of the "quality" vs.
quantity argument. The intuition is that the kinds of queries
that are issued by the bulk of Gnutella users are very concentrated
on particular topics. The files that are returned for these
queries are therefore more desirable, which defines their quality.
Therefore, only a small number of peers will actually share
anything that is considered to be high "quality."
We found the degree to which queries are concentrated through
a separate set of experiments in which we recorded a set of
202,509 Gnutella queries. The top 1 percent of those queries
accounted for 37% of the total queries on the Gnutella network.
The top 25 percent account for over 75% of the total queries.
These values are a little lower than reality because we did
not fully combine equivalent queries ("britney spears" vs. "spears
britney").
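One simple way to combine equivalent queries such as the pair quoted above is to lowercase each query and sort its words before counting. The Python sketch below shows that idea. It is an assumption about what "combining" could mean, not the procedure used in the experiment, and it would also merge queries in which word order matters.

```python
from collections import Counter


def normalize_query(query):
    """Lowercase the query and sort its words so that word order is ignored."""
    return " ".join(sorted(query.lower().split()))


def count_queries(queries):
    return Counter(normalize_query(q) for q in queries)


# Invented sample queries.
counts = count_queries(["britney spears", "Spears Britney", "metallica"])
```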
The predicted behavior is present to some extent. For example,
the top responding host only hosted 695 files, but responded
to 3,436 queries. The next most responsive peer hosted 956 files
and responded to 1,474 queries.
Figure 5 illustrates the relationship between files hosted (the
x-axis) and query responses (the y-axis) for 10,510 peers. As
is apparent from the plot there is very little evidence of a
relationship between quantity and quality in the Gnutella network.
A regression analysis yields a very low r-squared value of 0.00105
for this data.

●●4. Discussion●●
Studies of social dilemmas [Gl94] [Hu96] [Hu97] have shown that
it is hard to generate spontaneous cooperation
in large anonymous groups. As we have shown in this paper, Gnutella
is no exception to this finding, and an experimental study of
its user patterns shows indeed that free riding is the norm
rather than the exception.
If distributed systems such as Gnutella rely on voluntary cooperation,
rampant free riding may eventually render them useless, as few
individuals will contribute anything that is new and high quality.
Thus, the current debate over copyright might become a non-issue
when compared to the possible collapse of such systems. This
collapse can happen because of two factors, the tragedy of the
digital commons, and increased system vulnerability, which we
now discuss.
●●4.1 The Tragedy of the Digital Commons●●
An ideal analysis of free riding would allow us to calculate
the contribution provided by individuals in exchange for consumption
(either in proportion or some fixed cost). There are two ways
in which individuals on Gnutella can contribute. The first is
simply by uploading files. The second is the active participation
in the protocol of the network, thus providing the "glue" that
holds the network together. It may be then that all peers on
the network contribute even if they provide no downloadable
files. However, there is a point at which peers that act only
as glue provide diminishing returns to the system leading to
at least two ways in which the quality of the service degrades.
First, peers that provide files are set to only handle some
limited number of connections for file download. This limit
can essentially be considered a bandwidth limitation of the
hosts. Now imagine that there are only a few hosts that provide
responses to most file requests (as was illustrated in the results
section). As the connections to these peers are limited, they
will rapidly become saturated and remain so, thus preventing
the bulk of the population from retrieving content from them.
A second way in which quality of service degrades is through
the impact of additional hosts on the search horizon. The search
horizon is the farthest set of hosts reachable by a search request.
For example, with a time-to-live of five, search messages will
reach at most peers that are five hops away. Any host that is
six hops away is unreachable and therefore outside the horizon.
As the number of peers in Gnutella increases more and more hosts
are pushed outside the search horizon and files held by those
hosts become beyond reach.
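The notion of a search horizon can be made concrete with a breadth-first traversal that stops after a fixed number of hops. The Python sketch below assumes the network is given as an adjacency mapping; the function name and the chain topology are illustrative only.

```python
from collections import deque


def search_horizon(graph, origin, ttl):
    """Return the set of peers reachable from `origin` within `ttl` hops."""
    distance = {origin: 0}
    queue = deque([origin])
    while queue:
        peer = queue.popleft()
        if distance[peer] == ttl:
            continue  # messages arriving here have no hops left to travel
        for neighbor in graph[peer]:
            if neighbor not in distance:
                distance[neighbor] = distance[peer] + 1
                queue.append(neighbor)
    return set(distance) - {origin}


# Hypothetical chain of eight peers: 0 - 1 - 2 - ... - 7.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
horizon = search_horizon(chain, origin=0, ttl=5)
```

With a time-to-live of five, peer 5 lies on the horizon, while peers 6 and 7 fall outside it, and their files are invisible to searches issued by peer 0.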
●●4.2 Vulnerability●●
One argument that has appeared in the popular press regarding
systems such as Gnutella [Or00]
is that there is a diminished risk of the system being shut
down by either lawsuit or attack. It will be impossible, users
argue, for the RIAA to sue all of them. This belief, which was
spread by the press, allowed users to believe that they were
safe among others. Unfortunately, in light of the evidence provided
above, Gnutella provides a false sense of security.
As we have seen in the experiments, there is a small collection
of peers that provide the bulk of the shared files and answered
queries. These few providers act as a rather centralized server
consisting of several peers and thus the RIAA need not sue all
users or even the bulk of users. They simply need to target
the top-serving peers (of which there are very few that serve
very many).
●●4.3 Defeating Free Riding●●
There are many ways of patching Gnutella so that it can accommodate
the same privacy rules but scale more effectively [4].
It is interesting therefore to establish how different file-sharing
applications rely on technological features to induce users
to share. FreeNet, for example, forces caching of downloaded
files in various hosts. This allows for replication of data
in the network, forcing those who are on the network to provide
shared files. Unfortunately, such a system is prone to replication
of "bad" or illegal data and to "tainting" hosts [5]. The
second cost of automatic replication as implemented in FreeNet
is its use of unique identifiers for files, which forces users to
know exactly what they are looking for.
Napster, by default, downloads all files into a shared upload
directory. In this way when a user downloads a file it is automatically
shared. In some ways this feature addresses the FreeNet problem
because users will only keep "good" files on their computers.
However, users can easily circumvent this shared upload/download
directory and frequently do. Both systems provide their own set
of solutions to the free riding problem, but at the cost of
introducing other problems to their systems.
Another possible solution to this problem is the transformation
of what is effectively a public good into a private one. This
can be accomplished by setting up a market based architecture
that allows peers to buy and sell computer processing resources,
very much in the spirit in which Spawn was created [Wa92].
●●5. Conclusion●●
In this paper we analyzed user traffic in Gnutella and concluded
that there is a significant amount of free riding in the system.
Specifically, we found that upwards of 70% of Gnutella users
share no files, and that 90% of the users answer no queries.
Furthermore, we found that free riding is distributed evenly
between domains, so that no one group contributes significantly
more than others, and that peers that volunteer to share files
are not necessarily those who have desirable ones.
These findings have serious implications for the future development
of Gnutella and its many variants. In order for distributed
systems with no central monitoring to succeed, a large amount
of voluntary cooperation is required, a requirement that is
very hard to fulfill in systems with large user populations
that remain anonymous.
Sometimes, the logic behind the decision to cooperate or not
changes when the interaction is ongoing since future expected
utility gains will join present ones in influencing the rational
individual's decision. In particular, individual expectations
concerning the future evolution of the social dilemma can play
a significant role in each member's decisions [Hu96]. An interesting
continuation of these experiments may lead to an understanding
of how free riding changes over time.
●●Acknowledgments●●
The authors would like to thank Rajan Lukose, Lada Adamic, Ed
Chi, and Pam Schraedley for valuable discussions. We also thank
Sara Dubowsky for her late night editorial help.
●●References●●
[Ch85]
Chaum, D., "Security without identification: Transaction systems
to make big brother obsolete," Communications of the ACM,
28(10):1030-1044, 1985.
[Fr00]
The FreeNet Homepage, http://freenet.sourceforge.net/
[Fu00]
The Furi Homepage, http://www.jps.net/williamw/furi/
[Gl94]
Glance, N. S., and B. A. Huberman, "Dynamics of Social Dilemmas,"
Scientific American, March 1994.
[Gn00a]
The Gnutella Homepage, http://gnutella.wego.com/
[Gn00b]
The Gnutella Developer Homepage, http://gnutelladev.wego.com/
[Ha68]
Hardin, G., "The Tragedy of the Commons," Science, 162(1968):1243-1248.
Available online at: http://dieoff.com/page95.htm
[Hu96]
Huberman, B. A., and N. S. Glance, "Beliefs and Cooperation,"
Modeling Rational and Moral Agents, ed. Peter Danielson, Oxford
University Press, 1996, pp. 210-235.
[Hu97]
Huberman, B. A., and R. Lukose, "Social Dilemmas and Internet
Congestion," Science, 277(1997):535.
[Na00]
The Napster Homepage, http://www.napster.com/
[Nat00]
"The Network Address Translation White Paper," @Home Networks,
http://work.home.net/whitepapers/natwpaper.html
[Or00]
Oram, A., "Gnutella and Freenet Represent True Technological
Innovation," The O'Reilly Network, May 12, 2000. http://www.oreillynet.com/lpt/a/208
[St00]
Stop Napster Homepage, http://www.stopnapster.com/
[Wa92]
Waldspurger, C. A., T. Hogg, B. Huberman, J. O. Kephart, and
S. Stornetta, "Spawn: A Distributed Computational Economy,"
IEEE Transactions on Software Engineering, 18:2, February
1992, pp. 103-117.
●●Notes●●
[1]
These included, for example, the localhost (127.0.0.1) address.
[2]
Of tangential interest may be the top number of hosts sharing
files. The top 5 domains are (from most to least) home.com,
rr.com, aol.com, t-dialin.net, and mediaone.net. The top hosts
in query responses are home.com, rr.com, mediaone.net, ks.us,
and pacbell.net.
[3]
The top five domains for queries in the first-level domain
in terms of files shared are: net, de, nl, edu, and ca. For
queries answered they are: com, net, edu, de, and nl.
[4]
Hint: Mix one part mailing list, one part anonymous bulletin
board (see for example [Ch85]),
and one part anonymous re-mailer (add more re-mailers depending
on taste for paranoia).
[5]
If a user requests a bad file (say a bomb or Trojan [St00]),
this file is replicated between all computers from the host
uploading to the host downloading.
Last modified 8/10/00
© 2000 Ian Hall-Beyer. All Rights Reserved.
<manuka@nerdherd.net>