Software Systems Spring 2005 For today, you should have: 1) worked on your project 2) read about distributed hash tables Outline: 1) quiz 2) peer to peer networks 3) distributed hash tables For next time you should: 1) work on your project Peer to peer concepts --------------------- The idea of p2p networks makes the most sense in contrast to the primary alternative, centralized client-server networks. "centralized" means that server functions tend to be co-located, often at the "core" of the network most common example in a LAN is a file server colocated with a network switch client-server means that most machines are entirely servers or entirely clients p2p is different on both of these axes 1) everything is at the edge of the network, the only thing at the core is switching The world wide web is a good example of this. 2) machines that use services are expected or required to provide service, too. there are no dedicated servers. well-known p2p services include Napster, Gnutella, FreeNet The advantages of p2p include 1) high system utilization (both bandwidth and server processing) capacity scales with popularity 2) robustness (no single point of failure) Disadvantages include 1) vulnerability to maliciousness wikipedia lists: poisoning, DOS, defection, etc. 2) evasion of legal responsibility this can be an advantage, too In a PURE p2p network, all peers are truly equal, but many deployed systems use centralized indices, or at least super-peers. Many p2p networks are based on the concept of an "overlay network" which is an abstract connectivity model unrelated to the physical connections in the network. This takes some explaining. Indexing in p2p networks ------------------------ Indexing = building metadata that makes it possible to find what you are looking for. 1) a card catalog is an index for finding books by author, title or subject -- basically three sorted lists 2) Yahoo is/was an attempt to organize the WWW into a semantic hierarchy -- basically a tree of unordered lists 3) Google is a content-addressable index -- a mapping from word/phrase lists to ordered lists of URLs p2p networks have historically used a variety of indexing techniques. In some cases they have resorted to a centralized index. Alternatives include 1) flooding (overlay = fixed outdegree graph) 2) superpeers (overlay = tree) 3) unstructured routing (overlay = dynamic graph) 4) distributed hash table (overlay = fixed outdegree graph) Limitation of a hash table: each data item must have a unique key. So how do we get from "I want X" to a unique key? How Chord works: Imagine the set of possble keys arranged in a ring. New peers are placed randomly around the ring. Each peer is "responsible for" the closest keys. lookup(key) is equivalent to finding the closest peer to a given point in the ring. Routing = how does a given node find other nodes? Chord uses "finger tables" How many other nodes does each node know about? How long does it take to handle a query? Alternatives to Chord: 1) Pastry and others use a tree. 2) CAN uses a multi-dimensional Cartesian space. Summary: DHT is a pure p2p solution for mapping from unique ids to nodes. The more general problem of indexing in p2p networks is open. Holy grail: pure p2p implementation of Google-like searching.