View Full Version : basic book about p2p



frankdowling1
December 10th, 2002, 02:55 PM
I think that p2p is great, and I would really like to have a better appreciation of the technology. However, I am not a programmer, and all the books dealing with p2p on Amazon seem to be for programmers.
Can anyone recommend a good book or Internet link that explains p2p technology and how it works?

thanks

method
December 12th, 2002, 07:28 AM
I might write a longer, more detailed paper (and a grammatically correct/spellchecked version) on this subject some day, but a brief idea of some of the things behind the scenes in p2p development is listed below... It's not a book, just a brief explanation of some of the main elements in p2p..

If you want to know about the guts of a specific application... well.. better post that. Generally though.. a lot of p2p applications use at least some of the elements below...

(btw.. take all this info lightly, i'm just an amateur developer!!) ;)

------------------------

Networks:

There are a lot of different styles of network: some are true p2p, some are not, some are centralized and some are completely decentralized (although in truth, the latter is still fairly rare).

If you are writing a p2p application you need to think of your target audience and the level of control you want over them.

Personally, I don't like the idea of having responsibility over other users and I don't like authenticating users. I just like to let them get stuck in and start hammering their bandwidth or communicating with each other.

If you want a business model, you probably need to look at authentication/login systems, which ultimately require some sort of central server. For this you can use login servers (apparently FT does this) and then, upon authentication, the user's client will interact with the rest of the network.

There are 3 main network structures that I'm aware of:

1. Fully Decentralized Networks (with or without login servers)
2. Centralized Networks
3. Independent Server/Client Networks

Of course, some p2p networks employ a combination of networking structures. To cover all purposes (distributed indexing systems alongside a central news provision server, for instance), the combination networks are usually the best option, despite the risk of the centralized system being an attack point for the RIAA/MPAA/etc.

With a fully decentralized network, the network's functionality relies on functional nodes: nodes that are designed to carry out segments of larger tasks, such as file indexing, building/relaying host-caches, etc. (like supernodes on the FastTrack network). Usually the main tasks covered by such nodes are the indexing of files or the indexing of IP addresses of other users (hostcaches, etc). Decentralized networks can take many forms: some are cluster-based networks and one, aptly named "The Circle", uses a ring network. For more details on the actual architecture of the network, check out some more detailed documents on networking methods.

Centralized networks need little explanation and little investigation. Napster.. AudioGalaxy.. Their fate is enough to put any developer off trying to make a centralized system. They have their purposes.. but if the likelihood of your network enabling distribution of copyrighted material is even slight... having a centralized network will make your project vulnerable to closure.

Independent server/client networks are similar to decentralized networks in that it's hard to take them down. (I first used Hotline 5 years ago and it's still going strong; it can't and won't be stopped, same with Direct Connect.) Although these network structures are more confined and the userbase is divided into clusters, the results from these systems are surprisingly good when well populated. Because there is usually more of a community feeling with this system, it also helps keep leeches out, kiddy fiddlers get kicked/banned a lot quicker, etc. The biggest problem with this network structure to date is more down to the people who use it... elitism and the rules that each hub/server operator can set. At least with a fully decentralized system, you don't get hot-heads making stupid rules. This structure is in use in Filetopia, Fileshare, Direct Connect, Haxial-KDX, Hotline and of course.. XS ;)

---

Transfers:

Most standard file transfer systems transfer the file sequentially from start to finish. There are also other systems that incorporate multi-sourcing and swarmloading. The two buzzwords just mentioned are a bit of a grey area. People have their own definitions so you can take the next two lines as my definition...

Multi-sourcing: creating a list of file locations so that if one fails, the program can move to the next immediately.
Swarmloading: downloading segments of the same file from different locations for greater speeds.

The big problem (which I think relates more to the donkey, etc. than kazaa) is that some programs do this with large gaps between segments... If a user wants to preview a file, wide-range swarmloading is inadequate for this. I guess that's more a pointer for other developers than useful information for people just curious about p2p applications.
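To make the two definitions a bit more concrete, here is a minimal Python sketch. The function names (`plan_segments`, `next_source`) and the peer labels are made up for illustration and are not taken from any real client. It hands out consecutive segments round-robin, which keeps the gaps between segments small so the start of the file fills in early enough to preview.

```python
def plan_segments(file_size, segment_size, sources):
    """Swarmloading: assign consecutive segments of one file to several sources.

    Returns a list of (offset, length, source) tuples in file order.
    """
    if segment_size <= 0 or not sources:
        raise ValueError("need a positive segment size and at least one source")
    plan = []
    offset = 0
    index = 0
    while offset < file_size:
        length = min(segment_size, file_size - offset)
        # Round-robin keeps neighbouring segments close together in time.
        plan.append((offset, length, sources[index % len(sources)]))
        offset += length
        index += 1
    return plan


def next_source(sources, failed):
    """Multi-sourcing: return the first source that has not failed, or None."""
    for source in sources:
        if source not in failed:
            return source
    return None


peers = ["peer-a", "peer-b", "peer-c"]  # hypothetical peer labels
plan = plan_segments(1000, 300, peers)
fallback = next_source(peers, failed={"peer-a"})
```

A real client would also re-queue segments from sources that drop out, which is where the multi-sourcing list comes back in.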

Other issues concerning transfers from a p2p developer's point of view are reliability and integrity.

We don't want people polluting our network, and we also don't want the network bogged down with long filenames and paths, so instead we use a hashing system. Hashing systems use algorithms to sum the total value of stepped byte/word/long values (like every 217th byte or whatever) in the file and create a key of some sort to identify files regardless of their names.
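As a rough illustration of identifying files by content rather than by name, here is a small Python sketch. It swaps in SHA-1 from the standard `hashlib` module over the whole file instead of the stepped-sum idea described above, so treat it as one possible approach rather than what any particular network does. `content_key` and the file names are hypothetical.

```python
import hashlib
import io


def content_key(stream, block_size=65536):
    """Return a hex key computed from a file's bytes only, never its name."""
    digest = hashlib.sha1()
    while True:
        block = stream.read(block_size)
        if not block:
            break
        digest.update(block)
    return digest.hexdigest()


# Two differently named copies of the same bytes share one key,
# so an index keyed on content lists them as a single file.
copy_one = io.BytesIO(b"same song data")
copy_two = io.BytesIO(b"same song data")
other = io.BytesIO(b"something else")

index = {}
for name, stream in [("track01.mp3", copy_one),
                     ("a_very_long_renamed_copy.mp3", copy_two),
                     ("other.mp3", other)]:
    index.setdefault(content_key(stream), []).append(name)
```

Hashing the whole file costs more than sampling it, but a sampled key cannot notice changes in the bytes it skips.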

We also want resumed files to be reliable. If there's a slight problem with the writing of an incomplete block, we'll have difficulties. For this reason developers often implement a rollback, where you resume the transfer from 4KB or so back from where the file got to... From what other developers have said, from the fact that this is widely implemented, and from my own experience... this DOES help reliability.
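A rollback can be sketched in a few lines of Python. The 4KB figure comes from the paragraph above. Comparing the re-sent overlap against what is already on disk is an extra assumption on my part, and the names (`resume_offset`, `resume`, `fetch`) are hypothetical.

```python
ROLLBACK_BYTES = 4096  # "4KB or so", as described above


def resume_offset(bytes_on_disk, rollback=ROLLBACK_BYTES):
    """Step back from the end of the partial file, but never below zero."""
    return max(0, bytes_on_disk - rollback)


def resume(partial, fetch, rollback=ROLLBACK_BYTES):
    """Resume a transfer, re-fetching the last `rollback` bytes.

    `fetch(offset)` is a stand-in for asking a source for all data
    from `offset` to the end of the file.
    """
    offset = resume_offset(len(partial), rollback)
    incoming = fetch(offset)
    overlap = partial[offset:]
    # Assumption: treat a mismatch in the overlap as a damaged tail
    # or a different file, and refuse to stitch the two together.
    if incoming[:len(overlap)] != overlap:
        raise ValueError("overlap mismatch: partial file or source is suspect")
    return partial[:offset] + incoming


original = bytes(range(256)) * 40   # 10240 bytes of sample data
partial = original[:6000]           # transfer stopped part way through
completed = resume(partial, lambda offset: original[offset:])
```

Even without the comparison, simply overwriting the last few KB replaces a block that may have been half-written when the transfer died.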

We also want to ensure blocks of data are not corrupted in transfer. (Who knows what the RIAA will try if they get to ISPs that are prepared to try and screw with us!!??) For this we need a CRC on the packets of data when and where it's viable. (CRC means Cyclic Redundancy Check. It's basically where the bytes of a dataset [like a file or a packet] are run through a fixed calculation, strictly a polynomial division rather than a plain sum, and the fixed-size remainder [usually 32 bits, so at most 4294967295] is sent along with the data. The receiver repeats the calculation, and if the two values match, the data almost certainly hasn't been corrupted.)
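Here is a minimal sketch of a CRC on packets, using `zlib.crc32` from the Python standard library. The packet layout (payload followed by a 4-byte big-endian CRC) and the function names are assumptions for illustration, not any real protocol's format.

```python
import struct
import zlib


def make_packet(payload):
    """Append a 32-bit CRC of the payload as a 4-byte big-endian trailer."""
    return payload + struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)


def check_packet(packet):
    """Return the payload if the trailer matches, otherwise raise ValueError."""
    if len(packet) < 4:
        raise ValueError("packet too short to carry a CRC")
    payload, trailer = packet[:-4], packet[-4:]
    (expected,) = struct.unpack(">I", trailer)
    if zlib.crc32(payload) & 0xFFFFFFFF != expected:
        raise ValueError("CRC mismatch: data damaged in transit")
    return payload


packet = make_packet(b"block 12 of some file")
payload = check_packet(packet)
```

This catches accidental damage such as line noise or a bad write. It is not meant to stop somebody who alters the data on purpose.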

There we go, rollbacks, multisourcing, swarmloading, CRC on packets and hashing. Those are the main behind-the-scenes sorta things that p2p developers look at.

---

Security:

When making communications applications there are many things you really need to consider as a priority before public releases.

Exploit proofing is essential. Make sure your protocol uses handshakes so that other applications attempting to send data will just be kicked back out. Also, when data stream lengths are incorrect, even slightly, kick the connection rather than parsing the information. Make standards, and verify those standards whenever data is received.
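The handshake and length checks could look something like the Python sketch below. The 4-byte magic value `XP2P`, the 64KB payload cap and the frame layout are invented for this example. The point is only that anything not matching the standard is rejected before it is parsed.

```python
import struct

MAGIC = b"XP2P"                  # hypothetical handshake marker
MAX_PAYLOAD = 64 * 1024          # hypothetical upper bound on payload size
HEADER = struct.Struct(">4sI")   # magic, then payload length (big-endian)


class ProtocolError(Exception):
    """Raised when incoming data breaks the protocol; the caller should drop the peer."""


def encode_frame(payload):
    return HEADER.pack(MAGIC, len(payload)) + payload


def parse_frame(data):
    """Validate the magic and the declared length before touching the payload."""
    if len(data) < HEADER.size:
        raise ProtocolError("header too short")
    magic, length = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ProtocolError("bad handshake: not one of our clients")
    if length > MAX_PAYLOAD:
        raise ProtocolError("declared length exceeds the limit")
    if len(data) - HEADER.size != length:
        raise ProtocolError("declared length does not match the data received")
    return data[HEADER.size:]


frame = encode_frame(b"hello")
message = parse_frame(frame)
```

On a `ProtocolError` the connection handler would simply close the socket rather than try to recover.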

P2P developers are seeing more and more potential threats too. There are companies trying to pollute networks (as mentioned above, hashing helps protect against this). There are also data-mining companies like Ranger Inc. trying to scan networks for copyright infringers. You can just add IP range blocking to your applications. There are other ways to deal with them, and ways to make your network intelligent enough to identify these companies, but this goes far too deep for me to explain right here!!!... Besides, these things are what make or break a project... as a p2p developer, even just an amateur one... this is where I have to shut my mouth for my own good.
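IP range blocking is simple to sketch with Python's standard `ipaddress` module. The ranges below are reserved documentation ranges used purely as placeholders, not real addresses of any company, and `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), standing in for a real blocklist.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(address):
    """Return True if the peer's address falls inside any blocked range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_RANGES)


blocked_peer = is_blocked("192.0.2.17")
allowed_peer = is_blocked("203.0.113.5")
```

The check would run when a connection is accepted, before any handshake data is read.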

I hope I've been of some help though.

Peace!!

frankdowling1
December 21st, 2002, 07:38 AM
Just out of interest: there is a p2p source book which is very highly rated on Amazon.com. The book is titled "P2P Networks: Cracking the Code".
Although highly rated, this book is mainly for programmers. 98% of the book is dedicated to rather complex programming (?).
If you are not a computer programmer this book is a waste of time and money.
There are about 5 pages explaining the basics of p2p networks. However, even this is a bit tedious and boring for non-programmers. Some people have children, others have programmers.
You will get a better explanation from the sites referred to in this and other threads on the subject in this forum. Thanks again.
The book does come with a CD with an e-book. Who knows what an industrious person could find by looking. :devil

freddyfreeloadr
December 22nd, 2002, 09:27 AM
yo

frankdowling1
December 22nd, 2002, 03:24 PM
freddie,
what exactly does yo yo mean??? :hole :hole

Rahwgwar
December 22nd, 2002, 04:19 PM
Thanx. Now I will be able to use that on my PPT presentation as well.

frankdowling1
December 22nd, 2002, 11:00 PM
Use what specifically in your PPT presentation?

:mellow :mellow

Rahwgwar
December 23rd, 2002, 12:56 AM
Originally posted by frankdowling1
Use what specifically in your PPT presentation?

:mellow :mellow


I'll refer u to my Powerpoint thread which is in regards to this. Hopefully it didnt get deleted like some of my other threads. Basically I am doin a PPT presentation in school on P2P and its effects along w/ why it should be legalized. I still need more legit reasons for it to be legalized, not just stuff like CDs are too expensive, etc. OK after checkin now and doin a search it appears as if it was deleted. Damn that pisses me off......................

frankdowling1
December 23rd, 2002, 05:26 AM
Rahwgwar ,
I cannot seem to find the "Powerpoint" thread. Perhaps someone can refer me to it.

:shoot :aim

frankdowling1
December 23rd, 2002, 06:51 AM
method,

thanks for the great answer !
A question regarding multi-sourcing and swarming.
With Kazaa, files can be partially uploaded and listed as either "aborted" or "completed". This happens with partial downloads.
E.g. for an upload of a 7 MB file where only 1.4 MB was sent, the file can be listed as either "completed" or "aborted".
Can you please explain this.

thanks

:fire

method
December 23rd, 2002, 12:34 PM
Hmmm... I've only ever known uploads being aborted when the other side has cancelled the DL. I've never really known a file to say aborted when the user has actually DL'd the end of a file from you. I guess I'm not always that helpful, eh? heh!!

@Rahwgwar...
P2P doesn't need to be legalized... it already is legal... using P2P to distribute or download copyrighted material that you don't already have a legit copy of.. is illegal. P2P being illegal is a common misconception. :;)

Rahwgwar
December 23rd, 2002, 04:19 PM
Method: That's what I meant. Sorry bout' being a bit too vague. But you know how it goes, how "ignorant" ppl label us as criminals who are just stealing from the artists. And how RIAA sales have gone down because of well....according to them US. blah blah blah

frankdowling1
December 23rd, 2002, 10:26 PM
A point about p2p and the recording industry. The recording industry itself has funded studies that showed that p2p had no appreciable effect on CD sales, yet the recording industry has gone after p2p with a vengeance.
P2P offered the recording industry a wonderful new opportunity to market their product.
As I understand it, only a very tiny percentage of all recordings pay for the whole industry. The cost of promotion is great and most groups do not make it.
The recording industry saw P2P as a threat rather than an opportunity. P2P allows extensive promotion at virtually no cost.
One can expand the number of groups that the industry promotes at virtually no cost. If it is all in the percentages, then there will be more successful cash-paying projects.
Secondly, the profit margin is great. No factories to make CDs. Few hard costs to make the product. No shipping and small distribution costs. And no retailers to share with.
There was even an industry-funded study (at the time of Napster) whose summary was that the average American adult would gladly pay, I believe, around US$25 a month for access to a Napster-type service.
We were supposed to be in "the new economy". However, we have ended up with the same jerks in charge. In the case of the recording industry, however, the management may have never changed at all.
AOL is a good example of what has happened to the "new economy".
Here is a snippet from a business article about the troubles
at AOL. Head office types devoid of reality:

Miller planned to give his top 35 execs the talk of a lifetime. He would let them know that it was up to them, over the following six weeks, to map out a future for AOL. Sure they were disheartened, but after two years of merger madness, infighting, broken promises, and stock options far under water, the company was theirs again.
The speech didn't go as planned. Miller emerged late from an AOL Time Warner board meeting in New York, and his corporate jet couldn't get him to Dulles in time. So as the managers munched on an austere spread of cheese, crackers, and celery sticks, Miller, from a runway at a Teterboro (N.J.) airport, talked to them on his cell phone. His voice came through a single phone, and the execs craned to hear him as he appealed to their wounded pride: "We need to take back the right to define ourselves.
... Yet AOL has lined up only 650,000 subscribers for its broadband package. ... Worse, the company is losing some 200,000 subscribers per month to other broadband providers. And AOL's dial-up market is expected to start shrinking by the end of next year.
.....What will replace it? AOL plans to charge from $2.95 to $17.95 per month for a host of offerings. They range from phone services that electronically read e-mail to interactive shows featuring the galaxy of Time Warner stars. The whole industry, from MSN to Yahoo (YHOO ), is chasing similar visions. But to date, premium services haven't sold. "It's going to be difficult when so much free stuff is on the Web," says Jupiter Research analyst David Card.
Can you believe it? Talk about dial-up dinosaurs.
What happened to the "new economy" ??


b.t.w. the link to the full article is:
http://www.businessweek.com/magazine/content/02_50/b3812080.htm




:mellow :mellow

zaphodiv
December 24th, 2002, 02:44 PM
Method posted a good overview of p2p systems.

>If you want a business model [...]
then do something other than PC software. Metamachine has
said they consider voluntary registrations to be their
main source of income, not the banner ads.

>Who knows what the RIAA will try if they get to ISPs
>that are prepared to try and screw with us
>a CRC on the packets of data when and where it's viable
If an adversary can corrupt the data in a packet coming
in to you then your adversary can also recalculate a
CRC so it does not appear to have been changed in transit.
Sending the CRC first, instead of after the data, doesn't
work either because finding some corrupt data that gives
the same CRC isn't hard enough. Finding a CRC collision
is much easier than finding a hash collision.
The only secure solution is link encryption (which needs
public key crypto + stream cypher).
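A short Python sketch shows why a CRC alone does not stop deliberate tampering. The packet layout (payload plus a 4-byte CRC trailer) and the names `seal` and `verify` are assumptions made for this example.

```python
import struct
import zlib


def seal(payload):
    """Sender side: payload followed by its CRC-32 as a 4-byte trailer."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def verify(packet):
    """Receiver side: True if the trailer matches the payload's CRC-32."""
    if len(packet) < 4:
        return False
    payload, trailer = packet[:-4], packet[-4:]
    return struct.unpack(">I", trailer)[0] == zlib.crc32(payload)


genuine = seal(b"chunk 7 of song.mp3")

# A clumsy change to the payload alone is caught...
clumsy = b"chunk 7 of junk.mp3" + genuine[-4:]

# ...but anyone in the middle can run the same public calculation,
# so the forged packet passes the check just like the genuine one.
forged = seal(b"chunk 7 of junk.mp3")
```

Nothing in the CRC depends on a secret, which is why the check only helps against accidents.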

I think that this is very unlikely to be a real problem,
if action is taken against p2p at ISP level they won't
bother with man-in-the-middle attacks, they will
use letters, disconnections and perhaps new laws to
fine people.

Crypto _doesn't_ solve the general problem of adversaries
putting false data into the system.

>P2P developers are seeing more and more potential threats too.
When a file is more than one chunk in edonkey and
overnet, the hash of the whole file is defined
to be the result of hashing the hashes of the
individual parts. (yes the hash of the hashes not
the hash of the data). This is very cunning, it
makes it harder to disrupt the system by giving
peers incorrect chunk hashes.
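The hash-of-hashes idea can be sketched in Python as below. SHA-1 from `hashlib` is used as a stand-in for whatever hash the real network uses, the 4-byte chunk size is tiny on purpose, and the function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # tiny for demonstration; real chunks are megabytes in size


def chunk_hashes(data, chunk_size=CHUNK_SIZE):
    """Hash each chunk of the file separately."""
    return [hashlib.sha1(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)]


def root_hash(hashes):
    """The file's identity: the hash of the concatenated chunk hashes."""
    return hashlib.sha1(b"".join(hashes)).hexdigest()


def chunk_list_is_valid(hashes, expected_root):
    """A peer's list of chunk hashes is only trusted if it reproduces the root."""
    return root_hash(hashes) == expected_root


data = b"abcdefghij"
hashes = chunk_hashes(data)
root = root_hash(hashes)

# A peer handing out one wrong chunk hash is exposed before any data moves.
bogus = list(hashes)
bogus[1] = hashlib.sha1(b"junk").digest()
```

Because the root is all a downloader needs to know in advance, a list of chunk hashes from any untrusted peer can be checked without downloading the file first.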

Unfortunately the MD4 hash algorithm is not as
cryptographically strong as some other hashes.
It's probably still ok, there are more effective
ways to disrupt ed/overnet.

method
December 24th, 2002, 04:34 PM
good point.. i did miss one or two security elements.. encryption in particular.. thnx for covering that :;)