View Full Version : Unhierarchical distribution of web-search-engine-index.


ormm
May 12th, 2005, 01:38 AM
Open letter to the developers and supporters of free p2p technologies.

I am developing a search engine that spiders the web for Ogg and MP3 files; at the time of writing, about 70 000 files are indexed. (http://openmusic.op.funpic.org/catalog/)

This is a brief discussion of ways to distribute this index in an unhierarchical way on top of existing p2p networks. It is submitted here as a request for opinions on how it could be implemented without violating or disturbing the protocol specifications and the usability of those networks.

The motivation for implementing this is that many unsigned artists release their music on the internet, but those files are rarely available on p2p networks. This design is also an alternative to central servers, since it would take too much computing power to provide such a service to all the world's p2p networks.

The spidering and updating of the index is done centrally by our servers, with an engine released under the GNU GPL on SourceForge. It could be done in an unhierarchical way, but we insist that p2p clients must remain free from such add-ons and implement only things that directly benefit the user.

A conceptual approach to the problem:
The index is split into many small files (e.g. 100) of about 300 KB each. The index is ordered by artist name, and a client searching for an artist downloads the meta-file that contains the URL for that artist together with additional, redundant information about other artists. This makes it impossible to search for a particular song without knowing the artist, but I consider that acceptable. The client shares that meta-file and in this way makes it available to more users. The wanted target file is then downloaded via HTTP.
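
A rough sketch of how a client could find the right meta-file, assuming the index is cut into contiguous, alphabetically ordered chunks and that the client knows the first artist name in each chunk. The function names and the chunk layout are assumptions made for illustration; nothing here is taken from the actual engine.

```python
import bisect


def build_chunks(artists, chunk_count):
    """Split artist names into at most chunk_count contiguous, sorted chunks."""
    # Normalise names so that lookups are case-insensitive (an assumption).
    ordered = sorted({a.strip().lower() for a in artists if a.strip()})
    if not ordered:
        return []
    chunk_count = max(1, min(chunk_count, len(ordered)))
    size = -(-len(ordered) // chunk_count)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def chunk_for(artist, chunks):
    """Return the position of the chunk that would hold this artist."""
    first_names = [chunk[0] for chunk in chunks]
    pos = bisect.bisect_right(first_names, artist.strip().lower()) - 1
    # Names sorting before the first chunk boundary fall into chunk 0.
    return max(pos, 0)


chunks = build_chunks(["Zed", "alpha", "Beta", "gamma", "delta", "Epsilon"], 3)
print(chunk_for("Delta", chunks))
```

With boundaries like these, a client only needs the small list of first names to decide which meta-file to request from the network.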

Inconvenience
This might clash with clients that are not designed with this use in mind, since the files containing the index will be seen by people browsing the host who expect to find "real" files and not a set of metadata.

This system would be vulnerable to erroneous data and spam, but that is more a question about the p2p concept than about this way of using it.

Is this worth implementing or not?

Sincerely, Johan Mattsson

You can find a part of the index here: http://openmusic.op.funpic.org/catalog/

larytet0
May 12th, 2005, 09:15 AM
70 000 MP3s does not sound like a lot. An XML file describing this could be around 70 MB? 35 MB zipped. Keep it on a BT tracker.
When it grows larger, divide it into pages 0, A, B, C, ...
I can add a plugin to my application to search the database (if it is XML formatted), no problem.

ormm
May 13th, 2005, 03:42 AM
It is already divided into 100 files and you only download the chunk you need, so it is possible to update it partially without downloading the whole index each time.
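
One way such a partial update could work, as a sketch only: keep a small manifest with a checksum per chunk and re-download only the chunks whose checksum changed. The manifest format, the chunk names and the choice of SHA-256 are assumptions for illustration, not part of the existing index.

```python
import hashlib


def digest(data: bytes) -> str:
    """Checksum of one chunk file."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(chunks: dict) -> dict:
    """Map each chunk name to the checksum of its contents."""
    return {name: digest(body) for name, body in chunks.items()}


def chunks_to_fetch(local: dict, remote: dict) -> list:
    """Names of chunks that are new or changed compared to the local copy."""
    return sorted(name for name, d in remote.items() if local.get(name) != d)


# Hypothetical chunk names and contents.
old = build_manifest({"chunk-000": b"a", "chunk-001": b"b"})
new = build_manifest({"chunk-000": b"a", "chunk-001": b"b2", "chunk-002": b"c"})
print(chunks_to_fetch(old, new))
```

The manifest itself stays tiny, so a client can refresh it often and still skip every chunk that has not changed.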

Afn
May 13th, 2005, 06:19 AM
Why unhierarchical?

ormm
May 15th, 2005, 05:27 AM
Because data migration in p2p networks makes it possible to distribute this to many people using p2p clients without setting up a central server. Where would I and the open source community get the money to run one?