Seeking Gnutella Search Keyword Indexer
GnutellaSearcher
April 8th, 2004, 12:13 PM
I am seeking a Gnutella tool that passively monitors searches on the network and saves
the search keywords to an index file so the data can be manipulated. In other words, I
want to be able to connect to the Gnutella network, receive search queries, and save the
search queries to a list.
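For reference, a minimal sketch of the core step such a tool would need: pulling the search string out of one raw message. It assumes the Gnutella 0.4 descriptor layout (a 23-byte header, then a Query payload of a 2-byte minimum speed followed by a NUL-terminated search string). The function names are made up for illustration, and the connection handshake and message framing are left out entirely.

```python
import struct

HEADER_LEN = 23    # 16-byte GUID + type + TTL + hops + 4-byte payload length
QUERY_TYPE = 0x80  # payload descriptor value for a Query in Gnutella 0.4


def extract_query(message: bytes):
    """Return the search string of one raw descriptor, or None if it is not a usable Query."""
    if len(message) < HEADER_LEN:
        return None
    # "<" selects little-endian with no padding, so the header unpacks to exactly 23 bytes.
    guid, ptype, ttl, hops, length = struct.unpack("<16sBBBI", message[:HEADER_LEN])
    if ptype != QUERY_TYPE:
        return None
    payload = message[HEADER_LEN:HEADER_LEN + length]
    if len(payload) < 3:
        return None
    # Skip the 2-byte minimum speed; the criteria run up to the first NUL byte.
    criteria = payload[2:].split(b"\x00", 1)[0]
    text = criteria.decode("utf-8", errors="replace").strip()
    return text or None


def build_query(text, guid=b"\x01" * 16, ttl=5, hops=2):
    """Hypothetical helper that builds a Query descriptor, only for trying out the parser."""
    payload = struct.pack("<H", 0) + text.encode("utf-8") + b"\x00"
    header = struct.pack("<16sBBBI", guid, QUERY_TYPE, ttl, hops, len(payload))
    return header + payload
```

Each string returned by `extract_query` could then be appended to a list or file. The character encoding of real queries varies between clients, so decoding as UTF-8 is itself an assumption.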
I know some people have made modified Gnutella apps that do this, but I don't know where
to find them. Search engines turn up nothing useful for such software because so many
irrelevant sites spam the index with Gnutella-related keywords in their META tags to get
a higher search ranking.
Anyone with information on where I can find such an app, please post links.
Thank you.
GnutellaSearcher
URICHO_Corp.
April 8th, 2004, 01:00 PM
In Shareaza there is a search monitor.
GnutellaSearcher
April 8th, 2004, 01:41 PM
Yes, I use Shareaza and I realize it has a search monitor, but that does not solve the problem. The Shareaza search monitor does not allow one to *save* the searches to a file for data mining purposes. There is no way to copy & paste search keywords or save them from Shareaza.
I am seeking an application that does exactly what the Shareaza search monitor does, but also allows the list of searches to be automatically written to a database or flat text file.
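A rough sketch of the "written to a database or flat text file" part, using Python's built-in sqlite3 module. The table layout, file names, and function names are assumptions for illustration. It expects the query strings to have been captured already by some other means, since it does not hook into Shareaza or any other client.

```python
import sqlite3
import time


def open_log(path="queries.db"):
    """Open (or create) the query log database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queries ("
        "seen_at REAL NOT NULL, query TEXT NOT NULL)"
    )
    return conn


def log_query(conn, query, seen_at=None):
    """Store one search string with the time it was seen."""
    stamp = time.time() if seen_at is None else seen_at
    conn.execute("INSERT INTO queries (seen_at, query) VALUES (?, ?)", (stamp, query))
    conn.commit()


def export_flat(conn, path):
    """Dump the log as a tab-separated text file, oldest query first."""
    rows = conn.execute("SELECT seen_at, query FROM queries ORDER BY seen_at")
    with open(path, "w", encoding="utf-8") as out:
        for seen_at, query in rows:
            out.write(f"{seen_at:.0f}\t{query}\n")
```

Keeping a timestamp with each row is a design choice so that the data can later be sliced by day or week. A plain flat file would work just as well if only the keywords matter.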
GnutellaSearcher
Jorge
April 8th, 2004, 01:45 PM
So you are trying to mine Gnutella for data, huh? Why don't you just write an app to do what you need it to do?
GnutellaSearcher
April 8th, 2004, 01:48 PM
There are people who have already written such apps; it is just a matter of finding a download site. I do not have the time to learn a high-level programming language and the Gnutella protocol. I'd rather just download one of the existing apps, which seem to be impossible to find with a search engine.
GnutellaSearcher
Sephiroth
April 8th, 2004, 02:02 PM
They are impossible to find because I doubt the spammers release the software they use, so I don't think there is such a program. Those who study the network's statistics look at more detailed figures that indicate the health of the network; what people are searching for isn't one of them.
GnutellaSearcher
April 8th, 2004, 02:51 PM
What does a gnutella query index have to do with spammers? I'm talking about collecting and categorizing search queries, not email addresses. I don't get what you mean about spammers not releasing the software they use, nor do I see the relevance to my question. Would you care to explain in detail?
Sephiroth
April 8th, 2004, 03:51 PM
No, but they can then send fake results that show up as spam telling you to visit some website. In the past people have also faked search replies, so that when someone downloaded and opened the file it turned out to be porn or a link to some porn site. This is not your typical email junk.
I don't get why you want to categorize and index what people are searching for in the first place. A lot of it would be junk. With some servents, like BearShare, you won't be able to see what their hosts are searching for if they have secure channels on, because the traffic would be encrypted. You would get some duplicate queries, and most of what you'll get is from malicious hosts spamming the same searches over and over. That means the data would not represent the actual users of the network but would instead be biased by malicious hosts and buggy servents, which is another reason it's not done. That, and again, there is no practical reason to look at the content of the searches.
That is why I don't believe such a tool exists. Your best bet would be to ask the developers somewhere like The GDF (http://groups.yahoo.com/group/the_gdf/), but unless you have a good reason for doing this I don't think they will be much help.
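To make the duplicate and repeat-spam problem concrete, here is a crude filtering sketch. It drops messages whose GUID has already been seen, and caps how often the same normalized search text is counted within a time window. The class name, thresholds, and window length are arbitrary assumptions. Because a Gnutella 0.4 Query header carries no source address, a passive listener cannot reliably limit by originating host, so this limits by text instead, which will also undercount searches that are genuinely popular.

```python
import time
from collections import defaultdict, deque


class QueryFilter:
    """Heuristic filter: skip duplicate GUIDs and over-repeated search text."""

    def __init__(self, max_repeats=3, window=600.0):
        self.max_repeats = max_repeats    # accepted copies of one text per window
        self.window = window              # window length in seconds
        self.seen_guids = set()           # grows without bound; fine for a sketch only
        self.recent = defaultdict(deque)  # normalized text -> accepted timestamps

    def accept(self, guid, text, now=None):
        now = time.time() if now is None else now
        # The same message often arrives over several connections with the same GUID.
        if guid in self.seen_guids:
            return False
        self.seen_guids.add(guid)
        # Normalize case and whitespace so trivial variants count as one search.
        key = " ".join(text.lower().split())
        if not key:
            return False
        stamps = self.recent[key]
        # Forget accepted copies that have fallen out of the window.
        while stamps and now - stamps[0] > self.window:
            stamps.popleft()
        if len(stamps) >= self.max_repeats:
            return False
        stamps.append(now)
        return True
```

This does nothing about encrypted traffic or cleverly varied spam, so it only reduces the bias described above; it does not remove it.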
GnutellaSearcher
April 9th, 2004, 12:52 AM
I appreciate the insights you have offered on this. If I understand you, a lot of the search queries on the network are generated by malicious hosts, and there are "fake files" put there by people trying to draw attention to their porn sites? Like when I download an ebook entitled "Magic Card Tricks" and, on opening it, find nothing but a link to some smut site or an online order form for David Copperfield videos? That is akin to spam, if that's what you're getting at and I have understood correctly.
I am trying to build a search frequency index over a very long period of time, to map a diagram and concordance of the "consciousness" behind the most popular searches. Taking into account what you have stated, though, I would need some intelligent way to distinguish between genuine and junk queries, which would require me to program my own application from scratch, with lots of sorting algorithms and intense AI built in. Argh!
It sounds next to impossible to do, given the input you've posted. I'll need to think much about a different solution.
I did find a news article that detailed the 30 most popular gnutella search keywords, most of them related to porn and pop icons. I was hoping to find something that would detail the most popular thousand keywords over a long period of time, at least several days. I am interested to know the subject matter that dominates the consciousness behind Gnutella searches. Until I figure out a better approach, I think perhaps I can search for articles written by others who have approached this subject and see if there are any reporting trends.
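The counting side of such a keyword ranking is the easy part. Below is a minimal sketch using Python's collections.Counter, assuming the queries have already been collected as plain strings. The stopword list and the tokenizing rule (lowercase runs of letters and digits) are illustrative assumptions, not anything taken from the article mentioned above.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "and"}


def keyword_counts(queries):
    """Count how often each keyword appears across all search strings."""
    counts = Counter()
    for query in queries:
        for word in re.findall(r"[a-z0-9]+", query.lower()):
            if len(word) > 1 and word not in STOPWORDS:
                counts[word] += 1
    return counts


def top_keywords(queries, n=1000):
    """Return the n most frequent keywords as (word, count) pairs."""
    return keyword_counts(queries).most_common(n)


if __name__ == "__main__":
    sample = ["Magic Card Tricks", "card tricks", "the magic of cards"]
    for word, count in top_keywords(sample, n=3):
        print(word, count)
```

Note that "card" and "cards" are counted separately here. Merging such variants would need stemming, which is left out.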
Thanks for all your input. I'll try to remember to post here if I find out anything interesting in my search, God willing.
GnutellaSearcher