PCI Chapter 4 – Searching and Ranking

I am currently revisiting chapter four of Programming Collective Intelligence, which builds a full-blown search engine. Many features of existing search engines are explored and tried out. I think it would be neat to create a search engine for What Movie Now?

Doing this would require crawling some movie information repository for information on each movie in the set. This should be easily obtainable from Wikipedia. The crawled pages will be the material that gets searched and returned as search results.

Afterwards, I would need to index all the documents retrieved and store this information in a database.
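As a rough sketch of what this indexing step might look like, here is a small inverted index stored in SQLite. The table names, the `tokenize` helper and the sample movie blurbs are all my own placeholders, not anything taken from the book or from Wikipedia; real crawled pages would replace `SAMPLE_DOCS`.

```python
import re
import sqlite3


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(conn, documents):
    """Store each document title and the position of every word in it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents "
        "(id INTEGER PRIMARY KEY, title TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_locations "
        "(doc_id INTEGER, word TEXT, position INTEGER)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_word ON word_locations (word)"
    )
    for title, text in documents.items():
        cur = conn.execute(
            "INSERT INTO documents (title) VALUES (?)", (title,)
        )
        doc_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO word_locations (doc_id, word, position) "
            "VALUES (?, ?, ?)",
            [(doc_id, word, pos) for pos, word in enumerate(tokenize(text))],
        )
    conn.commit()


def titles_containing(conn, word):
    """Return the titles of all documents that contain the given word."""
    rows = conn.execute(
        "SELECT DISTINCT d.title FROM documents d "
        "JOIN word_locations w ON w.doc_id = d.id "
        "WHERE w.word = ? ORDER BY d.title",
        (word.lower(),),
    )
    return [row[0] for row in rows]


# Hypothetical sample data standing in for crawled movie pages.
SAMPLE_DOCS = {
    "Alien": "A space crew encounters a deadly alien creature.",
    "Gravity": "Two astronauts are stranded in space after an accident.",
    "Heat": "A detective pursues a crew of professional thieves.",
}

conn = sqlite3.connect(":memory:")
build_index(conn, SAMPLE_DOCS)
```

Calling `titles_containing(conn, "space")` then looks up every movie whose text mentions that word. Storing word positions is optional at this stage, but it keeps the door open for ranking by where a word appears in the page.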

The last step (and this is the interesting one) would be to write a query that returns a ranked list of documents based on the keywords supplied. In other words, a movie information search engine.
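To make the ranking idea concrete, here is a minimal sketch that scores documents by how often the query words appear in them. Word frequency is only one possible metric and I am assuming it here for simplicity; the `search` function, the in-memory dictionary and the sample blurbs are hypothetical stand-ins for a real index.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(documents, query):
    """Return (score, title) pairs, best match first.

    Only documents containing every query word are returned. The raw
    score is the total number of query-word occurrences in a document;
    scores are divided by the best raw score so the top hit scores 1.0.
    """
    terms = tokenize(query)
    if not terms:
        return []
    raw = {}
    for title, text in documents.items():
        counts = Counter(tokenize(text))
        if all(counts[term] > 0 for term in terms):
            raw[title] = sum(counts[term] for term in terms)
    if not raw:
        return []
    best = max(raw.values())
    ranked = [(score / best, title) for title, score in raw.items()]
    # Highest score first; ties are broken alphabetically by title.
    return sorted(ranked, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical sample data standing in for indexed movie pages.
SAMPLE_DOCS = {
    "Alien": "Space horror: a crew in deep space meets an alien in space.",
    "Gravity": "Two astronauts are stranded in space.",
    "Heat": "A detective pursues a crew of thieves.",
}

results = search(SAMPLE_DOCS, "space")
```

With this sample data, "Alien" ranks above "Gravity" for the query "space" because the word occurs more often in its text. A real engine would likely combine several such scores with weights rather than rely on a single one.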

This result set would be returned in response to the search request and, woohoo, a search engine is born. The methods for doing this are not much different from ranking user ratings…

One cool thing the book points out is the number of metrics that can be gathered from what users search for, the results they get and, most importantly, which documents they click on. Very neat indeed.

I will be outlining the details of each step here. This is pretty much my to-do list; more later.
