January 2008 InfoTip: Clustering On Demand

I was recently doing some research for a client on the topic of social capital (see, for example, Robert Putnam's book, Bowling Alone). It's a difficult topic to search and, of course, I retrieved kajillions of results from several search engines. I went through as many of them as I had the patience for, and I tried a number of refinements to further focus my search. But I found it difficult to find what I wanted in the major search engines.

Then I remembered hearing about Carrot2, an open source search-results-clustering engine, just recently out in beta. In a nutshell, it takes search results, analyzes them and, on the fly, creates groups of the most common concepts or terms from those results. Since this is all done by algorithms rather than by humans, expect the odd result every once in a while, but I found the clusters to be consistently useful.
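To make the idea of on-the-fly clustering a little more concrete, here is a toy sketch in Python. It is not Carrot2's actual algorithm, which is considerably more sophisticated; it simply assumes that each search result is a short text snippet and uses the terms shared by the most snippets as cluster labels. The function names, the stop-word list and the `max_clusters` parameter are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "with"}


def tokenize(text):
    """Lowercase the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cluster_by_term(snippets, query, max_clusters=3):
    """Group snippets under the terms that the most snippets share.

    Query words are ignored as labels because they appear in nearly
    every result. A snippet may land in more than one cluster.
    """
    query_terms = set(tokenize(query))

    # Count how many snippets mention each term (document frequency).
    doc_freq = Counter()
    for snippet in snippets:
        doc_freq.update(set(tokenize(snippet)) - query_terms)

    # Rank terms by frequency, breaking ties alphabetically so the
    # result is deterministic; keep only terms shared by 2+ snippets.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    labels = [term for term, count in ranked if count >= 2][:max_clusters]

    # Assign each snippet to every label it contains.
    clusters = {label: [] for label in labels}
    for snippet in snippets:
        terms = set(tokenize(snippet))
        for label in labels:
            if label in terms:
                clusters[label].append(snippet)
    return clusters
```

Calling `cluster_by_term(snippets, "social capital")` on a handful of result snippets returns a dictionary that maps each label to the snippets containing it. Because the labels are chosen mechanically, this sketch will also produce the occasional odd grouping.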

Carrot2's default is to search the web using eTools.ch, a Swiss meta-search engine that queries 10 search engines, including Google, Yahoo, Ask and MSN. However, since eTools only returns the top 20 results from each search engine, I prefer not to use eTools search results. Instead, you can click a tab to limit your search to Google, Yahoo, MSN, Wikipedia, PubMed or one of a few other finding tools. Because clustering is a computationally intensive process, Carrot2 limits the search results by default to the top 100 results from any of the search engines. However, you can click the "Show Options" link and set Carrot2 to search and sort up to 400 results. (Note that when you use the eTools meta-search engine, increasing the number of search results also increases the number of results drawn from each search engine from 20 to 40.)

Geek that I am, I find it even more intriguing that the "Show Options" link also reveals a pull-down menu that lets you select which of six different sorting algorithms you want to use. The clustering results differ dramatically from one algorithm to the next (although keep in mind that the search results themselves stay the same -- only the clusters change). With my "social capital" search, I was able to see a variety of groupings of my search results, and identify some of the key writers and terms.

Carrot2 may not be your day-to-day search tool, but it is tremendously useful for those searches in which it is difficult to sift the wheat from the chaff.

Sign up for a BatesInfoTip -- Bates Information Services' monthly e-newsletter.

I syndicate a value-added version of this newsletter for redistribution within an organization. Raise the profile of your information center and build information literacy with this monthly feature to your clients or patrons. Contact me for more information.

Copyright © Bates Information Services, Inc. All rights reserved.
