Saturday, March 29, 2008

baamboo


It seems that Baamboo, a very popular music search engine in Vietnam, uses SQL full-text search. This surprised me, since everyone I know reaches for Lucene as soon as they want to build a search engine.

Lucene is very good at what it does. Its indexing and storage performance is second to none. In fact, it's so fast that many companies use it as a quick-and-dirty dumping ground for raw data, knowing that it will be much faster and more scalable than a relational database. Why not take advantage of this power and take one more load off your database's back? Not to mention that a Lucene index query is probably a lot faster than an SQL query against a Microsoft SQL Server full-text index.

If I were the designer of Baamboo, I'd use, yeah you got it already, Lucene and its sub-projects for searching. A quick draft architecture would combine Nutch and Solr: use Nutch to crawl the Internet, post the crawled data to Solr (which of course is powered by Lucene) for indexing, and finally query Solr from your applications through one of the many interfaces Solr provides.
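To make the Solr half of that draft a little more concrete, here is a minimal sketch in Python. It assumes a Solr instance at the example URL http://localhost:8983/solr and a hypothetical schema with id, title and artist fields; nothing here reflects Baamboo's real data. It only builds the XML update message you would POST to Solr's /update handler and the query URL for its /select handler. The glue that turns crawled Nutch pages into the dicts passed to build_add_message is assumed, not shown.

```python
"""Sketch of the Solr side of a Nutch -> Solr pipeline.

Assumptions (hypothetical): Solr runs at SOLR_URL and its schema defines
the fields used below (id, title, artist). The crawler glue is not shown.
"""
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr"  # assumed example address


def build_add_message(docs):
    """Turn a list of dicts into a Solr XML <add> message (special chars are escaped)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(query, rows=10):
    """Build a query URL for Solr's standard /select handler, asking for JSON output."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_URL}/select?{params}"


def post_to_solr(xml_body):
    """POST an XML message (an <add> or <commit/>) to /update; needs a running Solr."""
    req = urllib.request.Request(
        f"{SOLR_URL}/update",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With a server running, indexing would be `post_to_solr(build_add_message(docs))` followed by `post_to_solr("<commit/>")` so the new documents become searchable, and the application would then fetch `build_select_url("title:mua xuan")`.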

Easy money, isn't it? Not really, since Nutch and Solr have a steep learning curve. Customizing and operating them the way you want requires a handful of experienced Java programmers and sysadmins. But I'm sure it's worth the effort. I'm on my way to implementing a vertical search engine powered by this excellent open source software myself. It's really fast and scalable. I couldn't be more pleased.

BTW, it seems that Baamboo needs a better sysadmin, since their error page gives away too much information.

1 comment:

Server said...

Hi Thai,

Just wanted to leave a comment that I enjoy reading your blog very much. In fact, your blog is one of the most useful blogs I have found. I have been reading it for 2 years or so, and I can say that I have learnt more from what you wrote than from most forums or other websites about sysadmin and related stuff. Thanks so much for sharing your experience and opinions with others.