Wednesday, March 19, 2008

Thrudb on Ubuntu - installation guide

I've been playing with the Thrudb project for the last several days. Basically, Thrudb is a set of simple services built on top of Facebook’s Thrift framework that provide indexing and document storage for building and scaling websites. Its purpose is to offer web developers flexible, fast and easy-to-use services that can enhance or replace traditional data storage and access layers. It is also an alternative to Amazon’s recently announced SimpleDB, which Jake, Thrudb's creator, has provided some commentary on.

Below are step-by-step instructions to go from zero to running Thrudb on your Ubuntu box. They are mostly based on the excellent guide from YourSharade, with some small modifications to match the current Thrudb source code.

1. Initial Setup

Alright, we’re ready to start installing. Most of the dependencies are available with apt-get, but some we’ll have to build from source. The first thing we need for thrudb is thrift, which itself has quite a few dependencies.

First let’s make a directory to put our files in:

$ mkdir buildthrudb
$ cd buildthrudb

Next we’ll update apt-get and install our build tools and thrift dependencies:

$ sudo apt-get update
$ sudo apt-get -y install subversion g++ make flex bison python-dev libboost-dev libevent-dev automake pkg-config libtool
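
Before moving on, it can be worth confirming that the build tools actually ended up on your PATH. The snippet below is only a rough sketch and not part of the original guide: the function name missing_tools is made up for illustration, and it only looks for executables, so it says nothing about the -dev library packages.

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list
# that cannot be found on the PATH. Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Executables provided by the packages installed above
# (svn comes from subversion, libtoolize from libtool).
missing_tools svn g++ make flex bison automake pkg-config libtoolize
```

If this prints nothing, every tool was found; any name it does print is a tool whose package is worth reinstalling.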

Also, since we’ll need it later, let’s update CPAN, the Perl module install tool, now:

$ sudo perl -MCPAN -e "install Bundle::CPAN"

CPAN takes 10 minutes or so to update. When prompted about “manual configuration”, enter no. Eventually you’ll get 3 more prompts, to which the defaults are fine (Update configuration for libnet: no, Perl expression: exit, warning: y). Just hit return at each prompt.

2. Thrift

Now let’s grab thrift from SVN:

$ svn co thrift
$ cd thrift

Then build and install it (don’t worry about the warnings):

$ ./
$ ./configure
$ make
$ sudo make install

Again, this will take a few minutes.

3. Thrift Client Libraries

Now that thrift is installed, we need to install the client libraries for whichever language(s) we’re planning on using. The C++ and Python libraries are installed by default, but this guide will use Java, Ruby, and Perl as examples. If you get other client libraries working, leave a comment with the steps taken and I’ll amend this post.

For Java, first we need to install Java and ant:

$ sudo apt-get -y install sun-java5-jdk ant

This will take another 5 minutes or so. When prompted with the Java license, hit twice. Now we’ll build and install with:

$ cd lib/java
$ sudo ant install
$ cd ../..

This will install the thrift JAR file to /usr/local/lib/libthrift.jar. For the Perl client libraries, we have another dependency to install:

$ sudo perl -MCPAN -e "install Bit::Vector"

Enter yes at the first prompt, then accept the defaults for the dozen or so prompts that follow. Now we can build and install:

$ cd lib/perl
$ perl Makefile.PL
$ make
$ sudo make install
$ cd ../..

For the Ruby client libraries:

$ cd lib/rb/
$ sudo ruby setup.rb install

Please take a look at lib/*/README if you want to install other client libraries. Alright, done with thrift.

4. Thrudb Dependencies

Let’s go back to our build dir, and start on the other dependencies for thrudb:

$ cd ../../..
$ sudo apt-get -y install memcached libexpat1-dev libssl-dev libcurl4-openssl-dev liblog4cxx9-dev uuid-dev libboost-filesystem-dev libmysql++ libdb4.5++-dev

We need Brackup, which has a couple of dependencies. CPAN seems to have trouble installing them all in one go, but one at a time seems to work:

$ sudo perl -MCPAN -e "install DBI"
$ sudo perl -MCPAN -e "install DBD::SQLite"
$ sudo perl -MCPAN -e "install Brackup"

These will take a few minutes each. Accept all defaults when prompted.
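
Since the CPAN one-liner does not always make it obvious whether an install succeeded, a quick way to check is to ask Perl to load each module. This is a minimal sketch rather than anything from the Thrudb sources; have_perl_module is a hypothetical name, and it relies only on Perl's standard -M switch.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if Perl can load the named module.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Check the three modules installed above, one at a time.
for module in DBI DBD::SQLite Brackup; do
    if have_perl_module "$module"; then
        echo "$module: ok"
    else
        echo "$module: missing, try: sudo perl -MCPAN -e \"install $module\""
    fi
done
```

The same check works for the other Perl modules this guide installs through CPAN.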

There are 3 dependencies that we’ll install from source: libmemcached, Spread, and CLucene (since, as of this writing, apt-get has Spread 3.x and we need Spread 4.x; the same with CLucene, apt-get has 0.19.x and we need the latest from svn).


Now we’ll get, build, and install libmemcached:

$ curl | tar xzf -
$ cd libmemcached-0.12
$ ./configure
$ make
$ sudo make install
$ cd ..


Spread requires filling in a form to download it, which will be tricky from the command line, but curl to the rescue (replace the values of name, company, and email in the first command with your own information):

$ curl -L -d FILE=spread-src-4.0.0.tar.gz -d name="Thrudb User" -d company="Thrudb User" -d email="" -d Stage=Download | tar xzf -
$ cd spread-src-4.0.0
$ ./configure
$ make
$ sudo make install
$ cd ..


In order to build CLucene, we need libltdl3-dev:

$ sudo apt-get install libltdl3-dev

Now we'll checkout, build, and install the latest version of CLucene:

$ svn co clucene
$ cd clucene
$ ./
$ ./configure
$ make
$ sudo make install
$ cd ..

For some reason clucene-config.h, which is expected to be in /usr/local/include/CLucene/, ends up in /usr/local/lib/CLucene/ instead. To fix this, we just copy the file:

$ sudo cp /usr/local/lib/CLucene/clucene-config.h /usr/local/include/CLucene/
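
If you rerun these steps, or a later CLucene revision puts the header in the right place by itself, you may prefer a copy that does nothing when the file is already there. The following is a sketch under that assumption; copy_if_missing is a made-up helper name, not something CLucene or Thrudb provides.

```shell
#!/bin/sh
# Hypothetical helper: copy a file into a directory only if that
# directory does not already contain a file of the same name.
copy_if_missing() {
    src=$1
    dest_dir=$2
    name=$(basename "$src")
    if [ -e "$dest_dir/$name" ]; then
        echo "$dest_dir/$name already present, nothing to do"
    elif [ -e "$src" ]; then
        cp "$src" "$dest_dir/" && echo "copied $name to $dest_dir"
    else
        echo "$src not found" >&2
        return 1
    fi
}

# Intended call, using the paths from this guide (run the script as
# root, since /usr/local/include is normally not user-writable):
#   copy_if_missing /usr/local/lib/CLucene/clucene-config.h /usr/local/include/CLucene
```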

Update Shared Libraries

For linking to work right later, we need to update our shared libraries with:

$ sudo /sbin/ldconfig
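
To see whether the freshly built libraries made it into the linker cache, you can search the output of ldconfig -p. This is just a sketch: lib_in_cache is a hypothetical helper, and the library names in the loop are my guesses based on the package names, so adjust them if your build produced different file names.

```shell
#!/bin/sh
# Hypothetical helper: read `ldconfig -p` style output on stdin and
# succeed if a shared library whose file name starts with "$1.so" is listed.
lib_in_cache() {
    grep "^[[:space:]]*$1\.so" >/dev/null
}

# Library names below are assumptions, not taken from the build output.
for lib in libthrift libmemcached libspread libclucene; do
    if /sbin/ldconfig -p 2>/dev/null | lib_in_cache "$lib"; then
        echo "$lib: found in linker cache"
    else
        echo "$lib: not found in linker cache"
    fi
done
```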

5. Install Thrudb

Well, if you’ve made it this far, congratulate yourself. We’re now ready to actually install thrudb! First we’ll get it from SVN:

$ svn co thrudb
$ cd thrudb

To build and install everything:

$ sudo make

You can also build specific portions. thrucommon is required; everything after it is optional:

$ cd thrucommon
$ ./bootstrap
$ ./configure
$ make all
$ sudo make install

Repeat for each of the pieces you want, i.e. thrudoc, thrudex, thruqueue, throxy (throxy is not ready yet though).
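
If you would rather not type the same five commands for every piece, the repetition can be wrapped in a small loop. Treat this as a sketch, assuming each component directory builds with the same steps as thrucommon above; run, build_component and the DRY_RUN variable are names I made up for illustration. It starts in dry-run mode, so it only prints what it would do.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build one component in a subshell so the directory change
# does not leak into the caller. Stops at the first failing step.
build_component() {
    (
        run cd "$1" &&
        run ./bootstrap &&
        run ./configure &&
        run make all &&
        run sudo make install
    )
}

# Set DRY_RUN to an empty value to actually build.
# Run from the top of the thrudb checkout; throxy is left out.
DRY_RUN=1
for component in thrucommon thrudoc thrudex thruqueue; do
    build_component "$component" || { echo "build failed in $component" >&2; break; }
done
```

thrucommon comes first in the list because the other pieces depend on it.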

6. Thrudb Client Libraries and Tutorials

And finally, we just need to build the thrudb client libraries (similar to what we did for thrift), and tutorials if we want to test it out.

To build the client libraries, you just need to run /usr/local/bin/thrift on Thrudoc.thrift and Thrucene.thrift and specify which language to generate. However, there’s a handy Makefile in the tutorial directory that will take care of this for us:

$ cd tutorial
$ make

Start thrudoc and thrucene with the handy control script:

$ ./thrudbctl start

At last! Unfortunately, since thrudb has undergone some major changes recently, only the Python and Perl tutorials work out of the box. I'll try to fix the other tutorials soon, so stay tuned!

Let’s run the Python tutorial:

$ cd python
$ python

You should see something like:

*Indexed file in: 0.20 seconds*

Searching for: tags:(+css +examples)
Found 3 bookmarks
1 title: Dynamic Drive CSS Library- Practical CSS codes and examples
url: (
tags: (css examples)
2 title: Dynamic Drive DHTML Scripts -DD Tab Menu (5 styles)
url: (
tags: (cool css examples menu)
3 title: Uni-Form
url: (
tags: (examples CSS)
Took: 0.00 seconds

Searching for: title:(linux)
Found 4 bookmarks
1 title: Debian GNU/Linux System Administration Resources
url: (
tags: (linux administration tips)
2 title: Linux Scalability
url: (
tags: (linux sysadmin ulimit)
3 title: Set Up Postfix For Relaying Emails Through Another Mailserver | HowtoForge - Linux Howtos and Tutorials
url: (
tags: (email linux server)
4 title: ZFS on FUSE/Linux
url: (
tags: (zfs linux fuse)
Took: 0.00 seconds

*Index cleared in: 0.06 seconds*

The Perl tutorial needs one more dependency:

$ sudo perl -MCPAN -e "install Class::Accessor"

Again, accept the default when prompted. Then run with:

$ cd perl
$ perl

You should see output similar to the Python example.

7. Next Steps

Congratulations! You now have thrudoc and thrucene running on your own Ubuntu box. From here, you can poke around in the tutorial directory and have a look at the *.conf files. You may also want to join the discussion list.
