This article covers:
Indexing concepts and terminology
Knowing your data and users
Installation and setup
Indexing your data
Querying the database
More than the basics
What you will learn:
How to build a sub-second text search system; how to index and query your data from within PHP; what Xapian is and how it works.
What you should know:
You need only basic PHP and MySQL skills; however, it helps to have a database with a large volume of data to index!
Good search is at the core of every web site these days. Users need to be able to find what they’re looking for quickly and accurately, and they need to be able to filter and sort their results in a variety of ways.
I was introduced to Xapian many years ago as part of a project to improve the search on one of my clients’ websites: ReportBuyer.com. They sell market research reports and have over 250,000 of them, comprising 1.6GB of textual data (titles, subtitles, summaries and tables of contents) plus a range of numeric and date information (prices, publication dates, etc.). We also wanted to show users result counts broken down by category and date range, in a similar way to eBay’s search.
We looked at a number of different products at the time and Xapian came out on top, primarily because it’s very lightweight, fast (results in microseconds) and provides range search and faceting capabilities. This means that you can run queries like “Find me all reports in vehicle manufacturing priced between £100 and £500” and “Find me all reports containing all the words ‘tobacco industry revenue’ published within the last 6 months”.
In this article, I’ll introduce some of the concepts behind Xapian, together with some simple examples of the code needed to start using the system, which you can expand upon to create a bespoke search engine for your own data.
THIS IS A PREVIEW. DOWNLOAD ISSUE 6 TO READ THE FULL ARTICLE.