Adding Full Text Search to a Core Data App Using SQLite
Full text search is one area where the base iOS SDK doesn't have a lot to offer. You can get something that looks kind of like full text search (if you squint) with NSPredicate
and CONTAINS
or BEGINSWITH
, but it becomes really slow (see Performance) with larger data sets. And glob help you if you need to search multiple properties on multiple entities as it just compounds the problem. Anecdotally, this seems to be the route a lot of developers have taken (myself included) because there's not a lot of info out there on the alternatives.
Indeed I found myself in this situation again recently having implemented the search functionality in Rego using NSPredicate
and filteredArrayUsingPredicate:
. To make things worse, our users need to be able to search not only the names of their places but also all the text in both their notes and photo captions. As a user's collection grew, search performance would degrade, especially on older devices (I don't think many people realize just how much faster the iPhone 5 really is). And I quickly reached the limits of how fast I could make it feel using performSelector…afterDelay
and dispatch_async
.
Fortunately there is another option to consider.
Roll Your Own
SQLite offers full text search functionality through the optional FTS3 and FTS4 extensions. The problem we run into almost immediately is that these extensions are not compiled into SQLite by default and are only available in the iOS 6 SDK and later 1.
If we can come to grips with the fact that we need to do a little legwork we find there are actually a few different ways to include our own SQLite library with FTS enabled. We could add it as a separate static library project to our workspace. We could also add it as a pre-built static library. I'm going to be lazy and take advantage of the fact that the SQLite source is available as a single C source file and header (the Amalgamation) that I can drop right into the project.
First download the latest source files from the SQLite site. Then add just sqlite.c
and sqlite.h
to your project in whatever way pleases you the most.
Now we need to tell SQLite to include the FTS extensions by setting a couple of preprocessor macros. To do this, select your project from the Project Navigator on the left, select "Build Settings" and add the following "Preprocessor Macros" to all of your build configurations:
SQLITE_ENABLE_FTS3
SQLITE_ENABLE_FTS3_PARENTHESIS
The only thing left to do now is make sure that you're not including the default SQLite framework. To do this, select your target(s) and make SQLite isn't included under "Build Phases" and "Link Binary With Libraries".
Eff Emm Dee Bee
For this post, and in my own code, I'm going to use the excellent FMDB Sqlite wrapper written by Gus Mueller (and others) to talk to SQLite. FMDB provides a convenient Objective-C interface, supports both ARC and reference counted projects and is, in general, quite swell. Gus also sells awesome Mac apps, blogs about pizza, and bought me breakfast at WWDC once.
To add FMDB to your project you can use CocoaPods (if that's your thing), or just download the source files and copy over the contents of the src
directory (minus fmdb.m
).
Full Text Search - The Sequel
So now we have our own SQLite library and a way to talk to it. Let's set up a new database to store our index.
// Create (or open) our database
NSString *dbPath = [NSSearchPathForDirectoriesInDomains(NSLibraryDirectory, NSUserDomainMask, YES) objectAtIndex:0];
dbPath = [dbPath stringByAppendingPathComponent:@"Index.sqlite"];
// Using the FMDatabaseQueue ensures that we don't accidentally talk to our database concurrently from two different threads
FMDatabaseQueue *queue = [FMDatabaseQueue databaseQueueWithPath:dbPath];
[queue inTransaction:^(FMDatabase *db, BOOL *rollback) {
[db executeUpdate:@"CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts4(name, contents);"];
}];
You'll want to read the docs, but basically I've just created several new tables in our database that we'll talk to as if it was a single table named 'docs'. This table has two columns, name
and objcontents
. It also has a primary key column but we won't be using that in this example.
There are a few different options you can consider when creating your index table. An interesting one is the objcontent=
option. Supplying an empty string tells SQLite not to store the content you insert into the table, just the index. This option introduces some quirks to the way the primary keys work and prevents you from using some niceties like the snippet function but it could be useful if you had to index a lot of data and storage was a concern.
Let's add some content to our index.
[queue inDatabase:^(FMDatabase *db) {
[db executeUpdate:@"INSERT INTO docs (name, contents) VALUES(?, ?);", @"doc1", @"She sells sea shells by the sea shore."];
}];
If you have any experience with SQL this should look very familiar. Anyway, now that we have some data we can try a search.
__block NSMutableArray *matches = [NSMutableArray array];
[queue inDatabase:^(FMDatabase *db) {
FMResultSet *resultSet = [db executeQuery:@"SELECT name FROM docs WHERE docs MATCH ?", @"she*"];
while ([resultSet next]) {
[matches addObject:[resultSet stringForColumn:@"name"]];
}
}];
This query will match both "She" and "shells" and return one result with the name "doc1" which I'm storing in an array. You should also note that the FMDatabaseQueue methods are synchronous (or blocking) and if you want to use a background thread you'll need to handle that yourself.
So now we can find the rows that match our string but what if we wanted some context for those matches? In that case we could do something like this:
FMResultSet *resultSet = [db executeQuery:@"SELECT name, snippet(docs, '[', ']', '', -1, 3) AS snippet FROM docs WHERE docs MATCH ?", @"she*"];
This would return results with an additional column named "snippet" that included the text our query matched in braces with some additional text included before and after. See the docs for more on the snippet function.
But I Use Core Data…
Sure you do. So do I. And I don't think either of us need to rewrite our Core Data apps to take advantage of our new found power. By using the NSManagedObject
life cycle methods we should be able to automatically index our model content.
Lets imagine we have a model named Post
. Each post has a title and a body. First we need to add a new property to our posts that will store a unique ID for that object in the index table. Lets add a new string property to Post
called indexID
.
Then, to make sure we keep track of changes to our object we update our index from both willSave
and prepareForDeletion
:
- (void)prepareForDeletion
{
[super prepareForDeletion];
if (self.indexID.length) {
[queue inDatabase:^(FMDatabase *db) {
[db executeUpdate:@"DELETE FROM docs WHERE name = ?", self.indexID];
}];
}
}
- (void)willSave
{
[super willSave];
if (self.indexID.length) {
[queue inDatabase:^(FMDatabase *db) {
[db executeUpdate:@"UPDATE docs SET contents = ? WHERE name = ?;", self.body, self.indexID];
}];
}
else {
__block NSString *name = [NSString UUIDString];
[queue inDatabase:^(FMDatabase *db) {
if (![db executeUpdate:@"INSERT INTO docs (name, contents) VALUES(?, ?);", name, self.body]) {
name = nil;
}
}];
self.indexID = name;
}
}
In the prepareForDeletion
we make sure to delete our object from the index when it's deleted from Core Data. In willSave
we look to see if our object already has an index ID and if not, we generate a UUID using a category on NSString
, add this object to the index and set the index ID of our object to the new UUID. If our object already has an index ID we just update the index.
Now our index will stay updated as we create, update and delete new Posts. When it's time to find them again we just query our index for an array of index IDs matching our search terms and perform a Core Data fetch for those objects.
Performance
I wrote a simple sample app in order to see what kind of performance gain I might see from using a separate full text index in SQLite. The app loads 1682 text files from textfiles.com (where else?), that's about 42mb of plain text, into both Core Data model objects and an SQLite database. I then timed how long it took to find a single word using both a full text query and a core data fetch request. For one query running on the main thread on slowest device I have (a 4th gen. iPod touch) the Core Data fetch took 9.34 seconds while the SQLite query only took 1.48 seconds. This was a quick test, YMMV and all that, but I was pretty happy with the results.
Keep in mind that's an Apples to Apples comparison too. Because the index is separate from our model we can consider doing clever things like combining together the text properties of an entity and all its children into a single row in the index.
Index All The Things!
Seriously though, this is all still pretty new to me. There may be issues with either the FTS extensions or this specific approach that I haven't thought of or discovered yet. Please consider what's here a starting point, not gospel, and let me know what you discover.
1 Updated on 2013-05-19: I received confirmation from an Apple Engineer that both the FTS3 and FTS4 modules are included in iOS 6 and later. It seems FTS3 might also be included in iOS 5.x but if you need to support iOS 5 I would suggest including your own build.