One of my initial concerns with IndexedDB was how it would handle aggregates (counts, sums, averages etc.). Most of the IndexedDB tutorials/examples I found online seemed very simplistic, showing basic tasks such as putting an object into a store, getting an object out of a store, and using cursors to iterate over a set of objects; but there were no good examples of aggregating objects. Also, at that time, Chrome 17 had not yet implemented the IDBObjectStore.count() and IDBIndex.count() methods as per the W3C spec, so the only way to get a count of objects from an object store or index was through a cursor iteration.
My other concern with IndexedDB was regarding relationships/joins. In the relational-world, when in comes to modelling database entities the rules are usually pretty clear cut (thanks to E. F. Codd’s theory of normalisation). Conversely, the schema-less nature of NoSQL databases means that there is often more than one way to model your data. It’s up to you, as a developer, to determine which model best suits your application.
For this exercise, there are three different approaches to modelling the data:
- Separate object stores for each of the three entities (corresponding to the three tables in the WebSQL database), i.e. a “Programs” object store, a “Series” object store, and an “Episodes” object store. This is a many simple objects design.
- A single object store for storing everything; and a “type” attribute on each object to distinguish between them. Programs, series & episodes would still be stored as separate objects, but all in the same object store.
- A single object store that stores only Program objects. Within each program object would be an array holding it’s set Series objects, and within those nested Series objects would be an array containing it’s Episode objects. This nested object structure takes full advantage of IndexedDB’s ability to store complex objects; something that a relational database such as WebSQL cannot do. This is a few complex objects design, which works well for one-to-many relationships (but no so well for many-to-many relationships).
Each has it’s pros and cons. The first approach is probably most familiar to relational database designers, and has the benefit of providing very granular access to each entity’s records, as well as the ability to use those native count() methods I mentioned earlier. The downside is that retrieving objects from multiple object stores (ie. the equivalent of SQL’s “JOIN” keyword) has to be done by manually merging the results of multiple requests.
The latter approach sacrifices granular access to child objects in exchange for simplified joins (i.e. get a program object, and you automatically get all of it’s child series and episode objects along with it).
For simplicity, I chose the first approach (separate object stores).
I wanted to compare the performance of a
reasonably complex SQL query consisting of multiple joins, aggregates with a group by/having clause, an order by clause, and an in-line case expression. An example of this from my
tvmanager app is the initial view for the app called the “Schedule” view. The Schedule view lists any series that you either have, or expect to have, recorded to your DVR. The criteria for inclusion in the Schedule view is:
- any series that is marked as nowShowing OR
- any series that has at least one episode with a status of Recorded OR
- any series that has at least one episode with a status of Expected
For each series in the Schedule view, we also include additional information from the parent program (ie. it’s name), as well as some summary aggregate data from it’s child episodes (total # watched, # recorded, #expected etc.), and finally we order the results by the day of week that they are shown, then by program name.
In WebSQL, we achieve all of this with a single query:
Note: there are no indexes of any of the tables in the WebSQL database. I did run some benchmarks after adding indexes on Episodes(EpisodeID, Status), Series(SeriesID) and Series(NowShowing); but there was little improvement, as the query essentially has to do a full table scan of the Series table. But it’s worth noting anyway.
The code for the schedule() function for WebSQL looks like this:
An equivalent operation in IndexedDB is much more complex, requiring 3 + 3*N separate database requests (where N is the number of series that meet our above criteria) to retrieve all of the information. Let’s look at each of these in more detail…
The first task is to identify the series objects that meet our criteria, which you’ll recall has three elements.
Firstly, we need to get the list of series objects that are regularly airing (i.e. have a non-null nowShowing property), which is pretty easy as we have an index for that (“series_nowShowing”). So we open a cursor using a key range with a lower bound of 0, iterate over the cursor and populate an associative array. We use an associative array here rather than a normal array, because it’s likely that there will be duplicates when we get to our next steps, and we want to filter those out.
Secondly, we want the list of series objects that have at least one recorded episode. Again, we can do this using an index on the episodes store (“episodes_status”), and specifying a single key value of “Recorded”.
Finally, we want the list of series objects that have at least one expected episode. This is identical to the above, just with a different key value.
For performance reasons (and, let’s be honest, because no browsers yet support IndexedDB’s synchronous API), we can fire off each of these three requests asynchronously, and wait for a signal from each to indicate that it has completed (this is the processCandidates.call(this) at the end of each request).
Now that we have our candidate series, we need to loop through these and collect the remaining information required for display (e.g. program name, episode counts etc.). We start by getting the series object itself.
Note: You might notice that I’m still using openCursor to get a single series object. I’m not sure if it’s a bug with Chrome, or I was doing something wrong, but I simply couldn’t get get() to work (I kept getting a DataError exception, which generally means that the key you’re attempting to get doesn’t exist, even though I knew the key existed). Using a cursor seemed to solve the problem, but probably incurs some extra overhead beyond a simple get.
Once we’ve got the required values from the series object (including the programId it belongs to), we get the related program object (again, using openCursor() because of the issue with get())
While we’re firing off a request to get the program, we can simultaneously do the request for the related episodes. We use a cursor to iterate over all of the episode objects for the series, and manually calculating the total/watched/recorded/expected counts.
As per earlier, at the end of each of our parallel requests we set a flag to indicate the request is complete, and call updatePopulatedCount(). Only when both requests for a given series are done will we update our progress counter.
Finally, once we have all the data we need, we sort the results in the same order as we did for the WebSQL version; before rendering the HTML table in the same way as we did for WebSQL.
The Results
As demonstrated above, the code for IndexedDB can be a lot more complicated, as we’re having to do a lot of the work ourselves in JavaScript that is otherwise provided natively by SQL.
That said though, WebSQL takes on average between ~750-850ms to complete the query and render the results; and IndexedDB takes on average ~300-350ms to render the exact same results. Thats over twice as fast.
At this point I should add the disclaimer that
- My timing method (diff between two JavaScript Dates) is not particularly scientific, and is at the mercy of the browser’s main event loop. Given that we’re measuring down to the millisecond, it’s probably not 100% accurate.
- The code could possibly be optimised more (either on the WebSQL or IndexedDB sides). I didn’t spend a great deal of time squeezing every last ounce of performance out of it.
If you spot something in the code that is causing the results to be skewed (one way or the other), please let me know.