I recently worked on a project to speed up data search in a medical data server. The server stored its data in MongoDB and had to handle many complex queries. These queries were often slow, which made it hard for users to get the information they needed quickly. Part of the problem was that the queries involved many joins ($lookup stages, in MongoDB terms), which MongoDB doesn’t handle as efficiently as relational databases. Because of project constraints and the nature of the data, I couldn’t denormalize the database for faster queries or switch to a different database system.
I needed a way to make these queries run faster without changing the database structure or the way the data was stored. The goal was to optimize the queries themselves so that they could be processed more efficiently by MongoDB.
I decided to take a different approach. Rather than hand-tuning each MongoDB query, I rewrote the queries programmatically before they were sent to the database. This let me restructure them into more efficient forms without altering the underlying data model. It also let me apply domain knowledge to simplify queries adaptively, based on the data being queried.
To implement this, I decided to first represent the queries as Abstract Syntax Trees (ASTs). An AST is a tree representation of the structure of the query, which allows for easier manipulation and optimization. By converting the queries into ASTs, I could analyze their structure and apply various optimizations before converting them back into MongoDB queries.
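As a rough illustration of that round trip, here is a minimal Python sketch that parses a MongoDB-style filter document into a small AST and converts it back. The node classes (`Compare`, `And`, `Or`), the `parse` and `to_mongo` helpers, and the field names are all hypothetical, chosen for this example; the real project's AST covered far more than simple filters.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Compare:
    """Leaf node: a single comparison such as {"value": {"$gt": 7}}."""
    field: str
    op: str
    value: Any


@dataclass
class And:
    children: tuple


@dataclass
class Or:
    children: tuple


def parse(filter_doc: dict):
    """Turn a MongoDB-style filter dict into an AST (sketch: $and/$or and comparisons only)."""
    parts = []
    for key, val in filter_doc.items():
        if key == "$and":
            parts.append(And(tuple(parse(d) for d in val)))
        elif key == "$or":
            parts.append(Or(tuple(parse(d) for d in val)))
        elif isinstance(val, dict) and val and all(k.startswith("$") for k in val):
            # {"field": {"$gt": 1, "$lt": 9}} becomes one Compare per operator
            parts.extend(Compare(key, op, operand) for op, operand in val.items())
        else:
            # {"field": literal} is shorthand for an equality test
            parts.append(Compare(key, "$eq", val))
    # Several top-level keys are an implicit conjunction
    return parts[0] if len(parts) == 1 else And(tuple(parts))


def to_mongo(node) -> dict:
    """Convert an AST back into a filter dict that MongoDB would accept."""
    if isinstance(node, Compare):
        return {node.field: {node.op: node.value}}
    if isinstance(node, And):
        # An empty $and is invalid in MongoDB, so an empty conjunction matches everything
        return {"$and": [to_mongo(c) for c in node.children]} if node.children else {}
    if isinstance(node, Or):
        return {"$or": [to_mongo(c) for c in node.children]}
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Hypothetical query: results for one patient that are final or above a threshold
query = {"patient_id": 42, "$or": [{"status": "final"}, {"value": {"$gt": 7}}]}
ast = parse(query)
round_tripped = to_mongo(ast)
```

Once the query is a tree of plain objects, each optimization becomes an ordinary function from tree to tree, which is much easier to test than string or dictionary manipulation.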
Specifically, I used a series of rewriters to optimize the ASTs. Each rewriter focused on a specific aspect of the query:
- One rewriter handled in-process joins, combining data inside the server process instead of leaving the join to MongoDB.
- Another rewriter replaced parts of the queries with cached data, reducing the need for repeated database lookups.
- A third rewriter removed unnecessary parts of the queries entirely, streamlining the overall query structure.
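The three kinds of rewriter listed above can be sketched as a small pipeline of tree-to-tree functions. This is a simplified Python illustration under stated assumptions, not the project's actual code: the `InSubquery` node (standing in for a join), the `use_cache`, `join_in_process`, and `prune` rules, the `fetch` callback, and all collection and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Compare:
    field: str
    op: str
    value: Any


@dataclass
class And:
    children: list


@dataclass
class InSubquery:
    """`field` must equal `key` of some document in `collection` that passes `where`."""
    field: str
    collection: str
    key: str
    where: Compare


def transform(node, rule):
    """Apply `rule` bottom-up: rewrite the children first, then the node itself."""
    if isinstance(node, And):
        node = And([transform(child, rule) for child in node.children])
    return rule(node)


def use_cache(cache):
    """Rule: replace a subquery with its cached result when one is available."""
    def rule(node):
        if isinstance(node, InSubquery):
            w = node.where
            hit = cache.get((node.collection, node.key, w.field, w.op, w.value))
            if hit is not None:
                return Compare(node.field, "$in", hit)
        return node
    return rule


def join_in_process(fetch):
    """Rule: resolve remaining subqueries in the application and inline the keys."""
    def rule(node):
        if isinstance(node, InSubquery):
            rows = fetch(node.collection, node.where)
            return Compare(node.field, "$in", [row[node.key] for row in rows])
        return node
    return rule


def prune(node):
    """Rule: flatten nested conjunctions and drop duplicate conditions."""
    if not isinstance(node, And):
        return node
    flat = []
    for child in node.children:
        flat.extend(child.children if isinstance(child, And) else [child])
    unique = []
    for child in flat:
        if child not in unique:
            unique.append(child)
    return unique[0] if len(unique) == 1 else And(unique)


def optimize(node, rules):
    """Run each rewriter over the whole tree, in order."""
    for rule in rules:
        node = transform(node, rule)
    return node


# Hypothetical cache entry and stand-in for a database call (equality filters only)
cache = {("patients", "_id", "ward", "$eq", "ICU"): [1, 2]}

def fetch(collection, where):
    fake_db = {"devices": [{"_id": 7, "type": "ECG"}, {"_id": 9, "type": "EEG"}]}
    return [doc for doc in fake_db[collection] if doc[where.field] == where.value]

query = And([
    InSubquery("patient_id", "patients", "_id", Compare("ward", "$eq", "ICU")),
    And([
        InSubquery("device_id", "devices", "_id", Compare("type", "$eq", "ECG")),
        Compare("status", "$eq", "final"),
    ]),
    Compare("status", "$eq", "final"),
])
optimized = optimize(query, [use_cache(cache), join_in_process(fetch), prune])
```

In this sketch, the first subquery is answered from the cache, the second is resolved in-process, and the duplicate `status` condition is pruned, leaving a flat conjunction of three simple comparisons. Keeping each rule independent means rewriters can be added, reordered, or disabled without touching the others.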
The result was a substantial speedup: the server returned results much faster, in some cases thousands of times faster. This removed the performance pressure that some of our key clients had been experiencing and let them access their data quickly.