The datastore is one of the most heavily-used App Engine APIs, and it was one of the first components that AppScale implemented. It has seen many iterations, including (at one time) support for 12 database backends. Now that we’ve settled on Cassandra as our supported database, we’ve reached much greater stability. However, we are always looking for ways to increase the performance of the datastore while maintaining that stability.
AppScale 3.0 brings a much-needed update to the way that the datastore handles transactions, and it allows us to support even larger datasets than before. To explain the necessity of this change, I’ll outline how transactions used to work and the problems we encountered with that approach.
Transactions Before 3.0
Cloud Datastore transactions can consist of multiple operations over separate API calls before a final COMMIT call is made.
BEGIN TRANSACTION 2
GET GREETING:001
PUT GREETING:001
DEL GREETING:002
PUT GREETING:003
COMMIT TRANSACTION 2
In order for these operations to be durable, the datastore needs to keep track of their effects until they are committed. Prior to 3.0, the datastore wrote entity data to both an entities table and a journal table during every PUT and DELETE operation. In the entities table, the datastore stored a transaction ID alongside every entity.
Since transactions are not always committed, the datastore needed a way to keep track of which items in the entities table were valid. This was done by storing transaction metadata in ZooKeeper. Every time a transaction was started, a corresponding entry in ZooKeeper was created to indicate that the transaction was in progress. If the transaction failed, the transaction ID would be added to a blacklist that ZooKeeper maintained.
During every GET and query operation, the datastore filtered out invalid entities by checking the transaction metadata. If an entity was part of a transaction in progress, or if it was part of a transaction that had failed, the datastore would use ZooKeeper to determine a valid transaction ID for that entity. It would then use that valid transaction ID to fetch the correct entity data from the journal table.
BEGIN TRANSACTION 3
PUT GREETING:003
ROLLBACK TRANSACTION 3
In the above example, a later GET for Greeting:003 would first fetch the entry for Greeting:003 from the entities table. The datastore would then use data stored in ZooKeeper to determine a valid transaction ID for this entity. Since the valid transaction ID for this entity (Transaction 2, from the earlier example) differs from the one stored in the entities table (Transaction 3), the datastore would use the entity data from key Greeting:003:002 in the journal table to satisfy the GET operation.
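The read path described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries and sets below are hypothetical in-memory stand-ins for the entities table, the journal table, and the ZooKeeper metadata, and the name get_entity is invented for this example rather than taken from the AppScale code.

```python
# Hypothetical in-memory stand-ins for the pre-3.0 tables and ZooKeeper state.
entities = {"Greeting:003": {"txid": 3, "data": "edit from transaction 3"}}
journal = {"Greeting:003:002": "value committed by transaction 2"}
blacklist = {3}                        # failed or rolled-back transactions
in_progress = set()                    # transactions that have not finished
last_valid_txid = {"Greeting:003": 2}  # per-entity metadata (assumed layout)


def get_entity(key):
    """Return the last committed data for key, or None if there is none."""
    row = entities.get(key)
    if row is None:
        return None
    txid = row["txid"]
    # Fast path: the entities table already holds committed data.
    if txid not in blacklist and txid not in in_progress:
        return row["data"]
    # Slow path: look up a valid transaction ID, then read the journal.
    valid_txid = last_valid_txid.get(key)
    if valid_txid is None:
        return None
    return journal.get("{}:{:03d}".format(key, valid_txid))


print(get_entity("Greeting:003"))
```

Note that every read of an entity touched by a failed or in-progress transaction pays for the metadata lookup plus a second table read.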
The most significant downside of this approach to transactions was managing indices. If a PUT or DELETE happened during a transaction, the datastore could not alter the corresponding index entries because it had no way of ensuring that they could be restored to their original values if the transaction were to fail.[1] Therefore, the datastore needed to validate every index entry before using it during queries.
In the long run, keeping these invalid index entries around made queries extremely inefficient. In particular, queries based on properties like “last modified” timestamps could result in the datastore fetching a large number of index entries before finding even a single valid one. Therefore, the datastore relied on a separate “groomer” process that regularly scanned through the entire index space to remove invalid index entries.
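To make the cost concrete, here is a minimal sketch of a query that must validate index entries as it scans them. The function run_query, the tuple layout, and the counts are all hypothetical and chosen only for illustration; they are not measurements or AppScale internals.

```python
def run_query(index_entries, is_valid, limit):
    """Scan index entries in order, keeping only valid ones.

    Returns the matching entries and the number of entries fetched.
    """
    results = []
    fetched = 0
    for entry in index_entries:
        fetched += 1
        if is_valid(entry):
            results.append(entry)
            if len(results) == limit:
                break
    return results, fetched


# Illustrative data: 1,000 stale entries sort ahead of the one live entry,
# as can happen with an index on a "last modified" timestamp.
stale = [(2000 - i, "Greeting:{:04d}".format(i), False) for i in range(1000)]
live = [(1, "Greeting:live", True)]

results, fetched = run_query(stale + live, lambda entry: entry[2], limit=1)
print(len(results), fetched)
```

In this made-up case the query fetches 1,001 index entries to return a single result, which is the kind of waste the groomer existed to clean up.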
Another downside to the old approach was the number of ZooKeeper read operations required during a typical query. Oftentimes, the datastore needed to make multiple ZooKeeper GET operations for every entity that was considered during the query.
Lastly, having a separate persistent data structure that was essential for the integrity of application data presented an additional opportunity for data loss. If the transaction blacklist in ZooKeeper were ever corrupted or lost, there would be no way to tell which entries in the entities table were valid.
Transactions in 3.0
To address these downsides, we removed the journal table and the ZooKeeper blacklist from 3.0. Instead, we keep track of uncommitted entity data in a separate transactions table. On COMMIT, this transaction data is written to the entities table and the index tables in an atomic fashion. We use Cassandra’s logged batch statement feature to achieve this atomicity.
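The new flow can be modeled with a small in-memory sketch. Everything here is an assumption made for illustration: the Datastore class, its method names, and the (value, key) index layout are hypothetical, and swapping in freshly built copies merely stands in for the all-or-nothing behavior that the real datastore gets from a Cassandra logged batch.

```python
class Datastore:
    """Toy model: pending writes live in a transactions table until commit."""

    def __init__(self):
        self.entities = {}      # key -> value
        self.index = set()      # (value, key) pairs standing in for index tables
        self.transactions = {}  # txid -> list of pending (op, key, value)

    def begin(self, txid):
        self.transactions[txid] = []

    def put(self, txid, key, value):
        self.transactions[txid].append(("put", key, value))

    def delete(self, txid, key):
        self.transactions[txid].append(("delete", key, None))

    def rollback(self, txid):
        # Nothing reached the entities or index tables, so just drop the ops.
        self.transactions.pop(txid, None)

    def commit(self, txid):
        ops = self.transactions.pop(txid)
        # Build the new state on copies, then swap both at once. This mimics
        # applying entity and index changes together in one logged batch.
        new_entities = dict(self.entities)
        new_index = set(self.index)
        for op, key, value in ops:
            if key in new_entities:
                new_index.discard((new_entities[key], key))
                del new_entities[key]
            if op == "put":
                new_entities[key] = value
                new_index.add((value, key))
        self.entities, self.index = new_entities, new_index


ds = Datastore()
ds.begin(3)
ds.put(3, "Greeting:003", "never committed")
ds.rollback(3)
print(ds.entities, ds.index)
```

Because a rolled-back transaction never touches the entities or index structures, there is nothing for reads to validate and nothing for a groomer to clean up.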
This approach solves several problems for us. By waiting until commit time to write index entries, we can count on them to accurately reflect the data in the entities table. This speeds up queries by reducing the number of index entries the datastore needs to fetch before satisfying the query. It also reduces the total size that the indices occupy. While the previous approach suffered from performance degradation over time, our benchmarks demonstrate that the new approach achieves consistent performance.
Since we eliminated the entity validation step, the datastore no longer needs to query ZooKeeper every time it fetches a result from the entities table. For operations that return a large number of entities, this previously presented a substantial overhead. This change also allowed us to remove the journal table and dramatically reduce the disk space that the datastore requires.
Finally, moving transaction metadata out of ZooKeeper allows us to keep all application data contained in Cassandra. A loss or corruption of ZooKeeper data no longer results in a loss of application data. As a convenient bonus, this change simplifies our backup and restore operations. Instead of requiring the datastore to be in read-only mode while we back up both ZooKeeper and Cassandra data, we can simply perform a Cassandra snapshot.
By keeping the datastore responsive and removing the need for a slow, CPU-intensive grooming process, we can scale deployments to larger sizes than before.
1. Ideally, these indices would be altered at commit time. However, the datastore did not have a way to update both the transaction metadata in ZooKeeper and the index entries in an atomic fashion.