Backup & Recovery on AppScale

Posted by Meni Vaitsi on 8/18/15 8:28 AM


How many backups does it take to help you sleep well at night?

Nobody likes to spend time on Backup & Recovery, because these solutions are often a nightmare to implement and a pain to maintain. If you are an AppScale user, or just curious, keep reading: AppScale offers several options for backing up and restoring your application(s) and your data.


Google App Engine and AppScale: Bulkloader

First off, AppScale exposes the bulkloader that ships with the Google App Engine SDK. The bulkloader can download and upload data on both Google App Engine and AppScale deployments. It can be slow, but tuning the bandwidth limit and the batch size goes a long way for small datasets. You can find instructions on how to use the bulkloader here.
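To make the two tweaks concrete, here is a minimal sketch that assembles a bulkloader command line. It assumes the `appcfg.py` actions `download_data` and `upload_data` and the `--url`, `--filename`, `--kind`, `--batch_size` and `--bandwidth_limit` flags as documented for the App Engine SDK; the helper name, the URL and the numbers are placeholders of mine, so check them against your SDK version and your deployment before relying on them.

```python
def build_bulkloader_command(action, remote_api_url, filename, kind=None,
                             batch_size=None, bandwidth_limit=None):
    """Build an argv list for the SDK bulkloader (hypothetical helper)."""
    if action not in ("download_data", "upload_data"):
        raise ValueError("action must be download_data or upload_data")
    cmd = ["appcfg.py", action,
           "--url=" + remote_api_url,
           "--filename=" + filename]
    if kind is not None:
        cmd.append("--kind=" + kind)
    if batch_size is not None:
        # Maximum number of entities transferred in a single request.
        cmd.append("--batch_size=%d" % batch_size)
    if bandwidth_limit is not None:
        # Maximum number of bytes per second sent over the wire.
        cmd.append("--bandwidth_limit=%d" % bandwidth_limit)
    return cmd


if __name__ == "__main__":
    # Placeholder URL and values, for illustration only.
    command = build_bulkloader_command(
        "download_data", "http://localhost:8080/_ah/remote_api",
        "backup.dat", kind="Greeting", batch_size=50, bandwidth_limit=500000)
    print(" ".join(command))
```

The returned list can be handed to `subprocess.call` once the values match your environment.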


AppScale to AppScale

When you are working exclusively with AppScale deployments, there are more suitable options available.


Protocol Buffer Level

You can use the command-line backup and restore tools that are part of AppScale’s AppDB layer. These tools were written to facilitate rapid development and testing on AppScale environments, and they work well with test datasets that developers need to move around quickly and easily. They use the AppDB interface to pull out entity protocol buffers and dump their serialized versions into local files.
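If you are wondering what "dumping serialized entities into local files" can look like, here is a small sketch of the general idea. It is not the AppDB tools' actual file format: the length-prefixed layout and the function names are assumptions for illustration, and plain byte strings stand in for serialized entity protocol buffers.

```python
import os
import struct
import tempfile


def dump_entities(path, serialized_entities):
    """Write each serialized entity as a 4-byte big-endian length plus payload."""
    with open(path, "wb") as backup_file:
        for blob in serialized_entities:
            backup_file.write(struct.pack(">I", len(blob)))
            backup_file.write(blob)


def load_entities(path):
    """Read back the blobs written by dump_entities, in order."""
    entities = []
    with open(path, "rb") as backup_file:
        while True:
            header = backup_file.read(4)
            if not header:
                break  # clean end of file
            if len(header) < 4:
                raise ValueError("truncated length header")
            (size,) = struct.unpack(">I", header)
            blob = backup_file.read(size)
            if len(blob) != size:
                raise ValueError("truncated entity payload")
            entities.append(blob)
    return entities


def roundtrip_demo():
    """Dump three fake entities to a temporary file and load them again."""
    fake_entities = [b"entity-one", b"", b"\x00\x01binary"]
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "entities.backup")
        dump_entities(path, fake_entities)
        return load_entities(path)
```

The length prefix is what lets a restore tool split the file back into individual entities without parsing their contents.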


Raw Data Level

As many of you already know, we have a more advanced Backup & Recovery solution in the works that handles application and data backups at a lower level in the system. Because Cassandra is the default backend database for AppScale, this approach lets you take full cluster backups of large datasets quickly, with backup time dominated primarily by network bandwidth. The cool part is that you can keep backups locally on the AppScale machine itself, move them to a location of your choosing, or even use Google Cloud Storage as a backend store. For the latter, you specify your own Google Cloud Storage bucket for AppScale to upload your data to.

For the tech-savvy, here is a digest of the process:

A backup/restore action is initiated by sending a request to a new AppScale-native service called Hermes (for more details on Hermes, click here), which runs on the head node of the AppScale deployment. In the case of a backup, the head node backs up the source code and triggers Cassandra and ZooKeeper backups on the corresponding nodes.

Each node compresses the backup files and can either store them locally:

Backup to local file system

or upload them to one of the supported backend stores. The current implementation offers the ability to upload your backups to Google Cloud Storage!

Backup to Google Cloud Storage
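For a feel of the per-node step, here is a minimal sketch of compressing a backup directory into a single archive and keeping it on the local filesystem. It uses only Python's standard `tarfile` module; the function names, directory layout and file names are assumptions of mine and not what Hermes does internally. Uploading the resulting archive to a Google Cloud Storage bucket would be a separate step that is not shown here.

```python
import os
import tarfile
import tempfile


def compress_backup(source_dir, dest_dir, backup_name):
    """Pack source_dir into dest_dir/<backup_name>.tar.gz and return its path."""
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, backup_name + ".tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        # Store files under a top-level folder named after the backup.
        archive.add(source_dir, arcname=backup_name)
    return archive_path


def list_backup(archive_path):
    """Return the sorted names of the regular files inside the archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def compress_demo():
    """Create a fake snapshot directory, compress it, and list the archive."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        source_dir = os.path.join(tmp_dir, "snapshot")
        os.makedirs(source_dir)
        with open(os.path.join(source_dir, "data.db"), "wb") as data_file:
            data_file.write(b"placeholder snapshot bytes")
        archive_path = compress_backup(
            source_dir, os.path.join(tmp_dir, "backups"), "demo_backup")
        return os.path.basename(archive_path), list_backup(archive_path)
```

Keeping the destination directory outside the source directory avoids an archive that tries to include itself.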

In the near future, we will also be releasing this feature as part of the AppScale Management Portal, which will allow users to perform backups with a single click and restore those backups from within the AppScale Portal.

Do you have an interesting story to share or a question to ask? Send me an email or join our IRC channel, #appscale on freenode.

