Pull Queues now in AppScale and anywhere
Oh, the TaskQueue API… You know what it is: you've most likely used it at some point, and perhaps you've based your entire system on it. Now you can take it to a whole new level with AppScale.
The App Engine TaskQueue API is an amazing tool for executing work asynchronously, in the form of tasks, at a later time. It is a valuable asset for reducing application latency, improving the user experience, and much more. If an app needs to process workloads in the background, it can add tasks to a task queue.
There are two types of task queues in the App Engine ecosystem: push and pull. Push Queues are the simpler of the two: you push a task into the queue, and an application worker processes it at the desired rate (specified in your queue configuration file). With Push Queues there is very little to worry about beyond the processing rate and the maximum concurrency. Choosing these values carefully can prevent bursts of load and save you quite a bit of money on Google App Engine (GAE). In AppScale, on the other hand, you don't need to be as strict with those rates because, unlike on GAE, you pay for virtual machine uptime rather than per frontend instance hour. As long as your virtual machines can handle the load with multiple frontend instances, tasks can be processed as fast as they arrive.
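App Engine's queue configuration describes push-queue throttling in terms of a token bucket: `rate` controls how fast tokens refill, `bucket_size` caps how many tasks can go out in a single burst, and `max_concurrent_requests` separately caps how many tasks are in flight. The minimal Python sketch below illustrates only the token-bucket idea behind rate and bursts. The `TokenBucket` class and its injectable clock are hypothetical; this is not how GAE or AppScale's RabbitMQ/Celery backend actually schedules tasks.

```python
import time


class TokenBucket:
    """Hypothetical token bucket: refills `rate` tokens/sec, holds at most `bucket_size`."""

    def __init__(self, rate, bucket_size, clock=time.monotonic):
        self.rate = rate                  # tokens added per second
        self.bucket_size = bucket_size    # maximum burst size
        self.clock = clock                # injectable so the demo is deterministic
        self.tokens = float(bucket_size)  # start full, allowing an initial burst
        self.last = clock()

    def try_dispatch(self):
        """Consume one token and return True if a task may be dispatched now."""
        now = self.clock()
        self.tokens = min(self.bucket_size, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    now = [0.0]  # fake clock, in seconds
    bucket = TokenBucket(rate=5, bucket_size=10, clock=lambda: now[0])
    print(sum(bucket.try_dispatch() for _ in range(100)))  # 10: the initial burst
    now[0] = 1.0
    print(sum(bucket.try_dispatch() for _ in range(100)))  # 5: one second of refill
```

With a full bucket, a burst of 10 tasks goes out immediately. After that, dispatch settles at the refill rate of 5 tasks per second, which is how a small bucket smooths out load spikes.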
AppScale implements the Push Queues API on top of RabbitMQ and Celery, and many happy developers have relied on it over the past few years.
Now imagine a scenario where your app needs more control and flexibility over when and where a task executes. Pull Queues let developers build their own customized task-processing system. They do not dispatch tasks automatically; instead, application code is responsible for both adding tasks and leasing them. However, with great queue flexibility comes great queue management. When a worker leases a task, the lease comes with a deadline. The worker must delete the task, or extend its lease, before that deadline; otherwise the task becomes available again and another worker can lease it.
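To make the lease deadline concrete, here is a minimal, single-process Python model of the lease, extend, and delete cycle. `InMemoryPullQueue` and its method names are hypothetical and are not the App Engine `taskqueue` API. The model ignores tags, ETAs, and concurrency, and it assumes that a lease can only be extended while it is still valid.

```python
import time


class InMemoryPullQueue:
    """Hypothetical single-process model of pull-queue lease semantics."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.tasks = {}  # task name -> {"payload": ..., "lease_until": float}

    def add(self, name, payload):
        # A new task has never been leased, so it is immediately available.
        self.tasks[name] = {"payload": payload, "lease_until": float("-inf")}

    def lease(self, num_tasks, lease_seconds):
        """Lease up to num_tasks tasks that are not currently leased."""
        now = self.clock()
        leased = []
        for name, task in self.tasks.items():
            if len(leased) == num_tasks:
                break
            if task["lease_until"] <= now:  # never leased, or lease expired
                task["lease_until"] = now + lease_seconds
                leased.append(name)
        return leased

    def extend_lease(self, name, lease_seconds):
        """Extend a still-valid lease; return False if the lease already lapsed."""
        now = self.clock()
        task = self.tasks.get(name)
        if task is None or task["lease_until"] <= now:
            return False
        task["lease_until"] = now + lease_seconds
        return True

    def delete(self, name):
        # Deleting is how a worker reports that the task is done.
        self.tasks.pop(name, None)


if __name__ == "__main__":
    now = [0.0]  # fake clock so the demo is deterministic
    q = InMemoryPullQueue(clock=lambda: now[0])
    q.add("resize-42", b"...")
    print(q.lease(num_tasks=10, lease_seconds=30))  # ['resize-42'] (worker A)
    print(q.lease(num_tasks=10, lease_seconds=30))  # [] (still leased to A)
    now[0] = 31.0  # worker A missed its deadline without deleting the task
    print(q.lease(num_tasks=10, lease_seconds=30))  # ['resize-42'] (worker B)
```

The last call shows the failure mode described above: because worker A neither deleted the task nor extended its lease in time, worker B receives the same task and may process it a second time.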
Implementation in AppScale
The Pull Queues API comes with neat features, such as the ability to delay a task before it becomes available for processing, tags that identify one or more tasks so you can easily group them together, and even the ability to lease and delete tasks by name. All of that looked much more like a database use case than a queuing-system use case to us, so we implemented Pull Queues on top of Cassandra. It can hold millions of pull tasks, index them by ETA (the earliest time a task can be leased), and filter them by tag.
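One way to see why an ETA index suits these features: a delayed task is simply a task with a later ETA, and if leasing pushes a task's ETA forward to its lease deadline, a single ETA-ordered index hides both tasks that are not yet due and tasks that are currently leased. The Python sketch below uses a sorted list (via `bisect`) as a stand-in for an ETA-ordered Cassandra table. The class, its methods, and the move-ETA-on-lease convention are assumptions for illustration, not AppScale's actual schema or code.

```python
import bisect


class EtaIndexedQueue:
    """Hypothetical sketch: tasks kept in a list sorted by (eta, name),
    standing in for an ETA-ordered table. Not AppScale's Cassandra schema."""

    def __init__(self):
        self.index = []  # sorted list of (eta, name) tuples
        self.tasks = {}  # name -> {"eta": number, "tag": str or None, "payload": bytes}

    def add(self, name, payload, now, delay=0.0, tag=None):
        if name in self.tasks:
            raise ValueError("task name already exists: " + name)
        eta = now + delay  # a delayed task is just one with a later ETA
        self.tasks[name] = {"eta": eta, "tag": tag, "payload": payload}
        bisect.insort(self.index, (eta, name))

    def _move(self, name, new_eta):
        task = self.tasks[name]
        self.index.remove((task["eta"], name))
        task["eta"] = new_eta
        bisect.insort(self.index, (new_eta, name))

    def lease_by_tag(self, tag, num_tasks, lease_seconds, now):
        """Scan due tasks in ETA order and lease those carrying `tag`."""
        leased = []
        for eta, name in self.index:
            if eta > now or len(leased) == num_tasks:
                break  # everything after this point is not yet due
            if self.tasks[name]["tag"] == tag:
                leased.append(name)
        # Assumed convention: leasing moves the ETA to the lease deadline,
        # so the same index hides the task until its lease expires.
        for name in leased:
            self._move(name, now + lease_seconds)
        return leased

    def delete(self, name):
        task = self.tasks.pop(name)
        self.index.remove((task["eta"], name))


if __name__ == "__main__":
    q = EtaIndexedQueue()
    q.add("a", b"1", now=0, tag="thumbnails")
    q.add("b", b"2", now=0, tag="emails")
    q.add("c", b"3", now=0, delay=60, tag="thumbnails")
    print(q.lease_by_tag("thumbnails", 10, 30, now=10))  # ['a'] ('c' is due at t=60)
    print(q.lease_by_tag("thumbnails", 10, 30, now=61))  # ['a', 'c'] ('a' lease expired at t=40)
```

Because the scan stops at the first entry whose ETA is in the future, a lease request touches only due tasks, however many delayed or leased tasks sit further along the index.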
The convenience of the TaskQueue API doesn't end there, though. The Pull Queues API is also exposed through REST endpoints that let external clients interact with the service. As a result, you can produce and consume tasks from anywhere, provided that a backend application is running on GAE, AppScale, or both! For example, your main app can produce tasks in an App Engine environment (GAE or AppScale) while one or more clients consume a particular type of task, or all of them, from Google Compute Engine, Microsoft Azure, AWS EC2, and so on. Your options are virtually unlimited.
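As a sketch of what an external consumer might send, the Python functions below only build `urllib.request.Request` objects; they do not send anything. The paths and fields (`tasks/lease`, `numTasks`, `leaseSecs`, `groupByTag`, `tag`, `queueName`, `payloadBase64`) are modeled on Google's TaskQueue REST API v1beta2. Whether AppScale's endpoint matches them exactly, as well as the base URL, the project ID format, the base64 variant, and bearer-token authentication, are all assumptions to check against your own deployment.

```python
import base64
import json
import urllib.parse
import urllib.request


def _tasks_url(base_url, project, queue, *parts):
    # Path layout modeled on TaskQueue REST v1beta2 (assumed to apply to AppScale).
    quoted = [urllib.parse.quote(p, safe="") for p in (project, queue)]
    path = "/projects/{}/taskqueues/{}/tasks".format(*quoted)
    for part in parts:
        path += "/" + urllib.parse.quote(part, safe="")
    return base_url.rstrip("/") + path


def _request(url, method, token=None, body=None):
    headers = {}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        headers["Authorization"] = "Bearer " + token  # assumed auth scheme
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def insert_task(base_url, project, queue, payload, token=None):
    """Producer side: add a task whose payload is raw bytes."""
    body = {"queueName": queue,
            # Standard (not URL-safe) base64 is an assumption here.
            "payloadBase64": base64.b64encode(payload).decode("ascii")}
    return _request(_tasks_url(base_url, project, queue), "POST", token, body)


def lease_tasks(base_url, project, queue, num_tasks, lease_secs, tag=None, token=None):
    """Consumer side: lease up to num_tasks tasks, optionally only those with `tag`."""
    params = {"numTasks": num_tasks, "leaseSecs": lease_secs}
    if tag is not None:
        params.update(groupByTag="true", tag=tag)
    url = _tasks_url(base_url, project, queue, "lease") + "?" + urllib.parse.urlencode(params)
    return _request(url, "POST", token)


def delete_task(base_url, project, queue, task_id, token=None):
    """Consumer side: delete a finished task before its lease expires."""
    return _request(_tasks_url(base_url, project, queue, task_id), "DELETE", token)


if __name__ == "__main__":
    BASE = "https://taskqueue.example.com/taskqueue/v1beta2"  # hypothetical host
    req = lease_tasks(BASE, "my-app", "pull-queue", num_tasks=10, lease_secs=60, tag="thumbnails")
    print(req.get_method(), req.full_url)
```

Passing a built request to `urllib.request.urlopen` would send it. A consumer running on Compute Engine, Azure, or EC2 would then decode each leased task's `payloadBase64`, process it, and issue the matching `delete_task` request before the lease runs out.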
Pull Queues are available now in AppScale 3.1 with full Python and REST support; Java support will follow as a general Beta in AppScale 3.2.