Serverless functions are great for small tasks
Cloud computing with serverless functions has gained widespread popularity. Their appeal for implementing new features lies in the simplicity of serverless computing. You can use a serverless function to analyze an incoming photo or process an event from an IoT device. It’s fast, simple, and scalable. You don’t need to allocate and maintain compute resources; you just deploy application code. Major cloud vendors, including AWS, Microsoft, and Google, all offer serverless functions.
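For example, a minimal AWS Lambda handler in Python needs nothing more than a function that accepts the triggering event; the sketch below assumes an S3 "object created" event for an uploaded photo (the bucket and key fields follow the standard S3 event shape, and the analysis step is a placeholder):

```python
import json

def lambda_handler(event, context):
    # AWS invokes this entry point with the triggering event; no server
    # provisioning is needed. Here we assume an S3 "object created" event
    # carrying the bucket and key of an uploaded photo.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    print(f"Analyzing photo s3://{bucket}/{key}")  # placeholder for real analysis
    return {"statusCode": 200, "body": json.dumps({"processed": key})}
```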
For simple or ad hoc applications, serverless functions make a lot of sense. But are they suitable for complex workflows that read and update persistent, mission-critical data sets? Imagine an airline that operates thousands of flights every day. A scalable NoSQL data store (such as Amazon DynamoDB or Azure Cosmos DB) can hold data describing flights, passengers, baggage, gate assignments, pilot scheduling, and more. While serverless functions can access these data stores to process events such as flight cancellations and passenger rebookings, are they the best way to implement the large volumes of event processing that airlines rely on?
Problems and limitations
The very power of serverless functions, namely that they are serverless, creates a built-in limitation. By their nature, they incur overhead to allocate computing resources each time they are invoked. They are also stateless and must retrieve data from external data stores, which slows them down further. They cannot take advantage of local caching to avoid data movement; data must always flow over the cloud network to the servers where the serverless functions run.
When building large systems, serverless functions also don’t offer a clear software architecture for implementing complex workflows. Developers must enforce a clean “separation of concerns” in the code that each function runs. When creating many serverless functions, it’s easy to fall into the trap of duplicating functionality and evolving a complex, unmanageable code base. Serverless functions can also generate runtime exceptions, such as timeouts and quota limits, that must be handled by application logic, as the sketch below illustrates.
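For instance, application code calling DynamoDB from a Lambda function typically has to catch and retry throttling errors itself. A minimal sketch, assuming boto3 and a hypothetical "Flights" table with a "flight_id" key:

```python
import time
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Flights")  # hypothetical table name

def update_with_retry(flight_id, status, max_attempts=5):
    """Retry DynamoDB updates that fail due to throttling (quota limits)."""
    for attempt in range(max_attempts):
        try:
            table.update_item(
                Key={"flight_id": flight_id},
                UpdateExpression="SET #s = :s",
                ExpressionAttributeNames={"#s": "status"},  # 'status' is a reserved word
                ExpressionAttributeValues={":s": status},
            )
            return
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code in ("ProvisionedThroughputExceededException", "ThrottlingException"):
                time.sleep(0.1 * 2 ** attempt)  # exponential backoff before retrying
            else:
                raise
    raise RuntimeError(f"update for {flight_id} failed after {max_attempts} attempts")
```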
Alternative: Move code to data
We can sidestep the limitations of serverless functions by doing the opposite: moving the code to the data. Consider using scalable in-memory computing to run the code that serverless functions would otherwise implement. An in-memory computing platform stores objects in primary memory distributed across a cluster of servers and invokes methods on those objects when it receives messages. It can also retrieve data from, and write changes back to, data stores such as NoSQL stores.
Instead of defining a serverless function that operates on remotely stored data, we can simply send a message to an object held on the in-memory computing platform to perform the same work. This approach speeds up processing by eliminating repeated round trips to the data store and reducing the amount of data that must flow over the network. Because in-memory computing is highly scalable, it can handle very large workloads involving huge numbers of objects. Highly available message handling also frees application code from dealing with environment-related exceptions.
In-memory computing offers key advantages for structuring the code that defines complex workflows by combining the strengths of data-structure stores, such as Redis, with those of actor models. Unlike serverless functions, an in-memory data grid can restrict the processing of each object to the methods defined by its data type. This helps developers avoid deploying duplicate code across multiple serverless functions. It also removes the need to implement object locking, which can be problematic for persistent data stores.
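The following toy sketch (generic actor-style Python, not the ScaleOut API) shows the idea: the grid holds objects in memory and routes each message to a method defined by the target object's data type, so no other code path can mutate an object's state:

```python
class InMemoryGrid:
    """Toy grid: holds objects in memory and delivers messages to them.
    A real platform distributes objects across a server cluster and
    processes each object's messages serially, which is why application
    code needs no locks."""

    def __init__(self):
        self._objects = {}

    def add(self, key, obj):
        self._objects[key] = obj

    def send(self, key, message: dict):
        obj = self._objects[key]
        # Processing is restricted to methods defined by the object's
        # data type; the message's type selects which handler runs.
        handler = getattr(obj, "handle_" + message["type"])
        handler(self, message)
```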
Comparison example
To measure the performance differences between serverless functions and in-memory computing, we compared a simple workflow implemented with AWS Lambda functions to the same workflow built with ScaleOut Digital Twins, a scalable in-memory computing architecture. The workflow represents the event processing an airline might use to cancel a flight and rebook all of its passengers on other flights. It used two data types, flight and passenger objects, and stored all instances in DynamoDB. An event controller triggered the cancellation of a group of flights and measured the time required to complete all rebookings.
In the serverless implementation, the event controller invoked a Lambda function to cancel each flight, which in turn triggered a “passenger lambda” for every affected passenger. Each passenger lambda rebooked its passenger by selecting a different flight and updating the passenger’s information. It then invoked serverless functions that confirmed the passenger’s removal from the original flight and added the passenger to the new flight. These functions required locking to synchronize access to the DynamoDB objects.
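A hedged sketch of how such a passenger lambda might synchronize access to shared flight records, using a DynamoDB conditional write as an optimistic lock (the table names, attributes, and version scheme are assumptions for illustration, not the benchmark's actual code):

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
flights = dynamodb.Table("Flights")        # hypothetical table names
passengers = dynamodb.Table("Passengers")

def rebook_passenger(passenger_id: str, old_flight: str, new_flight: str) -> bool:
    """Move one passenger to a new flight, guarding the seat update with a
    version check so concurrent lambdas don't clobber each other."""
    flight = flights.get_item(Key={"flight_id": new_flight})["Item"]
    try:
        # Optimistic lock: the write succeeds only if no other lambda has
        # changed the flight record since we read it, and a seat remains.
        flights.update_item(
            Key={"flight_id": new_flight},
            UpdateExpression="SET seats_taken = seats_taken + :one, #ver = #ver + :one",
            ConditionExpression="#ver = :v AND seats_taken < :cap",
            ExpressionAttributeNames={"#ver": "version"},
            ExpressionAttributeValues={
                ":one": 1,
                ":v": flight["version"],
                ":cap": flight["capacity"],
            },
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # lost the race; caller must re-read and retry
        raise
    passengers.update_item(
        Key={"passenger_id": passenger_id},
        UpdateExpression="SET flight_id = :f",
        ExpressionAttributeValues={":f": new_flight},
    )
    return True
```

Every caller must anticipate losing the race and retry, which is exactly the kind of synchronization logic the in-memory approach avoids.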
The digital twin implementation dynamically created in-memory objects for all flights and passengers as those objects were first accessed from DynamoDB. Flight objects received cancellation messages from the event controller and sent messages to the digital twin objects for their passengers. Each passenger’s digital twin rebooked itself by selecting a different flight and sending messages to both the old and new flights. The application code didn’t need any locking, and the in-memory platform automatically persisted updates back to DynamoDB.
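Continuing the toy grid sketch above (the class, method, and message names are assumptions, not the ScaleOut Digital Twins SDK), the cancellation workflow becomes an exchange of messages between flight and passenger twins, with no locks in sight:

```python
class FlightTwin:
    """In-memory twin of one flight; only its own methods touch its state."""

    def __init__(self, flight_id, passenger_ids):
        self.flight_id = flight_id
        self.passenger_ids = list(passenger_ids)
        self.canceled = False

    def handle_cancel(self, grid, message):
        # Fan out one message per passenger; each passenger twin processes
        # it against its own in-memory state.
        self.canceled = True
        for pid in list(self.passenger_ids):
            grid.send(pid, {"type": "rebook", "old": self.flight_id,
                            "alternates": message["alternates"]})

    def handle_remove_passenger(self, grid, message):
        self.passenger_ids.remove(message["passenger_id"])

    def handle_add_passenger(self, grid, message):
        self.passenger_ids.append(message["passenger_id"])


class PassengerTwin:
    """In-memory twin of one passenger."""

    def __init__(self, passenger_id, flight_id):
        self.passenger_id = passenger_id
        self.flight_id = flight_id

    def handle_rebook(self, grid, message):
        # No locking needed: only this twin's methods touch its state, and
        # a real platform would persist the change back to DynamoDB.
        old = message["old"]
        new = message["alternates"][0]  # real code would check seat availability
        self.flight_id = new
        grid.send(old, {"type": "remove_passenger", "passenger_id": self.passenger_id})
        grid.send(new, {"type": "add_passenger", "passenger_id": self.passenger_id})


# Example wiring (illustrative): register the twins, then cancel a flight.
grid = InMemoryGrid()
grid.add("AA100", FlightTwin("AA100", ["p1", "p2"]))
grid.add("AA200", FlightTwin("AA200", []))
grid.add("p1", PassengerTwin("p1", "AA100"))
grid.add("p2", PassengerTwin("p2", "AA100"))
grid.send("AA100", {"type": "cancel", "alternates": ["AA200"]})
```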
Performance measurements showed that the digital twins processed 25 flight cancellations, with 100 passengers per flight, more than 11x faster than the serverless functions. We couldn’t scale the serverless implementation to meet the target workload of canceling 250 flights with 250 passengers each, but ScaleOut Digital Twins had no trouble handling twice that target workload, at 500 flights.
Summary
While serverless functions are well suited to small and ad hoc applications, they may not be the best choice when building complex workflows that must manage many data objects and scale to handle heavy workloads. Moving the code to the data with in-memory computing may be a better option. It boosts performance by minimizing data movement, delivers high scalability, and simplifies application design by taking advantage of structured access to data objects.
To learn more about ScaleOut Digital Twins and test this approach to managing data objects in complex workflows, visit: https://www.scaleoutdigitaltwins.com/landing/scaleout-data-twins.