APRIL 13, 2020
Nowadays, market players in even the most regulated industries, like banking, are becoming more and more interconnected. Card processing, account management, online investment platforms, budgeting apps, “traditional” and challenger banks: all these solutions are often closely tied. Open API initiatives like Open Banking push this process forward, requiring financial solutions to open their most commonly used features to everyone via third-party authentication providers.
Nuances and Common Problems Within the Typical Banking Software Development Process
Today we want to talk about the most complex platforms: those that underlie processing and keep the information about accounts, balances, transactions, and so on. In other words, the core financial processor. It’s crucial to keep its data consistent, not only because of strict regulations and the risk of license revocation for the banking solution itself, but also because inconsistency can lead to money loss and data corruption in the linked solutions, all while ruining your reputation. Nobody wants their money to be lost in the mists of transaction processing, right?
Consequently, one of the main problems in core fintech software development is to keep data consistent even in the face of distributed transactions and seamless scaling. Moreover, app performance should be held at a decent level (yeah, the CAP theorem again), and the solution itself should be reliable for both business-to-consumer (B2C) and business-to-business (B2B) clients and partners.
In a bid to keep trade-offs acceptable for business, developers combine technologies and concepts all the way: dividing large solutions into smaller ones, using message buses, gRPC, actor model, event sourcing, dedicated (often NoSQL) storages with data projections, and more.
However, what if there is an easier way to address all the bottlenecks mentioned above? Working closely with both fintech startups and reputable banks and building custom solutions of diverse complexity, we’ve tried and tested a myriad of technologies, tools, and approaches. In this article, we’d like to share our experience with, and opinion of, one such novelty in the tech world. Among other significant Microsoft software solutions, the Microsoft Orleans framework with its virtual actors may come in handy.
Introduction to the Actor Model and What All the Fuss Is About
First, let’s start with a very brief overview of the concept and avoid digging deeper into technical nuances. As Wikipedia states:
“The actor model in information technology is a mathematical pattern of parallel computations that treats ‘actor’ as an elementary item of concurrent computing. In answer to a message it gets, an actor can: make targeted decisions, send more messages, generate more actors, and decide how to reply to the succeeding message. Additionally, actors may tweak their own state, but can only affect each other implicitly via messaging (evading lock-based synchronization).”
Simply put, actor-based frameworks allow actors to communicate with each other directly in a concurrent environment, leveraging a distributed lock and providing single access points for domain objects.
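To make the idea more tangible, here is a minimal sketch of an actor written in plain Python with asyncio. It is not tied to any actor framework; the names `CounterActor`, `send`, and `run` are made up for illustration. The point is only that the state is private and messages are handled one at a time through a mailbox.

```python
import asyncio


class CounterActor:
    """A toy actor: private state, a mailbox, one message handled at a time."""

    def __init__(self) -> None:
        self._count = 0  # state is never touched from the outside
        self._mailbox: asyncio.Queue = asyncio.Queue()

    async def send(self, message: str) -> int:
        """Put a message into the mailbox and wait for the actor's reply."""
        reply: asyncio.Future = asyncio.get_running_loop().create_future()
        await self._mailbox.put((message, reply))
        return await reply

    async def run(self) -> None:
        """Process messages strictly one by one, so no explicit locks are needed."""
        while True:
            message, reply = await self._mailbox.get()
            if message == "increment":
                self._count += 1
            reply.set_result(self._count)  # any other message just reads the count


async def demo() -> int:
    actor = CounterActor()
    worker = asyncio.create_task(actor.run())
    # 100 concurrent senders; the mailbox serializes their messages.
    await asyncio.gather(*(actor.send("increment") for _ in range(100)))
    total = await actor.send("get")
    worker.cancel()
    return total
```

Running `asyncio.run(demo())` shows that no increment is lost even though all senders run concurrently.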
Although the actor model history dates back to 1973, and the Erlang programming language emerged in 1986, the authors of the language learned about actors only at a later date. Still, this language, along with a set of specific libraries for distributed processing, is reasonably seen as the first widely used actor-based software solution since it operates granular entities that don’t share state and only exchange immutable messages. The similarity is dictated by the issues that both the actor model and Erlang were trying to resolve.
Thus far, the best-known actor framework is Akka, first released in 2009. Although it was initially written for the JVM, there’s also a .NET version (Akka.NET). Among all Microsoft software development solutions, the Orleans actor framework is quickly catching up; it was originally presented in 2010, with Phil Bernstein (who also co-designed the Azure SQL DB engine) as one of the authors. BioWare, a division of Electronic Arts, was so inspired by the idea of virtual actors that they implemented their own framework for Java, namely Orbit.
A Detailed Overview of Microsoft Orleans and Why It’s a Good Pick for Financial Services App Development
So why Orleans? The main distinction from Akka (the most recognized actor framework at the moment) is that Grains (the Orleans term for actors) are virtual. Consequently, software engineers don’t have to manage the lifecycle of any Grain manually; the framework does it for them. Moreover, getting a Grain reference by ID doesn’t even mean that the particular Grain is instantiated anywhere! Sounds magical, right? The official docs illustrate this with a Grain lifecycle diagram.
In simple terms, the lifecycle of any Grain is as follows:
- Activation happens on a call to any Grain interface method;
- Being active in memory, stateful Grains can modify their state;
- Deactivation occurs when a particular Grain hasn’t been called for a while (to free up resources) or when the Silo hosting it is stopped (a Silo is a server in the Orleans cluster);
- Finally, if a particular Grain has a state, it persists.
This way, we can use Grains as if they’re always active — Orleans does the rest for us!
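The lifecycle above can be imitated in a few lines of Python. This is a toy model, not how Orleans is implemented: `ToySilo`, `AccountGrain`, and the dictionary standing in for a database are all hypothetical, and the sketch assumes the simplest policy of saving the state after every change.

```python
import time
from typing import Dict


class AccountGrain:
    """Hypothetical stateful grain holding a balance."""

    def __init__(self, grain_id: str, balance: int) -> None:
        self.grain_id = grain_id
        self.balance = balance
        self.last_used = time.monotonic()


class ToySilo:
    """Activates grains on first call and evicts the ones that stay idle."""

    def __init__(self, idle_seconds: float) -> None:
        self.idle_seconds = idle_seconds
        self.storage: Dict[str, int] = {}          # stands in for a database
        self.active: Dict[str, AccountGrain] = {}  # grains currently in memory

    def call_deposit(self, grain_id: str, amount: int) -> int:
        grain = self.active.get(grain_id)
        if grain is None:
            # Activation: load the persisted state, or start from zero.
            grain = AccountGrain(grain_id, self.storage.get(grain_id, 0))
            self.active[grain_id] = grain
        grain.balance += amount
        grain.last_used = time.monotonic()
        self.storage[grain_id] = grain.balance     # save after every change
        return grain.balance

    def collect_idle(self, now: float) -> None:
        """Deactivation: drop grains that have been idle for too long."""
        for grain_id, grain in list(self.active.items()):
            if now - grain.last_used >= self.idle_seconds:
                del self.active[grain_id]
```

A caller only ever uses `call_deposit`; whether the grain was already in memory or had to be re-created from storage is invisible to it.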
MS Orleans provides a stable Grain identity in the form of a string, a Globally Unique Identifier (GUID), a long, or a composite key. This is enough for now; we’ll come back to this topic later.
Dashdevs Microsoft Orleans Tutorial: Features overview
Transparent business layer. Distributed messaging. Grain placement
One of the most prominent features of actors is distributed messaging. Simply said, it looks like this:
- We get Grain by ID, having a reference to the proxy interface as a result. At this stage, the actual Grain object may not even be instantiated anywhere;
- We call a Grain reference interface method;
- From a developer’s perspective, it’s just an async method call, but under the hood, a serialized message associated with the Grain interface method is sent over the network (Orleans uses its own TCP-based messaging here rather than HTTP);
- If the Grain is already activated, the call is routed to that activation; otherwise, the Grain is instantiated on a Silo that supports it;
- The activated Grain processes the deserialized message and sends the serialized result back in a response message;
- Grain client deserializes the message, and you get the result as if an ordinary local in-memory object did this.
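The steps above can be sketched in Python as follows. Everything here is an illustrative assumption: `GrainProxy` and `ToySilo` are invented names, JSON stands in for the real serializer, and a direct `await` stands in for the network hop.

```python
import asyncio
import json
from typing import Any, Dict, Tuple


class AccountGrain:
    """Hypothetical grain implementation living on the server side."""

    def __init__(self) -> None:
        self.balance = 0

    async def deposit(self, amount: int) -> int:
        self.balance += amount
        return self.balance


class ToySilo:
    """Receives serialized messages and routes them to activations."""

    def __init__(self) -> None:
        self.activations: Dict[Tuple[str, str], AccountGrain] = {}

    async def handle(self, raw_request: str) -> str:
        request = json.loads(raw_request)            # deserialize the message
        key = (request["type"], request["id"])
        grain = self.activations.get(key)
        if grain is None:                            # activate on first call
            grain = self.activations[key] = AccountGrain()
        result = await getattr(grain, request["method"])(*request["args"])
        return json.dumps({"result": result})        # serialize the reply


class GrainProxy:
    """What the caller holds: a reference, not the grain itself."""

    def __init__(self, silo: ToySilo, grain_type: str, grain_id: str) -> None:
        self._silo, self._type, self._id = silo, grain_type, grain_id

    def __getattr__(self, method: str) -> Any:
        async def call(*args: Any) -> Any:
            raw = json.dumps({"type": self._type, "id": self._id,
                              "method": method, "args": list(args)})
            reply = await self._silo.handle(raw)     # the network hop in real life
            return json.loads(reply)["result"]
        return call
```

Creating `GrainProxy(silo, "account", "acc-1")` activates nothing; the grain appears in `silo.activations` only after the first `await account.deposit(...)`.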
Since you can refer to actors from anywhere across the cluster, any actor is fully transparent throughout the system. There is no need to implement an intermediate communication layer (e.g., a REST API) unless business requirements call for it. As a result, fintech app development gets simpler, and time and cost go down. This matters even more if we consider the service layer that controllers usually need, which at the very least becomes much more lightweight.
By default, Grains are placed randomly on any Silo that supports their implementation. This behavior can be configured on a per-Grain basis with additional standard options: prefer-local placement (the Grain is activated on the caller’s Silo when possible) or load-based placement (the least loaded Silo is used for activation). Finally, one can define a custom strategy and choose a Silo following a certain algorithm.
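As a rough illustration of what these strategies decide, here are three plain Python functions. They are not the Orleans placement API; the function names and the use of activation counts as the load metric are assumptions made for the example.

```python
import random
from typing import Dict, List


def place_random(silos: List[str], rng: random.Random) -> str:
    """Random placement: any compatible silo."""
    return rng.choice(silos)


def place_prefer_local(silos: List[str], caller_silo: str, rng: random.Random) -> str:
    """Prefer the silo the call came from; fall back to a random one."""
    return caller_silo if caller_silo in silos else place_random(silos, rng)


def place_least_loaded(activation_counts: Dict[str, int]) -> str:
    """Load-based placement: the silo with the fewest active grains."""
    return min(activation_counts, key=lambda silo: activation_counts[silo])
```

A custom strategy would simply be another function of this shape with its own selection rule.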
For instance, instead of calling the communication layer, which then calls the service layer (even for simple, close-to-atomic actions like a local change to an account) and finally changes the storage, you just call a method on the actor that represents the account and leave the message transfer to the framework. The message will be delivered wherever the Grain is located, so you don’t even have to think about it.
Distributed lock. Reentrancy
Another powerful feature of actors is a distributed lock. In Orleans, Grain methods are not reentrant by default. As a result, if a Grain is already processing a call, each subsequent call waits for the previous one to finish. Moreover, the framework lets you configure reentrancy manually with attributes, e.g., so that read-only/query methods can interleave while state-changing methods cannot. This is the cornerstone of keeping the data consistent. Taking the .NET ecosystem as an example, traditional REST API calls backed by the Entity Framework Core ORM (the most common choice) will at least force you to deal with optimistic concurrency, or even to use a third-party distributed lock (like RedLock) for the most critical parts.
Getting this feature right out of the box applied to the domain actors directly is very convenient since it helps to avoid problems in various use-cases: transaction duplication, concurrent account balance modification issues, and others.
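The concurrent balance modification problem is easy to reproduce. The Python sketch below is only an analogy: an `asyncio.Lock` plays the role of the one-call-at-a-time guarantee, and the class names are hypothetical.

```python
import asyncio
from typing import Tuple


class UnsafeAccount:
    """Interleaved calls can overwrite each other's result (lost update)."""

    def __init__(self) -> None:
        self.balance = 0

    async def deposit(self, amount: int) -> None:
        current = self.balance
        await asyncio.sleep(0)           # simulates I/O between read and write
        self.balance = current + amount


class TurnBasedAccount(UnsafeAccount):
    """One call at a time, similar in spirit to a non-reentrant grain."""

    def __init__(self) -> None:
        super().__init__()
        self._turn = asyncio.Lock()

    async def deposit(self, amount: int) -> None:
        async with self._turn:           # later callers wait for their turn
            await super().deposit(amount)


async def run_deposits(account: UnsafeAccount, calls: int) -> int:
    await asyncio.gather(*(account.deposit(1) for _ in range(calls)))
    return account.balance


async def demo() -> Tuple[int, int]:
    unsafe = await run_deposits(UnsafeAccount(), 50)
    safe = await run_deposits(TurnBasedAccount(), 50)
    return unsafe, safe
```

With 50 concurrent deposits of 1, the unsafe account ends up with less than 50, while the turn-based one ends up with exactly 50.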
Managed lifecycle. Persistence
Managed lifecycle is extremely useful, and was mostly covered in the lifecycle overview above. Persistence mechanisms are entwined with it. By default, reads and writes of the special Grain state object don’t call the underlying storage: the state is loaded when the Grain is activated, and everything after that happens in memory. This keeps performance at the top level. Still, changes made only in memory are lost if a Silo fails before they reach the storage, so if you’re concerned about data loss, call the WriteStateAsync method to force a write to the underlying storage.
Considering our primary focus on banking app development and taking into account industry-specific needs, this was obviously our choice, though we use framework-based persistence only in a minimal scope for the reasons described below.
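The window for data loss is easy to see in a small Python model. `ToyPersistentGrain` and its `write_state` method are hypothetical stand-ins (the latter for an explicit save such as WriteStateAsync), and a dictionary plays the role of the storage.

```python
from typing import Dict


class ToyPersistentGrain:
    """State lives in memory; storage is touched only on load and explicit save."""

    def __init__(self, grain_id: str, storage: Dict[str, int]) -> None:
        self._id = grain_id
        self._storage = storage
        self.state = storage.get(grain_id, 0)   # loaded once, on activation

    def add(self, amount: int) -> None:
        self.state += amount                     # memory only, no storage call

    def write_state(self) -> None:
        self._storage[self._id] = self.state     # explicit flush to storage
```

If the process dies after `add` but before `write_state`, a fresh activation starts from the old value; after `write_state`, it starts from the new one.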
Distributed ACID transactions (and what we use instead)
Distributed transactions are seen as one of the most painful issues any software solution may ever have. So one may be impressed by what Microsoft Orleans promises: when properly configured, your transactional objects don’t have to perform rollbacks on failures. In fact, there’s no such option at all, because the rollback process is completely automatic. Moreover, you don’t have to define your transactions in the traditional way! So, how is that even possible?
First, to make use of transactions, you have to decorate your methods with specific attributes which define the transactional behavior, introducing a kind of cascading:
- Create or join. They either initiate a transaction on their own or merge into an existing transaction if called from a method already involved in one.
- Create. Always create a new transaction, even if called within a different context.
- Join. Only append to an existing transaction if called from a method already involved in one, but don’t create new transactions alone.
- Suppress. A call is not transactional, though it can be performed from a transaction, and the context will be suppressed.
- Supported. A call is not transactional but can be performed from a transaction and access the relevant context.
- Not allowed. A call is not transactional and can’t be performed from a transaction; an exception is thrown in such a case.
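The six behaviors listed above boil down to a small decision table. The Python sketch below encodes it as we read the list; it is not Orleans code, the `Option` enum and `resolve` function are invented, and raising an error when Join is used without an existing transaction is our assumption.

```python
from enum import Enum
from typing import Optional


class Option(Enum):
    CREATE_OR_JOIN = "create or join"
    CREATE = "create"
    JOIN = "join"
    SUPPRESS = "suppress"
    SUPPORTED = "supported"
    NOT_ALLOWED = "not allowed"


def resolve(option: Option, ambient: Optional[str], new_id: str) -> Optional[str]:
    """Return the transaction id the callee runs under, or None for no transaction.

    `ambient` is the caller's transaction id (None if the caller has none);
    `new_id` is the id to use if a new transaction has to be started.
    """
    if option is Option.CREATE:
        return new_id                                  # always a fresh transaction
    if option is Option.CREATE_OR_JOIN:
        return ambient if ambient is not None else new_id
    if option is Option.JOIN:
        if ambient is None:                            # assumption: treated as an error
            raise RuntimeError("Join needs an existing transaction")
        return ambient
    if option is Option.SUPPORTED:
        return ambient                                 # sees the context, if any
    if option is Option.SUPPRESS:
        return None                                    # context is hidden from the callee
    # Option.NOT_ALLOWED
    if ambient is not None:
        raise RuntimeError("call is not allowed inside a transaction")
    return None
```

For example, `resolve(Option.CREATE_OR_JOIN, "t1", "t2")` joins the caller's transaction, while `resolve(Option.CREATE, "t1", "t2")` starts a new one.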
Secondly, you have to use special transactional state objects to hold and change the state of all transactional Grains. This requirement, with its restrictions and uncovered use cases, is what prevents us from using the feature in production.
Given that the Orleans storage mechanism relies on unified serialization of objects and comes with a variety of standard and open-source storage providers, supporting transactions means handing nearly all your business data over to it, and this is where the drawbacks begin:
- Are you ready to entrust your critical data to a persistence mechanism that was designed to hide the storage format details in the first place? Following Murphy’s law, it will be extremely difficult (if not impossible) to fix anything if something goes wrong. You can implement your own storage provider, but that significantly complicates fintech development (which we’re trying to avoid) and still doesn’t guarantee that the problem is resolved. It’s not even about using a unified persistence format (like JSON), but about the very concept of hiding implementation details: in fintech we want (and need) to have full control over everything, especially over data.
- Even if you decide to use ACID transactions and meet the corresponding data storage requirements, there’s a huge blocker for almost any existing core financial processing system: if you already have data (probably in large volumes), the migration will feel like walking on thin ice even from a theoretical perspective, with significant risks.
- As we’ve mentioned earlier, the main trend in the financial sector is interconnection: we often call services not only within our own platform but also external ones managed by other organizations. Obviously, these calls just can’t follow the required transactional rules, because we don’t own the state. Consequently, we still have to manage fallbacks, and Orleans doesn’t help here, forcing us to implement (or integrate) yet another solution. There are also special cases: we call an outside service and wait for an asynchronous webhook response that can cause the reversal of the whole transaction (even if it’s already finished), or we need to put the transaction on hold while it’s manually approved or rejected by an authorized person (e.g., if an automatic AML check failed).
Considering the Microsoft Orleans use cases and nuances mentioned above, we’ve built our own Saga-like implementation that resolves all these problems and helps our engineering team deal with distributed transactions. We’d initially implemented it before adopting MS Orleans, and fortunately, it turned out to be good enough that adding an Orleans layer on top of it required minimal effort. The basic idea is somewhat similar to this community project, but the algorithms are much more advanced. Persistence is used only to hold the states of commands and transactions, while all business data is stored in a regular way. This approach may be of use with other Microsoft development software as well.
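For readers unfamiliar with the pattern, here is the textbook core of a Saga in Python: each step comes with a compensating action, and a failure triggers the compensations in reverse order. This is a generic sketch and not the implementation described above; `run_saga` and the account helpers are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[Callable[[], None], Callable[[], None]]  # (action, compensation)


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, undo the completed ones in reverse."""
    done: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(done):
                undo()
            return False
        done.append(compensate)
    return True


balances: Dict[str, int] = {"alice": 100, "bob": 0}


def debit_alice() -> None:
    balances["alice"] -= 30


def refund_alice() -> None:
    balances["alice"] += 30


def credit_bob_fails() -> None:
    # Stands in for an external service that rejects the operation.
    raise RuntimeError("external service rejected the credit")


def nothing_to_undo() -> None:
    pass
```

Calling `run_saga([(debit_alice, refund_alice), (credit_bob_fails, nothing_to_undo)])` returns `False` and leaves Alice's balance at its original value, because the debit is compensated.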
Stateless workers. Grain services
Though Grains are mostly considered to be domain objects, there are also Grains that aren’t.
The first one is a Stateless Worker, which is referenced by a default ID value, e.g., 0 for long or null/empty for string. As the name suggests, it doesn’t have state and is intended for cross-Grain activities, such as management tasks of any kind or aggregation of financial data like transactions. What’s interesting about this type of Grain is that it can be load balanced locally: by default, a Silo creates up to as many activations of it as there are CPU cores.
The second one is a Grain Service, which requires a slightly tricky configuration (of both the service itself and a dedicated client) and acts more like a regular service: it can be injected through a DI container without the need to use a Grain client and get a Grain reference. It starts with the Silo and stops with it, i.e., it works for as long as the Silo does, and there’s only one instance of it. Its purpose is to perform any kind of Grain orchestration; the standard Orleans Reminder service is an example.
Timers and Reminders
Both deal with repeated actions, but the concepts are quite different, and it’s tricky to tell the goals of each from the names alone.
- Timers are registered per Grain activation. They’re executed only while the particular Grain is active and are intended for short periodicity (seconds).
- Reminders, on the other hand, activate the Grain if there’s a need to execute some action, and are intended for longer periods (minutes or more).
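The difference between the two can be shown with a tiny simulation in Python. `ToyScheduler` is a made-up model, not the Orleans scheduling API: timers are stored with the activation, reminders are stored separately and bring the grain back on each tick.

```python
from typing import List, Set, Tuple


class ToyScheduler:
    def __init__(self) -> None:
        self.active: Set[str] = set()        # grains currently in memory
        self.timers: Set[str] = set()        # registered per activation
        self.reminders: Set[str] = set()     # stored outside the activation
        self.fired: List[Tuple[str, str]] = []

    def deactivate(self, grain_id: str) -> None:
        self.active.discard(grain_id)
        self.timers.discard(grain_id)        # a timer dies with its activation

    def tick(self) -> None:
        for grain_id in sorted(self.timers):
            self.fired.append(("timer", grain_id))
        for grain_id in sorted(self.reminders):
            self.active.add(grain_id)        # a reminder re-activates the grain
            self.fired.append(("reminder", grain_id))
```

After `deactivate("acc-1")`, the next `tick()` fires only the reminder, and the grain is active again as a side effect.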
In the context of banking application development, reminders are more valuable since they help to schedule periodic transactions (regular client payments for services, bond payments, deposit interests, and others), daily post-processing, ETL, and anything else your business requires to be done in a repeatable manner.
Microsoft Orleans Performance Review: Other Features
Here, let’s get a brief overview of other functions that are either minor or require too many technical details.
As with other features, this MS cross-platform app framework tries to bring innovation to stream processing, so if you’re interested, dig in. Unlike well-known and established stream processors, Orleans offers virtual streams, specific processing for each data item, a reactive programming model, and more.
It’s worth considering for businesses with large volumes of real-time data that is constantly updated, like trading.
Microsoft Orleans has a journaling feature for Grains that allows you to implement Event Sourcing. If you’re familiar with the technique and interested in how it can be achieved in Orleans, please read this document.
Request context. Tracing
The framework allows you to access and use the request context (similarly to HttpContext), which is a simple key-value storage abstraction, still giving the ability to perform efficient tracing, logging, and other activities.
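Python's standard `contextvars` module offers a close analogy to such a request context, which is enough to show the idea. This is not the Orleans API; the `trace_id` key and the function names are invented for the example.

```python
import asyncio
import contextvars
from typing import List

# A key-value bag that follows the logical call chain.
request_context = contextvars.ContextVar("request_context", default=None)
trace_log: List[str] = []


async def charge_fee() -> None:
    ctx = request_context.get() or {}
    trace_log.append(f"[{ctx.get('trace_id')}] fee charged")


async def transfer() -> None:
    ctx = request_context.get() or {}
    trace_log.append(f"[{ctx.get('trace_id')}] transfer started")
    await charge_fee()                       # the context travels with the call


async def handle_request(trace_id: str) -> None:
    request_context.set({"trace_id": trace_id})
    await transfer()


async def demo() -> None:
    # Each request runs in its own task, so the contexts do not mix.
    await asyncio.gather(handle_request("t-1"), handle_request("t-2"))
```

Nested calls can tag their log entries with the right trace ID without receiving it as a parameter.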
Clustering, scaling and fault tolerance
Microsoft Orleans supports clustering (and the cluster is a typical deployment scheme), including geo-distribution. The rule of thumb is to run at least two instances of the same Silo so that the cluster survives crashes. Each Silo registers itself in the cluster on start and deregisters on stop. However, a crash may break this process, so there’s an automatic procedure for cleaning up the metadata of dead Silos. Please refer to this article for detailed information.
Microsoft Orleans makes it possible to co-host with regular ASP.NET Core web services, so that you don’t need a dedicated server for APIs for front-end or external systems, encapsulating all the business logic in Orleans. The following sample project can be a starting point to know more.
Can MS Orleans Become the Best Cross-platform Development Framework? — Things to Consider (Carefully)
Delivery guarantees and failure retries
By default, Microsoft Orleans provides rather weak at-most-once delivery guarantees, which effectively means that failures (such as timeouts) should be handled just as for any other remote call, e.g., an HTTP request.
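A common way to retry safely under such guarantees is to attach an idempotency key to each operation so that a repeated call is not applied twice. The Python sketch below illustrates that general technique; it is not something Orleans provides, and `FlakyLedger` and `pay_with_retry` are hypothetical names.

```python
import asyncio
from typing import Dict


class FlakyLedger:
    """Applies a payment but loses the first reply, like a timed-out call."""

    def __init__(self) -> None:
        self.balance = 0
        self._seen: Dict[str, int] = {}      # idempotency key -> stored result
        self._drop_next_reply = True

    async def pay(self, key: str, amount: int) -> int:
        if key not in self._seen:            # apply each key only once
            self.balance += amount
            self._seen[key] = self.balance
        if self._drop_next_reply:
            self._drop_next_reply = False
            raise TimeoutError("reply lost")
        return self._seen[key]


async def pay_with_retry(ledger: FlakyLedger, key: str, amount: int,
                         attempts: int = 3) -> int:
    """Retry on timeout; the key makes repeated attempts harmless."""
    for attempt in range(attempts):
        try:
            return await ledger.pay(key, amount)
        except TimeoutError:
            if attempt == attempts - 1:
                raise
    raise ValueError("attempts must be at least 1")
```

The first attempt changes the balance but its reply is lost; the retry returns the stored result instead of charging again.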
Grain interface definition
Grain interface definition requires all members to be async methods, so you can’t have properties there. It’s a programming-model limitation, but a minor and acceptable one compared to the benefits of using Orleans in general.
Dealing with data
This topic is mostly covered in the ‘Distributed ACID transactions’ section. For us, as a fintech software development company, it was the most important thing to consider. If you develop a solution from scratch and a unified data format is OK for your business needs, then relying on the framework’s persistence may be an option; otherwise, it’s hardly possible.
Grain identity vs. Grain accessibility
While Grains have stable identities, in fintech, that’s not always the case for the entities themselves. We often have entities that hold many external IDs, and different operations have only one of these IDs available at a time to initiate processing. Sounds familiar, right? Unfortunately, there’s no silver bullet: you need to implement an intermediate layer that takes a set of external IDs, of which at least one has a value, and returns the primary internal ID that the entity is bound to in Orleans.
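Such an intermediate layer can be as simple as an index from external IDs to the internal one. Here is a minimal in-memory sketch in Python; the `IdResolver` class and the ID kinds (`iban`, `card_token`) are illustrative assumptions, and a real system would back the index with a database.

```python
from typing import Dict, Optional, Tuple


class IdResolver:
    """Maps any known external ID to the primary internal ID."""

    def __init__(self) -> None:
        # (id kind, value) -> internal id
        self._index: Dict[Tuple[str, str], str] = {}

    def register(self, internal_id: str, **external_ids: str) -> None:
        for kind, value in external_ids.items():
            self._index[(kind, value)] = internal_id

    def resolve(self, **external_ids: Optional[str]) -> str:
        """Accept several optional IDs; the first known one wins."""
        for kind, value in external_ids.items():
            if value is not None and (kind, value) in self._index:
                return self._index[(kind, value)]
        raise KeyError("no known external ID supplied")
```

The resolved internal ID is then what you would use as the Grain key.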
Deployment and configuration. Server faults and cluster metadata cleanup
Silos register themselves in the cluster by IPs. Consequently, all Silos should be in the same network to be able to communicate with each other. Please refer to the official docs and the sample project for more info.
As we’ve mentioned in the ‘Clustering, scaling and fault tolerance’ section, there’s an automatic process of cluster metadata maintenance, and it can be configured. You should consider various use cases and define the values that fit your needs best (or just leave the defaults).
Clusters support heterogeneous Silos (i.e., where different servers support different sets of Grain implementations), but you still have to reference all the Grain interfaces in each Silo, so Grain interfaces are, in fact, contracts, and you should treat them accordingly.
Platform (in)dependence, Storage and Stream providers
While Microsoft Orleans has been marketed as a platform-agnostic solution from the very beginning, the reality is more complex than that. Exploring the documentation and features, you’ll soon find that Microsoft-backed Azure PaaS support always comes first, with little room for others. The ADO.NET storage providers seem to be the most reliable choice outside of the Azure ecosystem for now, and Stream processing can hardly be used at all in this case. There are some community-contributed providers as well, though they should be carefully reviewed and tested before production usage.
The documentation is still a rather weak point of MS Orleans (at least for now). Though it covers most needs, incomplete and outdated information is common, so making a production-ready system requires additional investigation (community articles, samples, and GitHub issues will be useful).
Could we just use a service bus or gRPC?
There’s no quick answer because they are completely different instruments.
For example, if you already have a service bus in use with Sagas support out of the box (like MassTransit), it’s probably the right choice to just use another feature of the existing software. By introducing async messaging and decoupling services, it can solve the majority of common development problems in typical banking applications with event-based processing, but it still leaves room for concurrent storage writes because it lacks a distributed lock.
There is another drawback for existing, and especially new, systems that aren’t using a service bus yet (both being subject to software transformation): regular event-based communication comes with a bulky programming model that requires much more time for integration and support.
gRPC, on the other hand, is a replacement for the communication layer, while MS Orleans implies we can just get rid of that layer for cross-service communication. gRPC benefits from HTTP/2 streaming and connection multiplexing, which makes it considerably faster and better suited to high load than REST over HTTP/1.1. Protobuf definitions, in turn, not only decrease the payload (compared to JSON) but also let you use the same definitions for multi-platform code generation, leveraging the polyglot nature of microservices. So gRPC, while a great alternative to REST, doesn’t provide us with any of the benefits of the actor model (and the Microsoft Orleans framework in particular); it’s just a tool with a totally different goal.
Microsoft Orleans is a very interesting piece of software with a variety of features that address different kinds of problems, including the ones we face during fintech product development: consistency and reliability. It also provides a distributed lock and single access points to domain objects. Its programming model and features such as distributed messaging and co-hosting make it very comfortable to work with. Although it has constraints like rather weak message delivery guarantees, specific persistence mechanics, and limited support for infrastructure providers, the pros are still very strong, and thus it can reasonably be called one of the best cross-platform app development frameworks.
Finally, we suggest reading the whitepaper by Microsoft Research to understand the ideas behind Microsoft Orleans. The framework is dynamically progressing, and by the time you read this, there may be significant fixes and improvements already released.