Zeebe is a workflow engine for orchestrating microservices, using BPMN models to define complex workflows that can be monitored operationally using a graphical representation of the process state.
Sometimes you have external resources that your microservices access - maybe a database or a rate-limited or pay-per-call API.
In these cases, you want to batch up calls that make logical sense and can happen at the same time. These calls are not correlated in your BPMN model. The workflows know nothing about each other, but they do require access to the same resources.
In this case, the worker that accesses the external resource is the point where correlation can take place. It knows when these calls are taking place, and can buffer and batch these calls, dispatch them to the external system, then update each of the processes.
ZBBatchWorker class encapsulates this functionality.
It buffers jobs in the worker until either a minimum batch size is reached, or the maximum batch timeout is reached, then executes its job handler function on an array of jobs. The
ZBWorker, in contrast, executes its job handler immediately on a single job.
Creating a ZBBatchWorker
Here is how you create a
Note that the
timeout value must be greater than the
timeout is the amount of time that the worker tells the broker it needs to complete the task.
In this example, for each job that the worker pulls from the broker, it tells the broker that it needs 80 seconds to complete it, max. After 80 seconds, the broker will make that job available to another
activateJobs call from a worker. So the batch worker should make sure it gets the job done before then.
If you set the
timeout less than the
jobBatchMaxTime, the worker will emit a warning message to the logs. I considered having it throw, but what do I know - someone may come up with a crazy, innovative way to exploit this behaviour, and I'm all about that.
You want to ensure you allow enough time to deal with any retry logic (related to the external API) that you put in your handler.
The ZBBatchWorker handler
complete object - with methods to signal the completion state of the job, and an instance of the
worker - to allow you to log from the handler using the worker's configured log.
ZBBatchWorker handler, on the other hand, takes an array of jobs, and an instance of the worker.
To allow you to signal the completion state of each of the individual jobs in the batch, the jobs in the array have the complete methods attached to them.
So, here is a handler that demonstrates doing nothing more than completing each job with
You need to carefully manage exceptions in your handler.
ZBWorker will fail a job if an unhandled exception takes place in the job handler, and it has an option to fail the entire workflow if you want that behaviour.
ZBBatchWorker, we don't know what state the job batch is in, or how your handler will act after throwing an unhandled exception. So, no action is taken by the library.
It also cannot catch errors deep in the handler code, for example inside
forEach predicate functions, so you gotta manage that.
Internal Implementation Details
There is no API in the broker for batching jobs. You can request a maximum number of jobs at a time, but there is no guarantee about a minimum. You cannot send a request that will be fulfilled when there are enough available jobs.
So, all the management of the batching takes place on the client.
To implement this, I used a pattern that I described in the “Valid Use of Variables” section of the article “Shun the Mutant - The case for
const", wrapping the batching functionality into a small state machine with an API:
Here is a BPMN diagram of the state machine:
And the code that implements it:
It has two variables in it, but these are “true variables”. It is a state machine, so it has to model actual variables in the system. One represents the batched jobs. The other is the batch timer.
The batch timer is started when the first job is pushed into the batch array. From that point, two things will trigger the execution of the job handler on the array of batched jobs: either a job will be pushed in via the
batch API method that takes the count of jobs to the maximum batch size, or the timer will trigger the execution.
That's not a lot of code, but a lot of thought has gone into it.
I am still slightly concerned about race conditions in this code at high volumes. I gotta admit - my knowledge of how the Node.js event loop and timers work is not complete. Even after reading through The Node.js Event Loop, Timers, and
process.nextTick(), I'm still not 100% clear if this code is susceptible to interrupts that cause its internal state to become inconsistent.
The issue is that two different triggers can cause the execution (timer interrupt, or call to
batch), and consumers can call
batch while the execution is taking place.
I think that I have guaranteed synchronous execution through
execute up to the handler invocation before another call to
batch is processed - but I'm not sure.
I'm pretty sure that if the handler does any I/O, then at that point the execution will yield to the event loop - and all my state machine's mutating code happens before then.
I pass in a copy of the batched jobs and scrub the state's batched jobs before calling the handler, so that should take care of it.
Anyway, you'll be glad to know that I'm thinking like this about your run-time state and data while you use the library, and I will get completely flat on this.
Managing state and time are the biggest sources of complexity in an application, so I've encapsulated all of that with respect to this concern inside a state machine.
The complexity of managing this state is not leaked out into the application, and although the race condition that I'm concerned about is not easy to test for, it is easy to reason about in such a small bounded context. So all I need to do to be sure that this works as advertised is to increase my understanding of the event loop.