Asynchronous programming in Nim

3rd September 2021 - Guide , Nim , Programming

In the previous article we got a small primer on various kinds of multitasking; in this article we'll have a look at possibly the simplest form of it: asynchronous execution. As discussed in the last article, asynchronous execution is a way for our programs to tell the hardware to do something, and then do something else while waiting for it to complete the operation. This is great when we want to, for example, read a file into memory or communicate over the network. Both of these tasks are extremely slow compared to running computations on the CPU, and we can get a lot done while we wait for them to complete. As mentioned in the primer, asynchronous execution really only speeds up our programs if they are bound by IO, but for such tasks it is almost essential.

Nim's asynchronous dispatcher

Nim's implementation of asynchronous execution is done within a library that is split into a couple of parts. The core of it is the asyncdispatch module in the standard library. It works by creating a global dispatcher (technically one per thread) that is responsible for running the procedures registered with it. These procedures are tagged with the {.async.} pragma, which we will look more into later, and when calling other asynchronous procedures they use the await keyword to wait until that procedure is done. This means that when calling await, the procedure tells the dispatcher that it is done executing for now and that the dispatcher is free to execute other things until the awaited call completes. This kind of multitasking is called "co-operative multitasking" because each execution unit actively tells the dispatcher when what is executed may change; execution can only ever switch when we explicitly allow it. So let's look at an example:

import asyncdispatch # This is what provides us with async and the dispatcher
import times, strutils # This is to provide the timing output

let start = epochTime()

proc ticker() {.async.} =
  ## This simple procedure will echo out "tick" ten times with 100ms between
  ## each tick. We use it to visualise the time between other procedures.
  for i in 1..10:
    await sleepAsync(100)
    echo "tick ",
         i*100, "ms ",
         split($((epochTime() - start)*1000), '.')[0], "ms (real)"

proc delayedEcho(message: string, wait: int) {.async.} =
  ## Simply waits `wait` milliseconds before echoing `message`
  await sleepAsync(wait)
  echo message

let
  delayedEchoFuture = delayedEcho("Hello world", 550)
  tickerFuture = ticker()

waitFor tickerFuture and delayedEchoFuture
echo delayedEchoFuture.finished

When run, this will output:

tick 100ms 100ms (real)
tick 200ms 200ms (real)
tick 300ms 300ms (real)
tick 400ms 401ms (real)
tick 500ms 501ms (real)
Hello world
tick 600ms 601ms (real)
tick 700ms 702ms (real)
tick 800ms 802ms (real)
tick 900ms 902ms (real)
tick 1000ms 1003ms (real)
true

What happens here is that the calls to delayedEcho and ticker each return a Future[void] object. This happens instantly; no waiting occurs here. This object encapsulates the return type of the call (in this case void, since we don't return anything), and gives us a reference we can use to ask whether our call has completed or not. This is what happens on the last line, where we check whether the delayedEcho future has been completed. But futures can't resolve by themselves; we need to actually run the dispatcher in order for any of the code registered with it to run. Remember, all of this is still running in a single thread of execution. There are many ways to run the dispatcher, but in this case it is done by the waitFor call. When we run waitFor, the dispatcher runs in a loop until the given future is completed, and then returns any potential value of that future. Here we use the and procedure, which takes two futures and creates a new future that completes when both of the given futures have completed. When this combined future is done the program is done. Had we waited for only delayedEchoFuture instead, none of the "tick" messages after "Hello world" would have been shown, as the ticker procedure would never have been executed further once the dispatcher loop stopped when delayedEchoFuture completed.
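
Futures can of course carry values as well as void. As a small illustrative sketch (delayedDouble is a made-up name, not part of the example above), an asynchronous procedure with a return type hands us a Future[int] back, and waitFor both runs the dispatcher and unwraps the value for us:

```nim
import asyncdispatch

# A hypothetical async procedure returning a value: calling it gives us a
# Future[int] immediately, and waitFor runs the dispatcher until it resolves.
proc delayedDouble(x, wait: int): Future[int] {.async.} =
  await sleepAsync(wait)
  return x * 2

let fut = delayedDouble(21, 50) # returns a Future[int] right away
echo fut.finished               # false, the dispatcher hasn't run yet
echo waitFor(fut)               # runs the dispatcher until done, prints 42
echo fut.finished               # true
```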

Tick tock, repeating yields

As seen with the ticker, a Future like this can yield its execution back to the dispatcher many times during its lifetime. If we take a look at the following example (modified from a very similar example of goroutines) we can see how the dispatcher switches back and forth between tasks in this scenario:

import asyncdispatch # This is what provides us with async and the dispatcher
import times, strutils

let start = epochTime()
template timestampEcho(x: varargs[string, `$`]): untyped =
  for s in x:
    stdout.write s
  echo " ", split($(1000*(epochTime() - start)), '.')[0], "ms"

proc numbers() {.async.} =
  for i in 1..5:
    await sleepAsync(250)
    timestampEcho i

proc letters() {.async.} =
  for i in 'a'..'e':
    await sleepAsync(400)
    timestampEcho i

var
  n = numbers()
  l = letters()

waitFor sleepAsync(2500)
echo "main terminated"

This program will output the letters and numbers all jumbled up, but if we look at the timestamps we can see what is going on:

1 250ms
a 400ms
2 501ms
3 751ms
b 801ms
4 1002ms
c 1202ms
5 1252ms
d 1603ms
e 2003ms
main terminated

If we take a closer look at our two asynchronous tasks and map out their sleep intervals we can see how the dispatcher interleaves the execution and map it out nicely:

         0ms     250ms   500ms   750ms   1000ms  1250ms
           ┌───────┬───────┬───────┬───────┬───────┐
numbers    │       │       │       │       │       │
           └───────┴───────┴───────┴───────┴───────┘
                   1       2       3       4       5

         0ms          400ms        800ms        1200ms       1600ms       2000ms
           ┌────────────┬────────────┬────────────┬────────────┬────────────┐
characters │            │            │            │            │            │
           └────────────┴────────────┴────────────┴────────────┴────────────┘
                        a            b            c            d            e

         0ms                                                                               2500ms
           ┌─────────────────────────────────────────────────────────────────────────────────┐
main       │                                                                                 │
           └─────────────────────────────────────────────────────────────────────────────────┘
                                                                                     main terminated

                      400ms        800ms        1200ms       1600ms       2000ms
         0ms     250ms   500ms   750ms   1000ms  1250ms                                    2500ms
           ┌───────┬────┬──┬───────┬─┬─────┬──────┬┬───────────┬────────────┬────────────────┐
executed   │       │    │  │       │ │     │      ││           │            │                │
           └───────┴────┴──┴───────┴─┴─────┴──────┴┴───────────┴────────────┴────────────────┘
                   1    a  2       3 b     4      c5           d            e        main terminated

Time is slipping!

One thing you might've noticed is that the real time slowly drifts away from the "expected" time. In the last example, for instance, we are 3ms off by the time we get to the last letter. This is caused by two different things. First off, each sleepAsync is of course relative to when it was called, so since our code takes some time to execute, the accumulated execution time between these statements starts to show. If we wanted more precise stepping we could set up all the timers before we start looping, essentially making them all relative to the start of our procedure:

proc numbers() {.async.} =
  let sleeps = [sleepAsync(250), sleepAsync(500), sleepAsync(750),
                sleepAsync(1000), sleepAsync(1250)]
  for i in 1..5:
    let fut = sleeps[i-1]
    await fut
    timestampEcho i

Or we could dynamically calculate the sleep time based on the delay we have experienced so far:

proc numbers() {.async.} =
  var fsleep = epochTime()
  for i in 1..5:
    let d = (epochTime() - fsleep)*1000 - (i.float-1)*250
    await sleepAsync(250 - d)
    timestampEcho i

But you might still see some differences between the expected time and the sleep cycle times. This is due to another effect of the co-operative multitasking that asynchronous execution offers. Since we rely on each piece of the execution to actively yield control back to the dispatcher we can never be sure how long it will take until we get execution back. If a task takes longer to execute than the time left before our task should be executed we will see that our execution is delayed. The dispatcher itself will also spend some time executing, which is what we see in the above example. If we look at the last example with our added dynamic sleeping and simulate some work by adding a sleep(100) statement (from the os module, not the sleepAsync offered by asyncdispatch) in the loop of our numbers and letters procedures we will see that some of the letters and numbers are printed at the incorrect time:

proc numbers() {.async.} =
  var fsleep = epochTime()
  for i in 1..5:
    let d = (epochTime() - fsleep)*1000 - (i.float-1)*250
    await sleepAsync(250 - d)
    timestampEcho i
    sleep(100)

proc letters() {.async.} =
  var fsleep = epochTime()
  for i in 'a'..'e':
    let d = (epochTime() - fsleep)*1000 - (i.ord - 'a'.ord).float*400
    await sleepAsync(400 - d)
    timestampEcho i
    sleep(100)

When run, this will output:

1 250ms
a 400ms
2 501ms
3 750ms
b 850ms
4 1001ms
c 1200ms
5 1300ms
d 1601ms
e 2000ms
main terminated

If we look at our timing here and map it out again we can see why this happens: when one task's execution (marked with the bold squares) overlaps the start of another, the latter gets delayed until the first completes:

         0ms     250ms   500ms   750ms   1000ms  1250ms
           ┌───────┲━━┱────┲━━┱────┲━━┱────┲━━┱────┲━━┓
numbers    │       ┃  ┃    ┃  ┃    ┃  ┃    ┃  ┃    ┃  ┃
           └───────┺━━┹────┺━━┹────┺━━┹────┺━━┹────┺━━┛
                   1       2       3       4       5

         0ms          400ms        800ms        1200ms       1600ms       2000ms
           ┌────────────┲━━┱─────────┲━━┱─────────┲━━┱─────────┲━━┱─────────┲━━┓
characters │            ┃  ┃         ┃  ┃         ┃  ┃         ┃  ┃         ┃  ┃
           └────────────┺━━┹─────────┺━━┹─────────┺━━┹─────────┺━━┹─────────┺━━┛
                        a            b            c            d            e

         0ms                                                                               2500ms
           ┌─────────────────────────────────────────────────────────────────────────────────┐
main       │                                                                                 │
           └─────────────────────────────────────────────────────────────────────────────────┘
                                                                                     main terminated

                      400ms          850ms      1200ms       1600ms       2000ms
         0ms     250ms   500ms   750ms   1000ms    1300ms                                  2500ms
           ┌───────┲━━┱─┲━━┳━━┱────┲━━┳━━┱─┲━━┱───┲━━┳━━┱──────┲━━┱─────────┲━━┱─────────────┐
executed   │       ┃  ┃ ┃  ┃  ┃    ┃  ┃  ┃ ┃  ┃   ┃  ┃  ┃      ┃  ┃         ┃  ┃             │
           └───────┺━━┹─┺━━┻━━┹────┺━━┻━━┹─┺━━┹───┺━━┻━━┹──────┺━━┹─────────┺━━┹─────────────┘
                   1    a  2       3  b    4      c  5         d            e        main terminated

So we see that b is offset 50ms from its target by the execution of 3, and that 5 is similarly offset 50ms by the execution of c. Task 2 is set to start right after task a, so it might be slightly delayed if a was slow in its actual work, but that is hard to notice in this example. The reason this happens is that the os.sleep procedure puts the entire thread of execution to sleep, which to the dispatcher is indistinguishable from a task simply taking a long time before yielding. This means that the dispatcher can't switch to another scheduled task when a sleep timer runs out, because the dispatcher isn't running. So the sleepAsync procedure should be seen more as a "sleep at least this long" procedure. This phenomenon is called starvation, as the task that takes too long effectively "starves" the dispatcher and the other tasks of execution time. In co-operative multitasking there is nothing we can do about this, apart from making sure that our tasks are well behaved (after all, they should be "co-operating"). If you know that a procedure will do a lot of work and want to manually yield execution to someone else, you can add some simple await sleepAsync(0) statements that allow the dispatcher to run something else from time to time. If we for example added ten sleep(10) statements interleaved with await sleepAsync(0) statements instead of our single sleep(100), our tasks wouldn't be delayed by more than 10ms.
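
A sketch of that last suggestion, applied to the dynamically corrected numbers procedure from above (the only change is that the 100ms of simulated work is chopped into ten sleep(10) slices, each followed by an await sleepAsync(0)):

```nim
import asyncdispatch, os, times, strutils

let start = epochTime()

# The dynamically corrected numbers task from the article, but with its
# simulated work sliced up so the dispatcher can run other due tasks
# in between the slices.
proc numbers() {.async.} =
  var fsleep = epochTime()
  for i in 1..5:
    let d = (epochTime() - fsleep)*1000 - (i.float-1)*250
    await sleepAsync(250 - d)
    echo i, " ", split($(1000*(epochTime() - start)), '.')[0], "ms"
    for slice in 1..10:
      sleep(10)            # a 10ms slice of "work"
      await sleepAsync(0)  # hand control back to the dispatcher

waitFor numbers()
```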

Final remarks

As we have seen, asynchronous programming is a valuable tool to speed up our code. However, it comes with its own set of benefits and drawbacks. It is easy to work with, since we know exactly where our code will yield to other tasks. But it can only be used to speed up IO bound code, because we are still only running on a single thread of execution. The fact that it is a co-operative approach also means that long running tasks can delay the execution of other tasks unexpectedly. In the next article we will be looking at another kind of multitasking, namely pre-emptive multitasking, which avoids this issue and allows us to speed up computationally bound code, but which comes with its own set of issues.

Appendix A, implementation details:

The above article tries to explain how to use asynchronous programming in Nim, but if you're curious about how it actually works, I have included this little section. It might be worth reading even if you are just interested in using it, simply because understanding the underlying mechanism might make it easier to reason about what is going on.

So the way async works is that the {.async.} pragma rewrites the procedure into an iterator and adds the required logic to register the future with the dispatcher. Iterators in Nim are mostly used for, well, iterating over things. But if you've ever written an iterator you know that they can do anything a procedure can, while also allowing control to be yielded back to the caller without losing their scope. To better understand how the asynchronous execution works, let's have a look at how such an iterator behaves:

iterator ourIterator(): int {.closure.} =
  var i = 1
  yield i
  while i <= 3:
    yield i
    i += 1

var x = ourIterator # Create an instance of the iterator
echo x() # Call the iterator
echo x()
echo x()
echo x()
echo x.finished
echo x()
echo x.finished

This prints out:

1
1
2
3
false
0
true

There are a couple of things to note here. ourIterator is marked as a closure iterator, which means that we can create instances of it and call them arbitrarily; iterators not created this way are inline and behave more like templates, in that they are rewritten at compile time. Alternatively we could return an iterator from a procedure, which implicitly makes it a closure and enables us to pass in arguments when we create our instance. Doing it that way is in fact closer to how the async macro does it. We also have two yield statements in our iterator; we can have as many as we like, and put them wherever we like. They will return the value given to them, and the next time the iterator is called it will continue from the last yield that was executed. This is the beauty of iterators: they allow us to jump in and out of our code while keeping the scope. We can also see two calls to system.finished, which returns true when the last run of the iterator did not hit a yield statement but rather completed or hit a return statement. This means that the last value is not valid (as seen here by the value being 0) and that calling the iterator any more times will not serve any purpose. This is something the async macro uses to determine if a task has completed; it will then copy the last good value into the future and execute any callbacks assigned to it.
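
To illustrate the alternative mentioned above, here is a small sketch (countTo is a made-up name) of a procedure that returns a closure iterator, capturing its argument in the closure's scope:

```nim
# A procedure returning a closure iterator; the returned iterator captures
# the parameter n in its scope, so each instance counts independently.
proc countTo(n: int): iterator (): int =
  result = iterator (): int =
    for i in 1..n:
      yield i

var counter = countTo(3)
echo counter()        # 1
echo counter()        # 2
echo counter()        # 3
discard counter()     # runs off the end, the result is no longer valid
echo counter.finished # true
```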

You might start to catch on to how asynchronous execution is implemented in Nim by now. When we add the {.async.} pragma to our procedure it is rewritten into a procedure that registers an iterator with a global dispatcher. We can then tell this dispatcher to run, and it will run the iterators registered with it. These iterators call yield every time they perform an await operation, and the value they yield is the future they are waiting on. When the awaited task was created it was registered with the dispatcher and started, and when we await its future the dispatcher will continue running that task until the future is completed and control can be passed back to our iterator. The dispatcher also has support for timers, which is how the sleepAsync we used above works: these are simply kept in a list, and the dispatcher checks whether any of them are due for execution.
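
To make that description a bit more concrete, here is a heavily simplified toy model of the mechanism. This is not the real asyncdispatch implementation, just a sketch with made-up names: tasks are closure iterators that yield the "future" they are waiting on, and a loop resumes any iterator whose awaited future has completed:

```nim
# A toy "future" that only tracks completion, nothing like the real Future
type ToyFuture = ref object
  done: bool

# A task is a closure iterator that yields the future it is awaiting
type Task = iterator (): ToyFuture

proc toyTask(name: string, trigger: ToyFuture): Task =
  result = iterator (): ToyFuture =
    echo name, " started"
    yield trigger          # "await": hand the awaited future to the loop
    echo name, " resumed"

var fut = ToyFuture(done: false)
var tasks: seq[tuple[task: Task, waitingOn: ToyFuture]]
let t = toyTask("a", fut)
tasks.add((t, t()))        # start the task; it runs until its first "await"

fut.done = true            # something (e.g. a timer) completes the future
for (task, waitingOn) in tasks:
  if waitingOn.done and not task.finished:
    discard task()         # resume the iterator past its yield
```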