Windows 8 Registered I/O - Generating load for the performance tests

Now that we have five example servers (four RIO designs and a traditional polled UDP design), we can begin to look at how the RIO API performs compared to the traditional APIs. These comparisons should be taken as preliminary since we’re working with a beta version of the operating system. I wouldn’t put much weight on the exact numbers until we have a non-beta OS to test on, but it’s useful to see how things are coming along and to familiarise ourselves with the designs that might be required to take advantage of RIO once it ships.

Sending a stream of datagrams

Before we can compare performance we need to be able to push the example servers hard. We do this by sending a stream of datagrams at them as fast as we can for a period of time. Each server starts timing when it receives the first datagram and then counts the datagrams that it processes. The test finishes by sending a series of smaller datagrams at the server. When the server sees one of these smaller datagrams it shuts down and reports the time taken, the number of datagrams processed and the rate at which they were processed.
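The bookkeeping on the server side can be sketched independently of any socket API. This is a minimal illustration rather than the code used in the example servers; the class name `DatagramCounter` and its members are hypothetical, and it assumes that any datagram shorter than the full test size is a shutdown signal.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not from the example servers): tracks the timing
// and counting described above for a single test run.
class DatagramCounter
{
public:
   using Clock = std::chrono::steady_clock;

   explicit DatagramCounter(std::size_t fullSize) : m_fullSize(fullSize)
   {
      assert(fullSize > 0);
   }

   // Call once per received datagram. Returns false once a shorter
   // "shutdown" datagram has been seen and the test is over.
   bool OnDatagram(std::size_t bytes, Clock::time_point now = Clock::now())
   {
      if (m_done) return false;
      if (bytes < m_fullSize)
      {
         if (m_count > 0) m_stop = now;   // the clock stops at the shutdown datagram
         m_done = true;
         return false;
      }
      if (m_count == 0) m_start = now;    // the clock starts at the first datagram
      ++m_count;
      m_stop = now;
      return true;
   }

   std::size_t Count() const { return m_count; }

   double ElapsedSeconds() const
   {
      return std::chrono::duration<double>(m_stop - m_start).count();
   }

   // Datagrams per second, or 0 if no measurable interval was recorded.
   double Rate() const
   {
      const double seconds = ElapsedSeconds();
      return seconds > 0.0 ? static_cast<double>(m_count) / seconds : 0.0;
   }

private:
   std::size_t m_fullSize;
   std::size_t m_count = 0;
   bool m_done = false;
   Clock::time_point m_start{};
   Clock::time_point m_stop{};
};
```

A receive loop would call `OnDatagram` with the size of each datagram and stop looping when it returns false. Sending several shutdown datagrams rather than one makes sense because UDP gives no delivery guarantee; the sketch simply ignores any that arrive after the first.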

To stress the servers, all we need to do is send datagrams at a rate that gets close to 100% utilisation of a 1Gb Ethernet link. This is fairly simple to achieve using the traditional blocking sockets API.
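To put a rough number on "close to 100% utilisation", here is a small sketch that estimates the datagram rate at which a link is full. It assumes IPv4 over standard Ethernet with a 1500-byte MTU and no IP options; the function and constant names are made up for this illustration, and the payload size used by the examples is not stated here, so treat any figure as an estimate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Per-datagram overhead in bytes, assuming IPv4 over standard Ethernet.
const std::size_t UDP_HEADER = 8;
const std::size_t IPV4_HEADER = 20;            // no IP options
const std::size_t ETHERNET_FRAMING = 38;       // preamble + SFD 8, header 14, FCS 4, inter-frame gap 12
const std::size_t MIN_ETHERNET_PAYLOAD = 46;   // shorter frames are padded to this
const std::size_t MAX_UDP_PAYLOAD = 1500 - IPV4_HEADER - UDP_HEADER;   // 1472, avoids fragmentation

// Hypothetical helper: datagrams per second that would saturate the link.
inline double DatagramsPerSecondAtLineRate(
   std::size_t payloadBytes,
   double linkBitsPerSecond = 1e9)
{
   assert(payloadBytes <= MAX_UDP_PAYLOAD);
   const std::size_t ipPacket = payloadBytes + UDP_HEADER + IPV4_HEADER;
   const std::size_t onWire = std::max(ipPacket, MIN_ETHERNET_PAYLOAD) + ETHERNET_FRAMING;
   return linkBitsPerSecond / (8.0 * static_cast<double>(onWire));
}
```

Under these assumptions a 1Gb link is full at roughly 81,000 datagrams per second for 1472-byte payloads, and the figure rises as the payload shrinks. With that target in mind, the sending loop from the example looks like this: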

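   // The extracted listing is incomplete: s, buf, bytesSent, flags and ErrorExit are assumed names.
   for (size_t i = 0; i < DATAGRAMS_TO_SEND; ++i)
      if (SOCKET_ERROR == ::WSASendTo(s, &buf, 1, &bytesSent, flags,
         reinterpret_cast<sockaddr *>(&addr), sizeof(addr), nullptr, nullptr))
         ErrorExit("WSASendTo");   // panic and run away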

There’s not much more to it than that. We use similar code to set up and clean up, but if you’ve been following along with the other examples then nothing there needs further explanation.
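The overall shape of the sender, a burst of full-size datagrams followed by a few smaller shutdown datagrams, can be sketched with the actual send call abstracted away. Everything here is illustrative: `LoadPlan`, `SendFn` and `GenerateLoad` are hypothetical names, and in the real program the callback would wrap the blocking `::WSASendTo` call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical signature: sends one datagram and returns false on failure.
using SendFn = std::function<bool(const char *data, std::size_t length)>;

// Illustrative test parameters; none of these values come from the examples.
struct LoadPlan
{
   std::size_t datagramsToSend;
   std::size_t datagramSize;
   std::size_t shutdownDatagrams;
   std::size_t shutdownSize;      // must be smaller than datagramSize
};

// Sends the timed burst, then the shutdown datagrams.
// Returns the number of full-size datagrams that were sent.
std::size_t GenerateLoad(const LoadPlan &plan, const SendFn &send)
{
   assert(plan.shutdownSize < plan.datagramSize);

   const std::vector<char> payload(plan.datagramSize, 'x');

   std::size_t sent = 0;
   for (; sent < plan.datagramsToSend; ++sent)
   {
      if (!send(payload.data(), plan.datagramSize))
      {
         return sent;   // give up on the first failure
      }
   }

   // Several shutdown datagrams, since any single one may be lost.
   for (std::size_t i = 0; i < plan.shutdownDatagrams; ++i)
   {
      send(payload.data(), plan.shutdownSize);
   }
   return sent;
}
```

Keeping the send call behind a callback means the sequencing can be tried without a network, for example by passing a lambda that just records the sizes it is asked to send.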

The code for this example can be downloaded from here. This code requires Visual Studio 11, but would work with earlier compilers if you have a Windows SDK that supports RIO. Note that Shared.h and Constants.h contain helper functions and tuning constants for ALL of the examples, so there will be code in there that is not used by this example. You should be able to unzip each example into the same directory structure so that they all use the same shared headers. This lets you apply the same tuning to all of the examples so that any performance comparisons make sense. This program can be run on versions of Windows prior to Windows 8, which is useful for testing as you only need one machine set up with the beta of Windows 8 server.

Code is here

Code - updated 15th April 2023

Full source can be found here on GitHub.

This isn’t production code; error handling is simply “panic and run away”.

This code is licensed under the MIT license.