Write sequencing bug in all versions of The Server Framework
I’ve just found and fixed a bug which has been present in The Server Framework from the very beginning. The problem affects connections which are “sequenced” and which have had more than 2,147,483,647 writes performed on them. The observant amongst you will be thinking that it’s a counter wrap bug and you’d be correct. The annoying thing for me is that the code in question has unit tests which explicitly test for correct operation when the sequence number wraps; the tests pass but the bug is still there.
The reason that the bug survived the unit tests is that the tests were testing the obvious place for the problem; the CInOrderBufferList
class which ensures that if a series of buffers with sequence numbers in them are added to the collection then they are removed in the sequenced order. The wrap test code checks that the collection doesn’t get confused when it has buffers with very high sequence numbers in it at the same time as buffers with low sequence numbers are added. The test works fine and tests the correct aspects of the class under test. Unfortunately the code that generates the sequence numbers that go into the buffers in the real code was out of sync with the code that processes these sequence numbers. The CInOrderBufferList
class used the IBuffer::SequenceNumber
type, a size_t
, throughout, the code that generated the numbers did so using a long
. The type mismatch meant that the sequence numbers generated were from 0 to 2,147,483,647 and then back to 0. Unfortunately the consuming code did not wrap from 2,147,483,647 to 0 and so expected a buffer with a sequence number of 2,147,483,648 to come along next and it never would.
The symptoms of the bug are that data stops being written to the network and memory usage increases.
I expect the code was originally OK for 32-bit platforms (where size_t
and unsigned long
are the same size) as I’m sure the code used to use ::InterlockedIncrement()
to increment the sequence numbers in the producer and this would have treated the long
as an unsigned long
for increment and wrap purposes. Every x64 build has always been broken through and the x86 build was broken as far back as release 5.0
I’ve added a test which is run once inside the sequence number generator which ensures that the numbers being generated have the same range as the IBuffer::SequenceNumber
type. This prevents the broken code from compiling (signed/unsigned mismatch) and throws and exception at run time if the code can compile but the data types are incorrect. The library’s unit tests then fail due to the exception.
This fix will be included in release 6.5.8 which also includes the changes for the Windows 8/Server 2012 AcceptEx()
bug. This will be released this week.