Untangling .NET Atomics: Defining Terms

Atomicity, volatility, and memory ordering - what does it all mean?

This is the first in a series of blog posts I'm writing on how we might fix the mess that is .NET atomics.

This isn't the first time I've written on this topic; I wrote this Mono documentation page once upon a time. That page is mainly aimed at Mono runtime developers, though, so it's fairly terse and full of lingo that may be confusing to the average .NET developer. It also omits some details that are fairly obvious to a Mono developer but perhaps not so much to everyone else. I hope to address those issues in this post.

Note that I will not be considering the .NET Framework in this series; .NET Framework 4.x is considered feature-frozen, with .NET 5+ (formerly .NET Core) being the future.

To properly discuss atomics and memory model issues, I first have to actually define the terms being used. This is necessary because there are some subtle differences between how these terms are used in various programming languages. I've also seen a lot of confusion around these terms in online .NET communities. I'll assume at least intermediate familiarity with C# programming and some very basic understanding of CIL bytecode. I'll use some terminology from the C++11 memory model, though in a very simplified form.

Memory Ordering

If you're familiar with multithreaded programming at all, you're probably aware that just because a mutating thread A performs memory operations in a certain order, it doesn't necessarily mean that an observing thread B will see those operations carried out in the same order. For example, A could do operations M1, M2, and M3, but B might see M2, M3, and M1.

Of course, any non-trivial cooperation between threads will require a coherent view of certain memory operations. You impose such a view by using memory barriers (also known as memory fences). By issuing a memory barrier between two memory operations M1 and M2, you inform the compiler and the CPU that you always want M1 to happen before M2 when observed from other threads.
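
To make this concrete, here's a minimal C# sketch (the type and member names are mine, purely for illustration) using Thread.MemoryBarrier, which is a full barrier. Note that the observing side needs its own barrier too, for the same reason:

```csharp
using System;
using System.Threading;

class BarrierExample
{
    int _data;
    bool _ready;

    void Produce()
    {
        _data = 42;             // M1
        Thread.MemoryBarrier(); // M1 is ordered before M2 as observed from other threads
        _ready = true;          // M2
    }

    void Consume()
    {
        if (_ready)                 // observed M2...
        {
            Thread.MemoryBarrier(); // ...so after this barrier, M1 must be visible too
            Console.WriteLine(_data); // prints 42
        }
    }
}
```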

It gets a little more complicated than that, though: A memory barrier has an ordering associated with it. This ordering describes in more detail what the barrier allows and prevents. For the purposes of this post, the following kinds of orderings exist:

  • Relaxed: This is the default state of affairs, as in the first example in this section. There are no ordering guarantees. A relaxed memory barrier is effectively a no-op.
  • Acquire: No memory operations occurring after this barrier (in source code) can be reordered to occur before this barrier.
  • Release: No memory operations occurring before this barrier (in source code) can be reordered to occur after this barrier.
  • Sequential consistency: Acquire and release semantics combined. No memory operations can be reordered past this barrier in either direction.

(Note: In the C++11 memory model, acquire and release can actually be combined into either acq_rel or seq_cst, which have slightly different semantics. This nuance doesn't currently exist in .NET land, so I'll only consider seq_cst.)

Acquire and release orderings exist because sequential consistency is quite expensive to impose and is often stronger than actually necessary. Relaxed ordering is only useful in combination with atomic operations.

Memory orderings can also be associated with specific memory operations - they don't have to be explicit barriers. For example, you could have an atomic load operation that has acquire semantics, an atomic store operation that has release semantics, or an atomic compare-and-exchange operation that has sequential consistency semantics.
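
For instance, here's a sketch using .NET APIs whose documented semantics match these orderings (more on these APIs in the next post):

```csharp
using System.Threading;

class OrderedOperations
{
    int _flag;

    void Demo()
    {
        // Atomic load with acquire semantics: no later memory operation
        // can be reordered to occur before this load.
        int observed = Volatile.Read(ref _flag);

        // Atomic store with release semantics: no earlier memory operation
        // can be reordered to occur after this store.
        Volatile.Write(ref _flag, 1);

        // Atomic compare-and-exchange with sequential consistency semantics:
        // sets _flag to 2 if it currently holds 1, returning the original value.
        int original = Interlocked.CompareExchange(ref _flag, 2, 1);
    }
}
```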

That said, it's important to understand that while memory ordering and atomicity are often used together, using one concept does not necessarily imply using the other.

Atomicity

Atomicity is often poorly understood and tends to be conflated with strong memory ordering constraints. In reality, even a relaxed memory operation can be atomic - under the right circumstances, anyway.

Fundamentally, a memory operation is said to be atomic if an incomplete value cannot be observed. For example, if thread A stores a 64-bit value to memory location P while thread B simultaneously loads a 64-bit value from P, and thread B is guaranteed to always see the full value that A stored, these operations can be said to be atomic. Since they're working with 64-bit values, these operations will usually be atomic by default on a 64-bit system, but this would not be the case on a 32-bit system. On such a system, B might see a garbled value consisting of the lower 32 bits of the old value at P and the upper 32 bits of the freshly written value at P. This is sometimes referred to as word tearing.
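
Here's a contrived sketch of what that looks like in practice. On a 32-bit runtime, the reader below can eventually observe a torn value; on a 64-bit runtime, the plain long accesses happen to be atomic and the loop spins forever. (This is intentionally broken code, purely illustrative; a JIT compiler is also free to optimize unsynchronized reads like these.)

```csharp
using System;
using System.Threading;

class TearingDemo
{
    // Plain, unsynchronized long accesses are NOT guaranteed
    // to be atomic on a 32-bit runtime.
    static long _value;

    static void Main()
    {
        var writer = new Thread(() =>
        {
            while (true)
            {
                _value = 0L;  // all bits clear
                _value = -1L; // all bits set
            }
        }) { IsBackground = true };
        writer.Start();

        while (true)
        {
            long seen = _value; // may be half old, half new on 32-bit
            if (seen != 0L && seen != -1L)
            {
                Console.WriteLine($"Torn read: 0x{seen:X16}");
                break;
            }
        }
    }
}
```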

In .NET land, there is a guarantee that any memory operation will be atomic if the size of the value is equal to or less than the system's word size. In practice, since .NET only runs on 32-bit and 64-bit machines, this means that 32-bit memory operations are always atomic, while 64-bit memory operations are only atomic on 64-bit machines.

(Note: .NET also requires memory locations to be properly aligned for the size being accessed in order for the operation to be atomic. You usually won't have to worry about this since you would have to go pretty far out of your way to obtain a C# ref that is misaligned.)
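
If you do need atomic 64-bit accesses regardless of word size, the Interlocked class can provide them; Interlocked.Read and Interlocked.Exchange on a long are atomic even on 32-bit systems. A minimal sketch:

```csharp
using System.Threading;

class AtomicCounter
{
    long _value;

    // Atomic 64-bit load, even on a 32-bit runtime.
    public long Read() => Interlocked.Read(ref _value);

    // Atomic 64-bit store, even on a 32-bit runtime.
    public void Write(long value) => Interlocked.Exchange(ref _value, value);
}
```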

At this point, you can probably see how both atomicity and memory ordering are crucial in ensuring that threads can actually talk to each other properly. In real code, atomic operations will often have a memory ordering associated with them as this tends to be much easier to reason about than explicit barriers. Still, as mentioned above, keep in mind that this is not always the case - this will become relevant in the next post.

Volatility

So far, defining the terminology has been fairly straightforward. Unfortunately, this is where the waters get muddied.

The concept of a volatile operation dates all the way back to C. In C (and by extension C++), the volatile keyword is used to indicate that every load and store operation at a particular memory location has a side effect, and so those operations must not be optimized away. This is mostly relevant when talking to hardware through memory-mapped I/O.

More specifically, it says:

  • Volatile operations must not be reordered with respect to each other. (Note that they may still be reordered with respect to non-volatile operations.)
  • Seemingly redundant volatile load and store operations from/to the same memory location must not be eliminated.

For historical reasons that I'm not familiar with, the Microsoft C++ compiler interprets the volatile keyword differently from the above definition when compiling for x86 - both 32-bit and 64-bit. Specifically, in addition to the above constraints, it guarantees that a volatile load operation has acquire semantics and that a volatile store operation has release semantics. (It does not guarantee anything about atomicity!) This has contributed massively to the confusion that surrounds this keyword and term. To this day, many people still believe that volatile in standard C has to do with concurrency even though this is only the case with Microsoft's compiler.

(Note: The Microsoft C++ compiler will use standard semantics if you pass /volatile:iso on the command line, or if you compile for a non-x86 architecture such as ARM.)

Unfortunately, this non-standard definition of volatile was inherited by C# and .NET, where the concept of volatility carries the same guarantees as the Microsoft C++ compiler assigns to it.
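
Concretely, a C# volatile field gives reads acquire semantics and writes release semantics, which is enough for the classic publication pattern. A minimal sketch:

```csharp
using System;

class VolatilePublication
{
    int _data;
    volatile bool _ready; // volatile reads acquire, volatile writes release

    void Produce()
    {
        _data = 42;
        _ready = true; // release store: the write to _data cannot move after it
    }

    void Consume()
    {
        if (_ready)                   // acquire load: the read of _data cannot move before it
            Console.WriteLine(_data); // guaranteed to print 42
    }
}
```

Note that C# only permits the volatile modifier on field types that are word-sized or smaller, so such fields also get the atomicity guarantee described earlier.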


Next, I'll discuss the atomics functionality currently provided by the .NET ecosystem.