How does the C++ memory model interact with cache-coherent CPUs (e.g. MESI)?

I am having trouble wrapping my head around how the visibility of changes to atomic and non-atomic variables under the C++ memory model relates to hardware cache coherence.

For example, what happens when this function is called from multiple threads on an architecture with MESI-like cache coherence:

int global_var = 0;

// to be called from multiple threads
void inc_global() {
    ++global_var;
}

I know that if it is called 1000 times from each of 10 threads, the final value may not be 10000, but I want to understand what happens in the processor and caches that leads to this. Is my understanding below correct?

Each thread (core) reads the cache line holding global_var from physical memory into its L1, increments the value based on that L1 snapshot, moves the line to the Modified state, and sends an invalidation to all other cores (so their copies become Invalid), but without any kind of locking. This forces the other cores to fetch the latest cache line, but by that time they might already have performed an increment on their stale copy, and hence updates are lost? In general, how do other cores react to a Modified/Invalid message while they are also updating the same cache line? And for its next update, does a core use the snapshot it already has, or the one received from another core, or does that depend on whether the local copy is Invalid or Modified?

Furthermore, is the way it works for std::atomic<int> (see the snippet below) that the cache line is locked by the core doing the update, so that the read-modify-write is done atomically with respect to all cores and hence no updates are lost?

#include <atomic>

std::atomic<int> global_var{0};

// to be called from multiple threads
void inc_global() {
    global_var.fetch_add(1, std::memory_order_relaxed);
}
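
For reference, a minimal harness for the scenario I describe (10 threads, 1000 calls each) could look like the sketch below. The names `counter`, `inc_counter`, `run_increments` and `check_no_lost_updates` are hypothetical and only mirror the snippet above; this is how I would test it, not a statement about what the hardware does.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for global_var from the snippet above.
std::atomic<int> counter{0};

// Same operation as inc_global(): a relaxed atomic read-modify-write.
void inc_counter() {
    counter.fetch_add(1, std::memory_order_relaxed);
}

// Runs num_threads workers, each calling inc_counter() calls_per_thread times,
// and returns the final counter value.
int run_increments(int num_threads, int calls_per_thread) {
    counter.store(0, std::memory_order_relaxed);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([calls_per_thread] {
            for (int i = 0; i < calls_per_thread; ++i) {
                inc_counter();
            }
        });
    }
    // join() synchronizes with the completion of each worker, so the load
    // below observes every increment.
    for (auto& w : workers) {
        w.join();
    }
    return counter.load(std::memory_order_relaxed);
}

// Checks the scenario from the question: 10 threads x 1000 calls.
void check_no_lost_updates() {
    assert(run_increments(10, 1000) == 10000);
}
```

Replacing `std::atomic<int>` with a plain `int` here gives the first version of the function, where the total is not guaranteed.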

But how does this work on a non-cache-coherent architecture? Would the processor implement an internal CAS retry loop?
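
To make the "internal loop for CAS" idea concrete, this is the kind of retry loop I have in mind, written at source level. It is only a sketch: `fetch_add_via_cas` is a hypothetical helper, and I am assuming (not claiming) that something equivalent happens on hardware without a single atomic read-modify-write instruction.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: emulates fetch_add with a compare-exchange retry loop.
// Returns the value held by `target` immediately before the addition.
int fetch_add_via_cas(std::atomic<int>& target, int delta) {
    int observed = target.load(std::memory_order_relaxed);
    // If another thread changed `target` in the meantime, the exchange fails
    // and compare_exchange_weak refreshes `observed` with the current value,
    // so the next iteration retries with fresh data.
    while (!target.compare_exchange_weak(observed, observed + delta,
                                         std::memory_order_relaxed,
                                         std::memory_order_relaxed)) {
        // empty body: the retry state lives in `observed`
    }
    return observed;
}

// Single-threaded sanity check of the helper's return value and effect.
void check_fetch_add_via_cas() {
    std::atomic<int> value{5};
    assert(fetch_add_via_cas(value, 3) == 5);
    assert(value.load() == 8);
}
```

I used `compare_exchange_weak` because it is allowed to fail spuriously, which is harmless inside a loop.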

Furthermore, for these simple independent updates, is it true that a memory order of `relaxed` is not good enough under the C++ memory model, because it only guarantees atomicity of the writes but not their visibility (I feel so stupid saying this), and hence we need `acq_rel` on `fetch_add`? And would a relaxed order nevertheless work on a system that implements MESI?
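
To separate atomicity from ordering in my own head: the case where I understand the memory order to matter is when one thread publishes other, non-atomic data through an atomic flag. A minimal sketch of that pattern, with hypothetical names (`payload`, `ready`, `producer`, `consumer`, `run_handoff`), using release/acquire on the flag:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // flag used to publish payload

// Writes the data, then publishes it with a release store.
void producer() {
    payload = 42;
    ready.store(true, std::memory_order_release);
}

// Spins until the flag is seen with an acquire load, then reads the data.
// The acquire load that reads `true` synchronizes with the release store,
// so the write to payload happens-before this read.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_handoff() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}

// Checks that the consumer observes the published value.
void check_handoff() {
    assert(run_handoff() == 42);
}
```

The counter increments in my question do not publish any other data, which is why I am unsure whether they need more than `relaxed`.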

Madison Howard


Sheepy_wolf123
