3. Description of RCU Primitives

The RCU mechanism consists of an interface that registers a callback function This function is called after all the CPUs in the system have gone through at least one "quiescent" state (such as context switch, idle loop, or user code). This function can be used while updating shared data structures. However, more than one update cannot be performed at the same time, so each update must be serialized using conventional mutual exclusion mechanisms. Typical update side code would look follow this process:

  1. Lock the update side.

  2. Update the global pointer to the data structure (struct rcu_head) with the new copy or NULL. After this step any new kernel threads in another CPU can see only the new pointer.

  3. Invoke a callback (call_rcu) to free the old copy after all CPUs go through a quiescent point. This step ensures that when the copy is freed, no CPU can hold a reference to this data structure.

  4. Unlock the update side.

The header file to performs this process is Linux/rcupdate.h.

3.1. struct rcu_head

This structure contains all the information necessary for RCU mechanism to maintain pending updates. Each update corresponds to one struct rcu_head. Typically, the rcu_head structure is embedded in the data structure that uses RCU for updating and lookup.

		struct rcu_head {
       			struct list_head list;
       			void (*func)(void *obj);
       			void *arg;
		};


		#define RCU_HEAD_INIT(head) \
			{ LIST_HEAD_INIT(head.list), NULL, NULL }
		#define RCU_HEAD(head) \
			struct rcu_head head = RCU_HEAD_INIT(head)
		#define INIT_RCU_HEAD(ptr) do { \
       			INIT_LIST_HEAD(&(ptr)->list); (ptr)->func = NULL; \
				(ptr)->arg = NULL; \
			} while (0)
			

3.2. call_rcu()

void call_rcu(struct, rcu_head, *head, void, (*func) (void, *head) );

This call invokes a callback function, func after all the CPUs have gone through at least one quiescent state. For performance reasons, good implementations of RCU do not wait for completion of the quiescent cycle. Instead, they queue the cycle in batches and return. The callback will be invoked later when all the CPUs go through at least one quiescent state. The callback function may be invoked from bottom-half context, so the user must take that into consideration.

3.3. synchronize_kernel

void synchronize_kernel(void);

This interface waits until all the CPUs of the system have gone through a quiescent state, like context switch. The interface blocks, so it can be called from process context only. Calling from process context is useful in situations where the update of an RCU protected data is not performance critical. For example, module unloading is not performance critical, so it can use this interface if it is protected by RCU.

3.4. Use of Memory Barriers

Ordering of reads and writes in critical sections is taken care of implicitly in the locking primitives of conventional locks. However, with a lock-free read of data, as supported by RCU, ordering of memory accesses by CPUs is important. The Linux kernel already provides a set of memory barrier operations. The most commonly used barrier interfaces for RCU are: wmb() and rmb().

Note: In future versions of the kernel, the names of these interfaces might be changed to write_barrier() and read_barrier() .

For example, consider a situation where a stale copy of the data structure cannot be tolerated by the algorithm. You can avoid this situation with the following approach:

Update side:

		Mark entry obsolete
		Remove it from the list
		Schedule a RCU callback to free it after quiescent cycle
					

Read side:

		For each element 
		if element is obsolete continue
		if element->value == key
		return element;
					

If the writes to mark the entry obsolete and its removal are re-ordered by either the compiler or the CPU, this code will not function correctly. You can avoid using a write memory barrier during update, as follows:

		Mark entry obsolete
		wmb(); /* write_barrier() later */
		Remove it from the list
		Schedule a RCU callback to free it after quiescent
		cycle
				

For example, on CPUs such as DEC Alpha, wmb() does not guarantee that writes are seen in order on the read side, even if there is a data dependency (setting of element pointer in the loop header and using element pointer in the loop body). For these CPUs, there must be an explicit rmb() to ensure that the writes are seen in order, as follows:

		For each element 
			rmb();
			if element is obsolete
				continue
			if element->value == key
				return element;
				

Most CPUs ensure read-side ordering if there is a data dependency. To avoid the unnecessary penalty, a new interface, read_barrier_depends(), is under development. This interface has an rmb() for Alpha only.