Read Copy Update HOWTO

Dipankar Sarma <dipankar@in.ibm.com>
Kristin Thomas <kristint@us.ibm.com>

2001-09-03

Revision History
Revision 0.1    2001-11-12    Revised by: KET

This HOWTO is a concise guide to the Read-Copy Update mutual exclusion
mechanism in the Linux kernel.

---------------------------------------------------------------------------

Copyright and Legal Notice

Copyright © 2001 IBM Corporation. All rights reserved.

This document may be reproduced or distributed in any form without prior
permission, provided the copyright notice is retained on all copies.
Modified versions of this document may be freely distributed, provided that
they are clearly identified as such and this copyright notice is included
intact.

This document is provided "AS IS," with no express or implied warranties.
Use the information in this document at your own risk.

Special Notices

Linux is a trademark of Linus Torvalds. Other company, product, and service
names may be trademarks or service marks of others.

---------------------------------------------------------------------------

Introduction

This document describes how to use the Read-Copy Update (RCU) mechanism in
the Linux kernel for scalable mutual exclusion. The HOWTO explains the
interfaces associated with RCU, provides examples, and shows how RCU can be
used to save CPU cycles.

---------------------------------------------------------------------------

What is Read-Copy Update?

Read-Copy Update (RCU) is a mutual exclusion primitive that takes advantage
of the event-driven nature of operating systems and provides a way to read
data structures shared between CPUs at essentially zero cost. For
read-mostly data structures, it reduces contention on conventional locks and
lock cache line bouncing by allowing lock-free reading of data.

RCU allows you to read a shared data structure as if no other CPU were
accessing it. When you need to update the data structure, you update the
global pointer to point to a new copy and keep the old copy around until all
the threads currently executing inside the kernel have completed. Because
any thread that starts after the pointer is updated can see only the new
copy, once those pre-existing threads have completed no CPU can hold a
reference to the old copy, and it can safely be deleted.

---------------------------------------------------------------------------

Why use Read-Copy Update?

The following list explains why you should use the RCU mechanism. It also
explains the motivation for RCU's development.

Increase in cost of conventional locks
    CPU speeds have been increasing by a factor of 1.5 to 2 per year for
    more than a decade. However, interconnect speeds (printed circuit boards
    and buses) have only been increasing by roughly 15% per year over the
    same period. This growing gap makes communication over the interconnect
    increasingly expensive relative to raw CPU speed, and it provides the
    motivation for constructing mutual-exclusion schemes that do not rely so
    heavily on communication over the relatively slow interconnects between
    the relatively fast CPUs. The RCU mechanism allows locking operations to
    be eliminated on read accesses to the data it protects.

Avoiding complicated races
    The RCU mechanism is not as susceptible to deadlock as explicit lock
    operations are. This reduced susceptibility to deadlock reduces or
    eliminates the code that would otherwise be required to detect and
    repair, or to avoid, deadlocks, which in turn reduces software
    development and maintenance costs.

Cache benefits of lock-free reads
    By allowing lock-free reading of data, RCU avoids thrashing of shared
    lock cache lines.
    In read-mostly situations, lock-free reading provides significant
    benefits. One such example is the synthetic "chat" benchmark, in which a
    number of cloned tasks share a files_struct. Even though a reader-writer
    lock is used to read the file pointer in fget(), the bouncing of the
    lock cache line severely impacts performance when a large number of CPUs
    are used. Using RCU, fget() can look up the file pointer without
    acquiring any lock, which significantly improves performance.

---------------------------------------------------------------------------

Additional Resources

Additional information about RCU can be found on the RCU homepage at
http://lse.sourceforge.net/locking/rcupdate.html. The white paper on RCU for
Linux was published at the Ottawa Linux Symposium, 2001. The paper is
available at
http://lse.sourceforge.net/locking/rclock_OLS.2001.05.01c.sc.pdf.

---------------------------------------------------------------------------

Description of RCU Primitives

The RCU mechanism consists of an interface that registers a callback
function. This function is called after all the CPUs in the system have gone
through at least one "quiescent" state (such as a context switch, the idle
loop, or user code). This callback mechanism can be used while updating
shared data structures. However, more than one update cannot be performed at
the same time, so updates must be serialized using conventional mutual
exclusion mechanisms. Typical update-side code follows this process:

 1. Lock the update side.

 2. Update the global pointer to the data structure (which embeds a struct
    rcu_head) with the new copy or NULL. After this step, any new kernel
    thread on another CPU can see only the new pointer.

 3. Register a callback with call_rcu() to free the old copy after all CPUs
    have gone through a quiescent point. This step ensures that when the old
    copy is freed, no CPU can hold a reference to it.

 4. Unlock the update side.

The header file that declares these interfaces is linux/rcupdate.h.

---------------------------------------------------------------------------

struct rcu_head

This structure contains all the information necessary for the RCU mechanism
to maintain pending updates. Each update corresponds to one struct rcu_head.
Typically, the rcu_head structure is embedded in the data structure that
uses RCU for updating and lookup.

struct rcu_head {
        struct list_head list;
        void (*func)(void *obj);
        void *arg;
};

#define RCU_HEAD_INIT(head) \
        { LIST_HEAD_INIT(head.list), NULL, NULL }

#define RCU_HEAD(head) \
        struct rcu_head head = RCU_HEAD_INIT(head)

#define INIT_RCU_HEAD(ptr) do { \
        INIT_LIST_HEAD(&(ptr)->list); (ptr)->func = NULL; \
        (ptr)->arg = NULL; \
} while (0)

---------------------------------------------------------------------------

call_rcu()

void call_rcu(struct rcu_head *head, void (*func)(void *arg), void *arg);

This call registers the callback function func, which is invoked after all
the CPUs have gone through at least one quiescent state. For performance
reasons, good implementations of RCU do not wait for completion of the
quiescent cycle. Instead, they queue the callback in a batch and return. The
callback is invoked later, when all the CPUs have gone through at least one
quiescent state. The callback function may be invoked from bottom-half
context, so the user must take that into consideration.

---------------------------------------------------------------------------

synchronize_kernel

void synchronize_kernel(void);

This interface waits until all the CPUs of the system have gone through a
quiescent state, such as a context switch. The interface blocks, so it can
be called from process context only. Calling it from process context is
useful in situations where the update of RCU-protected data is not
performance critical. For example, module unloading is not performance
critical, so it can use this interface if the module's data structures are
protected by RCU.
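To tie the primitives above together, the following is a minimal update-side
sketch that follows the four-step process described at the beginning of this
section. It is not taken from the kernel: struct my_data, my_global_ptr,
my_lock, my_update(), and my_free() are hypothetical names used only for
illustration, and the code assumes the three-argument call_rcu() interface
shown above. Only call_rcu(), struct rcu_head, wmb(), kfree(), and the
spinlock primitives are real kernel interfaces.

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <asm/system.h>                         /* wmb() */

/* Hypothetical RCU-protected object; the rcu_head is embedded in it. */
struct my_data {
        int value;
        struct rcu_head rcu;
};

static struct my_data *my_global_ptr;           /* read lock-free by readers */
static spinlock_t my_lock = SPIN_LOCK_UNLOCKED; /* serializes updaters */

/* RCU callback: runs after every CPU has passed through a quiescent state. */
static void my_free(void *obj)
{
        kfree(obj);
}

/*
 * Replace the globally visible object with a new, fully initialized
 * copy and defer freeing the old one.
 */
void my_update(struct my_data *newp)
{
        struct my_data *old;

        spin_lock(&my_lock);            /* 1. lock the update side */
        old = my_global_ptr;
        wmb();                          /* make newp's fields visible first */
        my_global_ptr = newp;           /* 2. update the global pointer */
        if (old != NULL)                /* 3. free old copy after quiescence */
                call_rcu(&old->rcu, my_free, old);
        spin_unlock(&my_lock);          /* 4. unlock the update side */
}

A reader simply dereferences my_global_ptr and uses the object without
taking my_lock, subject to the read-side ordering considerations discussed
in the next section.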
---------------------------------------------------------------------------

Use of Memory Barriers

With conventional locks, the ordering of reads and writes in critical
sections is taken care of implicitly by the locking primitives. However,
with lock-free reading of data, as supported by RCU, the ordering of memory
accesses by the CPUs becomes important. The Linux kernel already provides a
set of memory barrier operations. The barrier interfaces most commonly used
with RCU are wmb() and rmb().

Note: In future versions of the kernel, the names of these interfaces might
be changed to write_barrier() and read_barrier().

For example, consider a situation where the algorithm cannot tolerate
reading a stale copy of a data structure. You can handle this situation with
the following approach:

Update side:

    Mark entry obsolete
    Remove it from the list
    Schedule an RCU callback to free it after the quiescent cycle

Read side:

    For each element
        if element is obsolete
            continue
        if element->value == key
            return element

If the write that marks the entry obsolete and the writes that remove it
from the list are re-ordered by either the compiler or the CPU, this code
will not work correctly. You can prevent the re-ordering by using a write
memory barrier during the update, as follows:

    Mark entry obsolete
    wmb(); /* write_barrier() later */
    Remove it from the list
    Schedule an RCU callback to free it after the quiescent cycle

However, on some CPUs, such as DEC Alpha, wmb() does not guarantee that the
writes are seen in order on the read side, even if there is a data
dependency (setting the element pointer in the loop header and using the
element pointer in the loop body). For these CPUs, there must be an explicit
rmb() on the read side to ensure that the writes are seen in order, as
follows:

    For each element
        rmb();
        if element is obsolete
            continue
        if element->value == key
            return element

Most CPUs ensure read-side ordering when there is a data dependency. To
avoid the unnecessary penalty on those CPUs, a new interface,
read_barrier_depends(), is under development. This interface performs an
rmb() on Alpha only.
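The pseudocode above could be written in C along the following lines. This
is only a sketch under assumptions not stated in the HOWTO: struct my_elem,
my_list, my_lookup(), my_remove(), and my_elem_free() are hypothetical
names, the list is a plain NULL-terminated singly linked list rather than a
kernel list_head, and update-side serialization (for example, a spinlock
held by the caller of my_remove()) is assumed rather than shown. rmb(),
wmb(), call_rcu(), and kfree() are the kernel interfaces discussed in this
HOWTO.

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <asm/system.h>         /* rmb(), wmb() */

/* Hypothetical list element protected by RCU. */
struct my_elem {
        struct my_elem *next;
        int obsolete;           /* set before the entry is unlinked */
        int value;
        struct rcu_head rcu;
};

static struct my_elem *my_list;         /* head of the lock-free list */

/*
 * Read side: lock-free walk that skips obsolete entries.  The rmb()
 * stands in for read_barrier_depends(); it is only strictly needed on
 * CPUs, such as Alpha, that do not order dependent reads.
 */
struct my_elem *my_lookup(int key)
{
        struct my_elem *p;

        for (p = my_list; p != NULL; p = p->next) {
                rmb();          /* read_barrier_depends() later */
                if (p->obsolete)
                        continue;
                if (p->value == key)
                        return p;
        }
        return NULL;
}

static void my_elem_free(void *obj)
{
        kfree(obj);
}

/*
 * Update side: mark the entry obsolete before unlinking it, with a wmb()
 * so the two writes cannot be re-ordered, then defer the free until all
 * CPUs have passed through a quiescent state.
 */
void my_remove(struct my_elem *prev, struct my_elem *p)
{
        p->obsolete = 1;
        wmb();                  /* write_barrier() later */
        prev->next = p->next;   /* remove it from the list */
        call_rcu(&p->rcu, my_elem_free, p);
}

Removing the head of the list is omitted for brevity; my_remove() assumes
the element has a predecessor prev.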
---------------------------------------------------------------------------

Applications of RCU

This section describes several example applications of RCU.

---------------------------------------------------------------------------

RCU for a Hash Table with refcnt

One common application of RCU is a list structure protected by a global
mutex whose individual elements are protected by a per-element mutex.
Usually, searches of the list are more common than insertions or deletions.
Without call_rcu, the code for simple search and delete functions on a
circular, doubly-linked list of elements would need many lock round-trips
and some complicated deadlock-avoidance code. The following code, which uses
call_rcu, is much shorter and simpler:

struct element {
        struct element *next;
        struct element *prev;
        atomic_t refcnt;
        int address;
        int data;
        struct rcu_head rcu;
};

/*
 * Find the element with the specified address, increment its
 * reference count, and return a pointer to it.  Return NULL if
 * there is no such address.
 */
struct element *search(int addr)
{
        struct element *p;
        struct element *p0;

        p0 = &head[hashit(addr)];
        rmb();  /* read_barrier_depends() */
        p = p0->next;
        while (p != p0) {
                rmb();  /* read_barrier_depends() */
                if (p->address == addr) {
                        atomic_inc(&p->refcnt);
                        return (p);
                }
                p = p->next;
        }
        return ((struct element *)NULL);
}

/*
 * Insert an element into the hash table.  Updates are serialized
 * by the global hash_lock.
 */
void insert(struct element *p)
{
        struct element *p0;

        p0 = &head[hashit(p->address)];
        spin_lock(&hash_lock);
        p->prev = p0;
        p->next = p0->next;
        /*
         * Make sure that the next and prev pointers of the inserted
         * element are visible to other CPUs before the actual
         * insertion.  This allows a lock-free lookup to traverse
         * the entire bucket list safely.
         */
        wmb();
        p0->next->prev = p;
        p0->next = p;
        spin_unlock(&hash_lock);
}

void myfree(void *obj)
{
        struct element *p = obj;

        /*
         * Now it is safe to check refcnt, since no one can be
         * holding a stale pointer to this element obtained during
         * a lock-free search().
         */
        if (atomic_read(&p->refcnt)) {
                /*
                 * Let some other garbage collection free it.
                 */
                return;
        }
        kfree(obj);
}

/*
 * Delete the specified element from the table.  The element
 * must be locked.
 */
void delete(struct element *p)
{
        /*
         * Get the global lock.  There is no possible deadlock,
         * since the global lock is now always acquired after the
         * per-element lock.
         */
        spin_lock(&hash_lock);

        /*
         * Delete the element from the list, release the lock,
         * schedule the deferred free, and return.
         */
        p->next->prev = p->prev;
        p->prev->next = p->next;
        spin_unlock(&hash_lock);
        call_rcu(&p->rcu, myfree, p);
        return;
}

---------------------------------------------------------------------------

RCU for use with kmem_cache_free()

Unlike [kv]free(), kmem_cache_free() requires an additional argument: a
pointer to the cache from which the memory being freed was allocated. This
extra pointer can be a problem for RCU, because the callback receives only a
single argument. You can avoid this problem by using a specific destroyer
function for each cache that requires deferred freeing. If the cache pointer
is a global variable, the destroyer function can use it directly and invoke
kmem_cache_free() from the callback. Otherwise, the cache pointer can be
embedded in the data structure along with the struct rcu_head, and the
destroyer function can retrieve it from there.

If the cache pointer is available as a global variable, the code is as
follows:

static kmem_cache_t *xxx_cache;

struct xxx_entry {
        struct xxx_entry *next;
        struct xxx_entry *prev;
        int flag;
        int info;
        struct rcu_head rch;
};

void xxx_destroy(void *ptr)
{
        struct xxx_entry *xp = (struct xxx_entry *)ptr;
        kmem_cache_free(xxx_cache, (void *)xp);
}

void xxx_delete(struct xxx_entry *xp)
{
        call_rcu(&xp->rch, xxx_destroy, xp);
}

If the cache pointer is not available as a global variable, it can be
embedded in the structure and accessed during destruction as follows:

struct xxx_entry {
        struct xxx_entry *next;
        struct xxx_entry *prev;
        int flag;
        int info;
        kmem_cache_t *cachep;
        struct rcu_head rch;
};

void xxx_destroy(void *ptr)
{
        struct xxx_entry *xp = (struct xxx_entry *)ptr;
        kmem_cache_free(xp->cachep, (void *)xp);
}

void xxx_delete(struct xxx_entry *xp)
{
        call_rcu(&xp->rch, xxx_destroy, xp);
}

---------------------------------------------------------------------------

To Do

---------------------------------------------------------------------------

Contributions