LSE Conference Call Minutes 12/14/2001

Attendees : Paul Jackson, Jack Steiner, Gerrit Huizenga, Rick Lindsey,
            Steve Carbonari, Michael Hohnbaum, Pat Gaughen, Badri Pulavarty,
            Shailabh Nagar, John Hawkes, Simon Winwood, John Stultz,
            Bill Irwin, Ruth Forester


I. New member introduction : Simon Winwood is interning with IBM Research
in the Enterprise Linux Group. He is currently working on Linux large page
support.


II. John Stultz - MCS lock overview

      John introduced MCS locks, an implementation of a lock proposed in a
classic paper by J.M. Mellor-Crummey and Michael L. Scott. John described the
lock implementation, in which each CPU spins on its own cache line, locking is
an atomic operation, and unlocking is atomic when passing the lock to another
waiting CPU. The lock can be used anywhere but is best suited to locks that are
acquired and released within the same function (a block of memory has to be
passed around; if lock and release are in the same function, it can be
allocated on the stack).
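
      The queueing scheme John described can be sketched in user-space C as
follows. This is only an illustration under stated assumptions, not the patch
discussed on the call: it uses C11 atomics and pthreads, and every name in it
(mcs_node, mcs_lock, mcs_acquire, mcs_release, mcs_demo) is hypothetical. The
sched_yield() calls in the wait loops are a user-space courtesy added for the
demo; kernel code would presumably busy-wait instead.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; a sketch, not kernel code. */
struct mcs_node {                      /* the "block of memory" passed around */
    _Atomic(struct mcs_node *) next;   /* successor in the wait queue */
    atomic_bool locked;                /* true while this waiter must spin */
};

struct mcs_lock {
    _Atomic(struct mcs_node *) tail;   /* last waiter, NULL when lock is free */
};

static void mcs_acquire(struct mcs_lock *lock, struct mcs_node *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);

    /* A single atomic swap appends this node to the queue. */
    struct mcs_node *prev = atomic_exchange(&lock->tail, me);
    if (prev == NULL)
        return;                        /* queue was empty: lock acquired */

    atomic_store(&prev->next, me);     /* link in behind the predecessor */
    while (atomic_load(&me->locked))   /* wait on our own node only */
        sched_yield();
}

static void mcs_release(struct mcs_lock *lock, struct mcs_node *me)
{
    struct mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        /* No visible successor: try to swing the tail back to NULL. */
        struct mcs_node *expected = me;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;
        /* A waiter has swapped itself in but not linked yet; wait for it. */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();
    }
    atomic_store(&succ->locked, false); /* hand the lock to the successor */
}

#define NTHREADS 4
#define NITERS   10000

static struct mcs_lock demo_lock;
static long demo_counter;              /* protected by demo_lock */

static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        struct mcs_node node;          /* lives on the stack, as in the minutes */
        mcs_acquire(&demo_lock, &node);
        demo_counter++;
        mcs_release(&demo_lock, &node);
    }
    return NULL;
}

/* Runs the workers and returns the final counter value. */
long mcs_demo(void)
{
    pthread_t threads[NTHREADS];

    atomic_store(&demo_lock.tail, NULL);
    demo_counter = 0;
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, demo_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    return demo_counter;
}
```

      Because acquire and release sit in the same function (demo_worker), the
queue node can be an ordinary local variable, which is the stack-allocation
case mentioned above.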

      A few tests on high-contention NUMA machines showed that MCS locks
performed well at the high end. Preliminary testing with "hackbench" showed a
7-30% improvement.

      Jack Steiner replaced the big kernel lock with an MCS lock and saw a 10%
improvement in CPU time and higher throughput on a 32-processor NUMA machine
running AIM7.

      There are no patent issues on the lock that John is aware of.

----------

III. Bill Irwin - Reverse Mapping VM

      Bill introduced the high-performance Linux VM effort being spearheaded
by Rik van Riel. The notable points about the VM are
 - reverse mappings
 - page table scanning eliminated : savings in page launder
 - time spent in fork() is 4-6 times that of current fork()
 - the LRU list will be per zone
 - PT chains will be incrementally copied rather than all at once on fork
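
The idea behind the reverse mappings and PT chains listed above can be
sketched as follows: each page keeps a chain of pointers to the page table
entries that map it, so unmapping the page means walking that chain instead
of scanning page tables. This is a simplified user-space illustration, not
the code from the VM tree; pte_t is a stand-in type and all names (pte_chain,
rmap_add, rmap_unmap_all) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned long pte_t;        /* stand-in for a page table entry */

struct pte_chain {                  /* one link per mapping of the page */
    struct pte_chain *next;
    pte_t *ptep;                    /* the PTE that maps this page */
};

struct page {
    struct pte_chain *chain;        /* per-page space overhead of the scheme */
};

/* Record that *ptep maps this page. Returns 0 on success, -1 on failure. */
int rmap_add(struct page *page, pte_t *ptep)
{
    assert(page != NULL && ptep != NULL);

    struct pte_chain *pc = malloc(sizeof(*pc));
    if (pc == NULL)
        return -1;
    pc->ptep = ptep;
    pc->next = page->chain;
    page->chain = pc;
    return 0;
}

/* Clear every PTE mapping this page; returns how many were cleared. */
int rmap_unmap_all(struct page *page)
{
    int cleared = 0;

    while (page->chain != NULL) {
        struct pte_chain *pc = page->chain;
        *pc->ptep = 0;              /* invalidate the mapping */
        page->chain = pc->next;
        free(pc);
        cleared++;
    }
    return cleared;
}
```

The cost of unmapping is proportional to the number of mappings of that one
page, at the price of the chain storage kept per page.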

Bill wanted feedback from LSE members on the effort: whether it was addressing
relevant issues, whether algorithmic improvements were possible, and what
scalability features were needed. He also invited participation in the effort.

In response to various questions from Gerrit, Ruth and Badri, the following
points came out:

- the space overhead included the PT chains in the page structure. The space
  overhead wasn't quantified but was not a concern right now.
- response time overhead came from faulting in extra chains.
- locking issues worked out fine if modifications were localized. Making the
  division of labour too fine-grained would complicate locking so it was being
  kept simple.
- for IA-32 large page sizes, the reverse mapping macros would have to change
- Simon Winwood's work would also benefit the effort. Gerrit would link them
  offline.
- Similar work on PTX for Oracle had shown good TLB savings
- the VM is also looking at BSD-style mem objects
- a revised implementation of the page cache using radix trees was being done.
  An initial profiling by Anton Blanchard, using splay trees, had seen good
  results on a 12-way SMP. The results had been posted on lkml
- page launder used to be one of the top 10 CPU users (fixed by reverse
  mappings)

There was some discussion on general VM stability problems which showed up on
machines running databases. Ruth mentioned her experiences with various kernels
and that OSDL had some database workloads available for general use now. It
would be useful to do regression testing on the recent 2.4 kernel versions to
determine where the VM started showing problems.

The VM is described at http://linuxvm.bkbits.com

----------------


(Minutes compiled by Shailabh Nagar)