Minutes from 3/22 lse conference call

Martin Bligh - Hybrid user-kernel virtual address space

See the thread on linux-kernel for more information:
http://marc.theaimsgroup.com/?t=101665157900001&r=1&w=2

Normally the address space is split into user and kernel regions at the
3GB boundary. One problem with this is that page tables pushed into
highmem must be mapped before they can be altered. Martin would like to
create a hybrid user-kernel address space that is per-task like user
space, but protected like kernel space. It need not be very big; a few
MB would do. This area could come from either the current user space or
the current kernel space; initial thoughts are to take it from the user
area, above the user stack.

This area would have several possible uses:

1. A place to put the user pagetables.
2. An area where an efficient, scalable, per-task kmap could be created.
3. Possibly moving the task's kernel stack into this space.
4. Possibly moving the task_struct into this space.

There are two currently known problems:

1. If we take the memory from the current user space, the pagetables
   are not really per-task, they are per-address-space.
2. Not all accesses to a task's page tables are done from within that
   task's context (e.g. swapout).

Current plans for these problems:

1. A true per-task mapping is thought to require a separate PGD for
   each task, which would be somewhat inefficient for multi-threaded
   processes. A per-address-space mapping is much simpler, and covers
   the user-pagetable usage (though not the others). Stage 1 will be to
   implement the per-address-space area. We could also implement a
   per-address-space kmap, which would not be as good as a per-task
   kmap (it would still require locking), but would be much more
   scalable than the current system.
2. The initial plan is to use atomic kmap, as currently done by the
   pte-highmem patch.

Martin noted that kswapd doesn't use the current 3GB user area, and
this could be used as a huge kmap pool. Dave pointed out that the
address mapping is shared between all kernel threads, but thought this
could be easily fixed.

Going back to the usage list, items 3 and 4 are much more problematic.
Waitqueues are currently placed on the process's kernel stack, but
accessed globally. Task structures are used globally as well, but it
may be advantageous to create a secondary mapping into the user-kernel
area in order to locate the structure at a fixed virtual address
(though this requires the task_struct to be page aligned). Bill pointed
out that some architectures do virtual caching instead of physical
caching, so this wouldn't help them much. There was some discussion of
other methods of deriving current that would be cleaner than the
current kernel-stack address trick.

Martin mentioned that NUMA kernel text replication would pose similar
problems to the per-task user-kernel address space, and there was some
ensuing discussion, including the fact that ia64 already does this,
but nobody present knew how.

Bill Irwin - Status of pagemap_lru_lock subdivision in the -rmap VM

The pagemap_lru_lock protects a variety of nebulous things, which has
caused races and makes the code hard to maintain. The lock also seemed
to be a major point of contention on large-memory systems. Bill
recently found a race that prevented the system from running for very
long, and it looks like other races are still triggerable. With this
patch the system now runs for a while.

Rik said it might be fun to try two zones: a primary zone and a
fallback zone that is used only when the system is bound on page
allocation. It might not be useful, but would be fun to try (see the
sketch below).
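Purely as illustration, here is a toy user-space model of that
primary/fallback scheme, loosely mirroring the zonelist walk the kernel
allocator already does. All names here are hypothetical, not from the
-rmap patch or the real allocator:

	#include <stddef.h>
	#include <stdio.h>

	/* Toy model of Rik's idea (hypothetical names, not -rmap code):
	 * allocations are served from the primary zone, and the fallback
	 * zone is touched only when the primary runs dry. */
	struct zone {
		const char *name;
		long free_pages;
	};

	/* Try each zone in order, like a zonelist walk. */
	static struct zone *alloc_page_from(struct zone **zonelist)
	{
		for (struct zone **z = zonelist; *z != NULL; z++) {
			if ((*z)->free_pages > 0) {
				(*z)->free_pages--;
				return *z;
			}
		}
		return NULL;	/* every zone is exhausted */
	}

	int main(void)
	{
		struct zone primary  = { "primary",  1 };
		struct zone fallback = { "fallback", 1 };
		struct zone *zonelist[] = { &primary, &fallback, NULL };

		for (int i = 0; i < 3; i++) {
			struct zone *z = alloc_page_from(zonelist);
			printf("allocation %d: %s\n", i, z ? z->name : "failed");
		}
		return 0;
	}

The second allocation falls through to the fallback zone, and the third
fails; note that each extra zone adds another step to the walk, which is
exactly the cost Martin raises next.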
Martin said the problem with breaking things up into too many zones is
that it might take too long to find pages. Pat asked Bill if the reason
he wanted her discontigmem patch was to fake an SMP system into using
multiple zones, like a NUMA machine; Bill said yes, that is why. Bill
would like to see lockmeter data from Martin's NUMA machine running the
pagemap_lru_lock patches.

Hanna Linder - Get it right, people

Hanna verified that Arjan van de Ven's first name is pronounced
"ar-ian", not "ar-jan".

----------
minutes compiled by hannal@us.ibm.com
with significant editing by mjbligh@us.ibm.com