LSE Con Call Minutes for May 17

I. VM discussion and status with Andrea Arcangeli.

o ZONE_NORMAL over 1G phys on NUMAQ
o config_nonlinear on iSeries
o pte-highmem/highpte scalability
o roadmap for atomic-persistent user-address-space-based kmaps in 2.5
o ZONE_NORMAL exhaustion.
o 4GB - 4GB split

	1. ZONE_NORMAL over 1G phys on NUMAQ   

Andrea is not convinced the NUMAQ(tm) architecture needs the whole
config nonlinear patch.  Martin Bligh said they have been
thinking of NUMAQ's needs as a special case of config nonlinear.
NUMAQ needs only about half of the config nonlinear changes Daniel
has done, not all of them.

Martin Bligh said there is still an interesting problem with
doing DMA to a ZONE_NORMAL above 4GB.  Andrea said it is
a secondary problem.  If you never use a pci32 device then
it is not a problem.

Martin said we should separate out the DMA assumptions and let
the allocator sort out where it will get memory from.  Andrea
suggested adding a flag to explicitly specify ZONE_NORMAL memory
above 4GB on other nodes.  Martin liked that idea.

	2. config_nonlinear on iSeries 

Andrea said one architecture where config nonlinear is absolutely
required is iSeries(tm).  It can have huge holes in physical memory
spread over a large address space.  Without Daniel's patch, managing
memory could be very slow.  This is an architecture where discontig
mem doesn't work well.  If you have hundreds of holes you absolutely
need the nonlinear patches.
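
As an illustration only (this sketch was not presented on the call
and is not Daniel's patch), one way to see why a nonlinear mapping
copes with many holes is a table that maps fixed-size logical chunks
to physical chunks.  The lookup cost stays constant no matter how many
holes sit between the physical chunks.  The chunk size, the table
contents and all names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameters, chosen only for illustration. */
#define CHUNK_SHIFT        4u                   /* 16 pages per chunk */
#define CHUNK_PAGES        (1u << CHUNK_SHIFT)
#define CHUNK_MASK         (CHUNK_PAGES - 1u)
#define NR_LOGICAL_CHUNKS  4u

/*
 * Logical chunk index -> physical chunk index.  The gaps between the
 * physical chunks (3..6 and 9..39 here) are the holes; they never
 * show up on the logical side, which stays dense.
 */
static const uint32_t chunk_to_phys[NR_LOGICAL_CHUNKS] = { 2, 7, 8, 40 };

/* Translate a logical page frame number to a physical one in O(1). */
static uint32_t logical_to_phys_pfn(uint32_t logical_pfn)
{
    uint32_t chunk = logical_pfn >> CHUNK_SHIFT;

    assert(chunk < NR_LOGICAL_CHUNKS);
    return (chunk_to_phys[chunk] << CHUNK_SHIFT) |
           (logical_pfn & CHUNK_MASK);
}
```

With this table, logical page 17 lies in logical chunk 1 at offset 1,
so it lands in physical chunk 7 at the same offset.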

Andrea doesn't have access to a system with this architecture and
asked if it would be a big problem for the system administrator.
Dave Englebret from IBM iSeries said the memory is not visible to
the administrator.  The hypervisor doesn't give the sysadmin access
to the memory; Linux doesn't see any holes because of the virtual
mapping.

Dave said one thing iSeries would like to see in Linux as soon as
it is ready is CPU hotplug, which is already supported in OS/400.

Martin asked if they would have a problem with assumptions about
physically contiguous memory being broken.  Dave Englebret said they
don't have to worry about physically noncontiguous memory because
all their memory is mapped virtually contiguous anyway.  DMA
doesn't need physically contiguous memory on iSeries either.

	3. pte-highmem/highpte scalability 

Martin Bligh said Andrea's changes appear to have fixed the 
problem they were having on the NUMAQ. 

Dave McKracken talked about reserving a portion of user address
space specifically for mapping pagetables. However, there seems 
to be a performance problem due to the extra allocations.

	4. atomic-persistent user-address-space-based kmaps

Andrea said we should get per-cpu persistent kmaps working first
before working on highpte for big memory issues.  Bill said Manfred
Spraul has already been working on per-cpu persistent kmaps and
it mostly works, so we can look into other stuff.  Andrea said
there is still some work to be done with per-cpu persistent kmaps.
We need to run more benchmarks to help figure out the best solutions.

	5. Zone normal exhaustion and 4gb/4gb split

Bill wondered whether "bigmem" might be reinstated as a config
option to help with this problem.  That was a patch from Kanoj in the
2.2 kernel timeframe that would switch address spaces whenever
doing a context switch.  The kernel wouldn't have to map so much
other stuff but would switch much more often.  Andrea suggested
kmapping struct page for real 64GB systems.

The 4GB/4GB split was Ben LaHaise's idea of having 4GB for the kernel
and 4GB for user, then using kmap whenever the kernel wants to access
user pages.  Andrea said that would still be very invasive but
potentially less so than kmapping struct page.  Bill said it would
only be used for large 32-bit systems via a config option.  Currently
they don't work, so we need some solution.
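
A rough back-of-the-envelope sketch (not from the call) of why large
32-bit systems run out of ZONE_NORMAL: every physical page needs a
struct page, and that array has to live in low memory.  The 4KB page
size, the 64-byte struct page and the 896MB of low memory below are
assumptions for illustration; the real numbers depend on the kernel
version and configuration.

```c
#include <assert.h>
#include <stdint.h>

/* All three values are assumptions, not figures from the minutes. */
#define PAGE_SIZE_BYTES    4096ull          /* assumed 4KB pages */
#define STRUCT_PAGE_BYTES  64ull            /* assumed struct page size */
#define LOWMEM_BYTES       (896ull << 20)   /* assumed low memory size */

/* Bytes of struct page needed to describe phys_bytes of RAM. */
static uint64_t mem_map_bytes(uint64_t phys_bytes)
{
    assert(phys_bytes % PAGE_SIZE_BYTES == 0);  /* whole pages only */
    return (phys_bytes / PAGE_SIZE_BYTES) * STRUCT_PAGE_BYTES;
}

/* Nonzero if the struct page array alone still fits in low memory. */
static int mem_map_fits_lowmem(uint64_t phys_bytes)
{
    return mem_map_bytes(phys_bytes) <= LOWMEM_BYTES;
}
```

Under these assumptions a 64GB machine needs 1GB just for its struct
page array, which is more than the assumed low memory, while a 4GB
machine needs only 64MB.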

Bill and Martin asked if it might be accepted into the mainline
as a config option.  Andrea said it is up to Linus, not him, but if
it could be made clean and not intrude too much it might get in.

----------
minutes compiled by Hanna Linder (hannal@us.ibm.com)