To: kumon@flab.fujitsu.co.jp
cc: Kanoj Sarcar, Paul McKenney/Beaverton/IBM@IBMUS, andi@suse.de, andrea@suse.de, aono@ed2.com1.fc.nec.co.jp, beckman@turbolabs.com, bjorn_helgaas@hp.com, Jerry.Harrow@Compaq.com, jwright@engr.sgi.com, kanoj@engr.sgi.com, kumon@flab.fujitsu.co.jp, norton@mclinux.com, suganuma@hpc.bs1.fc.nec.co.jp, sunil.saxena@intel.com, tbutler@hi.com, woodman@missioncriticallinux.com, mikek@sequent.com, Kenneth Rozendal/Austin/IBM@IBMUS, Pratap Pattnaik/Watson/IBM, Shailabh Nagar/Watson/IBM
Date: 04/02/01 06:28 AM
From: Hubertus Franke/Watson/IBM@IBMUS
Subject: Re: NUMA-on-Linux roadmap, version 2

Yes, I agree with all that.

My point was that NUMA tackles many of the scaling problems that appear on large-scale systems: (a) multiple kswapd's, (b) multiple memory pools, (c) multiple schedulers with load-balancing on top, (d) limited interrupt routing, and (e) multi-path I/O. Naturally one thinks that these align nicely with nodes, but they can be equally applied to large-scale SMPs (non-NUMA) if, for instance, lock contention is observed (e.g. on memory pools). I just want to point out that we should keep that in mind and use these scenarios as a tool to make an argument for early adoption in the base kernel.

Another point that came across from the discussions was that, with the (cache-miss / local memory access) latency ratio getting substantially larger than the (remote / local memory) latency ratio, NUMA becomes more and more an architectural artifact. It strongly suggests to me that we should treat large SMP and NUMA the same and focus strongly on cache affinity. Only when cache affinity is hard to achieve should NUMA issues, e.g. moving processes, come into play. Looking at some of the points (a)-(e), nothing prevents us from providing this on large SMPs. Should the system be NUMA, of course we align these with the NUMA boundaries.
Hubertus Franke
Enterprise Linux Group (Mgr), Linux Technology Center (Member Scalability), OS-PIC (Chair)
email: frankeh@us.ibm.com
(w) 914-945-2003 (fax) 914-945-4425 TL: 862-2003

Please respond to kumon@flab.fujitsu.co.jp
To: Hubertus Franke/Watson/IBM@IBMUS
cc: Kanoj Sarcar, Paul McKenney/Beaverton/IBM@IBMUS, andi@suse.de, andrea@suse.de, aono@ed2.com1.fc.nec.co.jp, beckman@turbolabs.com, bjorn_helgaas@hp.com, Jerry.Harrow@Compaq.com, jwright@engr.sgi.com, kanoj@engr.sgi.com, kumon@flab.fujitsu.co.jp, norton@mclinux.com, suganuma@hpc.bs1.fc.nec.co.jp, sunil.saxena@intel.com, tbutler@hi.com, woodman@missioncriticallinux.com, mikek@sequent.com, Kenneth Rozendal/Austin/IBM@IBMUS, kumon@flab.fujitsu.co.jp
Subject: Re: NUMA-on-Linux roadmap, version 2

I definitely agree with you, Hubertus.

Hubertus Franke writes:
> I already pointed out to Paul that some of the API, such as cpu-binding
> and memory binding applies to large scale SMP as well.

All of the optimization we've done on NUMA is also useful for SMP. A large cache can be treated as a local memory. In the past year I posted SMP code improvements to LKML based on our NUMA machine experience.

> In that light, it might be useful to restate our goal to
> "SMP and NUMA-API" and start layering it.

In general, NUMA has longer memory access times and is therefore more sensitive to "SMP optimization". But at some points, NUMA optimization differs from SMP optimization, because a process on NUMA has an origin of memory, whereas SMP has a symmetrical memory view.

--
Kouichi Kumon
Computer Systems Laboratory, Fujitsu Labs.
kumon@flab.fujitsu.co.jp