To: kumon@flab.fujitsu.co.jp
cc: Kanoj Sarcar , Paul McKenney/Beaverton/IBM@IBMUS, andi@suse.de, andrea@suse.de, aono@ed2.com1.fc.nec.co.jp, beckman@turbolabs.com, bjorn_helgaas@hp.com, Jerry.Harrow@Compaq.com, jwright@engr.sgi.com, kanoj@engr.sgi.com, kumon@flab.fujitsu.co.jp, norton@mclinux.com, suganuma@hpc.bs1.fc.nec.co.jp, sunil.saxena@intel.com, tbutler@hi.com, woodman@missioncriticallinux.com, mikek@sequent.com, Kenneth Rozendal/Austin/IBM@IBMUS, Pratap Pattnaik/Watson/IBM, Shailabh Nagar/Watson/IBM
Date: 04/02/01 06:28 AM
From: Hubertus Franke/Watson/IBM@IBMUS
Subject: Re: NUMA-on-Linux roadmap, version 2
Yes, I agree with all that.
My point was that NUMA tackles many of the scaling problems
that appear on large-scale systems:
(a) multiple kswapd's,
(b) multiple memory pools,
(c) multiple schedulers with load-balancing on top,
(d) limited interrupt routing,
(e) multi-path I/O.
Naturally one thinks that these align nicely with nodes.
But they can be equally applied to large-scale SMPs (non-NUMA)
if, for instance, lock contention is observed (e.g. on memory pools).
I just want to point out that we should keep that in mind and use
these scenarios as a tool to make an argument for early adoption in
the base kernel.
Another point that came across in the discussions: as the ratio of
(cache-miss latency / local-memory-access latency) grows
substantially larger than the ratio of (remote / local) memory
latency, NUMA becomes more and more an architectural artifact.
This strongly suggests to me that we should treat large SMP and
NUMA the same and focus strongly on cache affinity. Only when cache
affinity is hard to achieve should NUMA issues, e.g. process
migration, come into play.
Looking at points (a)-(e), nothing prevents us from providing
these on large SMPs. Should the system be NUMA, of course we
align them with the NUMA boundaries.
Hubertus Franke
Enterprise Linux Group (Mgr), Linux Technology Center (Member Scalability) , OS-PIC (Chair)
email: frankeh@us.ibm.com
(w) 914-945-2003 (fax) 914-945-4425 TL: 862-2003
Please respond to kumon@flab.fujitsu.co.jp
To: Hubertus Franke/Watson/IBM@IBMUS
cc: Kanoj Sarcar , Paul McKenney/Beaverton/IBM@IBMUS, andi@suse.de, andrea@suse.de, aono@ed2.com1.fc.nec.co.jp, beckman@turbolabs.com, bjorn_helgaas@hp.com, Jerry.Harrow@Compaq.com, jwright@engr.sgi.com, kanoj@engr.sgi.com, kumon@flab.fujitsu.co.jp, norton@mclinux.com, suganuma@hpc.bs1.fc.nec.co.jp, sunil.saxena@intel.com, tbutler@hi.com, woodman@missioncriticallinux.com, mikek@sequent.com, Kenneth Rozendal/Austin/IBM@IBMUS, kumon@flab.fujitsu.co.jp
Subject: Re: NUMA-on-Linux roadmap, version 2
I definitely agree with you, Hubertus.
Hubertus Franke writes:
> I already pointed out to Paul that some of the API, such as cpu-binding
> and memory binding applies to large scale SMP as well.
All of the optimization we have done for NUMA is also useful for
SMP. A large cache can be treated as local memory. In the past
year, at least, I posted SMP code improvements to LKML based on
our NUMA-machine experience.
> In that light, it might be useful to restate our goal to
> "SMP and NUMA-API" and start layering it.
In general, NUMA has longer memory access times, and therefore NUMA
is much more sensitive to "SMP optimization".
But at some points, NUMA optimization differs from SMP:
a process on NUMA has an origin of memory, whereas SMP presents
a symmetric view of memory.
--
Kouichi Kumon
Computer Systems Laboratory, Fujitsu Labs.
kumon@flab.fujitsu.co.jp