Topology API
Version: 0.4
Date: October 25, 2002
Revision History
Versions 0.1 and 0.2 by Paul McKenney, IBM, 2001
Version 0.3 by Michael Hohnbaum, IBM, February 11, 2002
Version 0.4 by Matthew Dobson, IBM, September 6, 2002
Table of Contents
- General
- Include Files
- CPUs, Memory Blocks, and Nodes
- Definitions
- CPU
- Memory Block
- Node
- Numbering
- Simple Topology Discovery
- Future Extensions
General
This Topology API is intended to be a "simple" API that provides
rudimentary topology discovery of processors, memory blocks, (I/O
busses), and nodes. The API is defined in such a way that it is
architecture-agnostic and can (hopefully) be mapped onto any platform.
As such, it has only CPUs, Memory Blocks, and Nodes as basic building
blocks.
Include Files
The definitions are accessed via:
#include <asm/topology.h>
Each architecture (and sub-architecture) should define its own versions
of the functions specified below. For now, any architecture that has not
written arch-specific code for its topology.h file automatically uses
the asm-generic version, which implements generic, non-NUMA versions of
the calls.
CPUs, Memory Blocks, and Nodes
The purpose of this API is to create a useful, flexible, generic
topology infrastructure for the Linux kernel, and also to export that
infrastructure in a meaningful way to userspace. To that end, we try to
keep the assumptions about the underlying hardware to a minimum. As
mentioned above, this is done by specifying only the largest and most
important system components in the topology.
Definitions
CPU
This is a straightforward definition. A CPU in the topology
represents an actual, physical CPU in the system.
Memory Block
A Memory Block is defined to be a physically contiguous block of
memory. Memory Block is often written more concisely as memblk.
There has been much discussion about whether to allow multiple memblks
per node, enforce a strict 1-to-1 memblk-to-node mapping, allow nodes
to have only zero or one memblk, or do away with the memblk concept
entirely. Right now, the API assumes that there are zero or one memblks
per node. This is not hard-coded anywhere, so it could change in the
future if necessary. The CONFIG_NONLINEAR option may make dealing with
multiple memblks per node unnecessary.
Node
A Node in the Topology API is no more and no less than an abstract
container for other topology elements. The node often approximates
the 'physical node' building block that the underlying system may be
composed of, but I cannot emphasize this strongly enough: they are NOT
the same! The node is simply a container for CPUs and Memory Blocks
(and System Busses, and ...). The reason for this is that not all
architectures are composed of 'physical nodes' per se. Even among
platforms that are composed of 'physical nodes', the 'physical nodes'
of one platform may differ from those of another. For these reasons,
the node is used only as an abstract container type.
Numbering
In architectures that do not allow CPUs and nodes to be dynamically
added to a running system without a reboot, CPUs and nodes are numbered
consecutively from zero. Each node's CPUs are numbered consecutively.
Systems that can dynamically remove CPUs or nodes from a running system
may have "holes" in the numbering scheme. However, if new CPUs are
introduced, they will appear in the same range as other CPUs on the
same node, and if new nodes are introduced, their CPUs will be consecutively
numbered. CPUs from different nodes are never interleaved.
This means that if a node has the capacity to have additional CPUs added
to it, space must be left in the numbering scheme to accommodate those
additional CPUs.
A NUMA node may contain zero, one, or more memory blocks. As the
Linux kernel in general does not support multiple pgdats per node, the
topology does not explicitly support multiple memory blocks per node.
While a one-to-one relationship between memory blocks and NUMA nodes is
possible, it is not guaranteed, and NUMA implementations that do not
adhere to this relationship are expected to exist. Each memory
block has a distinct memory block number. The NUMA topology description
provides this numbering and the linkage between memory blocks and NUMA
nodes.
See the
rationale [update this too!] for example numberings for different
architectures.
Simple Topology Discovery
Userspace topology discovery is provided via driverfs. This
is being further developed by Matthew Dobson to provide a more complete
topology discovery and reporting mechanism. There are patches
to the kernel currently available [insert link], and the code is also
a part of Andrew Morton's experimental tree [link to Andrew's tree].
It is also useful to have a C-language API providing an efficient and
simple means of getting a few critical pieces of information. The following
functions are defined here as they are necessary for supporting a minimal
Topology API.
- int __cpu_to_node(int cpu);
Given a CPU number, return the number of the node containing that
CPU. If the architecture supports hierarchical NUMA (nodes containing
other nodes), then the lowest level node (i.e., the node most immediately
containing the cpu) is returned.
Returns a node number, or a negative errno if an error occurs.
- int __memblk_to_node(int memblk);
Given a memory block number, return the number of the node containing
that memblk. If the architecture supports hierarchical NUMA (nodes containing
other nodes), then the lowest level node (i.e., the node most immediately
containing the memblk) is returned.
Returns a node number, or a negative errno if an error occurs.
- int __parent_node(int node);
Given a node number, return the number of the parent node (i.e.,
the node that contains it). This is useful for hierarchical NUMA
machines which may have nested NUMA nodes.
Returns a node number, or a negative errno if an error occurs.
- unsigned long __node_to_cpu_mask(int node);
Given a node number, return a bitmask of the CPUs on that node.
This interface may soon be changed to take a pointer to a bitmask as
an additional argument, as there is motivation to allow more CPUs than
BITS_PER_LONG.
Returns a CPU bitmask. An empty bitmask may signify either a node
with no CPUs or an invalid node number.
- int __node_to_memblk(int node);
Given a node number, return the number of the first memory block
on that node.
Returns a memblk number, or a negative errno if an error occurs.
- int get_curr_cpu(void);
The currently executing CPU. This is similar to smp_processor_id(),
but will be available at user level.
Returns a CPU number.
- int get_curr_node(void);
The NUMA node containing the currently executing CPU.
Returns a node number.
Future Extensions
There are additional capabilities that could be implemented with
the Topology API. Some have been identified and are listed below:
- Functions to bind/restrict to a node and obtain node
binding/restriction information. This capability can be obtained
by using the CPU and memblk specific calls. [Insert link for MemBind
API]
- Virtual address or page to memory block function
(i.e., __va_to_memblk(), __page_to_memblk()).
- Adding various I/O (e.g., PCI) busses to the base elements of the
topology: __pcibus_to_node(). This allows for things like Multi-Path
I/O and intelligent bindings for I/O-intensive processes.