reflex 1.0.0 : benchmark for evaluating the Linux scheduler
===========================================================

by Shailabh Nagar (nagar@us.ibm.com)
Based on sched_test_yield by Bill Hartner, bhartner@us.ibm.com

Algorithm :
-----------

The benchmark's goal is to provide a sufficient number of tunable knobs
with which to test kernel scheduler performance, while introducing minimal
overhead and dependencies.

The program begins with the parent cloning several threads. The threads
group themselves into "active sets". Only one thread from each active set
is on the runqueue at any given time (except when a handoff of the token
is performed - see below). By choosing the number of threads spawned and
the number of active sets, the average number of runnable tasks can be
controlled.

Each thread executes a "round" consisting of

  Either Path I  : 1. compute
                   2. yield

  Or     Path II : 1. compute
                   2. send token to successor
                   3. block (by waiting for token from predecessor)

The probability of following Path I or II is decided by the input parameter
w, which is the weight to be assigned to reschedule_idle(). Path II causes
more calls to reschedule_idle() as it causes more sleeps/wakeups (on
messages in a pipe), while Path I exercises the schedule() part of the
scheduler.

Tokens are one-byte messages and are handed off to other members of the
active set through pipes. Pipes are used instead of semaphores/sockets etc.
because lock contention on them is lower (it is restricted to contention
amongst the readers/writers of the pipe alone - not a system-wide IPC lock,
as is the case with semaphores etc.)

The compute phase of a round consists of a small calibrated function being
executed repeatedly for a user-specified amount of time (in the microsecond
range). The compute part is present to model realistic workloads and to
control the number of scheduler invocations (caused by the yield/block
parts of the round). Slight randomness is introduced in the compute time
of each thread to avoid scheduling patterns forming.

The overall performance metric is the total number of rounds done by all
children. This is expressed as microseconds/round : the higher the number,
the worse the scheduler's performance. Naturally, this metric should be
used carefully if it is the sole arbiter of scheduler performance, as it
may not adequately reflect "desired" scheduler behaviour.
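To make the round structure concrete, the sketch below shows roughly what
one round looks like in C. This is not the reflex source : the helper
names, the jitter value and the per-thread pipe layout (each thread reading
from its own pipe and writing to its successor's pipe within the active
set) are assumptions made purely for illustration.

    /* Illustrative sketch of one benchmark round (not the reflex source). */
    #include <sched.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* Spin for roughly 'usec' microseconds; stands in for the calibrated
     * compute function described above. */
    static void do_compute(long usec)
    {
        struct timeval start, now;
        gettimeofday(&start, NULL);
        do {
            gettimeofday(&now, NULL);
        } while ((now.tv_sec - start.tv_sec) * 1000000L +
                 (now.tv_usec - start.tv_usec) < usec);
    }

    /* One round : compute, then either yield (Path I) or hand the token
     * to the successor and block on our own pipe (Path II). */
    static void one_round(int w, long r_usec, int my_read_fd, int succ_write_fd)
    {
        char token = 1;

        /* compute phase; a small random jitter avoids lock-step patterns */
        do_compute(r_usec + rand() % 3);

        if (rand() % 100 >= w) {
            sched_yield();                     /* Path I  : probability (100-w)/100 */
        } else {
            (void) write(succ_write_fd, &token, 1); /* Path II : wake the successor */
            (void) read(my_read_fd, &token, 1);     /*           then sleep until woken */
        }
    }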
Runtime parameters :
--------------------

1) w : legal values 0-100
   Runtime parameter w determines the weight of "reschedule_idle" as
   follows :
        probability of sched_yield = (100-w)/100
   The higher w is, the fewer times sched_yield is called and hence the
   fewer the invocations of schedule(). The relative number of calls to
   reschedule_idle and schedule has to be determined experimentally for a
   given number of threads by tweaking the w parameter.

2) r : legal values 0..infinity
   Runtime parameter r determines the number of microseconds to be spent
   in the compute phase.

3) c : number of children/threads to be launched
   Currently this has to be even (see below).

4) a : number of active children == number of active sets
   The subset of c which is going to be on the runqueue. If an active set
   has only one member in it, that thread will never block. For a very
   brief interval, there could be two threads on the runqueue from the same
   active set (since handoff of a token isn't atomic), but the path length
   from token handoff to blocking is short enough that this shouldn't be a
   problem.

5) t : time (in seconds) for which each run of the test should be performed
   Several runs are done until the primary metric, microseconds/round,
   reaches a 95% confidence level (a sketch of one possible stopping rule
   is given at the end of this file).

6) o : output format
   2 is most suitable for processing output using scripts.

7) q : quiet mode (no extraneous printfs)

Improvements/todos :
--------------------

The effect of the parameters w, r, c and a on the metric us/round has been
tested on a 4-way Pentium-II SMP using simple profiling of the kernel
scheduler. Lower values of w cause schedule() calls to increase, and values
around 75-90 cause the number of schedule() and reschedule_idle() calls to
be roughly the same. Higher r values increase us/round. Values of t in the
range 20-50 seconds are adequate to reach 95% confidence in a reasonably
small number of runs - this might need to increase on 8-ways.

Todos :
 - verifying that the dependencies of token passing in a set do not affect
   the scheduler (other than the desired effects of causing wakeups,
   regulating the number of threads on the runqueue etc.)

===============================================================================

Sample run :

    reflex -c 80 -a 40 -t 40 -w 90 -r 1 -o 2 -q

results in output that looks like

    80, 40, 50.65

where the first two numbers identify the parameter set and the last one is
the metric, us/round.

Automated runs (over several c values) can be done by running

    runreflex

after modifying the "for tc in ......." line appropriately (to cover the
set of thread counts over which the test should be run). The runreflex
script needs some work.

===============================================================================

Comments on assumptions/performance are welcome. So is data from running
this benchmark on different SMP systems.

- Shailabh Nagar
  nagar@us.ibm.com
  (914) 945-2851
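===============================================================================

Appendix : sketch of a confidence-level stopping rule
-----------------------------------------------------

The exact rule used to decide when microseconds/round has reached a 95%
confidence level is not spelled out above, so the sketch below shows one
conventional way to implement it : keep adding t-second runs until the
half-width of the 95% confidence interval of the mean falls within a chosen
tolerance of the mean. The z-value of 1.96, the 5% tolerance, the run
limits and the hypothetical do_one_run() helper are assumptions made for
this sketch, not reflex's actual code.

    /* Illustrative stopping rule : accept the measured mean us/round once
     * the 95% confidence half-width is within TOLERANCE of the mean. */
    #include <math.h>

    #define MAX_RUNS  50
    #define MIN_RUNS  3
    #define TOLERANCE 0.05      /* half-width must be <= 5% of the mean */

    extern double do_one_run(void); /* hypothetical : one t-second run, returns us/round */

    double measure_us_per_round(void)
    {
        double x[MAX_RUNS], sum = 0.0, sumsq = 0.0;
        int n;

        for (n = 0; n < MAX_RUNS; n++) {
            x[n] = do_one_run();
            sum += x[n];
            sumsq += x[n] * x[n];

            if (n + 1 >= MIN_RUNS) {
                double mean = sum / (n + 1);
                double var  = (sumsq - (n + 1) * mean * mean) / n; /* sample variance */
                double half = 1.96 * sqrt(var / (n + 1));          /* 95% CI half-width */
                if (half <= TOLERANCE * mean)
                    return mean;    /* confident enough : report this mean */
            }
        }
        return sum / MAX_RUNS;      /* fall back to the mean of all runs */
    }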