reflex 1.0.0 : benchmark for evaluating the Linux scheduler
===========================================================

by Shailabh Nagar (nagar@us.ibm.com)
Based on sched_test_yield by Bill Hartner, bhartner@us.ibm.com



Algorithm :
-----------

The benchmark's goal is to provide enough tunable knobs to test kernel
scheduler performance while introducing minimal overhead and dependencies.

The program begins with the parent cloning several threads. The threads group
themselves into "active sets". Only one thread from each active set is on the 
runqueue at any given time (except when a handoff of the token is performed - 
see below). By choosing the number of threads spawned and the number of active 
sets, the average number of runnable tasks can be controlled.
 
Each thread executes a "round" consisting of

	Either Path I :
		1. compute
		2. yield

	Or Path II :
		1. compute
		2. send token to successor
		3. block (by waiting for token from predecessor)


The probability of following Path I or II is determined by the input
parameter w, which is the weight assigned to reschedule_idle(). Path II
causes more calls to reschedule_idle() because it causes more sleeps/wakeups
(on messages in a pipe), while Path I exercises the schedule() part of the
scheduler.
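The two paths can be sketched as follows (a Python sketch for brevity - the
actual benchmark is written in C, and all names here are illustrative):

```python
import os
import random

def do_round(w, compute, token_in, token_out):
    """One round for one thread (illustrative names, not the C source).

    w         -- weight parameter, 0..100
    compute   -- callable performing the calibrated busy work
    token_in  -- fd of the read end of the pipe from the predecessor
    token_out -- fd of the write end of the pipe to the successor
    """
    compute()
    if random.randrange(100) < 100 - w:
        os.sched_yield()           # Path I: stay runnable, invoke schedule()
    else:
        os.write(token_out, b"T")  # Path II: hand the one-byte token on...
        os.read(token_in, 1)       # ...then block until a token comes back
```

With probability (100-w)/100 the thread takes Path I; otherwise it takes
Path II and sleeps in the pipe read until its predecessor wakes it.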

Tokens are one-byte messages and are handed off to other members of the
active set through pipes. Pipes are used instead of semaphores/sockets etc.
because the lock contention on them is lower (it is restricted to contention
amongst the readers/writers of the pipe alone - not a system-wide IPC lock,
as is the case with semaphores etc.)
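One way to wire the members of an active set into a ring of pipes (a Python
sketch with hypothetical names - the benchmark's actual plumbing may differ):

```python
import os

def make_token_ring(n):
    """Connect the n members of an active set in a ring of pipes.

    Member i reads its token from pipe i and writes its successor's
    token into pipe (i + 1) % n.  Returns one (read_fd, write_fd)
    pair per member.  (Illustrative sketch, not the benchmark source.)
    """
    pipes = [os.pipe() for _ in range(n)]
    return [(pipes[i][0], pipes[(i + 1) % n][1]) for i in range(n)]
```

Note that for a single-member set the ring degenerates to a self-pipe, which
is consistent with such a thread never blocking for long.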

The compute phase of a round consists of a small calibrated function being 
executed repeatedly for a user-specified amount of time (in the microsecond 
range). 
The compute part is present to model realistic workloads and to control the 
number of scheduler invocations (caused by the yield/block parts of the round).
Slight randomness is introduced into each thread's compute time to prevent
scheduling patterns from forming.
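A minimal sketch of such a jittered compute phase (Python, with illustrative
names; the real benchmark calibrates a fixed function rather than polling the
clock on every iteration):

```python
import random
import time

def compute(target_us, jitter_frac=0.05):
    """Busy-loop for roughly target_us microseconds.

    A small random factor (here +/-5%, an assumed value) is applied to
    the requested compute time so threads don't fall into a fixed
    scheduling pattern.
    """
    t_us = target_us * (1.0 + random.uniform(-jitter_frac, jitter_frac))
    deadline = time.monotonic() + t_us / 1e6
    x = 0
    while time.monotonic() < deadline:
        x += 1          # stand-in for the calibrated function
    return x
```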


The overall performance metric is the total number of rounds done by all 
children. This is expressed as microseconds/round : the higher the number,
the worse the scheduler's performance. Naturally, this metric should be
used carefully if it is the sole arbiter of scheduler performance, as it
may not adequately reflect "desired" scheduler behaviour.
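For example, the metric could be derived as follows (a Python sketch; any
normalization the real benchmark applies when reporting is not shown):

```python
def us_per_round(total_rounds, elapsed_seconds):
    """Microseconds per round, aggregated over all children.

    total_rounds    -- sum of rounds completed by every child
    elapsed_seconds -- wall-clock duration of the run
    Higher values mean worse scheduler performance.
    """
    return elapsed_seconds * 1e6 / total_rounds
```

For instance, 20000 rounds completed in a 1-second run would give
50.0 us/round.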


Runtime parameters :
--------------------

1) w : legal values 0-100

Runtime parameter w determines the weight of "reschedule_idle" as follows :

	probability of sched_yield = (100-w)/100

The larger w is, the less often sched_yield is called, and hence the fewer
the invocations of schedule(). The relative number of calls to reschedule_idle
and schedule has to be determined experimentally for a given number of threads
by tweaking the w parameter.
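As a quick check of the formula above (Python sketch):

```python
def yield_probability(w):
    """P(Path I, i.e. sched_yield) for weight w in 0..100."""
    assert 0 <= w <= 100
    return (100 - w) / 100
```

With w = 90 (as in the sample run at the end of this file), a thread yields
on about 10% of its rounds and hands off the token on the other 90%.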

2) r : legal values 0..infinity

Runtime parameter r determines the number of microseconds to be spent in the
compute phase. 

3) c : number of children/threads to be launched

Currently this has to be even (see below)


4) a : number of active children == number of active sets
	
The subset of the c children that will be on the runqueue. If an active set
has only one member, that thread will never block. For a very brief interval,
there could be two threads on the runqueue from the same active set (since 
the handoff of a token isn't atomic), but the path length from token handoff
to blocking is short enough that this shouldn't be a problem.

5) t : Time (in seconds) for which each run of the test should be performed

Several runs are done until the primary metric, microseconds/round, reaches
a 95% confidence level.
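One plausible stopping rule of this kind, using a normal-approximation 95%
confidence interval (a Python sketch; the statistics used by the actual
benchmark may differ):

```python
import math

def ci95_halfwidth(samples):
    """Half-width of a normal-approximation 95% CI for the mean
    of the per-run us/round samples (needs at least two samples)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
    return 1.96 * math.sqrt(var / n)

def converged(samples, rel_tol=0.05):
    """Stop once the CI half-width is within rel_tol of the mean
    (rel_tol = 5% here is an assumed threshold)."""
    mean = sum(samples) / len(samples)
    return ci95_halfwidth(samples) <= rel_tol * mean
```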

6) o : output format

Format 2 is the most suitable for processing output using scripts.
		

7) q : quiet mode (no extraneous printfs)


Improvements/todo's :
---------------------

The effect of the parameters w, r, c, and a on the metric us/round has been
tested on a 4-way Pentium-II SMP using simple profiling of the kernel
scheduler. Lower values of w cause schedule() calls to increase, and values
around 75-90 cause the numbers of schedule() and reschedule() calls to be
roughly the same. Higher r values increase us/round.
Values of t in the range 20-50 seconds are adequate to reach 95%
confidence in a reasonably small number of runs - this might need to
increase on 8-ways.

Todos :

- verifying that the dependencies of token passing within a set do not affect
the scheduler (other than the desired effects of causing wakeups, regulating
the number of threads on the runqueue, etc.)


===============================================================================


Sample run :

	reflex -c 80 -a 40 -t 40 -w 90 -r 1 -o 2 -q

results in output that looks like

	80,   40,   50.65

where the first two numbers identify the parameter set and the last one is 
the metric, us/round.

Automated runs (over several c values) can be done by :

	runreflex <directory_for_results> 

and modifying the "for tc in ......." line appropriately (to run over the 
set of thread counts the test should cover). The runreflex script needs
some work.

	
===============================================================================

Comments on assumptions/performance are welcome, as is data from running
this benchmark on different SMP systems.

- Shailabh Nagar
nagar@us.ibm.com
(914) 945-2851