VolanoMark 2.1.2 - VolanoMark loopback test of the SCHED_YIELD patch

Bill Hartner, IBM Linux Technology Center, hartner@austin.ibm.com

During initial VolanoMark loopback tests, the following bugs were identified in the Linux 2.4.0-test8 scheduler for SCHED_OTHER processes that call sys_sched_yield():

On SMP, if Process_A called sys_sched_yield(), one of the following occurred, depending on the state of the run queue:

(1) If Process_A was the only process on the run queue and called sys_sched_yield(), task->counter was recalculated for every process in the system, even if Process_A still had time left in its task->counter. These recalculations were unnecessary.

(2) If there were two processes on the run queue, Process_A with priority 5 and Process_B with priority 3, and Process_A called sys_sched_yield(), then Process_B was dispatched, but schedule_tail() then set current->need_resched on Process_B, causing schedule() to be reentered. Process_A, which has the higher priority, was then dispatched again. The result was an extra trip through the scheduler. This behavior also broke the Linux semantics of yield, under which a SCHED_OTHER process that yields the CPU always yields a time slice to any process waiting for a CPU [even lower-priority processes].
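Both symptoms can be reproduced in a toy Python model. This is a sketch under stated assumptions, not the 2.4.0-test8 scheduler code: `ToyTask`, `weight`, `needs_recalc`, `pick_next`, and `buggy_tail` are hypothetical names; the weight formula (remaining ticks plus priority) is a stand-in for the scheduler's real selection weight; and the model assumes that the buggy scheduler scored a yielded task at exactly 0, which the selection loop reads as "every runnable task has exhausted its counter". For case (2), it also assumes that the yield flag is cleared before the post-switch preemption check compares Process_A with Process_B.

```python
# Hypothetical toy model of the two yield bugs; not the 2.4.0-test8 source.
from dataclasses import dataclass

IDLE_WEIGHT = -1000  # the idle task: anything runnable outranks it


@dataclass
class ToyTask:
    name: str
    priority: int
    counter: int = 10        # ticks left in the current time slice
    yielded: bool = False    # models the SCHED_YIELD flag


def weight(task, yield_weight=-1):
    """Selection weight; a yielded task scores yield_weight."""
    if task.yielded:
        return yield_weight
    if task.counter == 0:
        return 0             # time slice used up
    return task.counter + task.priority


def needs_recalc(runqueue, yield_weight):
    """Bug (1): a best weight of exactly 0 triggers a system-wide recalculation."""
    best = max([IDLE_WEIGHT] + [weight(t, yield_weight) for t in runqueue])
    return best == 0


def pick_next(runqueue):
    """Run the highest-weight task; a yielded task ranks below all others."""
    return max(runqueue, key=weight)


def buggy_tail(prev, current):
    """Bug (2): clear prev's yield flag, then preemption-check it against current."""
    prev.yielded = False
    return weight(prev) > weight(current)  # True means need_resched on current


# Bug (1): Process_A is alone on the run queue and yields with ticks left.
a = ToyTask("Process_A", priority=5, yielded=True)
print(needs_recalc([a], yield_weight=0))   # True: yielder scored 0, spurious recalc
print(needs_recalc([a], yield_weight=-1))  # False: scored below 0, no recalc

# Bug (2): Process_A (priority 5) yields while Process_B (priority 3) waits.
b = ToyTask("Process_B", priority=3)
print(pick_next([a, b]).name)   # Process_B: the yield is honoured at first
print(buggy_tail(a, b))         # True: need_resched set on Process_B
print(pick_next([a, b]).name)   # Process_A: dispatched again before B's slice
```

In the model, scoring a yielded task just below zero (but still above the idle task) suppresses the spurious recalculation, and case (2) shows why a post-switch check must not treat a task that has just yielded as an ordinary preemption candidate.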

These bugs were identified using the scheduler statistics patch.

A patch was submitted to fix the SCHED_YIELD bugs. The original patch was reworked for correctness in collaboration with Ingo and Linus, and Ingo further enhanced it to remove an acquisition of runqueue_lock in schedule_tail() for yielded processes [VolanoMark does about 800,000 yields during a 10-room/100-message loopback run].
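The effect of skipping the lock for yielded processes can be sketched with another toy Python model. It is hypothetical throughout: `CountingLock` stands in for runqueue_lock and only counts acquisitions, `fixed_tail` is an illustrative name rather than the kernel's schedule_tail(), and the weight formula is a stand-in. The model captures only the two effects described in the text: a task that has just yielded does not cause need_resched to be set on the task that replaced it, and runqueue_lock is not taken on its behalf.

```python
# Hypothetical toy model of the reworked schedule_tail() check; not kernel code.
from dataclasses import dataclass


@dataclass
class ToyTask:
    name: str
    priority: int
    counter: int = 10        # ticks left in the current time slice
    yielded: bool = False    # models the SCHED_YIELD flag


class CountingLock:
    """Stands in for runqueue_lock; it only counts acquisitions."""

    def __init__(self):
        self.acquisitions = 0

    def __enter__(self):
        self.acquisitions += 1
        return self

    def __exit__(self, *exc_info):
        return False


def weight(task):
    return -1 if task.yielded else task.counter + task.priority


def fixed_tail(prev, current, lock):
    """Return True if current should get need_resched."""
    was_yielded = prev.yielded
    prev.yielded = False          # prev competes normally at the next schedule()
    if was_yielded:
        return False              # fast path: no preemption check, no lock
    with lock:
        return weight(prev) > weight(current)


lock = CountingLock()
a = ToyTask("Process_A", priority=5)
b = ToyTask("Process_B", priority=3)
preemptions = 0
for _ in range(800_000):          # yield count quoted for one loopback run
    a.yielded = True              # Process_A yields; Process_B is switched in
    preemptions += fixed_tail(a, b, lock)
print(preemptions, lock.acquisitions)  # 0 0
```

Over the roughly 800,000 yields of one run, the model never takes the lock and never bounces Process_B off the CPU; a `prev` that did not yield still goes through the locked preemption check.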

On an 8P system, message throughput increased by 72%, from 7,505 msg/s to 12,939 msg/s.
On a 4P system, message throughput increased by about 51%, from 11,541 msg/s to 17,397 msg/s.
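The percentages follow from the before/after throughput figures as (after - before) / before * 100. A quick check of the arithmetic:

```python
# Percentage increase in VolanoMark message throughput (msg/s), before vs. after the patch.
def pct_increase(before, after):
    return (after - before) / before * 100


results = {"8P": (7505, 12939), "4P": (11541, 17397)}
for system, (before, after) in results.items():
    print(f"{system}: {pct_increase(before, after):.1f}%")
# 8P: 72.4%
# 4P: 50.7%
```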

JVM heap sizes were 8MB/64MB.
2.4.0-test9 included the SCHED_YIELD bug fix and the removal of a runqueue_lock acquisition in schedule_tail().
2.4.0-test10 included a reschedule_idle() optimization, a sys_sched_yield() optimization, and a bug fix in __wake_up_common().

Note that 8P scaling relative to UP went from well below 1 (negative scaling) to slightly above 1. Scaling from 4P to 8P is still negative: the 8P system delivers less throughput than the 4P system. There is more work to do.
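The 4P-to-8P scaling can be checked from the 8P and 4P throughput figures by dividing 8P throughput by 4P throughput; a factor below 1.0 means the larger system delivered less throughput. This sketch assumes the 4P and 8P runs are otherwise comparable, which the text does not state, and it cannot reproduce the UP-relative scaling because no UP figures are given.

```python
# 4P-to-8P scaling factor: 8P throughput divided by 4P throughput.
# A factor below 1.0 means adding processors reduced throughput ("negative scaling").
throughput = {  # msg/s, from the 8P and 4P results in the text
    "before": {"4P": 11541, "8P": 7505},
    "after": {"4P": 17397, "8P": 12939},
}


def scaling(results, small="4P", large="8P"):
    return results[large] / results[small]


for label, results in throughput.items():
    print(f"{label}: {scaling(results):.2f}")
# before: 0.65
# after: 0.74
```

The patch moved the 4P-to-8P factor closer to 1.0, but it remains below 1.0.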



Copyright © 2001 IBM Corporation.