During initial VolanoMark loopback tests, the following bug was identified in the Linux 2.4.0-test8 scheduler for SCHED_OTHER processes that call sys_sched_yield():
On SMP, when a process (Process_A) called sys_sched_yield(), one of the following occurred, depending on the state of the run queue:
(1) If Process_A was the only process on the run queue when it called sys_sched_yield(), task->counter was recalculated for every process in the system, even if Process_A still had task->counter time remaining. The recalculation was therefore unnecessary.
(2) If there were two processes on the run queue, Process_A with priority 5 and Process_B with priority 3, and Process_A called sys_sched_yield(), Process_B would be dispatched. schedule_tail() would then set current->need_resched on Process_B, causing schedule() to be reentered, and Process_A (the higher-priority process) would be dispatched again. This cost an extra trip through the scheduler and broke the Linux yield semantics, under which a SCHED_OTHER process that yields the CPU always gives up a time slice to any process waiting for a CPU [even a lower-priority one].
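The intended yield behavior in the two scenarios above can be sketched as a small toy model. This is not kernel code: the Task class, the yielded flag (standing in for SCHED_YIELD), and the pick_next() function are hypothetical names used only for illustration, and the selection rule is an assumption chosen to match the semantics described above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int          # larger value = more favored, as in the example above
    yielded: bool = False  # hypothetical stand-in for the SCHED_YIELD flag


def pick_next(run_queue):
    """Return the task to dispatch under the intended yield semantics."""
    # Prefer any task that has not just yielded, even a lower-priority one.
    candidates = [t for t in run_queue if not t.yielded]
    if not candidates:
        # Only yielders are runnable: keep running one of them,
        # with no need to recalculate anything.
        candidates = list(run_queue)
    chosen = max(candidates, key=lambda t: t.priority)
    # A yield is honored for one scheduling decision only.
    for t in run_queue:
        t.yielded = False
    return chosen


a = Task("Process_A", priority=5, yielded=True)
b = Task("Process_B", priority=3)
print(pick_next([a, b]).name)  # scenario (2): prints Process_B

a.yielded = True
print(pick_next([a]).name)     # scenario (1): prints Process_A
```

In this model the yielding process loses exactly one scheduling decision to any other runnable process, and a lone yielder simply continues, which is the behavior the two bugs violated.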
These bugs were identified using the scheduler statistics patch.
A patch was submitted to fix the SCHED_YIELD bug. In collaboration with Ingo and Linus, the original patch was reworked for correctness, and Ingo enhanced it further by removing an acquisition of the runqueue_lock in schedule_tail() for yielded processes [VolanoMark does about 800,000 yields during a 10 room/100 message loopback run].
On an 8P system, message throughput increased by 72% from 7,505 msg/s
to 12,939 msg/s.
On a 4P system, message throughput increased by 50% from 11,541
msg/s to 17,397 msg/s.
JVM heap sizes were 8MB/64MB.
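As a quick check of the percentages quoted above, the gains can be recomputed from the before and after throughput figures. The percent_gain() helper is a hypothetical name introduced only for this sketch.

```python
def percent_gain(before, after):
    """Relative throughput improvement, in percent."""
    return (after - before) / before * 100.0


# (before, after) message throughput in msg/s, as quoted in the text.
results = {"8P": (7505, 12939), "4P": (11541, 17397)}

for system, (before, after) in results.items():
    print(f"{system}: {percent_gain(before, after):.1f}% gain")
# prints:
# 8P: 72.4% gain
# 4P: 50.7% gain
```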
2.4.0-test9 included the SCHED_YIELD bug fix and removal of a runqueue_lock
acquire in schedule_tail().
2.4.0-test10 included a reschedule_idle() optimization, a sys_sched_yield()
optimization, and a bug fix in __wake_up_common().
Note that 8P scaling (relative to UP) went from largely negative
to slightly above 1. Scaling from 4P to 8P is still negative.
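The 4P to 8P comparison can be illustrated with the post-fix throughput figures quoted earlier. This is a rough sketch that assumes those two figures are directly comparable across the two machines. A ratio below 1 means the 8P system delivers less throughput than the 4P system.

```python
# Post-fix message throughput in msg/s, as quoted earlier in the text.
throughput_4p = 17397
throughput_8p = 12939

ratio = throughput_8p / throughput_4p
print(f"8P/4P throughput ratio: {ratio:.2f}")  # prints 0.74
```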
More work to do ....