Results can be found here:
                                                        Peak Throughput
./up                          Uniprocessor results
./up/baseline                   Default scheduler       275.477 Mbps
./up/o1                         O1 scheduler            289.221 Mbps
./4p                          4-way SMP results
./4p/affinity                   IRQ & process affinity
./4p/affinity/baseline            Default scheduler     741.147 Mbps
./4p/affinity/o1                  O1 scheduler          774.804 Mbps
./4p/no_affinity                No IRQ & process affinity
./4p/no_affinity/baseline         Default scheduler     596.774 Mbps
./4p/no_affinity/o1               O1 scheduler          623.589 Mbps
Under each of these results are more directories and files:
./netbench              NetBench results in Excel
./sar                   Sysstat files
./proc
./samba                 Samba log files
./config-`uname -r`     .config from kernel build
sysctl.conf
dropped_packets.txt     Any dropped packets during test
Hardware:
Server
------
4 x 700 MHz PIII Xeon, 1 MB L2
2.5 GB memory
4 Alteon Gbps Ethernet adapters
14 x 15k rpm SCSI disks, RAID 5
Clients (48)
-------
866 MHz PIII, 256 MB
100 Mbps Ethernet
Network
-------
2 switches, each with 2 Gbps and 24 100 Mbps ports
Each switch has 2 VLANs, for a total of 4 networks
Notes:
Good News: 4-way tests were conducted first, and with affinity, the O1 scheduler showed a 4.54% increase in peak throughput. The same tests were run without affinity, and the O1 scheduler showed a 4.49% increase in throughput. This was a little surprising, since previous kernprofs on the default scheduler showed only about 2% of time spent in schedule().
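For anyone wondering how a process gets pinned, here is a minimal sketch of binding one process (e.g. one smbd) to one CPU. This is just an illustration using the current sched_setaffinity(2) interface; it is not necessarily the exact mechanism used for these runs, and IRQ affinity was a separate step (writing a CPU mask to /proc/irq/<n>/smp_affinity).

/*
 * Illustration only: pin an existing process to a single CPU with
 * sched_setaffinity(2).  Not necessarily how affinity was applied in
 * the runs above.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
	cpu_set_t mask;
	pid_t pid;
	int cpu;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <pid> <cpu>\n", argv[0]);
		return 1;
	}
	pid = (pid_t)atoi(argv[1]);
	cpu = atoi(argv[2]);

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);

	/* Restrict <pid> so it may only run on <cpu>. */
	if (sched_setaffinity(pid, sizeof(mask), &mask) == -1) {
		perror("sched_setaffinity");
		return 1;
	}
	return 0;
}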
Bad News: Now, on to the uniprocessor results. Every single test with the O1 scheduler resulted in some client errors. These tests typically start with 4 clients and add 4 more clients for each test case until 48 clients are used (that's all we have). The O1 scheduler would run without client errors until 20 clients were reached; after that, clients would complain about various failed attempts at doing an SMB operation.
None of these problems were evident with the default scheduler. I did some poking around during one of the tests. Things like "ps" would take more than 2 minutes to start. In fact, attempting to start any new task was delayed until the current NetBench test case completed (each test case runs for 3 minutes). With the default scheduler, response time was excellent, within 1 second, even with a run queue length in the 40's.
I think it's safe to say these two side effects are related. Both "ps" and the "smbd" processes seem to have problems getting onto a run queue. The NetBench test is throttled, so all of the smbd processes get a chance to sit on the wait queue for a while. I'm not sure why occasionally one of these smbd processes has a very difficult time getting a shot on a run queue.
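To put an actual number on how long new tasks are delayed, something like the quick probe below (my own sketch, not part of the NetBench harness) could be run in a loop during a test case. It just times a fork/exit/wait round trip, which normally completes in well under a second even on a loaded box; during the bad runs it would presumably stall for the remainder of the test case.

/*
 * Sketch: measure how long it takes to start and reap a trivial child,
 * as a rough proxy for "time until a new task gets to run".
 */
#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	struct timeval start, end;
	double elapsed;
	pid_t pid;

	gettimeofday(&start, NULL);

	pid = fork();
	if (pid < 0) {
		perror("fork");
		return 1;
	}
	if (pid == 0) {
		/* Child: exit immediately; we only care about the delay
		 * before it gets scheduled and reaped. */
		_exit(0);
	}
	waitpid(pid, NULL, 0);

	gettimeofday(&end, NULL);
	elapsed = (end.tv_sec - start.tv_sec) +
		  (end.tv_usec - start.tv_usec) / 1e6;

	printf("fork+exit+wait latency: %.3f s\n", elapsed);
	return 0;
}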
There is some good news in this. Results using the O1 scheduler
still showed a higher peak throughput than the default scheduler, despite
the client errors.
So, I am looking for some guidance here. Has anyone seen this behavior
with the O1 scheduler under high-load workloads? What's going on here?
Andrew Theurer
habanero@us.ibm.com