Scalable FD Management Using Read-Copy-Update
---------------------------------------------

This patch provides a significant performance improvement in file
descriptor management for the SMP Linux kernel on a 4-way machine, with
even higher gains expected on higher-end machines. The patch uses the
read-copy-update mechanism for Linux, published earlier on the
SourceForge site under the Linux Scalability Effort project:
http://lse.sourceforge.net/locking/rcupdate.html

In an SMP kernel, performance is limited by the reader-writer lock taken
during various calls that use files_struct, most of them in the routine
fget(). Because files_struct is a per-task data structure, there is no
severe contention for files->file_lock. Even so, merely acquiring the
read lock carries a noticeable penalty, because the cache line holding
the lock bounces between CPUs when multiple clones share the same
files_struct. This was pointed out by John Hawkes in his posting to the
lse-tech mailing list:
http://marc.theaimsgroup.com/?l=lse-tech&m=98235007317770&w=2

Changes in the new version (04)
---------------------------------

This version uses the basic RCU interface "call_rcu" instead of
kmalloc_rcu and kfree_rcu.

Changes in the new version (03)
---------------------------------

This version of the patch uses the new interfaces provided by the
Read-Copy Update patches (rcu_sched-2.4.6-02.patch or
rcu_qsctr-2.4.6-02.patch). The fd_array and fd_set bitmaps are now
allocated using kmalloc_rcu() or vmalloc_rcu() and are freed using the
deferred free mechanism provided by the kfree_rcu() or vmalloc_rcu()
interfaces as appropriate. A special callback handler, as used in the
earlier version, is no longer needed.

Performance Measurements
------------------------

The improvement in performance while running the "chat" benchmark (from
http://lbs.sourceforge.net/) is about 30% in average throughput.
For both configurations, the results are compared between the base
kernel (2.4.2) and the base kernel with files_struct_rcu-2.4.2-0.1.patch
applied. Profiling results were also collected using SGI's kernprof
utility; they show a considerable decrease in the amount of time spent
in fget().

The "chat" benchmark was run with rooms=20 and messages=500. For each
configuration, the test was run 50 times and the average throughput, in
messages per second, was taken.

1. 4-way PIII Xeon 700 MHz with 1MB L2 Cache and 1GB RAM
--------------------------------------------------------

Chat benchmark results
----------------------

Kernel version                                  Average Throughput

linux-2.4.2 (Base)                              191986
linux-2.4.2 + rclock-2.4.2-0.1.patch +
  file_struct_rcu-2.4.2-0.1.patch               253083
Improvement = 31.8%

linux-2.4.6 (Base)                              242760
linux-2.4.6 + rcu_qsctr-2.4.6-02.patch +
  file_struct_rcu-2.4.6-0.3.patch               254093
Improvement = 4.63%

kernprof results
----------------

Kernel version - linux-2.4.2 (Base)

default_idle [C01071EC]:            150696
schedule [C0112EE4]:                105452
__wake_up [C0113518]:                74030
tcp_sendmsg [C020FB14]:              29201
fget [C013436C]:                     16318
__generic_copy_to_user [C023A13C]:   15477
USER [C0121DF4]:                     12925
tcp_recvmsg [C0210A68]:               7737
system_call [C0109150]:               7399
mcount [C023A4E4]:                    5509

Kernel version - (linux-2.4.2 + rclock-2.4.2-0.1.patch +
files_struct_rcu-2.4.2+0.1.patch)

schedule [C0113174]:                101392
__wake_up [C01137D4]:                68182
default_idle [C01071EC]:             32833
tcp_sendmsg [C021ECE4]:              29318
__generic_copy_to_user [C024930C]:   15472
USER [C0122A00]:                     12803
tcp_recvmsg [C021FC38]:               8170
system_call [C0109150]:               7636
mcount [C02496B4]:                    6150
fget [C0134F58]:                      5694

With this patch, the routine fget() gets about 65% fewer hits.

2. 2-way PIII Xeon 700 MHz with 1MB L2 Cache and 1GB RAM
--------------------------------------------------------

Chat benchmark results
----------------------

Kernel version                                  Average Throughput

linux-2.4.2 (Base)                              209592
linux-2.4.2 + rclock-2.4.2-0.1.patch +
  file_struct_rcu-2.4.2-0.1.patch               222729
Improvement = 6.2%

The results on the 4-way machine are much better than on the 2-way
machine, which suggests that this patch matters even more on higher-end
SMP systems. Results for 8-way will be published soon.

Usage Information
-----------------

The current patch is built on the 2.4.10 kernel. Before applying this
patch, the user has to apply the Read-Copy Update patch
(rcu-2.4.10-1.patch or for linux, which can be obtained from
http://lse.sourceforge.net/locking/rcupdate.html.

For any questions or comments, please contact Maneesh Soni
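
Illustrative Sketch
-------------------

To make the read-copy-update idea from the introduction concrete, the
following is a minimal user-space sketch in C11, not code from the
patch. It shows a lock-free lookup in the role of fget() and an update
that copies the fd array, publishes the copy, and defers freeing the old
array. All names here (fd_table, files, lookup_fd, expand_table,
install_fd, free_retired) are hypothetical, C11 atomics stand in for the
kernel primitives, a single writer is assumed, and grace-period
detection is not modeled: free_retired() simply stands in for the point
where a deferred callback registered with call_rcu() would run.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the fd array; not the patch's actual layout. */
struct fd_table {
    size_t max_fds;
    struct fd_table *retired_next;    /* links tables awaiting deferred free */
    _Atomic(void *) fd[];             /* fd[i]: object installed at descriptor i */
};

struct files {
    _Atomic(struct fd_table *) table; /* readers load this without any lock */
    struct fd_table *retired;         /* old tables, freed after a grace period */
};

static struct fd_table *table_alloc(size_t max_fds)
{
    struct fd_table *t = malloc(sizeof(*t) + max_fds * sizeof(t->fd[0]));
    if (!t)
        return NULL;
    t->max_fds = max_fds;
    t->retired_next = NULL;
    for (size_t i = 0; i < max_fds; i++)
        atomic_init(&t->fd[i], NULL);
    return t;
}

static int files_init(struct files *f, size_t max_fds)
{
    struct fd_table *t = table_alloc(max_fds);
    if (!t)
        return -1;
    atomic_init(&f->table, t);
    f->retired = NULL;
    return 0;
}

/* Read side, in the role of fget(): no lock is taken, so no shared
 * lock cache line is written by readers. */
static void *lookup_fd(struct files *f, size_t fd)
{
    struct fd_table *t = atomic_load_explicit(&f->table, memory_order_acquire);
    if (fd >= t->max_fds)
        return NULL;
    return atomic_load_explicit(&t->fd[fd], memory_order_acquire);
}

/* Update side (single writer assumed): copy, publish, defer the free. */
static int expand_table(struct files *f, size_t new_max)
{
    struct fd_table *old = atomic_load_explicit(&f->table, memory_order_relaxed);
    if (new_max <= old->max_fds)
        return 0;
    struct fd_table *grown = table_alloc(new_max);
    if (!grown)
        return -1;
    for (size_t i = 0; i < old->max_fds; i++) {
        void *obj = atomic_load_explicit(&old->fd[i], memory_order_relaxed);
        atomic_store_explicit(&grown->fd[i], obj, memory_order_relaxed);
    }
    /* Publish the copy; readers see either the old or the new table. */
    atomic_store_explicit(&f->table, grown, memory_order_release);
    /* Do not free the old table yet: a reader may still be using it. */
    old->retired_next = f->retired;
    f->retired = old;
    return 0;
}

static int install_fd(struct files *f, size_t fd, void *obj)
{
    if (expand_table(f, fd + 1) != 0)
        return -1;
    struct fd_table *t = atomic_load_explicit(&f->table, memory_order_relaxed);
    atomic_store_explicit(&t->fd[fd], obj, memory_order_release);
    return 0;
}

/* Call only once no reader can still hold a pointer to a retired
 * table. Returns the number of tables freed. */
static size_t free_retired(struct files *f)
{
    size_t n = 0;
    while (f->retired) {
        struct fd_table *t = f->retired;
        f->retired = t->retired_next;
        free(t);
        n++;
    }
    return n;
}
```

A caller would run files_init() once, use install_fd() from the single
updating thread, call lookup_fd() freely from readers, and invoke
free_retired() only after every reader that might have seen an old table
has finished. The design choice being illustrated is that readers pay
only for two loads, while the cost of copying and deferred freeing falls
on the comparatively rare update path.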