Global spinlock list and usage

Rick Lindsley
IBM Linux Technology Center
nevdull@sequent.com
nevdull@us.ibm.com
rick@eaglet.rain.com

Current for 2.4.5; Rev 0.2; last revised 6/6/01

----------------------------REVIEWERS -- READ PLEASE---------------------------

The goal of the review process is to ensure accuracy while minimizing
the impact on any one developer. Many of you who have volunteered have
other irons in the fire, and I want to use your time efficiently. So I
emphasize that I'm not asking any one person to review the entire
document -- please review those locks that you believe you understand
already and verify that I've interpreted them correctly.

Note that the r/w locks are only half finished; although I believe there
are only a few more hours' work there, I really wanted to get this out
NOW and avoid putting it off while working on "just one more thing."

The remaining uses of the Big Kernel Lock are especially enigmatic. If
anyone feels qualified to comment on any remaining, legitimate uses for
the BKL, please do elaborate. My default belief, right now, is that all
the current uses are either incorrect, unnecessary, or due to be so very
soon (although I've not inserted this opinion into the document quite so
succinctly).

One more topic I encourage you to comment on is support. My original
intention was that this become a document submitted for inclusion in
usr/src/Documentation, and that it then be maintained on an ongoing
basis either by myself or another. (Yes, I volunteer, initially.) If you
have thoughts on format, enhancements, or support issues please do
include them in your comments. Simply to provide a starting point, I'll
take the position right now that it be updated no more frequently than
quarterly and quite possibly less frequently.
Updates will necessarily follow any new release by a minimum of six
weeks, since I can't make the document correct for any particular
release until it is out and I have time to go through it (or unless the
release waits for me, and I'm quite happy to have Linus be the final
gate on release, thank you).

If you have other thoughts or desires on support, by all means --
dissuade me. I'm fairly open on this.

Thank you for your efforts. I encourage you to return comments by 6/22
but will accept them both sooner and later.

----------------------------REVIEWERS -- END READ ---------------------------

1. Introduction

One of the first steps in determining whether Linux scales in SMP
situations or not was to draw upon existing SMP experience and expertise
and investigate whether existing locking mechanisms were SMP-efficient
or not. It would seem, however, that there is frequently neither
external documentation nor internal documentation (comments) describing
the proper usage of most of the locks on the system, making this an
inexact task at best and a frustrating one at worst.

2. Scope

This is an attempt to document both the existence and usage of the
spinlocks in the Linux 2.4.5 kernel. Since it is born of the need to
understand how some of these global locks are properly utilized, this
first effort is restricted to the types spinlock_t and rwlock_t and does
not, for the most part, include semaphores, wait queues, or any other
kind of synchronization mechanism.

The thinking, for this initial effort, is that the spinlocks most likely
to be misused are those whose scope extends outside the file of their
declaration. For this reason, this document does not (currently) include
static spinlocks, or spinlocks declared within a structure (which
typically guard elements of the structure). Ideally, all locks and
semaphores would be clearly documented as to their use, and this
document may be extended in the future (or patches to comments in the
code) to provide that.
If, coincidentally, the usage of an excluded lock was examined as part
of determining the usage of a global lock, then that information was
also recorded in the last section, below, rather than discarding it.
However, static locks really ought to be conveniently documented in the
file to which they are static. If they are not, they are (I hope) at
least less prone to misuse when they are only used within a single file.

You'll find that the document itself is more of a reference document
than a paper, but that's intentional at this point. Once we establish
the "correct" use of the various locks in use in the system, we can then
spend time discussing incorrect or unnecessary usages. Before we agree
on "correct" usage, such discussions can become quite philosophical.

As mentioned, this document does not attempt to cover spinlocks which
may be part of global structures. Usually, these guard elements of the
structures in which they are declared, although this should not be
blindly assumed. (Elaborating upon this class of locks would be another
nice addition to this document.)

In general, the information here describes what IS, not what SHOULD BE.
It is true, however, that seeing what IS in print may make some misuses
obvious. When those bugs are fixed, if the result changes the
information here, this document should be updated.

3. Format of Lock Descriptions

These are the fields used to describe each lock:

Lock
    The name of the spinlock as it is declared. (Sometimes a lock's use
    is "hidden" by being part of a macro. When this is the case, it will
    be listed under its declared name, and the macros which use it will
    be mentioned in the Macros section.)

Interrupts
    The Interrupts field describes (in part) how the lock is acquired.
    (Note, again, that this is NOT a commentary on how it SHOULD be
    acquired.)

    Saved: the lock is acquired by spin_lock_irqsave() or equivalent.
        Interrupt and flag state is saved and restored, and interrupts
        are blocked while holding this lock.
    Blocked: the lock is acquired by spin_lock_irq() or equivalent.
        Interrupts are blocked while holding this lock.

    Blocked (bh): the lock is acquired by spin_lock_bh(),
        read_lock_bh(), or write_lock_bh(). Bottom half (software)
        interrupts are blocked while holding the lock in this manner,
        but hardware interrupts may occur. Found most commonly in
        network code.

    Ignored: the lock is acquired by spin_lock(), or read_lock() or
        write_lock() for rwlock_t's. Interrupts may occur while this
        lock is held.

Use
    Describes under what conditions this lock is held. As described
    above, this is not necessarily a commentary on how it SHOULD be
    used, but an observation on how it appears to be currently used (or
    an indication that it is not obvious how it is to be used).

Functions
    When provided, indicates which functions access this spin lock
    directly.

Macros
    Some spinlocks are only accessible via macros. When that is the
    case, the macros will be listed here, followed by the files in which
    those macros are utilized.

Notes
    If some additional comments need to be made about a lock, they will
    be made here.

Architectures
    Most locks are present on all architectures because they are used
    with data and code that is architecture independent (or present on
    every architecture). Some, however, are architecture-specific. This
    field, if present, indicates which architecture(s) the lock is used
    on. The absence of this field indicates the lock is used on all
    architectures.

4. Global spin locks

#
# kernel/
#

Lock: task_capability_lock
Interrupts: Ignored
Functions: sys_capget()
Use: Held while modifying the capabilities of a task.
Notes: When tasklist_lock is also needed, this one should be acquired
    first.
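
The ordering rule in the Notes field above (take task_capability_lock
before tasklist_lock) can be illustrated with a small user-space sketch.
This is not kernel code: toy_spinlock_t, toy_spin_lock(),
toy_spin_unlock(), the task_caps table, and set_task_caps() are all
hypothetical stand-ins built on C11 atomics, and both locks are modelled
as plain spinlocks for simplicity. The only point is that every path
needing both locks takes them in the same order and releases them in
reverse, so two CPUs can never each hold one lock while waiting for the
other.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy user-space spinlock; an assumed stand-in for the kernel's spinlock_t. */
typedef atomic_flag toy_spinlock_t;

static void toy_spin_lock(toy_spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;   /* busy-wait until the current holder clears the flag */
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Stand-ins named after the two locks in the entry above. */
static toy_spinlock_t task_capability_lock = ATOMIC_FLAG_INIT;
static toy_spinlock_t tasklist_lock = ATOMIC_FLAG_INIT;

/* Hypothetical data: one capability word per task in a tiny table. */
#define NTASKS 4
static unsigned int task_caps[NTASKS];

/* Replace one task's capability word and return the previous value. */
static unsigned int set_task_caps(int task, unsigned int caps)
{
    unsigned int old;

    assert(task >= 0 && task < NTASKS);

    toy_spin_lock(&task_capability_lock);    /* documented order: first */
    toy_spin_lock(&tasklist_lock);           /* then the task list lock */
    old = task_caps[task];
    task_caps[task] = caps;
    toy_spin_unlock(&tasklist_lock);         /* release in reverse order */
    toy_spin_unlock(&task_capability_lock);
    return old;
}
```

A caller would simply invoke set_task_caps(2, 0x3u); any other routine
that needs both locks would have to follow the same acquisition order.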

Lock: dma_spin_lock
Interrupts: Saved
Macros: claim_dma_lock()
        eata.c, u14-34f.c, floppy.c, xd.c, esp.c, tpqic02.c, 3c505.c,
        ltpc.c, dmascc.c, lance.c, ni65.c, tms380tr.c, cosa.c,
        z85230.c, znet.c, parport_pc.c, NCR53c406a.c, wd7000.c,
        dmabuf.c, sscape.c, irda_device.c;
    release_dma_lock()
        eata.c, u14-34f.c, floppy.c, xd.c, esp.c, tpqic02.c, 3c505.c,
        ltpc.c, dmascc.c, lance.c, ni65.c, tms380tr.c, cosa.c,
        z85230.c, znet.c, parport_pc.c, NCR53c406a.c, wd7000.c,
        dmabuf.c, sscape.c, irda_device.c;
Use: Held while changing DMA parameters such as (but not limited to)
    enabling or disabling a particular DMA channel.

Lock: lastpid_lock
Interrupts: Ignored
Functions: get_pid()
Use: Held while last_pid is incremented and checked for rollover.
Notes: When tasklist_lock is also needed, this should be acquired first.

Lock: console_lock
Interrupts: Ignored, Blocked, and Saved
Functions: nmi_watchdog_tick(), show(), do_con_write(),
    console_sorting(), con_flush_chars(), con_font_op(), vcs_read(),
    vcs_write(), vt_ioctl(), do_syslog(), printk(), console_print(),
    unblank_console(), register_console(), unregister_console()
Use: Held while adding or removing a character from log_buf (the syslog
    console buffer), as well as modifying any of these variables:
    log_start, logged_chars, console_loglevel. Also held while writing a
    character to the console. Apparently held while changing consoles as
    well.

Lock: runqueue_lock
Interrupts: Ignored, Blocked, and Saved
Functions: release(), wake_up_process(), wake_up_process_synchronous(),
    __schedule_tail(), schedule(), setscheduler(), sys_sched_yield(),
    signal_wake_up()
Use: Held to check or modify p->has_cpu. Held while forcing a reschedule
    on another cpu due to a signal being sent to a process running on
    that cpu. Held to test the condition of, traverse, or modify the
    runqueue (task_on_runqueue(), add_to_runqueue(),
    del_from_runqueue(), schedule()).
Notes: The one case where interrupts are ignored is in signal_wake_up(),
    and a comment there remarks that it is expected that interrupts are
    already blocked by a previous lock acquisition. So for all intents
    and purposes, this lock is consistently held with interrupts
    blocked.

Lock: global_bh_lock
Interrupts: Ignored
Functions: bh_action(), sync_timers()
Use: Held while running a bottom half. (Only one bottom half may run at
    a time, even on SMP systems.)

Lock: timerlist_lock
Interrupts: Blocked and Saved
Functions: add_timer(), mod_timer(), del_timer(), del_timer_sync(),
    run_timer_list()
Notes: It appears this could be made static to timer.c.
Use: Held while modifying or inspecting any of the timer_vec's
    (including tv1, a timer_vec_root).

Lock: tqueue_lock
Interrupts: Saved
Macros: queue_task()
    run_task_queue()
Use: Held while adding a bottom half "software interrupt" handler to ANY
    task queue.

# arch/alpha

Lock: i8259_irq_lock
Interrupts: Ignored
Functions: i8259a_enable_irq(), i8259a_disable_irq(),
    i8259a_mask_and_ack_irq()
Use: Held when writing to the IO ports for the i8259a.
Notes: It appears this could be made static to irq_i8259.c.

Lock: global_irq_lock
Interrupts: Ignored
Functions: wait_on_irq(), get_irqlock(), release_irqlock(), irq_enter()
Notes: This is declared differently on different architectures. On alpha
    and mips64, it is a spinlock. On i386, ia64, and ppc it is simply an
    unsigned int. On s390, it is an atomic_t, which is a signed int. The
    purpose of the lock, however, seems identical across all
    architectures, since it is used in a similar manner on each of them.
Use: It is difficult to tell. It appears not to guard any actual data,
    but instead seems to provide a means to serialize cpus when a global
    cli is desired. get_irqlock() is only called in __global_cli(), and
    thus this seems to serve to serialize global cli requests.
Architectures: alpha

Lock: srm_irq_lock
Interrupts: Ignored
Functions: srm_enable_irq(), srm_disable_irq()
Use: Held while writing to hardware registers.
Architectures: alpha

Lock: dp264_irq_lock
Interrupts: Ignored
Functions: dp264_enable_irq(), dp264_disable_irq(), clipper_enable_irq(),
    clipper_disable_irq(), dp264_set_affinity(), clipper_set_affinity()
Use: Held while writing to hardware registers.
Architectures: alpha

Lock: rawhide_irq_lock
Interrupts: Ignored
Functions: rawhide_enable_irq(), rawhide_disable_irq()
Use: Held while writing to hardware registers.
Architectures: alpha

Lock: sable_irq_lock
Interrupts: Ignored
Functions: sable_enable_irq(), sable_disable_irq(),
    sable_mask_and_ack_irq()
Use: Held while writing to hardware registers.
Architectures: alpha

Lock: titan_irq_lock
Interrupts: Ignored
Functions: privateer_enable_irq(), privateer_disable_irq(),
    privateer_set_affinity()
Use: Held while writing to hardware registers.
Architectures: alpha

Lock: wildfire_irq_lock
Interrupts: Ignored
Functions: wildfire_enable_irq(), wildfire_disable_irq(),
    wildfire_mask_and_ack_irq()
Use: Held while writing to hardware registers.
Architectures: alpha

Lock: rtc_lock
Interrupts: Ignored, Blocked, and Saved
Functions: set_rtc_mmss(), nvram_read_byte(), nvram_write_byte(),
    nvram_check_checksum(), nvram_read(), nvram_write(), nvram_ioctl(),
    nvram_read_proc(), pc_proc_infos(), rtc_interrupt(), rtc_read(),
    rtc_ioctl(), rtc_open(), rtc_release(), rtc_poll(), rtc_init(),
    rtc_exit(), rtc_dropped_irq(), rtc_proc_output(), rtc_is_updating(),
    get_rtc_time(), get_rtc_alm_time(), mask_rtc_irq_bit(),
    set_rtc_irq_bit()
Use: Usage for all routines but the nvram_* routines (including
    set_rtc_mmss()) is explained nicely in rtc.c:

        A very tiny interrupt handler. It runs with SA_INTERRUPT set,
        but there is possibility of conflicting with the set_rtc_mmss()
        call (the rtc irq and the timer irq can easily run at the same
        time in two different CPUs).
        So we need to serializes accesses to the chip with the rtc_lock
        spinlock that each architecture should implement in the timer
        code.

    By contrast, it's not clear why it is acquired in the nvram_*
    routines. The nvram_* routines are the only ones that do a
    save/restore on this lock.

# arch/arm

Lock: gpio_lock
Interrupts: Saved
Functions: wb977_init_gpio(), cpld_init(), nw_hw_init(),
    netwinder_leds_event(), netwinder_lock(), netwinder_unlock(),
    netwinder_set_fan(), kick_open(), vnc_update_spkr_mute()
Use: According to netwinder-hw.h, "This is a lock for accessing ports
    GP1_IO_BASE and GP2_IO_BASE."

Lock: die_lock
Interrupts: Blocked
Functions: die()
Use: Held while processing a trap to ensure serialization for the die()
    routine (only one process is trying to die at a time!). (Not sure
    why this is global.)

# arch/i386

Lock: i8259A_lock
Interrupts: Saved, Ignored
Functions: disable_8259A_irq(), enable_8259A_irq(),
    i8259A_irq_pending(), mask_and_ack_8259A(), init_8259A(),
    print_PIC(), do_slow_gettimeoffset(), do_timer_interrupt()
Use: Held while writing to or reading from some device registers to
    guarantee that only one process is doing this at a time.
Architectures: i386

Lock: i8253_lock
Interrupts: Saved, Ignored
Functions: get_8254_timer_count(), do_slow_gettimeoffset(),
    timer_interrupt()
Use: Held while writing to or reading from some device registers to
    guarantee that only one process is doing this at a time.
Architectures: i386

# arch/ia64

Lock: pci_lock
Interrupts: Saved
Macros: PCI_OP()
        pci.c;
Use: As described in pci.c, "This interrupt-safe spinlock protects all
    accesses to PCI configuration space."

Lock: sal_lock
Interrupts: Ignored
Macros: SAL_CALL()
        sal.h;
Use: Not clear what this is used for. Appears to be IA64 related.
Architectures: ia64

Lock: ptcg_lock
Interrupts: Ignored
Functions: flush_tlb_no_ptcg(), flush_tlb_range()
Use: Not clear what this is guarding.
Notes: The comment at the top of flush_tlb_no_ptcg() appears to be
    incorrect.
    This lock probably could be made static.
Architectures: ia64

Lock: hcl_spinlock
Interrupts: Never used
Notes: Initialized, but never used to guard anything.
Architectures: ia64

Lock: cpuprom_spinlock
Interrupts: Ignored
Macros: PROM_LOCK()
        prominfo_add(), prominfo_get(), prominfo_nodeget()
    PROM_UNLOCK()
        prominfo_add(), prominfo_get(), prominfo_nodeget()
Use: Seems to guard cpuprom_head.
Notes: This lock probably could be made static.
Architectures: ia64

Lock: hub_mask_lock
Interrupts: Ignored
Functions: mlreset(), per_hub_init()
Use: Not clear how this is used.
Notes: There are two instances of this lock, each called by a routine
    named the same (per_hub_init) but defined once each in two different
    files. Having one instance of this lock be static and another global
    is confusing at best and dangerous at worst. It appears that the two
    locks are intended for distinct purposes, so both should be made
    static and one renamed.
Architectures: ia64, mips64

Lock: intr_dev_targ_map_lock
Interrupts: Ignored
Functions: init_platform_nodepda(), do_intr_reserve_level()
Use: Apparently guards intr_dev_targ_map.
Notes: Could be made static.
Architectures: ia64

Lock: xbow_perf_lock
Interrupts: Ignored
Functions: xbow_enable_perf_counter(), xbow_update_perf_counters(),
    xbow_attach()
Use: Seems to guard xbow_perf.
Notes: Could be made static.
Architectures: ia64

Lock: xbow_bw_alloc_lock
Interrupts: Ignored
Functions: xbow_prio_bw_alloc(), xbow_attach()
Use: Not clear what it guards.
Notes: Could be made static.
Architectures: ia64

Lock: int_test_spin
Interrupts: Saved, Blocked
Functions: xbow_prio_bw_alloc(), xbow_attach()
Use: Not clear what it guards.
Notes: Grabbed, but never released.
Architectures: ia64

Lock: efivars_lock
Interrupts: Ignored
Functions: efivar_read(), efivar_write(), efivar_init(), efivar_exit()
Use: Not clear what it guards.
Notes: Could be made static.
Architectures: ia64

# arch/mips64

Lock: nmi_lock
Interrupts: Ignored
Macros: enter_panic_mode()
        ip27-nmi.c;
Use: Seems to be serializing NMI processing on a mips64 architecture,
    allowing only one cpu to process an NMI at a time. This appears to
    be a prelude to purposefully crashing the machine.
Architectures: mips64

#
# arch/parisc
#

Lock: __atomic_hash
Interrupts: Saved
Macros: ATOMIC_HASH()
Use: parisc doesn't have atomic operations, so they need very clever
    tricks to make atomic.h work. One is having a hashed list of
    spinlocks, and this lock guards that list.
Notes: When CONFIG_SMP is not defined, this is an array of spinlocks of
    length 1. When it IS defined, the array is length ... one. Looks
    wrong to me somehow, but ...
Architectures: parisc

Lock: __atomic_lock
Interrupts: Saved
Functions: __xchg
Use: Seems it might be used to guarantee the atomicity of a 64 byte
    exchange operation. See above about lack of atomic operations.
Architectures: parisc

# arch/ppc

Lock: rtas_lock
Interrupts: Saved
Functions: call_rtas()
Use: This is acquired before calling enter_rtas().
Notes: I can't find the definition of enter_rtas() anywhere.
Architectures: ppc

Lock: i8259_lock
Interrupts: Saved, Ignored
Functions: i8259_irq(), i8259_mask_and_ack_irq(), i8259_mask_irq(),
    i8259_unmask_irq(), i8259_init()
Use: Held when writing to the IO ports for the i8259.
Notes: It appears this could be made static to i8259.c.
Architectures: ppc

Lock: pmac_pic_lock
Interrupts: Saved
Functions: pmac_mask_and_ack_irq(), pmac_set_irq_mask()
Use: Not clear.
Notes: It appears this could be made static to pmac-pic.c.
Architectures: ppc

Lock: oops_lock
Interrupts: Blocked
Functions: die()
Use: Looks like it intends to serialize the die() routine.
Notes: Every other architecture seems to call this the die_lock. For
    consistency, this lock should probably change its name.
Architectures: ppc

# arch/s390x
# arch/s390
# arch/sh
# arch/m68k

Lock: semaphore_wake_lock
Interrupts: Saved
Functions: wake_one_more(), wake_non_zero(),
    wake_non_zero_interruptible(), wake_non_zero_trylock()
Use: This appears to be a global lock that guards all semaphores. When
    fields of *any* semaphore are examined or modified, this lock is
    held.
Notes: It's not clear why this is not a member of the individual
    semaphore structures rather than being made global. It also appears
    that the macros using this lock in semaphore-helper.h are not
    included anywhere.
Architectures: s390, sh, arm, m68k, parisc, s390x

# arch/sparc

Lock: sun4d_imsk_lock
Interrupts: Saved
Functions: sun4d_disable_irq(), sun4d_enable_irq(), smp4d_callin()
Use: Acquired to serially mask interrupts on other processors.
Architectures: sparc

Lock: srmmu_nocache_spinlock
Interrupts: Ignored
Functions: __srmmu_get_nocache(), srmmu_free_nocache()
Use: Apparently held while looking for uncached memory on a Sparc
    processor.
Architectures: sparc

Lock: prom_lock
Interrupts: Saved
Functions: prom_nbgetchar(), prom_nbputchar(),
    prom_query_input_device(), prom_query_output_device(), prom_mapio(),
    prom_unmapio(), prom_devopen(), prom_devclose(), prom_seek(),
    prom_reboot(), prom_feval(), prom_cmdline(), prom_halt(),
    prom_startcpu(), prom_stopcpu(), prom_idlecpu(), prom_restartcpu(),
    prom_putsegment(), __prom_getchild(), __prom_getsibling(),
    prom_getproplen(), prom_getproperty(), __prom_nextprop(),
    prom_setprop(), prom_inst2pkg()
Use: This lock is apparently used to serialize nearly every operation on
    the Sun prom, guarding against concurrent operations.
Architectures: sparc

# arch/sparc64

Lock: pci_controller_lock
Interrupts: Saved
Functions: pci_scan_each_controller_bus(), psycho_init()
Use: This is used to protect the modification or use of
    pci_controller_root.
Architectures: sparc64

Lock: pci_poke_lock
Interrupts: Saved
Functions: pci_config_read8(), pci_config_read16(), pci_config_read32(),
    pci_config_write8(), pci_config_write16(), pci_config_write32()
Use: Held while doing any sort of PCI config operation.
Architectures: sparc64

Lock: ctx_alloc_lock
Interrupts: Saved
Functions: get_new_mmu_context(), destroy_context()
Use: Held while changing mmu context.
Architectures: sparc64

Lock: prom_entry_lock
Interrupts: Ignored
Functions: prom_get_lock(), prom_release_lock()
Use: Held while accessing or modifying the contents of the prom.
Architectures: sparc64

Lock: timod_pagelock
Interrupts: Ignored
Functions: getpage()
Use: Held while accessing or modifying the static variable 'page' in
    timod.c.
Notes: Could be made static.
Architectures: sparc64

Lock: mostek_lock
Interrupts: Blocked, Saved
Functions: kick_start_clock(), has_low_battery(), sbus_time_init(),
    set_rtc_mmss(), set_system_time(), get_rtc_time(), rtc_open()
Use: Apparently held while modifying clock registers (but not while
    reading them).
Architectures: sparc64

#
# drivers/block
#

Lock: io_request_lock
Interrupts: Ignored, Blocked, Saved
Functions:
Use: Held while accessing or modifying any I/O request list in the
    system or in any driver. Also held while executing some SCSI command
    functions in ibmmca.c.
Notes: Its use in ibmmca.c is questionable.

Lock: pi_spinlock
Interrupts: Ignored, Blocked, Saved
Functions: pi_wake_up(), pi_do_claimed()
Use: Appears to be held while accessing or modifying pi_adapter fields.
Notes: Could be made static.

#
# drivers/char
#

Lock: kbd_controller_lock
Interrupts: Saved
Functions: keyboard_interrupt(), kbd_write_command_w(),
    kbd_write_output_w(), kbd_write_cmd(), detect_auxiliary_port(),
    aux_write_dev(), aux_write_ack(), get_from_queue()
Use: Appears to be held while accessing or modifying pi_adapter fields.
Notes: Several different drivers use locks with this name. All but one
    have them declared static.
    qtronix.c has this declared global. That one should also probably be
    static, or if not, at least renamed to avoid confusion with the
    other drivers.

#
# drivers/ieee1394
#

Lock: templates_lock
Interrupts: Ignored
Functions: hl_all_hosts(), hpsb_inc_host_usage(), add_template(),
    remove_template()
Use: Held while accessing or modifying hpsb_host_template (or anything
    in the linked list represented by hpsb_host_template).

Lock: host_info_lock
Interrupts: Saved, Blocked
Functions: fcp_request(), state_initialized(), handle_iso_listen(),
    dev_release()
Use: Held while accessing or modifying host_info_list (or anything in
    the linked list represented by host_info_list).

#
# drivers/isdn
#

Lock: eicon_lock
Interrupts: Saved
Functions: idi_handle_ind(), idi_handle_ack_ok(), idi_handle_ack(),
    idi_send_data(), eicon_io_rcv_dispatch(), eicon_io_transmit(),
    eicon_command(), if_readstatus(), eicon_putstatus(), eicon_init()
Use: Not clear what it is guarding.

#
# drivers/parport
#

Lock: parportlist_lock
Interrupts: Ignored, Blocked
Functions: parport_register_driver(), parport_unregister_driver(),
    parport_register_port(), parport_unregister_port(),
    parport_find_number(), parport_find_base()
Use: Held while accessing or modifying portlist.
Notes: Could be made static.

Lock: driverlist_lock
Interrupts: Ignored
Functions: attach_driver_chain(), detach_driver_chain(),
    parport_register_driver(), parport_unregister_driver()
Use: Held while accessing or modifying driver_chain (or anything in the
    linked list represented by driver_chain).
Notes: Could be made static.

#
# drivers/s390
#

Lock: tub3270_con_bcblock
Interrupts: Saved
Functions: tub3270_con_write(), tub3270_con_copy()
Use: Held while accessing or modifying tub3270_con_bcb.
Notes: Could be made static. Not all fields in tub3270_con_bcb are
    protected by the lock -- should they be?

Lock: lock
Interrupts: Saved
Functions: iucv_sever(), add_pathid(), iucv_connect(), iucv_accept(),
    top_half_interrupt(), bottom_half_interrupt(), do_int(),
    iucv_register_program(), iucv_unregister()
Use: Apparently used to serialize the above mentioned routines.
Notes: Terrible name for a global variable. Could be made static, but
    still should be renamed.

#
# drivers/scsi
#

Lock: sym53c8xx_lock
Interrupts: Saved
Macros: NCR_LOCK_DRIVER()
        sym53c8xx.c, sym53c8xx_comm.h;
    NCR_UNLOCK_DRIVER()
        sym53c8xx.c, sym53c8xx_comm.h;
Use: Held while doing allocation from a device-specific pool.

Lock: DRIVER_SMP_LOCK
Interrupts: Saved
Macros: NCR_LOCK_DRIVER()
        sym53c8xx_comm.h;
    NCR_UNLOCK_DRIVER()
        sym53c8xx_comm.h;
Use: Held while doing allocation from a device-specific pool.
Notes: This apparently conflicts with the sym53c8xx_lock, above. In
    fact, it appears that NCR_LOCK_DRIVER is defined using one or the
    other (or possibly both?). There are enough ifdefs that it is hard
    to tell.

Lock: dc390_drvlock
Interrupts: Saved (but see Notes)
Macros: DC390_LOCK_DRV()
        tmscsim.c, scsiiom.c;
    DC390_UNLOCK_DRV()
        tmscsim.c, scsiiom.c;
    DC390_LOCK_DRV_NI()
        tmscsim.c, scsiiom.c;
    DC390_UNLOCK_DRV_NI()
        tmscsim.c, scsiiom.c
Use: Unable to tell (see Notes).
Notes: This lock does different things (and is implemented in different
    ways) depending on the value of USE_SPINLOCKS, which depends on the
    Linux version being compiled. It appears that for 2.4.5, this is a
    global spinlock which is generally used with spin_lock_irqsave().
    Older versions may still use global sti/cli constructs. Because of
    the generous and innovative uses of #defines, it's not possible to
    easily tell what this lock guards.

#
# drivers/sound
#

Lock: sound_loader_lock
Interrupts: Ignored
Functions: sound_insert_unit(), sound_remove_unit(), soundcore_open()
Use: Held while adding or deleting structures to or from any chain in
    the 16 member chains variable.
Notes: Could be made static.
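
Several of the SCSI entries above are reachable only through macros
whose definition depends on ifdefs. The following user-space sketch
illustrates, under stated assumptions, why that makes it hard to say
which lock is actually held. Nothing here is taken from the drivers
themselves: toy_spinlock_t, DRV_LOCK(), DRV_UNLOCK(), USE_LOCK_B,
driver_a_lock, driver_b_lock, and pool_alloc() are all hypothetical
names chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy user-space spinlock; an assumed stand-in for the kernel's spinlock_t. */
typedef atomic_flag toy_spinlock_t;

static void toy_spin_lock(toy_spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;   /* busy-wait until the current holder clears the flag */
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * The lock named by the macro is chosen at compile time. A reader of
 * pool_alloc() below sees only DRV_LOCK() and cannot tell which lock is
 * meant without tracing the ifdefs.
 */
#ifdef USE_LOCK_B
static toy_spinlock_t driver_b_lock = ATOMIC_FLAG_INIT;
#define DRV_LOCK()   toy_spin_lock(&driver_b_lock)
#define DRV_UNLOCK() toy_spin_unlock(&driver_b_lock)
#else
static toy_spinlock_t driver_a_lock = ATOMIC_FLAG_INIT;
#define DRV_LOCK()   toy_spin_lock(&driver_a_lock)
#define DRV_UNLOCK() toy_spin_unlock(&driver_a_lock)
#endif

/* Hypothetical device pool with a fixed number of free slots. */
static int pool_free = 8;

/* Take one slot from the pool; returns 1 on success, 0 if it is empty. */
static int pool_alloc(void)
{
    int got = 0;

    DRV_LOCK();
    assert(pool_free >= 0);
    if (pool_free > 0) {
        pool_free--;
        got = 1;
    }
    DRV_UNLOCK();
    return got;
}
```

Searching the source for the lock's declared name finds only the macro
definitions, never pool_alloc(), which is why this document lists such
locks under their declared name and names the macros separately.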

#
# drivers/video
#

Lock: hga_reg_lock
Interrupts: Saved
Functions: hga_clear_screen(), hga_txt_mode(), hga_gfx_mode(),
    hga_pan(), hga_blank()
Use: Held while examining or modifying hga_mode, or when updating the
    device's registers.
Notes: Could be made static.

Lock: matroxfb_spinlock
Interrupts: Never used
Functions: Never used
Use: Never used
Notes: This spinlock is declared and exported, but apparently never
    used. It can probably be removed.

#
# fs/
#

Lock: dcache_lock
Interrupts: Ignored
Functions: is_tree_busy(), autofs4_expire(), try_to_fill_dentry(),
    autofs4_root_revalidate(), autofs4_dir_rmdir(),
    coda_flag_children(), dentry_iput(), dput(), d_invalidate(),
    d_find_alias(), d_prune_aliases(), prune_one_dentry(),
    prune_dcache(), shrink_dcache_sb(), have_submounts(),
    select_parent(), d_alloc(), d_instantiate(), d_lookup(),
    d_validate(), d_delete(), d_rehash(), d_move(), sys_getcwd(),
    d_genocide(), free_dentries(), __follow_up(), __follow_down(),
    follow_dotdot(), ncp_dget_fpos(), ncp_renew_dentries(),
    ncp_invalidate_dircache_entries(), nfs_free_dentries(), nfsd_iget(),
    d_splice(), nfsd_findparent(), splice(), proc_permission(),
    ramfs_empty(), dcache_readdir(), add_vfsmnt(), move_vfsmnt(),
    remove_vfsmnt(), kern_umount(), do_umount(), sys_pivot_root(),
    umsdos_d_path(), vfat_revalidate(), d_drop(), d_path()
Use: Held while accessing or modifying a struct dentry that may be in
    the dcache (and thus subject to change asynchronously). Also used to
    serialize access to the vfs mount list (vfsmntlist).

Lock: fat_inode_lock
Interrupts: Ignored
Functions: fat_attach(), fat_detach(), fat_iget(), fat_clear_inode(),
    fat_write_inode()
Use: Held while accessing or modifying the inode cache associated with
    the FAT file system.

Lock: modlist_lock
Interrupts: Saved
Functions: sys_create_module(), free_module(), search_exception_table()
Use: Held while accessing or modifying the module_list.
Notes: Currently also seems to require the BKL during creation -- is
    that really necessary?

Lock: mmlist_lock
Interrupts: Ignored
Functions: exec_mmap(), mmput(), copy_mm(), swap_out()
Use: Held while accessing or modifying the mmlist (primarily, init_mm).

Lock: files_lock
Interrupts: Ignored
Macros: file_list_lock()
        sysrq.c, tty_io.c, dquot.c, file-table.c, generic.c;
    file_list_unlock()
        sysrq.c, tty_io.c, dquot.c, file-table.c, generic.c;
Use: Held while accessing or modifying any struct file list.

Lock: entry_lock
Interrupts: Ignored
Functions: lock_entry(), unlock_entry(), hfs_cat_mark_dirty(),
    get_new_entry(), get_entry(), hfs_cat_put(), hfs_cat_invalidate(),
    hfs_cat_commit(), hfs_cat_move()
Use: Held while accessing or modifying the fields state or hash in any
    hfs_cat_entry or the field rename_lock in an hfs_mdb structure. The
    basic function appears to be to hold off changes in the entry list,
    although that's neither clearly documented nor clear from the code.
Notes: This lock is only used in catalog.c, and thus could be static. It
    is specific to the HFS file system.

Lock: pagecache_lock
Interrupts: Ignored
Functions: remove_inode_page(), __set_page_dirty(),
    invalidate_inode_pages(), truncate_list_pages(),
    truncate_inode_pages(), filemap_fdatasync(), filemap_fdatawait(),
    add_to_page_cache_locked(), add_to_page_cache(),
    add_to_page_cache_unique(), page_cache_read(), __find_get_page(),
    drop_behind(), do_generic_file_read(), mincore_page(),
    __find_get_swapcache_page(), __find_lock_page(),
    delete_from_swap_cache_nolock(), reclaim_page()
Use: Held while examining or modifying the count or state of any page
    that may already be in use.

Lock: inode_lock
Interrupts: Ignored
Functions: __mark_inode_dirty(), sync_one(), sync_inodes(),
    write_inode_now(), invalidate_inodes(), prune_icache(),
    get_empty_inode(), get_new_inode(), iunique(), igrab(), iget4(),
    insert_inode_hash(), remove_inode_hash(), iput(), remove_dquot_ref()
Use: As described in inode.c, this lock is held during inode list
    manipulations.
    These lists include the s_dirty list of a superblock, the
    inode_unused list, the inode_in_use list, and the inode_hashtable
    list. In addition, it is held to change the i_state of an inode
    while the inode is in use.

Lock: pagemap_lru_lock
Interrupts: Ignored
Macros: lru_cache_add()
        fs/buffer.c, mm/filemap.c;
    lru_cache_del()
        mm/swap_state.c, mm/filemap.c;
Functions: invalidate_inode_pages(), shrink_mmap()
Use: Held when examining or modifying the lru_cache.
Notes: When the pagecache_lock is also needed, it should be acquired
    first.

Lock: swaplock
Interrupts: Ignored
Macros: swap_list_lock()
        stram.c, swapfile.c;
    swap_list_unlock()
        stram.c, swapfile.c;
Use: Held when examining or modifying the swap_list.
Notes: When an sdev_lock (swap_info_struct) is also needed, the swaplock
    should be acquired first.

#
# net/
#

Lock: atm_dev_lock
Interrupts: Ignored
Functions: atm_release_vcc_sk(), atm_do_connect(), atm_connect_vcc(),
    atm_ioctl(), free_atm_dev(), atm_dev_register(), sigd_close()
Use: Guards ATM device lists.

Lock: netdev_fc_lock
Interrupts: Ignored, Saved
Functions: netdev_register_fc(), netdev_unregister_fc(), netdev_wakeup()
Use: Held when examining or modifying netdev_fc_xoff, netdev_fc_mask, or
    netdev_fc_slots.
Notes: Probably could be made static.

Lock: inet_peer_idlock
Interrupts: Blocked (bh)
Functions: inet_getid()
Use: Held when examining or modifying the ip_id_count on any struct
    inet_peer that is already in use. (It is not held on struct
    inet_peers that are being created and not yet inserted into the
    pool; see inet_getpeer()).
Notes: Probably could be made static. The lock is not held in
    rt_fill_info() when the field is referenced but not modified; should
    it be?

Lock: inet_peer_unused_lock
Interrupts: Blocked (bh)
Functions: inet_putpeer(), unlink_from_unused(), cleanup_once()
Use: Held when examining or modifying inet_peer_unused_head or
    inet_peer_unused_tailp.
Notes: Probably could be made static.
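
The entry above describes one lock covering both a list head and a tail
pointer. A minimal user-space sketch of that pattern follows; it is not
the kernel's implementation, and struct peer, unused_head, unused_tailp,
unused_lock, put_unused(), and pop_unused() are hypothetical names. It
shows why the two variables belong under a single lock: each operation
may have to update both, and another CPU must never observe one updated
without the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy user-space spinlock; an assumed stand-in for the kernel's spinlock_t. */
typedef atomic_flag toy_spinlock_t;

static void toy_spin_lock(toy_spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;   /* busy-wait until the current holder clears the flag */
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Hypothetical list node. */
struct peer {
    int id;
    struct peer *next;
};

/* Head of the list and pointer to the link field that ends it. */
static struct peer *unused_head;
static struct peer **unused_tailp = &unused_head;
static toy_spinlock_t unused_lock = ATOMIC_FLAG_INIT;

/* Append a node at the tail; touches *unused_tailp and unused_tailp. */
static void put_unused(struct peer *p)
{
    assert(p != NULL);

    toy_spin_lock(&unused_lock);
    p->next = NULL;
    *unused_tailp = p;
    unused_tailp = &p->next;
    toy_spin_unlock(&unused_lock);
}

/* Remove and return the oldest node, or NULL if the list is empty. */
static struct peer *pop_unused(void)
{
    struct peer *p;

    toy_spin_lock(&unused_lock);
    p = unused_head;
    if (p != NULL) {
        unused_head = p->next;
        if (unused_head == NULL)
            unused_tailp = &unused_head;   /* list is empty again */
    }
    toy_spin_unlock(&unused_lock);
    return p;
}
```

Nodes come back in the order they were added, and the tail pointer is
reset whenever the list drains so that the next append lands on the
head again.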
Lock: rpc_credcache_lock Interrupts: Ignored Functions: rpcauth_free_credcache(), rpcauth_gc_credcache(), rpcauth_insert_credcache(), rpcauth_lookup_credcache(), rpcauth_remove_credcache() Use: Held while examining or modifying the credcache of any rpc_auth array in the system. Notes: Might be better to have one per rpc_auth if this is acquired often. Lock: pmap_lock Interrupts: Ignored Functions: rpc_getport(), pmap_getport_done() Use: Despite its name, pmap_lock seems to guard cl_binding in an rpc_client structure. With this flag set, it is apparently assumed that portmap assigning can proceed. This global spinlock protects all rpc_client structures on the system. Notes: Might be better to have one lock per rpc_client. Lock: rpc_queue_lock Interrupts: Blocked (bh) Functions: rpc_run_timer(), rpc_add_timer(), rpc_add_wait_queue(), rpc_remove_wait_queue(), rpc_sleep_on(), rpc_sleep_locked(), rpc_wake_up_task(), rpc_wake_up_next(), rpc_wake_up(), rpc_wake_up_status(), rpc_unlock_task(), __rpc_execute(), __rpc_schedule(), rpc_release_task(), rpc_child_exit(), rpc_run_child(), xprt_lookup_rqst(), xprt_append_pending(), xprt_remove_pending(), xprt_remove_pending_next() Use: Comment suggests: Spinlock for wait queues. Access to the latter also has to be interrupt-safe in order to allow timers to wake up sleeping tasks. However, it appears to be held while examining or modifying other fields in the rpc_task: tk_timeout_fn, tk_running, tk_wakeup, tk_timer, tk_rpcwait, tk_callback, tk_flags, tk_active, tk_lock, tk_status, tk_timeout as well as the global variable childq. Only tk_timeout_fn, tk_wakeup, tk_lock, and childq seem to uniformly adhere to this (lock is always held.) Notes: Appears to guard more than the comments would suggest. Lock: rpc_sched_lock Interrupts: Blocked (bh) Functions: rpc_init_task(), rpc_release_task(), rpc_killall_tasks(), rpc_show_tasks() Use: Comment unhelpfully suggests: Spinlock for other critical sections of code. 
However, it appears to be held primarily while modifying the global variable all_tasks or the list that it heads up. Notes: Could be made static. Lock: xprt_sock_lock Functions: xprt_adjust_cwnd(), xprt_reconnect(), tcp_state_change(), tcp_write_space(), udp_write_space(), do_xprt_transmit(), xprt_reserve() Use: Comment unhelpfully suggests: Spinlock for critical sections of code. Not at all clear what it truly guards, although it does cause serialization in several apparently disjoint areas. Notes: Could be made static. Lock: xprt_lock Functions: xprt_reconnect(), xprt_reconn_status(), xprt_down_transmit(), xprt_up_transmit() Use: Comment unhelpfully suggests: Spinlock for critical sections of code. Not clear what it truly guards. It seems to reliably guard the struct rpc_xprt field "connecting", but it also is used in some other contexts where it is not clear what it is guarding. Notes: Could be made static. If it is truly used globally to guard fields in individual structures, then it is probably being misused. 5. The Big Kernel Lock # # The Big Kernel Lock # # In general, it is noted that the BKL seems to be held during many # release/close routines, for the duration of the routine. It would seem # this might be done so as to determine if data structures may be safely # released, cleared, or modified in some way upon the last close. If so, # however, it should be noted that those same counters are typically # NOT locked during open, and thus it would seem race conditions are # still possible. Below, if a file seems to employ the lock for this # purpose, it is noted as: # # Held during release function to no obvious purpose. # # In addition, the BKL is held in some places for some obvious operations # without it being at all obvious what about that operation requires # locking. So, for example, I note here in several places "held during # mmap" but cannot say, at this point, why (or if) it MUST be held # during an mmap.
These are the sorts of questions this document should # be able to answer, when completed. # Lock: kernel_flag Interrupts: Ignored Macros: kernel_locked() fs/dcache.c; release_kernel_lock() kernel/sched.c; reacquire_kernel_lock() kernel/sched.c; lock_kernel(); unlock_kernel() Use: This was, originally, the only lock used in the Linux kernel. Over the years, finer grained locking has evolved but there are still many bits of code which use this lock as their primary lock for certain data or code. Because of its widespread use as the sole lock for so long, it is impossible to generalize its usage. It appears to be used in over 400 different places in the source code. Some specific usages are outlined here. Most usages are through the lock_kernel()/unlock_kernel() interfaces, which use the spin_lock() call. alpha/kernel/osf_sys.c Held while accessing fields in current (the currently running process). Held while doing a ufs, cdfs, or procfs mount. Held while adding a swap device. Held while doing a shared memory attach. Held while doing an osf_proplist_syscall. Held while doing a create_module(). alpha/kernel/ptrace.c Held while doing a ptrace. alpha/kernel/signal.c Held while modifying process flags. (grabbed but not clearly released.) alpha/kernel/traps.c Held while doing a printk. (grabbed but not clearly released.) arm/kernel/ptrace.c Held while doing a ptrace. cris/kernel/ptrace.c Held while doing a ptrace. cris/kernel/signal.c Held while modifying process flags. (grabbed but not clearly released.) cris/kernel/sys_cris.c Held while creating a pipe, and while doing an mmap. i386/kernel/apm.c Held during release function to no obvious purpose. i386/kernel/mtrr.c Held while deleting a memory type region. (It is *not* held during this operation in other places in the same file.) i386/kernel/ptrace.c Held while doing a ptrace. ia64/ia32/sys_ia32.c Held while doing a 32-bit ptrace. Held while doing a 32-bit recvmsg. Held while doing a 32-bit query_module.
Held while doing a sys_iopl(). Held while doing a 32-bit sysctl(). ia64/kernel/ptrace.c Held while doing a ptrace. m68k/atari/joystick.c Held during release function to no obvious purpose. m68k/bvme6000/rtc.c Held during release function to no obvious purpose. m68k/kernel/process.c Held while doing an execve. m68k/kernel/ptrace.c Held while doing a ptrace. Held while doing a trace. m68k/kernel/sys_m68k.c Held while flushing the cpu cache. mips/kernel/ptrace.c Held while doing a ptrace. mips/kernel/sysirix.c Held while doing an llseek. (?) mips64/kernel/ptrace.c Held while doing a ptrace. Held while doing a 32-bit ptrace. mips64/sgi-ip27/ip27-rtc.c Held during release function to no obvious purpose. parisc/hpux/fs.c Held while doing a 64-bit stat, a 64-bit fstat, or a 64-bit lstat. Held while getting directory entries. parisc/hpux/sys_hpux.c Held during a sys_ustat() call. Held while creating a pipe. parisc/kernel/ptrace.c Held while doing a ptrace(). parisc/kernel/signal.c Held while modifying process flags. (grabbed but not clearly released.) parisc/kernel/sys_parisc.c Held while creating a pipe(), and while doing an mmap(). ppc/kernel/ppc-stub.c Held for reasons that are not clear. ppc/kernel/process.c Held while doing a clone. ppc/kernel/ptrace.c Held while doing a ptrace. s390/kernel/process.c Held while doing a fork (or clone). s390/kernel/ptrace.c Held while doing a ptrace. s390/kernel/signal.c Held while calling do_exit(SIGSEGV). s390/kernel/traps.c Held while processing some traps. s390x/kernel/linux32.c Held while doing a 32-bit mount, a 32-bit module query, or a 32-bit sysctl. s390x/kernel/process.c Held while doing a fork (or clone). s390x/kernel/ptrace.c Held while doing a ptrace. s390x/kernel/signal.c Held while calling do_exit(SIGSEGV). s390x/kernel/traps.c Held while processing some traps. sh/kernel/ptrace.c Held while doing a ptrace. sparc/kernel/ptrace.c Held while doing a ptrace. sparc/kernel/sparc-stub.c Held for reasons that are not clear.
sparc/kernel/sys_sparc.c Held during breakpoint processing. sparc/kernel/sys_sunos.c Held when doing a brk(). Held while doing printk's about unsupported system calls. Held while doing a mount(). Held while doing a killpg(). Held while determining the hostid. Held while inspecting the stack of the current process(?) sparc/kernel/unaligned.c Held while processing a trap. sparc/kernel/windows.c Held while "trying to push the windows in a threads window buffer to the user stack." sparc64/kernel/pci.c Held while writing to the PCI. sparc64/kernel/ptrace.c Held while doing a ptrace(). sparc64/kernel/sys_sparc32.c Held while doing a mount. Held while checking modules. Held while doing a sysctl. sparc64/solaris/fs.c Held while calling report_statvfs64(). Held while setting resource limits. sparc64/solaris/ioctl.c Held while executing an ioctl. sparc64/solaris/socket.c Held while doing a sockfd_lookup(). sparc64/solaris/socksys.c Held while freeing a socket structure. Held during release function to no obvious purpose. sparc64/solaris/timod.c Held while doing a getmsg(). Held while doing a putmsg(). drivers/block/acsi_slm.c Held during release function to no obvious purpose. drivers/block/paride/pg.c Not sure what it is protecting. It may be trying to prevent a double free upon release, but if so, it's leaving open other race conditions elsewhere in the code. Release? drivers/block/paride/pt.c Not sure what it is protecting. It may be trying to prevent a double free upon release, but if so, it's leaving open other race conditions elsewhere in the code. Release? drivers/block/rd.c Not sure what it is protecting. It may be initrd_users, but in that case it is not protected everywhere. Release? drivers/char/acquirewdt.c Not sure what it is protecting. acq_lock would appear drivers/char/advantechwdt.c Held during release function to no obvious purpose. drivers/char/agp/agpgart_fe.c Not sure what it is protecting. Seems to always be paired with AGP_LOCK/UNLOCK. Release?
drivers/char/busmouse.c Not sure what it is protecting. It appears to be protecting mse->active, but that is not protected anywhere else (e.g., open). Release? drivers/char/drm/ffb_drv.c Held during release function to no obvious purpose. drivers/char/drm/gamma_drv.c Held during release function to no obvious purpose. drivers/char/drm/i810_dma.c Not sure what it is protecting. May be protecting its own drm_file_t structures. drivers/char/drm/mga_drv.c Held during release function to no obvious purpose. drivers/char/drm/r128_drv.c Held during release function to no obvious purpose. drivers/char/drm/radeon_drv.c Held during release function to no obvious purpose. drivers/char/drm/tdfx_drv.c Held during release function to no obvious purpose. drivers/char/drm/vm.c Not sure what it is protecting. drivers/char/dsp56k.c Held while modifying device-specific variable (in_use) during release. No protection during open, however. drivers/char/dtlk.c Held while deleting device-specific timeout. Not held at other times, however. Release? drivers/char/ftape/zftape/zftape-init.c Held while modifying current->blocked (during release), and held while doing an mmap(). drivers/char/i810_rng.c Held during release function to no obvious purpose. drivers/char/lp.c Held during release function to no obvious purpose. drivers/char/mixcomwd.c Held during release function to no obvious purpose. drivers/char/nvram.c Held during release function to no obvious purpose. drivers/char/pc110pad.c Held during release function to no obvious purpose. drivers/char/pc_keyb.c Held during release function to no obvious purpose. drivers/char/pcwd.c Held during release function to no obvious purpose. drivers/char/ppdev.c Held during release function to no obvious purpose. drivers/char/qpmouse.c Held during release function to no obvious purpose. drivers/char/qtronix.c Held during release function to no obvious purpose. drivers/char/raw.c Held during release function to no obvious purpose. 
drivers/char/sbc60xxwdt.c Held during release function to no obvious purpose. drivers/char/softdog.c Held during release function to no obvious purpose. drivers/char/stallion.c Held while examining a global struct tty_struct. drivers/char/sysrq.c Held while examining emergency_sync_scheduled. drivers/char/tpqic02.c Held during release function to no obvious purpose. drivers/char/tty_io.c Held during do_tty_hangup() -- code suggests it is protecting a data structure I can't find. A comment here screams "FIXME! What are the locking issues here?" which suggests the reasons for grabbing this lock may not be well understood. Held while calling tty->ldisc.read(). Held while issuing write(). Held during release. drivers/char/wdt.c Held during release function to no obvious purpose. drivers/char/wdt285.c Held during release function to no obvious purpose. drivers/char/wdt977.c Held during release function to no obvious purpose. drivers/char/wdt_pci.c Held during release function to no obvious purpose. drivers/i2c/i2c-dev.c Held during release function to no obvious purpose. drivers/i2o/i2o_block.c Held while calling exit_files(). drivers/i2o/i2o_config.c Held during release function to no obvious purpose. drivers/i2o/i2o_core.c Held while calling exit_files(). drivers/ide/ide-tape.c Held during release function to no obvious purpose. drivers/ieee1394/raw1394.c Held during release function to no obvious purpose. drivers/ieee1394/video1394.c Held during mmap call. Held during release function to no obvious purpose. drivers/input/evdev.c Held during release function to no obvious purpose. drivers/input/input.c Held during open function, apparently to guard against an f_ops change. drivers/input/joydev.c Held during release function to no obvious purpose. drivers/input/mousedev.c Held during release function to no obvious purpose. drivers/isdn/avmb1/capi.c Held during release function to no obvious purpose. 
drivers/isdn/divert/divert_procfs.c Held while adjusting if_used count or modifying the divert_info linked list. drivers/isdn/hysdn/hysdn_procconf.c Held while examining card in open or close. drivers/isdn/hysdn/hysdn_procfs.c Held while examining card in open or close. drivers/isdn/hysdn/hysdn_proclog.c Held while examining card in open or close. drivers/isdn/isdn_common.c Held during read, write, poll, and close. Can't tell what it is guarding. drivers/macintosh/adb.c Held during release function to no obvious purpose. drivers/macintosh/via-pmu.c Held during release function to no obvious purpose. drivers/md/lvm.c Held during vmalloc() call. drivers/media/video/cpia.h Held to guard access to a static structure (cam_list). drivers/media/video/msp3400.c Held while calling daemonize(). (This is done inconsistently throughout the kernel.) drivers/media/video/tvaudio.c Held during daemonization. drivers/media/video/videodev.c Held during open function. Held during release function to no obvious purpose. Held while doing an mmap(). drivers/net/ppp_generic.c Held during release function to no obvious purpose. drivers/net/wan/cosa.c Held during release function to no obvious purpose. drivers/pci/syscall.c Held during pci_read_config_{byte,word,dword}(). Not sure what it is guarding. drivers/pcmcia/ds.c Held during release function to no obvious purpose. drivers/pnp/isapnp_proc.c (112) guarding against modifying a drivers/sbus/audio/audio.c Held during release function to no obvious purpose. drivers/scsi/cpqfcTSworker.c Held for no obvious purpose (prior to calling exit_mm()?) drivers/scsi/scsi_error.c Held during daemonize(). Not clear why it is held. drivers/scsi/sg.c Held during release function to no obvious purpose. drivers/sgi/char/ds1286.c Held during release function to no obvious purpose. drivers/sgi/char/graphics.c Held during release function to no obvious purpose.
drivers/sgi/char/shmiq.c Held while doing an mmap(). Seems to be guarding the local structure shmiqs. drivers/sgi/char/streamable.c Held during release function to no obvious purpose. drivers/sound/cmpci.c Held during release function to no obvious purpose. drivers/sound/dmasound/dmasound_core.c Held during release function to no obvious purpose. drivers/sound/emu10k1/audio.c Held during release function to no obvious purpose. Held while doing an mmap(). drivers/sound/emu10k1/midi.c Held during release function to no obvious purpose. drivers/sound/es1370.c Held during release function to no obvious purpose. Held while doing an mmap(). drivers/sound/es1371.c Held during release function to no obvious purpose. Held while doing an mmap(). drivers/sound/esssolo1.c Held during release function to no obvious purpose. Held while doing an mmap(). drivers/sound/i810_audio.c Held during release function to no obvious purpose. Held while doing an mmap(). drivers/sound/maestro.c Held during release function to no obvious purpose. Held while doing an mmap(). drivers/sound/msnd_pinnacle.c Held during release function to no obvious purpose. drivers/sound/sonicvibes.c Held during release function to no obvious purpose. Held while doing an mmap(). drivers/sound/soundcard.c Held during release function to no obvious purpose. Held while doing an mmap(). Reads and writes apparently made "atomic" by use of big kernel lock. (Local spinlock would be just as effective, if true.) drivers/sound/trident.c Held during release function to no obvious purpose. Held while doing an mmap(). drivers/sound/vwsnd.c Held during release function to no obvious purpose. drivers/sound/wavfront.c Held during release function to no obvious purpose. drivers/usb/audio.c Held during release function to no obvious purpose. Held while doing an mmap(). drivers/usb/dabusb.c Held during release function to no obvious purpose. drivers/usb/devices.c Held while doing a poll().
drivers/usb/devio.c Held while modifying what appear to be local structures during ioctls, opens, and closes. drivers/usb/hub.c Held during daemonize(). Not clear why it is held. drivers/usb/inode.c Held (sometimes) to protect usb_bus_list, apparently. However, this is done inconsistently. drivers/usb/microtek.c Held while modifying what appear to be local structures. drivers/usb/printer.c Held during open function to no obvious purpose. Oddly enough, it is NOT held during the release function! drivers/usb/rio500.c Held during open function to no obvious purpose. drivers/usb/scanner.c Held during open function to no obvious purpose. drivers/usb/storage/usb.c Held during daemonize(). Not clear why it is held. drivers/usb/uhci-debug.h Held during open function to no obvious purpose. drivers/video/fbmem.c Held during release function to no obvious purpose. fs/adfs/inode.c Held while "writing an existing inode back to the directory, and therefore the disk." fs/affs/inode.c Apparently held during changes to or reads from the inode structure. fs/affs/symlink.c Held during use of the inode structure. Held (unnecessarily) during a brelse(). fs/attr.c Apparently held during changes to or reads from the inode structure. fs/autofs/root.c Held during the entirety of autofs_revalidate(). It appears to be protecting inode and/or dentry entries but it's not clear. fs/autofs4/root.c Held during the entirety of autofs4_dentry_release(). It appears to be protecting inode and/or dentry entries but it's not clear. fs/bfs/file.c Held during bfs_get_block(), but from the comment, it appears to be used for serialization without much thought as to what is actually being protected. fs/bfs/inode.c Held during bfs_write_inode() and bfs_delete_inode(). It appears it might be used to protect inode entries from changing. fs/block_dev.c Held during open routine for block devices and in blkdev_put(); not clear why. Could actually be used to guard bd_openers.
fs/buffer.c Held during fsync_dev(), file_fsync(), fsync_super(), and sync_old_buffers(). fs/coda/dir.c Held during all of coda_dentry_revalidate() and coda_revalidate_inode(). fs/coda/file.c Held during coda_open(), coda_release(), and coda_fsync(). fs/coda/psdev.c Held during coda_psdev_write(), coda_psdev_read(), coda_psdev_open(), and coda_psdev_release(). fs/coda/symlink.c Held during coda_symlink_filler(). fs/devfs/base.c Held during write_inode(), open(), d_iput(), d_revalidate_wait(), readlink(), follow_link(), and close(). fs/devices.c Held during get_chrfops() and chrdev_open(). Appears to be guarding the f_op member of the inode structure (*any* inode structure). fs/dquot.c Held during initialize(), drop(), transfer(), and sys_quotactl(). fs/efs/symlink.c Held during symlink_readpage(). fs/exec.c Held during do_coredump() and compute_creds(). fs/ext2/inode.c Held during delete_inode(), discard_prealloc(), get_block(), update_inode(), and write_inode(). fs/fat/inode.c Held during delete_inode(), clear_inode(), and write_inode(). fs/fcntl.c Held during do_fcntl(). fs/fifo.c Held during fifo_open(). fs/filesystems.c Held during sys_nfsservctl(). fs/hfs/inode.c Held during put_inode(). fs/hfs/sysdep.c Held during dentry_iput() and revalidate_dentry(). fs/hpfs/dir.c Held during dir_release(). fs/hpfs/file.c Held during open() and file_release(). fs/hpfs/inode.c Held during delete_inode(). fs/hpfs/namei.c Held during symlink_readpage(). fs/ioctl.c Held during sys_ioctl(). (Is this *really* held during all file system related ioctls??) fs/isofs/inode.c Held during isofs_get_block(). fs/isofs/rock.c Held during rock_ridge_symlink_readpage(). fs/jffs/inode-v23.c Held during jffs_delete_inode(). fs/jffs/intrep.c Held during jffs_garbage_collect_thread(). fs/lockd/clntlock.c Held during reclaimer(). fs/lockd/svc.c Held during lockd().
fs/locks.c Held during locks_mandatory_locked(), locks_mandatory_area(), sys_flock(), locks_remove_posix(), locks_remove_flock(), posix_test_lock(), posix_lock_file(), __get_lease(), fcntl_setlease(), lock_may_read(), lock_may_write(), and get_locks_status(). Seems to be held when examining or modifying the i_flock field of an inode. fs/minix/inode.c Held during write_inode(). fs/minix/itree_common.c Held during get_block() and sync_file(). fs/namei.c Apparently held to prevent changing of i_op field of inode. Held during permission(), real_lookup(), lookup_hash(), vfs_create(), vfs_mknod(), vfs_mkdir(), vfs_rmdir(), vfs_unlink(), vfs_symlink(), vfs_link(), and do_rename(). fs/ncpfs/dir.c Held during ncp_lookup_validate(). fs/nfs/dir.c Held during nfs_lookup_revalidate() and nfs_dentry_iput(). fs/nfs/file.c Held during nfs_fsync() and nfs_commit_write(). fs/nfs/flushd.c Held during nfs_reqlist_init(), nfs_reqlist_exit(), and inode_schedule_scan(). fs/nfs/inode.c Held during nfs_open(), nfs_release(), and __nfs_revalidate_inode(). fs/nfs/read.c Held during nfs_readpage_sync() and nfs_pagein_one(). fs/nfs/symlink.c Held during nfs_symlink_filler(). fs/nfs/write.c Held during nfs_writepage(), nfs_flush_one(), and nfs_commit_list(). fs/nfsd/nfsctl.c Held during handle_sys_nfsservctl(). fs/nfsd/nfssvc.c Held during nfsd(). fs/ntfs/fs.c Held during ntfs_write_inode() and _ntfs_clear_inode(). fs/open.c Held during vfs_statfs() and filp_close(), apparently to insure that f_op does not change. fs/openpromfs/inode.c Held during property_release(). fs/proc/inode.c Held during de_put(). fs/proc/proc_misc.c Held during locks_read_proc() to insure that i_flock does not change (see get_locks_status()). fs/qnx4/fsync.c Held during sync_file(). fs/qnx4/inode.c Held during delete_inode() and write_inode(). fs/read_write.c Held while calling llseek function. Apparently held to insure that f_op does not change during call. fs/readdir.c Held while calling vfsreaddir function.
Apparently held to insure that f_op does not change during call. fs/reiserfs/dir.c Held during dir_fsync(). fs/reiserfs/file.c Held during file_release() and sync_file(). fs/reiserfs/inode.c Held during delete_inode(), bmap(), get_block(), write_inode(), dirty_inode(), map_block_for_writepage(), and commit_write(). fs/reiserfs/ioctl.c Held for unpack(). fs/reiserfs/journal.c Held for journal_commit_thread(). fs/reiserfs/super.c Held for write_super() and write_super_lockfs(). fs/romfs/inode.c Held during readpage(). fs/smbfs/dir.c Held during smb_dir_open() and smb_lookup_validate(). fs/smbfs/file.c Held during smb_commit_write(), smb_file_open(), and smb_file_release(). Apparently guards the openers field in the samba-specific part of the fs union. fs/smbfs/inode.c Held during smb_revalidate_inode() and smb_delete_inode(). fs/smbfs/sock.c Held during smb_data_callback(). fs/super.c Held during sys_ustat() (to guard superblock structures?) Held during sys_mount(), sys_umount(), and sys_pivot_root(). fs/sysv/fsync.c Held during sync_file(). fs/sysv/inode.c Held during sysv_delete_inode(), sysv_block_map(), sysv_get_block(), and sysv_write_inode(). fs/udf/file.c Held during release_file(). fs/udf/fsync.c Held during sync_file(). fs/udf/inode.c Held during udf_put_inode(), udf_delete_inode(), udf_get_block(), udf_write_inode(), and udf_block_map(). fs/udf/symlink.c Held during symlink_filler(). fs/ufs/inode.c Held during ufs_frag_map(), ufs_getfrag_block(), ufs_write_inode(), and ufs_delete_inode(). include/linux/net.h Held during SOCKCALL_WRAP and SOCKCALL_UWRAP (macros). It appears most socket operations still use the BKL, since these macros are used in the definition of the ops structures to create inline functions. include/linux/raid/md_compatible.h Macro md_lock_kernel() declared here and actually used in block/md.c. Appears to be held during thread creation. init/main.c Held during start_kernel() and init().
kernel/acct.c Held during check_free_space(), sys_acct(), acct_auto_close(), and acct_process(). Appears to guard acct_active, acct_timer, and acct_file. kernel/exit.c Held in do_exit(). kernel/module.c Held in sys_create_module(), sys_init_module(), sys_delete_module(), sys_query_module(), and sys_get_kernel_syms(). Appears to guard the module list. kernel/sys.c Held during sys_reboot(). kernel/sysctl.c Held during sys_sysctl(). mm/memory.c Held during do_swap_page() and shmem_getpage_locked(), apparently while using the page cache. Held during vmtruncate(), apparently to hold i_ops constant. mm/swapfile.c Held during swapon()/swapoff(). Appears to be protecting the swap list -- doesn't swap_list_lock() do that? net/ipv4/af_inet.c Held during call of dlci_ioctl(). Not clear why this serialization is necessary or what the BKL is guarding. net/netlink/netlink_dev.c Held during release(). net/socket.c Explicitly *released* during a socket ioctl(). net/sunrpc/sched.c Held during rpciod() (daemon creation.) Notes: If you are writing new code (as opposed to fixing or extending old code) you should think very hard before using this lock. It may be that for compatibility reasons, this lock must be utilized in some sections of new code. Where possible, however, you should be attempting to update the new code by using a lock other than the kernel_flag -- one which is designed specifically for the needs of the new code. 6.
Global Read/Write spinlocks # # read/write locks # Lock: tasklist_lock Interrupts: read: Ignored write: Blocked and Saved Functions: sys_ptrace(), task_valid(), sys32_ptrace(), sync_thread_rbs(), wrap_mmu_context(), unswap_by_move(), irix_waitsys(), irix_prctl(), irix_syssgi(), mmu_context_overflow(), do_ptrace(), srmmu_set_pgdir(), do_tty_hangup(), disassociate_ctty(), release_dev(), tiocsctty(), do_SAK(), de_thread(), send_sigio(), task_state(), proc_pid_stat(), proc_pid_lookup(), get_pid_list(), proc_read_super(), chroot_fs_refs(), set_pgdir(), unhash_process(), sys_capget(), cap_set_pg(), cap_set_all(), sys_capset(), session_of_pgrp(), will_become_orphaned_pgrp(), has_stopped_jobs(), forget_original_parent(), exit_notify(), sys_wait4(), get_pid(), do_fork(), schedule(), setscheduler(), sys_sched_getscheduler(), sys_sched_getparam(), sys_sched_rr_get_interval(), show_state(), kill_pg_info(), kill_sl_info(), kill_proc_info(), kill_something_info(), notify_parent(), sys_setpriority(), sys_getpriority(), sys_setpgid(), sys_getpgid(), sys_getsid(), count_active_tasks(), try_to_unuse(), swap_out(), match_pid(), match_sid() Use: Held for read when inspecting the task list (find_task_by_pid(), for_each_task, or in general, following pointers in the task list when you don't want the structure of the list to change.) Held for write when changing the task list, or any item in the task list which may affect the integrity of the list (modifying pointers to other tasks, for instance). Notes: When kernel_flag is also needed, that lock should be acquired first. Lock: guid_lock Interrupts: read: Saved write: Saved Functions: create_guid_entry(), associate_guid(), hpsb_guid_get_handle() Use: Held for read when inspecting the guid_list. Held for write when adding a new guid entry to guid_list. Notes: Could be made static.
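As a rough illustration of the read/write convention described for guid_lock above (read lock to inspect the list, write lock to add an entry), here is a userspace sketch using pthread_rwlock_t in place of the kernel's rwlock_t. The struct guid_entry layout, the lock name list_lock, and the functions guid_add() and guid_find() are hypothetical; the actual ieee1394 code is organized differently, and this sketch ignores the interrupt handling noted in the entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical list node; not the kernel's structure. */
struct guid_entry {
    uint64_t guid;
    struct guid_entry *next;
};

static struct guid_entry *guid_list;   /* guarded by list_lock */
static pthread_rwlock_t list_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Writers change the shape of the list, so they take the lock for
 * write, which excludes all readers and all other writers.
 * Returns 0 on success, -1 if allocation fails. */
int guid_add(uint64_t guid)
{
    struct guid_entry *e = malloc(sizeof(*e));

    if (e == NULL)
        return -1;
    e->guid = guid;

    pthread_rwlock_wrlock(&list_lock);
    e->next = guid_list;
    guid_list = e;
    pthread_rwlock_unlock(&list_lock);
    return 0;
}

/* Readers only follow pointers, so any number of them may hold the
 * lock for read at once.  Returns 1 if the guid is present, else 0. */
int guid_find(uint64_t guid)
{
    const struct guid_entry *e;
    int found = 0;

    pthread_rwlock_rdlock(&list_lock);
    for (e = guid_list; e != NULL; e = e->next) {
        if (e->guid == guid) {
            found = 1;
            break;
        }
    }
    pthread_rwlock_unlock(&list_lock);
    return found;
}
```

Allocation is done before the write lock is taken so that the exclusive section covers only the two pointer updates.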
Lock: hl_drivers_lock Interrupts: read: Ignored write: Blocked Functions: hpsb_register_highlevel(), hpsb_unregister_highlevel(), highlevel_iso_receive(), highlevel_fcp_request() Macros: DEFINE_MULTIPLEXER() highlevel.c; Use: Held for read when inspecting the hl_drivers list. Held for write when adding a new high level driver entry to the hl_drivers list. Notes: Could be made static. Lock: addr_space_lock Interrupts: read: Ignored write: Blocked Functions: hpsb_register_highlevel(), hpsb_unregister_highlevel(), highlevel_iso_receive(), highlevel_fcp_request() Macros: DEFINE_MULTIPLEXER() highlevel.c; Use: Held for read when inspecting the hl_drivers list. Held for write when adding a new high level driver entry to the hl_drivers list. Notes: Could be made static. Lock: dev_base_lock Interrupts: read: Ignored, Blocked write: Blocked Functions: dev_ifname32(), solaris_i(), bpq_init_driver(), get_strip_dev(), lapbeth_init_driver(), dev_get_by_name(), dev_get(), dev_get_by_index(), dev_clear_fastroute(), dev_ifname(), dev_get_info(), dev_get_wireless_info(), dev_ioctl(), register_netdevice(), unregister_netdevice(), net_dev_init(), dev_mc_read_proc(), rtnetlink_dump_ifinfo(), dn_bind(), dn_dev_dump_ifaddr(), decnet_dev_get_info(), inetdev_by_index(), inet_select_addr(), inet_dump_ifaddr(), inet_forward_change(), ip_mc_procinfo(), addrconf_forward_change(), ipv6_get_saddr(), igmp6_read_proc(), nr_dev_first(), nr_dev_get(), rose_dev_first(), rose_dev_get(), rose_dev_exists(), tc_dump_qdisc(), x25_init(). Use: Excellent explanation found in drivers/net/Space.c. In part, Pure readers hold dev_base_lock for reading. Writers must hold the rtnl semaphore while they loop through the dev_base list, and hold dev_base_lock for writing when they do the actual updates. Lock: notifier_lock Interrupts: read: Never used write: Ignored Functions: notify_chain_register(), notify_chain_unregister() Use: Held for write when modifying any notifier list anywhere. 
Notes: There is an ordinary spinlock named the same which is static to kcapi.c. A different name should be chosen for one or the other, to avoid confusion. The read side of this read/write lock never seems to be used, effectively making this a spinlock. This usage could be made static. Arguably, each notifier list should have its own lock but modifying the lists is probably an infrequent operation. Lock: xtime_lock Interrupts: read: Ignored, Saved write: Ignored, Saved, Blocked Functions: generally, do_gettimeofday(), do_settimeofday() and timer_interrupt() in each architecture. See each architecture for specific uses. Use: Held for read when examining xtime. Held for write when modifying xtime. Lock: vmlist_lock Interrupts: read: Ignored write: Ignored Functions: read_kcore(), get_vm_area(), vfree(), vread() Use: Held for read when examining vmlist. Held for write when modifying vmlist (or anything in the vmlist chain.) Lock: net_big_sklist_lock Interrupts: read: Never used write: Blocked (bh) Functions: sklist_remove_socket(), sklist_insert_socket() Use: Held for write when modifying any socket list. Notes: Could be made static. Never used for reading, effectively making this a spinlock. Still used by only one driver, apparently (econet). Lock: dn_hash_lock Interrupts: read: Ignored write: Ignored, Blocked (bh) Functions: dn_hash_sock(), dn_unhash_sock(), dn_unhash_sock_bh(), dn_sklist_find_listener(), dn_find_by_skb(), dn_get_info() Use: Held for read when examining dn_sklist. Held for write when modifying dn_sklist. Notes: Could be made static (seems to guard entirely static variables and is only used within one file.) Lock: inetdev_lock Interrupts: read: Ignored write: Ignored, Blocked (bh) Functions: inetdev_init(), inetdev_destroy(), inet_select_addr(), inet_dump_ifaddr(), inet_forward_change(), fib_validate_source(), ip_route_input(), ip_route_output_slow(). Use: Held for read when examining (or using) an ip_ptr from any struct netdevice. 
             Held for write when modifying an ip_ptr from any struct
             net_device.

Lock:        ip_ra_lock
Interrupts:  read: Ignored
             write: Ignored, Blocked (bh)
Functions:   ip_call_ra_chain(), ip_ra_control()
Use:         Held for read when examining ip_ra_chain (or traversing
             that list). Held for write when modifying ip_ra_chain.
             Since ip_ra_chain is a linked list, this also covers any
             change to the linkage of that list.

Lock:        ip_fw_lock
Interrupts:  read: Ignored
             write: Ignored, Blocked (bh)
Functions:   ip_fw_check(), ip_fw_ctl(), ip_chain_name_procinfo(),
             ipfw_init_or_cleanup()
Use:         Held for read when examining ip_fw_chains (or traversing
             that list). Held for write when modifying ip_fw_chains (or
             the integrity of that list). Also held for write when
             examining or totalling counters because, according to the
             comments, counters are updated by "readers".
Notes:       Could be made static. Due to the comment about readers
             updating counters, it sounds like there is a locking issue.
             Inspection of the source does not reveal any counter
             updates while holding the read lock, so either the comment
             is false, or neither the read nor the write lock is held
             during these operations, or I missed something in my
             inspection.

Lock:        raw_v4_lock
Interrupts:  read: Ignored
             write: Blocked (bh)
Functions:   icmp_unreach(), raw_v4_hash(), raw_v4_unhash(),
             raw_v4_input(), raw_get_info()
Use:         Held for read when examining any of the buckets in the hash
             table raw_v4_htable. Held for write when modifying
             raw_v4_htable. Since this is a fixed size hash table,
             neither the table itself nor its integrity can change, as
             they could with a linked list. Holding the lock for write
             means you are changing entries in one of the buckets.

Lock:        context_overflow_lock
Interrupts:  read: Never used
             write: Ignored
Functions:   mmu_context_overflow()
Use:         Seems to be guarding next_mmu_context. Only used as a write
             lock, which effectively makes it a simple spin lock.
Notes:       Could be made static.
Lock:        udp_hash_lock
Interrupts:  read: Ignored
             write: Blocked (bh)
Functions:   udp_v4_get_port(), udp_v4_unhash(), udp_v4_lookup(),
             udp_v4_mcast_deliver(), udp_get_info(), udp_v6_get_port(),
             udp_v6_unhash(), udp_v6_lookup(), udpv6_mcast_deliver(),
             udp6_get_info()
Use:         Held for read when examining the hash table udp_hash. Held
             for write when modifying udp_hash. Since this is a fixed
             size hash table, neither the table itself nor its integrity
             can change, as they could with a linked list. Holding the
             lock for write means you are changing an entry in one of
             the buckets.

Lock:        addrconf_lock
Interrupts:  read: Ignored
             write: Blocked (bh)
Functions:   in6_dev_get(), ipv6_add_dev(), addrconf_forward_change(),
             ipv6_add_addr(), ipv6_get_saddr(), ipv6_get_lladdr(),
             addrconf_ifdown()
Use:         Comment states "protects inet6 devices". It is not clear
             from inspection when this should be acquired for read and
             when it should be acquired for write. It appears to guard
             the ip6_ptr field of every struct net_device. That is, when
             one is going to modify that field on any struct net_device,
             one should acquire this lock.
Notes:       Could be made static. Seems like this would be better
             achieved by a per-structure lock. Currently a global lock
             seems to protect many local fields.

Lock:        fib6_walker_lock
Interrupts:  read: Ignored
             write: Blocked (bh)
Functions:   fib6_walker_link(), fib6_walker_unlink(),
             fib6_repair_tree(), fib6_del_route()
Use:         Guards fib6_walker_list. Held for read when examining
             fib6_walker_list. This is a linked list, so the integrity
             of the list is ensured while the lock is held. Held for
             write when modifying any connectivity in fib6_walker_list.
Notes:       Could be made static.

Lock:        ip6_fw_lock
Interrupts:  read: Never used
             write: Blocked (bh)
Functions:   ip6_rule_add(), ip6_rule_del()
Use:         Not clear exactly what it guards.
Notes:       Could be made static. This code is currently ifdeffed out.
Lock:        ip6_ra_lock
Interrupts:  read: Ignored
             write: Blocked (bh)
Functions:   ip6_ra_control(), ip6_call_ra_chain()
Use:         Guards ip6_ra_chain. Held for read when examining
             ip6_ra_chain. This is a linked list, so the integrity of
             the list is ensured while the lock is held. Held for write
             when modifying any connectivity in ip6_ra_chain.

Lock:        raw_v6_lock
Interrupts:  read: Ignored
             write: Blocked (bh)
Functions:   icmp6_notify(), raw_v6_hash(), raw_v6_unhash(),
             ipv6_raw_deliver(), raw6_get_info()
Use:         Held for read when examining any of the buckets in the hash
             table raw_v6_htable. Held for write when modifying
             raw_v6_htable. Since this is a fixed size hash table,
             neither the table itself nor its integrity can change, as
             they could with a linked list. Holding the lock for write
             means you are changing an entry in one of the buckets.

Lock:        rt6_lock
Interrupts:  read: Blocked (bh)
             write: Blocked (bh)
Functions:   rt6_redirect(), rt6_get_dflt_router(),
             rt6_purge_dflt_routers(), rt6_ifdown(), rt6_mtu_change(),
             inet6_dump_fib(), rt6_proc_info(), fib6_run_gc(),
             rt6_lookup(), rt6_ins(), ip6_route_input(),
             ip6_route_output(), ip6_route_del(), ip6_del_rt()
Use:         Comment claims "protects all the ip6 fib". Held for read
             when examining any of the tree described by
             ip6_routing_table. Held for write when modifying any of the
             structure of the tree described by ip6_routing_table.
             (Holding this lock for write does not guarantee that the
             data itself won't change; only that the linkages for the
             tree won't change.)

Lock:        qdisc_tree_lock
Interrupts:  read: Ignored
             write: Ignored
Functions:   sch_tree_lock(), sch_tree_unlock(), tcf_tree_lock(),
             tcf_tree_unlock(), tc_ctl_tfilter(), tc_dump_tfilter(),
             dev_graft_qdisc(), qdisc_create(), tc_dump_qdisc(),
             tc_dump_tclass(), dev_activate(), dev_init_scheduler(),
             dev_shutdown()
Use:         The comments near the declaration of this variable read, in
             part:

                 Modifications to data, participating in scheduling must
                 be additionally protected with dev->queue_lock
                 spinlock.
                 The idea is the following:
                 - enqueue, dequeue are serialized via top level device
                   spinlock dev->queue_lock.
                 - tree walking is protected by
                   read_lock(qdisc_tree_lock) and this lock is used only
                   in process context.
                 - updates to tree are made only under rtnl semaphore,
                   hence this lock may be made without local bh
                   disabling.

                 qdisc_tree_lock must be grabbed BEFORE dev->queue_lock!

Lock:        unix_table_lock
Interrupts:  read: Ignored
             write: Ignored
Functions:   unix_remove_socket(), unix_insert_socket(),
             unix_find_socket_byname(), unix_find_socket_byinode(),
             unix_autobind(), unix_bind(), unix_read_proc(), unix_gc()
Use:         Guards unix_socket_table[]. Held for read when examining
             the hash tables in unix_socket_table[]. Held for write when
             modifying any bucket in the hash tables represented in
             unix_socket_table[]. (Holding this lock for write does not
             guarantee that the data itself won't change; only that the
             linkages in the hash buckets won't change.)


7. Additional spin locks

#
# static locks
#

Lock:        unload_lock
Interrupts:  Ignored
Functions:   try_inc_mod_count(), sys_delete_module(),
             get_module_symbol()
Use:         Held while referencing or updating fields in any struct
             module. As its name suggests, this is most often used to
             prevent the module from being unloaded while it is being
             referenced or modified.

Lock:        softirq_mask_lock
Interrupts:  Saved
Functions:   open_softirq()
Use:         Held while modifying the contents of softirq_vec.

Lock:        pm_devs_lock
Interrupts:  Saved
Functions:   pm_register(), pm_unregister()
Use:         Held while modifying the pm_devs list.

Lock:        irq_controller_lock
Interrupts:  Saved, Ignored
Functions:   disable_irq(), enable_irq(), do_IRQ(), do_ecard_IRQ(),
             setup_arm_irq(), free_irq(), probe_irq_on(),
             probe_irq_off(), do_cobalt_IRQ(), do_piix4_master_IRQ(),
             disable_irq_nosync(), setup_irq()
Use:         Held while modifying the global array irq_desc.
Architectures: i386, arm, sh

Lock:        io_tlb_lock
Interrupts:  Saved
Functions:   map_single(), unmap_single()
Use:         Held while modifying or examining io_tlb_list or
             io_tlb_index.
Architectures: ia64

Lock:        uidhash_lock
Interrupts:  Ignored
Functions:   free_uid(), alloc_uid()
Use:         Held when updating or validating the uid hash table. (After
             acquiring the lock, this should be done by using the
             routines uid_hash_free(), uid_hash_remove(),
             uid_hash_find(), or uid_hash_insert().)

Lock:        lru_list_lock
Interrupts:  Ignored
Functions:   sync_buffers(), buffer_insert_inode_queue(),
             inode_has_buffers(), __invalidate_buffers(),
             set_blocksize(), fsync_inode_buffers(),
             osync_inode_buffers(), invalidate_inode_buffers(),
             getblk(), refile_buffers(), __bforget(),
             try_to_free_buffers(), show_buffers(),
             flush_dirty_buffers()
Use:         Held while reading or writing the lru_list.
Notes:       If hash_table_lock and unused_list_lock are also used, then
             the order should be
                 lru_list_lock --> hash_table_lock --> unused_list_lock
             to avoid deadlocks. The routines __remove_from_queues() and
             __insert_into_queues() expect this lock to be held upon
             entry.

Lock:        unused_list_lock
Interrupts:  Ignored
Functions:   get_unused_buffer_head(), create_buffers(), wait_kio(),
             brw_kiovec(), try_to_free_buffers()
Use:         Protects the unused_list.
Notes:       If hash_table_lock and lru_list_lock are also used, then
             the order should be
                 lru_list_lock --> hash_table_lock --> unused_list_lock
             to avoid deadlocks. The routine __put_unused_buffer_head()
             expects this lock to be held upon entry.

#
# static read/write locks
#

Lock:        hash_table_lock
Functions:   get_hash_table(), __invalidate_buffers(), set_blocksize(),
             getblk(), __bforget(), try_to_free_buffers()
Use:         Protects access to hash_table[]. This is a fixed size
             array, so it really protects each element (bucket) in the
             array and the next/prev linkage to any additional data
             items in the bucket. Held for read when examining any
             bucket in hash_table[]. Held for write when modifying any
             bucket in hash_table[].
             Note this only protects the integrity of the hash bucket
             linked lists, not the data stored in those lists.
Notes:       If hash_table_lock and lru_list_lock are also used, then
             the order should be
                 lru_list_lock --> hash_table_lock --> unused_list_lock
             to avoid deadlocks. The routines __remove_from_queues() and
             __insert_into_queues() expect this lock to be held upon
             entry.

#
# locks found in structures
#

Lock:        isl_lock
Interrupts:  Saved, Ignored
Structure:   ia64_state_log_s
Macros:      IA64_LOG_LOCK_INIT() mca.c; IA64_LOG_LOCK() mca.c;
             IA64_LOG_UNLOCK() mca.c;
Use:         Held while modifying or reading from ia64_state_log.

Lock:        __tcp_portalloc_lock
Structure:   tcp_hashinfo
Interrupts:  Ignored
Functions:   tcp_v4_get_port(), tcp_v6_get_port()
Use:         Held while acquiring a TCP port (essentially, while
             examining and modifying tcp_port_rover).


8. Change Log

Changes from 2.4.4 to 2.4.5

    sys_fcntl64 no longer uses lock_kernel()
    nfs_lock began using lock_kernel()