From Bugzilla Helper: User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0) Description of problem: While running heavy I/O with RH72, QLogic FC driver 5.38.3, and 2.4.9-enterprise and 2.4.9-21smp we are experiencing this ext3 kernel bug. Topology: 1 - Dell 2450 SMP machine 1 - Dell 5450 SMP machine Each HBA connected to one zone (of two) of a Dell 56F. A dual ported Dell PowerVault sits behind the switch (each port once again connected to a zone). The problem has occured on two distinct setups, similiarly configured. Attached ksymoops information should help. I'm hoping to get some clearer insight as to how this condition would be triggered (ext3_journal_current_handle() == journal_current_handle == current->journal_info). FYI: The kernel is tainted because the 5.38.3 driver checks for kernel versions higher than 2.4.9 (not inclusive) to add the MODULE_LICENSE information. Version-Release number of selected component (if applicable): How reproducible: Sometimes Steps to Reproduce: 1. <Description> 2. 3. Additional info: Enterprise setup: ------------------------------- 8< --------------------------------- ksymoops 2.4.1 on i686 2.4.7-10enterprise. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.7-10enterprise/ (default) -m /boot/System.map-2.4.7-10enterprise (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. Error (expand_objects): cannot stat(/lib/ext3.o) for ext3 ksymoops: No such file or directory Error (expand_objects): cannot stat(/lib/jbd.o) for jbd ksymoops: No such file or directory Error (expand_objects): cannot stat(/lib/megaraid.o) for megaraid ksymoops: No such file or directory Error (expand_objects): cannot stat(/lib/aic7xxx.o) for aic7xxx ksymoops: No such file or directory Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod ksymoops: No such file or directory Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod ksymoops: No such file or directory Warning (compare_maps): mismatch on symbol partition_name , ksyms_base says c01c09e0, System.map says c0160900. Ignoring ksyms_base entry Warning (compare_maps): mismatch on symbol debug_buff , qla2200 says f8872640, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/qla2200.o says f885d580. Ignoring /lib/modules/2.4.7- 10enterprise/kernel/drivers/scsi/qla2200.o entry Warning (compare_maps): mismatch on symbol qla_fo_params , qla2200 says f8872600, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/qla2200.o says f885d540.Ignoring /lib/modules/2.4.7- 10enterprise/kernel/drivers/scsi/qla2200.o entry Warning (compare_maps): mismatch on symbol usb_devfs_handle , usbcore says f89a04c0, /lib/modules/2.4.7-10enterprise/kernel/drivers/usb/usbcore.o says f899ffe0. Ignoring /lib/modules/2.4.7- 10enterprise/kernel/drivers/usb/usbcore.o entry Warning (map_ksym_to_module): cannot match loaded module ext3 to a unique module object. Trace may not be reliable. Warning (compare_maps): mismatch on symbol mega_hbas , megaraid says f88458c0, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/megaraid.o says f8845600. Ignoring /lib/modules/2.4.7- 10enterprise/kernel/drivers/scsi/megaraid.o entry Warning (compare_maps): mismatch on symbol sd , sd_mod says f881cce4, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/sd_mod.o says f881cba0. Ignoring /lib/modules/2.4.7- 10enterprise/kernel/drivers/scsi/sd_mod.o entry Warning (compare_maps): mismatch on symbol proc_scsi , scsi_mod says f8818088, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says f8816910. Ignoring /lib/modules/2.4.7- 10enterprise/kernel/drivers/scsi/scsi_mod.o entry Warning (compare_maps): mismatch on symbol scsi_devicelist , scsi_mod says f88180b4, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says f881693c. Ignoring /lib/modules/2.4.7- 10enterprise/kernel/drivers/scsi/scsi_mod.o entry Warning (compare_maps): mismatch on symbol scsi_hostlist , scsi_mod says f88180b0, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says f8816938. Ignoring /lib/modules/2.4.7- 10enterprise/kernel/drivers/scsi/scsi_mod.o entry Warning (compare_maps): mismatch on symbol scsi_hosts , scsi_mod says f88180b8, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says f8816940. Ignoring /lib/modules/2.4.7- 10enterprise/kernel/drivers/scsi/scsi_mod.o entry Warning (compare_maps): mismatch on symbol scsi_logging_level , scsi_mod says f8818084, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says f881690c. Ignoring /lib/modules/2.4.7- 10enterprise/kernel/drivers/scsi/scsi_mod.o entry kernel BUG at fsync.c:55! invalid operand: 0000 CPU: 3 EIP: 0010:[e100:__insmod_e100_O/lib/modules/2.4.7-10enterprise/kernel/drive+- 1213164/96] EIP: 0010:[<f8881d14>] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010286 eax: 0000001a ebx: f45e0940 ecx: c02616dc edx: 00008932 esi: f45e0940 edi: f46264a0 ebp: 409b4000 esp: f5b53f6c ds: 0018 es: 0018 ss: 0018 Process qlremote (pid: 1016, stackpage=f5b53000) Stack: f888f7f0 00000037 f45e0940 f678ee40 c012ef80 f678ee40 f46264a0 00000001 f3f8b860 409a3000 fffffff2 c012f05b f3f8b860 409a3000 409b4000 00000004 00000000 f5b52000 00000004 000100fe 409a2974 c010716b 409a3000 000100fe Call Trace: [e100:__insmod_e100_O/lib/modules/2.4.7-10enterprise/kernel/drive+- 1157136/96] [msync_interval+112/160] [sys_msync+171/272] [system_call+51/56] Call Trace: [<f888f7f0>] [<c012ef80>] [<c012f05b>] [<c010716b>] Code: 0f 0b 58 5a 56 e8 62 d5 8b c7 8b b6 a0 00 00 00 89 c3 56 e8 >>EIP; f8881d14 <[ext3].text.start+1cb4/b89f> <===== Trace; f888f7f0 <[ext3].text.end+3ef1/4821> Trace; c012ef80 <msync_interval+70/a0> Trace; c012f05b <sys_msync+ab/110> Trace; c010716b <system_call+33/38> Code; f8881d14 <[ext3].text.start+1cb4/b89f> 00000000 <_EIP>: Code; f8881d14 <[ext3].text.start+1cb4/b89f> <===== 0: 0f 0b ud2a <===== Code; f8881d16 <[ext3].text.start+1cb6/b89f> 2: 58 pop %eax Code; f8881d17 <[ext3].text.start+1cb7/b89f> 3: 5a pop %edx Code; f8881d18 <[ext3].text.start+1cb8/b89f> 4: 56 push %esi Code; f8881d19 <[ext3].text.start+1cb9/b89f> 5: e8 62 d5 8b c7 call c78bd56c <_EIP+0xc78bd56c> c013f280 <fsync_inode_buffers+0/170> Code; f8881d1e <[ext3].text.start+1cbe/b89f> a: 8b b6 a0 00 00 00 mov 0xa0(%esi),%esi Code; f8881d24 <[ext3].text.start+1cc4/b89f> 10: 89 c3 mov %eax,%ebx Code; f8881d26 <[ext3].text.start+1cc6/b89f> 12: 56 push %esi Code; f8881d27 <[ext3].text.start+1cc7/b89f> 13: e8 00 00 00 00 call 18 <_EIP+0x18> f8881d2c < [ext3].text.start+1ccc/b89f> 13 warnings and 6 errors issued. Results may not be reliable. ------------------------------- 8< --------------------------------- 2.4.9-21SMP setup: ------------------------------- 8< --------------------------------- ksymoops 2.4.1 on i686 2.4.9-21custom-SIG. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.9-21custom-SIG/ (default) -m /boot/System.map-2.4.9-21custom-SIG (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. Error (expand_objects): cannot stat(/lib/ext3.o) for ext3 Error (expand_objects): cannot stat(/lib/jbd.o) for jbd Error (expand_objects): cannot stat(/lib/aic7xxx.o) for aic7xxx Error (expand_objects): cannot stat(/lib/aacraid.o) for aacraid Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/abi/abi.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/addon/cipe/bf-i386.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/addon/addon.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/cdrom/driver.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/char/drm/drm.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/media/radio/radio.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/media/video/video.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/media/media.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/misc/misc.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/net/appletalk/appletalk.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/net/fc/fc.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/net/pcmcia/pcmcia_net.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/net/tokenring/tr.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/net/wan/wan.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/net/wireless/wireless_net.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/parport/driver.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/drivers/sound/sounddrivers.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/net/ipv4/netfilter/netfilter.o Warning (read_object): no symbols in /lib/modules/2.4.9-21custom- SIG/build/net/ipv6/netfilter/netfilter.o Warning (compare_maps): ksyms_base symbol GPLONLY_pci_hp_change_slot_info not found in System.map. Ignoring ksyms_base entry Warning (compare_maps): ksyms_base symbol GPLONLY_pci_hp_deregister not found in System.map. Ignoring ksyms_base entry Warning (compare_maps): ksyms_base symbol GPLONLY_pci_hp_register not found in System.map. Ignoring ksyms_base entry Warning (compare_maps): mismatch on symbol partition_name , ksyms_base says c01c28e0, System.map says c01617e0. Ignoring ksyms_base entry Warning (compare_maps): mismatch on symbol usb_devfs_handle , usbcore says f08f5180, /lib/modules/2.4.9-21custom-SIG/kernel/drivers/usb/usbcore.o says f08f4ca0. Ignoring /lib/modules/2.4.9-21custom- SIG/kernel/drivers/usb/usbcore.o entry Warning (map_ksym_to_module): cannot match loaded module ext3 to a unique module object. Trace may not be reliable. Warning (map_ksym_to_module): cannot match loaded module jbd to a unique module object. Trace may not be reliable. Warning (map_ksym_to_module): cannot match loaded module aic7xxx to a unique module object. Trace may not be reliable. Warning (map_ksym_to_module): cannot match loaded module aacraid to a unique module object. Trace may not be reliable. Warning (map_ksym_to_module): cannot match loaded module sd_mod to a unique module object. Trace may not be reliable. Warning (map_ksym_to_module): cannot match loaded module scsi_mod to a unique module object. Trace may not be reliable. kernel BUG at fsync.c:55! invalid operand: 0000 CPU: 0 EIP: 0010:[eepro100:__insmod_eepro100_O/lib/modules/2.4.9-21custom- SIG/kernel/d+-763020/96] Tainted: P EIP: 0010:[<f0856b74>] Tainted: P Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010296 eax: 0000001a ebx: e0402900 ecx: c02da064 edx: 0001d843 esi: e0402900 edi: 00000000 ebp: 409b8000 esp: e3a05f6c ds: 0018 es: 0018 ss: 0018 Process qlremote (pid: 1698, stackpage=e3a05000) Stack: f0863ac2 00000037 e0402900 ec83c120 c012f630 ec83c120 ead40ce0 00000001 eba22a60 409a7000 fffffff2 c012f70b eba22a60 409a7000 409b8000 00000004 00000000 e3a04000 00000004 000100fe 409a6974 c010719b 409a7000 000100fe Call Trace: [eepro100:__insmod_eepro100_O/lib/modules/2.4.9-21custom- SIG/kernel/d+-709950/96] __insmod_ext3_S.text_L45152 [ext3] 0xea62 Call Trace: [<f0863ac2>] __insmod_ext3_S.text_L45152 [ext3] 0xea62 [<c012f630>] msync_interval [kernel] 0x70 [<c012f70b>] sys_msync [kernel] 0xab [<c010719b>] system_call [kernel] 0x33 Code: 0f 0b 58 5a 56 e8 32 8d 8e cf 8b b6 a0 00 00 00 89 c3 56 e8 >>EIP; f0856b74 <[ext3].text.start+1b14/b05f> <===== Trace; f0863ac2 <[ext3].text.end+3a03/42e1> Trace; c012f630 <msync_interval+70/a0> Trace; c012f70b <sys_msync+ab/110> Trace; c010719b <system_call+33/38> Code; f0856b74 <[ext3].text.start+1b14/b05f> 00000000 <_EIP>: Code; f0856b74 <[ext3].text.start+1b14/b05f> <===== 0: 0f 0b ud2a <===== Code; f0856b76 <[ext3].text.start+1b16/b05f> 2: 58 pop %eax Code; f0856b77 <[ext3].text.start+1b17/b05f> 3: 5a pop %edx Code; f0856b78 <[ext3].text.start+1b18/b05f> 4: 56 push %esi Code; f0856b79 <[ext3].text.start+1b19/b05f> 5: e8 32 8d 8e cf call cf8e8d3c <_EIP+0xcf8e8d3c> c013f8b0 <fsync_inode_buffers+0/180> Code; f0856b7e <[ext3].text.start+1b1e/b05f> a: 8b b6 a0 00 00 00 mov 0xa0(%esi),%esi Code; f0856b84 <[ext3].text.start+1b24/b05f> 10: 89 c3 mov %eax,%ebx Code; f0856b86 <[ext3].text.start+1b26/b05f> 12: 56 push %esi Code; f0856b87 <[ext3].text.start+1b27/b05f> 13: e8 00 00 00 00 call 18 <_EIP+0x18> f0856b8c < [ext3].text.start+1b2c/b05f> 31 warnings and 6 errors issued. Results may not be reliable. ------------------------------- 8< ---------------------------------
Does the 5.38.3 driver have the stackoverflow that was in 5.31 fixed ?
Arjan, I'm looking at the changelog from the RH1 drop you had worked on. There are three places where you note 'buffer/stack overflow': * fix bufferoverflow from the /proc file * fix using stack variables to register IO ports and other resources (and bufferoverflow in the same code) I can see the problem in the code for these two (qla2100_proc_info, qla2100_register_with_Linux). But, for the third entry: * fix stack overflow that caused oopses and/or memory corruption I've scanned the diffs, but cannot seem to trace this down to a specific change beyond the following lines: - request_region(host->io_port, 0xff, drvname); + sprintf(ha->devname, "qla2x00#%02d", host->unique_id); + request_region(host->io_port, 0xff, ha->devname); - /* ER# 4368 */ - sprintf(drvname, "qla2x00#%02d", host->unique_id); Where drvname was originally defined char drvname[9] (bad!). Thanks!
All that has happened here is that we've entered an ext3 function, and the "task->journal" entry in the task_struct is already set. That's the last item in the task struct, and the task struct is allocated just below the kernel stack, so if there is any chance there's a stack overflow involved, this is exactly where you'd expect to see it detected. We really need to eliminate that possibility if there is any question of such overflow in the driver.
Created attachment 49724 [details] simple shell script to check for stack abuse
I've attached a simple shell script to check a compiled driver for functions that have a big stack footprint. It will print a series of lines in the form of sub $0x19c,%esp where "0x19c" is the amount of stack use for the function in question, in (hex) bytes. It writes also a disassembled file called "d" where that line can be looked up to see which C function it corresponds with. Remember that the linux kernel stack that drivers etc can use is 3 Kb or so. (8Kb minus the task struct minus a room you need to keep for irq's to work)
Since we think this is a bug in the driver you are loading, and not in our kernel, I'm closing this as NOTABUG. You can always re-open it if you determine that this is not the case.