Bug 61595 - Assertion failure in ext3_sync_file() at fsync.c:55: "ext3_journal_current_handle() == 0"
Summary: Assertion failure in ext3_sync_file() at fsync.c:55: "ext3_journal_current_ha...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: i686
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2002-03-21 23:02 UTC by Andrew Vasquez
Modified: 2007-04-18 16:41 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2002-03-22 23:14:57 UTC
Embargoed:


Attachments (Terms of Use)
simple shell script to check for stack abuse (107 bytes, application/octet-stream)
2002-03-22 18:32 UTC, Arjan van de Ven
no flags Details

Description Andrew Vasquez 2002-03-21 23:02:00 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)

Description of problem:
While running heavy I/O with RH72, QLogic FC driver 5.38.3, and 
2.4.9-enterprise and 2.4.9-21smp we are experiencing this ext3 kernel
bug.   

Topology:
	1 - Dell 2450 SMP machine
	1 - Dell 5450 SMP machine
	Each HBA connected to one zone (of two) of a Dell 56F.  
	A dual ported Dell PowerVault sits behind the switch (each port
			once again connected to a zone).

The problem has occured on two distinct setups, similiarly configured.
Attached ksymoops information should help.

I'm hoping to get some clearer insight as to how this condition would
be triggered (ext3_journal_current_handle() == journal_current_handle ==
current->journal_info).

FYI:  The kernel is tainted because the 5.38.3 driver checks for
kernel versions higher than 2.4.9 (not inclusive) to add the 
MODULE_LICENSE information.


Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. <Description>
2.
3.
	

Additional info:

Enterprise setup:
------------------------------- 8< ---------------------------------
ksymoops 2.4.1 on i686 2.4.7-10enterprise.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.7-10enterprise/ (default)
     -m /boot/System.map-2.4.7-10enterprise (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Error (expand_objects): cannot stat(/lib/ext3.o) for ext3
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/jbd.o) for jbd
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/megaraid.o) for megaraid
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/aic7xxx.o) for aic7xxx
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod
ksymoops: No such file or directory
Warning (compare_maps): mismatch on symbol partition_name  , ksyms_base says 
c01c09e0, System.map says c0160900.  Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol debug_buff  , qla2200 says 
f8872640, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/qla2200.o says 
f885d580.  Ignoring /lib/modules/2.4.7-
10enterprise/kernel/drivers/scsi/qla2200.o entry
Warning (compare_maps): mismatch on symbol qla_fo_params  , qla2200 says 
f8872600, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/qla2200.o says 
f885d540.Ignoring /lib/modules/2.4.7-
10enterprise/kernel/drivers/scsi/qla2200.o entry
Warning (compare_maps): mismatch on symbol usb_devfs_handle  , usbcore says 
f89a04c0, /lib/modules/2.4.7-10enterprise/kernel/drivers/usb/usbcore.o says 
f899ffe0.  Ignoring /lib/modules/2.4.7-
10enterprise/kernel/drivers/usb/usbcore.o entry
Warning (map_ksym_to_module): cannot match loaded module ext3 to a unique 
module object.  Trace may not be reliable.
Warning (compare_maps): mismatch on symbol mega_hbas  , megaraid says 
f88458c0, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/megaraid.o says 
f8845600.  Ignoring /lib/modules/2.4.7-
10enterprise/kernel/drivers/scsi/megaraid.o entry
Warning (compare_maps): mismatch on symbol sd  , sd_mod says 
f881cce4, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/sd_mod.o says 
f881cba0.  Ignoring /lib/modules/2.4.7-
10enterprise/kernel/drivers/scsi/sd_mod.o entry
Warning (compare_maps): mismatch on symbol proc_scsi  , scsi_mod says 
f8818088, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says 
f8816910.  Ignoring /lib/modules/2.4.7-
10enterprise/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_devicelist  , scsi_mod says 
f88180b4, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says 
f881693c.  Ignoring /lib/modules/2.4.7-
10enterprise/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_hostlist  , scsi_mod says 
f88180b0, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says 
f8816938.  Ignoring /lib/modules/2.4.7-
10enterprise/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_hosts  , scsi_mod says 
f88180b8, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says 
f8816940.  Ignoring /lib/modules/2.4.7-
10enterprise/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_logging_level  , scsi_mod says 
f8818084, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says 
f881690c.  Ignoring /lib/modules/2.4.7-
10enterprise/kernel/drivers/scsi/scsi_mod.o entry
kernel BUG at fsync.c:55!
invalid operand: 0000
CPU:    3
EIP:    0010:[e100:__insmod_e100_O/lib/modules/2.4.7-10enterprise/kernel/drive+-
1213164/96]
EIP:    0010:[<f8881d14>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 0000001a   ebx: f45e0940   ecx: c02616dc   edx: 00008932
esi: f45e0940   edi: f46264a0   ebp: 409b4000   esp: f5b53f6c
ds: 0018   es: 0018   ss: 0018
Process qlremote (pid: 1016, stackpage=f5b53000)
Stack: f888f7f0 00000037 f45e0940 f678ee40 c012ef80 f678ee40 f46264a0 00000001
       f3f8b860 409a3000 fffffff2 c012f05b f3f8b860 409a3000 409b4000 00000004
       00000000 f5b52000 00000004 000100fe 409a2974 c010716b 409a3000 000100fe
Call Trace: [e100:__insmod_e100_O/lib/modules/2.4.7-10enterprise/kernel/drive+-
1157136/96] [msync_interval+112/160] [sys_msync+171/272] [system_call+51/56]
Call Trace: [<f888f7f0>] [<c012ef80>] [<c012f05b>] [<c010716b>]
Code: 0f 0b 58 5a 56 e8 62 d5 8b c7 8b b6 a0 00 00 00 89 c3 56 e8

>>EIP; f8881d14 <[ext3].text.start+1cb4/b89f>   <=====
Trace; f888f7f0 <[ext3].text.end+3ef1/4821>
Trace; c012ef80 <msync_interval+70/a0>
Trace; c012f05b <sys_msync+ab/110>
Trace; c010716b <system_call+33/38>
Code;  f8881d14 <[ext3].text.start+1cb4/b89f>
00000000 <_EIP>:
Code;  f8881d14 <[ext3].text.start+1cb4/b89f>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  f8881d16 <[ext3].text.start+1cb6/b89f>
   2:   58                        pop    %eax
Code;  f8881d17 <[ext3].text.start+1cb7/b89f>
   3:   5a                        pop    %edx
Code;  f8881d18 <[ext3].text.start+1cb8/b89f>
   4:   56                        push   %esi
Code;  f8881d19 <[ext3].text.start+1cb9/b89f>
   5:   e8 62 d5 8b c7            call   c78bd56c <_EIP+0xc78bd56c> c013f280 
<fsync_inode_buffers+0/170>
Code;  f8881d1e <[ext3].text.start+1cbe/b89f>
   a:   8b b6 a0 00 00 00         mov    0xa0(%esi),%esi
Code;  f8881d24 <[ext3].text.start+1cc4/b89f>
  10:   89 c3                     mov    %eax,%ebx
Code;  f8881d26 <[ext3].text.start+1cc6/b89f>
  12:   56                        push   %esi
Code;  f8881d27 <[ext3].text.start+1cc7/b89f>
  13:   e8 00 00 00 00            call   18 <_EIP+0x18> f8881d2c <
[ext3].text.start+1ccc/b89f>


13 warnings and 6 errors issued.  Results may not be reliable.
------------------------------- 8< ---------------------------------

2.4.9-21SMP setup:

------------------------------- 8< ---------------------------------
ksymoops 2.4.1 on i686 2.4.9-21custom-SIG.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.9-21custom-SIG/ (default)
     -m /boot/System.map-2.4.9-21custom-SIG (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Error (expand_objects): cannot stat(/lib/ext3.o) for ext3
Error (expand_objects): cannot stat(/lib/jbd.o) for jbd
Error (expand_objects): cannot stat(/lib/aic7xxx.o) for aic7xxx
Error (expand_objects): cannot stat(/lib/aacraid.o) for aacraid
Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod
Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/abi/abi.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/addon/cipe/bf-i386.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/addon/addon.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/cdrom/driver.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/char/drm/drm.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/media/radio/radio.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/media/video/video.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/media/media.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/misc/misc.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/net/appletalk/appletalk.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/net/fc/fc.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/net/pcmcia/pcmcia_net.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/net/tokenring/tr.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/net/wan/wan.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/net/wireless/wireless_net.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/parport/driver.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/drivers/sound/sounddrivers.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/net/ipv4/netfilter/netfilter.o
Warning (read_object): no symbols in /lib/modules/2.4.9-21custom-
SIG/build/net/ipv6/netfilter/netfilter.o
Warning (compare_maps): ksyms_base symbol GPLONLY_pci_hp_change_slot_info not 
found in System.map.  Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol GPLONLY_pci_hp_deregister not found 
in System.map.  Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol GPLONLY_pci_hp_register not found in 
System.map.  Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol partition_name  , ksyms_base says 
c01c28e0, System.map says c01617e0.  Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol usb_devfs_handle  , usbcore says 
f08f5180, /lib/modules/2.4.9-21custom-SIG/kernel/drivers/usb/usbcore.o says 
f08f4ca0.  Ignoring /lib/modules/2.4.9-21custom-
SIG/kernel/drivers/usb/usbcore.o entry
Warning (map_ksym_to_module): cannot match loaded module ext3 to a unique 
module object.  Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module jbd to a unique module 
object.  Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module aic7xxx to a unique 
module object.  Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module aacraid to a unique 
module object.  Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module sd_mod to a unique 
module object.  Trace may not be reliable.
Warning (map_ksym_to_module): cannot match loaded module scsi_mod to a unique 
module object.  Trace may not be reliable.
kernel BUG at fsync.c:55!
invalid operand: 0000
CPU:    0
EIP:    0010:[eepro100:__insmod_eepro100_O/lib/modules/2.4.9-21custom-
SIG/kernel/d+-763020/96]    Tainted: P 
EIP:    0010:[<f0856b74>]    Tainted: P 
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010296
eax: 0000001a   ebx: e0402900   ecx: c02da064   edx: 0001d843
esi: e0402900   edi: 00000000   ebp: 409b8000   esp: e3a05f6c
ds: 0018   es: 0018   ss: 0018
Process qlremote (pid: 1698, stackpage=e3a05000)
Stack: f0863ac2 00000037 e0402900 ec83c120 c012f630 ec83c120 ead40ce0 00000001 
       eba22a60 409a7000 fffffff2 c012f70b eba22a60 409a7000 409b8000 00000004 
       00000000 e3a04000 00000004 000100fe 409a6974 c010719b 409a7000 000100fe 
Call Trace: [eepro100:__insmod_eepro100_O/lib/modules/2.4.9-21custom-
SIG/kernel/d+-709950/96] __insmod_ext3_S.text_L45152 [ext3] 0xea62 
Call Trace: [<f0863ac2>] __insmod_ext3_S.text_L45152 [ext3] 0xea62 
[<c012f630>] msync_interval [kernel] 0x70 
[<c012f70b>] sys_msync [kernel] 0xab 
[<c010719b>] system_call [kernel] 0x33 
Code: 0f 0b 58 5a 56 e8 32 8d 8e cf 8b b6 a0 00 00 00 89 c3 56 e8 

>>EIP; f0856b74 <[ext3].text.start+1b14/b05f>   <=====
Trace; f0863ac2 <[ext3].text.end+3a03/42e1>
Trace; c012f630 <msync_interval+70/a0>
Trace; c012f70b <sys_msync+ab/110>
Trace; c010719b <system_call+33/38>
Code;  f0856b74 <[ext3].text.start+1b14/b05f>
00000000 <_EIP>:
Code;  f0856b74 <[ext3].text.start+1b14/b05f>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  f0856b76 <[ext3].text.start+1b16/b05f>
   2:   58                        pop    %eax
Code;  f0856b77 <[ext3].text.start+1b17/b05f>
   3:   5a                        pop    %edx
Code;  f0856b78 <[ext3].text.start+1b18/b05f>
   4:   56                        push   %esi
Code;  f0856b79 <[ext3].text.start+1b19/b05f>
   5:   e8 32 8d 8e cf            call   cf8e8d3c <_EIP+0xcf8e8d3c> c013f8b0 
<fsync_inode_buffers+0/180>
Code;  f0856b7e <[ext3].text.start+1b1e/b05f>
   a:   8b b6 a0 00 00 00         mov    0xa0(%esi),%esi
Code;  f0856b84 <[ext3].text.start+1b24/b05f>
  10:   89 c3                     mov    %eax,%ebx
Code;  f0856b86 <[ext3].text.start+1b26/b05f>
  12:   56                        push   %esi
Code;  f0856b87 <[ext3].text.start+1b27/b05f>
  13:   e8 00 00 00 00            call   18 <_EIP+0x18> f0856b8c <
[ext3].text.start+1b2c/b05f>


31 warnings and 6 errors issued.  Results may not be reliable.
------------------------------- 8< ---------------------------------

Comment 1 Arjan van de Ven 2002-03-22 09:43:28 UTC
Does the 5.38.3 driver have the stackoverflow that was in 5.31 fixed ?


Comment 2 Andrew Vasquez 2002-03-22 17:59:35 UTC
Arjan,

I'm looking at the changelog from the RH1 drop you had worked on.  There
are three places where you note 'buffer/stack overflow':


	* fix bufferoverflow from the /proc file
	* fix using stack variables to register IO ports and other resources
  	  (and bufferoverflow in the same code)

I can see the problem in the code for these two (qla2100_proc_info, 
qla2100_register_with_Linux).  But, for the third entry:

	* fix stack overflow that caused oopses and/or memory corruption

I've scanned the diffs, but cannot seem to trace this down to a specific
change beyond the following lines:

	-    request_region(host->io_port, 0xff, drvname);
	+    sprintf(ha->devname, "qla2x00#%02d", host->unique_id);
	+    request_region(host->io_port, 0xff, ha->devname);
 
	-    /* ER# 4368 */
	-    sprintf(drvname, "qla2x00#%02d", host->unique_id);

Where drvname was originally defined char drvname[9] (bad!).

Thanks!

Comment 3 Stephen Tweedie 2002-03-22 18:27:06 UTC
All that has happened here is that we've entered an ext3 function, and the
"task->journal" entry in the task_struct is already set.  That's the last item
in the task struct, and the task struct is allocated just below the kernel
stack, so if there is any chance there's a stack overflow involved, this is
exactly where you'd expect to see it detected.

We really need to eliminate that possibility if there is any question of such
overflow in the driver.

Comment 4 Arjan van de Ven 2002-03-22 18:32:38 UTC
Created attachment 49724 [details]
simple shell script to check for stack abuse

Comment 5 Arjan van de Ven 2002-03-22 18:35:27 UTC
I've attached a simple shell script to check a compiled driver for functions
that have a big stack footprint. 
It will print a series of lines in the form of
sub    $0x19c,%esp

where "0x19c" is the amount of stack use for the function in question, in (hex)
bytes. It writes also a disassembled file called "d" where that line can be
looked up to see which C function it corresponds with.

Remember that the linux kernel stack that drivers etc can use is 3 Kb or so.
(8Kb minus the task struct minus a room you need to keep for irq's to work)


Comment 6 Michael K. Johnson 2002-03-28 16:52:20 UTC
Since we think this is a bug in the driver you are loading, and not in
our kernel, I'm closing this as NOTABUG.  You can always re-open it if
you determine that this is not the case.


Note You need to log in before you can comment on or make changes to this bug.