180389 – Attempting to use SCSI tape drive triggers kernel BUG

Bug 180389 - Attempting to use SCSI tape drive triggers kernel BUG

Summary: Attempting to use SCSI tape drive triggers kernel BUG

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	FCMETA_SCSI
TreeView+	depends on / blocked

Reported:	2006-02-07 19:24 UTC by Jay Fenlason
Modified:	2015-01-04 22:25 UTC (History)
CC List:	6 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-10-03 17:49:24 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
strace output of amcheck command, showing EBUSY return from tape read (143.30 KB, text/plain) 2006-03-13 20:03 UTC, Jay Fenlason	no flags	Details
View All

Description Jay Fenlason 2006-02-07 19:24:09 UTC

Description of problem: 
Attempting to run amcheck on fenlason-rhide.lab.boston.redhat.com triggers a 
kernel BUG.  Subsequent attempts hang. 
 
Version-Release number of selected component (if applicable): 
2.6.15-1.1909_FC5smp 
 
How reproducible: 
Always 
 
Steps to Reproduce: 
1. reboot fenlason-rhide.lab.boston.redhat.com 
2. run "amcheck JaysCube" as root 
3. Look at dmesg, syslog output 
4. run "amcheck JaysCube" again. 
5. Notice that it hangs. 
   
Actual results: 
eth0: no IPv6 routers present 
 target1:0:1: FAST-20 WIDE SCSI 40.0 MB/s ST (50 ns, offset 15) 
st0: Block limits 2 - 16777215 bytes. 
kfree_debugcheck: out of range ptr fffffffch. 
------------[ cut here ]------------ 
kernel BUG at mm/slab.c:2436! 
invalid opcode: 0000 [#1] 
SMP  
last sysfs file: /block/sda/sda1/size 
Modules linked in: ipv6 ppdev autofs4 rfcomm l2cap bluetooth sunrpc lp 
parport_pc parport floppy nvram uhci_hcd snd_cmipci gameport snd_seq_dummy 
matroxfb_base snd_seq_oss matroxfb_DAC1064 matroxfb_accel snd_seq_midi_event 
matroxfb_Ti3026 matroxfb_g450 snd_seq g450_pll matroxfb_misc snd_pcm_oss 
snd_mixer_oss snd_pcm snd_page_alloc snd_opl3_lib snd_timer snd_hwdep st 
snd_mpu401_uart snd_rawmidi snd_seq_device snd matrox_w1 soundcore wire e100 
i2c_piix4 i2c_core mii dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd aic7xxx 
scsi_transport_spi sd_mod scsi_mod 
CPU:    0 
EIP:    0060:[<c015f0f6>]    Not tainted VLI 
EFLAGS: 00010092   (2.6.15-1.1909_FC5smp)  
EIP is at kfree_debugcheck+0x20/0x51 
eax: 00000031   ebx: 0003ffff   ecx: 00000000   edx: 00000082 
esi: fffffffc   edi: fffffffc   ebp: 00000246   esp: d4a59df8 
ds: 007b   es: 007b   ss: 0068 
Process amcheck (pid: 2680, threadinfo=d4a59000 task=d4ad18d0) 
Stack: <0>c0316ea4 fffffffc 00000000 dffdbb00 c015f84c c0149ade 00000000 
dffdbb00  
       dffd7b70 00000246 c016134f dffdc6f8 c172419c 00000000 d49e2a00 c0149ade  
       dcd38310 c172419c dcd38310 c0167ffd 00000a00 00000000 c0167eb8 e0840356  
Call Trace: 
 [<c015f84c>] cache_free_debugcheck+0x1b/0x1b9     [<c0149ade>] 
mempool_free+0x5f/0x63 
 [<c016134f>] kmem_cache_free+0x2a/0x5c     [<c0149ade>] 
mempool_free+0x5f/0x63 
 [<c0167ffd>] bio_free+0x25/0x30     [<c0167eb8>] bio_put+0x28/0x29 
 [<e0840356>] scsi_execute_async+0x13f/0x317 [scsi_mod]     [<e094d677>] 
st_do_scsi+0x1d3/0x228 [st] 
 [<e094f4f9>] st_sleep_done+0x0/0x3a [st]     [<e0951439>] st_read+0x2fb/0x78b 
[st] 
 [<e095113e>] st_read+0x0/0x78b [st]     [<c0163f8d>] vfs_read+0x9f/0x13e 
 [<c01643d9>] sys_read+0x3c/0x63     [<c0103d21>] syscall_call+0x7/0xb 
Code: f0 09 03 6e 31 c0 58 5b 5e 5f c3 56 53 89 c6 8d 98 00 00 00 40 c1 eb 0c 
3b 1d 80 43 48 c0 72 15 50 68 a4 6e 31 c0 e8 e1 48 fc ff <0f> 0b 84 09 03 6e 
31 c0 58 5a 6b c3 28 03 05 90 43 48 c0 8b 00  
Continuing in 1 seconds.    
 <3>Debug: sleeping function called from invalid context at 
include/linux/rwsem.h:43 
in_atomic():0, irqs_disabled():1 
 [<c01246a1>] profile_task_exit+0x13/0x48     [<c0125f66>] do_exit+0x1c/0x6fc 
 [<c01239eb>] printk+0x14/0x18     [<c0105272>] show_stack+0x0/0xa 
 [<c0105868>] do_invalid_op+0x0/0x9d     [<c01058f9>] do_invalid_op+0x91/0x9d 
 [<c015f0f6>] kfree_debugcheck+0x20/0x51     [<c012339d>] 
release_console_sem+0x199/0x1a1 
 [<c0123843>] vprintk+0x195/0x329     [<c0149b19>] mempool_alloc+0x37/0xd3 
 [<c01d2975>] cfq_set_request+0x318/0x43a     [<c01048bf>] 
error_code+0x4f/0x54 
 [<c015f0f6>] kfree_debugcheck+0x20/0x51     [<c015f84c>] 
cache_free_debugcheck+0x1b/0x1b9 
 [<c0149ade>] mempool_free+0x5f/0x63     [<c016134f>] 
kmem_cache_free+0x2a/0x5c 
 [<c0149ade>] mempool_free+0x5f/0x63     [<c0167ffd>] bio_free+0x25/0x30 
 [<c0167eb8>] bio_put+0x28/0x29     [<e0840356>] 
scsi_execute_async+0x13f/0x317 [scsi_mod] 
 [<e094d677>] st_do_scsi+0x1d3/0x228 [st]     [<e094f4f9>] 
st_sleep_done+0x0/0x3a [st] 
 [<e0951439>] st_read+0x2fb/0x78b [st]     [<e095113e>] st_read+0x0/0x78b [st] 
 [<c0163f8d>] vfs_read+0x9f/0x13e     [<c01643d9>] sys_read+0x3c/0x63 
 [<c0103d21>] syscall_call+0x7/0xb     
 
 
Expected results: 
No kernel BUG.  No hang. 
 
Additional info: 
Ping me if you need the root password

Comment 1 Jay Fenlason 2006-02-07 20:04:39 UTC

Also fails with 
2.6.15-1.1914_FC5smp

Comment 2 Jay Fenlason 2006-02-08 20:03:32 UTC

With 2.6.15-1.1915_FC5smp the second amcheck gets EBUSY instead of hanging.  I 
guess that's progress. :-)

Comment 3 Dave Jones 2006-02-10 03:06:15 UTC

Jay, can you try 2.6.15-1.1926 (or higher) ?

Comment 4 Jay Fenlason 2006-02-10 16:09:00 UTC

Same problem with 
2.6.15-1.1926_FC5smp

Comment 5 Jay Fenlason 2006-02-13 18:48:55 UTC

Still fails with 2.6.15-1.1939_FC5smp

Comment 6 Dave Jones 2006-02-19 01:17:33 UTC

I wonder if the scsi_execute_in_process_context() changes in rc4 fix this.
Can you try 1963 or higher ?

Comment 7 Dave Jones 2006-02-19 01:52:31 UTC

actually, looking again, this looks like a double free, not just a sleeping
whilst atomic problem. bad eyes on my part.

Comment 8 Jay Fenlason 2006-02-28 17:52:58 UTC

2.6.15-1.1991_FC5smp still fails 
 
st0: Block limits 2 - 16777215 bytes. 
kfree_debugcheck: out of range ptr fffffffch. 
------------[ cut here ]------------ 
kernel BUG at mm/slab.c:2490! 
invalid opcode: 0000 [#1] 
SMP  
last sysfs file: /block/sda/sda1/size 
Modules linked in: nfsd exportfs lockd nfs_acl ipv6 ppdev autofs4 rfcomm l2cap 
b 
luetooth sunrpc lp parport_pc parport floppy nvram uhci_hcd st sg snd_cmipci 
gam 
eport snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss 
snd_mixer 
_oss snd_pcm snd_page_alloc matroxfb_base snd_opl3_lib matroxfb_DAC1064 
snd_time 
r matroxfb_accel snd_hwdep matroxfb_Ti3026 snd_mpu401_uart matroxfb_g450 
snd_raw 
midi g450_pll snd_seq_device snd matroxfb_misc soundcore matrox_w1 wire e100 
mii 
 i2c_piix4 i2c_core dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd aic7xxx 
scsi_t 
ransport_spi sd_mod scsi_mod 
CPU:    0 
EIP:    0060:[<c015f311>]    Not tainted VLI 
EFLAGS: 00010086   (2.6.15-1.1991_FC5smp #1)  
EIP is at kfree_debugcheck+0x20/0x51 
eax: 00000031   ebx: 0003ffff   ecx: d4018dfc   edx: c0315f6d 
esi: fffffffc   edi: fffffffc   ebp: 00000246   esp: d4018df8 
ds: 007b   es: 007b   ss: 0068 
Process amcheck (pid: 2568, threadinfo=d4018000 task=d46d03b0) 
Stack: <0>c0315f6d fffffffc 00000000 dffdeb00 c015fa51 c0149d2e 00000000 
dffdeb0 
0  
       dffdcb70 00000246 c0161559 dfd6c9d0 c1770170 00000000 de613598 c0149d2e  
       d463cea0 c1770170 d463cea0 c016822d 00000a00 00000000 c01680e8 e0842418  
Call Trace: 
 [<c015fa51>] cache_free_debugcheck+0x1b/0x1b9     [<c0149d2e>] 
mempool_free+0x5 
f/0x63 
 [<c0161559>] kmem_cache_free+0x2a/0x5c     [<c0149d2e>] 
mempool_free+0x5f/0x63 
 [<c016822d>] bio_free+0x25/0x30     [<c01680e8>] bio_put+0x28/0x29 
 [<e0842418>] scsi_execute_async+0x13f/0x317 [scsi_mod]     [<e0a38677>] 
st_do_s 
csi+0x1d3/0x228 [st] 
 [<e0a3a4f9>] st_sleep_done+0x0/0x3a [st]     [<e0a3c439>] st_read+0x2fb/0x78b 
[ 
st] 
 [<e0a3c13e>] st_read+0x0/0x78b [st]     [<c01641c9>] vfs_read+0x9f/0x13e 
 [<c0164615>] sys_read+0x3c/0x63     [<c0103d81>] syscall_call+0x7/0xb 
Code: 26 0a cc 5e 31 c0 58 5b 5e 5f c3 56 53 89 c6 8d 98 00 00 00 40 c1 eb 0c 
3b 
 1d 00 23 48 c0 72 15 50 68 6d 5f 31 c0 e8 16 48 fc ff <0f> 0b ba 09 cc 5e 31 
c0 
 58 5a 6b c3 28 03 05 10 23 48 c0 8b 00  
Continuing in 1 seconds.    
 <3>Debug: sleeping function called from invalid context at 
include/linux/rwsem. 
h:43 
in_atomic():0, irqs_disabled():1 
 [<c0124881>] profile_task_exit+0x13/0x48     [<c01261a3>] do_exit+0x1c/0x6fc 
 [<c010526c>] register_die_notifier+0x0/0x2f     [<c010584e>] 
do_invalid_op+0x0/ 
0x9d 
 [<c01058df>] do_invalid_op+0x91/0x9d     [<c015f311>] 
kfree_debugcheck+0x20/0x5 
1 
 [<c0123993>] vprintk+0x195/0x329     [<c0149d69>] mempool_alloc+0x37/0xd3 
 [<c0160050>] kmem_cache_alloc+0x7f/0x89     [<c0149d69>] 
mempool_alloc+0x37/0xd 
3 
 [<c0149d69>] mempool_alloc+0x37/0xd3     [<c01d2d5a>] 
cfq_set_request+0x318/0x4 
3a 
 [<c01048a7>] error_code+0x4f/0x54     [<c015f311>] kfree_debugcheck+0x20/0x51 
 [<c015fa51>] cache_free_debugcheck+0x1b/0x1b9     [<c0149d2e>] 
mempool_free+0x5 
f/0x63 
 [<c0161559>] kmem_cache_free+0x2a/0x5c     [<c0149d2e>] 
mempool_free+0x5f/0x63 
 [<c016822d>] bio_free+0x25/0x30     [<c01680e8>] bio_put+0x28/0x29 
 [<e0842418>] scsi_execute_async+0x13f/0x317 [scsi_mod]     [<e0a38677>] 
st_do_s 
csi+0x1d3/0x228 [st] 
 [<e0a3a4f9>] st_sleep_done+0x0/0x3a [st]     [<e0a3c439>] st_read+0x2fb/0x78b 
[ 
st] 
 [<e0a3c13e>] st_read+0x0/0x78b [st]     [<c01641c9>] vfs_read+0x9f/0x13e 
 [<c0164615>] sys_read+0x3c/0x63     [<c0103d81>] syscall_call+0x7/0xb

Comment 9 Jay Fenlason 2006-03-13 20:02:08 UTC

With 2.6.15-1.2041_FC5smp the behavior has changed.  Instead of BUGging 
immediately in amcheck, the kernel return -EBUSY on an attempt to read the 
tape drive, then crashes some time later.  I'm attaching the strace from 
amcheck.  At the time of the crash, the serial console said: 
Unable to handle kernel NULL pointer dereference at virtual address 00 
000014 
 printing eip: 
c015e8f4       
*pde = 1e46f001 
Oops: 0000 [#1] 
SMP             
last sysfs file: /block/sda/sda1/size 
Modules linked in: nfsd exportfs lockd nfs_acl ipv6 ppdev autofs4 hidp rfcomm 
l2 
cap bluetooth sunrpc lp parport_pc parport floppy nvram uhci_hcd st sg 
snd_cmipc 
i gameport snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss 
snd_ 
mixer_oss snd_pcm snd_page_alloc snd_opl3_lib snd_timer snd_hwdep 
snd_mpu401_uar 
t snd_rawmidi snd_seq_device snd soundcore matroxfb_base matroxfb_DAC1064 
matrox 
fb_accel matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc matrox_w1 wire 
i2c 
_piix4 i2c_core e100 mii dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd aic7xxx 
s 
csi_transport_spi sd_mod scsi_mod 
CPU:    1                         
EIP:    0060:[<c015e8f4>]    Not tainted VLI 
EFLAGS: 00010016   (2.6.15-1.2041_FC5smp #1)  
EIP is at free_block+0x40/0xe5                
eax: 00a00000   ebx: dffd5f20   ecx: 00000010   edx: 00000000 
esi: 00000000   edi: dffdeb40   ebp: 00000005   esp: dffeeef8 
ds: 007b   es: 007b   ss: 0068                                
Process events/1 (pid: 9, threadinfo=dffee000 task=c1700050) 
Stack: <0>c1329678 0000001e dffd6820 dffd6820 0000001e dffd6800 00000000 
c015e9f 
5  
       00000000 dffdeb40 dffd5eb0 dffd5f20 dffdeb40 00000000 c015fe6b 00000000  
       dffd5f4c dffdec14 c1511280 c1511284 c16da120 00000282 c0131488 c015fdce  
Call Trace:                                                                     
 [<c015e9f5>] drain_array_locked+0x5c/0x7b     [<c015fe6b>] 
cache_reap+0x9d/0x19 
1 
 [<c0131488>] run_workqueue+0x7f/0xba     [<c015fdce>] cache_reap+0x0/0x191 
 [<c0131c75>] worker_thread+0x0/0x117     [<c0131d5b>] 
worker_thread+0xe6/0x117 
 [<c011da07>] default_wake_function+0x0/0xc     [<c0134589>] kthread+0x9d/0xc9  
 [<c01344ec>] kthread+0x0/0xc9     [<c0102005>] kernel_thread_helper+0x5/0xb   
Code: 00 00 00 8b 44 24 08 8b 34 a8 8d 86 00 00 00 40 c1 e8 0c 6b c0 28 8b 15 
10 
 f3 47 c0 8b 4c 10 24 8b 54 24 20 8b 9c 97 90 00 00 00 <8b> 51 04 39 0a 74 08 
0f 
 0b a7 00 03 fb 30 c0 8b 01 39 48 04 74  
Continuing in 1 seconds.                 
 <3>Debug: sleeping function called from invalid context at 
include/linux/rwsem. 
h:43 
in_atomic():0, irqs_disabled():1 
 [<c0124c15>] profile_task_exit+0x13/0x48     [<c0126537>] do_exit+0x1c/0x6f5 
 [<c010526c>] register_die_notifier+0x0/0x2f     [<c02f32e0>] 
do_page_fault+0x3b 
1/0x5e2 
 [<c015e8f4>] free_block+0x40/0xe5     [<c02f2f2f>] do_page_fault+0x0/0x5e2 
 [<c01048a7>] error_code+0x4f/0x54     [<c015e8f4>] free_block+0x40/0xe5    
 [<c015e9f5>] drain_array_locked+0x5c/0x7b     [<c015fe6b>] 
cache_reap+0x9d/0x19 
1 
 [<c0131488>] run_workqueue+0x7f/0xba     [<c015fdce>] cache_reap+0x0/0x191 
 [<c0131c75>] worker_thread+0x0/0x117     [<c0131d5b>] 
worker_thread+0xe6/0x117 
 [<c011da07>] default_wake_function+0x0/0xc     [<c0134589>] kthread+0x9d/0xc9  
 [<c01344ec>] kthread+0x0/0xc9     [<c0102005>] kernel_thread_helper+0x5/0xb   
BUG: spinlock lockup on CPU#0, syslogd/1704, dffd5f4c (Not tainted)          
 [<c01d6723>] _raw_spin_lock+0xb9/0xd7     [<c015ec35>] 
cache_alloc_refill+0x63/ 
0x489 
 [<c015ebc8>] kmem_cache_alloc+0x50/0x5a     [<c01490d1>] 
mempool_alloc+0x37/0xd 
3 
 [<c01664f5>] bio_alloc_bioset+0x9b/0xf3     [<c0166558>] bio_alloc+0xb/0x17 
 [<c01634f5>] submit_bh+0x78/0x102     [<c0165e8f>] 
__block_write_full_page+0x1f 
3/0x2fa 
 [<e090aec9>] ext3_get_block+0x0/0x6c [ext3]     [<c016605c>] 
block_write_full_p 
age+0xc6/0xcf 
 [<e090aec9>] ext3_get_block+0x0/0x6c [ext3]     [<e090c569>] 
ext3_ordered_write 
page+0xd0/0x162 [ext3] 
 [<e0909c3a>] bget_one+0x0/0x7 [ext3]     [<c0180fcd>] 
mpage_writepages+0x1a1/0x 
2fd 
 [<e090c499>] ext3_ordered_writepage+0x0/0x162 [ext3]     [<c014b7b4>] 
do_writep 
ages+0x2b/0x32 
 [<c01473aa>] __filemap_fdatawrite_range+0x65/0x71     [<c01473d8>] 
filemap_fdat 
awrite+0xf/0x13 
 [<c0163f65>] do_fsync+0x48/0xbb     [<c0103d81>] syscall_call+0x7/0xb

Comment 10 Jay Fenlason 2006-03-13 20:03:37 UTC

Created attachment 126066 [details]
strace output of amcheck command, showing EBUSY return from tape read

Comment 11 Jay Fenlason 2006-03-24 18:41:35 UTC

Still broken with 2.6.16-1.2083_FC6smp

Comment 12 Alan Cox 2006-09-19 21:44:43 UTC

Are we still seeing this with T3 ?

Comment 13 Rik van Riel 2006-10-03 03:37:31 UTC

Jay, could you check this with the latest rawhide kernel?

If it still happens it should probably be a RHEL5 blocker, since quite a few
customers have tape drives.

Comment 14 Jay Fenlason 2006-10-03 16:40:53 UTC

It won't be easy--I updated Amanda, and the new version doesn't use the same 
access patterns to the tape drive.  I'll try reinstalling the old version and 
see what I can get.

Comment 15 Jay Fenlason 2006-10-03 17:49:24 UTC

It's no longer reproducing with amanda-2.4.5p1-3.2 or amanda-2.5.0p2-4 on the 
2.6.18-1.2726.fc6 kernel.  Closing.

Note You need to log in before you can comment on or make changes to this bug.