Bug 180389
| Summary: | Attempting to use SCSI tape drive triggers kernel BUG | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Jay Fenlason <fenlason> | ||||
| Component: | kernel | Assignee: | Dave Jones <davej> | ||||
| Status: | CLOSED WORKSFORME | QA Contact: | Brian Brock <bbrock> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | rawhide | CC: | alan, jfeeney, pfrields, riel, syeghiay, wtogami | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | i686 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2006-10-03 17:49:24 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 175429 | ||||||
| Attachments: |
|
||||||
Also fails with 2.6.15-1.1914_FC5smp With 2.6.15-1.1915_FC5smp the second amcheck gets EBUSY instead of hanging. I guess that's progress. :-) Jay, can you try 2.6.15-1.1926 (or higher) ? Same problem with 2.6.15-1.1926_FC5smp Still fails with 2.6.15-1.1939_FC5smp I wonder if the scsi_execute_in_process_context() changes in rc4 fix this. Can you try 1963 or higher ? actually, looking again, this looks like a double free, not just a sleeping whilst atomic problem. bad eyes on my part. 2.6.15-1.1991_FC5smp still fails
st0: Block limits 2 - 16777215 bytes.
kfree_debugcheck: out of range ptr fffffffch.
------------[ cut here ]------------
kernel BUG at mm/slab.c:2490!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /block/sda/sda1/size
Modules linked in: nfsd exportfs lockd nfs_acl ipv6 ppdev autofs4 rfcomm l2cap
b
luetooth sunrpc lp parport_pc parport floppy nvram uhci_hcd st sg snd_cmipci
gam
eport snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss
snd_mixer
_oss snd_pcm snd_page_alloc matroxfb_base snd_opl3_lib matroxfb_DAC1064
snd_time
r matroxfb_accel snd_hwdep matroxfb_Ti3026 snd_mpu401_uart matroxfb_g450
snd_raw
midi g450_pll snd_seq_device snd matroxfb_misc soundcore matrox_w1 wire e100
mii
i2c_piix4 i2c_core dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd aic7xxx
scsi_t
ransport_spi sd_mod scsi_mod
CPU: 0
EIP: 0060:[<c015f311>] Not tainted VLI
EFLAGS: 00010086 (2.6.15-1.1991_FC5smp #1)
EIP is at kfree_debugcheck+0x20/0x51
eax: 00000031 ebx: 0003ffff ecx: d4018dfc edx: c0315f6d
esi: fffffffc edi: fffffffc ebp: 00000246 esp: d4018df8
ds: 007b es: 007b ss: 0068
Process amcheck (pid: 2568, threadinfo=d4018000 task=d46d03b0)
Stack: <0>c0315f6d fffffffc 00000000 dffdeb00 c015fa51 c0149d2e 00000000
dffdeb0
0
dffdcb70 00000246 c0161559 dfd6c9d0 c1770170 00000000 de613598 c0149d2e
d463cea0 c1770170 d463cea0 c016822d 00000a00 00000000 c01680e8 e0842418
Call Trace:
[<c015fa51>] cache_free_debugcheck+0x1b/0x1b9 [<c0149d2e>]
mempool_free+0x5
f/0x63
[<c0161559>] kmem_cache_free+0x2a/0x5c [<c0149d2e>]
mempool_free+0x5f/0x63
[<c016822d>] bio_free+0x25/0x30 [<c01680e8>] bio_put+0x28/0x29
[<e0842418>] scsi_execute_async+0x13f/0x317 [scsi_mod] [<e0a38677>]
st_do_s
csi+0x1d3/0x228 [st]
[<e0a3a4f9>] st_sleep_done+0x0/0x3a [st] [<e0a3c439>] st_read+0x2fb/0x78b
[
st]
[<e0a3c13e>] st_read+0x0/0x78b [st] [<c01641c9>] vfs_read+0x9f/0x13e
[<c0164615>] sys_read+0x3c/0x63 [<c0103d81>] syscall_call+0x7/0xb
Code: 26 0a cc 5e 31 c0 58 5b 5e 5f c3 56 53 89 c6 8d 98 00 00 00 40 c1 eb 0c
3b
1d 00 23 48 c0 72 15 50 68 6d 5f 31 c0 e8 16 48 fc ff <0f> 0b ba 09 cc 5e 31
c0
58 5a 6b c3 28 03 05 10 23 48 c0 8b 00
Continuing in 1 seconds.
<3>Debug: sleeping function called from invalid context at
include/linux/rwsem.
h:43
in_atomic():0, irqs_disabled():1
[<c0124881>] profile_task_exit+0x13/0x48 [<c01261a3>] do_exit+0x1c/0x6fc
[<c010526c>] register_die_notifier+0x0/0x2f [<c010584e>]
do_invalid_op+0x0/
0x9d
[<c01058df>] do_invalid_op+0x91/0x9d [<c015f311>]
kfree_debugcheck+0x20/0x5
1
[<c0123993>] vprintk+0x195/0x329 [<c0149d69>] mempool_alloc+0x37/0xd3
[<c0160050>] kmem_cache_alloc+0x7f/0x89 [<c0149d69>]
mempool_alloc+0x37/0xd
3
[<c0149d69>] mempool_alloc+0x37/0xd3 [<c01d2d5a>]
cfq_set_request+0x318/0x4
3a
[<c01048a7>] error_code+0x4f/0x54 [<c015f311>] kfree_debugcheck+0x20/0x51
[<c015fa51>] cache_free_debugcheck+0x1b/0x1b9 [<c0149d2e>]
mempool_free+0x5
f/0x63
[<c0161559>] kmem_cache_free+0x2a/0x5c [<c0149d2e>]
mempool_free+0x5f/0x63
[<c016822d>] bio_free+0x25/0x30 [<c01680e8>] bio_put+0x28/0x29
[<e0842418>] scsi_execute_async+0x13f/0x317 [scsi_mod] [<e0a38677>]
st_do_s
csi+0x1d3/0x228 [st]
[<e0a3a4f9>] st_sleep_done+0x0/0x3a [st] [<e0a3c439>] st_read+0x2fb/0x78b
[
st]
[<e0a3c13e>] st_read+0x0/0x78b [st] [<c01641c9>] vfs_read+0x9f/0x13e
[<c0164615>] sys_read+0x3c/0x63 [<c0103d81>] syscall_call+0x7/0xb
With 2.6.15-1.2041_FC5smp the behavior has changed. Instead of BUGging
immediately in amcheck, the kernel return -EBUSY on an attempt to read the
tape drive, then crashes some time later. I'm attaching the strace from
amcheck. At the time of the crash, the serial console said:
Unable to handle kernel NULL pointer dereference at virtual address 00
000014
printing eip:
c015e8f4
*pde = 1e46f001
Oops: 0000 [#1]
SMP
last sysfs file: /block/sda/sda1/size
Modules linked in: nfsd exportfs lockd nfs_acl ipv6 ppdev autofs4 hidp rfcomm
l2
cap bluetooth sunrpc lp parport_pc parport floppy nvram uhci_hcd st sg
snd_cmipc
i gameport snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss
snd_
mixer_oss snd_pcm snd_page_alloc snd_opl3_lib snd_timer snd_hwdep
snd_mpu401_uar
t snd_rawmidi snd_seq_device snd soundcore matroxfb_base matroxfb_DAC1064
matrox
fb_accel matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc matrox_w1 wire
i2c
_piix4 i2c_core e100 mii dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd aic7xxx
s
csi_transport_spi sd_mod scsi_mod
CPU: 1
EIP: 0060:[<c015e8f4>] Not tainted VLI
EFLAGS: 00010016 (2.6.15-1.2041_FC5smp #1)
EIP is at free_block+0x40/0xe5
eax: 00a00000 ebx: dffd5f20 ecx: 00000010 edx: 00000000
esi: 00000000 edi: dffdeb40 ebp: 00000005 esp: dffeeef8
ds: 007b es: 007b ss: 0068
Process events/1 (pid: 9, threadinfo=dffee000 task=c1700050)
Stack: <0>c1329678 0000001e dffd6820 dffd6820 0000001e dffd6800 00000000
c015e9f
5
00000000 dffdeb40 dffd5eb0 dffd5f20 dffdeb40 00000000 c015fe6b 00000000
dffd5f4c dffdec14 c1511280 c1511284 c16da120 00000282 c0131488 c015fdce
Call Trace:
[<c015e9f5>] drain_array_locked+0x5c/0x7b [<c015fe6b>]
cache_reap+0x9d/0x19
1
[<c0131488>] run_workqueue+0x7f/0xba [<c015fdce>] cache_reap+0x0/0x191
[<c0131c75>] worker_thread+0x0/0x117 [<c0131d5b>]
worker_thread+0xe6/0x117
[<c011da07>] default_wake_function+0x0/0xc [<c0134589>] kthread+0x9d/0xc9
[<c01344ec>] kthread+0x0/0xc9 [<c0102005>] kernel_thread_helper+0x5/0xb
Code: 00 00 00 8b 44 24 08 8b 34 a8 8d 86 00 00 00 40 c1 e8 0c 6b c0 28 8b 15
10
f3 47 c0 8b 4c 10 24 8b 54 24 20 8b 9c 97 90 00 00 00 <8b> 51 04 39 0a 74 08
0f
0b a7 00 03 fb 30 c0 8b 01 39 48 04 74
Continuing in 1 seconds.
<3>Debug: sleeping function called from invalid context at
include/linux/rwsem.
h:43
in_atomic():0, irqs_disabled():1
[<c0124c15>] profile_task_exit+0x13/0x48 [<c0126537>] do_exit+0x1c/0x6f5
[<c010526c>] register_die_notifier+0x0/0x2f [<c02f32e0>]
do_page_fault+0x3b
1/0x5e2
[<c015e8f4>] free_block+0x40/0xe5 [<c02f2f2f>] do_page_fault+0x0/0x5e2
[<c01048a7>] error_code+0x4f/0x54 [<c015e8f4>] free_block+0x40/0xe5
[<c015e9f5>] drain_array_locked+0x5c/0x7b [<c015fe6b>]
cache_reap+0x9d/0x19
1
[<c0131488>] run_workqueue+0x7f/0xba [<c015fdce>] cache_reap+0x0/0x191
[<c0131c75>] worker_thread+0x0/0x117 [<c0131d5b>]
worker_thread+0xe6/0x117
[<c011da07>] default_wake_function+0x0/0xc [<c0134589>] kthread+0x9d/0xc9
[<c01344ec>] kthread+0x0/0xc9 [<c0102005>] kernel_thread_helper+0x5/0xb
BUG: spinlock lockup on CPU#0, syslogd/1704, dffd5f4c (Not tainted)
[<c01d6723>] _raw_spin_lock+0xb9/0xd7 [<c015ec35>]
cache_alloc_refill+0x63/
0x489
[<c015ebc8>] kmem_cache_alloc+0x50/0x5a [<c01490d1>]
mempool_alloc+0x37/0xd
3
[<c01664f5>] bio_alloc_bioset+0x9b/0xf3 [<c0166558>] bio_alloc+0xb/0x17
[<c01634f5>] submit_bh+0x78/0x102 [<c0165e8f>]
__block_write_full_page+0x1f
3/0x2fa
[<e090aec9>] ext3_get_block+0x0/0x6c [ext3] [<c016605c>]
block_write_full_p
age+0xc6/0xcf
[<e090aec9>] ext3_get_block+0x0/0x6c [ext3] [<e090c569>]
ext3_ordered_write
page+0xd0/0x162 [ext3]
[<e0909c3a>] bget_one+0x0/0x7 [ext3] [<c0180fcd>]
mpage_writepages+0x1a1/0x
2fd
[<e090c499>] ext3_ordered_writepage+0x0/0x162 [ext3] [<c014b7b4>]
do_writep
ages+0x2b/0x32
[<c01473aa>] __filemap_fdatawrite_range+0x65/0x71 [<c01473d8>]
filemap_fdat
awrite+0xf/0x13
[<c0163f65>] do_fsync+0x48/0xbb [<c0103d81>] syscall_call+0x7/0xb
Created attachment 126066 [details]
strace output of amcheck command, showing EBUSY return from tape read
Still broken with 2.6.16-1.2083_FC6smp Are we still seeing this with T3 ? Jay, could you check this with the latest rawhide kernel? If it still happens it should probably be a RHEL5 blocker, since quite a few customers have tape drives. It won't be easy--I updated Amanda, and the new version doesn't use the same access patterns to the tape drive. I'll try reinstalling the old version and see what I can get. It's no longer reproducing with amanda-2.4.5p1-3.2 or amanda-2.5.0p2-4 on the 2.6.18-1.2726.fc6 kernel. Closing. |
Description of problem: Attempting to run amcheck on fenlason-rhide.lab.boston.redhat.com triggers a kernel BUG. Subsequent attempts hang. Version-Release number of selected component (if applicable): 2.6.15-1.1909_FC5smp How reproducible: Always Steps to Reproduce: 1. reboot fenlason-rhide.lab.boston.redhat.com 2. run "amcheck JaysCube" as root 3. Look at dmesg, syslog output 4. run "amcheck JaysCube" again. 5. Notice that it hangs. Actual results: eth0: no IPv6 routers present target1:0:1: FAST-20 WIDE SCSI 40.0 MB/s ST (50 ns, offset 15) st0: Block limits 2 - 16777215 bytes. kfree_debugcheck: out of range ptr fffffffch. ------------[ cut here ]------------ kernel BUG at mm/slab.c:2436! invalid opcode: 0000 [#1] SMP last sysfs file: /block/sda/sda1/size Modules linked in: ipv6 ppdev autofs4 rfcomm l2cap bluetooth sunrpc lp parport_pc parport floppy nvram uhci_hcd snd_cmipci gameport snd_seq_dummy matroxfb_base snd_seq_oss matroxfb_DAC1064 matroxfb_accel snd_seq_midi_event matroxfb_Ti3026 matroxfb_g450 snd_seq g450_pll matroxfb_misc snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_opl3_lib snd_timer snd_hwdep st snd_mpu401_uart snd_rawmidi snd_seq_device snd matrox_w1 soundcore wire e100 i2c_piix4 i2c_core mii dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd aic7xxx scsi_transport_spi sd_mod scsi_mod CPU: 0 EIP: 0060:[<c015f0f6>] Not tainted VLI EFLAGS: 00010092 (2.6.15-1.1909_FC5smp) EIP is at kfree_debugcheck+0x20/0x51 eax: 00000031 ebx: 0003ffff ecx: 00000000 edx: 00000082 esi: fffffffc edi: fffffffc ebp: 00000246 esp: d4a59df8 ds: 007b es: 007b ss: 0068 Process amcheck (pid: 2680, threadinfo=d4a59000 task=d4ad18d0) Stack: <0>c0316ea4 fffffffc 00000000 dffdbb00 c015f84c c0149ade 00000000 dffdbb00 dffd7b70 00000246 c016134f dffdc6f8 c172419c 00000000 d49e2a00 c0149ade dcd38310 c172419c dcd38310 c0167ffd 00000a00 00000000 c0167eb8 e0840356 Call Trace: [<c015f84c>] cache_free_debugcheck+0x1b/0x1b9 [<c0149ade>] mempool_free+0x5f/0x63 [<c016134f>] kmem_cache_free+0x2a/0x5c [<c0149ade>] mempool_free+0x5f/0x63 [<c0167ffd>] bio_free+0x25/0x30 [<c0167eb8>] bio_put+0x28/0x29 [<e0840356>] scsi_execute_async+0x13f/0x317 [scsi_mod] [<e094d677>] st_do_scsi+0x1d3/0x228 [st] [<e094f4f9>] st_sleep_done+0x0/0x3a [st] [<e0951439>] st_read+0x2fb/0x78b [st] [<e095113e>] st_read+0x0/0x78b [st] [<c0163f8d>] vfs_read+0x9f/0x13e [<c01643d9>] sys_read+0x3c/0x63 [<c0103d21>] syscall_call+0x7/0xb Code: f0 09 03 6e 31 c0 58 5b 5e 5f c3 56 53 89 c6 8d 98 00 00 00 40 c1 eb 0c 3b 1d 80 43 48 c0 72 15 50 68 a4 6e 31 c0 e8 e1 48 fc ff <0f> 0b 84 09 03 6e 31 c0 58 5a 6b c3 28 03 05 90 43 48 c0 8b 00 Continuing in 1 seconds. <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43 in_atomic():0, irqs_disabled():1 [<c01246a1>] profile_task_exit+0x13/0x48 [<c0125f66>] do_exit+0x1c/0x6fc [<c01239eb>] printk+0x14/0x18 [<c0105272>] show_stack+0x0/0xa [<c0105868>] do_invalid_op+0x0/0x9d [<c01058f9>] do_invalid_op+0x91/0x9d [<c015f0f6>] kfree_debugcheck+0x20/0x51 [<c012339d>] release_console_sem+0x199/0x1a1 [<c0123843>] vprintk+0x195/0x329 [<c0149b19>] mempool_alloc+0x37/0xd3 [<c01d2975>] cfq_set_request+0x318/0x43a [<c01048bf>] error_code+0x4f/0x54 [<c015f0f6>] kfree_debugcheck+0x20/0x51 [<c015f84c>] cache_free_debugcheck+0x1b/0x1b9 [<c0149ade>] mempool_free+0x5f/0x63 [<c016134f>] kmem_cache_free+0x2a/0x5c [<c0149ade>] mempool_free+0x5f/0x63 [<c0167ffd>] bio_free+0x25/0x30 [<c0167eb8>] bio_put+0x28/0x29 [<e0840356>] scsi_execute_async+0x13f/0x317 [scsi_mod] [<e094d677>] st_do_scsi+0x1d3/0x228 [st] [<e094f4f9>] st_sleep_done+0x0/0x3a [st] [<e0951439>] st_read+0x2fb/0x78b [st] [<e095113e>] st_read+0x0/0x78b [st] [<c0163f8d>] vfs_read+0x9f/0x13e [<c01643d9>] sys_read+0x3c/0x63 [<c0103d21>] syscall_call+0x7/0xb Expected results: No kernel BUG. No hang. Additional info: Ping me if you need the root password