Bug 247713 - kernel: BUG: soft lockup detected on CPU#1! during mirror leg failure test case
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: lvm2
Version: 5.0
Hardware: ia64 Linux
Priority: low
Severity: low
Assigned To: Jonathan Earl Brassow
QA Contact: Corey Marthaler
Depends On:
Blocks:
Reported: 2007-07-10 18:28 EDT by Dean Jansa
Modified: 2010-01-11 22:50 EST

Doc Type: Bug Fix
Last Closed: 2008-04-01 12:43:15 EDT
Description Dean Jansa 2007-07-10 18:28:09 EDT
Description of problem:

I noticed a soft lockup detection while running single-node mirror tests.  The
test case ended up passing, but I wanted to capture this info.  I'm not sure
why the kernel decided there was a soft lockup; perhaps it was due to the
large number of "Jul 10 17:12:12 link-13 kernel: device-mapper: raid1: Error
during write occurred." messages logged during leg failure?


SCENARIO - [fail_leg_during_io]
Creating mirror using device sdb (that we will fail) for primary leg
lvcreate -m 1 -n fail_leg_io -L 500M mirror_sanity /dev/sdb7:0-500 /dev/sdc7:0-500 /dev/sdc6:0-50
Verifying that the mirror is fully syncd, currently at
 ...37.60% ...55.20% ...75.20% ...97.60% ...100.00%
Start some I/O to the mirror before failing it
Disabling device sdb
Attempting I/O to cause mirror conversion


Jul 10 17:12:12 link-13 kernel: device-mapper: raid1: Error during write occurred.
Jul 10 17:12:12 link-13 last message repeated 677 times                        
Jul 10 17:12:12 link-13 kernel: BUG: soft lockup detected on CPU#1!
Jul 10 17:12:12 link-13 kernel:                                                
Jul 10 17:12:12 link-13 kernel: Call Trace:
Jul 10 17:12:12 link-13 kernel:  [<a000000100014140>] show_stack+0x40/0xa0     
Jul 10 17:12:12 link-13 kernel:                                
sp=e00000003f807710 bsp=e00000003f801988                                       
                Jul 10 17:12:12 link-13 kernel:  [<a0000001000141d0>]
dump_stack+0x30/0x60
Jul 10 17:12:12 link-13 kernel:                                
sp=e00000003f8078e0 bsp=e00000003f801970
Jul 10 17:12:12 link-13 kernel:  [<a0000001000e3fe0>] softlockup_tick+0x200/0x240
Jul 10 17:12:12 link-13 kernel:                                
sp=e00000003f8078e0 bsp=e00000003f801928
Jul 10 17:12:12 link-13 kernel:  [<a000000100096bf0>] run_local_timers+0x30/0x60
Jul 10 17:12:12 link-13 kernel:                                
sp=e00000003f8078e0 bsp=e00000003f801910
Jul 10 17:12:12 link-13 kernel:  [<a000000100096ca0>]
update_process_times+0x80/0x100
Jul 10 17:12:12 link-13 kernel:                                
sp=e00000003f8078e0 bsp=e00000003f8018e0
Jul 10 17:12:13 link-13 kernel:  [<a0000001000377c0>] timer_interrupt+0x180/0x360
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f8078e0 bsp=e00000003f8018a0
Jul 10 17:12:14 link-13 kernel:  [<a0000001000e4650>] handle_IRQ_event+0x90/0x120
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f8078e0 bsp=e00000003f801860
Jul 10 17:12:14 link-13 kernel:  [<a0000001000e4810>] __do_IRQ+0x130/0x420
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f8078e0 bsp=e00000003f801818
Jul 10 17:12:14 link-13 kernel:  [<a000000100011c50>] ia64_handle_irq+0xf0/0x1a0
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f8078e0 bsp=e00000003f8017e0
Jul 10 17:12:14 link-13 kernel:  [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f8078e0 bsp=e00000003f8017e0
Jul 10 17:12:14 link-13 kernel:  [<a000000100079260>] vprintk+0x820/0x940
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f807ab0 bsp=e00000003f801738
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f8078e0 bsp=e00000003f8018a0
Jul 10 17:12:14 link-13 kernel:  [<a0000001000e4650>] handle_IRQ_event+0x90/0x120
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f8078e0 bsp=e00000003f801860
Jul 10 17:12:14 link-13 kernel:  [<a0000001000e4810>] __do_IRQ+0x130/0x420
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f8078e0 bsp=e00000003f801818
Jul 10 17:12:14 link-13 kernel:  [<a000000100011c50>] ia64_handle_irq+0xf0/0x1a0
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f8078e0 bsp=e00000003f8017e0
Jul 10 17:12:14 link-13 kernel:  [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f8078e0 bsp=e00000003f8017e0
Jul 10 17:12:14 link-13 kernel:  [<a000000100079260>] vprintk+0x820/0x940
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f807ab0 bsp=e00000003f801738
Jul 10 17:12:14 link-13 kernel:  [<a000000100079410>] printk+0x90/0x1e0
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f807b20 bsp=e00000003f8016d0
Jul 10 17:12:14 link-13 kernel:  [<a0000002005e9400>] write_callback+0x80/0x2c0
[dm_mirror]
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f807b70 bsp=e00000003f801698
Jul 10 17:12:14 link-13 kernel:  [<a0000002005e96d0>]
write_callback_good_log+0x30/0x60 [dm_mirror]
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f807b80 bsp=e00000003f801670
Jul 10 17:12:14 link-13 kernel:  [<a0000002005adbe0>] dec_count+0x140/0x180 [dm_mod]
Jul 10 17:12:14 link-13 kernel:                                
sp=e00000003f807b80 bsp=e00000003f801628
Jul 10 17:12:14 link-13 kernel:  [<a0000002005ae310>] endio+0xf0/0x140 [dm_mod]
Jul 10 17:12:15 link-13 kernel:                                
sp=e00000003f807b80 bsp=e00000003f8015e8
Jul 10 17:12:16 link-13 kernel:  [<a000000100164cf0>] bio_endio+0x130/0x160
Jul 10 17:12:18 link-13 kernel:                                
sp=e00000003f807b80 bsp=e00000003f8015b0
Jul 10 17:12:19 link-13 kernel:  [<a0000002005a0c10>] dec_pending+0x430/0x4a0
[dm_mod]
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807b80 bsp=e00000003f801560
Jul 10 17:12:19 link-13 kernel:  [<a0000002005a11e0>] clone_endio+0x1c0/0x240
[dm_mod]
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807b80 bsp=e00000003f801518
Jul 10 17:12:19 link-13 kernel:  [<a000000100164cf0>] bio_endio+0x130/0x160
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807b80 bsp=e00000003f8014e0
Jul 10 17:12:19 link-13 kernel:  [<a00000010027e780>]
__end_that_request_first+0x3c0/0xd20
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807b80 bsp=e00000003f801478
Jul 10 17:12:19 link-13 kernel:  [<a00000010027f110>]
end_that_request_chunk+0x30/0x60
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807b80 bsp=e00000003f801448
Jul 10 17:12:19 link-13 kernel:  [<a0000002004adb60>]
scsi_end_request+0x40/0x240 [scsi_mod]
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807b80 bsp=e00000003f801400
Jul 10 17:12:19 link-13 kernel:  [<a0000002004ae090>]
scsi_io_completion+0x330/0x880 [scsi_mod]
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807b80 bsp=e00000003f801390
Jul 10 17:12:19 link-13 kernel:  [<a0000002003b7160>] sd_rw_intr+0x5a0/0x620
[sd_mod]
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807b90 bsp=e00000003f801338
Jul 10 17:12:19 link-13 kernel:  [<a0000002004a0c80>]
scsi_finish_command+0x140/0x160 [scsi_mod]
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807ba0 bsp=e00000003f801308
Jul 10 17:12:19 link-13 kernel:  [<a0000002004aedd0>]
scsi_softirq_done+0x290/0x2e0 [scsi_mod]
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807ba0 bsp=e00000003f8012d8
Jul 10 17:12:19 link-13 kernel:  [<a00000010027b9c0>] blk_done_softirq+0x140/0x1a0
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807bb0 bsp=e00000003f8012c0
Jul 10 17:12:19 link-13 kernel:  [<a000000100087170>] __do_softirq+0xf0/0x240
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807bc0 bsp=e00000003f801248
Jul 10 17:12:19 link-13 kernel:  [<a000000100087330>] do_softirq+0x70/0xc0
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807bc0 bsp=e00000003f8011e0
Jul 10 17:12:19 link-13 kernel:  [<a000000100087400>] irq_exit+0x80/0xa0
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807bc0 bsp=e00000003f8011c8
Jul 10 17:12:19 link-13 kernel:  [<a000000100011cd0>] ia64_handle_irq+0x170/0x1a0
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807bc0 bsp=e00000003f801198
Jul 10 17:12:19 link-13 kernel:  [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280
Jul 10 17:12:19 link-13 kernel:      occurred.
Jul 10 17:12:19 link-13 kernel: device-mapper: raid1: Error during write occurred.
Jul 10 17:12:20 link-13 last message repeated 74 times
Jul 10 17:12:20 link-13 kernel: device-mapper: raid1: Error duri: raid1: Error
during write occurred.
Jul 10 17:12:20 link-13 kernel: device-mapper: raid1: Error during write occurred.
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807bc0 bsp=e00000003f801198
Jul 10 17:12:19 link-13 kernel:                                
sp=e00000003f807bc0 bsp=e00000003f801198
Jul 10 17:12:19 link-13 kernel:  [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280
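
The trace above is hard to read because each frame is wrapped and interleaved with sp/bsp lines. A minimal sketch for pulling out just the function and module names follows, assuming the "[<address>] function+0xOFF/0xSIZE [module]" frame layout seen in this report; parse_frames is a hypothetical helper, and frames without a module tag are labelled "kernel" purely for display.

```python
import re

# Matches frames such as "[<a000000100079410>] printk+0x90/0x1e0",
# with an optional "[module]" tag that may sit on the following line.
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Return (function, module) pairs in the order they appear in text."""
    return [
        (m.group("func"), m.group("module") or "kernel")
        for m in FRAME.finditer(text)
    ]


sample = """\
Jul 10 17:12:14 link-13 kernel:  [<a000000100079410>] printk+0x90/0x1e0
Jul 10 17:12:14 link-13 kernel:  [<a0000002005e9400>] write_callback+0x80/0x2c0
[dm_mirror]
Jul 10 17:12:19 link-13 kernel:  [<a000000100087170>] __do_softirq+0xf0/0x240
"""

for func, module in parse_frames(sample):
    print(f"{func:20s} {module}")
```

Applied to the trace in this report, the frames show printk being called from write_callback in dm_mirror, which was itself reached through the SCSI and block-layer completion path running under __do_softirq. The timer interrupt that reported the soft lockup arrived while vprintk was executing.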





Version-Release number of selected component (if applicable):

lvm2-2.02.16-3.el5

How reproducible:

Not sure.  Haven't tried yet.


Comment 1 Dean Jansa 2007-07-10 18:28:52 EDT
Kernel: 2.6.18-8.1.8.el5
Comment 2 Luming Yu 2007-07-24 21:30:35 EDT
Can this be reproduced with upstream?
Comment 3 Prarit Bhargava 2007-09-04 11:00:05 EDT
Dean, is this still happening?

P.
Comment 4 Dave Wysochanski 2008-04-01 12:43:15 EDT
Please reopen this issue if you see it again.

Moving to WORKSFORME based on Corey's email below:

I think the fact QA hasn't seen that issue in over 8 months is reason enough 
though to close it 'WORKSFORME' if you're trying to get it off the bz list. 
We can always reopen it if we ever see it again.

-Corey
