Bug 247713

Summary: kernel: BUG: soft lockup detected on CPU#1! during mirror leg failure test case
Product: Red Hat Enterprise Linux 5
Reporter: Dean Jansa <djansa>
Component: lvm2
Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED WORKSFORME
QA Contact: Corey Marthaler <cmarthal>
Severity: low
Priority: low
Version: 5.0
CC: agk, dwysocha, jbrassow, mbroz, prarit, prockai
Target Milestone: ---   
Target Release: ---   
Hardware: ia64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-04-01 16:43:15 UTC

Description Dean Jansa 2007-07-10 22:28:09 UTC
Description of problem:

I noticed a soft lockup detection while running single-node mirror tests.
The test case ended up passing, but I wanted to capture this info.  I'm not
sure why the kernel decided there was a soft lockup.  Perhaps it was due to
the large number of "Jul 10 17:12:12 link-13 kernel: device-mapper: raid1:
Error during write occurred." messages during leg failure?
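
To put a number on "large number of messages", the log excerpt can be tallied. What follows is only a minimal sketch, not part of the original test suite: the function name `count_errors_before_lockup` is hypothetical, and the message strings are assumed to match the syslog excerpt quoted in this report.

```python
import re

# Assumed message text, copied from the log excerpt in this report.
WRITE_ERROR = "device-mapper: raid1: Error during write occurred."
SOFT_LOCKUP = "BUG: soft lockup detected"
REPEATED = re.compile(r"last message repeated (\d+) times")


def count_errors_before_lockup(lines):
    """Count write-error messages logged before the first soft lockup line.

    syslog 'last message repeated N times' lines are expanded. If no soft
    lockup line is present, the total for all lines is returned.
    """
    count = 0
    last_was_error = False
    for line in lines:
        if SOFT_LOCKUP in line:
            return count
        repeat = REPEATED.search(line)
        if repeat:
            # syslogd collapses identical consecutive messages into this line.
            if last_was_error:
                count += int(repeat.group(1))
            continue
        last_was_error = WRITE_ERROR in line
        if last_was_error:
            count += 1
    return count


sample = [
    "Jul 10 17:12:12 link-13 kernel: device-mapper: raid1: Error during write occurred.",
    "Jul 10 17:12:12 link-13 last message repeated 677 times",
    "Jul 10 17:12:12 link-13 kernel: BUG: soft lockup detected on CPU#1!",
]
print(count_errors_before_lockup(sample))
```

Run against the first three log lines of this report, the sketch counts one explicit message plus 677 repeats, all stamped within the same second as the lockup report.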


SCENARIO - [fail_leg_during_io]
Creating mirror using device sdb (that we will fail) for primary leg
lvcreate -m 1 -n fail_leg_io -L 500M mirror_sanity /dev/sdb7:0-500 /dev/sdc7:0-500 /dev/sdc6:0-50
Verifying that the mirror is fully syncd, currently at ...37.60% ...55.20% ...75.20% ...97.60% ...100.00%
Start some I/O to the mirror before failing it
Disabling device sdb
Attempting I/O to cause mirror conversion


Jul 10 17:12:12 link-13 kernel: device-mapper: raid1: Error during write occurred.
Jul 10 17:12:12 link-13 last message repeated 677 times
Jul 10 17:12:12 link-13 kernel: BUG: soft lockup detected on CPU#1!
Jul 10 17:12:12 link-13 kernel:
Jul 10 17:12:12 link-13 kernel: Call Trace:
Jul 10 17:12:12 link-13 kernel:  [<a000000100014140>] show_stack+0x40/0xa0
Jul 10 17:12:12 link-13 kernel:                                 sp=e00000003f807710 bsp=e00000003f801988
Jul 10 17:12:12 link-13 kernel:  [<a0000001000141d0>] dump_stack+0x30/0x60
Jul 10 17:12:12 link-13 kernel:                                 sp=e00000003f8078e0 bsp=e00000003f801970
Jul 10 17:12:12 link-13 kernel:  [<a0000001000e3fe0>] softlockup_tick+0x200/0x240
Jul 10 17:12:12 link-13 kernel:                                 sp=e00000003f8078e0 bsp=e00000003f801928
Jul 10 17:12:12 link-13 kernel:  [<a000000100096bf0>] run_local_timers+0x30/0x60
Jul 10 17:12:12 link-13 kernel:                                 sp=e00000003f8078e0 bsp=e00000003f801910
Jul 10 17:12:12 link-13 kernel:  [<a000000100096ca0>] update_process_times+0x80/0x100
Jul 10 17:12:12 link-13 kernel:                                 sp=e00000003f8078e0 bsp=e00000003f8018e0
Jul 10 17:12:13 link-13 kernel:  [<a0000001000377c0>] timer_interrupt+0x180/0x360
Jul 10 17:12:14 link-13 kernel:                                 sp=e00000003f8078e0 bsp=e00000003f8018a0
Jul 10 17:12:14 link-13 kernel:  [<a0000001000e4650>] handle_IRQ_event+0x90/0x120
Jul 10 17:12:14 link-13 kernel:                                 sp=e00000003f8078e0 bsp=e00000003f801860
Jul 10 17:12:14 link-13 kernel:  [<a0000001000e4810>] __do_IRQ+0x130/0x420
Jul 10 17:12:14 link-13 kernel:                                 sp=e00000003f8078e0 bsp=e00000003f801818
Jul 10 17:12:14 link-13 kernel:  [<a000000100011c50>] ia64_handle_irq+0xf0/0x1a0
Jul 10 17:12:14 link-13 kernel:                                 sp=e00000003f8078e0 bsp=e00000003f8017e0
Jul 10 17:12:14 link-13 kernel:  [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280
Jul 10 17:12:14 link-13 kernel:                                 sp=e00000003f8078e0 bsp=e00000003f8017e0
Jul 10 17:12:14 link-13 kernel:  [<a000000100079260>] vprintk+0x820/0x940
Jul 10 17:12:14 link-13 kernel:                                 sp=e00000003f807ab0 bsp=e00000003f801738
Jul 10 17:12:14 link-13 kernel:  [<a000000100079410>] printk+0x90/0x1e0
Jul 10 17:12:14 link-13 kernel:                                 sp=e00000003f807b20 bsp=e00000003f8016d0
Jul 10 17:12:14 link-13 kernel:  [<a0000002005e9400>] write_callback+0x80/0x2c0 [dm_mirror]
Jul 10 17:12:14 link-13 kernel:                                 sp=e00000003f807b70 bsp=e00000003f801698
Jul 10 17:12:14 link-13 kernel:  [<a0000002005e96d0>] write_callback_good_log+0x30/0x60 [dm_mirror]
Jul 10 17:12:14 link-13 kernel:                                 sp=e00000003f807b80 bsp=e00000003f801670
Jul 10 17:12:14 link-13 kernel:  [<a0000002005adbe0>] dec_count+0x140/0x180 [dm_mod]
Jul 10 17:12:14 link-13 kernel:                                 sp=e00000003f807b80 bsp=e00000003f801628
Jul 10 17:12:14 link-13 kernel:  [<a0000002005ae310>] endio+0xf0/0x140 [dm_mod]
Jul 10 17:12:15 link-13 kernel:                                 sp=e00000003f807b80 bsp=e00000003f8015e8
Jul 10 17:12:16 link-13 kernel:  [<a000000100164cf0>] bio_endio+0x130/0x160
Jul 10 17:12:18 link-13 kernel:                                 sp=e00000003f807b80 bsp=e00000003f8015b0
Jul 10 17:12:19 link-13 kernel:  [<a0000002005a0c10>] dec_pending+0x430/0x4a0 [dm_mod]
Jul 10 17:12:19 link-13 kernel:                                 sp=e00000003f807b80 bsp=e00000003f801560
Jul 10 17:12:19 link-13 kernel:  [<a0000002005a11e0>] clone_endio+0x1c0/0x240 [dm_mod]
Jul 10 17:12:19 link-13 kernel:                                 sp=e00000003f807b80 bsp=e00000003f801518
Jul 10 17:12:19 link-13 kernel:  [<a000000100164cf0>] bio_endio+0x130/0x160
Jul 10 17:12:19 link-13 kernel:                                 sp=e00000003f807b80 bsp=e00000003f8014e0
Jul 10 17:12:19 link-13 kernel:  [<a00000010027e780>] __end_that_request_first+0x3c0/0xd20
Jul 10 17:12:19 link-13 kernel:                                 sp=e00000003f807b80 bsp=e00000003f801478
Jul 10 17:12:19 link-13 kernel:  [<a00000010027f110>] end_that_request_chunk+0x30/0x60
Jul 10 17:12:19 link-13 kernel:                                 sp=e00000003f807b80 bsp=e00000003f801448
Jul 10 17:12:19 link-13 kernel:  [<a0000002004adb60>] scsi_end_request+0x40/0x240 [scsi_mod]
Jul 10 17:12:19 link-13 kernel:                                 sp=e00000003f807b80 bsp=e00000003f801400
Jul 10 17:12:19 link-13 kernel:  [<a0000002004ae090>] scsi_io_completion+0x330/0x880 [scsi_mod]
Jul 10 17:12:19 link-13 kernel:                                 sp=e00000003f807b80 bsp=e00000003f801390
Jul 10 17:12:19 link-13 kernel:  [<a0000002003b7160>] sd_rw_intr+0x5a0/0x620 [sd_mod]
Jul 10 17:12:19 link-13 kernel:                                 sp=e00000003f807b90 bsp=e00000003f801338
Jul 10 17:12:19 link-13 kernel:  [<a0000002004a0c80>] scsi_finish_command+0x140/0x160 [scsi_mod]
Jul 10 17:12:19 link-13 kernel:                                 sp=e00000003f807ba0 bsp=e00000003f801308
Jul 10 17:12:19 link-13 kernel:  [<a0000002004aedd0>] scsi_softirq_done+0x290/0x2e0 [scsi_mod]
Jul 10 17:12:19 link-13 kernel:                                 sp=e00000003f807ba0 bsp=e00000003f8012d8
Jul 10 17:12:19 link-13 kernel:  [<a00000010027b9c0>] blk_done_softirq+0x140/0x1a0
Jul 10 17:12:19 link-13 kernel:                                 sp=e00000003f807bb0 bsp=e00000003f8012c0
Jul 10 17:12:19 link-13 kernel:  [<a000000100087170>] __do_softirq+0xf0/0x240
Jul 10 17:12:19 link-13 kernel:                                 sp=e00000003f807bc0 bsp=e00000003f801248
Jul 10 17:12:19 link-13 kernel:  [<a000000100087330>] do_softirq+0x70/0xc0
Jul 10 17:12:19 link-13 kernel:                                 sp=e00000003f807bc0 bsp=e00000003f8011e0
Jul 10 17:12:19 link-13 kernel:  [<a000000100087400>] irq_exit+0x80/0xa0
Jul 10 17:12:19 link-13 kernel:                                 sp=e00000003f807bc0 bsp=e00000003f8011c8
Jul 10 17:12:19 link-13 kernel:  [<a000000100011cd0>] ia64_handle_irq+0x170/0x1a0
Jul 10 17:12:19 link-13 kernel:                                 sp=e00000003f807bc0 bsp=e00000003f801198
Jul 10 17:12:19 link-13 kernel:  [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280
Jul 10 17:12:19 link-13 kernel:      occurred.
Jul 10 17:12:19 link-13 kernel: device-mapper: raid1: Error during write occurred.
Jul 10 17:12:20 link-13 last message repeated 74 times
Jul 10 17:12:20 link-13 kernel: device-mapper: raid1: Error duri: raid1: Error during write occurred.
Jul 10 17:12:20 link-13 kernel: device-mapper: raid1: Error during write occurred.
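
The call chain is easier to read once the frames are reduced to function and module names. The following is a rough sketch under stated assumptions, not a tool used in this report: `parse_frames` is a hypothetical helper, the frame pattern is inferred from the trace format above, and frames without a bracketed module tag are labeled "kernel" on the assumption that they belong to the core kernel image.

```python
import re

# Matches "[<address>] function+0xoff/0xsize" with an optional "[module]" tag.
# \s+ also spans newlines, so frames wrapped across two lines still match.
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_frames(log_text):
    """Return (function, module) pairs in the order they appear in the trace."""
    frames = []
    for _addr, func, module in FRAME.findall(log_text):
        # Assumption: no module tag means the symbol is in the core kernel.
        frames.append((func, module or "kernel"))
    return frames


sample = """\
Jul 10 17:12:14 link-13 kernel:  [<a000000100079410>] printk+0x90/0x1e0
Jul 10 17:12:14 link-13 kernel:
sp=e00000003f807b20 bsp=e00000003f8016d0
Jul 10 17:12:14 link-13 kernel:  [<a0000002005e9400>] write_callback+0x80/0x2c0
[dm_mirror]
Jul 10 17:12:14 link-13 kernel:  [<a0000002005e96d0>]
write_callback_good_log+0x30/0x60 [dm_mirror]
"""

for func, module in parse_frames(sample):
    print(f"{func} [{module}]")
```

Applied to the full trace, this shows the path from the SCSI completion softirq through dm_mod into write_callback in dm_mirror, which was inside printk when the timer interrupt reported the soft lockup.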





Version-Release number of selected component (if applicable):

lvm2-2.02.16-3.el5

How reproducible:

Not sure.  Haven't tried yet.


Comment 1 Dean Jansa 2007-07-10 22:28:52 UTC
Kernel: 2.6.18-8.1.8.el5


Comment 2 Luming Yu 2007-07-25 01:30:35 UTC
Can this be reproduced with upstream?

Comment 3 Prarit Bhargava 2007-09-04 15:00:05 UTC
Dean, is this still happening?

P.

Comment 4 Dave Wysochanski 2008-04-01 16:43:15 UTC
Please reopen this issue if you see it again.

Moving to WORKSFORME based on Corey's email below:

I think the fact QA hasn't seen that issue in over 8 months is reason enough 
though to close it 'WORKSFORME' if you're trying to get it off the bz list. 
We can always reopen it if we ever see it again.

-Corey