Bug 705063

Summary: kernel panic - not syncing: Fatal exception in interrupt
Product: Red Hat Enterprise Linux 6 Reporter: Moran Goldboim <mgoldboi>
Component: kernel Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED WORKSFORME QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent Docs Contact:
Priority: high    
Version: 6.1 CC: ambadasbhagat, mflitter
Target Milestone: rc   
Target Release: 6.2   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-07-11 13:38:01 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description    Flags
screenshot     none

Description Moran Goldboim 2011-05-16 14:27:16 UTC
Description of problem:

During operation of vdsm on an SPM host (doing lots of LVM actions), I got the attached kernel panic.


Version-Release number of selected component (if applicable):
2.6.32-131-0.1.el6

How reproducible:
Happened several times; I don't have a specific reproducer as of now.

Comment 2 Prarit Bhargava 2011-05-20 14:23:33 UTC
No panic attached....

P.

Comment 3 Moran Goldboim 2011-05-22 06:28:30 UTC
Created attachment 500239 [details]
screenshot

It slipped away somehow...

Comment 4 Moran Goldboim 2011-07-18 15:33:22 UTC
Happened again on 2.6.32-131.2.1.el6.x86_64:

[<ffffffff81247812>] ? __generic_unplug_device+0x32/0x40
 [<ffffffff814db33d>] wait_for_completion+0x1d/0x20
 [<ffffffff8124e45c>] blk_execute_rq+0x8c/0xf0
 [<ffffffff812492a0>] ? blk_rq_bio_prep+0x30/0xc0
 [<ffffffff8124dfd6>] ? blk_rq_map_kern+0xd6/0x150
 [<ffffffff813572fc>] scsi_execute+0xfc/0x160
 [<ffffffff81357556>] scsi_execute_req+0xb6/0x190
 [<ffffffff81358c6c>] scsi_probe_and_add_lun+0x2dc/0xed0
 [<ffffffff81359dbc>] __scsi_scan_target+0x55c/0x750
 [<ffffffff8135a6e0>] scsi_scan_target+0xd0/0xe0
 [<ffffffffa02a95bd>] fc_scsi_scan_rport+0xbd/0xc0 [scsi_transport_fc]
 [<ffffffffa02a9500>] ? fc_scsi_scan_rport+0x0/0xc0 [scsi_transport_fc]
 [<ffffffff810887d0>] worker_thread+0x170/0x2a0
 [<ffffffff8108e100>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81088660>] ? worker_thread+0x0/0x2a0
 [<ffffffff8108dd96>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff8108dd00>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
INFO: task async/2:1400 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
async/2       D 0000000000000000     0  1400      2 0x00000000
 ffff880002f25a50 0000000000000046 ffff880002f259d0 ffff880005f8c940
 0000000000011220 ffff880002f25a10 ffff880002f25a28 ffff880005f8c970
 ffff88000683a638 ffff880002f25fd8 000000000000f598 ffff88000683a638
Call Trace:
 [<ffffffff814db5a5>] schedule_timeout+0x215/0x2e0
 [<ffffffffa02ff295>] ? qla2xxx_queuecommand+0x245/0x290 [qla2xxx]
 [<ffffffff814db223>] wait_for_common+0x123/0x180
 [<ffffffff8105dc20>] ? default_wake_function+0x0/0x20
 [<ffffffff81247812>] ? __generic_unplug_device+0x32/0x40
 [<ffffffff814db33d>] wait_for_completion+0x1d/0x20
 [<ffffffff8124e45c>] blk_execute_rq+0x8c/0xf0
 [<ffffffff8124a955>] ? blk_get_request+0x75/0xa0
 [<ffffffff813572fc>] scsi_execute+0xfc/0x160
 [<ffffffff81357556>] scsi_execute_req+0xb6/0x190
 [<ffffffffa029cf56>] sd_revalidate_disk+0x156/0x18d0 [sd_mod]
 [<ffffffff8107a11b>] ? try_to_del_timer_sync+0x7b/0xe0
 [<ffffffff814db52a>] ? schedule_timeout+0x19a/0x2e0
 [<ffffffffa029e7ae>] sd_probe_async+0xde/0x210 [sd_mod]
 [<ffffffff810962a2>] async_thread+0x102/0x250
 [<ffffffff8105dc20>] ? default_wake_function+0x0/0x20
 [<ffffffff810961a0>] ? async_thread+0x0/0x250
 [<ffffffff8108dd96>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff8108dd00>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
qla2xxx 0000:06:00.1: scsi(4:2:2): DEVICE RESET FAILED: HBA not online.
qla2xxx 0000:06:00.1: scsi(4:2:3): DEVICE RESET ISSUED.
[-- MARK -- Mon Jul 18 10:25:00 2011]
qla2xxx 0000:06:00.1: scsi(4:2:3): DEVICE RESET FAILED: HBA not online.
qla2xxx 0000:06:00.1: scsi(4:2:4): DEVICE RESET ISSUED.
[-- MARK -- Mon Jul 18 10:30:00 2011]
qla2xxx 0000:06:00.1: scsi(4:2:4): DEVICE RESET FAILED: HBA not online.
qla2xxx 0000:06:00.1: scsi(4:2:4): TARGET RESET ISSUED.
[-- MARK -- Mon Jul 18 10:35:00 2011]
qla2xxx 0000:06:00.1: scsi(4:2:4): TARGET RESET FAILED: HBA not online.
qla2xxx 0000:06:00.1: scsi(4:2:3): BUS RESET ISSUED.
[-- mgoldboi.redhat.com attached -- Mon Jul 18 10:36:35 2011]

qla2xxx 0000:06:00.1: qla2xxx_eh_bus_reset: reset failed
qla2xxx 0000:06:00.1: scsi(4:2:3): ADAPTER RESET ISSUED.
qla2xxx 0000:06:00.1: qla2xxx_eh_host_reset: reset failed
sd 4:0:2:3: Device offlined - not ready after error recovery
sd 4:0:2:2: Device offlined - not ready after error recovery
scsi 4:0:2:4: Device offlined - not ready after error recovery
sd 4:0:2:2: [sdj] Unhandled error code
sd 4:0:2:2: [sdj] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
sd 4:0:2:2: [sdj] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
end_request: I/O error, dev sdj, sector 8
Buffer I/O error on device sdj, logical block 1
sd 4:0:2:3: rejecting I/O to offline device
sd 4:0:2:2: rejecting I/O to offline device
sd 4:0:2:0: Unexpected response from lun 4 while scanning, scan aborted
sd 4:0:2:3: rejecting I/O to offline device
Dev sdj: unable to read RDB block 8
 unable to read partition table
sd 4:0:2:3: rejecting I/O to offline device
sd 4:0:2:2: [sdj] Attached SCSI disk
sd 4:0:2:3: [sdk] READ CAPACITY failed
sd 4:0:2:3: [sdk] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
sd 4:0:2:3: [sdk] Sense not available.
sd 4:0:2:3: rejecting I/O to offline device
sd 4:0:2:3: [sdk] Write Protect is off
sd 4:0:2:3: rejecting I/O to offline device
sd 4:0:2:3: [sdk] Asking for cache data failed
sd 4:0:2:3: [sdk] Assuming drive cache: write through
------------[ cut here ]------------
kernel BUG at fs/sysfs/group.c:65!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:03:00.0/0000:04:00.0/0000:05:00.0/host2/port-2:0/end_device-2:0/target2:0:0/2:0:0:0/block/sda/sda2/dev
CPU 0 
Modules linked in: qla2xxx vhost_net bridge mpt2sas scsi_transport_fc sd_mod sr_mod dm_round_robin ext4 ixgbe i7core_edac iTCO_wdt ghes power_meter kvm_intel macvtap stp dm_multipath raid_class scsi_transport_sas scsi_tgt ata_piix pata_acpi ata_generic crc_t10dif cdrom scsi_dh_emc jbd2 mbcache bnx2 sg mdio dca edac_core iTCO_vendor_support hed serio_raw microcode dcdbas hwmon kvm tun macvlan ipv6 llc sunrpc dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod

Modules linked in: qla2xxx vhost_net bridge mpt2sas scsi_transport_fc sd_mod sr_mod dm_round_robin ext4 ixgbe i7core_edac iTCO_wdt ghes power_meter kvm_intel macvtap stp dm_multipath raid_class scsi_transport_sas scsi_tgt ata_piix pata_acpi ata_generic crc_t10dif cdrom scsi_dh_emc jbd2 mbcache bnx2 sg mdio dca edac_core iTCO_vendor_support hed serio_raw microcode dcdbas hwmon kvm tun macvlan ipv6 llc sunrpc dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod
Pid: 1400, comm: async/2 Not tainted 2.6.32-131.2.1.el6.x86_64 #1 PowerEdge R810
RIP: 0010:[<ffffffff811e8487>]  [<ffffffff811e8487>] internal_create_group+0xf7/0x1a0
RSP: 0018:ffff880002f25d40  EFLAGS: 00010246
RAX: 00000000fffffff2 RBX: ffff8800055bcaa0 RCX: ffff8800084980c0
RDX: ffffffff81a5b480 RSI: 0000000000000000 RDI: ffff8800055d5870
RBP: ffff880002f25d90 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000002 R11: ffff8800067af378 R12: ffff8800055bcaa0
R13: ffff8800055d5870 R14: ffffffff81a5b480 R15: ffff8800091faa80
FS:  0000000000000000(0000) GS:ffff880004000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000004b47a5 CR3: 000000000543d000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process async/2 (pid: 1400, threadinfo ffff880002f24000, task ffff88000683a080)
Stack:
 ffff880002f25da0 0000000081336f91 0000003036313a38 ffff880002f25db0
<0> ffff880002f25d70 ffff8800055bcaa0 ffff8800055bcaa0 ffff8800055d5860
<0> ffff8800055d5800 ffff8800091faa80 ffff880002f25da0 ffffffff811e8563
Call Trace:
 [<ffffffff811e8563>] sysfs_create_group+0x13/0x20
 [<ffffffff810f7724>] blk_trace_init_sysfs+0x14/0x20
 [<ffffffff8124bb90>] blk_register_queue+0x40/0x100
 [<ffffffff8125119e>] add_disk+0xae/0x160
 [<ffffffffa029e80b>] sd_probe_async+0x13b/0x210 [sd_mod]
 [<ffffffff810962a2>] async_thread+0x102/0x250
 [<ffffffff8105dc20>] ? default_wake_function+0x0/0x20
 [<ffffffff810961a0>] ? async_thread+0x0/0x250
 [<ffffffff8108dd96>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff8108dd00>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Code: 8b 04 24 48 85 c0 74 27 41 83 c7 01 8b 55 bc 85 d2 74 b1 48 8b 30 48 89 df e8 76 be ff ff eb a4 48 83 7f 30 00 0f 85 49 ff ff ff <0f> 0b eb fe 48 8b 5d c8 31 d2 48 85 db 74 18 3e ff 0b 0f 94 c0 
RIP  [<ffffffff811e8487>] internal_create_group+0xf7/0x1a0
 RSP <ffff880002f25d40>
---[ end trace 986a1ae440e18ccb ]---
Kernel panic - not syncing: Fatal exception
Pid: 1400, comm: async/2 Tainted: G      D    ----------------   2.6.32-131.2.1.el6.x86_64 #1
Call Trace:
 [<ffffffff814da16e>] ? panic+0x78/0x143
 [<ffffffff814de1b4>] ? oops_end+0xe4/0x100
 [<ffffffff8100f2eb>] ? die+0x5b/0x90
 [<ffffffff814dda84>] ? do_trap+0xc4/0x160
 [<ffffffff8100ceb5>] ? do_invalid_op+0x95/0xb0
 [<ffffffff811e8487>] ? internal_create_group+0xf7/0x1a0
 [<ffffffff8118d5da>] ? ilookup5+0x4a/0x60
 [<ffffffff8100bf5b>] ? invalid_op+0x1b/0x20
 [<ffffffff811e8487>] ? internal_create_group+0xf7/0x1a0
 [<ffffffff811e8563>] ? sysfs_create_group+0x13/0x20
 [<ffffffff810f7724>] ? blk_trace_init_sysfs+0x14/0x20
 [<ffffffff8124bb90>] ? blk_register_queue+0x40/0x100
 [<ffffffff8125119e>] ? add_disk+0xae/0x160
 [<ffffffffa029e80b>] ? sd_probe_async+0x13b/0x210 [sd_mod]
 [<ffffffff810962a2>] ? async_thread+0x102/0x250
 [<ffffffff8105dc20>] ? default_wake_function+0x0/0x20
 [<ffffffff810961a0>] ? async_thread+0x0/0x250
 [<ffffffff8108dd96>] ? kthread+0x96/0xa0
 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
 [<ffffffff8108dd00>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
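
Note for anyone triaging traces like the ones above: the frames follow the usual `[<address>] symbol+0xoffset/0xsize [module]` layout, and a leading `?` marks an address that was found on the stack but not confirmed by the unwinder. A minimal sketch for pulling the frames out of a pasted log follows. The names `FRAME_RE` and `parse_frames` are hypothetical helpers, not part of any kernel or Red Hat tool, and the sample is a few lines copied from the oops in this comment.

```python
import re

# Matches frames such as:
#  [<ffffffffa029e80b>] ? sd_probe_async+0x13b/0x210 [sd_mod]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return one dict per stack frame found in log_text (hypothetical helper)."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue  # not a stack frame line (register dump, driver message, ...)
        frames.append({
            "addr": int(m.group("addr"), 16),
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "size": int(m.group("size"), 16),
            "module": m.group("module"),  # None for built-in code
            "reliable": m.group("unreliable") is None,
        })
    return frames


# Sample lines copied from the oops call trace in this comment.
sample = """\
 [<ffffffff811e8563>] sysfs_create_group+0x13/0x20
 [<ffffffff810f7724>] blk_trace_init_sysfs+0x14/0x20
 [<ffffffff8124bb90>] blk_register_queue+0x40/0x100
 [<ffffffff8125119e>] add_disk+0xae/0x160
 [<ffffffffa029e80b>] sd_probe_async+0x13b/0x210 [sd_mod]
 [<ffffffff8105dc20>] ? default_wake_function+0x0/0x20
"""

frames = parse_frames(sample)
for frame in frames:
    if frame["reliable"]:
        origin = frame["module"] or "vmlinux"
        print(f'{frame["symbol"]}+{frame["offset"]:#x} ({origin})')
```

Filtering on `reliable` leaves only the confirmed call chain (`sd_probe_async` -> `add_disk` -> `blk_register_queue` -> `blk_trace_init_sysfs` -> `sysfs_create_group`), which is usually the part worth comparing between occurrences.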

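Note on the failed read in the log above: the line `CDB: Read(10): 28 00 00 00 00 08 00 00 08 00` can be decoded by hand, since a READ(10) command carries a 4-byte big-endian logical block address in bytes 2-5 and a 2-byte transfer length in bytes 7-8. The sketch below does that decoding. `decode_read10` is a hypothetical helper written for this note, and the 512-byte logical block size used in the comments is an assumption, not something the log states.

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (lba, blocks). Hypothetical helper for illustration only.
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10:
        raise ValueError("READ(10) CDB must be exactly 10 bytes")
    if cdb[0] != 0x28:
        raise ValueError("opcode is not READ(10) (0x28)")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


# CDB copied from the "[sdj] CDB: Read(10)" line in the log.
lba, blocks = decode_read10("28 00 00 00 00 08 00 00 08 00")
print(f"READ(10): start LBA {lba}, {blocks} blocks")

# Assumption: 512-byte logical blocks, so this is a 4 KiB read.
ASSUMED_BLOCK_SIZE = 512
print(f"byte offset {lba * ASSUMED_BLOCK_SIZE}, length {blocks * ASSUMED_BLOCK_SIZE}")
```

Under that block-size assumption the command is a 4 KiB read starting at sector 8, which is consistent with the subsequent `end_request: I/O error, dev sdj, sector 8` message.
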
Comment 6 ambadas 2011-10-04 08:15:59 UTC
kernel panic - not syncing: Attempt to kill init!

Comment 7 RHEL Program Management 2011-10-07 15:35:18 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 9 Siddharth Nagar 2012-07-11 13:38:01 UTC
Closing as I believe this is no longer an issue. Please re-open if necessary.