Bug 826425

Summary: [NetApp 5.7 z- KVM Bug] Host Loss seen on Qlogic FC SAN Booted LUN.
Product: Red Hat Enterprise Linux 5 Reporter: Ranjan <ranjan.kumar>
Component: device-mapper-multipath Assignee: Ben Marzinski <bmarzins>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Storage QE <storage-qe>
Severity: high Docs Contact:
Priority: unspecified    
Version: 5.7 CC: agk, bdonahue, bmarzins, bmr, dwysocha, heinzm, jwest, msnitzer, prajnoha, prockai, xdl-redhat-bugzilla, zkabelac
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-02-06 19:26:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ranjan 2012-05-30 07:37:57 UTC
Description of problem:

A SAN-booted RHEL 5.7z-KVM host, with its root LUN mapped to a NetApp storage array over FC, hangs during I/O when fabric faults are introduced.
When the host hangs, no network service (ssh/telnet/rsh) is operational. Syslog shows multiple kernel messages reporting CPU lockups (soft lockup - CPU#10 stuck for X seconds) in different processes such as qemu-kvm, multipathd, and kjournald.
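
To see which processes are hit most often, the soft lockup lines in syslog can be tallied per process. The following is only a minimal sketch: the message shape in the comment is an assumption about how the kernel formats these lines (no log excerpt is attached to this report), and the name tally_soft_lockups is hypothetical.

```python
import re
from collections import Counter

# Assumed message shape (not copied from this report's logs):
#   kernel: BUG: soft lockup - CPU#10 stuck for 10s! [qemu-kvm:1234]
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!?\s*"
    r"\[(?P<proc>[^\]:]+):(?P<pid>\d+)\]"
)


def tally_soft_lockups(lines):
    """Count soft lockup messages per process name.

    `lines` is any iterable of syslog lines, for example an open
    file object for /var/log/messages. Lines that do not match the
    assumed format are ignored.
    """
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            counts[match.group("proc")] += 1
    return counts
```

Passing an open handle on the saved messages file to tally_soft_lockups and printing the result of most_common() gives a quick per-process summary to attach alongside the raw logs.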

Version-Release number of selected component (if applicable):
Host Kernel : 2.6.18-274.18.1.el5
Multipath (device-mapper-multipath-0.4.7-46.el5)
LVM2 (lvm2-2.02.84-6.el5)
HBA : Qlogic 2562
Driver : 8.03.07.03.05.07-k 
Firmware : 5.03.16 (95)

How reproducible:
Frequently.

Steps to Reproduce:
1. Map 40 LUNs (with 4 FC paths each, i.e., 160 SCSI devices) from the controllers
and configure multipath devices on the host.
2. Create 10 LVs on the dm-multipath devices.
3. Create 4 VMs on 4 of the LVs, and map the rest of the LVs to the VMs.
4. Start I/O to the LVs and introduce fabric faults repeatedly.

  
Actual results:
Hypervisor hangs.

Expected results:
Should not hang.

Additional info:
It happens only on hosts with Qlogic HBAs.

Comment 1 loberman 2012-09-16 03:40:19 UTC
Hello,

Can you be more specific about the comment "introduce fabric faults"?
Also, how long does the hypervisor hang, and does it recover?
Can you share the message logs?

More data is needed here, please.

Comment 2 Zdenek Kabelac 2013-05-14 19:32:43 UTC
Is it possible to obtain kernel traces, so it can be seen where the processes are getting stuck?

Comment 3 RHEL Program Management 2014-01-29 10:36:50 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 4 Red Hat Bugzilla 2023-09-14 01:29:29 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.