Bug 471879

Summary: [NetApp 5.3 bug] SAN boot LUN kernel panics on 5.3 snap 2
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Status: CLOSED DUPLICATE
Severity: urgent
Priority: high
Target Milestone: rc
Keywords: OtherQA, Regression
Reporter: Naveen Reddy <naveenr>
Assignee: Ben Marzinski <bmarzins>
QA Contact: Martin Jenner <mjenner>
CC: agk, andriusb, coughlan, ddumas, hdegoede, mbroz, naveenr, xdl-redhat-bugzilla
Doc Type: Bug Fix
Last Closed: 2008-12-02 20:01:29 UTC
Bug Blocks: 373081
Attachments (Description, Flags):
Attaching the image(snapshot) taken during kernel panic. Flags: none
Attaching the serial console output during the panic. Flags: none

Description Naveen Reddy 2008-11-17 13:29:09 UTC
Description of problem: The kernel panics after installation of RHEL5.3 GASnapshot2 on a SANboot LUN.


Version-Release number of selected component (if applicable):
OS - RHEL5.3 GASnapshot2


How reproducible:
Always


Steps to Reproduce:
1. Install RHEL5.3 GASnapshot2 on a SANboot LUN (multipath device).
2. After installing, it will ask for reboot. Reboot the host.
3. The kernel panics.
  
Actual results:
Kernel panics after the first reboot after installation.


Expected results:
The system should boot normally without any panic.

Additional info:
This issue was not seen in GASnapshot1.
Attaching the image(snapshot) taken during kernel panic.

Comment 1 Naveen Reddy 2008-11-17 13:33:04 UTC
Created attachment 323763
Attaching the image(snapshot) taken during kernel panic.

Comment 2 Tom Coughlan 2008-11-17 14:54:00 UTC
Naveen,

It would be helpful if you can get the serial console output, showing all the
boot messages prior to the crash. Thanks.

Tom

Comment 3 Tom Coughlan 2008-11-17 15:14:38 UTC
This worked in snap 1 and fails in snap 2. 

Any ideas from the Anaconda team?

Comment 4 Hans de Goede 2008-11-17 15:54:21 UTC
Naveen,

This looks like software raid (mirroring) not being recognized as such, have you configured software raid during installation?

Can you please tell us what kind of partitioning scheme you are using (software raid or not, lvm or not, etc.) ?

Comment 5 Naveen Reddy 2008-11-18 05:19:39 UTC
Created attachment 323847
Attaching the serial console output during the panic

Comment 6 Naveen Reddy 2008-11-18 05:41:32 UTC
(In reply to comment #4)
> Naveen,
> This looks like software raid (mirroring) not being recognized as such, have
> you configured software raid during installation?
> Can you please tell us what kind of partitioning scheme you are using (software
> raid or not, lvm or not, etc.) ?

I am using the default installation scheme, which includes LVM.
Software RAID is not used.

Comment 8 Andrius Benokraitis 2008-11-18 15:28:55 UTC
Naveen - I've been informed by the anaconda team to try the Snap 3 bits when they are released and report your findings here if you could... a lot of changes are going into Snap 3.

Comment 9 Naveen Reddy 2008-11-18 15:43:04 UTC
Hi Andrius,

Ok. Then I will try with the snapshot3 and will post the results.

Comment 10 Tom Coughlan 2008-11-18 16:20:56 UTC
(In reply to comment #5)
> Created an attachment (id=323847) [details]
> Attaching the serial console output during the panic

Humm, unfortunately not much to go on there.

It looks like you have done an install to a multipath lpfc Fibre Channel disk. 

The source of the problem is that the root volume is not found:

device-mapper: table: 253:2: linear: dm-linear: Device lookup failed

The other problem: "Found duplicate PV" is because there are multiple paths to the PV. This is discussed here:

http://kbase.redhat.com/faq/FAQ_96_11252.shtm

Were you seeing the duplicate PV message previously (5.3 snapshot 1, or 5.2)? 

Will the system boot if you disconnect all but one path to the Fibre Channel boot/root disk?

Tom
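
For anyone triaging a similar serial console capture, the two messages called out above can be pulled out of a log mechanically. The following is only a minimal sketch, not a tool used in this bug: the function name scan_console_log is hypothetical, the device-mapper pattern is modeled on the single error line quoted above, and the duplicate PV line in the sample is a placeholder rather than real LVM output.

```python
import re

# Modeled on the error line quoted in this comment. The major:minor pair
# identifies the dm device whose table could not be loaded.
DM_LOOKUP_FAILED = re.compile(
    r"device-mapper: table: (\d+):(\d+): (\w+): .*Device lookup failed"
)
# Only the leading text of the LVM warning is matched; the rest of the
# message is not assumed here.
DUPLICATE_PV = "Found duplicate PV"


def scan_console_log(lines):
    """Return (failed_devices, duplicate_pv_count) for an iterable of log lines."""
    failed = []
    duplicates = 0
    for line in lines:
        match = DM_LOOKUP_FAILED.search(line)
        if match:
            major, minor, target = match.groups()
            failed.append((int(major), int(minor), target))
        if DUPLICATE_PV in line:
            duplicates += 1
    return failed, duplicates


if __name__ == "__main__":
    sample = [
        "device-mapper: table: 253:2: linear: dm-linear: Device lookup failed",
        "Found duplicate PV (rest of message omitted in this sketch)",
    ]
    print(scan_console_log(sample))
```

A failed lookup on the dm device that backs the root volume, together with duplicate PV warnings, matches the pattern described in this comment: the PV is visible through several paths while the device the root mapping depends on is missing.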

Comment 11 Naveen Reddy 2008-11-20 07:17:08 UTC
This problem is still seen on Snapshot3.

Comment 12 Naveen Reddy 2008-11-21 05:32:17 UTC
(In reply to comment #10)
> (In reply to comment #5)
> > Created an attachment (id=323847) [details]
> > Attaching the serial console output during the panic
> Humm, unfortunately not much to go on there.
> It looks like you have done an install to a multipath lpfc Fibre Channel disk. 
> The source of the problem is that the root volume is not found:
> device-mapper: table: 253:2: linear: dm-linear: Device lookup failed
> The other problem: "Found duplicate PV" is because there are multiple paths to
> the PV. This is discussed here:
> http://kbase.redhat.com/faq/FAQ_96_11252.shtm
> Were you seeing the duplicate PV message previously (5.3 snapshot 1, or 5.2)? 
> Will the system boot if you disconnect all but one path to the Fibre Channel
> boot/root disk?
> Tom

I did not see these duplicate PV messages previously on 5.2. 
I installed the OS on a SANboot LUN (with multiple paths to it) and then disconnected all paths but one. The kernel still panicked.

Comment 13 Ben Marzinski 2008-11-21 20:11:52 UTC
Did you try taking multipath out of the picture, and seeing if this happens when you try to set up a SANboot system with the root LVM directly on top of the scsi device (instead of on top of the multipath device)? You said that you disconnected all the paths but one, but did you reinstall after that without multipath?

Comment 14 Naveen Reddy 2008-11-22 04:44:08 UTC
The installation on top of scsi device is successful. No panic in this scenario.

Comment 15 Ben Marzinski 2008-12-02 08:09:28 UTC
Well, I'm not exactly sure what is wrong yet, but I know that the bug is in nash.

Comment 16 Ben Marzinski 2008-12-02 19:57:43 UTC
This issue appears to be the same as bz #471879, which is fixed in nash-5.1.19.6-41

Comment 17 Ben Marzinski 2008-12-02 20:01:00 UTC
oops. I meant to say that this is the same as bz #471689.

Comment 18 Ben Marzinski 2008-12-02 20:01:29 UTC

*** This bug has been marked as a duplicate of bug 471689 ***

Comment 19 Denise Dumas 2008-12-02 20:14:24 UTC
And nash-5.1.19.6-41 will be included in Snapshot 5.  Naveen, thanks for your patience with this.

Comment 20 Naveen Reddy 2008-12-09 06:03:59 UTC
This issue is fixed in Snapshot5.