Bug 471879 - [NetApp 5.3 bug] SAN boot LUN kernel panics on 5.3 snap 2
Status: CLOSED DUPLICATE of bug 471689
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Ben Marzinski
QA Contact: Martin Jenner
Keywords: OtherQA, Regression
Depends On:
Blocks: 373081
Reported: 2008-11-17 08:29 EST by Naveen Reddy
Modified: 2009-06-20 04:09 EDT
Doc Type: Bug Fix
Last Closed: 2008-12-02 15:01:29 EST

Attachments
Attaching the image(snapshot) taken during kernel panic. (135.93 KB, image/png)
2008-11-17 08:33 EST, Naveen Reddy
Attaching the serial console output during the panic (1001 bytes, text/plain)
2008-11-18 00:19 EST, Naveen Reddy

Description Naveen Reddy 2008-11-17 08:29:09 EST
Description of problem: The kernel panics on the first boot after installing RHEL 5.3 GA Snapshot 2 on a SAN boot LUN.


Version-Release number of selected component (if applicable):
OS - RHEL5.3 GASnapshot2


How reproducible:
Always


Steps to Reproduce:
1. Install RHEL 5.3 GA Snapshot 2 on a SAN boot LUN (multipath device).
2. When the installer prompts for a reboot, reboot the host.
3. The kernel panics.
  
Actual results:
Kernel panics after the first reboot after installation.


Expected results:
The system should boot normally without any panic.

Additional info:
This issue was not seen in GA Snapshot 1.
Attaching the image (snapshot) taken during the kernel panic.
Comment 1 Naveen Reddy 2008-11-17 08:33:04 EST
Created attachment 323763
Attaching the image(snapshot) taken during kernel panic.
Comment 2 Tom Coughlan 2008-11-17 09:54:00 EST
Naveen,

It would be helpful if you can get the serial console output, showing all the
boot messages prior to the crash. Thanks.

Tom
Comment 3 Tom Coughlan 2008-11-17 10:14:38 EST
This worked in snap 1 and fails in snap 2. 

Any ideas from the Anaconda team?
Comment 4 Hans de Goede 2008-11-17 10:54:21 EST
Naveen,

This looks like software RAID (mirroring) not being recognized as such. Did you configure software RAID during installation?

Can you please tell us what kind of partitioning scheme you are using (software RAID or not, LVM or not, etc.)?
Comment 5 Naveen Reddy 2008-11-18 00:19:39 EST
Created attachment 323847
Attaching the serial console output during the panic
Comment 6 Naveen Reddy 2008-11-18 00:41:32 EST
(In reply to comment #4)
> Naveen,
> This looks like software raid (mirroring) not being recognized as such, have
> you configured software raid during installation?
> Can you please tell us what kind of partitioning scheme you are using (software
> raid or not, lvm or not, etc.) ?

I am going with the default installation scheme, which includes LVM.
Software RAID is not used.
Comment 8 Andrius Benokraitis 2008-11-18 10:28:55 EST
Naveen - The anaconda team has asked that you try the Snap 3 bits when they are released and report your findings here, if you could. A lot of changes are going into Snap 3.
Comment 9 Naveen Reddy 2008-11-18 10:43:04 EST
Hi Andrius,

OK. I will try Snapshot 3 and post the results.
Comment 10 Tom Coughlan 2008-11-18 11:20:56 EST
(In reply to comment #5)
> Created an attachment (id=323847)
> Attaching the serial console output during the panic

Humm, unfortunately not much to go on there.

It looks like you have done an install to a multipath lpfc Fibre Channel disk. 

The source of the problem is that the root volume is not found:

device-mapper: table: 253:2: linear: dm-linear: Device lookup failed

The other problem: "Found duplicate PV" is because there are multiple paths to the PV. This is discussed here:

http://kbase.redhat.com/faq/FAQ_96_11252.shtm

Were you seeing the duplicate PV message previously (5.3 snapshot 1, or 5.2)? 

Will the system boot if you disconnect all but one path to the Fibre Channel boot/root disk?

Tom
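
The two symptoms named above can be picked out of a captured serial console log mechanically. What follows is only a minimal sketch, assuming the log is available as plain text; the function name `scan_console_log` and the symptom labels are hypothetical, and the patterns simply match the message fragments quoted in this comment, not the full output of device-mapper or LVM.

```python
import re

# Patterns built from the message fragments quoted in this report.
# The dictionary keys are illustrative labels, not official names.
PATTERNS = {
    "root_lookup_failed": re.compile(
        r"device-mapper: table: (\d+:\d+): linear: dm-linear: Device lookup failed"
    ),
    "duplicate_pv": re.compile(r"Found duplicate PV"),
}


def scan_console_log(text):
    """Return a dict mapping each symptom label to the list of matching lines."""
    hits = {name: [] for name in PATTERNS}
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.strip())
    return hits


if __name__ == "__main__":
    # First line is quoted from this report; the second is a shortened,
    # hypothetical stand-in for the longer LVM warning.
    sample = "\n".join([
        "device-mapper: table: 253:2: linear: dm-linear: Device lookup failed",
        "Found duplicate PV",
    ])
    for name, lines in scan_console_log(sample).items():
        print(name, len(lines))
```

A log showing the lookup failure on the root device points at the root volume not being assembled, which is the actual cause of the panic; the duplicate PV warning on its own only indicates that the same PV is visible over more than one path.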
Comment 11 Naveen Reddy 2008-11-20 02:17:08 EST
This problem is still seen on Snapshot3.
Comment 12 Naveen Reddy 2008-11-21 00:32:17 EST
(In reply to comment #10)
> (In reply to comment #5)
> > Created an attachment (id=323847)
> > Attaching the serial console output during the panic
> Humm, unfortunately not much to go on there.
> It looks like you have done an install to a multipath lpfc Fibre Channel disk. 
> The source of the problem is that the root volume is not found:
> device-mapper: table: 253:2: linear: dm-linear: Device lookup failed
> The other problem: "Found duplicate PV" is because there are multiple paths to
> the PV. This is discussed here:
> http://kbase.redhat.com/faq/FAQ_96_11252.shtm
> Were you seeing the duplicate PV message previously (5.3 snapshot 1, or 5.2)? 
> Will the system boot if you disconnect all but one path to the Fibre Channel
> boot/root disk?
> Tom

I did not see these duplicate PV messages previously on 5.2.
I installed the OS on the SAN boot LUN (with multiple paths to it) and then disconnected all paths but one. The kernel still panicked.
Comment 13 Ben Marzinski 2008-11-21 15:11:52 EST
Did you try taking multipath out of the picture, and seeing if this happens when you set up a SAN boot system with the root LVM directly on top of the SCSI device (instead of on top of the multipath device)? You said that you disconnected all the paths but one, but did you reinstall after that without multipath?
Comment 14 Naveen Reddy 2008-11-21 23:44:08 EST
Installation directly on top of the SCSI device is successful. There is no panic in this scenario.
Comment 15 Ben Marzinski 2008-12-02 03:09:28 EST
Well, I'm not exactly sure what is wrong yet, but I know that the bug is in nash.
Comment 16 Ben Marzinski 2008-12-02 14:57:43 EST
This issue appears to be the same as bz #471879, which is fixed in nash-5.1.19.6-41
Comment 17 Ben Marzinski 2008-12-02 15:01:00 EST
oops. I meant to say that this is the same as bz #471689.
Comment 18 Ben Marzinski 2008-12-02 15:01:29 EST

*** This bug has been marked as a duplicate of bug 471689 ***
Comment 19 Denise Dumas 2008-12-02 15:14:24 EST
And nash-5.1.19.6-41 will be included in Snapshot 5.  Naveen, thanks for your patience with this.
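
To check whether a given build already carries the fix, the installed nash version-release string can be compared with 5.1.19.6-41. The sketch below is a simplified illustration, not RPM's real comparison algorithm: it assumes both strings are purely numeric, dot-separated fields around a single hyphen, and the names `parse_version_release` and `has_fix` are hypothetical.

```python
# Version-release of the nash build named in this report as containing the fix.
FIXED_IN = "5.1.19.6-41"


def parse_version_release(value):
    """Split 'version-release' into two tuples of integers.

    Assumption: every dot-separated field is numeric. Strings with
    suffixes such as a dist tag are not handled by this sketch.
    """
    version, release = value.split("-", 1)
    return (
        tuple(int(part) for part in version.split(".")),
        tuple(int(part) for part in release.split(".")),
    )


def has_fix(installed):
    """Return True if the installed version-release is at or beyond FIXED_IN."""
    return parse_version_release(installed) >= parse_version_release(FIXED_IN)


if __name__ == "__main__":
    # The "-40" value is a hypothetical older build used only for illustration.
    for candidate in ("5.1.19.6-40", "5.1.19.6-41"):
        print(candidate, has_fix(candidate))
```

On a real system the version-release string would come from the package database; how it is obtained is left out here on purpose.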
Comment 20 Naveen Reddy 2008-12-09 01:03:59 EST
This issue is fixed in Snapshot 5.
