Bug 751351 - Kickstart --useexisting LVM Fails on HP CCISS Hardware on RHEL 5.7
Summary: Kickstart --useexisting LVM Fails on HP CCISS Hardware on RHEL 5.7
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: anaconda
Version: 5.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Martin Kolman
QA Contact: Release Test Team
URL:
Whiteboard:
Depends On:
Blocks: 921048 928844
Reported: 2011-11-04 13:37 UTC by Stephen Benjamin
Modified: 2013-10-01 00:30 UTC (History)

Fixed In Version: anaconda-11.1.2.263-1
Doc Type: Bug Fix
Doc Text:
Cause: The lvm tool on cciss devices returns pv paths delimited by ! but Anaconda uses / delimited paths. Consequence: Installation failed in some cases on cciss devices. Fix: Replace ! by / in lvm tool output. Result: Installation now works fine in the cases where it previously failed.
Clone Of:
Environment:
Last Closed: 2013-10-01 00:30:07 UTC
Target Upstream Version:


Attachments
Anaconda log from failed kickstart (39.86 KB, text/plain)
2011-11-04 13:37 UTC, Stephen Benjamin
Kickstart file (sensitive data in %post removed) (2.28 KB, text/plain)
2011-11-04 13:39 UTC, Stephen Benjamin
An instrumented updates image for getting more data about the bug (BZ 751351) (1.41 MB, application/octet-stream)
2013-05-16 15:52 UTC, Martin Kolman
anaconda.log with update image cciss_updates_v1.img (41.44 KB, text/plain)
2013-05-17 08:54 UTC, Michal Kovarik
instrumented debug image V2 (1.41 MB, application/octet-stream)
2013-05-27 20:17 UTC, Martin Kolman
instrumented debug image V3 (1.41 MB, application/octet-stream)
2013-05-27 20:18 UTC, Martin Kolman
anaconda.log V2 (59.19 KB, text/plain)
2013-05-28 13:14 UTC, Michal Kovarik
anaconda.log V3 (59.28 KB, text/plain)
2013-05-28 13:14 UTC, Michal Kovarik
Kickstart from atodorov modified to work with virtio disks (1.91 KB, text/plain)
2013-07-22 18:23 UTC, Martin Kolman
Kickstart from atodorov modified to work with virtio disks - reversed disk order in second pass (1.91 KB, text/plain)
2013-07-22 20:04 UTC, Martin Kolman
LVM layout after second pass with reversed disk order (10.07 KB, image/png)
2013-07-22 20:06 UTC, Martin Kolman


Links
System: Red Hat Product Errata
ID: RHBA-2013:1354
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: anaconda bug fix update
Last Updated: 2013-09-30 21:12:35 UTC

Description Stephen Benjamin 2011-11-04 13:37:33 UTC
Created attachment 531767 [details]
Anaconda log from failed kickstart

Description of problem:

I have a kickstart that generates 1 of 2 different possible partitioning schemes -- one for a new machine, one for rebuilding a machine.  The goal is preservation of /appdata if it already exists.  

In the case of a new build, kickstart is successful.  The second, however, fails on RHEL 5.7 with this message from Anaconda:

"The following errors occured with your partitioning:

You have not defined a root partition (/), which is required for installation of Red Hat Enterprise Linux Server to continue."

However, this exact kickstart works fine on this same blade when using RHEL 5.3 media.


Version-Release number of selected component (if applicable):
RHEL 5.7

How reproducible:
Always

Steps to Reproduce:
1.  Kickstart a new machine with attached kickstart
2.  Kickstart succeeds (1st partitioning scheme)
3.  Kickstart again (2nd partitioning scheme -- lv_nfs exists)

Actual results:
Anaconda fails with the error message above

Expected results:
Anaconda uses the existing logical volumes, formatting all except for lv_nfs (/appdata).


Additional info:

This kickstart works with RHEL 5.7 in a KVM virtual machine using VirtIO drives.  I don't know if this is specific to HP Smart Arrays, but anaconda.log contains the lines below.  Is it possible something is not being escaped correctly in Anaconda?

Volume group vg_sys using non-existent partition /dev/cciss!c1d0p2
Volume group vg_cluster using non-existent partition /dev/cciss!c0d0p1

Additionally, since my %pre script is using lvs, I thought this might be related to BZ#652417; however, using lvm vgchange -an doesn't help.
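
The two log lines above show physical volume paths that use `!` where Anaconda expects `/`, and the Doc Text of this bug describes the eventual fix as replacing `!` by `/` in the lvm tool output. A minimal Python sketch of that idea follows. The function name `normalize_pv_path` is hypothetical, and this is an illustration only, not the patch that shipped in anaconda-11.1.2.263-1.

```python
def normalize_pv_path(path):
    """Return `path` with '!' separators replaced by '/'.

    Hypothetical helper: it only illustrates the idea described in the
    Doc Text and is not taken from the Anaconda source.
    """
    return path.replace("!", "/")


# The first two paths come from the anaconda.log lines quoted above;
# the third is an assumed non-cciss path that must pass through unchanged.
for reported in ("/dev/cciss!c1d0p2", "/dev/cciss!c0d0p1", "/dev/vda2"):
    print(reported, "->", normalize_pv_path(reported))
```

Paths without a `!` are returned unchanged, which would be consistent with the report that the same kickstart works on VirtIO drives.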

Comment 1 Stephen Benjamin 2011-11-04 13:39:10 UTC
Created attachment 531768 [details]
Kickstart file (sensitive data in %post removed)

Comment 2 Stephen Benjamin 2012-01-20 15:19:11 UTC
We ended up resolving this by handling /appdata in %pre entirely and removing this from anaconda's control. 

Not an ideal solution, and this is still probably a bug in anaconda in RHEL 5.7.

Comment 5 Martin Kolman 2013-05-16 15:52:11 UTC
Created attachment 748900 [details]
An instrumented updates image for getting more data about the bug (BZ 751351)

Comment 6 Martin Kolman 2013-05-16 15:58:27 UTC
Looks like I need some more information, so I've created an instrumented updates image (cciss_updates_v1.img) that logs quite a lot of additional data, which should help me fix the bug. 

Could you run it on the affected hardware and send me the anaconda log file? Thanks in advance!

Comment 7 Stephen Benjamin 2013-05-16 17:28:26 UTC
Hi Martin, I was a consultant on-site at the customer, and I haven't been there in a while.  I'll ask them whether they still have the old RHEL 5 build infrastructure in place to try this, but I think they've completely moved to RHEL 6 for the HP blades (and aren't even using this /appdata kickstart code anymore).

Comment 8 Michal Kovarik 2013-05-17 08:54:00 UTC
Created attachment 749258 [details]
anaconda.log with update image cciss_updates_v1.img

Hi Martin,
I am able to provide the requested information.

Comment 9 Martin Kolman 2013-05-22 12:56:37 UTC
From closer examination, it seems that the sequence of commands in the attached kickstart should not work at all, because:

First pass:
* creates 2 partitions on c0d0
 - /boot
 - pv.01
* creates 1 partition on c1d0
 - pv.02

Second pass
* destroys pv.02 with "part /boot --onpart=cciss/c1d0p1" (according to the docs[1], --onpart on its own still formats the partition with the default filesystem; to skip formatting, --onpart and --noformat must be used together)
- "pv.02" is the first & only partition on c1d0, so c1d0p1 in the part command can be referring only to it
* as a result, "logvol /appdata --noformat --name=lv_nfs --vgname=vg_cluster" will fail, as "vg_cluster" was on "pv.02", that just got overwritten by /boot

So unless some additional Smart Array related magic is at work, the kickstart should not work in the first place. If it ever did, that might have been due to some now-fixed regression or to partitioning changes made outside of Anaconda.

[1] https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Installation_Guide/s1-kickstart2-options.html
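
The point about --onpart and --noformat above can be checked mechanically. The following is a rough Python sketch that lists `part` commands which reuse an existing partition but omit --noformat and would therefore, per the comment above, be reformatted. The function name `parts_that_reformat` is hypothetical, and the parsing is deliberately simplistic (one command per line, no %include handling).

```python
import shlex


def parts_that_reformat(kickstart_lines):
    """Yield (mountpoint, partition) for every part/partition command
    that names an existing partition via --onpart or --usepart but
    does not also pass --noformat."""
    for line in kickstart_lines:
        tokens = shlex.split(line, comments=True)
        if len(tokens) < 2 or tokens[0] not in ("part", "partition"):
            continue
        target = None
        for i, tok in enumerate(tokens):
            for opt in ("--onpart", "--usepart"):
                if tok.startswith(opt + "="):
                    target = tok.split("=", 1)[1]
                elif tok == opt and i + 1 < len(tokens):
                    target = tokens[i + 1]
        if target is not None and "--noformat" not in tokens:
            yield tokens[1], target


# Two lines from the second pass discussed in this comment.
second_pass = [
    "part /boot --onpart=cciss/c1d0p1",
    "logvol /appdata --noformat --name=lv_nfs --vgname=vg_cluster",
]
for mountpoint, partition in parts_that_reformat(second_pass):
    print("%s on %s would be formatted" % (mountpoint, partition))
```

Only the `part /boot` line is reported here; the `logvol` line is ignored because the check looks at `part` commands alone.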

Comment 10 Stephen Benjamin 2013-05-23 07:39:52 UTC
First pass:
--driveorder=cciss/c1d0,cciss/c0d0

Second pass: 
--driveorder=cciss/c0d0,cciss/c1d0



"* as a result, "logvol /appdata --noformat --name=lv_nfs --vgname=vg_cluster" will fail, as "vg_cluster" was on "pv.02", that just got overwritten by /boot"

That's an incorrect understanding of what was happening.  For some reason in the second pass the drive order was reversed, pv.02 is on c0d0 in the second pass.  I don't really remember why this was done, maybe in a later step after initial provisioning we were doing something about order of the storage controllers.  I'll have to check with the customer.

But this snippet of kickstart worked for years up to and including RHEL 5.6 and stopped working in 5.7.

Comment 11 Martin Kolman 2013-05-23 16:28:14 UTC
(In reply to Stephen Benjamin from comment #10)
> First pass:
> --driveorder=cciss/c1d0,cciss/c0d0
> 
> Second pass: 
> --driveorder=cciss/c0d0,cciss/c1d0
> 
> 
> 
> "* as a result, "logvol /appdata --noformat --name=lv_nfs
> --vgname=vg_cluster" will fail, as "vg_cluster" was on "pv.02", that just
> got overwritten by /boot"
> 
> That's an incorrect understanding of what was happening.  For some reason in
> the second pass the drive order was reversed, pv.02 is on c0d0 in the second
> pass.  I don't really remember why this was done, maybe in a later step
> after initial provisioning we were doing something about order of the
> storage controllers.  I'll have to check with the customer.
Oh, so in the second pass c0d0 = c1d0 and c1d0 = c0d0? Yeah, that would explain your log.

BTW, even so, it could still fail because of partition ordering. Even though:
"part /boot --fstype=ext3 --size=100 --ondisk=cciss/c0d0 --asprimary"
is run before
"part pv.01 --size=135000 --grow --asprimary --ondisk=cciss/c0d0"
it does not mean Anaconda will create the partitions in this order. Citing the Anaconda kickstart wiki page[1]:
"Anaconda may create partitions in any particular order, so it is safer to use labels than absolute partition names."
While this note does not appear in the RHEL 5 kickstart documentation, bug 410011 comment 21 mentions that RHEL 5 behaves this way too.

So Anaconda could theoretically create pv.01 as the first partition and /boot as second, meaning that pv.01 would be obliterated by 
"part /boot --onpart=cciss/c1d0p1"
in the second pass, together with the vg_sys that hosts / & swap.

> 
> But this snippet of kickstart worked for years up to and including RHEL 5.6
> and stopped working in 5.7.
Thanks! This narrows down the period in which the bug was introduced quite a bit.

[1] http://fedoraproject.org/wiki/Anaconda/Kickstart#part_or_partition

Comment 12 Martin Kolman 2013-05-27 20:17:29 UTC
Created attachment 753674 [details]
instrumented debug image V2

Comment 13 Martin Kolman 2013-05-27 20:18:16 UTC
Created attachment 753675 [details]
instrumented debug image V3

Comment 14 Martin Kolman 2013-05-27 20:24:44 UTC
(In reply to Michal Kovarik from comment #8)
> Created attachment 749258 [details]
> anaconda.log with update image cciss_updates_v1.img
> 
> Hi Martin,
> I am able to provide the requested information.
Thanks a lot for the logs!

Based on them, I made further instrumented images (V2 and V3). Could you run them with the same kickstart and hardware configuration and post the logs?
Thanks in advance!

BTW, the difference between the two images:
V2 - basically V1 with more debugging messages
V3 - V2 with partition name parsing reverted to 5.6 behavior

Comment 15 Martin Kolman 2013-05-28 09:34:56 UTC
I'll just add that the V2 and V3 images are configured to wait indefinitely once Anaconda exits (for easier debugging), so if you are seeing that, it is not a bug.

Comment 16 Michal Kovarik 2013-05-28 13:14:08 UTC
Created attachment 753917 [details]
anaconda.log V2

Comment 17 Michal Kovarik 2013-05-28 13:14:38 UTC
Created attachment 753918 [details]
anaconda.log V3

Comment 20 Alexander Todorov 2013-06-10 10:58:55 UTC
Now testing with RHEL5.10-Server-20130530.0, anaconda-11.1.2.263-2


                 +-----------------+ Error +-----------------+                  
                 |                                           |                  
                 | Error mounting device vg_cluster/lv_nfs   |                  
                 | as /appdata: Invalid argument             |                  
                 |                                           |                  
                 | This most likely means this partition     |                  
                 | has not been formatted.                   |                  
                 |                                           |                  
                 | Press OK to reboot your system.           |                  
                 |                                           |                  
                 |                  +----+                   |                  
                 |                  | OK |                   |                  
                 |                  +----+                   |                  
                 |                                           |                  
                 |                                           |                  
                 +-------------------------------------------+         



Martin, what does this mean?

Comment 21 Martin Kolman 2013-06-10 15:21:51 UTC
(In reply to Alexander Todorov from comment #20)
> Now testing with RHEL5.10-Server-20130530.0, anaconda-11.1.2.263-2
> 
> 
>                  +-----------------+ Error +-----------------+
>                  |                                           |
>                  | Error mounting device vg_cluster/lv_nfs   |
>                  | as /appdata: Invalid argument             |
>                  |                                           |
>                  | This most likely means this partition     |
>                  | has not been formatted.                   |
>                  |                                           |
>                  | Press OK to reboot your system.           |
>                  |                                           |
>                  |                  +----+                   |
>                  |                  | OK |                   |
>                  |                  +----+                   |
>                  |                                           |
>                  |                                           |
>                  +-------------------------------------------+
> 
> 
> 
> Martin, what does this mean?
Have you changed the order of the disks for the second pass as mentioned by Stephen in comment 10?

Comment 22 Alexander Todorov 2013-06-11 06:26:42 UTC
Yes, see kickstart config in comment #19

Comment 23 Martin Kolman 2013-06-13 14:27:24 UTC
Indeed, the kickstart you are using changes the order of the disks in the second pass, but that assumes the physical order of the disks has also changed, as mentioned by Stephen in comment 10. That's probably what is causing the error you are seeing.
(c0d1p1, which holds the pv.02 partition, will be overwritten by the boot partition, so vg_cluster and all its LVs are gone and can't be reused.)

I see two possible solutions:
1) swap the physical location of the disks (c0d0 -> c0d1, c0d1 -> c0d0) and run the unchanged kickstart

2) don't swap the order of the disks in the kickstart for the second pass
The second pass would then look like this:
    echo "bootloader --location mbr --driveorder=cciss/c0d0,cciss/c0d1" > /tmp/part-include
    echo "part /boot/efi --onpart=cciss/c0d0p1" >> /tmp/part-include
    echo "volgroup vg_sys --useexisting" >> /tmp/part-include
    echo "logvol / --fstype=ext3 --useexisting --name=lv_root --vgname=vg_sys" >> /tmp/part-include
    echo "logvol swap --fstype=swap --useexisting --name=lv_swap --vgname=vg_sys" >> /tmp/part-include
    echo "volgroup vg_cluster --noformat" >> /tmp/part-include
    echo "logvol /appdata --noformat --name=lv_nfs --vgname=vg_cluster" >> /tmp/part-include

Comment 24 Alexander Todorov 2013-06-14 07:52:25 UTC
Hi Stephen,
did you physically change the order of disks in the system?

Comment 25 Alexander Todorov 2013-06-14 07:59:05 UTC
Hi Martin,
I don't think I need to physically swap the disks because: 


1) I didn't do it and I was able to reproduce the exact same traceback with 5.7. How is it possible to reproduce if I had to swap the disks physically and I didn't?

2) In the given kickstart snippet:

First pass:
bootloader --location mbr --driveorder=cciss/c0d0,cciss/c0d1
clearpart --drives=cciss/c0d0,cciss/c0d1 --all --initlabel
part /boot/efi --fstype=vfat --size=100 --ondisk=cciss/c0d0 --asprimary


/boot/efi lands on 1st partition of the first disk (c0d0p1).

Second pass:
bootloader --location mbr --driveorder=cciss/c0d1,cciss/c0d0
part /boot/efi --onpart=cciss/c0d1p1


Now c0d0 becomes c0d1 because the drive order in kickstart is reversed. So c0d1p1 is the same partition where /boot/efi originally was in the first place.

Comment 26 Stephen Benjamin 2013-06-17 09:16:57 UTC
Hi all,

There was nothing that needed to be done on the hardware side in terms of swapping the disks before the second pass.  I don't even remember why the --driveorder is reversed, although I guess I had it like that for some reason.

What happens if you use --driveorder=cciss/c0d0,cciss/c0d1 on both passes?

Comment 27 Alexander Todorov 2013-07-12 11:08:45 UTC
Update:

testing with the latest available tree: 

RHEL5.10-Server-20130701.4/anaconda-11.1.2.263-2.ia64

Tested with steps from comment #19, the result is as in comment #20. Now going to test comment #26.

Comment 28 Alexander Todorov 2013-07-12 12:26:36 UTC
(In reply to Stephen Benjamin from comment #26)
> Hi all,
> 
> There was nothing that needed to be done on the hardware side in terms of
> swapping the disks before the second pass.  I don't even remember why the
> --driveorder is reversed, although I guess I had it like that for some
> reason.
> 
> What happens if you use --driveorder=cciss/c0d0,cciss/c0d1 on both passes?


See comment #20. That happens. Moving back to ASSIGNED.

Comment 29 Martin Kolman 2013-07-18 15:03:35 UTC
I'll just add that --driveorder does not change the order of the disks in any way; it just sets the boot order (it's a parameter of the bootloader kickstart command, after all). Citing the documentation[1]:
"
--driveorder
 — Specify which drive is first in the BIOS boot order. For example:

bootloader --driveorder=sda,hda

"

So --driveorder should be completely irrelevant to this issue.

[1] https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Installation_Guide/s1-kickstart2-options.html

Comment 30 Alexander Todorov 2013-07-19 08:35:49 UTC
Hi Martin,
I need to know if what I see is expected after the fix or another bug? When you patched this, what was the intended behavior?

Comment 31 Martin Kolman 2013-07-22 18:23:00 UTC
Created attachment 777003 [details]
Kickstart from atodorov modified to work with virtio disks

Comment 32 Martin Kolman 2013-07-22 18:40:55 UTC
(In reply to Alexander Todorov from comment #30)
> Hi Martin,
> I need to know if what I see is expected after the fix or another bug? 
I think it's a different bug or an issue with the kickstart itself.

I've modified your kickstart for use with virtio drives, only changing:
cciss/c0d0 -> vda
cciss/c0d1 -> vdb
and using /boot in place of /boot/efi
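
Renaming the disks also changes how partition names are formed: on Linux a `p` is inserted before the partition number when the disk name ends in a digit, so the first partition of `cciss/c0d0` is `cciss/c0d0p1`, while the first partition of `vda` is `vda1`. A small Python sketch of that convention, with the hypothetical helper name `partition_name`:

```python
def partition_name(disk, number):
    """Return the name of partition `number` on `disk`, inserting 'p'
    when the disk name ends in a digit (as cciss device names do)."""
    separator = "p" if disk[-1].isdigit() else ""
    return "%s%s%d" % (disk, separator, number)


# Device names as used in this bug report.
for disk in ("cciss/c0d0", "cciss/c0d1", "vda", "vdb"):
    print(disk, "->", partition_name(disk, 1))
```

This is why an `--onpart=` argument has to be rewritten as a whole when a kickstart is ported between the two kinds of device, rather than by substituting only the disk part of the name.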

The kickstart successfully completes the first pass, but fails on the second pass with:

"An error occurred trying to format vdb1. This problem is serious, and the install cannot continue."

This is not exactly the same error you are getting, but if the problem were Smart Array/cciss related, the kickstart should work fine with virtual drives, and it does not.

So that's why I think this is unrelated to the original cciss bug.


> When you patched this, what was the intended behavior?
The second pass should run to the end and you should get a bootable install. Before submitting the patch, I tested it on a single-disk cciss machine and it worked fine.

Comment 33 Martin Kolman 2013-07-22 20:04:58 UTC
Created attachment 777049 [details]
Kickstart from atodorov modified to work with virtio disks - reversed disk order in second pass

Comment 34 Martin Kolman 2013-07-22 20:06:22 UTC
Created attachment 777050 [details]
LVM layout after second pass with reversed disk order

Comment 35 Martin Kolman 2013-07-22 20:10:36 UTC
For the record, I've changed the order of the disks in the second pass (kickstart is attached). This causes the second pass to finish successfully and create the desired partition layout (see screenshot).

I think this conclusively shows that the disk order for the second pass is wrong in the original kickstart.

Comment 36 Alexander Todorov 2013-07-23 08:30:23 UTC
(In reply to Martin Kolman from comment #35)
> For the record, I've changed the order of the disks in the second pass
> (kickstart is attached). This causes the second pass to finish successfully
> and create the desired partition layout (see screenshot).
> 

Note:

this comment refers to the line:

echo "part /boot --onpart=vdb1" >> /tmp/part-include

in comment #31 vs. comment #34

Comment 37 Alexander Todorov 2013-07-23 10:58:23 UTC
Testing again on hp-rx2660-01.rhts.eng.brq.redhat.com with disk order reversed, as suggested by Martin:

text
key --skip
install
reboot
auth  --useshadow  --enablemd5 
firewall --disable
firstboot --disable
selinux --disabled
keyboard us
lang en_US
timezone --utc Etc/UTC
rootpw ......
skipx

%include /tmp/part-include

%packages
@core
@base

%pre 
(
#!/bin/bash
#disk1=""
#disk2=""

# let's see if this appliance was bootstrapped before
lvm lvs | grep -q lv_nfs
declare -i ret=$?
# BZ 652417 (?):
lvm vgchange -an

if [ "$ret" -eq 0 ]; then
    echo "bootloader --location mbr --driveorder=cciss/c0d1,cciss/c0d0" > /tmp/part-include
    echo "part /boot/efi --onpart=cciss/c0d0p1" >> /tmp/part-include
    echo "volgroup vg_sys --useexisting" >> /tmp/part-include
    echo "logvol / --fstype=ext3 --useexisting --name=lv_root --vgname=vg_sys" >> /tmp/part-include
    echo "logvol swap --fstype=swap --useexisting --name=lv_swap --vgname=vg_sys" >> /tmp/part-include
    echo "volgroup vg_cluster --noformat" >> /tmp/part-include
    echo "logvol /appdata --noformat --name=lv_nfs --vgname=vg_cluster" >> /tmp/part-include
else   
    echo "bootloader --location mbr --driveorder=cciss/c0d0,cciss/c0d1" > /tmp/part-include
    echo "clearpart --drives=cciss/c0d0,cciss/c0d1 --all --initlabel" >> /tmp/part-include
    echo "part /boot/efi --fstype=vfat --size=100 --ondisk=cciss/c0d0 --asprimary" >> /tmp/part-include
    echo "part pv.01 --size=1350 --grow --asprimary --ondisk=cciss/c0d0" >> /tmp/part-include
    echo "volgroup vg_sys --pesize=32768 pv.01" >> /tmp/part-include
    echo "logvol / --fstype ext3 --name=lv_root --vgname=vg_sys --size=3000 --maxsize=20000 --grow" >> /tmp/part-include
    echo "logvol swap --fstype swap --name=lv_swap --vgname=vg_sys --size=1024 --maxsize=4096" >> /tmp/part-include
    echo "part pv.02 --size=8000 --grow --ondisk=cciss/c0d1 --asprimary" >> /tmp/part-include
    echo "volgroup vg_cluster --pesize=32768 pv.02" >> /tmp/part-include
    echo "logvol /appdata --fstype ext3 --name=lv_nfs --vgname=vg_cluster --size=7500" >> /tmp/part-include
fi
) > /tmp/ks-pre.log 2>&1
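
The %pre script above chooses the partitioning scheme by grepping the `lvm lvs` output for `lv_nfs`, which would also match any longer name containing that string. The Python sketch below shows an exact-name check instead. It assumes the default `lvs` column layout, with the LV name in the first column and the VG name in the second. The function name `has_logical_volume` is hypothetical and the sample output is made up for illustration; it is not taken from the attached logs.

```python
def has_logical_volume(lvs_output, lv_name, vg_name=None):
    """Return True if `lvs_output` lists a logical volume named exactly
    `lv_name`, optionally restricted to volume group `vg_name`."""
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == lv_name and vg_name in (None, fields[1]):
            return True
    return False


# Illustrative sample only (assumed layout, invented sizes).
sample = """\
  LV          VG         Attr   LSize
  lv_nfs_old  vg_cluster -wi-a- 1.00G
  lv_root     vg_sys     -wi-a- 3.00G
"""
print(has_logical_volume(sample, "lv_nfs"))             # exact match only
print(has_logical_volume(sample, "lv_root", "vg_sys"))
```

A plain substring search would report `lv_nfs` as present for this sample because of `lv_nfs_old`; the exact comparison does not.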



Installation works without issues and the previously created /appdata directory is present and content is intact. 

Moving to VERIFIED. Please re-open if seen again.

Comment 38 errata-xmlrpc 2013-10-01 00:30:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1354.html

