Bug 208947

Summary: problems with preexisting swraid
Product: [Fedora] Fedora
Reporter: Clyde E. Kunkel <clydekunkel7734>
Component: anaconda
Assignee: Peter Jones <pjones>
Status: CLOSED WONTFIX
Severity: high
Priority: medium
Version: rawhide
CC: dcantrell, gilboad
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2007-03-16 12:57:38 UTC
Bug Blocks: 150224
Attachments (all flags: none):
  per request in comment #2
  per request in comment #2
  lspci output from console 2 of attempted network install from rawhide
  anaconda log from attempted rawhide network install
  syslog from attempted rawhide network install
  anaconda log from attempted 10/11/2006 rawhide network install
  syslog from attempted rawhide 10/11/2006 network install
  anaconda dump subsequent to selecting 'Use free space on selected...'
  FC6 Final install syslog after FC5 had been booted
  FC6 Final install syslog after Rawhide had been booted

Description Clyde E. Kunkel 2006-10-02 18:58:50 UTC
Description of problem:
Using the DVD of the FC6 prerelease (T4), anaconda does not detect all LVs.
The system is a test system that uses ordinary ext3 partitions for /boot,
ordinary swap partitions, and LVs over software RAID 5 for all other filesystems.


Version-Release number of selected component (if applicable):
FC6T4

How reproducible:
Boot the DVD and use linux nompath to get past an early failure (bz 208423).

Steps to Reproduce:
1. Boot the FC6T4 DVD, specifying linux nompath
2. Select custom partition layout
3. Observe that the preexisting LVs in VolGroup1, where the root filesystem was
to be created, are not present.
  
Actual results:
VolGroup1 is not displayed; VolGroup0 is displayed.

Expected results:
All VolGroups displayed, allowing selection of the root LV.

Additional info:
Went to console 2 and found that the software RAID 5 devices and the LVs were
not seen.  Started the RAID 5 devices with mdadm --assemble for /dev/md0 and
/dev/md1, using the appropriate --uuid for each.  mdadm --detail /dev/md0 and
/dev/md1 then reported both arrays as OK; after vgscan -ay, all VolGroups and
LVs were seen.  Went back to anaconda, clicked Back, Back, then Next, and now
all VolGroups and LVs were visible.  Set up the same partition scheme I used
with the FC6T3 DVD: /boot on /dev/hda19, format ext3; root on VolGroup1/FC6T3,
format ext3; /home on VolGroup1/FedoraHomes, no format.  Continued to software
selection; after dependencies were checked and install was selected, the
following error box popped up:
An error occurred trying to format VolGroup1/FC6T3. This problem is serious, and
the install cannot continue. Press (enter) to reboot the system.
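
For reference, a minimal sketch of the manual workaround described above,
assuming the array UUIDs reported by mdadm --examine and the volume group name
VolGroup1 (activating logical volumes is normally done with vgchange -ay rather
than vgscan):

  # find the array UUIDs recorded in the member superblocks
  mdadm --examine --scan
  # assemble the arrays by UUID instead of by member device name
  mdadm --assemble /dev/md0 --uuid=<uuid-of-md0>
  mdadm --assemble /dev/md1 --uuid=<uuid-of-md1>
  # verify both arrays are up
  mdadm --detail /dev/md0
  mdadm --detail /dev/md1
  # rescan and activate the volume groups so the LVs appear
  vgscan
  vgchange -ay VolGroup1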

I went to console 2, and VolGroup1 and /dev/md1 were no longer visible.
Console 5 error message:  Could not stat /dev/VolGroup1/FC6T3 -- no such file or
directory.  The device apparently does not exist.  Did you specify it correctly?

Rebooted and found that my FC6T3 installation was still good.

Will try the install again, but this time, before clicking the next-to-last
button to install software, I will go to console 2 and re-assemble /dev/md1 and
re-activate VolGroup1 to see whether the formatting then occurs normally and the
rest of the install succeeds.

Comment 1 Clyde E. Kunkel 2006-10-02 19:42:35 UTC
On retry, assembled the RAID 5 devices and activated the LVs before selecting
Create Custom Layout.  However, they had been deactivated again by the time the
custom layout screens appeared.  Re-enabled them, clicked Back, Back, Next and
set up my standard test layout, with all RAID devices and LVs showing up.
However, just before clicking install, checked console 2 and all were again
deactivated; re-enabled them, returned to anaconda, clicked Next, and the same
error occurred.

FC6T4 is not installable on my system, which uses LVM over software RAID 5.
The boot and swap partitions are on standard non-RAID, non-LVM ext3 partitions.

Comment 2 Peter Jones 2006-10-03 00:08:05 UTC
Can you attach the files /tmp/anaconda.log and /tmp/syslog that are present when
you notice the problem?

Comment 3 Clyde E. Kunkel 2006-10-03 00:37:52 UTC
OK.  Stand by.  I did try today's (10/2/06) rawhide with a net install and the
same problem occurred.


Comment 4 Clyde E. Kunkel 2006-10-03 01:00:22 UTC
I can hardly believe what is happening now.  Going back to the FC6T4 DVD,
anaconda now sees all software RAID devices and the install is purring along.
The only change is that I updated the existing FC6T3 system to today's rawhide
before trying to satisfy comment #2.  If the installation of T4 completes
successfully, I will try the net install, since it originally showed the same
problem.

Maybe something in today's rawhide update changed the status of /dev/md1 and
allowed T4 to correctly identify it?

Comment 5 Clyde E. Kunkel 2006-10-03 01:55:29 UTC
Created attachment 137615 [details]
per request in comment #2

From the network install of today's (10/2/06) rawhide.  /dev/md1, which is
where the test LV partition resided, was not seen.

Comment 6 Clyde E. Kunkel 2006-10-03 01:56:14 UTC
Created attachment 137616 [details]
per request in comment #2

Comment 7 Clyde E. Kunkel 2006-10-03 03:15:53 UTC
There is a difference between an FC6T4 system and an FC6T4+rawhide (10/2/06)
system.  mdadm --detail reports that different partitions are used to make up
the RAID 5 arrays depending on the version of FC installed.  When the system is
in a state prior to FC6T4, the FC6T4 installer wants to use the configuration in
the FC6T4 column below and md1 cannot be activated.  If the pre-FC6T4 system is
first yum updated to the 10/2/06 rawhide, then the configuration in the second
column is used and FC6T4 installs OK.  The correct configuration is the second
column.  I looked at the UUIDs in both configurations and they were consistent
and as expected (a sketch of how to check this is included below).
           FC6T4       FC6T4+rawhide
/dev/md0  /dev/sda1   /dev/sdc1
          /dev/sdb1   /dev/sdd1
          /dev/sdc1   /dev/sda1

/dev/md1  /dev/sda2   /dev/sdc2
          /dev/sdb2   /dev/sdd2
          /dev/sdd1   /dev/sdb1

/dev/md2  /dev/sda3   /dev/sdc3
          /dev/sdb3   /dev/sdd3

I did not run lspci, which might have provided some information, since I have
noticed that FC, Red Hat and their variants number partitions differently from
Debian, SuSE and Gentoo.
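
A minimal sketch of how the member lists and UUIDs above can be compared,
assuming the arrays /dev/md0-/dev/md2 and the member partitions listed in the
table; mdadm --examine reads the superblock of an individual member, so the
array UUID can be checked independently of which sdX letter the kernel assigned:

  # list member devices and the array UUID of each assembled array
  mdadm --detail /dev/md0
  mdadm --detail /dev/md1
  mdadm --detail /dev/md2
  # read a member partition's superblock directly; the array UUID it reports
  # should match no matter how the disks were lettered on this boot
  mdadm --examine /dev/sda1
  mdadm --examine /dev/sdc1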


Comment 8 Clyde E. Kunkel 2006-10-04 04:26:38 UTC
If you do not want additional data on this problem, just let me know.  Tried a
new install today by installing FC6T3 from DVD onto a new LV, then tried a
network install of today's (10/3/06) rawhide.  Software RAID 5 device /dev/md1
was listed as foreign in anaconda and, of course, none of the LVs on it were
shown.  Then yum updated the FC6T3 installation to today's rawhide and tried
again, with the same results.  Noticed that the partitions on the SATA drives
have been given different letter names.  lspci from the network attempt, syslog
and anaconda.log are attached.


Comment 9 Clyde E. Kunkel 2006-10-04 04:28:05 UTC
Created attachment 137717 [details]
lspci output from console 2 of attempted network install from rawhide

Comment 10 Clyde E. Kunkel 2006-10-04 04:29:07 UTC
Created attachment 137718 [details]
anaconda log from attempted rawhide network install

Comment 11 Clyde E. Kunkel 2006-10-04 04:30:09 UTC
Created attachment 137719 [details]
syslog from attempted rawhide network install

Comment 12 Clyde E. Kunkel 2006-10-04 15:45:10 UTC
Network install of rawhide of 10/4/2006 works.  All software raid 5 devices are
seen as LVM PVs.  Partition numbering of SATA drives is correct.  Will continue
to test each daily rawhide as time permits.

Comment 13 Clyde E. Kunkel 2006-10-05 21:46:48 UTC
Today's (10/05/2006) rawhide works fine.

Comment 14 Clyde E. Kunkel 2006-10-08 04:31:34 UTC
Network install of 10/07/06 rawhide fails to detect software raid 5 device /dev/md1.

Comment 15 Clyde E. Kunkel 2006-10-10 17:15:40 UTC
Network install of 10/10/2006 does see all software raid 5 devices and LVs. 
However, may have hit closed bug 209462 (updated with an anaconda dump) AND
noticed that the raid 1 device containing the / LV was degraded when system
booted with a working version of rawhide.  I don't know if there are
relationships here or not.

Comment 16 Clyde E. Kunkel 2006-10-11 16:18:30 UTC
Created attachment 138252 [details]
anaconda log from attempted 10/11/2006 rawhide network install

Comment 17 Clyde E. Kunkel 2006-10-11 16:21:19 UTC
Created attachment 138253 [details]
syslog from attempted rawhide 10/11/2006 network install

Network install of 10/11/2006 failed again because the preexisting software
RAID 5 device was not found.  Logs attached here and in comment #16.

Comment 18 Clyde E. Kunkel 2006-10-12 15:40:51 UTC
Rawhide of 10/12/2006 does not work either.

Comment 19 Clyde E. Kunkel 2006-10-13 20:08:59 UTC
Rawhide 20061013 network install detected ALL software raid 5 devices.  

However, install failed before any packages could be selected with an exception
that may be related to yum-3.0.5 issues or bug 209462.

Comment 20 Clyde E. Kunkel 2006-10-14 18:02:55 UTC
Rawhide 20061014 failed; it did not detect software RAID 5 device /dev/md1.

In the partitioner, each partition was correctly shown as Linux raid autodetect,
and the correct /dev/sdXX partitions were shown as the RAID members, but
/dev/md1 was listed as foreign.  Syslog reports not enough devices for /dev/md1
even though they are shown correctly in the partitioner.  After booting a
working installation (rawhide 20061012, on 0x83 partitions only), fdisk -l
showed all Linux raid partitions correctly, with UUIDs consistent with the
expected working RAID sets.
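
A minimal sketch of these checks as they could be run from a shell on console 2
of the installer, assuming the four SATA drives described later in this bug; a
"not enough devices" report generally means that fewer members with matching
superblocks were found than the array needs in order to start:

  # partition tables: RAID members should show type fd (Linux raid autodetect)
  fdisk -l
  # what the kernel has actually assembled so far
  cat /proc/mdstat
  # which arrays (and UUIDs) the member partitions claim to belong to
  mdadm --examine --scan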

During the attempted install on the LVs, clicked Back, Back, Next, Next to see
if the partitioner changed its mind.  No change, except that the sort order of
/dev/md0, /dev/md2, /dev/md1 changed each time, as follows:

Try 1     Try 2     Try 3     Try 4     Try 5
/dev/md0  /dev/md0  /dev/md1  /dev/md0  /dev/md2
/dev/md2  /dev/md1  /dev/md2  /dev/md2  /dev/md0
/dev/md1  /dev/md2  /dev/md0  /dev/md1  /dev/md2

Finally, clicked Back and selected 'Use free space on selected'.  An exception
occurred when Next was clicked; the log is attached in the next comment.  This
is probably an unrelated bug: there was no free space available, and I expected
a message saying so and a graceful opportunity to go back.


Comment 21 Clyde E. Kunkel 2006-10-14 18:05:29 UTC
Created attachment 138516 [details]
anaconda dump subsequent to selecting 'Use free space on selected...'

Probably unrelated to this bug.  If a new bug is desired, please say so and I
will gladly enter it.

Comment 22 Clyde E. Kunkel 2006-10-15 12:50:12 UTC
Is this going to be fixed for FC6 final?

Comment 23 Jesse Keating 2006-10-15 17:20:01 UTC
We haven't been able to reproduce this internally in order to debug or fix it.
If we are able to in the future, a fix will most likely be issued in an
updates.img file that can be used at install time.

Comment 24 Clyde E. Kunkel 2006-10-15 23:19:12 UTC
What additional information can I provide to help?

Comment 25 Clyde E. Kunkel 2006-10-18 05:53:37 UTC
A network install of rawhide 20061017 works.  I will try each new rawhide
until the final ISOs are public.

I have no idea what is different, except for the kernel: the installer was
looking for i586 kernels, which I had not synced to my mirror for some time, so
I re-synced to pick them up.  Also, I have been running an FC5 test
installation on the same box for the last couple of days.

Comment 26 Clyde E. Kunkel 2006-10-20 21:44:15 UTC
Please bear with me on this; I am not sure what it means, and I know everyone
is busy getting FC6 out the door, but:

Current daily rawhide network installs, the FC6T3 DVD and the FC6T4 DVD will
ONLY see all software RAID 5 devices during an installation attempt when the
last OS booted on the test box before the attempted clean install was FC5 or
another distro such as Debian (Sarge or Etch).

I have repeated this test multiple times over the past two days and it holds
consistently true.  To install FC6, I have to first boot FC5 or another distro,
then restart with the FC6 DVD or netinstall CD in the CD-ROM drive.  FC6 then
sees all RAID arrays and I can perform a normal install.  However, if I try to
repeat the install, the problem recurs and I have to reboot into FC5 or Debian
in order to repeat the FC6 install.  I have tried from cold iron and the same
holds true.  I have run memtest86 for hours and it is clean.  I see no error
messages anywhere to indicate hardware problems.

The test box has /dev/hda with multiple 0x83 partitions used as /boot partitions
for different distros, and LVs over software RAID on SATA drives for the root
volumes.  Swap is on /dev/hda.  The mobo is an ASUS P4C800-Deluxe with a 3 GHz
P4 and 1 GB RAM, and has two 120 GB drives on Promise SATA and two 250 GB drives
on IBM SATA.  The 120 GB drives each contain one partition of type Linux raid
autodetect, and the two 250 GB drives each have two 120 GB RAID partitions and a
third RAID partition of about 10 GB.  I am running multiple LVs over the
software RAID 5 and single RAID 1 PVs (/dev/md0, md1 and md2).  According to FC5
(ls -l /dev/disk/by-id), the RAID partition letters/numbers are as the next
table shows; according to FC6 they are different.  Also, the partitions for the
RAID arrays are assembled somewhat differently.  /dev/md1 is always seen as
"foreign" during the failed FC6 installs and correctly as a PV for the
successful installs.

           FC5                           FC6
Drive  Partition  md#  dev#       Drive  Partition  md#  dev#
sdb                               sdd
       sdb1        1    2                sdd1        1    2
sda                               sdc
       sda1        0    2                sdc1        0    2
sdd                               sdb
       sdd1        0    1                sdb1        0    1
       sdd2        1    1                sdb2        1    1
       sdd3        2    1                sdb3        2    0
sdc                               sda
       sdc1        0    0                sda1        0    0
       sdc2        1    0                sda2        1    1
       sdc3        2    0                sda3        2    0
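
A minimal sketch of how this mapping can be collected, assuming the four SATA
drives and three arrays described above; the /dev/disk/by-id names are stable
across boots, so comparing them against the kernel's sdX letters and the md
member roles shows whether the letter assignment has changed:

  # stable disk ids vs. the sdX letters assigned on this boot
  ls -l /dev/disk/by-id/
  # member role (device number) of each partition within its array
  mdadm --detail /dev/md0
  mdadm --detail /dev/md1
  mdadm --detail /dev/md2
  # summary of what the kernel currently has assembled
  cat /proc/mdstat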

Something seems to be leaving the disks in a different state depending on which
distro is used.  It seems FC6 is leaving the disks, or maybe the software raid
arrays, in a state that does not permit them to be properly seen during a
subsequent FC6 install attempt.

Thanks for bearing with me, and please let me know what else I can do to help
resolve this.

Comment 27 Clyde E. Kunkel 2006-10-23 15:51:54 UTC
If this should be a new bug, let me know, but:

Installing FC6 from DVD (found a mirror that was open) on a different test
system, the SATA disk partitions on a Promise controller are not being
recognized, and therefore the software RAID devices using those partitions are
not being seen.  FC5 works fine, all SATA partitions seen.  Rawhide works fine,
all SATA partitions seen.  FC4 even works without problems.

What is going on here?

Comment 28 Jesse Keating 2006-10-23 15:58:03 UTC
(In reply to comment #27)
> If this should be a new bug, let me know, but:
> 
> Installing FC6 from DVD (found a mirror that was open) on a different test
> system, the SATA disk partitions on a Promise controller are not being
> recognized, and therefore the software RAID devices using those partitions are
> not being seen.  FC5 works fine, all SATA partitions seen.  Rawhide works fine,
> all SATA partitions seen.  FC4 even works without problems.
> 
> What is going on here?

This sounds like #197441


Comment 29 Clyde E. Kunkel 2006-10-24 02:49:44 UTC
Re: 197441, not sure.  But using linux nompath solved the problem on that box.
Now if I could just get the original system reported here to work.

Comment 30 Clyde E. Kunkel 2006-10-26 17:14:28 UTC
Created attachment 139489 [details]
FC6 Final install syslog after FC5 had been booted

The Fedora Core 6 final DVD install exhibits the same behavior as previous
attempts: if the test system was running FC5 or any non-Fedora distribution and
was rebooted into the FC6 DVD, then all software RAID devices are seen.  If it
was running rawhide, not all devices are seen.  The syslog of the install after
an FC5 boot is attached.  I did not continue the installation past the
partitioning step.  The syslog after running rawhide and booting into the FC6
DVD is next.

Comment 31 Clyde E. Kunkel 2006-10-26 17:18:08 UTC
Created attachment 139490 [details]
FC6 Final install syslog after Rawhide had been booted

If the test system was running rawhide (or FC6T3+rawhide or FC6T4+rawhide) and
the FC6 DVD is booted, then /dev/md1 is seen as foreign rather than as the LVM
PV it should be, and thus the VolGroup on it is not seen.  The syslog for this
result is attached.  Did not go beyond the partitioning step.

I looked at both syslogs and see some differences that may be significant to
those of you who understand the entries.  I don't, and would appreciate any help
you can provide.

Comment 32 Clyde E. Kunkel 2006-11-01 17:43:29 UTC
Problem still exists with Rawhide of 20061101.  Should I open a new bz?

Comment 33 Clyde E. Kunkel 2007-03-16 12:57:38 UTC
Since no one seems to want to do anything with this bz, as the originator I am
closing it.  It is still a problem, though.