Bug 847547

Summary: installer should zero out all newly created devices to remove spurious metadata
Product: [Fedora] Fedora
Reporter: Frederick Roeber <bugzilla>
Component: anaconda
Assignee: David Lehman <dlehman>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 17
CC: anaconda-maint-list, g.kaviyarasu, jonathan, vanmeeuwen+fedora
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2013-08-01 17:01:18 UTC
Type: Bug
Attachments: Example kickstart file (flags: none)

Description Frederick Roeber 2012-08-12 21:34:15 UTC
Created attachment 603827
Example kickstart file

Description of problem:

My kickstart file creates two partitions on each of two identical disks.  The smaller partitions are raid-1'd together to become the unencrypted /boot directory.  The larger ones are raid-1'd together to become the physical volume from which logical volumes are allocated.

If the physical volume is not encrypted, the install works fine.  

If it is encrypted, though, I get a "CryptoError: luks_format failed for /dev/md0" error.  Looking further, I see that not all of the physical partitions were created.

Same kickstart file works with F16.

If I pre-partition the disks, and just use "--onpart" in the kickstart file, it works with encryption.


Version-Release number of selected component (if applicable):

This is the pure F17 dvd; anaconda version 17.29.

How reproducible:

Every time I try to install.

Steps to Reproduce:

My trimmed-down example kickstart file is attached.  In short:

|zerombr
|clearpart --all --drives=sda,sdb
|
|part raid.1   --ondisk=/dev/sda  --size=500
|part raid.2   --ondisk=/dev/sdb  --size=500
|part raid.3   --ondisk=/dev/sda  --size=304744
|part raid.4   --ondisk=/dev/sdb  --size=304744
|
|raid /boot --fstype=ext4              --level=1 --device=md1 raid.1 raid.2
|raid pv.1  --encrypted --passphrase=p --level=1 --device=md0 raid.3 raid.4

This fails.  Remove the "--encrypted --passphrase=p" and it works.  If, after installing on an unencrypted system, I re-install with

|part raid.1   --onpart=/dev/sda2
|part raid.2   --onpart=/dev/sdb2
|part raid.3   --onpart=/dev/sda1
|part raid.4   --onpart=/dev/sdb1
|
|raid /boot --fstype=ext4              --level=1 --device=md1 raid.1 raid.2
|raid pv.1  --encrypted --passphrase=p --level=1 --device=md0 raid.3 raid.4

Then this works.

(It makes no difference if /boot is on raid, as above, or if /boot is directly on a single partition and /altboot is on the other.)

If I do an interactive (graphical) install, then creating the above custom layout works.

Actual results:

"CryptoError: luks_format failed for /dev/md0"

Further, using parted to print the partition tables, I see that one disk has only one partition created.  This is probably why the /dev/md0 is unhappy.

Expected results:

It should install properly.

Additional info:

If this doesn't ring a bell, let me know and I can start attaching logfiles.

Comment 1 Frederick Roeber 2012-10-01 22:55:33 UTC
The missing partition is a red herring: it is absent only because the action-sorting step scheduled its creation after the luksFormat step, so the install failed before the partition was ever created.

I'm now trying to figure out how to create an update.img with a crypto.py that implements dolog...

Comment 2 Frederick Roeber 2012-10-03 03:24:18 UTC
Okay, got it.  (Sort of.)

I now see that a relevant point is that I've been working on creating some automatic kickstart files, the testing of which involves re-installing over and over with substantially similar configurations.  In particular, the partitioning is identical from one instance to the next.

So when the new partitions are created, they pick up exactly what was there before, so the raid-1 image then also has exactly what it had before, which was a LUKS image.  The cryptsetup code, down below _crypt_format_luks1() somewhere, sees the existing LUKS image and returns an EINVAL.

The answer is to wipe the LUKS image first.  I ended up creating this script:

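#!/bin/bash
# Wipe stale LUKS headers from md devices so that luksFormat does not fail.
echo "scanning md devices..."
for d in /dev/md*; do
  # Skip anything that is not a block device (including an unmatched glob).
  [[ -b "${d}" ]] || continue
  # A LUKS header begins with the ASCII magic "LUKS".
  h=$(dd if="${d}" bs=4 count=1 2>/dev/null)
  if [[ "${h}" == "LUKS" ]]; then
    echo "Device ${d} looks suspicious"
    # Zero the first 1 KiB, which destroys the header magic.
    dd if=/dev/zero of="${d}" bs=1024 count=1 2>/dev/null
  fi
done
echo "... scan done"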

and calling it from a %pre section.  This is a crude hammer -- it'll wipe any LUKS header it finds on an md device -- but it works for my autoinstalls.
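
The four-byte check in the script above could be tightened to the full six-byte LUKS magic ("LUKS" followed by the bytes 0xba 0xbe). The following is only a sketch: the function name has_luks_magic is made up for illustration, and it assumes that head, od and tr are available in the environment where it runs, which has not been verified for the F17 installer image.

```shell
#!/bin/bash
# Hypothetical helper, not part of anaconda or cryptsetup.
# Succeeds if the first six bytes of $1 are the LUKS magic "LUKS\xba\xbe".
# $1 may be a block device or, for experimenting, a regular file.
has_luks_magic() {
  local magic
  # Dump the first six bytes as hex and strip od's spacing and newline.
  magic=$(head -c 6 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "4c554b53babe" ]
}
```

A call such as `has_luks_magic /dev/md0 && echo "stale LUKS header on /dev/md0"` would then report a leftover header without modifying the device.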


What I don't get is why this doesn't come up in other cases, like a non-raided simple partition.  But I have another challenge to go on to...


I'm leaving this 'NEW' because the work-around is just that.  A better solution might be to add another storage 'action' between 'Create Device mdarray md<N>' and 'Create Format luks on mdarray md<N>' which just zeroes a page as above.
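
As a rough illustration of what such a "zero a page" step would do, here is a shell sketch. It is not anaconda code (anaconda's storage actions are written in Python); the function name zero_first_page and the 4096-byte default are assumptions chosen for the example.

```shell
#!/bin/bash
# Hypothetical helper: overwrite the start of a newly created device so that
# no stale signature (LUKS, filesystem, LVM) survives re-creation.
# Usage: zero_first_page DEVICE [BYTES]
zero_first_page() {
  local dev=$1
  local bytes=${2:-4096}
  # conv=notrunc keeps a regular file at its original size, which makes the
  # function safe to try on a disk image; it has no effect on block devices.
  dd if=/dev/zero of="$dev" bs="$bytes" count=1 conv=notrunc 2>/dev/null
}
```

In the failing case this would correspond to running `zero_first_page /dev/md0` after the array is created and before luksFormat is attempted. Where the wipefs tool from util-linux is available, `wipefs -a /dev/md0` is a more selective alternative that erases only the signatures it recognizes.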

Comment 3 David Lehman 2012-10-04 15:15:11 UTC
We actually do this exact thing for new partitions, but not for other new devices. We should be doing it regardless of device type.

Comment 4 Fedora End Of Life 2013-07-04 05:47:29 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 5 Fedora End Of Life 2013-08-01 17:01:25 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.