Bug 1022810 - error detecting raid1 thin pool layout
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: python-blivet
Version: 20
Hardware: x86_64 Unspecified
Priority: unspecified Severity: unspecified
Assigned To: David Lehman
QA Contact: Fedora Extras Quality Assurance
abrt_hash:c2362866fafe1e510ffec2af673...
Depends On:
Blocks: 1029915
Reported: 2013-10-24 02:05 EDT by Marian Csontos
Modified: 2014-12-02 14:13 EST
CC List: 12 users

See Also:
Fixed In Version: python-blivet-0.42-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1029915 (view as bug list)
Environment:
Last Closed: 2014-12-02 14:13:37 EST


Attachments
File: anaconda-tb (268.63 KB, text/plain), 2013-10-24 02:05 EDT, Marian Csontos
File: anaconda.log (5.19 KB, text/plain), 2013-10-24 02:05 EDT, Marian Csontos
File: environ (405 bytes, text/plain), 2013-10-24 02:05 EDT, Marian Csontos
File: ks.cfg (4.73 KB, text/plain), 2013-10-24 02:05 EDT, Marian Csontos
File: lsblk_output (10.90 KB, text/plain), 2013-10-24 02:05 EDT, Marian Csontos
File: nmcli_dev_list (4.28 KB, text/plain), 2013-10-24 02:05 EDT, Marian Csontos
File: os_info (291 bytes, text/plain), 2013-10-24 02:05 EDT, Marian Csontos
File: program.log (52.63 KB, text/plain), 2013-10-24 02:05 EDT, Marian Csontos
File: storage.log (103.85 KB, text/plain), 2013-10-24 02:05 EDT, Marian Csontos
File: syslog (99.96 KB, text/plain), 2013-10-24 02:05 EDT, Marian Csontos
File: ifcfg.log (511 bytes, text/plain), 2013-10-24 02:05 EDT, Marian Csontos
File: packaging.log (974 bytes, text/plain), 2013-10-24 02:05 EDT, Marian Csontos
Proposed fix (920 bytes, patch), 2013-11-13 09:04 EST, Marian Csontos

Description Marian Csontos 2013-10-24 02:05:03 EDT
Description of problem:
Installing on a system with a thin pool on RAID. The crash occurred when I selected the storage spoke.

Anaconda tries to activate the private (internal) LVM volume pool_tdata.

Version-Release number of selected component:
anaconda-20.25.1-1

The following was filed automatically by anaconda:
anaconda 20.25.1-1 exception report
Traceback (most recent call first):
  File "/usr/lib/python2.7/site-packages/blivet/devicelibs/lvm.py", line 439, in lvactivate
    raise LVMError("lvactivate failed for %s: %s" % (lv_name, msg))
  File "/usr/lib/python2.7/site-packages/blivet/devices.py", line 2684, in _setup
    lvm.lvactivate(self.vg.name, self._name)
  File "/usr/lib/python2.7/site-packages/blivet/devices.py", line 718, in setup
    self._setup(orig=orig)
  File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1314, in addLV
    lv_device.setup()
  File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1337, in handleVgLvs
    addLV(*lv_data[i])
  File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1424, in handleUdevLVMPVFormat
    self.handleVgLvs(vg_device)
  File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1711, in handleUdevDeviceFormat
    self.handleUdevLVMPVFormat(info, device)
  File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1075, in addUdevDevice
    self.handleUdevDeviceFormat(info, device)
  File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1936, in _populate
    self.addUdevDevice(dev)
  File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1880, in populate
    self._populate()
  File "/usr/lib/python2.7/site-packages/blivet/__init__.py", line 417, in reset
    self.devicetree.populate(cleanupOnly=cleanupOnly)
  File "/usr/lib/python2.7/site-packages/blivet/__init__.py", line 144, in storageInitialize
    storage.reset()
  File "/usr/lib64/python2.7/threading.py", line 764, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/threads.py", line 168, in run
    threading.Thread.run(self, *args, **kwargs)
LVMError: lvactivate failed for [pool_tdata]: running lvm lvchange -a y vg_stacked/[pool_tdata] failed

Additional info:
cmdline:        /usr/bin/python  /sbin/anaconda
cmdline_file:   method=http://10.34.48.241/fedora/linux/development/20/x86_64/os/  ks=http://192.168.144.1/stackerr.f20i.ks
executable:     /sbin/anaconda
hashmarkername: anaconda
kernel:         3.11.6-300.fc20.x86_64
product:        Fedora
release:        Cannot get release name.
type:           anaconda
version:        20
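
The bracketed name in the error message above ([pool_tdata]) matches the way `lvs -a` displays hidden, internal LVs. As a minimal sketch of how a caller could tell such names apart before attempting activation (the helper names below are hypothetical and are not part of blivet):

```python
def is_internal_lv(lv_name):
    """Return True if the name is wrapped in square brackets.

    `lvs -a` shows hidden (internal) LVs this way, e.g. "[pool_tdata]".
    """
    return lv_name.startswith("[") and lv_name.endswith("]")


def bare_lv_name(lv_name):
    """Strip the surrounding brackets, if present, to get the plain LV name."""
    return lv_name[1:-1] if is_internal_lv(lv_name) else lv_name


if __name__ == "__main__":
    for name in ("[pool_tdata]", "lv_master"):
        print(name, is_internal_lv(name), bare_lv_name(name))
```

This only inspects the name as reported; whether blivet obtains LV names in this bracketed form in every code path is an assumption based on the error message alone.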
Comment 1 Marian Csontos 2013-10-24 02:05:09 EDT
Created attachment 815617 [details]
File: anaconda-tb
Comment 2 Marian Csontos 2013-10-24 02:05:12 EDT
Created attachment 815618 [details]
File: anaconda.log
Comment 3 Marian Csontos 2013-10-24 02:05:16 EDT
Created attachment 815619 [details]
File: environ
Comment 4 Marian Csontos 2013-10-24 02:05:20 EDT
Created attachment 815620 [details]
File: ks.cfg
Comment 5 Marian Csontos 2013-10-24 02:05:23 EDT
Created attachment 815621 [details]
File: lsblk_output
Comment 6 Marian Csontos 2013-10-24 02:05:27 EDT
Created attachment 815622 [details]
File: nmcli_dev_list
Comment 7 Marian Csontos 2013-10-24 02:05:31 EDT
Created attachment 815623 [details]
File: os_info
Comment 8 Marian Csontos 2013-10-24 02:05:35 EDT
Created attachment 815624 [details]
File: program.log
Comment 9 Marian Csontos 2013-10-24 02:05:38 EDT
Created attachment 815625 [details]
File: storage.log
Comment 10 Marian Csontos 2013-10-24 02:05:42 EDT
Created attachment 815626 [details]
File: syslog
Comment 11 Marian Csontos 2013-10-24 02:05:45 EDT
Created attachment 815627 [details]
File: ifcfg.log
Comment 12 Marian Csontos 2013-10-24 02:05:49 EDT
Created attachment 815628 [details]
File: packaging.log
Comment 13 Marian Csontos 2013-10-24 02:17:55 EDT
Proposing as a blocker, since this does not meet the following requirement:

>  Guided partitioning
>
> When using the guided partitioning flow, the installer must be able to:
> 
> - Complete an installation using any combination of disk configuration
>   options it allows the user to select 

At least it does not eat data :-)
Comment 14 Mike Ruckman 2013-10-24 13:31:43 EDT
Discussed in 2013-10-24 Go/NoGo meeting [1]. Accepted as a blocker for violating the Custom Partitioning beta criterion [2]: "When using the custom partitioning flow, the installer must be able to: [...] Correctly interpret, and modify as described below, any disk with a valid ms-dos or gpt disk label and partition table containing ext4 partitions, LVM and/or btrfs volumes, and/or software RAID arrays at RAID levels 0, 1 and 5 containing ext4 partitions"

[1] http://meetbot.fedoraproject.org/meetbot/fedora-meeting-2/2013-10-24/
[2] http://fedoraproject.org/wiki/Fedora_20_Beta_Release_Criteria#Custom_partitioning
Comment 15 David Lehman 2013-10-24 15:04:07 EDT
You created a thin pool with segment type raid1 outside the installer?
Comment 16 Marian Csontos 2013-10-25 01:30:08 EDT
Yes, I tried to install Fedora on a machine which already had a thin pool on RAID1:

    lvcreate -n pool -L 6G -m 2 --type raid1 vg_stacked
    lvcreate -n poolmeta -L 256M -m 2 --type raid1 vg_stacked
    lvconvert --thinpool vg_stacked/pool --poolmetadata vg_stacked/poolmeta
    lvcreate -T -n lv_master --virtualsize 4G vg_stacked/pool

-- Martian
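
For readers unfamiliar with this stack, here is a rough sketch of the hidden sub-LVs the commands above are expected to leave behind. It assumes LVM's usual naming (`<pool>_tdata` and `<pool>_tmeta` for the thin pool's data and metadata, `<lv>_rimage_N` and `<lv>_rmeta_N` for each raid1 image) and that `-m 2` means two mirrors in addition to the original image. The function name is hypothetical, and other hidden volumes (for example a pool metadata spare) are not modeled.

```python
def expected_hidden_lvs(pool, mirrors):
    """List the hidden sub-LV names expected for a thin pool whose data and
    metadata LVs are both raid1 with `mirrors` additional images."""
    images = mirrors + 1  # -m 2 gives three images in total
    names = []
    for part in ("tdata", "tmeta"):
        sub = "%s_%s" % (pool, part)
        names.append(sub)
        for i in range(images):
            names.append("%s_rimage_%d" % (sub, i))
            names.append("%s_rmeta_%d" % (sub, i))
    return names


if __name__ == "__main__":
    for name in expected_hidden_lvs("pool", 2):
        print(name)
```

None of these volumes is meant to be activated directly; only `pool` and the thin volume `lv_master` are user-visible.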
Comment 17 Adam Williamson 2013-10-29 12:12:28 EDT
For blocker review purposes: as per the last couple of comments, this is not anaconda falling over its own LVM thinp creation logic or barfing on a layout it created itself on an earlier run. This is anaconda crashing when encountering a very unusual LVM layout that was created by the user outside of anaconda, that anaconda itself is not capable of creating.

Looking at the blocker review meeting where this bug was considered - http://meetbot.fedoraproject.org/meetbot/fedora-meeting-2/2013-10-24/f20_beta_gono-go_meeting.2013-10-24-17.01.log.html - I don't think this was fully understood when the bug was accepted as a blocker. So I'm proposing it to be re-discussed.

dlehman thinks the criterion cited may be more broad than it was intended to be: it implies that anaconda must be able to read absolutely any valid LVM layout, which is apparently a very wide scope. I believe our intent was more that it should be able to interpret commonly-encountered LVM layouts, or just LVM layouts that it is itself capable of creating. We're going to look at ways of improving that criterion.
Comment 18 Adam Williamson 2013-10-29 12:13:52 EDT
<dlehman> adamw: if we must support anything lvm can do, holy hell
<dlehman> lvm is the emacs of storage
<dlehman> I'd be happy to make an explicit list of supported lvm configurations if you'll put it in the criteria
<adamw> sure, we can do that
<adamw> but anyway, yeah, i think the discussion on that bug was a little off-base, i think they were thinking this was anaconda tripping over itself or at least over a layout it had itself created, not one that had been stuffed in from outside
<dlehman> yeah, this is lvs created with non-standard segment types, which anaconda does not do at all
Comment 19 Mike Ruckman 2013-10-30 12:36:36 EDT
Discussed in 2013-10-30 Blocker Review Meeting [1]. Voted a RejectedBlocker. The crash occurs in a very special use case when a special kind of LVM is used. It is not possible to create this LVM layout in anaconda itself; it must be pre-created using other tools. We believe this is not a serious enough violation to block Beta.

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2013-10-30/
Comment 20 Marian Csontos 2013-11-04 12:24:58 EST
Sorry to bother you again, but I would like to ask you to reconsider this, if not as a Beta blocker then as a Final blocker.

Rationale:

1. The Anaconda UI may not be able to create such configurations. That is already a major drawback, but an acceptable one as long as one can create the storage layout manually and have anaconda reread it, which, as I understand it, is a supported feature.

But it MUST be much more robust and MUST NOT blow up on meaningful storage configurations; otherwise more advanced users are effectively disqualified from using it, which should not be an acceptable limitation. Any such limitation must be addressed.

2. This is a configuration seen by the lvm-team as one of the preferred device stacks.

Using a whole VG on a single md-raid device is not only much less flexible but simply wrong: for thin-pool metadata, for example, any RAID level other than RAID1 does not make much sense (metadata should be on the fastest available storage, redundancy is strongly recommended, and RAID{4,5,6} are not good for speed).

My Conclusion:

As there is no other installation tool, anaconda MUST try much harder to cater for everyone, not only for the low end.
Comment 21 Marian Csontos 2013-11-13 09:04:17 EST
Created attachment 823462 [details]
Proposed fix

The patch works for me (except that it uncovers other systemd/udevd/lvm-related issues during system boot).
Comment 22 Marian Csontos 2013-11-13 09:10:30 EST
I would like to ask you to reevaluate this bug, as it is an issue which cannot be worked around except by changing the disk layout on the system.

David, could you review the one-line fix, please?
Comment 23 David Lehman 2013-11-13 10:44:22 EST
(In reply to Marian Csontos from comment #21)
> Created attachment 823462 [details]
> Proposed fix
> 
> The patch works for me (except that it uncovers other systemd/udevd/lvm-related
> issues during system boot).

Does it also work fine for every other possible permutation of lvm that works without it? I don't expect you to answer that question, but understand that this is a dangerous place to be messing around in order to try to handle one extreme minority case.
Comment 24 Marian Csontos 2013-11-13 11:29:42 EST
How is crashing better than skipping the device, which is already done for other internal devices?

No one except LVM tools should ever touch those internal devices.

A good rule of thumb seems to be: if the tool does not understand a device, it had better avoid it.

Applied here: if blivet absolutely needs to work with such devices, it should do so by explicitly naming the devices it understands.

It is now known not to work for at least one combination. That may seem like an "extreme minority case", but it is one of the suggested preferred stacks (see my comment #20 above). If it is rare now, we hope it will not stay so, and meanwhile we are effectively limiting where Fedora can be installed.
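
To illustrate the "skip what you do not understand" rule, here is a minimal sketch. It is not the attached patch and not blivet's code; the suffix list and the function names are assumptions made for the example.

```python
import re

# Hypothetical list of suffixes LVM uses for hidden sub-LVs (thin pool
# data/metadata, raid and mirror images, mirror log, metadata spare).
INTERNAL_SUFFIX = re.compile(
    r"_(tdata|tmeta|rimage_\d+|rmeta_\d+|mimage_\d+|mlog|pmspare)$"
)


def is_internal(name):
    """Treat bracketed names and names with a known internal suffix as internal."""
    bare = name.strip("[]")
    return name.startswith("[") or INTERNAL_SUFFIX.search(bare) is not None


def plan_activation(lv_names):
    """Split LV names into those to activate and those to leave alone.

    Internal LVs are skipped instead of raising, so one unfamiliar
    layout does not abort the whole device scan.
    """
    activate, skipped = [], []
    for name in lv_names:
        (skipped if is_internal(name) else activate).append(name)
    return activate, skipped


if __name__ == "__main__":
    lvs = ["pool", "[pool_tdata]", "[pool_tdata_rimage_0]", "pool_tmeta", "lv_master"]
    print(plan_activation(lvs))
```

A name-based filter like this is only a heuristic; a stricter variant would activate only LVs whose segment type is on an explicit list of supported types.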
Comment 25 Mike Ruckman 2013-11-14 13:35:32 EST
Discussed in 2013-11-14 Blocker Review Meeting [1]. This was voted a RejectedFreezeException. This is fine if it is fixed before Final freeze, but during the freeze we feel that this is a risky patch that might cause side effects. If developers feel that this should be included during the freeze and that it is safe, please re-propose.

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2013-11-14/
Comment 26 Chris Murphy 2013-11-14 17:07:56 EST
This bug exemplifies why custom partitioning should be decoupled from OS installation. The installer seems to need exceptional capabilities to create/modify/destroy in order to also be able to install to existing storage layouts.

The installer is better off being directed at specific volumes to use for installing the OS, rather than being able to create arbitrary layouts with under-the-hood tools that are changing significantly with each Fedora release. This destabilizes the installer, because it can't stabilize in the development process until the tools it depends on stabilize first.

Decoupling would make the installer more stable, and enables users to create storage layouts for purposes other than installing an OS. It could consume or create metadata to communicate the purpose of each layout component to the installer.

I increasingly think it's unfair to expect the anaconda team to support the creation/modification/destruction of arbitrary storage layouts. The complexity of these layouts is increasing, even in basic or default use cases. And by the way, the Windows and OS X installers have extremely limited capability compared to even the anaconda guided partitioning path. If you want esoteric layouts, you have to use a different utility designed for it.

And as for the use of LVM integrated RAID, I think that might need to go through the feature process because it's a rather significant change in how raid layouts look and behave, how they're monitored, and the user space tools for recovering from failed devices. I can count on one hand how many people are familiar with LVM integrated RAID.
Comment 27 Marian Csontos 2013-11-26 11:43:52 EST
(In reply to David Lehman from comment #23)
> (In reply to Marian Csontos from comment #21)
> > Created attachment 823462 [details]
> > Proposed fix
> > 
> > The patch works for me (except that it uncovers other systemd/udevd/lvm-related
> > issues during system boot).
> 
> Does it also work fine for every other possible permutation of lvm that
> works without it? I don't expect you to answer that question, but understand
> that this is a dangerous place to be messing around in order to try to
> handle one extreme minority case.

David, so far I have tested most of the meaningful stacks and have not found any regressions with this patch. Some stacks cannot be used for mount points; I will open more bugs against LVM2/anaconda for those.

I will run more tests during the next few days and will post the results.

(In reply to Chris Murphy from comment #26)
> I increasingly think it's unfair to expect the anaconda team to support the
> creation/modification/destruction of arbitrary storage layouts.

Chris, I am not asking anaconda to do any of the above to existing LVs. Just the opposite: I want it to not touch any existing devices.

As discussed at a meeting during DevConf 2013 in Brno, the anaconda UI would be limited and aimed at users with no specific LVM knowledge. Anyone wanting anything more complex should either use a kickstart file or modify the layout on the console, then just reread the storage and assign mount points in anaconda. That is basically all I expect from anaconda.

> The
> complexity of these layouts is increasing, even in basic or default use
> cases. And by the way, the Windows and OS X installers have extremely
> limited capability compared to even the anaconda guided partitioning path.
> If you want esoteric layouts, you have to use a different utility designed
> for it.

The reason I am screaming this much is that we are excluding some users from installing Fedora, and not merely because the installer is limited in what it can create.

There are no other tools I am aware of for installing Fedora (except perhaps running yum install in a chroot), so IMO bugs like this should be treated more seriously.

Instead the release is blocking on:

a) bugs with known workarounds:
  - Bug 1027947 - just do not try to resize it or resize it on console
b) use cases which IMHO should be better avoided:
  - Bug 1013586 - resizing NTFS? And without backup?
c) caused by user's mistake:
  - Bug 1027965, Bug 1028367 - just retry and do not do that thing again

A lot of them can be closed by documenting them.

> 
> And as for the use of LVM integrated RAID, I think that might need to go
> through the feature process because it's a rather significant change in how
> raid layouts look and behave, how they're monitored, and the user space
> tools for recovering from failed devices. I can count on one hand how many
> people are familiar with LVM integrated RAID.

Fortunately there are not many, so not many people should run into this bug.

Unfortunately there are not many, because the way anaconda stacks a VG on md-raid is just wrong, as I tried to explain above:

(In reply to Marian Csontos from comment #20)
> 2. This is a configuration seen by lvm-team as one of preferred device
> stacks.
> 
> Using a whole VG on a single md-raid device is not only much less flexible
> but simply wrong: for thin-pool metadata, for example, any RAID level other
> than RAID1 does not make much sense (metadata should be on the fastest
> available storage, redundancy is strongly recommended, and RAID{4,5,6} are
> not good for speed).

And RAID{4,5,6} are not only bad for speed: there is also a serious risk of unrecoverable damage to the metadata, because on a power failure a whole stripe may be left incorrect.

I will attach and maintain an update.img.
