| Summary: | error detecting raid1 thin pool layout | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Marian Csontos <mcsontos> |
| Component: | python-blivet | Assignee: | David Lehman <dlehman> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 20 | CC: | amulhern, anaconda-maint-list, awilliam, bcl, bugzilla, dlehman, g.kaviyarasu, jonathan, mcsontos, mruckman, robatino, vanmeeuwen+fedora |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | abrt_hash:c2362866fafe1e510ffec2af673d9d08cc69c248b15b890cbd681e0220ae0788 RejectedBlocker RejectedFreezeException | | |
| Fixed In Version: | python-blivet-0.42-1 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1029915 (view as bug list) | Environment: | |
| Last Closed: | 2014-12-02 19:13:37 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | | | |
| Bug Blocks: | 1029915 | | |

Attachments:
Created attachment 815617 [details]
File: anaconda-tb
Created attachment 815618 [details]
File: anaconda.log
Created attachment 815619 [details]
File: environ
Created attachment 815620 [details]
File: ks.cfg
Created attachment 815621 [details]
File: lsblk_output
Created attachment 815622 [details]
File: nmcli_dev_list
Created attachment 815623 [details]
File: os_info
Created attachment 815624 [details]
File: program.log
Created attachment 815625 [details]
File: storage.log
Created attachment 815626 [details]
File: syslog
Created attachment 815627 [details]
File: ifcfg.log
Created attachment 815628 [details]
File: packaging.log
Suggesting as blocker for not meeting the following requirement:

> Guided partitioning
>
> When using the guided partitioning flow, the installer must be able to:
>
> - Complete an installation using any combination of disk configuration
> options it allows the user to select

At least it does not eat data :-)

Discussed in 2013-10-24 Go/NoGo meeting [1]. Accepted as a blocker for violating Custom Partitioning beta criterion [2]: "When using the custom partitioning flow, the installer must be able to: [...] Correctly interpret, and modify as described below, any disk with a valid ms-dos or gpt disk label and partition table containing ext4 partitions, LVM and/or btrfs volumes, and/or software RAID arrays at RAID levels 0, 1 and 5 containing ext4 partitions"

[1] http://meetbot.fedoraproject.org/meetbot/fedora-meeting-2/2013-10-24/
[2] http://fedoraproject.org/wiki/Fedora_20_Beta_Release_Criteria#Custom_partitioning

You created a thin pool with segment type raid1 outside the installer?

Yes, I tried to install Fedora on a machine which already had a pool on RAID1:
lvcreate -n pool -L 6G -m 2 --type raid1 vg_stacked
lvcreate -n poolmeta -L 256M -m 2 --type raid1 vg_stacked
lvconvert --thinpool vg_stacked/pool --poolmetadata vg_stacked/poolmeta
lvcreate -T -n lv_master --virtualsize 4G vg_stacked/pool
-- Martian
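
For anyone who wants to reproduce this layout on a scratch volume group, the four commands above can be wrapped in a small script. This is only a sketch: the function names `thin_pool_on_raid1_commands` and `run_all` are made up for illustration, the commands are only printed unless `dry_run=False` is passed, and it does not handle any interactive confirmation the LVM tools may ask for.

```python
import shlex
import subprocess


def thin_pool_on_raid1_commands(vg="vg_stacked"):
    """Return the four commands from the comment above as argv lists."""
    return [
        # Data LV for the pool, raid1 with two extra mirrors.
        ["lvcreate", "-n", "pool", "-L", "6G", "-m", "2", "--type", "raid1", vg],
        # Metadata LV for the pool, also raid1.
        ["lvcreate", "-n", "poolmeta", "-L", "256M", "-m", "2", "--type", "raid1", vg],
        # Combine both LVs into a thin pool.
        ["lvconvert", "--thinpool", f"{vg}/pool", "--poolmetadata", f"{vg}/poolmeta"],
        # Create a thin volume inside the pool.
        ["lvcreate", "-T", "-n", "lv_master", "--virtualsize", "4G", f"{vg}/pool"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # Needs root and an existing volume group; stops at the first failure.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(thin_pool_on_raid1_commands())
```

Running it as is only prints the commands, so it is safe to try on any machine.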
For blocker review purposes: as per the last couple of comments, this is not anaconda falling over its own LVM thinp creation logic or barfing on a layout it created itself on an earlier run. This is anaconda crashing when encountering a very unusual LVM layout that was created by the user outside of anaconda, that anaconda itself is not capable of creating.

Looking at the blocker review meeting where this bug was considered - http://meetbot.fedoraproject.org/meetbot/fedora-meeting-2/2013-10-24/f20_beta_gono-go_meeting.2013-10-24-17.01.log.html - I don't think this was fully understood when the bug was accepted as a blocker. So I'm proposing it to be re-discussed.

dlehman thinks the criterion cited may be more broad than it was intended to be: it implies that anaconda must be able to read absolutely any valid LVM layout, which is apparently a very wide scope. I believe our intent was more that it should be able to interpret commonly-encountered LVM layouts, or just LVM layouts that it is itself capable of creating. We're going to look at ways of improving that criterion.

<dlehman> adamw: if we must support anything lvm can do, holy hell
<dlehman> lvm is the emacs of storage
<dlehman> I'd be happy to make an explicit list of supported lvm configurations if you'll put it in the criteria
<adamw> sure, we can do that
<adamw> but anyway, yeah, i think the discussion on that bug was a little off-base, i think they were thinking this was anaconda tripping over itself or at least over a layout it had itself created, not one that had been stuffed in from outside
<dlehman> yeah, this is lvs created with non-standard segment types, which anaconda does not do at all

Discussed in 2013-10-30 Blocker Review Meeting [1]. Voted a RejectedBlocker. The crash occurs in a very special use case when a special kind of LVM is used. It is not possible to create this LVM layout in anaconda itself, it must be pre-created in advance using other tools. We believe this is not a serious enough violation to block Beta.

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2013-10-30/

Sorry to bother again but I would like to ask you to reconsider, if not as a Beta blocker then as a Final blocker.
Rationale:
1. The Anaconda UI may not be able to create such configurations, which is already a major drawback, but that is acceptable as long as one can create the storage layout manually and reread it, which as I understand it is a supported feature.
But it MUST be much more robust and MUST NOT blow up on meaningful storage configurations, or we risk that more advanced users will be disqualified from using it. That should not be an acceptable limitation, and any such limitation must be addressed.
2. This is a configuration seen by lvm-team as one of preferred device stacks.
Using a whole VG on a single md-raid device is not only much less flexible but is just wrong: for thin-metadata, for example, any RAID level other than RAID1 does not make much sense (metadata should be on the fastest available storage, redundancy is strongly recommended, and RAID{4,5,6} are not good for speed).
My Conclusion:
As there is no other installation tool, anaconda MUST try much harder to cater for everyone, not only for the low end.
Created attachment 823462 [details]
Proposed fix
The patch works for me (except it uncovers other systemd/udevd/lvm-related issues during system boot).
I would like to ask you to reevaluate this bug as it is an issue which can not be worked around except by changing disk layout on system.

David, could you review the one-line fix, please?

(In reply to Marian Csontos from comment #21)
> Created attachment 823462 [details]
> Proposed fix
>
> The patch works for me (except it uncovers another systemd/udevd/lvm related
> issues during system boot.)

Does it also work fine for every other possible permutation of lvm that works without it? I don't expect you to answer that question, but understand that this is a dangerous place to be messing around in order to try to handle one extreme minority case.

How is crashing better than skipping the device which is already done for other internal devices? No one except LVM tools should ever touch those internal devices. The good rule of thumb seems to be: If the tool does not understand the device it should better avoid it. Applied: If blivet absolutely needs to work with them it should do so by explicitly naming the devices it understands.

Now it is known to not work for at least one combination which may seem like an "extreme minority case" but this case is one of the suggested preferred stacks - see above my comment #20 and if it is rare now we hope it will not stay so and we are effectively limiting where Fedora can be installed.

Discussed in 2013-11-14 Blocker Review Meeting [1]. This was voted a RejectedFreezeException. This is fine if it is fixed before Final freeze. But during the freeze we feel that this is a risky patch that might cause different side effects. If developers feel that this should get included during freeze and it's safe, please re-propose

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2013-11-14/

This bug exemplifies why custom partitioning should be decoupled from OS installation. The installer seems to need exceptional capabilities to create/modify/destroy in order to also be able to install to existing storage layouts.
The installer is better off being directed at specific volumes to use for installing the OS, rather than being able to create arbitrary layouts with under hood tools that are changing significantly with each Fedora release; it destabilizes the installer because it can't stabilize in the development process until the tools it depends on stabilize first. Decoupling would make the installer more stable, and enables users to create storage layouts for purposes other than installing an OS. It could consume or create metadata to communicate the purpose of each layout component to the installer.

I increasingly think it's unfair to expect the anaconda team to support the creation/modification/destruction of arbitrary storage layouts. The complexity of these layouts is increasing, even in basic or default use cases. And by the way, the Windows and OS X installers have extremely limited capability compared to even the anaconda guided partitioning path. If you want esoteric layouts, you have to use a different utility designed for it.

And as for the use of LVM integrate RAID, I think that might need to go through the feature process because it's a rather significant change in how raid layouts look and behave, how they're monitored, and the user space tools for recovering from failed devices. I can count on one hand how many people are familiar with LVM integrate RAID.

(In reply to David Lehman from comment #23)
> (In reply to Marian Csontos from comment #21)
> > Created attachment 823462 [details]
> > Proposed fix
> >
> > The patch works for me (except it uncovers another systemd/udevd/lvm related
> > issues during system boot.)
>
> Does it also work fine for every other possible permutation of lvm that
> works without it? I don't expect you to answer that question, but understand
> that this is a dangerous place to be messing around in order to try to
> handle one extreme minority case.

David, so far I have tested most of meaningful stacks and have not found any regressions with this patch. Some stacks can not be used for mountpoints - I will open more bugs against LVM2/anaconda for those. I will run more tests during next few days and will post results.

(In reply to Chris Murphy from comment #26)
> I increasingly think it's unfair to expect the anaconda team to support the
> creation/modification/destruction of arbitrary storage layouts.

Chris, I am not asking anaconda to do any of above to existing LVs. Just the opposite - I want it to not touch any existing devices. As discussed on a meeting during DevConf 2013 in Brno anaconda UI would be limited aiming at users with no specific LVM knowledge and anyone wanting anything more complex should use either kickstart file or modify the layout on console and just reread storage and assign mount points in anaconda and that is basically all I expect from anaconda.

> The
> complexity of these layouts is increasing, even in basic or default use
> cases. And by the way, the Windows and OS X installers have extremely
> limited capability compared to even the anaconda guided partitioning path.
> If you want esoteric layouts, you have to use a different utility designed
> for it.

Why I am screaming this much is we are excluding some users from installing Fedora not because it is limited. There are no other tools I am aware of for installing Fedora (except perhaps running yum install in a chroot) so IMO bugs like this should be treated more seriously.

Instead the release is blocking on:
a) bugs with known workarounds:
- Bug 1027947 - just do not try to resize it or resize it on console
b) use cases which IMHO should be better avoided:
- Bug 1013586 - resizing NTFS? And without backup?
c) caused by user's mistake:
- Bug 1027965, Bug 1028367 - just retry and do not do that thing again

Lot of them can be closed by documenting them.

> And as for the use of LVM integrate RAID, I think that might need to go
> through the feature process because it's a rather significant change in how
> raid layouts look and behave, how they're monitored, and the user space
> tools for recovering from failed devices. I can count on one hand how many
> people are familiar with LVM integrate RAID.

Fortunately not many so not many people should run into this bug. Unfortunately not many as the way anaconda stacks VG on md-raid is just wrong as I tried to explain above:

(In reply to Marian Csontos from comment #20)
> 2. This is a configuration seen by lvm-team as one of preferred device
> stacks.
>
> Using whole VG on a single md-raid device is not only much less flexible but
> is just wrong as for example for thin-metadata any other RAID level than
> RAID1 does not make much sense (metadata should be on fastest available
> storage and redundancy is strongly recommended and RAID{4,5,6} are not good
> for speed.)

And not only RAID{4,5,6} are bad for speed, there is a serious risk of unrecoverable damage to metadata where on power failure whole stripe may be incorrect.

I will attach and maintain an update.img.

Description of problem:
Installing on a system with thin-pool on raid. Crash occurred when I selected storage spoke. Anaconda tries to activate private LVM volume pool_tdata.

Version-Release number of selected component:
anaconda-20.25.1-1

The following was filed automatically by anaconda:
anaconda 20.25.1-1 exception report
Traceback (most recent call first):
 File "/usr/lib/python2.7/site-packages/blivet/devicelibs/lvm.py", line 439, in lvactivate
   raise LVMError("lvactivate failed for %s: %s" % (lv_name, msg))
 File "/usr/lib/python2.7/site-packages/blivet/devices.py", line 2684, in _setup
   lvm.lvactivate(self.vg.name, self._name)
 File "/usr/lib/python2.7/site-packages/blivet/devices.py", line 718, in setup
   self._setup(orig=orig)
 File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1314, in addLV
   lv_device.setup()
 File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1337, in handleVgLvs
   addLV(*lv_data[i])
 File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1424, in handleUdevLVMPVFormat
   self.handleVgLvs(vg_device)
 File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1711, in handleUdevDeviceFormat
   self.handleUdevLVMPVFormat(info, device)
 File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1075, in addUdevDevice
   self.handleUdevDeviceFormat(info, device)
 File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1936, in _populate
   self.addUdevDevice(dev)
 File "/usr/lib/python2.7/site-packages/blivet/devicetree.py", line 1880, in populate
   self._populate()
 File "/usr/lib/python2.7/site-packages/blivet/__init__.py", line 417, in reset
   self.devicetree.populate(cleanupOnly=cleanupOnly)
 File "/usr/lib/python2.7/site-packages/blivet/__init__.py", line 144, in storageInitialize
   storage.reset()
 File "/usr/lib64/python2.7/threading.py", line 764, in run
   self.__target(*self.__args, **self.__kwargs)
 File "/usr/lib64/python2.7/site-packages/pyanaconda/threads.py", line 168, in run
   threading.Thread.run(self, *args, **kwargs)
LVMError: lvactivate failed for [pool_tdata]: running lvm lvchange -a y vg_stacked/[pool_tdata] failed

Additional info:
cmdline: /usr/bin/python /sbin/anaconda
cmdline_file: method=http://10.34.48.241/fedora/linux/development/20/x86_64/os/ ks=http://192.168.144.1/stackerr.f20i.ks
executable: /sbin/anaconda
hashmarkername: anaconda
kernel: 3.11.6-300.fc20.x86_64
product: Fedora
release: Cannot get release name.
type: anaconda
version: 20
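
The traceback shows the activation failing on an LV whose name is reported in square brackets, `[pool_tdata]`, which the description calls a private LVM volume. As a rough illustration of the "skip what you do not understand" approach argued for in the comments, a scanner could filter such names out before trying to activate anything. This is a sketch only: it is not the attached patch and not blivet code, the helper names `is_internal_lv` and `activatable_lvs` are hypothetical, and it assumes that bracketed names are a reliable marker for LVM-private volumes.

```python
def is_internal_lv(lv_name):
    """Return True for names reported in square brackets, e.g. "[pool_tdata]"."""
    return lv_name.startswith("[") and lv_name.endswith("]")


def activatable_lvs(lv_names):
    """Keep only the LVs a scanner could reasonably try to activate."""
    return [name for name in lv_names if not is_internal_lv(name)]


if __name__ == "__main__":
    # Names taken from this report: "pool" and "lv_master" from the lvcreate
    # commands, "[pool_tdata]" from the traceback.
    names = ["lv_master", "pool", "[pool_tdata]"]
    print(activatable_lvs(names))
```

A real fix would also have to decide what to do with the skipped volumes in the device tree, which this sketch does not attempt.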