Bug 2247872 - Don't write /etc/lvm/devices/system.devices when not doing an end-user install
Keywords:
Status: ON_QA
Alias: None
Product: Fedora
Classification: Fedora
Component: anaconda
Version: 39
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Vojtech Trefny
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: https://discussion.fedoraproject.org...
Depends On:
Blocks: F40FinalFreezeException, FinalFreezeException
Reported: 2023-11-04 00:07 UTC by Adam Williamson
Modified: 2024-04-16 13:25 UTC

Fixed In Version: anaconda-40.22.3-1.fc40
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:



Description Adam Williamson 2023-11-04 00:07:15 UTC
Since 9cccada80d21d30b4b4adc8919e278d7dbc316d1 , anaconda writes a /etc/lvm/devices/system.devices file when PVs are present in the install.

This is a problem when we're building disk images - specifically, the Server disk image - because the disk image winds up with this file present, referring to the backing device by whatever name it had on the builder, e.g.:

# LVM uses devices listed in this file.
# Created by LVM command lvmdevices pid 3077 at Thu Oct 26 04:02:32 2023
VERSION=1.1.4
IDTYPE=devname IDNAME=/dev/vda3 DEVNAME=/dev/vda3 PVID=elqwwdUQpHfHhxMiZvaYfiJUhyU83Vr2 PART=3

...but the device may well not be '/dev/vda3' when you come to boot the generated disk image on some other piece of hardware (in fact it almost certainly *won't* be on any physical system, as 'vdX' is reserved for virtual storage devices). This seems to cause problems with many LVM operations you might attempt. Probably this file should not be written unless this is an "end user" install (i.e. it should not be written if we're building a disk or live image).
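
A minimal shell sketch of how one might spot this condition on a booted system, assuming the devname-style entry format shown above. The function name `stale_devnames` is hypothetical and is not part of LVM or anaconda; it only reads the file and prints every DEVNAME that does not exist as a block device on the running system.

```shell
#!/bin/sh
# Hypothetical helper (not part of LVM or anaconda): print each DEVNAME listed
# in an LVM devices file that is not a block device on the running system.
# Assumes space-separated KEY=VALUE entries as in the example above.
stale_devnames() {
    devices_file="${1:-/etc/lvm/devices/system.devices}"
    # Nothing to report if the file is absent or unreadable.
    [ -r "$devices_file" ] || return 0
    # Drop comment lines, split each entry into one KEY=VALUE per line,
    # and keep only the value of DEVNAME.
    grep -v '^#' "$devices_file" | tr ' ' '\n' | sed -n 's/^DEVNAME=//p' |
    while IFS= read -r dev; do
        [ -b "$dev" ] || printf '%s\n' "$dev"
    done
}
```

Called without arguments it inspects /etc/lvm/devices/system.devices; on hardware where the disk is not a virtio device, the example entry above would presumably be reported as /dev/vda3.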

See https://bugzilla.redhat.com/show_bug.cgi?id=2246871 for some earlier discussion of this, I am splitting it out into a separate bug report for clarity, and nominating it for CommonBugs status.

I suspect this affects F38 and earlier too, but we noticed it with F39 (pboy, would be interesting if you could check if this was also the case with earlier releases).

Comment 1 Kamil Páral 2023-11-06 12:02:04 UTC
Nominating for a F40 blocker discussion.

Comment 2 David Teigland 2023-11-06 14:25:10 UTC
OS images should not include /etc/lvm/devices/system.devices.  It is specific to the hardware of the system.

Ideally, image installers will have methods to generate system.devices after a system has been installed (e.g. run "vgimportdevices -a" after install.)  LVM will run fine without a devices file, but will be missing the advantages it provides.  In the future we might look at having lvm itself detect a newly installed system and generate a local system.devices itself.

Comment 3 Peter Robinson 2023-11-06 14:43:09 UTC
Maybe having a oneshot systemd script with a ConditionFirstBoot would allow it to check for /etc/lvm/devices/system.devices and do the bits needed?
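
One possible shape for this suggestion, as a hedged sketch only: the unit layout in the comments, the script path, and the function name `generate_devices_file` are all assumptions, not something anaconda or LVM ships. ConditionFirstBoot= only triggers when the image boots with an uninitialized /etc/machine-id, so the image build would have to guarantee that.

```shell
#!/bin/sh
# Sketch of a script that a first-boot oneshot unit could run.
# A hypothetical unit (name and paths are assumptions) might contain:
#   [Unit]
#   ConditionFirstBoot=yes
#   [Service]
#   Type=oneshot
#   ExecStart=/usr/local/sbin/lvm-devices-firstboot
generate_devices_file() {
    # The path argument is only used for the existence check;
    # vgimportdevices itself writes to LVM's configured devices file.
    devices_file="${1:-/etc/lvm/devices/system.devices}"
    if [ -e "$devices_file" ]; then
        echo "devices file already present, nothing to do"
        return 0
    fi
    # Import the PVs of all visible VGs into a new devices file.
    vgimportdevices -a
}
```

The script deliberately does nothing when a devices file already exists, so it would only help for images that ship without one.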

Comment 4 David Teigland 2023-11-06 14:51:51 UTC
(In reply to Peter Robinson from comment #3)
> Maybe having a oneshot systemd script with a ConditionFirstBoot would allow
> it to check for /etc/lvm/devices/system.devices and do the bits needed?

I'll take a look at that, it sounds like it might work.  Thanks for the suggestion.

Comment 5 Peter Boy 2023-11-07 10:43:14 UTC
(In reply to Adam Williamson from comment #0)
> Since 9cccada80d21d30b4b4adc8919e278d7dbc316d1 , anaconda writes a
> /etc/lvm/devices/system.devices file when PVs are present in the install.
... 
> See https://bugzilla.redhat.com/show_bug.cgi?id=2246871 for some earlier
> discussion of this, I am splitting it out into a separate bug report for
> clarity, and nominating it for CommonBugs status.
> 
> I suspect this affects F38 and earlier too, but we noticed it with F39
> (pboy, would be interesting if you could check if this was also the case
> with earlier releases).

I checked F38 and F37. In those distribution images the directory /etc/lvm/devices is empty, so we didn't have that issue there. According to my findings, a system.devices file was never created during "normal" system operation either, so we always missed the advantages David mentioned in #2.

I also checked the libvirt x86 VM images again. They contain the system.devices file, too, but it causes no issue there because a VM usually uses vda3.

So the question may be what made Anaconda leave the /etc/lvm/devices directory empty for aarch64 in F37/38 but not in F39, and why it created the file for x86.



Furthermore, in F38 using arm-image-installer 3.8, it recognized the existence of a VG on the server system with the same name as in the SBC image and announced that it would be renamed to fedora-server. But it was not renamed; the name is still fedora. That caused no issue as far as I know, neither when generating the mSD card nor during system operation from the mSD card.

But if you try to install the system from a system booted from an mSD card to onboard eMMC storage, the installation fails, not because of VGs with the same name, but because of two partitions with the same UUID. That's probably off-topic here and worth a separate bug.

Comment 6 Vojtech Trefny 2023-11-15 10:32:03 UTC
Proposed fixes for Anaconda and Blivet:
 - https://github.com/rhinstaller/anaconda/pull/5325
 - https://github.com/storaged-project/blivet/pull/1169/

With this we won't create the devices file when running image install. If we want a oneshot service to create it during the first boot, I think it should be handled by LVM, not by the installer.

Comment 7 Peter Boy 2024-01-31 14:05:35 UTC
I think these solutions fix a follow-up problem; the root cause is somewhere else, and they don't fix the broader problem.

See #2258764 

The LVM group made an unannounced change in F38 that modified the behavior of vgscan and vgchange: the lvm.conf option is now 'use_devicesfile = 1' (previously 0).

Therefore, both commands now work differently for new VGs and limit the search to devices listed in the /etc/lvm/devices/system.devices file. Unfortunately, this renders both commands more or less useless for practical LVM administration tasks. If you want to integrate a new VG into a running system, it will most likely be on a new device.

It is therefore probably better to revert to the previous state.

The LVM Group does not appear to be keen to do this, and has not yet specified any reason why this change has been made.

Comment 8 Peter Boy 2024-02-07 18:05:32 UTC
Just an additional proposal:

A complete solution would include

1. Remove the device file from the image, because as mentioned in #2: OS images should not include /etc/lvm/devices/system.devices. It is specific to the hardware of the system.

2. Create the device file specifically for the installed hardware at first system boot, because upstream introduced that feature and we miss features otherwise (see #2)


3. Resolve the vgscan/vgchange issue by

* either change the package and set use_devicesfile = 0 in lvm.conf, if the LVM group is happy with this
* or change lvm.conf just for Server, unless the LVM group advises against it
* or rewrite various scripts, specifically arm-image-installer, to use vgimportdevices -a before any other LVM commands (as mentioned by Paul Howarth in Bug #2258764, comment #3) and reset the devices file afterward, as well as adjust a lot of our documentation, because it is now misleading or incomplete.

Comment 9 Peter Boy 2024-02-07 20:25:16 UTC
David Teigland provided additional information in another thread I would like to share here: 

===
use_devicesfile=1 is meant to be the standard way of using lvm for years now, and it should have been that way in fedora at least a couple years ago.

The devices file feature was introduced to lvm primarily as an "opt in" mechanism for using devices with lvm.  Previously, lvm has always had an "opt out" approach in which devices needed to be rejected with the filter to stop lvm from using them.  Over the past several years, it's become increasingly likely that lvm devices attached to a host no longer belong to the host and it's not safe for the host to assume it can use them, e.g. lvm devices are quite likely to belong to a guest VM, and there were many instances of hosts using and corrupting lvm devices that were in use by a guest VM.  Similar problems exist with machines connected to a SAN.
===

So it's probably best not to permanently change LVM configuration nor permanently remove the device file.

Comment 10 Peter Boy 2024-02-07 21:35:04 UTC
Sorry for this "additional addition"

David has kindly provided further information and suggestions (see https://bugzilla.redhat.com/show_bug.cgi?id=2258764#c10). 

Accordingly, points 1. and 2. from #8 (and the already proposed PR) make sense, and the third alternative from 3.

In arm-image-installer, either "--devices /dev/foo" would have to be added to the LVM commands, or the device would have to be added to the devices file at the beginning and then removed at the end (lvmdevices --adddev|--deldev /dev/foo).
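
The two alternatives could look roughly like the following sketch. The function names are hypothetical and this is not the actual arm-image-installer code; /dev/foo stands for whatever device the script is operating on.

```shell
#!/bin/sh
# Alternative 1: name the device explicitly for a single LVM command,
# so that command does not depend on the devices file.
scan_with_explicit_device() {
    dev="$1"
    vgscan --devices "$dev"
}

# Alternative 2: opt the device in temporarily, run the given command,
# then remove the device from the devices file again.
with_temporary_device() {
    dev="$1"
    shift
    lvmdevices --adddev "$dev" || return 1
    status=0
    "$@" || status=$?
    # Always undo the opt-in, even if the wrapped command failed.
    lvmdevices --deldev "$dev"
    return "$status"
}
```

For example, `with_temporary_device /dev/foo vgchange -ay fedora` would activate the VG while the device is listed and leave the devices file as it was afterward.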

So I think that, after a long round trip, we end up with an adaptation of the Anaconda image generation, more or less as a side effect of the original issue, and a modification of the arm-image-installer script, as assumed at the beginning.

Comment 11 Geoffrey Marr 2024-02-13 16:20:57 UTC
Discussed during the 2024-02-12 blocker review meeting: [0]

The decision to delay the classification of this as a blocker bug was made as pboy is working through the plan and the implications here, we will delay the decision for a bit so we have a clearer picture.

[0] https://meetbot.fedoraproject.org/blocker-review_matrix_fedoraproject-org/2024-02-12/f40-blocker-review.2024-02-12-17.05.txt

Comment 12 Paul Whalen 2024-02-20 20:51:20 UTC
We could `rm /etc/lvm/devices/system.devices` in %post of the kickstart used to create the image. 

If the file is included, LVM commands don't work as expected.
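
A sketch of what the body of such a %post section could look like. By default kickstart runs %post chrooted into the installed system, so the plain path refers to the image's own /etc. The function name and its optional root-prefix argument are assumptions added here so the snippet can be tried against a scratch directory.

```shell
# Intended as the body of a kickstart %post ... %end section.
# remove_devices_file is a hypothetical helper; the optional argument is a
# root prefix for trying it outside the installer chroot.
remove_devices_file() {
    root="${1:-}"
    # -f keeps this quiet and successful if the file was never written.
    rm -f "$root/etc/lvm/devices/system.devices"
}
```

Inside %post it would be called without arguments (`remove_devices_file`); the /etc/lvm/devices directory itself is left in place.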

Comment 13 Peter Boy 2024-02-21 08:28:20 UTC
@Vojtech Trefny 

>Proposed fixes for Anaconda and Blivet:
> - https://github.com/rhinstaller/anaconda/pull/5325
> - https://github.com/storaged-project/blivet/pull/1169/

What is the current status here? As far as I see it has been merged (Jan10), but with  Fedora 40 Branched 20240219  the image file still includes the devices file.

Comment 14 Peter Boy 2024-02-21 09:10:24 UTC
@pwhalen

>If the file is included, LVM commands don't work as expected.

Well, you overlook that the "work as expected" has changed!

With version F39 the new "work as expected" is that the vg* commands look into the devices file and work only on the devices listed therein. If you connect a new device and want to have it included, you have to either
* add it on purpose to the devices file (opt in) using 'lvmdevices --adddev /dev/foo', or
* instruct each vg* command to use additional devices using the command line option '--devices /dev/foo'.

See my description in #9 and #10 or - if you prefer to read the original and not just my citation - you may check https://bugzilla.redhat.com/show_bug.cgi?id=2258764, specifically #8, #9, #10

Therefore, we have to adjust the images *and* the arm-image-installer script to the new system-wide way LVM works. We missed this for F39 because unfortunately this change was discussed neither on the devel mailing list nor in a change proposal (see https://bugzilla.redhat.com/show_bug.cgi?id=2258764#c4).

Comment 15 Peter Boy 2024-02-21 09:19:58 UTC
@pwhalen: As an addendum: During our entire discussion last year about the F39 installation problem, we were "on the wrong side of the fence" the whole time because we didn't know about that change. Nearly all the time and effort we spent on Bug https://bugzilla.redhat.com/show_bug.cgi?id=2246871 was for nothing, and all the arguments and considerations we made there turned out to be incorrect and completely beside the point.  We now have to rethink this.

Comment 16 Vojtech Trefny 2024-02-21 09:27:41 UTC
(In reply to Peter Boy from comment #13)
> @Vojtech Trefny 
> 
> >Proposed fixes for Anaconda and Blivet:
> > - https://github.com/rhinstaller/anaconda/pull/5325
> > - https://github.com/storaged-project/blivet/pull/1169/
> 
> What is the current status here? As far as I see it has been merged (Jan10),
> but with  Fedora 40 Branched 20240219  the image file still includes the
> devices file.

I forgot there is one more place where the LVM devices file is being written by Anaconda: https://github.com/rhinstaller/anaconda/pull/5484

Comment 17 Paul Whalen 2024-02-21 15:44:54 UTC
(In reply to Peter Boy from comment #14)
> @pwhalen
> 
> >If the file is included, LVM commands don't work as expected.
> 
> Well, you overlook that the "work as expected" has changed!

If the file is deleted, functionality returns to what was in previous releases. 

> 
> Therefore, we have to adjust the images *and* the arm-image-installer script
> to the new system-wide way LVM works. We missed this for F39 because
> unfortunately this change was discussed neither on the devel mailing list
> nor in a change proposal (see
> https://bugzilla.redhat.com/show_bug.cgi?id=2258764#c4).

Let's keep this BZ focused on LVM. Please open another for any issues encountered with the arm-image-installer.

Comment 18 Peter Boy 2024-02-21 16:51:05 UTC
@Vojtech Trefny: Just a question: Does the Anaconda image handling capability allow configuring the image to issue something like "vgimportdevices -a" at first boot on the target system? That would ensure we get the same configuration for systems installed from an ISO file and systems installed from an image file. I would like to get a consistent installation result across the various installation methods, specifically for Fedora Server Edition. Affected would be the aarch64 installation image we are talking about all the time here and the KVM image.

Comment 19 David Teigland 2024-02-21 17:41:42 UTC
I'm curious about any advice about the current best practice for a first-boot, one-shot service that would run vgimportdevices -a.  Googling found various examples of this sort of thing, but most were fairly old, and possibly outdated.

I'm also thinking about a variation of vgimportdevices to run here, which would basically be "vgimportdevices <rootvg>", and only import LVM devices for the root VG, rather than everything.  We don't really know if other VGs, which happen to be attached and visible during first boot, are truly safe for the host to be using.  So, it would be safest to import only the root VG, and require the admin to decide themselves which other VGs the host should have access to.  That said, anaconda does run vgimportdevices -a during install, which is going to help in cases where the user does want to access other existing VGs after install.
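
A hedged sketch of the root-VG-only variation. The function name `import_root_vg` is hypothetical, and the sketch assumes that `lvs` can already see the root LV when it runs, which may not hold if a stale devices file is in place.

```shell
#!/bin/sh
# Find the VG that backs the root filesystem and import only its devices.
import_root_vg() {
    # Device that / is mounted from, e.g. /dev/mapper/fedora-root.
    root_src=$(findmnt -n -o SOURCE /) || return 1
    # lvs pads its output with whitespace; strip it. If the root device
    # is not an LV, lvs fails and the result stays empty.
    root_vg=$(lvs --noheadings -o vg_name "$root_src" 2>/dev/null | tr -d '[:space:]')
    if [ -z "$root_vg" ]; then
        echo "root filesystem is not on LVM, nothing to import" >&2
        return 0
    fi
    vgimportdevices "$root_vg"
}
```

Other VGs that happen to be visible at first boot are left out, so the admin would add them on purpose later.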

Comment 20 Peter Boy 2024-02-21 23:55:25 UTC
> Let's keep this BZ focused on LVM. Please open another for any issues encountered with the arm-image-installer.

Created: New system-wide LVM configuration requires adaptation of arm-image-installer (https://bugzilla.redhat.com/show_bug.cgi?id=2265422)

Comment 21 Geoffrey Marr 2024-03-05 04:23:10 UTC
Discussed during the 2024-03-04 blocker review meeting: [0]

The decision to classify this bug as a "RejectedBlocker (Final)" and an "AcceptedFreezeException (Final)" was made as this cannot block the release as it affects the qcow2 image only, which is not in the release-blocking list for Fedora 40. As it's a significant issue in a non-blocking image, we grant it a freeze exception.

[0] https://meetbot.fedoraproject.org/blocker-review_matrix_fedoraproject-org/2024-03-04/f40-blocker-review.2024-03-04-17.00.log.txt

Comment 22 Peter Boy 2024-03-27 11:30:38 UTC
The release Beta 1.10 still contains the wrong devices file in both images:
* Fedora-Server-KVM-40_Beta-1.10.x86_64.qcow2
* Fedora-Server-40_Beta-1.10.aarch64.raw.xz

We have a FE for this bug, so it could still get fixed for the final release. Any chance? 

> I'm curious about any advice about the current best practice for a first-boot, one-shot service that would run vgimportdevices -a.  Googling found various examples of this sort of thing, but most were fairly old, and possibly outdated.

In a recent posting Stephen Gallagher (sgallagh) referred to https://docs.fedoraproject.org/en-US/packaging-guidelines/Initial_Service_Setup/ as the Fedora way to handle this.


> I'm also thinking about a variation of vgimportdevices to run here, which would basically be "vgimportdevices <rootvg>", and only import LVM devices for the root VG, rather than everything.  We don't really know if other VGs, which happen to be attached and visible during first boot, are truly safe for the host to be using.

That would be the very conservative and very cautious approach. I think the VGs that are configured during the installation or during the generation of an image should be sufficiently safe to include.

Comment 23 Vojtech Trefny 2024-03-28 13:46:14 UTC
> We have a FE for this bug, so it could still get fixed for the final release. Any chance? 

The Anaconda part of the fix is included in 40.22.3, which was released a few days ago, so this should be fixed in the next compose.

Comment 24 Adam Williamson 2024-04-10 01:11:15 UTC
Peter, can you confirm if this is good now? Thanks!

Comment 25 Peter Boy 2024-04-10 06:09:32 UTC
I checked with branched 2024-04-06. Both images, still, contain the file /etc/lvm/devices/system.devices with the content of /dev/vda3. 

But if I dd the Fedora-Server-40-20240406.n.0.aarch64.raw.xz directly to a box containing tow-boot SPI (so it can boot the root filesystem directly w/o modifications by arm-image-installer) the system.devices file is obviously changed at first boot and contains the correct partition entries. 

I haven't checked this with the KVM image, yet. 

At the end, it looks like a successful fix here, different from the early planning to remove a system.devices file for images. But I'm not sure. Would be helpful to get some information from the maintainers.

Comment 26 Peter Boy 2024-04-10 20:22:11 UTC
Tested with rc 1.12

* Both Server image files (Fedora-Server-KVM-40-1.12.x86_64.qcow2 & Fedora-Server-40-1.12.aarch64.raw.xz) still contain a devices file in /etc/lvm/devices/system.devices, which refers to /dev/vda3, obviously from the build host.

* If you leave this unchanged, the devices file in the KVM image will be left as /dev/vda3, which is correct for a libvirt KVM, and for the ARM SBC device correctly changed to mmcblk1p3  

* If you manually delete the devices file in each of the images, then after the first boot the /etc/lvm/devices subdirectory in both images  remains empty. 


So, the fix provides a correct result, albeit differently than originally thought. 

Nevertheless, we need information on the rules according to which the correction is made in order to be able to document this in the documentation and in the release notes.

Comment 27 Vojtech Trefny 2024-04-16 13:25:49 UTC
I am glad it works now, but unfortunately not thanks to the changes that I made. I just found that the fixes for Anaconda and Blivet don't work in this situation at all: the LVM devices feature is skipped only during image installation, and the server images are not installed this way. As far as I can tell (at least for the KVM image), it's just a normal installation, so we won't be able to tell that we need to skip writing /etc/lvm/devices/system.devices in this case. This needs to be solved in a different way, for example in a post script in the kickstart.

