Bug 2208039
| Summary: | lvm handles trailing spaces in wwids differently in 9.2 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | c.stackpole |
| Component: | lvm2 | Assignee: | David Teigland <teigland> |
| lvm2 sub component: | Storage | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | agk, cmarthal, heinzm, jbrassow, mcsontos, mihai, mikel, msnitzer, prajnoha, sisatia, teigland, toracat, zkabelac |
| Version: | 9.2 | Keywords: | Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | lvm2-2.03.21-3.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-11-07 08:53:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Created attachment 1965215 [details]
Second - no var/home mounts, but swap is there
Created attachment 1965216 [details]
Third - diff between system.devices and vgscan
Created attachment 1965217 [details]
Fourth - delete system.devices as workaround
Could you show the contents of /etc/lvm/devices/system.devices in the 9.1 system, then after upgrading to 9.2 recreate system.devices (using vgimportdevices -a) and show the new contents of the file created from 9.2? This might be related to the repeating underscores in the device id. In 9.2 lvm adopted the more traditional approach of condensing repeated underscores to a single underscore, while still working with the form of repeating underscores that had been used in 9.1.

Created attachment 1965242 [details]
Fifth - Differences when regenerating the file

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/configuring_and_managing_logical_volumes/limiting-lvm-device-visibility-and-usage_configuring-and-managing-logical-volumes

Using the above as a guideline, I deleted the old file, rebooted into 9.2, then regenerated the file with lvmdevices per the documentation. See the fifth screenshot for differences.

I reset the VM back to 9.1 and manually removed the excess _'s from the /etc/lvm/devices/system.devices file - reboot into 9.2 worked as expected. (It's also weird that 9.1 has VERSION=1.1.2 but 9.2 has VERSION=1.1.1, but that makes no difference at all in my testing.)

I suspect that there's a script that removes the excess _'s but doesn't account for them if they are there from a previous version. I wonder if the VERSION change might be related to a regression? I don't know - maybe it's just weird but it is a difference.

Thanks for the response David! Sorry I didn't see it until I posted above. Looks like we are both on the same path and I agree with your hunch after my recent testing.

I suspect it's the underscore at the end of the 9.1 wwid causing the problem. There is probably a space or control character in that position which 9.2 is ignoring and which 9.1 is converting to an underscore. That would require a fix in 9.2 to cope with the value created in 9.1.
We could confirm this with the output of:

hexdump -C /sys/block/sda/device/wwid
hexdump -C /sys/block/sda/device/vpd_pg80
hexdump -C /sys/block/sda/device/vpd_pg83

from 9.1 and 9.2. I'm interested in both 9.1 and 9.2 in case the upgraded kernel has made a slight change to the data it's reporting (which we've seen before). The vpd files may not exist, which is fine, but I'd like to collect that in case they are helpful in understanding the wwid value. Thanks

Created attachment 1965271 [details]
9.1 hexdump
The difference between 9.1 and 9.2 is exactly the same. Checking it again to see if I goofed something in this test.
Nope. Diffs are the same. Verified on the 9.1-fresh-install snapshot, the 9.1-post-update-but-before-reboot snapshot, and reboots into both the broken state and post-manual-edit-removing-_ 9.2.

Thanks, I've written a test case to verify that the line feed character at the end of the wwid is the cause of the problem. In 9.0/9.1 the line feed character was replaced by underscore, and in 9.2 the line feed character is ignored. This causes the wwid string comparison to not match in 9.2, so the device is not found. This will require a simple fix in 9.2.

My diagnosis in comment 10 was incorrect. The problem is the trailing space character at the end of the WWID. In 9.0 and 9.1 lvm would replace that trailing space with _, but in 9.2 trailing spaces are ignored. As a result the WWID computed in 9.2 no longer matches the value saved by 9.0/9.1, so lvm ignores the device. The fix for 9.2 will be to ignore trailing _ in IDNAME values.

fix in main:
https://sourceware.org/git/?p=lvm2.git;a=commit;h=4cdb178968b44125c41dee6dd28997283c0afefa

Thanks for your effort in figuring this out!

The following test uses fake sysfs dirs containing the problematic wwid and points lvm at them.

1. In the lvm.conf devices section.

device_id_sysfs_dir = "/test/sys/"

2. Get major:minor of the device to use in testing.

$ ls -l /dev/sdh
brw-rw----. 1 root disk 8, 112 May 11 14:46 /dev/sdh

3. Set up sysfs files for major:minor.

$ mkdir -p /test/sys/dev/block/8:112/device
$ echo "t10.ATA VBOX HARDDISK VB9c10d318-188d9ebc " > /test/sys/dev/block/8\:112/device/wwid

4. Check that the latest lvm version reduces repeated spaces and ignores trailing spaces.

$ pvcreate /dev/sdh
$ rm /etc/lvm/devices/system.devices
$ lvmdevices --adddev /dev/sdh
$ grep sdh /etc/lvm/devices/system.devices
IDTYPE=sys_wwid IDNAME=t10.ATA_VBOX_HARDDISK_VB9c10d318-188d9ebc DEVNAME=/dev/sdh PVID=YZEDsWin3kJsoLrvlFj6f8vWNGS7hPdn
$ pvs -o name,deviceid /dev/sdh
PV DeviceID
/dev/sdh
$ vgcreate hh /dev/sdh
Volume group "hh" successfully created with system ID dct-rhel9-cluster-n0
$ pvs -o name,deviceid /dev/sdh
PV DeviceID
/dev/sdh t10.ATA_VBOX_HARDDISK_VB9c10d318-188d9ebc

5. Check that the latest lvm version recognizes previous IDNAME format with repeated underscores and trailing underscore (the repeated underscores are correctly recognized in 9.2, but not the trailing underscore.)

$ sed -e 's/t10.ATA_VBOX_HARDDISK_VB9c10d318-188d9ebc/t10.ATA_____VBOX_HARDDISK___________________________VB9c10d318-188d9ebc_/g' -i /etc/lvm/devices/system.devices
$ pvs /dev/sdh
PV VG Fmt Attr PSize PFree
/dev/sdh hh lvm2 a-- 1020.00m 1020.00m

We are able to reproduce the issue for RHEL9.2 new installations as well.
Steps to reproduce:
1. Go to Azure Marketplace and deploy image of the following reference:
"imageReference": {
"publisher": "RedHat",
"offer": "RHEL",
"sku": "9_2",
"version": "Latest",
"exactVersion": "9.2.2023052501"
},
2. Perform 'pvdisplay' command.
Error: [root@AYAN-2905-RHEL92 ~]# pvdisplay
Devices file sys_wwid naa.6002248040317a85abbd3ecc0f15951e PVID qGGXfRI6eVu8m8W9n6g5UbCHyqhXy6NO last seen on /dev/sda4 not found.
ISO consumed to build the image: RHEL-9.2-RC-1.0
(In reply to Simrat Pal Singh Satia from comment #15)
> We are able to reproduce the issue for RHEL9.2 new installations as well.

When this new 9.2 install is finished, what is the content of /etc/lvm/devices/system.devices?

> Steps to reproduce:
> 1. Go to Azure Marketplace and deploy image of the following reference:
> "imageReference": {
> "publisher": "RedHat",
> "offer": "RHEL",
> "sku": "9_2",
> "version": "Latest",
> "exactVersion": "9.2.2023052501"
> },

I don't understand this step, it's not something I've ever seen before.

> # pvdisplay
> Devices file sys_wwid naa.6002248040317a85abbd3ecc0f15951e PVID qGGXfRI6eVu8m8W9n6g5UbCHyqhXy6NO last seen on /dev/sda4 not found.

The device with this wwid must have been used during the 9.2 installation. Can you check what the current wwid of /dev/sda is?

$ cat /sys/block/sda/device/wwid

(In reply to Simrat Pal Singh Satia from comment #15)
> We are able to reproduce the issue for RHEL9.2 new installations as well.

This should be tracked in a new bug, it's not related to the handling of spaces/underscores in wwid.

[root@rhel92 azureuser]# cat /etc/lvm/devices/system.devices
# LVM uses devices listed in this file.
# Created by LVM command lvmdevices pid 3043 at Mon May 22 14:26:33 2023
VERSION=1.1.4
IDTYPE=sys_wwid IDNAME=naa.6002248040317a85abbd3ecc0f15951e DEVNAME=/dev/sda4 PVID=qGGXfRI6eVu8m8W9n6g5UbCHyqhXy6NO PART=4

Please let me know if this required a new bug or has similar RCA to the existing bug.

Thanks,
Simrat

(In reply to Simrat Pal Singh Satia from comment #19)
> [root@rhel92 azureuser]# cat /etc/lvm/devices/system.devices
> # LVM uses devices listed in this file.
> # Created by LVM command lvmdevices pid 3043 at Mon May 22 14:26:33 2023
> VERSION=1.1.4
> IDTYPE=sys_wwid IDNAME=naa.6002248040317a85abbd3ecc0f15951e
> DEVNAME=/dev/sda4 PVID=qGGXfRI6eVu8m8W9n6g5UbCHyqhXy6NO PART=4

That looks normal.

> Please let me know if this required a new bug or has similar RCA to the
> existing bug.

I don't think you're dealing with trailing spaces on wwids. It sounds like your problem is somehow related to creating/unconfiguring/recreating OS images, so you should create a new bug with the details surrounding how these OS images are meant to be configured for the instances running them. When OS images are copied/cloned, the system.devices file cannot be copied, and needs to be recreated by each OS instance for the device it's using.

Marking Verified: Tested in the latest build. The trailing spaces are no longer present.
kernel-5.14.0-332.el9 BUILT: Mon Jun 26 06:16:51 PM CEST 2023
lvm2-2.03.21-3.el9 BUILT: Thu Jul 13 08:50:26 PM CEST 2023
lvm2-libs-2.03.21-3.el9 BUILT: Thu Jul 13 08:50:26 PM CEST 2023
[root@virt-009 ~]# grep device_id_sysfs_dir /etc/lvm/lvm.conf
device_id_sysfs_dir = "/test/sys/"
[root@virt-009 ~]# ls -l /dev/sdf
brw-rw----. 1 root disk 8, 80 Jul 25 21:36 /dev/sdf
[root@virt-009 ~]# mkdir -p /test/sys/dev/block/8:80/device
[root@virt-009 ~]# echo "t10.ATA VBOX HARDDISK VB9c10d318-188d9ebc " > /test/sys/dev/block/8\:80/device/wwid
[root@virt-009 ~]# pvcreate /dev/sdf
Physical volume "/dev/sdf" successfully created.
[root@virt-009 ~]# cat /etc/lvm/devices/system.devices
# LVM uses devices listed in this file.
# Created by LVM command pvcreate pid 2576021 at Wed Jul 26 21:04:10 2023
VERSION=1.1.13063
IDTYPE=devname IDNAME=/dev/vda2 DEVNAME=/dev/vda2 PVID=efoHgim4mazczPn658nUoSWQtlHz6kjn PART=2
IDTYPE=sys_wwid IDNAME=t10.ATA_VBOX_HARDDISK_VB9c10d318-188d9ebc DEVNAME=/dev/sdf PVID=buapmDqciGJPDooSQ5wYXCo9rXRSiQvw
[root@virt-009 ~]# rm /etc/lvm/devices/system.devices
rm: remove regular file '/etc/lvm/devices/system.devices'? y
[root@virt-009 ~]# lvmdevices --adddev /dev/sdf
[root@virt-009 ~]# cat /etc/lvm/devices/system.devices
# LVM uses devices listed in this file.
# Created by LVM command lvmdevices pid 2576042 at Wed Jul 26 21:04:36 2023
VERSION=1.1.1
IDTYPE=sys_wwid IDNAME=t10.ATA_VBOX_HARDDISK_VB9c10d318-188d9ebc DEVNAME=/dev/sdf PVID=buapmDqciGJPDooSQ5wYXCo9rXRSiQvw
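
The matching rule exercised by the transcript above can be sketched in shell. This is only an illustration of the behaviour described in this report (spaces become underscores, repeated underscores are condensed, trailing underscores are ignored); it is not lvm's implementation, which lives in the lvm2 commit linked earlier. The function names `wwid_to_idname`, `canonical_idname` and `idname_matches` are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only; not lvm's code. All function names are hypothetical.

# Convert a raw sysfs wwid string to the 9.2-style IDNAME form:
# drop trailing whitespace, then turn each run of spaces into one underscore.
wwid_to_idname() {
    printf '%s' "$1" | sed -e 's/[[:space:]]*$//' -e 's/  */_/g'
}

# Reduce an IDNAME to a comparable form: condense repeated underscores
# and drop trailing ones, so a value written by 9.0/9.1 compares equal
# to the value 9.2 derives from the same wwid.
canonical_idname() {
    printf '%s' "$1" | sed -e 's/__*/_/g' -e 's/_*$//'
}

# Succeeds (exit status 0) when the stored IDNAME ($1) and the raw wwid ($2)
# describe the same device under the rule above.
idname_matches() {
    [ "$(canonical_idname "$1")" = "$(canonical_idname "$(wwid_to_idname "$2")")" ]
}
```

For example, `idname_matches "$stored" "$(cat /sys/block/sda/device/wwid)"` would compare an IDNAME taken from system.devices against the current sysfs value; command substitution strips only the trailing newline, so a trailing space in the wwid still reaches the function.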
Any ETA on this?

The change is in 9.3.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (lvm2 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6633

Created attachment 1965214 [details]
First - Boot issue

Description of problem:
After update to 9.2, LVM partitions cannot be mounted due to bad information in /etc/lvm/devices/system.devices. This has been challenging to replicate in some environments; however, VirtualBox has replicated this problem 100% of the times I, or those I've asked to verify, have tried. It does not seem to matter what the OS under VirtualBox is nor the version (all variations are recent/updated though). Replication on a physical machine HAS occurred, but not reliably in my own testing.

Version-Release number of selected component (if applicable):
9.1 -> 9.2

How reproducible:
Every time on devices that appear to have a sys_wwid, such as VirtualBox.

Steps to Reproduce:
For best replication: Install VirtualBox. Download RHEL 9.1 - click through the install, set a root password, and manually adjust the partitions. While this isn't the only layout that has caused problems, it's the one that I've been able to trigger this issue on every single time. Size doesn't matter, just the LVM partition layout:

/boot 1GiB
/ 20 GiB
LVM /home 20 GiB
LVM swap 10 GiB
LVM /var 20 GiB

Let the installer reboot into the distro. Subscribe with auto-attach to get updates. [Might also want to snapshot. ;-)]

Verify you still have 9.1
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 9.1 (Plow)
$ dnf clean all && dnf update -y # should pull in 9.2
$ reboot

See first attached image for /var and /home having issues mounting. Give password for maintenance. Then notice that /var and /home didn't mount, but swap did! ?? ?? *shrug* See second attached picture.
$ lsblk

Try to scan for lvm or mount lvm's. Should get an error that the device last seen isn't found. See third attached picture [please ignore the different LVM ID's - I retested when I wanted more than one screenshot to share]

Now delete /etc/lvm/devices/system.devices and rescan. See fourth attached picture.
Now it is safe to reboot and continue using 9.2

Actual results:
Failed boot due to missing mount points

Expected results:
A successful boot with mount points where they should be.

Additional info:
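
As an alternative to deleting the file, the manual edit mentioned in the comments (removing the excess trailing _ from the IDNAME) could be previewed with a small helper. This is a minimal sketch under assumptions: the helper name `strip_trailing_idname_underscores` is hypothetical, it only prints the adjusted content rather than changing the file, and it assumes each entry has the `IDTYPE=sys_wwid IDNAME=... DEVNAME=...` layout shown in this report.

```shell
#!/bin/sh
# Illustrative sketch only; the helper name is hypothetical.
# Prints the devices file given as $1 with trailing underscores removed from
# IDNAME values on sys_wwid lines. The file itself is not modified, so the
# result can be reviewed (for example with diff) before anything is changed.
strip_trailing_idname_underscores() {
    sed -e '/^IDTYPE=sys_wwid /s/\(IDNAME=[^ ]*[^_ ]\)_* /\1 /' "$1"
}
```

A possible use is `strip_trailing_idname_underscores /etc/lvm/devices/system.devices | diff /etc/lvm/devices/system.devices -` to see which entries would change. Regenerating the file with vgimportdevices -a, as suggested in the comments, avoids hand editing altogether.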