So if I understand correctly, the problem on OCP was caused by a change of disk id.

From: scsi-0QEMU_QEMU_HARDDISK_d844e3a1-4fb0-4411-9
to:   scsi-0QEMU_QEMU_HARDDISK_d844e3a1-4fb0-4411-944b-7789f14ab435

I recall seeing these shorter serial numbers on older RHV... And just reproduced it now, see below:

RHV 4.4, CL 4.3 - qemu-kvm-5.1.0-20.module+el8.3.1+9918+230f5c26.x86_64
-----------------------------------------------------------------------
    <type arch='x86_64' machine='pc-i440fx-rhel7.6.0'>hvm</type>
    ...
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/5f01590d-b3e7-499b-a799-72f3589783e9/images/b78c4cb1-fa98-410e-9542-42c60bd28e02/3350aa21-d12c-4966-a3e6-cfac5b4245dd' index='1'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>b78c4cb1-fa98-410e-9542-42c60bd28e02</serial> <--------
      <boot order='2'/>
      <alias name='ua-b78c4cb1-fa98-410e-9542-42c60bd28e02'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

# udevadm info /dev/sda
E: ID_SERIAL=0QEMU_QEMU_HARDDISK_b78c4cb1-fa98-410e-9542-42c60bd28e02 <------ full serial

Link, full serial:
lrwxrwxrwx. 1 root root 9 May 12 23:50 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_b78c4cb1-fa98-410e-9542-42c60bd28e02 -> ../../sda

RHV 4.4, CL 4.3 - qemu-kvm-rhev-2.12.0-48.el7_9.2.x86_64
-----------------------------------------------
    <type arch='x86_64' machine='pc-i440fx-rhel7.6.0'>hvm</type>
    ...
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/5f01590d-b3e7-499b-a799-72f3589783e9/images/b78c4cb1-fa98-410e-9542-42c60bd28e02/3350aa21-d12c-4966-a3e6-cfac5b4245dd'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>b78c4cb1-fa98-410e-9542-42c60bd28e02</serial>
      <boot order='1'/>
      <alias name='ua-b78c4cb1-fa98-410e-9542-42c60bd28e02'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

# udevadm info /dev/sda | grep SERIAL
E: ID_SERIAL=0QEMU_QEMU_HARDDISK_b78c4cb1-fa98-410e-9 <------ missing

link, serial is :
lrwxrwxrwx. 1 root root 9 May 13 00:30 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_b78c4cb1-fa98-410e-9 -> ../../sda

So the XML given to libvirt is the same, and the qemu-kvm command line contains the full serial.

-drive file=/rhev/data-center/mnt/blockSD/5f01590d-b3e7-499b-a799-72f3589783e9/images/b78c4cb1-fa98-410e-9542-42c60bd28e02/3350aa21-d12c-4966-a3e6-cfac5b4245dd,format=raw,if=none,id=drive-ua-b78c4cb1-fa98-410e-9542-42c60bd28e02,serial=b78c4cb1-fa98-410e-9542-42c60bd28e02,werror=stop,rerror=stop,cache=none,aio=native
-device scsi-hd,bus=ua-9ca4a624-8c69-4860-bfac-2abea5c72381.0,channel=0,scsi-id=0,lun=0,drive=drive-ua-b78c4cb1-fa98-410e-9542-42c60bd28e02,id=ua-b78c4cb1-fa98-410e-9542-42c60bd28e02,bootindex=1,write-cache=o

I assume this is a difference between old (el7, 2.12) and newer (el8, 5.1) qemu-kvm, where the older version does not pass the full serial; maybe it doesn't fit somewhere.

So I don't think this is a RHV bug. And maybe not even a qemu-kvm/libvirt bug, as the behaviour of the new version seems more correct than the old (full serial).
If this is production down, here is a *very* ugly hack if you want to truncate the disk serial number on newer RHV, so that it looks like the truncated serial on RHV 4.3/RHEL7 hypervisors.

Place this file on the RHV hypervisor, as executable:

# cat /usr/libexec/vdsm/hooks/before_vm_start/99_truncate_serial
#!/usr/bin/python3

import hooking


def main():
    domxml = hooking.read_domxml()
    for disk in domxml.getElementsByTagName('disk'):
        if disk.getAttribute('device') != 'disk':
            continue
        serials = disk.getElementsByTagName('serial')
        # Skip disks that carry no <serial> element (or an empty one).
        if not serials or serials[0].firstChild is None:
            continue
        text = serials[0].firstChild
        # Keep only the first 20 characters, as seen on RHEL7 hosts.
        text.data = text.data[:20]
    hooking.write_domxml(domxml)


if __name__ == "__main__":
    main()

From now on, any VM started on this host will have its serial truncated to the first 20 chars of the uuid, so that it looks similar to RHV4.3/RHEL7.

# ll /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_b78c4cb1-fa98-410e-9
lrwxrwxrwx. 1 root root 9 May 13 01:43 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_b78c4cb1-fa98-410e-9 -> ../../sda

Note that it will affect all VMs started on the host, so use with care. It can be improved to run just on select VMs by using a custom property.
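The truncation step of the hook can be tried outside vdsm before deploying it. The following is only a sketch under assumptions: it replaces the `hooking` module with a plain `xml.dom.minidom` document, and the function name `truncate_disk_serials`, the constant `SERIAL_LIMIT` and the `SAMPLE` domain XML are made up for illustration (the serial value is the one from this report).

```python
from xml.dom import minidom

# Length observed on the RHEL7-based hosts in this report (assumption: 20).
SERIAL_LIMIT = 20

# Hypothetical, heavily trimmed domain XML used only for this illustration.
SAMPLE = """<domain>
  <devices>
    <disk type='file' device='cdrom'>
      <target dev='sdc' bus='sata'/>
    </disk>
    <disk type='block' device='disk'>
      <target dev='sda' bus='scsi'/>
      <serial>b78c4cb1-fa98-410e-9542-42c60bd28e02</serial>
    </disk>
  </devices>
</domain>"""


def truncate_disk_serials(domxml, limit=SERIAL_LIMIT):
    """Truncate the <serial> of every <disk device='disk'> in place.

    Returns the number of serials that were actually shortened.
    """
    changed = 0
    for disk in domxml.getElementsByTagName('disk'):
        if disk.getAttribute('device') != 'disk':
            continue
        for serial in disk.getElementsByTagName('serial'):
            text = serial.firstChild
            # Ignore empty <serial/> elements and serials that already fit.
            if text is None or len(text.data) <= limit:
                continue
            text.data = text.data[:limit]
            changed += 1
    return changed


if __name__ == "__main__":
    doc = minidom.parseString(SAMPLE)
    print(truncate_disk_serials(doc), "serial(s) truncated")
    print(doc.getElementsByTagName('serial')[0].firstChild.data)
```

Disks without a `<serial>` element (the cdrom in the sample) are left untouched, which is the case the one-line indexing in a quick hack tends to trip over.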
(In reply to Germano Veit Michel from comment #7)
> If this is production down, here is a *very* ugly hack if you want to
> truncate the disk serial number on newer RHV, so that it looks like the
> truncated serial on RHV 4.3/RHEL7 hypervisors.

Another possible workaround (I haven't tried it, though) could be switching to virtio-block. It is likely that this issue affects only virtio-scsi disks (based on input from pkrempa and commit https://gitlab.com/libvirt/libvirt/-/commit/a1dce96236f6d35167924fa7e6a70f58f394b23c).
(In reply to Michal Skrivanek from comment #11)
> (In reply to Germano Veit Michel from comment #7)
> > If this is production down, here is a *very* ugly hack if you want to
> > truncate the disk serial number on newer RHV, so that it looks like the
> > truncated serial on RHV 4.3/RHEL7 hypervisors.
>
> another possible workaround (i haven't tried though) could be switching to
> virtio-block. It is likely that this issue is happening only to virti-scsi
> disks. (based on input from pkrempa and commit
> https://gitlab.com/libvirt/libvirt/-/commit/
> a1dce96236f6d35167924fa7e6a70f58f394b23c)

Well, it does truncate the serial, but it changes the id anyway due to the interface change:

lrwxrwxrwx. 1 root root 9 May 13 01:43 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_b78c4cb1-fa98-410e-9 -> ../../sda
VS
lrwxrwxrwx. 1 root root 9 May 13 19:46 /dev/disk/by-id/virtio-b78c4cb1-fa98-410e-9 -> ../../vda

So I don't think this will help much...
(In reply to Germano Veit Michel from comment #13)
> (In reply to Michal Skrivanek from comment #11)
> > (In reply to Germano Veit Michel from comment #7)
> > > If this is production down, here is a *very* ugly hack if you want to
> > > truncate the disk serial number on newer RHV, so that it looks like the
> > > truncated serial on RHV 4.3/RHEL7 hypervisors.
> >
> > another possible workaround (i haven't tried though) could be switching to
> > virtio-block. It is likely that this issue is happening only to virti-scsi
> > disks. (based on input from pkrempa and commit
> > https://gitlab.com/libvirt/libvirt/-/commit/
> > a1dce96236f6d35167924fa7e6a70f58f394b23c)
>
> Well, it does truncate the serial, but it changes the id anyway due to the
> interface change:
>
> lrwxrwxrwx. 1 root root 9 May 13 01:43
> /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_b78c4cb1-fa98-410e-9 -> ../../sda
> VS
> lrwxrwxrwx. 1 root root 9 May 13 19:46
> /dev/disk/by-id/virtio-b78c4cb1-fa98-410e-9 -> ../../vda
>
> So I don't think this will help much...

Sure, it's a completely different interface from the guest's POV, but it should stay consistent between 4.3 and 4.4 hosts.
This seems to be a libvirt/qemu change in behavior between RHEL 7 and RHEL 8 (possibly related to commit https://gitlab.com/libvirt/libvirt/-/commit/a1dce96236f6d35167924fa7e6a70f58f394b23c).

Moving to libvirt.
> Also, I have to reiterate my concern about using /dev/disk/by-* symlinks. Mission critical disk enumeration should be done by reaching to udev directly, e.g. via libudev or its bindings. That's the authoritative source of storage configuration, is able to handle duplicate identifiers and is a race-free access. The /dev/disk/by-* symlinks are being randomly overwritten in case a duplicate identifier (e.g. multipath) appear in the system.

@Tomas - I am the maintainer of local-storage-operator (LSO) in OCP and have some follow-up questions. There are two ways LSO can be configured: in one mode the user specifies individual devices (via /dev/sda or /dev/disk/by-id/wwwwwww), in the other they ask LSO to "claim" all available devices. I would like us to fix the device discovery process using libudev or something similar for the second option (because the user is not specifying devices anyway). Does calling udevadm with the following command:

udevadm info --query=path --name=/dev/sda

work the same as using libudev? Do you have any more pointers?

For the first option, where the user/admin manually specifies individual device ids via a LocalVolume object:

apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "example"
spec:
  storageClassDevices:
    - storageClassName: "foobar"
      volumeMode: Filesystem
      fsType: ext4
      devicePaths:
        - /dev/disk/by-id/wwwwwwww

This is a bit more tricky. I am not sure if it is "fixable" unless LSO decides not to use the device that the user provisioned.
Reproduced.

Start a VM with a scsi disk like this:

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/jgao.qcow2'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>b78c4cb1-fa98-410e-9542-42c60bd28e02</serial>
      <alias name='ua-b78c4cb1-fa98-410e-9542-42c60bd28e02'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

Then check ID_SERIAL and ID_SCSI_SERIAL.

For the VM on libvirt-4.5.0-36.el7_9.3.x86_64 qemu-kvm-rhev-2.12.0-48.el7_9.2.x86_64:

[root@fedora ~]# udevadm info --name sda|grep -E '(ID_SERIAL|ID_SCSI_SERIAL)'
E: ID_SERIAL=0QEMU_QEMU_HARDDISK_b78c4cb1-fa98-410e-9
E: ID_SERIAL_SHORT=b78c4cb1-fa98-410e-9
E: ID_SCSI_SERIAL=b78c4cb1-fa98-410e-9542-42c60bd28e02

For the VM on libvirt-7.3.0-1.el8.x86_64 qemu-kvm-5.2.0-12.module+el8.4.0+10354+98272afe.x86_64:

E: ID_SERIAL=0QEMU_QEMU_HARDDISK_b78c4cb1-fa98-410e-9542-42c60bd28e02
E: ID_SERIAL_SHORT=b78c4cb1-fa98-410e-9542-42c60bd28e02
E: ID_SCSI_SERIAL=b78c4cb1-fa98-410e-9542-42c60bd28e02

The ID_SERIAL values differ between the two hosts, while ID_SCSI_SERIAL is the same. That's why the bug happens after the OS upgrade.
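To compare the two hosts mechanically rather than by eye, the `E:` lines can be parsed into dictionaries and diffed. This is a rough sketch: the helper names `parse_udev_properties` and `differing_keys` are made up here, and the two sample strings are simply the udevadm outputs pasted above.

```python
# Outputs copied from the reproduction above.
EL7_OUTPUT = """\
E: ID_SERIAL=0QEMU_QEMU_HARDDISK_b78c4cb1-fa98-410e-9
E: ID_SERIAL_SHORT=b78c4cb1-fa98-410e-9
E: ID_SCSI_SERIAL=b78c4cb1-fa98-410e-9542-42c60bd28e02
"""

EL8_OUTPUT = """\
E: ID_SERIAL=0QEMU_QEMU_HARDDISK_b78c4cb1-fa98-410e-9542-42c60bd28e02
E: ID_SERIAL_SHORT=b78c4cb1-fa98-410e-9542-42c60bd28e02
E: ID_SCSI_SERIAL=b78c4cb1-fa98-410e-9542-42c60bd28e02
"""


def parse_udev_properties(output):
    """Turn the 'E: KEY=VALUE' lines of `udevadm info` into a dict."""
    props = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("E: "):
            continue  # skip P:, N:, S: and any other record types
        key, sep, value = line[3:].partition("=")
        if sep:
            props[key] = value
    return props


def differing_keys(a, b):
    """Return the sorted keys present in both dicts whose values differ."""
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])


if __name__ == "__main__":
    el7 = parse_udev_properties(EL7_OUTPUT)
    el8 = parse_udev_properties(EL8_OUTPUT)
    for key in differing_keys(el7, el8):
        print(key, el7[key], "->", el8[key])
```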
Hi,

I've investigated the history of how we've ended up with this change. I've summarized it in an upstream mailing list post:

https://listman.redhat.com/archives/libvir-list/2021-June/msg00066.html

A short summary is that qemu is silently truncating the serial in 3 different ways depending on how it's configured and how it's queried.

Additionally, the change was made more than 2 years ago, and reverting to silent truncation to less than we do now would also be considered a regression, thus I don't think we can revert to 20 characters for now.

Looking at the logs/sosreports above, I can see that even the old version, where the serial was truncated in one of the VPD pages, already contains a qemu patch (see the message above where it's mentioned) which raised the limit to 36 (to fit a UUID) on the second query.

This can be seen in the following snippet from Comment #2:

P: /devices/pci0000:00/0000:00:06.0/virtio2/host0/target0:0:0/0:0:0:4/block/sdb | P: /devices/pci0000:00/0000:00:06.0/virtio2/host0/target0:0:0/0:0:0:4/block/sdb
N: sdb | N: sdb
S: disk/by-id/scsi-0QEMU_QEMU_HARDDISK_cb66542c-10d4-4578-8 | S: disk/by-id/scsi-0QEMU_QEMU_HARDDISK_e1baf1ef-e053-45be-9a51-64dd2fe59c60
S: disk/by-id/scsi-SQEMU_QEMU_HARDDISK_cb66542c-10d4-4578-8955-92265791f1ed | S: disk/by-id/scsi-SQEMU_QEMU_HARDDISK_e1baf1ef-e053-45be-9a51-64dd2fe59c60

The link starting with '/dev/disk/by-id/scsi-S' already contains the full UUID in both cases, thus if that link is used instead the result will be identical in both deployments.

I've also filed the following 2 qemu bugs to prevent silent truncation and to synchronise the reported serial lengths:

https://bugzilla.redhat.com/show_bug.cgi?id=1967666
https://bugzilla.redhat.com/show_bug.cgi?id=1967668

Another possible stop-gap is to limit the values used in <serial> to 20 characters, which behave the same in both versions.
After confirming with upstream that reverting to silent truncation of the serial to 20 characters would be a regression too from the upstream point of view, I've added the following notice to the documentation:

commit 2c8b341af83501730f3767c8738e909ef72f6a28
Author: Peter Krempa <pkrempa>
Date:   Fri Jun 4 14:08:40 2021 +0200

    docs: formatdomain: Document disk serial truncation status quo

    Disk serials are truncated arbitrarily and silently by qemu depending on
    the device type and how they are configured. Since changing the current
    state would lead to more regressions than we have now, document that the
    truncation is arbitrary.

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Michal Privoznik <mprivozn>

diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index c1ee9fedda..da4d93a787 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -3146,6 +3146,16 @@ paravirtualized driver is specified via the ``disk`` element.
    may look like ``<serial>WD-WMAP9A966149</serial>``. Not supported for
    scsi-block devices, that is those using disk ``type`` 'block' using
    ``device`` 'lun' on ``bus`` 'scsi'. :since:`Since 0.7.1`
+
+   Note that depending on hypervisor and device type the serial number may be
+   truncated silently. IDE/SATA devices are commonly limited to 20 characters.
+   SCSI devices depending on hypervisor version are limited to 20, 36 or 247
+   characters.
+
+   Hypervisors may also start rejecting overly long serials instead of
+   truncating them in the future so it's advised to avoid the implicit
+   truncation by testing the desired serial length range with the desired device
+   and hypervisor combination.
 ``wwn``
    If present, this element specifies the WWN (World Wide Name) of a virtual
    hard disk or CD-ROM drive. It must be composed of 16 hexadecimal digits.

Unfortunately, from libvirt's point of view the resolution would be CLOSED_CANTFIX. Returning to RHV to investigate other possible solutions. In case there's none, please close as CANTFIX.
Peter - can you provide a recommendation to OCP about which symlink to use (in case symlinks aren't user specified) if a device has multiple QEMU/libvirt-generated symlinks? I would probably need to ensure that, whenever possible, local-storage-operator uses the longest (recommended?) version.
The symlinks in /dev/disk/by-id/ are generated by udev based on the disk identification data. Now, based on what I've summarized in the upstream mailing list post, the following applies:

There are two identification VPD pages of a SCSI disk:

0x80:
 - This is present only when a serial number is configured
 - results in symlinks with prefix of /dev/disk/by-id/scsi-S
 - length is limited to 36 chars (fits a full UUID including dashes)
 - the limit of 36 chars is since qemu-2.7
   - most of rhel-av-7 series
   - all of rhel-8 qemus

0x83:
 - if serial is not configured, it contains the device alias (limited to 247 chars)
 - if serial is configured, it contains the serial (limited to 20 chars, old versions, 247 chars new versions)
 - results in udev symlink with prefix of /dev/disk/by-id/scsi-0

Based on the above and the data reported from pre-upgrade and post-upgrade systems, I'd suggest using the symlinks starting with the prefix '/dev/disk/by-id/scsi-S', which in both cases fit a UUID fully.

In the future I hope that qemu will unify the handling of the serial numbers and also reject non-conformant serials rather than silently truncate them.
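The limits listed above can be turned into a small model that predicts which by-id names a given serial produces on an old and a new hypervisor. This is a sketch under assumptions, not a description of udev or qemu internals: it only covers the case where a serial is configured, it takes the `QEMU_QEMU_HARDDISK_` vendor/model string from the symlinks pasted in this bug, it assumes qemu is at least 2.7 (so the 0x80 limit is 36), and the name `expected_links` is made up.

```python
# Limits taken from the summary above.
VPD80_LIMIT = 36        # page 0x80, qemu >= 2.7
VPD83_LIMIT_OLD = 20    # page 0x83, old qemu versions
VPD83_LIMIT_NEW = 247   # page 0x83, new qemu versions

# Vendor/model part as it appears in the symlinks quoted in this bug.
VENDOR_MODEL = "QEMU_QEMU_HARDDISK_"
BY_ID = "/dev/disk/by-id/"


def expected_links(serial, old_qemu):
    """Predict the scsi-0 and scsi-S link names for a configured serial."""
    limit83 = VPD83_LIMIT_OLD if old_qemu else VPD83_LIMIT_NEW
    return {
        "scsi-0": BY_ID + "scsi-0" + VENDOR_MODEL + serial[:limit83],
        "scsi-S": BY_ID + "scsi-S" + VENDOR_MODEL + serial[:VPD80_LIMIT],
    }


if __name__ == "__main__":
    serial = "cb66542c-10d4-4578-8955-92265791f1ed"
    old = expected_links(serial, old_qemu=True)
    new = expected_links(serial, old_qemu=False)
    for kind in ("scsi-0", "scsi-S"):
        status = "stable" if old[kind] == new[kind] else "CHANGES"
        print(kind, status)
        print("  old:", old[kind])
        print("  new:", new[kind])
```

For a 36-character UUID serial, only the scsi-0 name differs between the two cases, which matches the pre-upgrade and post-upgrade data in this bug.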
Seems we now understand the reasons for this problem and also how to avoid it. Is there anything else to look at in RHV or OCP on RHV or is this now rather a documentation of OCP in general, or something LSO specific?
(In reply to Hemant Kumar from comment #19)
> @Tomas - I am the maintainer of local-storage-operator(LSO) in OCP and have
> some follow up questions. There are two ways LSO can be configured - in one
> mode user specifies individual devices via (/dev/sda or
> /dev/disk/by-id/wwwwwww) or they can ask LSO to "claim" all available
> devices. I would like us to fix device discovery process using libudev or
> something similar for second option (because user is not specifying devices
> anyways). Does calling udevadm via following command:
>
> udevadm info --query=path --name=/dev/sda
>
> works same as using libudev? You have any more pointers?

Yes, it's reaching to the same database. However, this way you get a composite path containing multiple layers with totally irrelevant and volatile information like port numbers, HBA bus ID and assigned kernel block device name.

> For first option - where user/admin manually specifies individual device-ids
> via LocalVolume object:
>
> apiVersion: "local.storage.openshift.io/v1"
> kind: "LocalVolume"
> metadata:
>   name: "example"
> spec:
>   storageClassDevices:
>     - storageClassName: "foobar"
>       volumeMode: Filesystem
>       fsType: ext4
>       devicePaths:
>         - /dev/disk/by-id/wwwwwwww

My point was to stop using device paths and use e.g. filesystem UUID, partition UUID and similar unique identifiers. Of course there's always a risk of duplicate identifiers present in the system, and care must be taken in choosing the most unique one under given conditions.
Thanks for all the updates. There's not much we can do about it in RHV or libvirt so I'm moving this back to the original OCP bug 1954916. Closing with resolution from comment #22, please continue the discussion on potential LSO changes in bug 1954916
> My point was to stop using device paths and use e.g. filesystem UUID, partition UUID and similar unique identifiers. Of course there's always a risk of duplicate identifiers present in the system and care must be taken in choosing the most unique one under given conditions.

We can't use a filesystem UUID; that implies there must be a filesystem on the device before it can be used in a workload. There is a whole set of applications that consume raw block devices.

> Thanks for all the updates. There's not much we can do about it in RHV or libvirt so I'm moving this back to the original OCP.

These device ids were specified by end users in their applications. Say an application consumes a raw block device from /dev/disk/by-id/qemu-xxxx, and after a minor upgrade and reboot the symlink is no longer created. OCP is basically just an agent between the end user and RHV; on its own it is not picking anything (at least in this case).

> 0x83:
>  - if serial is not configured, it contains the device alias (limited to 247 chars)
>  - if serial is configured, it contains the serial (limited to 20 chars, old versions, 247 chars new versions)
>  - results in udev symlink with prefix of /dev/disk/by-id/scsi-0

So the problem seems to be that the older version generated those 20-char serials but the newer version generates 247-char serials? Is that accurate? What happens if both the 20-char and the 247-char serials are generated at the same time? Is that a breaking change?
What do you mean by "at the same time"? It's just that when you boot the VM on a RHEL7-based hypervisor it will contain the truncated UUID, and on a RHEL8-based hypervisor it will contain the whole UUID.

Isn't it better to use /dev/disk/by-id/scsi-S as Petr describes in comment #24? That should remain the same.
Thanks for the explanation.

> Isn't it better to use /dev/disk/by-id/scsi-S as Petr describes in comment #24? That should remain the same.

LSO currently takes the first symlink found in /dev/disk/by-id that points to the right disk, which seems to be the bad one. How can it determine which symlink is the right one, without knowing anything about QEMU limitations in various versions? LSO runs on many platforms and so far we have avoided platform-specific hacks.
(In reply to Jan Safranek from comment #30)
> LSO currently takes the first symlink found in /dev/disk/by-id that points
> to the right disk, which seems to be the bad one. How can it determine which
> symlink is the right one, without knowing anything about QEMU limitations in
> various versions?

You might get some inspiration from the SCSI identifier sorting that we implemented for lsscsi:
https://github.com/doug-gilbert/lsscsi/blob/main/src/lsscsi.c#L1698

Related RHEL bugzilla: bug 1846566

Note that there are various interpretations of the SCSI VPD standard pages and thus the identifier preference might vary.
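One simple shape such a preference could take is a prefix-ordered pick with a fallback to the current "first symlink found" behaviour. The sketch below is hypothetical: it is neither the lsscsi sorting linked above nor LSO code, the function name `pick_by_id_link` and the default preference tuple are assumptions, and the only ordering it encodes is the suggestion from this bug to prefer the scsi-S links.

```python
import posixpath


def pick_by_id_link(links, preferred_prefixes=("scsi-S",)):
    """Pick one /dev/disk/by-id link for a disk.

    Links whose basename starts with an earlier entry of
    `preferred_prefixes` win; otherwise the first link is returned,
    which mirrors the "first symlink found" behaviour. Returns None
    when `links` is empty.
    """
    for prefix in preferred_prefixes:
        for link in links:
            if posixpath.basename(link).startswith(prefix):
                return link
    return links[0] if links else None


if __name__ == "__main__":
    # Link names taken from the pre-upgrade system quoted in this bug.
    links = [
        "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_cb66542c-10d4-4578-8",
        "/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_cb66542c-10d4-4578-8955-92265791f1ed",
    ]
    print(pick_by_id_link(links))
```

Keeping the preference list as a parameter leaves room for other identifier types on other platforms, since, as noted above, the right ordering might vary.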
> what do you mean by "at the same time"? It's just that when you boot the VM on RHEL7-based hypervisor it will contain truncated UUID, and on RHEL8-based hypervisor it will contain the whole UUID.

Thanks for the answers. What I was trying to ask is: would it be a breaking change if the RHEL8-based hypervisor (my apologies for simplifying who generates those ids) generated both the 20-char and the 247-char UUIDs? I don't see how it could be a breaking change, since folks who depend on the full UUID will have it, and those who upgraded from RHEL-7 (or an older version) can keep using the shorter version of the UUID if they were using it.

> Isn't it better to use /dev/disk/by-id/scsi-S as Petr describes in comment #24? That should remain the same.

In our case, it is an end user who asked us to use a device in /dev/disk/by-id/scsi-xxx, and as per our own docs - https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/persistent_naming

"Red Hat Enterprise Linux automatically maintains the proper mapping from the WWID-based device name to a current /dev/sd name on that system. Applications can use the /dev/disk/by-id/ name to reference the data on the disk, even if the path to the device changes, and even when accessing the device from different systems."

So we are saying that the names in /dev/disk/by-id/* are not actually persistent and may disappear on upgrade. This breaks the entire premise of persistent disk names. I don't know why it is hard to see.

To answer your question: yes, we could use the /dev/disk/by-id/scsi-Sxx name as Petr described, but that is not what the user asked us to use. We will be implementing a workaround and hardcoding a heuristic which is very much platform specific and breaks user expectations (the user asked us to use device-A and we are going behind the user's back and using the symlink for device-B because reasons!).
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days