Description of problem:
Sometimes after pvcreate I only get a single change event for the pv, and sometimes ID_FS_TYPE is unset in the properties associated with that event. Other times I might get two or even three events; usually all but the first have the correct ID_FS_TYPE. Sometimes when I only get a single event it has the correct ID_FS_TYPE.

Version-Release number of selected component (if applicable):
lvm2-2.02.140-3.fc24
systemd-228-7.gite35a787.fc24

How reproducible:
This is an intermittent failure. I have only hit it in qemu/kvm VMs running both F23 and current rawhide. I have reproduced using a virtio partition, a virtio disk, and a SCSI disk. I was unable to reproduce on a partition on bare metal, although that does not necessarily prove anything, as I only tried ten or so times.

Steps to Reproduce:
1. (shell 1) udevadm monitor --udev --property
2. (shell 2) pvcreate <blockdev>
3. repeat, with pvremove in between, until the pvcreate yields a single change event with ID_FS_TYPE= (generally takes me no more than five tries)

Actual results:
ID_FS_TYPE is unset right after running pvcreate

Expected results:
ID_FS_TYPE=LVM2_member

Additional info:
I can provide a debug journal, pvcreate -vvv output, etc. as needed, but this is not difficult to reproduce in my experience.
[root@localhost ~]# udevadm settle
[root@localhost ~]# pvremove /dev/vdb
  Labels on physical volume "/dev/vdb" successfully wiped
[root@localhost ~]# udevadm settle
[root@localhost ~]# pvcreate /dev/vdb
  Physical volume "/dev/vdb" successfully created
[root@localhost ~]# udevadm info --name=vdb
P: /devices/pci0000:00/0000:00:08.0/virtio3/block/vdb
N: vdb
E: DEVNAME=/dev/vdb
E: DEVPATH=/devices/pci0000:00/0000:00:08.0/virtio3/block/vdb
E: DEVTYPE=disk
E: MAJOR=252
E: MINOR=16
E: MPATH_SBIN_PATH=/sbin
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=4412657

The other shell showed the following:

UDEV  [47769.180549] change   /devices/pci0000:00/0000:00:08.0/virtio3/block/vdb (block)
.ID_FS_TYPE_NEW=
ACTION=change
DEVNAME=/dev/vdb
DEVPATH=/devices/pci0000:00/0000:00:08.0/virtio3/block/vdb
DEVTYPE=disk
ID_FS_TYPE=
MAJOR=252
MINOR=16
MPATH_SBIN_PATH=/sbin
SEQNUM=3641
SUBSYSTEM=block
TAGS=:systemd:
USEC_INITIALIZED=4412657
Please provide a log of this problem (with the proper device of your choice):

while : ; do
    udevadm settle
    echo REMOVED
    udevadm info /dev/loop0
    pvcreate /dev/loop0
    udevadm settle
    echo CREATED
    udevadm info /dev/loop0
    pvremove /dev/loop0
done

where you observe the bug. If you hit the issue in this loop, capture the disk content with 'dd' in cached mode as well as with a direct-io read. The first suspect here is virt-machine caching.
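A minimal sketch of the cached-vs-direct-io comparison suggested above (DEV is a placeholder; it falls back to a scratch file so the commands can be exercised without a real device):

```shell
# Hedged sketch: compare a cached read with a direct-io read of the same
# region. DEV is a placeholder; point it at the device under test. Falls
# back to a scratch file so the commands stay runnable for a dry run.
DEV=${DEV:-/tmp/cache-compare.img}
[ -e "$DEV" ] || dd if=/dev/urandom of="$DEV" bs=4096 count=256 2>/dev/null

# Read the first 1 MiB through the page cache, then with O_DIRECT.
cached=$(dd if="$DEV" bs=4096 count=256 2>/dev/null | md5sum | cut -d' ' -f1)
direct=$(dd if="$DEV" bs=4096 count=256 iflag=direct 2>/dev/null | md5sum | cut -d' ' -f1)

if [ "$cached" = "$direct" ]; then
    echo "cached and direct-io reads agree ($cached)"
else
    echo "MISMATCH: cached=$cached direct=$direct"
fi
```

If the two checksums ever differ right after pvcreate, that would point at stale cached data rather than at udev.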
Created attachment 1118517 [details]
debug data demonstrating issue

This was reproduced with caching disabled for vdb. Note that the udev db is not consistent with the disk contents. This system is not doing anything else aside from running the OS, so this is not an issue of resource contention or anything like that.
Created attachment 1118518 [details]
same journal snippet but with long lines intact
kernel-4.5.0-0.rc0.git1.1.fc24.x86_64
Well, there are normally 5 device "open RW" calls with subsequent "close" calls, which should produce 5 events (if I slow down the code, there really are 5 events generated!). The reason we see only 1 (or 2, or 3, or...) is that udev coalesces these events if they are very near each other in time. I have a suspicion that the source of the bug is somewhere around this logic...

So, let's try to create a simple shell script (let's call it "run_blkid.sh", placed directly in /):

#!/bin/bash
/usr/sbin/blkid -p "$@" &> /run/run_blkid.log

Then add an associated udev rule to call this script (/lib/udev/rules.d/60-persistent-storage.rules):

# probe filesystem metadata of disks
KERNEL!="sr*", IMPORT{program}="/run_blkid.sh $devnode"
KERNEL!="sr*", IMPORT{builtin}="blkid"

This way, we can compare what blkid sees when running it directly, saving the log to /run/run_blkid.log. If we find that blkid -p and the builtin blkid differ, there's a bug somewhere in blkid. If they do not differ, it's a bug in udev that probably coalesces events in an odd way so that some events are dropped when they shouldn't be.

David, can you try this?
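Once the wrapper rule above is in place, one quick way to compare the two views after a reproduction might be the following (hedged sketch; DEV and LOG are placeholders matching the setup described above):

```shell
# Hedged sketch: compare the ID_FS_TYPE recorded in the udev db with the
# TYPE the external blkid wrapper logged. DEV and LOG are placeholders
# following the wrapper setup described in this comment.
DEV=${DEV:-/dev/vdb}
LOG=${LOG:-/run/run_blkid.log}

db_type=$(udevadm info --name="$DEV" 2>/dev/null | grep '^E: ID_FS_TYPE=' || true)
log_type=$(grep -o 'TYPE="[^"]*"' "$LOG" 2>/dev/null || true)

echo "udev db : ${db_type:-ID_FS_TYPE not set}"
echo "blkid -p: ${log_type:-no TYPE in wrapper log}"
```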
...what kind of device is it on the host that the virtio device is bound to?
(In reply to Peter Rajnoha from comment #6)
> that udev is coalescing these events if they're very near each other in
> time. I have a suspicion that the source of the bug is somewhere here around
> this logic...

(Not exactly udev, but the inotify mechanism. We had a very similar bug once - bug #1084944 - but in the end it was not reproducible anymore.)
(In reply to Peter Rajnoha from comment #8)
> (Not exactly udev, but the inotify mechanism. We had very similar bug once -
> bug #1084944, but in the end it was not reproducible anymore.)

(Hmm, and that was also hit for virtio devices!)
I've been using file-backed virtio devices.
$ cat /tmp/run_blkid.log
/dev/vdb2: PART_ENTRY_SCHEME="gpt" PART_ENTRY_UUID="81471727-d151-4d3f-a9cc-dfb149be42ac" PART_ENTRY_TYPE="0fc63daf-8483-4772-8e79-3d69d8477de4" PART_ENTRY_NUMBER="2" PART_ENTRY_OFFSET="5859328" PART_ENTRY_SIZE="1953792" PART_ENTRY_DISK="252:16"
OK, thanks. This looks like blkid is still seeing old data even after the pvcreate - so a caching problem. Though we're using O_DIRECT, it may still not be enough.

Please try the following scratch builds and see if you can still reproduce. They use different ways of forcing flushing/syncing of data to disk before closing the device after writing, so it's worth trying both. Please run with -vvvv so we can check the debug log afterwards.

(this one makes the dev_flush call unconditional while closing the device, which then calls the BLKFLSBUF ioctl on that device)
http://koji.fedoraproject.org/koji/taskinfo?taskID=12769665

(this one calls fdatasync directly on the device)
http://koji.fedoraproject.org/koji/taskinfo?taskID=12769850

If neither of these two helps, it must be a bug somewhere in the layers underneath - in the way caching is done for virtio devices bound to files on the host system.
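For reference, a rough manual equivalent of the flushes the two builds add, which might help check by hand whether an explicit flush changes what blkid reports (hedged sketch; TARGET is a placeholder, and the block-device branch only applies on a real device):

```shell
# Hedged sketch: force by hand the flushes the two scratch builds add.
# TARGET is a placeholder; point it at the device under test. Falls back
# to a scratch file so the non-privileged path stays runnable.
TARGET=${TARGET:-/tmp/flush-demo.img}
[ -e "$TARGET" ] || dd if=/dev/zero of="$TARGET" bs=4096 count=1 2>/dev/null

if [ -b "$TARGET" ]; then
    # BLKFLSBUF ioctl, roughly what the unconditional dev_flush build does
    blockdev --flushbufs "$TARGET"
else
    echo "skipping BLKFLSBUF: $TARGET is not a block device"
fi

# fdatasync on the node, roughly what the second build does (coreutils sync -d)
sync -d "$TARGET" && echo "fdatasync done"
```

After a pvcreate on the real device, running these before `blkid -p` would show whether an explicit flush makes the new LVM2_member signature visible.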
I was able to reproduce with both test packages. I only ran a dnf update in between -- no reboot, etc.
(In reply to David Lehman from comment #3)
> Created attachment 1118517 [details]
> debug data demonstrating issue
>
> This was reproduced with caching disabled for vdb. Note that the udev db is
> not consistent with the disk contents.

Have you tried the "directsync" cache mode in qemu/virt-manager? Is it reproducible even with that one?
I was able to reproduce using "directsync", although it may have taken more attempts. I had to try pvremove+pvcreate an average of perhaps three times to reproduce with cache mode "none", while I have only reproduced once so far in at least twenty attempts with "directsync".
(In reply to David Lehman from comment #15)
> I was able to reproduce using "directsync", although it may have taken more
> attempts. I had to try pvremove+pvcreate an average of perhaps three times
> to reproduce with cache mode "none", while I have only reproduced once so
> far in at least twenty attempts with "directsync".

Very odd - this looks like a real bug in the caching handling. Can you try older kernels? (...the guest is rawhide, but what is running on the host?)
This bug appears to have been reported against 'rawhide' during the Fedora 24 development cycle. Changing version to '24'. More information and reason for this action is here: https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora24#Rawhide_Rebase
This message is a reminder that Fedora 24 is nearing its end of life. Approximately two weeks from now, Fedora will stop maintaining and issuing updates for Fedora 24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '24'.

Package Maintainer: if you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue, and we are sorry that we were not able to fix it before Fedora 24 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result, we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release.

If you experience problems, please add a comment to this bug. Thank you for reporting this bug, and we are sorry it could not be fixed.