Bug 1300814 - ID_FS_TYPE unset in change event following pvcreate
Summary: ID_FS_TYPE unset in change event following pvcreate
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 24
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: LVM and device-mapper development team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-01-21 19:34 UTC by David Lehman
Modified: 2017-08-08 12:41 UTC
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-08 12:41:23 UTC
Type: Bug


Attachments:
debug data demonstrating issue (2.70 KB, text/plain), 2016-01-26 13:46 UTC, David Lehman
same journal snippet but with long lines intact (2.39 KB, text/plain), 2016-01-26 13:49 UTC, David Lehman

Description David Lehman 2016-01-21 19:34:23 UTC
Description of problem:
Sometimes after pvcreate I get only a single change event for the PV, and sometimes ID_FS_TYPE is unset in the properties associated with that event. Other times I might get two or even three events; usually all but the first have the correct ID_FS_TYPE. Sometimes when I get only a single event it has the correct ID_FS_TYPE.

Version-Release number of selected component (if applicable):
lvm2-2.02.140-3.fc24
systemd-228-7.gite35a787.fc24

How reproducible:
This is an intermittent failure. I have only hit it in qemu/kvm vms running both F23 and current rawhide. I have reproduced using virtio partition, virtio disk, and scsi disk. I was unable to reproduce on a partition on bare metal, although that does not necessarily prove anything as I only tried ten or so times.


Steps to Reproduce:
1. (shell 1) udevadm monitor --udev --property
2. (shell 2) pvcreate <blockdev>
3. repeat, with pvremove between, until the pvcreate yields a single change event with ID_FS_TYPE= (generally takes me no more than five tries)
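The steps above can be sketched as a retry loop. This is only a hypothetical helper, not something from the report: the device argument, the log path /tmp/pv_events.log, and the function name are all assumptions, and it must run as root on a scratch device that already holds a PV.

```shell
#!/bin/bash
# Hypothetical helper automating the three reproduction steps above.
# Pass the scratch block device from step 2 as the first argument.
DEV="${1:-}"

try_once() {
  # Watch udev events in the background while recreating the PV.
  udevadm monitor --udev --property > /tmp/pv_events.log &
  local mon=$!
  pvremove -y "$DEV" >/dev/null
  udevadm settle
  pvcreate "$DEV" >/dev/null
  udevadm settle
  sleep 1
  kill "$mon"
  # The bug is reproduced when a change event carries an empty ID_FS_TYPE=.
  grep -qx 'ID_FS_TYPE=' /tmp/pv_events.log
}

# Only run when a device argument is actually supplied.
if [ -n "$DEV" ]; then
  n=0
  until try_once; do n=$((n + 1)); done
  echo "reproduced after $n retries on $DEV"
fi
```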

Actual results:
ID_FS_TYPE is unset right after running pvcreate

Expected results:
ID_FS_TYPE=LVM2_member

Additional info:
I can provide the debug journal, pvcreate -vvv output, etc. as needed, but in my experience this is not difficult to reproduce.

Comment 1 David Lehman 2016-01-21 19:53:11 UTC
[root@localhost ~]# udevadm settle
[root@localhost ~]# pvremove /dev/vdb
  Labels on physical volume "/dev/vdb" successfully wiped
[root@localhost ~]# udevadm settle
[root@localhost ~]# pvcreate /dev/vdb
  Physical volume "/dev/vdb" successfully created
[root@localhost ~]# udevadm info --name=vdb
P: /devices/pci0000:00/0000:00:08.0/virtio3/block/vdb
N: vdb
E: DEVNAME=/dev/vdb
E: DEVPATH=/devices/pci0000:00/0000:00:08.0/virtio3/block/vdb
E: DEVTYPE=disk
E: MAJOR=252
E: MINOR=16
E: MPATH_SBIN_PATH=/sbin
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=4412657


The other shell showed the following:

UDEV  [47769.180549] change   /devices/pci0000:00/0000:00:08.0/virtio3/block/vdb (block)
.ID_FS_TYPE_NEW=
ACTION=change
DEVNAME=/dev/vdb
DEVPATH=/devices/pci0000:00/0000:00:08.0/virtio3/block/vdb
DEVTYPE=disk
ID_FS_TYPE=
MAJOR=252
MINOR=16
MPATH_SBIN_PATH=/sbin
SEQNUM=3641
SUBSYSTEM=block
TAGS=:systemd:
USEC_INITIALIZED=4412657

Comment 2 Zdenek Kabelac 2016-01-21 21:28:51 UTC
Please provide a log of this problem (substituting the device of your choice):

while : ; do 

 udevadm settle
 echo REMOVED
 udevadm info /dev/loop0
 pvcreate /dev/loop0
 udevadm settle
 echo CREATED
 udevadm info  /dev/loop0
 pvremove /dev/loop0

done

where you observe the bug.
If you hit the issue in this loop, capture the disk content with 'dd'
both through the page cache and with a direct-I/O read.
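The suggested dd capture might look like the following. This is a sketch under assumptions, not part of the report: the device argument, output paths under /tmp, and a 1 MiB capture size are all illustrative choices.

```shell
#!/bin/bash
# Hypothetical sketch of the suggested dd capture: read the start of the
# device once through the page cache and once with O_DIRECT, then compare.
DEV="${1:-}"

capture_and_compare() {
  dd if="$DEV" of=/tmp/head.cached bs=1M count=1 status=none
  dd if="$DEV" of=/tmp/head.direct bs=1M count=1 iflag=direct status=none
  if cmp -s /tmp/head.cached /tmp/head.direct; then
    echo "cached and direct reads match"
  else
    echo "cached and direct reads DIFFER (stale cache?)"
  fi
}

# Only run when a device argument is actually supplied.
if [ -n "$DEV" ]; then
  capture_and_compare
fi
```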

First suspect here is virt-machine caching.

Comment 3 David Lehman 2016-01-26 13:46:26 UTC
Created attachment 1118517 [details]
debug data demonstrating issue

This was reproduced with caching disabled for vdb. Note that the udev db is not consistent with the disk contents. This system is not doing anything else aside from running the OS, so this is not an issue of resource contention or anything like that.

Comment 4 David Lehman 2016-01-26 13:49:45 UTC
Created attachment 1118518 [details]
same journal snippet but with long lines intact

Comment 5 David Lehman 2016-01-26 13:51:53 UTC
kernel-4.5.0-0.rc0.git1.1.fc24.x86_64

Comment 6 Peter Rajnoha 2016-01-26 15:55:38 UTC
Well, normally there are 5 device "open RW" calls with subsequent "close" calls, which should produce 5 events (if I slow the code down, there really are 5 events generated!). The reason we see only 1 (or 2, or 3, ...) is that udev coalesces these events when they are very close together in time. I have a suspicion that the source of the bug is somewhere around this logic...

So, let's try to create a simple shell script (let's call it "run_blkid.sh", placed directly in /):

  #!/bin/bash
  /usr/sbin/blkid -p $@ &> /run/run_blkid.log

Then add the associated udev rule to call this script (in /lib/udev/rules.d/60-persistent-storage.rules):

  # probe filesystem metadata of disks
  KERNEL!="sr*", IMPORT{program}="/run_blkid.sh $devnode"
  KERNEL!="sr*", IMPORT{builtin}="blkid"

This way we can compare what blkid sees when running it directly (with its log saved to /run/run_blkid.log) against what the builtin blkid imports. If the output of blkid -p and the builtin blkid differ, there's a bug somewhere in blkid. If they do not differ, it's a bug in udev, which probably coalesces events in an odd way so that some events are dropped when they shouldn't be...

David, can you try this?
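Once both probes have run, the comparison could be done with something like the sketch below. The function name and device argument are assumptions; the log path matches the run_blkid.sh script above.

```shell
#!/bin/bash
# Hypothetical comparison of the two probes: what the external blkid wrote
# to its log versus what udev's builtin blkid recorded in the udev database.
DEV="${1:-}"

compare_probes() {
  echo "== external blkid (/run/run_blkid.log) =="
  cat /run/run_blkid.log
  echo "== udev database =="
  udevadm info --query=property --name="$DEV" | grep '^ID_FS_' \
    || echo "(no ID_FS_* keys in the udev db)"
}

# Only run when a device argument is actually supplied.
if [ -n "$DEV" ]; then
  compare_probes
fi
```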

Comment 7 Peter Rajnoha 2016-01-26 16:02:42 UTC
...what kind of device is it on the host that the virtio device is bound to?

Comment 8 Peter Rajnoha 2016-01-26 16:17:57 UTC
(In reply to Peter Rajnoha from comment #6)
> that udev is coalescing these events if they're very near each other in
> time. I have a suspicion that the source of the bug is somewhere here around
> this logic...

(Not exactly udev, but the inotify mechanism. We had very similar bug once - bug #1084944, but in the end it was not reproducible anymore.)

Comment 9 Peter Rajnoha 2016-01-26 16:19:08 UTC
(In reply to Peter Rajnoha from comment #8)
> (Not exactly udev, but the inotify mechanism. We had very similar bug once -
> bug #1084944, but in the end it was not reproducible anymore.)

(Hmm, and that was also hit for virtio devices!)

Comment 10 David Lehman 2016-01-26 18:12:10 UTC
I've been using file-backed virtio devices.

Comment 11 David Lehman 2016-01-26 18:36:56 UTC
$ cat /tmp/run_blkid.log 
/dev/vdb2: PART_ENTRY_SCHEME="gpt" PART_ENTRY_UUID="81471727-d151-4d3f-a9cc-dfb149be42ac" PART_ENTRY_TYPE="0fc63daf-8483-4772-8e79-3d69d8477de4" PART_ENTRY_NUMBER="2" PART_ENTRY_OFFSET="5859328" PART_ENTRY_SIZE="1953792" PART_ENTRY_DISK="252:16"

Comment 12 Peter Rajnoha 2016-02-01 12:29:01 UTC
OK, thanks. This looks like blkid is still seeing old data even after the pvcreate, so it is a caching problem. Though we're using O_DIRECT, it may still not be enough. Please try the following scratch builds and see if you can still reproduce. They use different ways of forcing data to be flushed/synced to disk before closing the device after writing, so it's worth trying both. Please run with -vvvv so we can check the debug log afterwards.

(this one makes the dev_flush call unconditional while closing the device, which then calls the BLKFLSBUF ioctl on that device)
http://koji.fedoraproject.org/koji/taskinfo?taskID=12769665

(this one calls fdatasync directly on the device)
http://koji.fedoraproject.org/koji/taskinfo?taskID=12769850

If neither of these two helps, it must be a bug somewhere in the layers underneath, i.e. the way caching is done for virtio devices backed by files on the host system.
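For illustration only, the two flushing strategies being tested can be approximated with standard tools. This is not the actual lvm2 scratch patch code; the function names and device argument are assumptions, and dd here writes zero blocks purely to get the fdatasync on close.

```shell
#!/bin/bash
# Illustrative sketch (not the real lvm2 patches) of the two strategies.
DEV="${1:-}"

flush_via_blkflsbuf() {
  # Build 1: unconditional flush on close, i.e. the BLKFLSBUF ioctl,
  # which blockdev --flushbufs issues for us.
  blockdev --flushbufs "$DEV"
}

flush_via_fdatasync() {
  # Build 2: fdatasync directly on the device fd; dd with count=0 and
  # conv=fdatasync writes nothing but still syncs before closing.
  dd if=/dev/zero of="$DEV" bs=512 count=0 conv=fdatasync status=none
}

# Only run when a device argument is actually supplied.
if [ -n "$DEV" ]; then
  flush_via_blkflsbuf
  flush_via_fdatasync
fi
```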

Comment 13 David Lehman 2016-02-02 16:03:58 UTC
I was able to reproduce with both test packages. I only ran a dnf update in between -- no reboot, etc.

Comment 14 Peter Rajnoha 2016-02-04 10:49:25 UTC
(In reply to David Lehman from comment #3)
> Created attachment 1118517 [details]
> debug data demonstrating issue
> 
> This was reproduced with caching disabled for vdb. Note that the udev db is
> not consistent with the disk contents.

Have you tried "directsync" cache mode in qemu/virt-manager? Is it reproducible even with that one?

Comment 15 David Lehman 2016-02-04 14:38:26 UTC
I was able to reproduce using "directsync", although it may have taken more attempts. I had to try pvremove+pvcreate an average of perhaps three times to reproduce with cache mode "none", while I have reproduced only once so far in at least twenty attempts with "directsync".

Comment 16 Peter Rajnoha 2016-02-08 09:53:24 UTC
(In reply to David Lehman from comment #15)
> I was able to reproduce using "directsync", although it may have taken more
> attempts. I had to try pvremove+pvcreate an average of perhaps three times
> to reproduce with cache mode "none", while I have reproduced only once so
> far in at least twenty attempts with "directsync".

Very odd - looks like a decent bug in the caching handling. Can you try older kernels? (...the guest is rawhide; what is running on the host?)

Comment 17 Jan Kurik 2016-02-24 14:18:59 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 24 development cycle.
Changing version to '24'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora24#Rawhide_Rebase

Comment 18 Fedora End Of Life 2017-07-25 19:49:32 UTC
This message is a reminder that Fedora 24 is nearing its end of life.
Approximately 2 (two) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 24. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '24'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 24 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 19 Fedora End Of Life 2017-08-08 12:41:23 UTC
Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

