Bug 1020806 - All libguestfs LVM operations fail on Debian/Ubuntu
Summary: All libguestfs LVM operations fail on Debian/Ubuntu
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libguestfs
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Richard W.M. Jones
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-10-18 10:19 UTC by Richard W.M. Jones
Modified: 2014-02-27 12:43 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-02-27 12:43:31 UTC
Embargoed:


Attachments
config-3.2.0-4-amd64 (126.07 KB, text/plain)
2014-02-27 12:06 UTC, Richard W.M. Jones

Description Richard W.M. Jones 2013-10-18 10:19:06 UTC
Description of problem:

All LVM operations fail on Debian/Ubuntu builds of libguestfs,
at least for me, but not with Hilko Bengen's official package.

You can reproduce this very easily by doing:

$ guestfish -N bootrootlv exit
libguestfs: error: lvcreate_free:   /dev/VG/LV: not found: device not cleared
  Aborting. Failed to wipe start of new LV.
guestfish: error creating prepared disk image 'bootrootlv' on 'test1.img': failed to create LV: /dev/VG/LV: lvcreate_free:   /dev/VG/LV: not found: device not cleared
  Aborting. Failed to wipe start of new LV.

Related bug 727925, bug 690308.

With LVM debugging enabled, we see the error "Uevent not generated!".

Since this works for Hilko, I'm guessing this is some sort
of missing file in the appliance, but all the files do appear
to be in place.

Version-Release number of selected component (if applicable):

libguestfs 1.23.33

How reproducible:

100%

Steps to Reproduce:

See above.

Comment 1 Richard W.M. Jones 2014-01-24 14:45:18 UTC
Works on baremetal Debian 7 testing with libguestfs 1.24.5.

Comment 2 Richard W.M. Jones 2014-01-24 14:46:47 UTC
(In reply to Richard W.M. Jones from comment #1)
> Works on baremetal Debian 7 testing with libguestfs 1.24.5.

Sorry, I was running the wrong command.  It does NOT work with
libguestfs 1.24.5 on baremetal.  The error is the same as above:

$ guestfish -N bootrootlv exit -x
libguestfs: trace: set_pgroup true
libguestfs: trace: set_pgroup = 0
libguestfs: trace: add_drive "test1.img" "format:raw"
libguestfs: trace: add_drive = 0
libguestfs: trace: is_config
libguestfs: trace: is_config = 1
libguestfs: trace: launch
libguestfs: trace: get_tmpdir
libguestfs: trace: get_tmpdir = "/tmp"
libguestfs: trace: get_cachedir
libguestfs: trace: get_cachedir = "/var/tmp"
libguestfs: trace: launch = 0
libguestfs: trace: blockdev_getss "/dev/sda"
libguestfs: trace: blockdev_getss = 512
libguestfs: trace: part_init "/dev/sda" "mbr"
libguestfs: trace: part_init = 0
libguestfs: trace: part_add "/dev/sda" "primary" 64 65599
libguestfs: trace: part_add = 0
libguestfs: trace: part_add "/dev/sda" "primary" 65600 -64
libguestfs: trace: part_add = 0
libguestfs: trace: mkfs "ext2" "/dev/sda1"
libguestfs: trace: mkfs = 0
libguestfs: trace: pvcreate "/dev/sda2"
libguestfs: trace: pvcreate = 0
libguestfs: trace: vgcreate "VG" "/dev/sda2"
libguestfs: trace: vgcreate = 0
libguestfs: trace: lvcreate_free "LV" "VG" 100
libguestfs: trace: lvcreate_free = -1 (error)
libguestfs: error: lvcreate_free:   /dev/VG/LV: not found: device not cleared
  Aborting. Failed to wipe start of new LV.
guestfish: error creating prepared disk image 'bootrootlv' on 'test1.img': failed to create LV: /dev/VG/LV: lvcreate_free:   /dev/VG/LV: not found: device not cleared
  Aborting. Failed to wipe start of new LV.
libguestfs: trace: close
libguestfs: trace: internal_autosync
libguestfs: trace: internal_autosync = 0

Comment 3 Richard W.M. Jones 2014-01-24 15:04:52 UTC
Also fails when running libguestfs in a Debian testing VM
(i.e. the TCG case).  libguestfs 1.24.3 in this case, kernel 3.2.0-4-amd64.

Comment 4 Richard W.M. Jones 2014-02-26 15:50:23 UTC
Also broken in the same way with supermin 5, so it's (probably)
not an artifact of supermin.

Comment 5 Richard W.M. Jones 2014-02-26 18:04:20 UTC
Adding prajnoha at the suggestion of agk.

I don't really know where to start with this bug, although there's
an idea it may happen because of mismatched lvm & udev versions.
Note this is Debian/testing.

I can reproduce this bug at will on a VM that I use.  It has:

lvm2	2.02.98-6+b1
udev	175-7.2
sysvinit	2.88dsf-41+deb7u1

It is using sysvinit (not systemd).

Comment 6 Richard W.M. Jones 2014-02-26 18:06:35 UTC
Also occurs with updated packages:

lvm2	2.02.104-2
udev	204-7
sysvinit	2.88dsf-51

Comment 7 Richard W.M. Jones 2014-02-26 18:25:51 UTC
I have put the faulty Debian appliance here:

  http://oirase.annexia.org/tmp/appliance.d/

Download those three files and uncompress them with 'unxz':

  unxz kernel.xz
  unxz initrd.xz
  unxz root.xz

Create a scratch disk to play with:

  truncate -s 10G scratch.disk

Boot the Debian appliance with the following command:

  qemu-kvm -kernel kernel -initrd initrd \
    -drive file=scratch.disk,if=virtio \
    -drive file=root,if=virtio \
    -append 'console=ttyS0 guestfs_rescue=1' \
    -nographic -serial stdio -monitor none

It should boot up and drop you into a shell:

  I have no name!@(none):/#

Try the following commands in this shell in order to demonstrate the bug:

  # pvcreate /dev/vda 
    Physical volume "/dev/vda" successfully created
  # vgcreate VG /dev/vda
    Volume group "VG" successfully created
  # lvcreate -L 1G -n LV /dev/VG
    /dev/VG/LV: not found: device not cleared
    Aborting. Failed to wipe start of new LV.

Comment 8 Peter Rajnoha 2014-02-27 12:01:47 UTC
I've tried the lvcreate with the verbose log (lvcreate -vvvv) and I can see:

240 #libdm-common.c:2262         Udev cookie 0xd4da7ca (semid 32768) created
241 #libdm-common.c:2282         Udev cookie 0xd4da7ca (semid 32768) incremented to 1
242 #libdm-common.c:2154         Udev cookie 0xd4da7ca (semid 32768) incremented to 2
243 #libdm-common.c:2395         Udev cookie 0xd4da7ca (semid 32768) assigned to RESUME task(5) with flags DISABLE_LIBRARY_FALLBACK (0x120)
244 #ioctl/libdm-iface.c:1750         dm resume   (253:2) NF   [16384] (*1)
245 #ioctl/libdm-iface.c:1784         Uevent not generated! Calling udev_complete internally to avoid process lock-up.

So the udev synchronization in the lvm tools is messed up. I suspect this is exactly because of bug #759402 comment #83.

Would it be possible to send the config for the kernel used in those images?
Mainly, I'm interested in the value of CONFIG_UEVENT_HELPER_PATH.

Comment 9 Richard W.M. Jones 2014-02-27 12:06:17 UTC
Created attachment 868489
config-3.2.0-4-amd64

Attached is the config from the Debian kernel.

Comment 10 Peter Rajnoha 2014-02-27 12:17:11 UTC
The thing here is that LVM2 normally relies on udev completely to create the dev nodes and symlinks (at least in recent versions). If I enable "verify_udev_operations = 1" in lvm.conf, it works, BUT libdevmapper detects that udev has not created the dev nodes/symlinks:

I have no name!@(none):/etc/lvm# lvcreate -l1 VG
  /dev/mapper/VG-lvol2 not set up by udev: Falling back to direct node creation.
  The link /dev/VG/lvol2 should have been created by udev but it was not found. Falling back to direct link creation.
  Logical volume "lvol2" created

And udev has proper content in its database:

I have no name!@(none):/etc/lvm# udevadm info --name=/dev/VG/lvol2

S: VG/lvol2
S: disk/by-id/dm-name-VG-lvol2
S: disk/by-id/dm-uuid-LVM-j1NtwqABOIcvlDuIKchCI8hJcj0ciUBPMvxGpf7jrL6JW0oU4CKTtqZ8eVop8mTv
S: mapper/VG-lvol2

Which means the uevent must have been sent!

So the only problem here is that the udev processing is not synchronized with the lvm tools, and lvm tries to use the /dev/mapper content before it's ready to use.
And it's not synchronized because the kernel function sending the uevent reported that the uevent was not sent properly (or some error occurred). In this case libdevmapper automatically avoids waiting indefinitely for an event that may never come.

So we need to find out why the kernel is reporting that there's been a problem in sending the event (iirc, it is exactly the return value of the kobject_uevent/kobject_uevent_env function that is checked).
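
For reference, the workaround setting mentioned at the start of this comment lives in the activation section of lvm.conf. The fragment below is a minimal sketch, assuming the stock layout of /etc/lvm/lvm.conf with the other settings in that section omitted. As the output above shows, it only lets LVM repair the missing nodes itself; it does not address why the uevent is reported as not sent.

```
# /etc/lvm/lvm.conf (fragment; other activation settings omitted)
activation {
    # Have LVM check the /dev nodes and symlinks that udev should have
    # created, and create any that are missing itself.
    verify_udev_operations = 1
}
```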

Comment 11 Peter Rajnoha 2014-02-27 12:30:21 UTC
OK, something is writing "-e" to /proc/sys/kernel/hotplug (which is the runtime setting for CONFIG_UEVENT_HELPER_PATH). If I make it blank, it works:

I have no name!@(none):/# cat /proc/sys/kernel/hotplug 
-e 
I have no name!@(none):/# echo "" > /proc/sys/kernel/hotplug 
I have no name!@(none):/# lvcreate -l1 VG
  Logical volume "lvol3" created

I guess it's some init script then? If so, that write must be removed so that /proc/sys/kernel/hotplug stays blank.
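
The manual check above can be wrapped in a small helper for use in a test or init script. This is only a sketch: the function name uevent_helper_is_blank is hypothetical (it is not part of libguestfs or LVM), and the path is passed as a parameter so the logic can be tried on an ordinary file before pointing it at /proc/sys/kernel/hotplug.

```shell
#!/bin/sh
# uevent_helper_is_blank FILE
# Hypothetical helper: succeeds if FILE contains nothing but whitespace,
# i.e. no uevent helper is configured.
uevent_helper_is_blank() {
    # Strip all whitespace; anything left over is a helper path
    # (or junk such as "-e").
    content=$(tr -d '[:space:]' < "$1")
    [ -z "$content" ]
}

# Example use inside the appliance (assumes /proc/sys/kernel/hotplug exists):
#   uevent_helper_is_blank /proc/sys/kernel/hotplug || echo "helper is set"
```

Taking the path as an argument keeps the check independent of the running kernel, which also makes it easy to exercise outside the appliance.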

Comment 12 Peter Rajnoha 2014-02-27 12:35:49 UTC
Kay, can we finally deprecate this thing somehow? (I mean the CONFIG_UEVENT_HELPER_PATH and /proc/sys/kernel/hotplug)

Comment 13 Peter Rajnoha 2014-02-27 12:38:04 UTC
It would probably be fine if /proc/sys/kernel/hotplug were not exposed at all...

Comment 14 Richard W.M. Jones 2014-02-27 12:42:45 UTC
Why does this only affect Debian?  Debian has /bin/sh as dash, not
bash, and dash's builtin echo does not treat -e as an option but
prints it literally.  So when we mistakenly did:
  echo -e '\000\000\000\000' > /proc/sys/kernel/hotplug
on Debian this caused "-e" to be written to the file.
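
One way to avoid depending on echo options is to use printf, whose escape handling is specified by POSIX and behaves the same in dash and bash. The sketch below is illustrative only and is not necessarily what the upstream commit does; write_blank_helper is a hypothetical name, and the target path is a parameter so it can be tried on an ordinary file. It writes a single newline, the same bytes that the echo "" in comment 11 wrote to blank the setting.

```shell
#!/bin/sh
# write_blank_helper FILE
# Hypothetical helper: write a single newline to FILE using printf,
# which does not rely on shell-specific echo options such as -e.
write_blank_helper() {
    printf '\n' > "$1"
}

# Example use inside the appliance (assumes root and a writable sysctl):
#   write_blank_helper /proc/sys/kernel/hotplug
```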

Comment 15 Richard W.M. Jones 2014-02-27 12:43:31 UTC
Upstream fix:

https://github.com/libguestfs/libguestfs/commit/7c8af234305f0fe2cb5a9042bd58fe3735e8cd73

Will appear in libguestfs >= 1.25.39.

