Bug 1153589 - virt-v2v will hang when converting esx guest before disk copy phase
Summary: virt-v2v will hang when converting esx guest before disk copy phase
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libguestfs
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Richard W.M. Jones
QA Contact: Virtualization Bugs
URL:
Whiteboard: V2V
Depends On: 1154778
Blocks:
 
Reported: 2014-10-16 09:23 UTC by tingting zheng
Modified: 2015-03-05 13:46 UTC

Fixed In Version: libguestfs-1.28.1-1.3.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-05 13:46:25 UTC
Target Upstream Version:
Embargoed:


Attachments
Log file of conversion of esx guest. (25.78 KB, text/plain)
2014-10-16 09:23 UTC, tingting zheng
Log with udev & vgchange debugging enabled (214.26 KB, text/plain)
2014-10-20 10:58 UTC, Richard W.M. Jones
New log file of esx guest hang. (25.52 KB, text/plain)
2014-10-21 11:51 UTC, tingting zheng


Links
System: Red Hat Product Errata
ID: RHBA-2015:0303
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: libguestfs bug fix and enhancement update
Last Updated: 2015-03-05 17:34:44 UTC

Description tingting zheng 2014-10-16 09:23:21 UTC
Created attachment 947535
Log file of conversion of esx guest.

Description:
virt-v2v hangs when converting an ESX guest, before the disk copy phase.

Version:
virt-v2v-1.27.63-1.1.el7.x86_64
libvirt-1.2.8-5.el7.x86_64
qemu-kvm-rhev-2.1.2-1.rwmj1.el7.x86_64/qemu-kvm-rhev-2.1.2-3.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Use virt-v2v to convert a guest on an ESX server.
# virt-v2v -ic vpx://administrator@10.66.7.125/tzheng-test/10.66.71.84/?no_verify=1 esx-rhel6
[   0.0] Opening the source -i libvirt -ic vpx://administrator@10.66.7.125/tzheng-test/10.66.71.84/?no_verify=1 esx-rhel6
Enter administrator's password for 10.66.7.125: 
Enter host password for user 'administrator':
[ 156.0] Creating an overlay to protect the source from being modified
[ 173.0] Opening the overlay
[ 340.0] Initializing the target -o libvirt -os default
[ 340.0] Inspecting the overlay

2. Check the log file; the conversion hangs at:
Starting /init script ...
[    1.012436] systemd-udevd[87]: starting version 208
[    1.101454] input: PC Speaker as /devices/platform/pcspkr/input/input1
[    1.111576] ACPI Exception: AE_BAD_PARAMETER, Thread 486923008 could not acquire Mutex [0x1] (20130517/utmutex-285)
[    1.126385] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0x700, revision 0
[    1.323107] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2
[   21.299759] Error: Driver 'pcspkr' is already registered, aborting...
[   51.744144] sd 2:0:0:0: [sda] abort
[/usr/lib/tmpfiles.d/systemd.conf:26] Failed to replace specifiers: /var/log/journal/%m
[/usr/lib/tmpfiles.d/systemd.conf:28] Failed to replace specifiers: /run/log/journal/%m
/init: line 73: /sys/block/hd*/queue/scheduler: No such file or directory
[   99.818037] sd 2:0:0:0: [sda] abort
/init: line 73: /sys/block/ubd*/queue/scheduler: No such file or directory
/init: line 73: /sys/block/vd*/queue/scheduler: No such file or directory
mdadm: No arrays found in config file or automatically
[  117.402528] device-mapper: uevent: version 1.0.3
[  117.404479] device-mapper: ioctl: 4.27.0-ioctl (2013-10-30) initialised: dm-devel
  lvmetad is not active yet, using direct activation during sysinit
  2 logical volume(s) in volume group "VolGroup" now active


Actual results:
virt-v2v hangs during conversion of esx guests before disk copy phase

Comment 2 Richard W.M. Jones 2014-10-20 10:58:24 UTC
Created attachment 948501
Log with udev & vgchange debugging enabled

I can reproduce this from my local machine against the 10.66.7.125
mentioned in the bug report.

The hang happens every time, when running the vgchange command inside
the appliance:

+ lvm vgchange -aay --sysinit
  lvmetad is not active yet, using direct activation during sysinit
  2 logical volume(s) in volume group "VolGroup" now active
[hangs here until you kill qemu]

I changed the init script to add lots of debugging to both
udevd and vgchange.  I will attach the full output.  The problem
appears to be something to do with timeouts?

Comment 3 Peter Rajnoha 2014-10-20 12:38:24 UTC
It seems to be hung on the blkid call:

IMPORT builtin 'blkid' /usr/lib/udev/rules.d/13-dm-disk.rules:23

This is one of the last rules executed before the udev process times out. LVM then never receives the notification that the udev job has completed (that notification would be sent by 95-dm-notify.rules), hence the hang in vgchange.

We need to find out why blkid hangs here.
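
To illustrate the ordering described above: udev reads rules files in lexical order of file name, so 13-dm-disk.rules is processed before 95-dm-notify.rules. The following is a minimal sketch of why a stall in the earlier file keeps the notify rule from ever running. The function name and the simulated hang are hypothetical; this is not udev code.

```python
def rules_run_before_hang(rule_files, hanging_file):
    """Return the rules files reached up to and including the one that hangs.

    Models only the lexical ordering of rules files; the hang is simulated
    by stopping the loop.
    """
    processed = []
    for name in sorted(rule_files):
        processed.append(name)
        if name == hanging_file:
            break  # the event never gets past this file
    return processed


files = ["95-dm-notify.rules", "11-dm-lvm.rules", "13-dm-disk.rules"]
done = rules_run_before_hang(files, "13-dm-disk.rules")
notified = "95-dm-notify.rules" in done
```

In this model `notified` stays false, which matches the symptom: vgchange waits for a notification that is never sent.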

Comment 4 Richard W.M. Jones 2014-10-20 12:52:26 UTC
Thanks for looking into this Peter.  There is an additional
fact which may not be obvious from the bug description:

The backing disk is remote, and accessed over https, both of
which mean we expect quite long delays between guest userspace
making a disk request and the guest kernel seeing a result.  And
by "long delays" I mean -- could be over the 60 second timeout
(in rare circumstances).

I'm going to try to bump up every timeout I can find and see
if that makes a difference.

Comment 5 Peter Rajnoha 2014-10-20 12:54:48 UTC
You may try to increase the timeout and see if it helps. It's defined in /lib/udev/rules.d/11-dm-lvm.rules, on exactly this line:

  OPTIONS+="event_timeout=180"

The timeout is in seconds. By default we set it to 3 minutes; if we didn't set it explicitly this way, it would be only 30 seconds, which is the systemd default. So try this one - I think it may help.
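
To check which timeout a given rules file actually sets, something like the following could be used. It is a minimal sketch: the helper name `find_event_timeouts` is hypothetical, and the sample text contains only the line quoted above plus a commented-out line added for illustration.

```python
import re

# Matches an assignment such as: OPTIONS+="event_timeout=180"
_TIMEOUT_RE = re.compile(r'OPTIONS\+?="[^"]*\bevent_timeout=(\d+)')


def find_event_timeouts(rules_text):
    """Return every event_timeout value (in seconds) set in udev rules text."""
    timeouts = []
    for line in rules_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip comment lines
        match = _TIMEOUT_RE.search(stripped)
        if match:
            timeouts.append(int(match.group(1)))
    return timeouts


# Illustrative input, not the real contents of 11-dm-lvm.rules.
sample = '''
# OPTIONS+="event_timeout=30"
OPTIONS+="event_timeout=180"
'''
timeouts = find_event_timeouts(sample)
```

To inspect a real system, pass in the text read from /lib/udev/rules.d/11-dm-lvm.rules instead of `sample`.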

Comment 6 Richard W.M. Jones 2014-10-20 14:32:37 UTC
I'm still experimenting here, but bumping the udev event
timeout up to 10 minutes has not visibly made any difference.

Next I'm going to modify qemu to see if there is any disk
I/O going on during the hang.

Comment 7 Richard W.M. Jones 2014-10-20 16:01:29 UTC
The problem is our fix for bug 1151033.  I was able to demonstrate
that the current bug is "fixed" by leaving the readahead window at
the default size (2 MB).

Increasing the readahead window (to 64 MB), which was the fix I
added for bug 1151033, has made the conversion stage fail.

What is needed is to start conversion with a small readahead
window (performance of conversion is not critical), and then
increase the readahead window to something very large once
we start copying.

The fix for this is unfortunately not trivial.
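
The approach above can be sketched roughly as follows. This is not the libguestfs implementation: the function names are hypothetical, the 1 MiB/s throughput is an assumed figure for a slow link, and the model assumes that a single guest read may have to wait for a whole readahead window to be fetched. Only the 2 MB and 64 MB window sizes and the 60 second timeout come from this bug report.

```python
MIB = 1024 * 1024

DEFAULT_READAHEAD = 2 * MIB   # default window cited in this comment
COPY_READAHEAD = 64 * MIB     # enlarged window from the fix for bug 1151033
GUEST_TIMEOUT_SECONDS = 60    # timeout mentioned in comment 4


def readahead_for_phase(phase):
    """Pick a readahead window: small while converting, large while copying."""
    if phase == "conversion":
        return DEFAULT_READAHEAD
    if phase == "copy":
        return COPY_READAHEAD
    raise ValueError(f"unknown phase: {phase!r}")


def window_fetch_seconds(window_bytes, bytes_per_second):
    """Estimate how long fetching one full readahead window takes."""
    return window_bytes / bytes_per_second


assumed_throughput = 1 * MIB  # bytes per second; an assumption, not a measurement
small = window_fetch_seconds(readahead_for_phase("conversion"), assumed_throughput)
large = window_fetch_seconds(readahead_for_phase("copy"), assumed_throughput)
```

Under these assumptions the small window takes about 2 seconds to fetch and the large one about 64 seconds, which is past the 60 second timeout. That is one plausible reading of why the large window hurts the latency-sensitive conversion stage while still helping the bulk copy.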

Comment 8 Richard W.M. Jones 2014-10-20 21:44:47 UTC
I have pushed this series of 10 [sic] patches upstream, which
allows the readahead to be set small during conversion and large
during copy:

https://github.com/libguestfs/libguestfs/commit/0b49defc2b4307e1f9159b862637978129aaed29
https://github.com/libguestfs/libguestfs/commit/9ddfbad814e55553d9d1cea08134311c12923cfe
https://github.com/libguestfs/libguestfs/commit/9596fc44ff522f5f993a3c5ef9bb24a9a1b4a996
https://github.com/libguestfs/libguestfs/commit/3596165282ccf2c5896894ec4e9a71c6da788463
https://github.com/libguestfs/libguestfs/commit/63387fd8d0d77f7fdaaad14e5053b86ae51cbd6e
https://github.com/libguestfs/libguestfs/commit/0084736f5fe75f62f72f0014333b32ab753b1554
https://github.com/libguestfs/libguestfs/commit/9281dc7d44b7b02c6470a61425aa177e6525ee88
https://github.com/libguestfs/libguestfs/commit/a468fde01687914de501f0a95cd5a40986daec29
https://github.com/libguestfs/libguestfs/commit/b8f826b7ac1e7f90f670f474c3582b56063cdef6
https://github.com/libguestfs/libguestfs/commit/496d0c45bc5e8c361d2cccb20b0f3a64443b05ab

Tingting: Even this long series of patches probably will *not* fix this
bug for you, for a couple of reasons:

(1) It doesn't really have any effect unless we can get bug 1154778
fixed.  I can probably supply a patched systemd in the meantime.

(2) The real problem is the vCenter server on 10.66.7.125 is
just too slow.  Is there any possibility of getting a faster
server for this?  Or better network connectivity?

However I have tested it successfully on my own server.

Comment 10 Richard W.M. Jones 2014-10-21 07:26:14 UTC
(In reply to Richard W.M. Jones from comment #8)
> (2) The real problem is the vCenter server on 10.66.7.125 is
> just too slow.  Is there any possibility of getting a faster
> server for this?  Or better network connectivity?

BTW by "slow" I mean network latency too.  If you can test from
another machine in the lab which is connected by as few hops
as possible, it may work better.

Comment 11 Richard W.M. Jones 2014-10-21 07:35:23 UTC
(In reply to Richard W.M. Jones from comment #8)
> (1) It doesn't really have any effect unless we can get bug 1154778
> fixed.  I can probably supply a patched systemd in the meantime.

Scratch build of systemd with event-timeout fix:

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=8139893

Comment 12 tingting zheng 2014-10-21 11:51:29 UTC
Created attachment 948903
New log file of esx guest hang.

Tested with:
libguestfs-1.28.1-1.2.el7.x86_64
virt-v2v-1.28.1-1.2.el7.x86_64

It still hangs; I have attached the new log file.

Comment 13 Richard W.M. Jones 2014-10-21 22:10:50 UTC
The patches in comment 8 break virt-p2v conversions, so setting back
to ASSIGNED.

Comment 16 zhoujunqin 2014-10-28 09:00:01 UTC
Tried to verify with packages:
virt-v2v-1.28.1-1.5.el7.x86_64
libguestfs-1.28.1-1.5.el7.x86_64

Steps:
Env: ESX 5.5 with vCenter 5.5
# virt-v2v -ic vpx://root@10.66.111.25/tzheng-demo/10.66.106.63/?no_verify=1 test-rhel6
[   0.0] Opening the source -i libvirt -ic vpx://root@10.66.111.25/tzheng-demo/10.66.106.63/?no_verify=1 test-rhel6
Enter root's password for 10.66.111.25: 
Enter host password for user 'root':
[  33.0] Creating an overlay to protect the source from being modified
[  34.0] Opening the overlay
[  42.0] Initializing the target -o libvirt -os default
[  42.0] Inspecting the overlay
[  90.0] Checking for sufficient free disk space in the guest
[  90.0] Estimating space required on target for each disk
[  90.0] Converting Red Hat Enterprise Linux Server release 6.4 (Santiago) to run on KVM
virt-v2v: This guest has virtio drivers installed.
[ 432.0] Mapping filesystem data to avoid copying unused and blank areas
[ 433.0] Closing the overlay
[ 433.0] Copying disk 1/1 to /var/lib/libvirt/images/test-rhel6-sda (raw)
    (100.00/100%)
[ 558.0] Creating output metadata
Pool default refreshed

Domain test-rhel6 defined from /tmp/v2vlibvirt8cf4b5.xml

[ 559.0] Finishing off

Result: Conversion finished successfully at an acceptable speed, and the guest boots successfully.
Moving this bug from ON_QA to VERIFIED.

Comment 18 errata-xmlrpc 2015-03-05 13:46:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0303.html

