Bug 1465849

Summary: v2v hangs on removing vmware-tool
Product: Red Hat Enterprise Linux 7 Reporter: Jaroslav Spanko <jspanko>
Component: libguestfs Assignee: Richard W.M. Jones <rjones>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 7.4 CC: cww, jcoscia, jspanko, juzhou, mtessun, mxie, mzhan, ptoscano, tzheng
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: V2V
Fixed In Version: libguestfs-1.40.1-1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-08-06 12:44:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1477905, 1621895    
Bug Blocks: 1420851    
Attachments:
Description            Flags
uninstall script       none
hang for 20 minutes    none
Seg fault in one case  none
RHEL hang              none

Description Jaroslav Spanko 2017-06-28 11:05:09 UTC
Description of problem:
During an import from VMware (SUSE and RHEL VMs), virt-v2v hangs on
"/usr/bin/vmware-uninstall-tools.pl".
It stays on this task for 10-30 minutes. After that, one run returned a segmentation fault, and in a second run we had to kill v2v.
We tried 3 different VMs and all of them hang at the same point.

Version-Release number of selected component (if applicable):
7.4 beta
libguestfs-1.36.3-5.el7.suse_1.x86_64.rpm
libguestfs-1.36.3-5.el7.x86_64

How reproducible:
100 %

Steps to Reproduce:
1. run virt-v2v
2. hang on "/usr/bin/vmware-uninstall-tools.pl"

Actual results:
One segmentation fault, one run killed manually, one hang for a long time

Expected results:
Is it possible to handle this better? We never reach the error path:
warning (f_"VMware tools was detected, but uninstallation failed.  The error message was: %s (ignored)")
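One generic way to turn an indefinite hang into the warning above is to bound the command's run time. The following is a minimal Python sketch of that idea only. It is not how virt-v2v or libguestfs runs guest commands, and `run_uninstaller` and its parameters are hypothetical names chosen for illustration.

```python
import subprocess
import sys


def run_uninstaller(argv, timeout_s):
    """Run argv with a time limit (hypothetical helper).

    Returns None on success, or a warning string if the command fails,
    cannot be started, or exceeds timeout_s seconds.
    """
    try:
        subprocess.run(
            argv,
            check=True,          # non-zero exit raises CalledProcessError
            timeout=timeout_s,   # the child is killed when this expires
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            text=True,
        )
        return None
    except subprocess.TimeoutExpired:
        reason = "timed out after %g seconds" % timeout_s
    except subprocess.CalledProcessError as e:
        reason = (e.stderr or "").strip() or "exit status %d" % e.returncode
    except OSError as e:
        reason = str(e)
    # Same wording as the warning quoted in the report.
    return ("VMware tools was detected, but uninstallation failed.  "
            "The error message was: %s (ignored)" % reason)


if __name__ == "__main__":
    # Stand-in for a hanging uninstaller: a child that sleeps too long.
    print(run_uninstaller([sys.executable, "-c", "import time; time.sleep(30)"], 1))
```

A time limit only bounds the symptom. It does not explain why the script hangs, and a child that leaves its own subprocesses behind would need extra handling.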

Additional info:
This is a PoC for a customer; it failed on SUSE and RHEL VMs.
I will try to generate dumps as soon as possible.
strace output:
strace: Process 16062 attached
restart_syscall(<... resuming interrupted poll ...>
Have you seen something similar?

Comment 2 Pino Toscano 2017-06-28 11:15:30 UTC
Can you please run virt-v2v with the -v -x parameters added, and provide the full log? I.e. something like:

  $ virt-v2v -v -x [all the other parameters already used] 2>&1 | tee v2v.log

Comment 3 Jaroslav Spanko 2017-06-28 11:24:51 UTC
uploaded
thanks

Comment 4 Jaroslav Spanko 2017-06-28 11:25:18 UTC
Created attachment 1292613 [details]
v2v hangs log

Comment 5 Pino Toscano 2017-06-28 12:06:34 UTC
Ehm you forgot -x too...

Comment 6 Richard W.M. Jones 2017-06-29 09:17:25 UTC
Also I don't think we have the Perl script.  Could you download it
and attach it to the bug:

virt-copy-out -a guest /usr/bin/vmware-uninstall-tools.pl /tmp
& upload /tmp/vmware-uninstall-tools.pl

See also comment 5.

Comment 8 Jaroslav Spanko 2017-06-29 13:18:35 UTC
I am not able to provide /usr/bin/vmware-uninstall-tools.pl right now. I will try to reproduce this, as we hit it every time during the remote session with the customer.
Thanks

Comment 9 Jaroslav Spanko 2017-07-17 12:39:40 UTC
I attached the .pl script. I am trying to reproduce the problem, but so far with no success.
Thanks !

Comment 10 Jaroslav Spanko 2017-07-17 12:40:25 UTC
Created attachment 1299845 [details]
uninstall script

Comment 11 Richard W.M. Jones 2017-07-17 14:33:54 UTC
I tried to reproduce this using the following steps:

(1) Download VMwaretools-10.0.6-3560309.zip (or any other version) from
VMware's site.

(2) Unpack the ZIP file, find linux.iso, open it [eg using guestfish]
and extract VMwareTools-10.0.6-3560309.tar.gz (exact version number
may be different).

(3) Install a new RHEL 7.3 guest with VMware tools enabled:

$ virt-builder rhel-7.3 \
    --install bash,perl,net-tools \
    --copy-in /var/tmp/VMwareTools-10.0.6-3560309.tar.gz:/tmp \
    --run-command 'cd /tmp && tar zxf /tmp/VMwareTools-10.0.6-3560309.tar.gz' \
    --run-command 'cd /tmp/vmware-tools-distrib &&
                   ./vmware-install.pl -d default --force-install'

(4) Perform a conversion:

$ virt-v2v -i disk rhel-7.3.img -o null

However I could not reproduce the hang (which is possibly not surprising
because VMware tools isn't "really" installed here).

Comment 12 Richard W.M. Jones 2017-07-17 14:41:24 UTC
I had a closer look at the log provided and (1) it's a SUSE guest and
(2) the guest is hanging when rebuilding the kdump initrd.

I've heard that one before ...

I wonder if we could set rootdev as we do in this patch:

https://www.redhat.com/archives/libguestfs/2017-June/msg00000.html

A speculative patch would look like this:

diff --git a/v2v/convert_linux.ml b/v2v/convert_linux.ml
index c34bf3e91..5d650561f 100644
--- a/v2v/convert_linux.ml
+++ b/v2v/convert_linux.ml
@@ -304,7 +304,11 @@ let rec convert (g : G.guestfs) inspect source output rcaps =
     let uninstaller = "/usr/bin/vmware-uninstall-tools.pl" in
     if g#is_file ~followsymlinks:true uninstaller then (
       try
-        ignore (g#command [| uninstaller |]);
+        if family = `SUSE_family then
+          ignore (g#sh (sprintf "/usr/bin/env rootdev=%s %s"
+                                inspect.i_root uninstaller))
+        else
+          ignore (g#command [| uninstaller |]);
 
         (* Reload Augeas to detect changes made by vbox tools uninst. *)
         Linux.augeas_reload g


Pino, do we have any SUSE Enterprise templates we can use?

Comment 13 Pino Toscano 2017-07-17 16:52:13 UTC
(In reply to Richard W.M. Jones from comment #12)
> I had a closer look at the log provided and (1) it's a SUSE guest and
> (2) the guest is hanging when rebuilding the kdump initrd.
> 
> I've heard that one before ...
> 
> I wonder if we could set rootdev as we do in this patch:
> 
> https://www.redhat.com/archives/libguestfs/2017-June/msg00000.html
> 
> A speculative patch would look like this:
> 
> diff --git a/v2v/convert_linux.ml b/v2v/convert_linux.ml
> index c34bf3e91..5d650561f 100644
> --- a/v2v/convert_linux.ml
> +++ b/v2v/convert_linux.ml
> @@ -304,7 +304,11 @@ let rec convert (g : G.guestfs) inspect source output
> rcaps =
>      let uninstaller = "/usr/bin/vmware-uninstall-tools.pl" in
>      if g#is_file ~followsymlinks:true uninstaller then (
>        try
> -        ignore (g#command [| uninstaller |]);
> +        if family = `SUSE_family then
> +          ignore (g#sh (sprintf "/usr/bin/env rootdev=%s %s"
> +                                inspect.i_root uninstaller))

Better use g#command here too, as done when mkinitrd is run:

         ignore (g#command [| "/usr/bin/env";
                              "rootdev=" ^ inspect.i_root;
                              uninstaller |]);

this way quoting issues are avoided.
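
To illustrate the quoting point outside of libguestfs, here is a minimal Python sketch. It is an analogy only: the helper names are hypothetical, `subprocess` stands in for g#command and g#sh, and the rootdev value containing a space is contrived to show the failure mode.

```python
import shlex
import subprocess
import sys

# Child program that prints the value of rootdev it received.
CHILD = [sys.executable, "-c", "import os; print(os.environ['rootdev'])"]


def run_with_rootdev_argv(rootdev, program_argv):
    """Argument-vector form: each element reaches env exactly as written."""
    argv = ["/usr/bin/env", "rootdev=" + rootdev] + list(program_argv)
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def run_with_rootdev_shell(rootdev, program_argv):
    """String form: rootdev is interpolated unquoted, so the shell re-parses it."""
    cmd = "/usr/bin/env rootdev=%s %s" % (rootdev, shlex.join(program_argv))
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    tricky = "/dev/sda 1"  # contrived value, not a real device name
    print(repr(run_with_rootdev_argv(tricky, CHILD)))
    print(repr(run_with_rootdev_shell(tricky, CHILD)))
```

With an ordinary value such as /dev/sda1 both forms behave the same. With the contrived value, the argument-vector form passes rootdev through unchanged, while the string form lets the shell split it, so env ends up trying to execute something other than the intended program.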

Also, in theory this could be done regardless of the distro; I don't think the rootdev environment variable has much effect on other distros. (Although setting it there can be avoided, to be safe.)

> Pino, do we have any SUSE Enterprise templates we can use?

Nope, not on VMware either.

Also, I recall being told the hang happened also on other guests than SUSE, i.e. RHEL, but I cannot find references now.  Jaroslav, do you remember more?

Comment 14 Jaroslav Spanko 2017-07-18 08:06:29 UTC
> Also, I recall being told the hang happened also on other guests than SUSE,
> i.e. RHEL, but I cannot find references now.  Jaroslav, do you remember more?

Hi Pino, Rich,
Yes, it happened for SLES 11 and RHEL 6.x. I attached screenshots from the remote sessions; unfortunately that is all I have for now.
There are 3 pictures where the conversion hung for ~20-30 minutes. I am not sure whether it was related to the customer's environment, but it was not specific to one distro.
 
Thanks a lot !

Comment 15 Jaroslav Spanko 2017-07-18 08:07:31 UTC
Created attachment 1300309 [details]
hang for 20 minutes

Comment 16 Jaroslav Spanko 2017-07-18 08:08:08 UTC
Created attachment 1300310 [details]
Seg fault in one case

Comment 17 Jaroslav Spanko 2017-07-18 08:08:31 UTC
Created attachment 1300311 [details]
RHEL hang

Comment 18 Richard W.M. Jones 2017-07-18 09:09:43 UTC
(In reply to Jaroslav Spanko from comment #16)
> Created attachment 1300310 [details]
> Seg fault in one case

VMware is running a tool which communicates with its own hypervisor.
However there's no VMware hypervisor because everything is run under
qemu, so instead of doing the right thing it crashes.  I actually
talked to VMware about this a while back and they fixed it.

(In reply to Jaroslav Spanko from comment #17)
> Created attachment 1300311 [details]
> RHEL hang

The RHEL hang is distinctly different from the SUSE hang, although
with less information available.

Unfortunately I wasn't able to reproduce the RHEL hang using a RHEL
6.8 guest and the same steps as in comment 11.

Comment 19 Richard W.M. Jones 2017-07-18 11:47:26 UTC
I spent a couple of days on this and read a lot of documentation, but
I still cannot work out how to enable kdump on OpenSUSE.  I think a
better approach is that I prepare a package containing the proposed
patch and the reporter can see if it makes any difference.

Comment 20 Pino Toscano 2017-07-18 12:45:54 UTC
I just wonder whether the hang, and the crash are two different results of the same issue, i.e. the vmware-uninstall-tools.pl script trying to communicate with VMX, and failing.

Comment 21 Richard W.M. Jones 2017-07-18 12:52:50 UTC
(In reply to Pino Toscano from comment #20)
> I just wonder whether the hang, and the crash are two different results of
> the same issue, i.e. the vmware-uninstall-tools.pl script trying to
> communicate with VMX, and failing.

Likely there are several different things going on, the only
common theme being that they are all caused by running
vmware-uninstall-tools.pl.

#1 The log posted in comment 4 looks almost certainly as if
the SUSE mkdumprd script is hanging (see my full analysis in
comment 12).  This is what the proposed patch may fix.

#2 There is also a hang in a RHEL 6 guest, but that cannot be the
same thing as above.

#3 There is also a segfault in a VMware tool, which is caused by
the VMware port not being available.

It could be as you say that #2 and #3 are different aspects of
the same thing, or not.  I'm unable to reproduce any of them.

Comment 22 Richard W.M. Jones 2017-07-18 13:12:56 UTC
As a bit of clarification, it seems like #1 is a problem, but would
not cause a hang.

Comment 25 Pino Toscano 2017-08-14 09:25:22 UTC
Note that in the short term (i.e. for the whole 7.4.z) we will disable the execution of vmware-uninstall-tools.pl, since it is turning out to be problematic -- see also bug #1480623.

Regarding 7.5, so far it will be disabled there too (bug #1477905), unless we find out exactly which issues the uninstallation script is hitting and fix them.

Comment 26 Pino Toscano 2017-08-22 06:56:38 UTC
Self-note: fixing this for RHEL 7.5 will make bug 1477905 and bug 1481930 obsolete.

Comment 27 Pino Toscano 2018-12-13 15:28:28 UTC
A couple of months ago I sent this upstream series:
https://www.redhat.com/archives/libguestfs/2018-October/msg00044.html
In particular, patch #2 deals with the uninstallation of VMware tools installed from a tarball:
https://www.redhat.com/archives/libguestfs/2018-October/msg00046.html
The approach chosen works around the behaviour of the vmware-uninstall-tools.pl script, making sure that it does not do more work than needed, so it should no longer take a long time.

This was implemented as commit 04a157d0f8529e8bf6a5bc0ac4ee146aaedc3c15, which is included in libguestfs >= v1.39.12.
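
For scripts that need to compare a version against that threshold, a minimal Python sketch follows. The helper names are hypothetical, and it only compares upstream version numbers: a downstream package may carry backports or, as in comment 25, disable the uninstaller instead, so treat the result as a hint rather than proof.

```python
def parse_version(s):
    """Turn 'v1.39.12', '1.40.2' or '1.36.3-5.el7' into a tuple of ints.

    Anything after the first '-' (the package release) is ignored.
    """
    upstream = s.lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def has_upstream_fix(version, first_fixed="1.39.12"):
    """True if version is at or after the first upstream release with the commit."""
    return parse_version(version) >= parse_version(first_fixed)


if __name__ == "__main__":
    for v in ("1.36.3-5.el7", "1.39.9", "1.39.12", "1.40.2-4.el7"):
        print(v, has_upstream_fix(v))
```

Comparing tuples of integers matters here: as plain strings, "1.39.9" would sort after "1.39.12".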

Comment 28 Pino Toscano 2019-01-17 11:54:50 UTC
This bug will be fixed by the rebase scheduled for RHEL 7.7, see bug 1621895.

Comment 30 zhoujunqin 2019-04-29 10:17:31 UTC
Tried to verify this bug with the new builds:
libvirt-4.5.0-15.el7.x86_64
libguestfs-1.40.2-4.el7.x86_64
virt-v2v-1.40.2-4.el7.x86_64
qemu-kvm-rhev-2.12.0-27.el7.x86_64
python-ovirt-engine-sdk4-4.3.1-1.el7ev.x86_64
rhv-guest-tools-iso-4.3-6.el7ev.noarch
kernel: 3.10.0-1040.el7.x86_64

rhv:4.3.0.1-0.1.el7

Steps:
Scenario-1: Testing a SLES guest whose VMware Tools installation type is "Open-source VMware Tools"
1.1 Prepare a SLES guest whose VMware Tools installation type is "Open-source VMware Tools".
1.2 Log in to the VM and confirm that a vmtoolsd process is running.

# ps -ef |grep vmtoolsd 
root      848     1  0 05:33 ?        00:00:00 /usr/sbin/vmtoolsd
root      2084     1  0 05:35 ?        00:00:00 /usr/lib/vmware-tools/sbin64/vmtoolsd -n vmusr
1.3 Shut down the VM and convert it from VMware to RHV with virt-v2v.
#  virt-v2v -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1 -it vddk -io vddk-libdir=/home/vmware-vix-disklib-distrib -io  vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA esx6.7-sles15-x86_64-GUI  --password-file /tmp/passwd -o rhv-upload -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -os nfs_data -op /tmp/rhvpasswd -oo rhv-cafile=/home/ca.pem  -oo rhv-direct=true -oo rhv-cluster=nfs -of raw  -b ovirtmgmt
Exception AttributeError: "'module' object has no attribute 'dump_plugin'" in <module 'threading' from '/usr/lib64/python2.7/threading.pyc'> ignored
[   0.2] Opening the source -i libvirt -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1 esx6.7-sles15-x86_64-GUI -it vddk  -io vddk-libdir=/home/vmware-vix-disklib-distrib -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA
[   1.9] Creating an overlay to protect the source from being modified
[   5.4] Opening the overlay
[  11.7] Inspecting the overlay
[  29.5] Checking for sufficient free disk space in the guest
[  29.5] Estimating space required on target for each disk
[  29.5] Converting SUSE Linux Enterprise Server 15 to run on KVM
virt-v2v: QEMU Guest Agent installed for this guest.
virt-v2v: This guest has virtio drivers installed.
[  72.2] Mapping filesystem data to avoid copying unused and blank areas
virt-v2v: warning: fstrim on guest filesystem /dev/sda1 failed.  Usually 
you can ignore this message.  To find out more read "Trimming" in 
virt-v2v(1).

Original message: fstrim: fstrim: /sysroot/: the discard operation is not 
supported
[  73.9] Closing the overlay
[  74.2] Assigning disks to buses
[  74.2] Checking if the guest needs BIOS or UEFI to boot
virt-v2v: This guest requires UEFI on the target to boot.
[  74.2] Initializing the target -o rhv-upload -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -op /tmp/rhvpasswd -os nfs_data
[  75.5] Copying disk 1/1 to qemu URI json:{ "file.driver": "nbd", "file.path": "/var/tmp/rhvupload.C8JY1A/nbdkit0.sock", "file.export": "/" } (raw)
    (100.00/100%)
[ 835.9] Creating output metadata
[ 857.3] Finishing off

1.4 After the v2v conversion finishes, power on the guest in RHV, log in to the guest and check the vmware-tools service status.

# ll /usr/sbin/vmtoolsd
ls: cannot access /usr/sbin/vmtoolsd: No such file or directory

Result: virt-v2v doesn't hang while removing VMware tools during conversion, and VMware tools can be removed successfully.


Scenario-2: Testing a RHEL 6.10 guest with VMware tools installed from a tarball

2.1 Log in to the VM and confirm that a vmtoolsd process is running.

# ps -ef |grep vmtoolsd 
root      1666     1  0 05:33 ?        00:00:00 /usr/sbin/vmtoolsd
root      3002     1  0 05:35 ?        00:00:00 /usr/lib/vmware-tools/sbin64/vmtoolsd -n vmusr


2.2 Shut down the VM and convert it from VMware to RHV with virt-v2v.

# virt-v2v -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1 -it vddk -io vddk-libdir=/home/vmware-vix-disklib-distrib -io  vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA esx6.7-rhel6.10-x86_64-vmware-tools --password-file /tmp/passwd -o rhv-upload -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -os nfs_data -op /tmp/rhvpasswd -oo rhv-cafile=/home/ca.pem  -oo rhv-direct=true -oo rhv-cluster=nfs -of raw  -b ovirtmgmt
Exception AttributeError: "'module' object has no attribute 'dump_plugin'" in <module 'threading' from '/usr/lib64/python2.7/threading.pyc'> ignored
[   0.2] Opening the source -i libvirt -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1 esx6.7-rhel6.10-x86_64-vmware-tools -it vddk  -io vddk-libdir=/home/vmware-vix-disklib-distrib -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA
[   2.0] Creating an overlay to protect the source from being modified
[   3.7] Opening the overlay
[  34.5] Inspecting the overlay
[  47.7] Checking for sufficient free disk space in the guest
[  47.7] Estimating space required on target for each disk
[  47.7] Converting Red Hat Enterprise Linux Server release 6.10 (Santiago) to run on KVM
virt-v2v: QEMU Guest Agent installed for this guest.
virt-v2v: This guest has virtio drivers installed.
[ 142.4] Mapping filesystem data to avoid copying unused and blank areas
[ 142.8] Closing the overlay
[ 143.0] Assigning disks to buses
[ 143.0] Checking if the guest needs BIOS or UEFI to boot
[ 143.0] Initializing the target -o rhv-upload -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -op /tmp/rhvpasswd -os nfs_data
[ 145.1] Copying disk 1/1 to qemu URI json:{ "file.driver": "nbd", "file.path": "/var/tmp/rhvupload.mIB9Ne/nbdkit0.sock", "file.export": "/" } (raw)
    (100.00/100%)
[ 790.0] Creating output metadata
[ 812.4] Finishing off

2.3 After the v2v conversion finishes, power on the guest in RHV, log in to the guest and check the vmware-tools service status.

# ll /usr/sbin/vmtoolsd
ls: cannot access /usr/sbin/vmtoolsd: No such file or directory

Result: virt-v2v doesn't hang while removing VMware tools during conversion, and VMware tools can be removed successfully.

I also tested many other RHEL and SLES VMs; virt-v2v doesn't hang while removing VMware tools during conversion, so I am moving this bug from ON_QA to VERIFIED.
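
The "[ seconds] message" prefixes in the virt-v2v output above can be used to see how long each phase took. Below is a minimal Python sketch, assuming only that line format; the function names are hypothetical and the sample is an excerpt from Scenario-2. Judging by the patch in comment 12, the uninstaller runs as part of guest conversion, so a hang of the kind reported here would presumably show up as a very long "Converting ..." step.

```python
import re

# Matches lines such as "[  47.7] Inspecting the overlay".
STEP_RE = re.compile(r"^\[\s*(\d+(?:\.\d+)?)\]\s+(.*)$")

SAMPLE = """\
[  47.7] Checking for sufficient free disk space in the guest
[  47.7] Estimating space required on target for each disk
[  47.7] Converting Red Hat Enterprise Linux Server release 6.10 (Santiago) to run on KVM
virt-v2v: QEMU Guest Agent installed for this guest.
virt-v2v: This guest has virtio drivers installed.
[ 142.4] Mapping filesystem data to avoid copying unused and blank areas
[ 142.8] Closing the overlay
"""


def step_durations(log_text):
    """Return (seconds, message) for every timestamped step except the last.

    A step's duration is the gap to the next timestamped line; lines
    without a timestamp prefix are skipped.
    """
    steps = []
    for line in log_text.splitlines():
        m = STEP_RE.match(line)
        if m:
            steps.append((float(m.group(1)), m.group(2)))
    return [(round(nxt[0] - cur[0], 1), cur[1])
            for cur, nxt in zip(steps, steps[1:])]


def slowest_step(log_text):
    """Return the (seconds, message) pair with the largest duration."""
    return max(step_durations(log_text), key=lambda d: d[0])


if __name__ == "__main__":
    seconds, message = slowest_step(SAMPLE)
    print("%.1fs  %s" % (seconds, message))
```

The last timestamped line has no successor, so its duration is unknown and it is left out.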

Comment 32 errata-xmlrpc 2019-08-06 12:44:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2096