Bug 1953286
| Summary: | No error shows when using virt-v2v -o rhv to convert guest to data domain | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | xinyli |
| Component: | virt-v2v | Assignee: | Laszlo Ersek <lersek> |
| Status: | CLOSED WONTFIX | QA Contact: | Vera <vwu> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 9.0 | CC: | juzhou, kkiwi, lersek, mxie, rjones, tyan, tzheng, virt-bugs, vwu, xiaodwan |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 9.0 Beta | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-10-25 10:09:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description xinyli 2021-04-25 08:08:50 UTC
Two comments here:

(1) In comment#0, the RHV 4.4 vs. RHV 4.3 output comparison is bogus. The RHV 4.4 run is configured with a correct (reachable) "-os" option, while the RHV 4.3 run is configured with an unreachable "-os" option. (The IP addresses differ!) That's an apples-to-oranges comparison; most likely, if the wrong "-os" had been passed in the RHV 4.4 case as well, virt-v2v would have emitted an error message just the same. So, the "RHV 4.3" section of the report should be summarily ignored.

(2) We've had a very similar bug: bug 2027598. It's not an *obvious* duplicate by any means; under bug 2027598, we ultimately fixed a regression from upstream v2v commit 255722cbf39a ("v2v: Modular virt-v2v", 2021-09-07), which was first released in v1.45.90. Conversely, the present report is for virt-v2v-1.43.3-2.el9.x86_64 -- so it could be something different. Either way, we cannot diagnose the current problem without reproducing the issue and accessing "engine.log" etc. on the RHV server -- and ever since fixing bug 2027598, we've not seen reports of "guest just uploaded to storage domain is not visible". So my take is that, for whatever reason, the present symptom will not (should not) reproduce. Can someone from QE try to reproduce this? Thanks.

Rich highlighted the expression "data domain" in the BZ title (also in comment#0).

Conversion directly to a data domain is not supported at all; I don't even understand where this comes from -- a "just in case" attempt, or exploring error behavior? The documentation at <https://libguestfs.org/virt-v2v-output-rhv.1.html> clearly says "Export Storage Domain", so writing to a Data Domain directory instead is a clear violation of the docs. What I don't understand is whether that's an honest mistake by the reporter, or a genuine attempt to do something that's outside of the documentation. In the latter case, this ticket looks very much like NOTABUG to me. (If you intentionally violate the docs, what do you expect?)

(In reply to Laszlo Ersek from comment #6)
> Rich highlighted the expression "data domain" in the BZ title (also in
> comment#0).
>
> Conversion directly to a data domain is not supported at all; I don't even
> understand where this comes from -- a "just in case" attempt, or exploring
> error behavior? The documentation at
> <https://libguestfs.org/virt-v2v-output-rhv.1.html> clearly says "Export
> Storage Domain", so writing to a Data Domain directory instead is a clear
> violation of the docs. What I don't understand is whether that's an honest
> mistake by the reporter, or a genuine attempt to do something that's
> outside of the documentation. In the latter case, this ticket looks very
> much like NOTABUG to me. (If you intentionally violate the docs, what do
> you expect?)

Correct. The export destination "/home/nfs_data" is not supported. Retesting with "nfs_export" passed; the versions used and the steps taken follow.
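Since point (1) above turns on a wrong or unreachable "-os" target, a quick pre-flight check can rule that out before a long conversion. A minimal sketch, assuming the NFS server answers showmount queries; the address and path are the example values from the passing retest below, not a general recommendation:

```sh
#!/bin/sh
# Hedged pre-flight check: verify the "-os" NFS export actually exists on
# the server before running virt-v2v.  Address/path are the example values
# from this BZ.
os=10.73.195.48:/home/nfs_export
server=${os%%:*}
path=${os#*:}
if ! showmount -e "$server" | grep -qF "$path"; then
    echo "error: $path is not exported by $server; check the -os option" >&2
    exit 1
fi
echo "$os looks reachable; proceed with virt-v2v -o rhv -os $os"
```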
The versions used for the retest:

libvirt-8.5.0-1.el9.x86_64
guestfs-tools-1.48.2-4.el9.x86_64
qemu-img-7.0.0-8.el9.x86_64
libguestfs-1.48.4-1.el9.x86_64
virt-v2v-2.0.7-1.el9.x86_64
nbdkit-1.30.6-2.el9.x86_64
rhv 4.4.10.6-0.1.el8ev

Steps:

1. Convert a guest from VMware to RHV 4.4 via -o rhv with virt-v2v:

# virt-v2v -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1 -it vddk -io vddk-libdir=/root/vddk_libdir/latest -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA -ip /v2v-ops/esxpw -o rhv -os 10.73.195.48:/home/nfs_export -b ovirtmgmt esx6.7-rhel8.6-x86_64
[   0.2] Setting up the source: -i libvirt -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1 -it vddk esx6.7-rhel8.6-x86_64
[   2.1] Opening the source
[   7.6] Inspecting the source
[  27.0] Checking for sufficient free disk space in the guest
[  27.0] Converting Red Hat Enterprise Linux 8.6 (Ootpa) to run on KVM
virt-v2v: This guest has virtio drivers installed.
[ 176.6] Mapping filesystem data to avoid copying unused and blank areas
[ 177.6] Closing the overlay
[ 177.9] Assigning disks to buses
[ 177.9] Checking if the guest needs BIOS or UEFI to boot
[ 177.9] Setting up the destination: -o rhv
[ 180.2] Copying disk 1/1
█ 100% [****************************************]
[1492.2] Creating output metadata
[1495.7] Finishing off

2. Check that the guest appears under "Storage -> Storage Domains -> nfs_export -> VM Import" on RHV 4.4 after conversion.

3. Import the guest and check its status on RHV 4.4.

I also tried converting to a data domain. The process shows the conversion completing entirely successfully, but at the end it fails to unmount the NFS share. Laszlo, do you think the message can be made clearer?

# virt-v2v -ic vpx://root.227.27/data/10.73.199.217/?no_verify=1 -it vddk -io vddk-libdir=/home/vddk7.0.3 -io vddk-thumbprint=76:75:59:0E:32:F5:1E:58:69:93:75:5A:7B:51:32:C5:D1:6D:F1:21 -o rhv -of qcow2 -ip /v2v-ops/esxpw -os 10.73.224.195:/home/nfs_data -b ovirtmgmt esx7.0-rhel8.4-x86_64
[   0.9] Setting up the source: -i libvirt -ic vpx://root.227.27/data/10.73.199.217/?no_verify=1 -it vddk esx7.0-rhel8.4-x86_64
[   2.9] Opening the source
[  18.0] Inspecting the source
[  24.3] Checking for sufficient free disk space in the guest
[  24.3] Converting Red Hat Enterprise Linux 8.4 (Ootpa) to run on KVM
virt-v2v: The QEMU Guest Agent will be installed for this guest at first boot.
virt-v2v: This guest has virtio drivers installed.
[  80.2] Mapping filesystem data to avoid copying unused and blank areas
[  81.4] Closing the overlay
[  81.7] Assigning disks to buses
[  81.7] Checking if the guest needs BIOS or UEFI to boot
[  81.7] Setting up the destination: -o rhv
[  84.1] Copying disk 1/1
█ 100% [****************************************]
[ 306.3] Creating output metadata
[ 306.4] Finishing off
umount.nfs4: /tmp/v2v.6sOqT4: device is busy

The unmount failure must surely be incidental. We are actually using plain umount, but I think it should be safe (as long as we sync first??) to call umount -l (lazy unmount) instead.

https://github.com/libguestfs/virt-v2v/blob/4368b94ee1724c16aa35c0ee42ce4c51ce037b5a/output/output_rhv.ml#L210
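To make that suggestion concrete, a minimal sketch of the sync-then-lazy-unmount idea, using the example mountpoint from the failing run above. Note this is not what virt-v2v does (output_rhv.ml runs a plain umount), and Laszlo pushes back on lazy unmounting in the next comment:

```sh
#!/bin/sh
# Sketch of "sync first, then lazy unmount" from comment #9; the mountpoint
# is the example one from this BZ, not something virt-v2v exposes.
mp=/tmp/v2v.6sOqT4
sync                        # flush dirty pages so laziness cannot hide unwritten data
if ! umount -l "$mp"; then  # -l detaches now; cleanup happens when the last user exits
    echo "lazy unmount of $mp failed" >&2
    exit 1
fi
```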
(In reply to Vera from comment #8)
> The process shows the conversion completing entirely successfully, but at
> the end it fails to unmount the NFS share.
>
> Laszlo, do you think the message can be made clearer?
>
> [ 306.4] Finishing off
> umount.nfs4: /tmp/v2v.6sOqT4: device is busy

(In reply to Richard W.M. Jones from comment #9)
> The unmount failure must surely be incidental. We are actually using
> plain umount, but I think it should be safe (as long as we sync
> first??) to call umount -l (lazy unmount) instead.

The umount failure does not look good. I'm quite opposed to lazy umounting; if we fail to umount, then something is arguably wrong, and laziness only masks it. That didn't work well for dnf / livecd-creator either, did it. At the end of conversion, especially at the end of a successful conversion, nothing should be holding the temporary directory open.

Vera, did you "cd /tmp/v2v.6sOqT4" from a different terminal during conversion?

If so, then please don't bother opening a new ticket, and I wouldn't say that the error message should be improved either.

If you didn't create the separate reference(s) to "/tmp/v2v.6sOqT4", then we should find out what did, and change *that*.

How reproducible is this symptom? Unless it is reproducible (and I'm seeing it for the first time now), we can't really do anything about it -- unclear what to change, and unclear how to test a potential fix.

(In reply to Laszlo Ersek from comment #11)
> The umount failure does not look good. I'm quite opposed to lazy
> umounting; if we fail to umount, then something is arguably wrong, and
> laziness only masks it. That didn't work well for dnf / livecd-creator
> either, did it. At the end of conversion, especially at the end of a
> successful conversion, nothing should be holding the temporary directory
> open.
>
> Vera, did you "cd /tmp/v2v.6sOqT4" from a different terminal during
> conversion?
>
> If so, then please don't bother opening a new ticket, and I wouldn't say
> that the error message should be improved either.
>
> If you didn't create the separate reference(s) to "/tmp/v2v.6sOqT4",
> then we should find out what did, and change *that*.
>
> How reproducible is this symptom? Unless it is reproducible (and I'm
> seeing it for the first time now), we can't really do anything about it
> -- unclear what to change, and unclear how to test a potential fix.

Laszlo, I didn't "cd /tmp/v2v.6sOqT4" from a different terminal during conversion. I checked the mountpoints for the different data domains:

# df -h |grep nfs
10.73.224.195:/home/nfs_data 900G 247G 654G 28% /tmp/v2v.6sOqT4
10.73.195.48:/home/nfs_data  923G 129G 794G 14% /tmp/v2v.3IhNFD

This is 100% reproducible. Please check the attached log captured in debug mode.

Thanks; can you run (as root) "fuser -v /tmp/v2v.6sOqT4"? Better yet, please run:

lsof -b -w -- /tmp/v2v.6sOqT4

Aaargh, the lsof manual is broken. The '-b' option makes it unusable in effect. So please do this instead:

lsof -w -- /tmp/v2v.6sOqT4

# lsof -w -- /tmp/v2v.Mf9CpG
# fuser -v /tmp/v2v.Mf9CpG
                     USER        PID ACCESS COMMAND
/tmp/v2v.Mf9CpG:     root     kernel mount /tmp/v2v.Mf9CpG

And any other steps then?
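As an aside, there are a few more generic ways to look for whatever is holding a busy mountpoint, beyond the two commands tried above. A sketch using the example path from this comment, with standard psmisc/lsof tooling rather than anything virt-v2v-specific:

```sh
#!/bin/sh
# Busy-mountpoint diagnostics; run as root while the mount is still busy.
mp=/tmp/v2v.Mf9CpG
fuser -vm "$mp"        # -m lists every process using the mounted filesystem
lsof -w +f -- "$mp"    # "+f --" makes lsof treat the argument as a filesystem
# Fallback: scan every process's open descriptors under /proc directly.
for fd in /proc/[0-9]*/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    case $target in
        "$mp"*) echo "$fd -> $target" ;;
    esac
done
```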
Unfortunately, fuser is pretty useless in this case (I was surprised to see that, but that's why I also requested the lsof command). Now, that lsof *also* doesn't print anything useful makes me frown; it suggests that whatever process prevented the umount from working no longer exists :/

I think we'll have to investigate that separately. Can you please file a new BZ? (I've never seen this issue myself and don't know how to reproduce it, so we might need access to your environment.) Thanks!

I think we're going to need a specially patched virt-v2v package which runs lsof just before trying to unmount the disk. Likely whatever process was holding the mount open was only there temporarily. I agree this is a separate (new) bug.

Probably running lsof slightly delays the unmount, allowing the process that was holding the mountpoint to finish. You might need to run the conversion a few times to see the error.

> # lsof -w -- /tmp/v2v.Hu1IE4/
Was this command issued while virt-v2v was running?
Anyway, the patch attempts to run lsof on the temporary mountpoint
just before unmounting it. It would still print the error if the
unmount fails. If it isn't printing the error, it's likely that
running lsof from virt-v2v slightly delays the unmount enough so that
whatever program was holding the mountpoint open (probably nbdkit) goes away.
There's no other way to diagnose this except to run the conversion
more and hope the error happens. If it never happens then the
patched virt-v2v experiment was not successful.
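For context, the experiment described above conceptually boils down to the following; this is a hedged sketch, not the actual patch (which is not shown in this BZ):

```sh
#!/bin/sh
# Run lsof on the temporary mountpoint immediately before unmounting, and
# still report the failure if the unmount fails.  $1 stands in for the
# random /tmp/v2v.XXXXXX directory that virt-v2v creates.
mp=$1
lsof -w -- "$mp"       # list any process still holding files under the mount
if ! umount "$mp"; then
    echo "umount of $mp failed (device busy?)" >&2
    exit 1
fi
```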
Vera, I do believe this issue ("umount.nfs4: /tmp/v2v.XXXXXX: device is busy") is a new, separate bug, and a new BZ should be filed for it.

I'm going to guess at a sequence of events which could cause the (new) bug:

(1) We're using -o rhv-upload with implicit -of raw. This runs an nbdkit command similar to:

  nbdkit --exit-with-parent file /tmp/v2v.XXXXXX/very-long-filename cache=none

where /tmp/v2v.XXXXXX is a randomly named NFS mount point.

(2) At the end of a successful conversion, nbdcopy exits, closing the nbdkit socket, which causes nbdkit-file-plugin to close the file descriptor. However, this happens asynchronously. If virt-v2v is fast enough, then step (3) will happen before this is complete. (Note that we use nbdcopy --flush, so nbdcopy should not exit until all data is flushed to persistent storage, which is why this should be safe.)

(3) Several "on_exit" handlers run as virt-v2v shuts down. They do:

  (a) If the conversion was not successful, remove the output.
  (b) Remove the output socket.
  (c) Unmount /tmp/v2v.XXXXXX.
  (d) Kill nbdkit.

These do not run in any well-defined order, so e.g. nbdkit might be killed after unmounting. Also they are asynchronous; e.g. killing nbdkit doesn't mean that it exits immediately. And of course it might overlap with (2).

The upshot is that there is no guarantee that nbdkit has closed the file descriptor before we try to unmount. The window should be very small, which is likely why adding the lsof command is easily enough to hide the problem.

There are some complicated solutions to this which I'll propose when we've got a new bug.

(In reply to Richard W.M. Jones from comment #24)
> > # lsof -w -- /tmp/v2v.Hu1IE4/
>
> Was this command issued while virt-v2v was running?

Yes. I ran the lsof command while virt-v2v was running, with the build virt-v2v-2.0.7-1.1.bz1953286.el9.x86_64.

I tried several times with the latest versions:

nbdkit-1.30.7-1.el9.x86_64
virt-v2v-2.0.7-1.el9.x86_64
libvirt-8.5.0-1.el9.x86_64
guestfs-tools-1.48.2-4.el9.x86_64
qemu-img-7.0.0-8.el9.x86_64

1. Check the mountpoint before running virt-v2v:

# df -h|grep /tmp
#

2. Run virt-v2v to convert the guest to an RHV data domain:

# virt-v2v -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1 -it vddk -io vddk-libdir=/root/vddk_libdir/latest -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA -ip /v2v-ops/esxpw -o rhv -os 10.73.195.48:/home/nfs_data -b ovirtmgmt esx6.7-rhel8.4-x86_64
[   0.2] Setting up the source: -i libvirt -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1 -it vddk esx6.7-rhel8.4-x86_64
[   2.1] Opening the source
[   7.4] Inspecting the source
[  15.8] Checking for sufficient free disk space in the guest
[  15.8] Converting Red Hat Enterprise Linux 8.4 (Ootpa) to run on KVM
virt-v2v: The QEMU Guest Agent will be installed for this guest at first boot.
virt-v2v: This guest has virtio drivers installed.
[  81.9] Mapping filesystem data to avoid copying unused and blank areas
[  83.6] Closing the overlay
[  83.9] Assigning disks to buses
[  83.9] Checking if the guest needs BIOS or UEFI to boot
[  83.9] Setting up the destination: -o rhv
[  85.4] Copying disk 1/1
█ 100% [****************************************]
[ 324.7] Creating output metadata
[ 325.9] Finishing off

Check the mountpoint while virt-v2v is running:

# df -h|grep /tmp
10.73.195.48:/home/nfs_data 923G 140G 783G 16% /tmp/v2v.N7e53k

3. Check the mountpoint again after virt-v2v finishes:

# df -h|grep /tmp
#

The issue of "device is busy" seems to be solved.
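Purely as an illustration, one naive mitigation for the race Rich describes above would be a short retry loop around the unmount; this is hypothetical and is not the "complicated solution" he says he will propose in the new bug:

```sh
#!/bin/sh
# Hypothetical mitigation for the unmount race: retry briefly, since the
# window in which nbdkit still holds its descriptor open is very small.
mp=$1
tries=0
until umount "$mp"; do
    tries=$((tries + 1))
    if [ "$tries" -ge 10 ]; then
        echo "umount $mp: still busy after $tries attempts" >&2
        exit 1
    fi
    sleep 0.1          # give nbdkit a moment to close the descriptor and exit
done
```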
Hi Laszlo,

As per comment 31, we think this bug should be closed as CANTFIX or WONTFIX, because it involves RHV and this testing is a negative test.

Thanks,
Vera

Hi Vera,

WONTFIX is fine too.