Bug 1953286
| Summary: | No error shows when using virt-v2v -o rhv to convert guest to data domain | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | xinyli |
| Component: | virt-v2v | Assignee: | Laszlo Ersek <lersek> |
| Status: | CLOSED WONTFIX | QA Contact: | Vera <vwu> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 9.0 | CC: | juzhou, kkiwi, lersek, mxie, rjones, tyan, tzheng, virt-bugs, vwu, xiaodwan |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 9.0 Beta | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-10-25 10:09:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description xinyli 2021-04-25 08:08:50 UTC
Two comments here:

(1) In comment#0, the RHV 4.4 vs. RHV 4.3 output comparison is bogus. The RHV 4.4 run is configured with a correct (reachable) "-os" option, while the RHV 4.3 run is configured with an unreachable "-os" option. (The IP addresses differ!) That's an apples-to-oranges comparison; most likely, if the wrong "-os" had been passed in the RHV 4.4 case as well, virt-v2v would have emitted an error message just the same. So, the "RHV 4.3" section of the report should be summarily ignored.

(2) We've had a very similar bug: bug 2027598. It's not an *obvious* duplicate by any means; under bug 2027598, we ultimately fixed a regression from upstream v2v commit 255722cbf39a ("v2v: Modular virt-v2v", 2021-09-07), which was first released in v1.45.90. Conversely, the present report is for virt-v2v-1.43.3-2.el9.x86_64 -- so it could be something different. Either way, we cannot diagnose the current problem without reproducing the issue and accessing "engine.log" etc. on the RHV server -- and ever since fixing bug 2027598, we've not seen reports of "guest just uploaded to storage domain is not visible". So my take is that, for whatever reason, the present symptom will not (should not) reproduce. Can someone from QE try to reproduce this? Thanks.

Rich highlighted the expression "data domain" in the BZ title (also in comment#0).

Conversion directly to a data domain is not supported at all; I don't even understand where this comes from -- a "just in case" attempt, or exploring error behavior? The documentation at <https://libguestfs.org/virt-v2v-output-rhv.1.html> clearly says "Export Storage Domain", so writing to a Data Domain directory instead is a clear violation of the docs. What I don't understand is whether that's an honest mistake by the reporter, or a genuine attempt to do something that's outside of the documentation. In the latter case, this ticket looks very much like NOTABUG to me. (If you intentionally violate the docs, what do you expect?)

(In reply to Laszlo Ersek from comment #6)
> Rich highlighted the expression "data domain" in the BZ title (also in
> comment#0).
>
> Conversion directly to a data domain is not supported at all; I don't even
> understand where this comes from -- a "just in case" attempt, or exploring
> error behavior? The documentation at
> <https://libguestfs.org/virt-v2v-output-rhv.1.html> clearly says "Export
> Storage Domain", so writing to a Data Domain directory instead is a clear
> violation of the docs. What I don't understand is whether that's an honest
> mistake by the reporter, or a genuine attempt to do something that's
> outside of the documentation. In the latter case, this ticket looks very
> much like NOTABUG to me. (If you intentionally violate the docs, what do
> you expect?)

Correct. The export destination "/home/nfs_data" is not supported. Retesting with "nfs_export" passed; the versions used and the steps taken follow.
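Since point (1) above turns on a wrong or unreachable "-os" target, a quick pre-flight check can rule that out before a long conversion. A minimal sketch, assuming the NFS server answers showmount queries; the address and path are the example values from the passing retest below, not a general recommendation:

```sh
#!/bin/sh
# Hedged pre-flight check: verify the "-os" NFS export actually exists on
# the server before running virt-v2v.  Address/path are the example values
# from this BZ.
os=10.73.195.48:/home/nfs_export
server=${os%%:*}
path=${os#*:}
if ! showmount -e "$server" | grep -qF "$path"; then
    echo "error: $path is not exported by $server; check the -os option" >&2
    exit 1
fi
echo "$os looks reachable; proceed with virt-v2v -o rhv -os $os"
```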
The versions used for the retest:

libvirt-8.5.0-1.el9.x86_64
guestfs-tools-1.48.2-4.el9.x86_64
qemu-img-7.0.0-8.el9.x86_64
libguestfs-1.48.4-1.el9.x86_64
virt-v2v-2.0.7-1.el9.x86_64
nbdkit-1.30.6-2.el9.x86_64
rhv 4.4.10.6-0.1.el8ev

Steps:

1. Convert a guest from VMware to RHV 4.4 via -o rhv with virt-v2v:

# virt-v2v -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1 -it vddk -io vddk-libdir=/root/vddk_libdir/latest -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA -ip /v2v-ops/esxpw -o rhv -os 10.73.195.48:/home/nfs_export -b ovirtmgmt esx6.7-rhel8.6-x86_64
[   0.2] Setting up the source: -i libvirt -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1 -it vddk esx6.7-rhel8.6-x86_64
[   2.1] Opening the source
[   7.6] Inspecting the source
[  27.0] Checking for sufficient free disk space in the guest
[  27.0] Converting Red Hat Enterprise Linux 8.6 (Ootpa) to run on KVM
virt-v2v: This guest has virtio drivers installed.
[ 176.6] Mapping filesystem data to avoid copying unused and blank areas
[ 177.6] Closing the overlay
[ 177.9] Assigning disks to buses
[ 177.9] Checking if the guest needs BIOS or UEFI to boot
[ 177.9] Setting up the destination: -o rhv
[ 180.2] Copying disk 1/1
█ 100% [****************************************]
[1492.2] Creating output metadata
[1495.7] Finishing off

2. Check that the guest appears under "Storage -> Storage Domains -> nfs_export -> VM Import" on RHV 4.4 after conversion.

3. Import the guest and check its status on RHV 4.4.

I also tried converting to a data domain. The process shows the conversion completing entirely successfully, but at the end it fails to unmount the NFS share. Laszlo, do you think the message can be made clearer?

# virt-v2v -ic vpx://root.227.27/data/10.73.199.217/?no_verify=1 -it vddk -io vddk-libdir=/home/vddk7.0.3 -io vddk-thumbprint=76:75:59:0E:32:F5:1E:58:69:93:75:5A:7B:51:32:C5:D1:6D:F1:21 -o rhv -of qcow2 -ip /v2v-ops/esxpw -os 10.73.224.195:/home/nfs_data -b ovirtmgmt esx7.0-rhel8.4-x86_64
[   0.9] Setting up the source: -i libvirt -ic vpx://root.227.27/data/10.73.199.217/?no_verify=1 -it vddk esx7.0-rhel8.4-x86_64
[   2.9] Opening the source
[  18.0] Inspecting the source
[  24.3] Checking for sufficient free disk space in the guest
[  24.3] Converting Red Hat Enterprise Linux 8.4 (Ootpa) to run on KVM
virt-v2v: The QEMU Guest Agent will be installed for this guest at first boot.
virt-v2v: This guest has virtio drivers installed.
[  80.2] Mapping filesystem data to avoid copying unused and blank areas
[  81.4] Closing the overlay
[  81.7] Assigning disks to buses
[  81.7] Checking if the guest needs BIOS or UEFI to boot
[  81.7] Setting up the destination: -o rhv
[  84.1] Copying disk 1/1
█ 100% [****************************************]
[ 306.3] Creating output metadata
[ 306.4] Finishing off
umount.nfs4: /tmp/v2v.6sOqT4: device is busy

The unmount failure must surely be incidental. We are actually using plain umount, but I think it should be safe (as long as we sync first??) to call umount -l (lazy unmount) instead.

https://github.com/libguestfs/virt-v2v/blob/4368b94ee1724c16aa35c0ee42ce4c51ce037b5a/output/output_rhv.ml#L210
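To make that suggestion concrete, a minimal sketch of the sync-then-lazy-unmount idea, using the example mountpoint from the failing run above. Note this is not what virt-v2v does (output_rhv.ml runs a plain umount), and Laszlo pushes back on lazy unmounting in the next comment:

```sh
#!/bin/sh
# Sketch of "sync first, then lazy unmount" from comment #9; the mountpoint
# is the example one from this BZ, not something virt-v2v exposes.
mp=/tmp/v2v.6sOqT4
sync                        # flush dirty pages so laziness cannot hide unwritten data
if ! umount -l "$mp"; then  # -l detaches now; cleanup happens when the last user exits
    echo "lazy unmount of $mp failed" >&2
    exit 1
fi
```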
(In reply to Vera from comment #8)
> The process shows the conversion completing entirely successfully, but at
> the end it fails to unmount the NFS share.
>
> Laszlo, do you think the message can be made clearer?
>
> [ 306.4] Finishing off
> umount.nfs4: /tmp/v2v.6sOqT4: device is busy

(In reply to Richard W.M. Jones from comment #9)
> The unmount failure must surely be incidental. We are actually using
> plain umount, but I think it should be safe (as long as we sync
> first??) to call umount -l (lazy unmount) instead.

The umount failure does not look good. I'm quite opposed to lazy umounting; if we fail to umount, then something is arguably wrong, and laziness only masks it. That didn't work well for dnf / livecd-creator either, did it. At the end of conversion, especially at the end of a successful conversion, nothing should be holding the temporary directory open.

Vera, did you "cd /tmp/v2v.6sOqT4" from a different terminal during conversion?

If so, then please don't bother opening a new ticket, and I wouldn't say that the error message should be improved either.

If you didn't create the separate reference(s) to "/tmp/v2v.6sOqT4", then we should find out what did, and change *that*.

How reproducible is this symptom? Unless it is reproducible (and I'm seeing it for the first time now), we can't really do anything about it -- unclear what to change, and unclear how to test a potential fix.

(In reply to Laszlo Ersek from comment #11)
> The umount failure does not look good. I'm quite opposed to lazy
> umounting; if we fail to umount, then something is arguably wrong, and
> laziness only masks it. That didn't work well for dnf / livecd-creator
> either, did it. At the end of conversion, especially at the end of a
> successful conversion, nothing should be holding the temporary directory
> open.
>
> Vera, did you "cd /tmp/v2v.6sOqT4" from a different terminal during
> conversion?
>
> If so, then please don't bother opening a new ticket, and I wouldn't say
> that the error message should be improved either.
>
> If you didn't create the separate reference(s) to "/tmp/v2v.6sOqT4",
> then we should find out what did, and change *that*.
>
> How reproducible is this symptom? Unless it is reproducible (and I'm
> seeing it for the first time now), we can't really do anything about it
> -- unclear what to change, and unclear how to test a potential fix.

Laszlo, I didn't "cd /tmp/v2v.6sOqT4" from a different terminal during conversion. I checked the mountpoints for the different data domains:

# df -h |grep nfs
10.73.224.195:/home/nfs_data 900G 247G 654G 28% /tmp/v2v.6sOqT4
10.73.195.48:/home/nfs_data  923G 129G 794G 14% /tmp/v2v.3IhNFD

This is 100% reproducible. Please check the attached log captured in debug mode.

Thanks; can you run (as root) "fuser -v /tmp/v2v.6sOqT4"? Better yet, please run:

lsof -b -w -- /tmp/v2v.6sOqT4

Aaargh, the lsof manual is broken. The '-b' option makes it unusable in effect. So please do this instead:

lsof -w -- /tmp/v2v.6sOqT4

# lsof -w -- /tmp/v2v.Mf9CpG
# fuser -v /tmp/v2v.Mf9CpG
                     USER        PID ACCESS COMMAND
/tmp/v2v.Mf9CpG:     root     kernel mount /tmp/v2v.Mf9CpG

And any other steps then?
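As an aside, there are a few more generic ways to look for whatever is holding a busy mountpoint, beyond the two commands tried above. A sketch using the example path from this comment, with standard psmisc/lsof tooling rather than anything virt-v2v-specific:

```sh
#!/bin/sh
# Busy-mountpoint diagnostics; run as root while the mount is still busy.
mp=/tmp/v2v.Mf9CpG
fuser -vm "$mp"        # -m lists every process using the mounted filesystem
lsof -w +f -- "$mp"    # "+f --" makes lsof treat the argument as a filesystem
# Fallback: scan every process's open descriptors under /proc directly.
for fd in /proc/[0-9]*/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    case $target in
        "$mp"*) echo "$fd -> $target" ;;
    esac
done
```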
Unfortunately, fuser is pretty useless in this case (I was surprised to see that, but that's why I also requested the lsof command). Now, that lsof *also* doesn't print anything useful makes me frown; it suggests that whatever process prevented the umount from working no longer exists :/

I think we'll have to investigate that separately. Can you please file a new BZ? (I've never seen this issue myself and don't know how to reproduce it, so we might need access to your environment.) Thanks!

I think we're going to need a specially patched virt-v2v package which runs lsof just before trying to unmount the disk. Likely whatever process was holding the mount open was only there temporarily. I agree this is a separate (new) bug.

Probably running lsof slightly delays the unmount, allowing the process that was holding the mountpoint to finish. You might need to run the conversion a few times to see the error.

> # lsof -w -- /tmp/v2v.Hu1IE4/
Was this command issued while virt-v2v was running?
Anyway, the patch attempts to run lsof on the temporary mountpoint
just before unmounting it. It would still print the error if the
unmount fails. If it isn't printing the error, it's likely that
running lsof from virt-v2v slightly delays the unmount enough so that
whatever program was holding the mountpoint open (probably nbdkit) goes away.
There's no other way to diagnose this except to run the conversion
more and hope the error happens. If it never happens then the
patched virt-v2v experiment was not successful.
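For context, the experiment described above conceptually boils down to the following; this is a hedged sketch, not the actual patch (which is not shown in this BZ):

```sh
#!/bin/sh
# Run lsof on the temporary mountpoint immediately before unmounting, and
# still report the failure if the unmount fails.  $1 stands in for the
# random /tmp/v2v.XXXXXX directory that virt-v2v creates.
mp=$1
lsof -w -- "$mp"       # list any process still holding files under the mount
if ! umount "$mp"; then
    echo "umount of $mp failed (device busy?)" >&2
    exit 1
fi
```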
Vera, I do believe this issue ("umount.nfs4: /tmp/v2v.XXXXXX: device is busy") is a new, separate bug, and a new BZ should be filed for it.

I'm going to guess at a sequence of events which could cause the (new) bug:

(1) We're using -o rhv-upload with implicit -of raw. This runs an nbdkit command similar to:

  nbdkit --exit-with-parent file /tmp/v2v.XXXXXX/very-long-filename cache=none

where /tmp/v2v.XXXXXX is a randomly named NFS mount point.

(2) At the end of a successful conversion, nbdcopy exits, closing the nbdkit socket, which causes nbdkit-file-plugin to close the file descriptor. However, this happens asynchronously. If virt-v2v is fast enough, then step (3) will happen before this is complete. (Note that we use nbdcopy --flush, so nbdcopy should not exit until all data is flushed to persistent storage, which is why this should be safe.)

(3) Several "on_exit" handlers run as virt-v2v shuts down. They do:

  (a) If the conversion was not successful, remove the output.
  (b) Remove the output socket.
  (c) Unmount /tmp/v2v.XXXXXX.
  (d) Kill nbdkit.

These do not run in any well-defined order, so e.g. nbdkit might be killed after unmounting. Also they are asynchronous; e.g. killing nbdkit doesn't mean that it exits immediately. And of course it might overlap with (2).

The upshot is that there is no guarantee that nbdkit has closed the file descriptor before we try to unmount. The window should be very small, which is likely why adding the lsof command is easily enough to hide the problem.

There are some complicated solutions to this which I'll propose when we've got a new bug.

(In reply to Richard W.M. Jones from comment #24)
> > # lsof -w -- /tmp/v2v.Hu1IE4/
>
> Was this command issued while virt-v2v was running?

Yes. I ran the lsof command while virt-v2v was running, with the build virt-v2v-2.0.7-1.1.bz1953286.el9.x86_64.

I tried several times with the latest versions:

nbdkit-1.30.7-1.el9.x86_64
virt-v2v-2.0.7-1.el9.x86_64
libvirt-8.5.0-1.el9.x86_64
guestfs-tools-1.48.2-4.el9.x86_64
qemu-img-7.0.0-8.el9.x86_64

1. Check the mountpoint before running virt-v2v:

# df -h|grep /tmp
#

2. Run virt-v2v to convert the guest to an RHV data domain:

# virt-v2v -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1 -it vddk -io vddk-libdir=/root/vddk_libdir/latest -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA -ip /v2v-ops/esxpw -o rhv -os 10.73.195.48:/home/nfs_data -b ovirtmgmt esx6.7-rhel8.4-x86_64
[   0.2] Setting up the source: -i libvirt -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1 -it vddk esx6.7-rhel8.4-x86_64
[   2.1] Opening the source
[   7.4] Inspecting the source
[  15.8] Checking for sufficient free disk space in the guest
[  15.8] Converting Red Hat Enterprise Linux 8.4 (Ootpa) to run on KVM
virt-v2v: The QEMU Guest Agent will be installed for this guest at first boot.
virt-v2v: This guest has virtio drivers installed.
[  81.9] Mapping filesystem data to avoid copying unused and blank areas
[  83.6] Closing the overlay
[  83.9] Assigning disks to buses
[  83.9] Checking if the guest needs BIOS or UEFI to boot
[  83.9] Setting up the destination: -o rhv
[  85.4] Copying disk 1/1
█ 100% [****************************************]
[ 324.7] Creating output metadata
[ 325.9] Finishing off

Check the mountpoint while virt-v2v is running:

# df -h|grep /tmp
10.73.195.48:/home/nfs_data 923G 140G 783G 16% /tmp/v2v.N7e53k

3. Check the mountpoint again after virt-v2v finishes:

# df -h|grep /tmp
#

The issue of "device is busy" seems to be solved.
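Purely as an illustration, one naive mitigation for the race Rich describes above would be a short retry loop around the unmount; this is hypothetical and is not the "complicated solution" he says he will propose in the new bug:

```sh
#!/bin/sh
# Hypothetical mitigation for the unmount race: retry briefly, since the
# window in which nbdkit still holds its descriptor open is very small.
mp=$1
tries=0
until umount "$mp"; do
    tries=$((tries + 1))
    if [ "$tries" -ge 10 ]; then
        echo "umount $mp: still busy after $tries attempts" >&2
        exit 1
    fi
    sleep 0.1          # give nbdkit a moment to close the descriptor and exit
done
```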
Hi Laszlo,

As per comment 31, we think this bug should be closed as CANTFIX or WONTFIX, because it involves RHV and this testing is a negative test.

Thanks,
Vera

Hi Vera,

WONTFIX is fine too.