Description of problem: See this patch series: https://www.redhat.com/archives/libguestfs/2019-September/msg00255.html and especially the final patch: https://www.redhat.com/archives/libguestfs/2019-September/msg00267.html Steps to Reproduce: With this change enabled, virt-v2v connections should survive if there are brief network interruptions or for example if VMware is rebooted or the SSH daemon associated with the transfer is killed.
It's possibly not safe to use it on the -o rhv-upload plugin, will need to check.
This was fixed/implemented in virt-v2v 1.42.0, which was introduced in RHEL AV 8.3.0.
Verify the bug with below builds: virt-v2v-1.42.0-3.module+el8.3.0+6497+b190d2a5.x86_64 libguestfs-1.42.0-1.module+el8.3.0+6496+d39ac712.x86_64 libvirt-6.3.0-1.module+el8.3.0+6478+69f490bb.x86_64 qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420.x86_64 nbdkit-1.20.1-1.module+el8.3.0+6491+7f1d5b30.x86_64 Steps: Scenario1: Disconnect ESXi host on vSphere client during v2v conversion(with vddk) 1.1 Convert a guest from VMware via vddk by virt-v2v #virt-v2v -ic vpx://root.198.169/data/10.73.199.217/?no_verify=1 -it vddk -io vddk-libdir=/home/vmware-vix-disklib-distrib -io vddk-thumbprint=B5:52:1F:B4:21:09:45:24:51:32:56:F6:63:6A:93:5D:54:08:2D:78 esx7.0-rhel8.2-x86_64 -ip /home/passwd -o rhv-upload -oo rhv-cafile=/home/ca.pem -oo rhv-direct -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -op /home/rhvpasswd -of raw -oo rhv-cluster=NFS -os nfs_data -b ovirtmgmt -v -x 1.2 Disconnect ESXi host on vSphere client during v2v conversion, can find retrying info in v2v debug log ..... nbdkit: error: [NFC ERROR]NfcNetTcpSetError: Unexpected EOF .... nbdkit: vddk[2]: debug: VixDiskLib: VixDiskLib_Read: Read 128 sectors at 20761984 failed. Error 14009 (The server refused connection) (DiskLib error 2338: NBD_ERR_NETWORK_CONNECT) at 6151. nbdkit: vddk[2]: error: VixDiskLib_Read: The server refused connection nbdkit: vddk[2]: debug: pread failed: original errno = 5 nbdkit: vddk[2]: debug: retry 1: waiting 2 seconds before retrying nbdkit: vddk[2]: debug: vddk: reopen readonly=1 nbdkit: vddk[2]: debug: vddk: finalize nbdkit: vddk[2]: debug: vddk: close .... .... nbdkit: vddk[2]: error: VixDiskLib_Open: [esx7.0-matrix] esx7.0-rhel8.2-x86_64/esx7.0-rhel8.2-x86_64.vmdk: Unknown error nbdkit: vddk[2]: debug: VDDK call: VixDiskLib_Disconnect (connection) nbdkit: vddk[2]: debug: VixDiskLib: VixDiskLib_Disconnect: Disconnect. nbdkit: vddk[2]: debug: VixDiskLib: VixDiskLib_FreeConnectParams: Free connection parameters. nbdkit: vddk[2]: debug: VDDK call: VixDiskLib_FreeConnectParams (params) nbdkit: vddk[2]: debug: VixDiskLib: VixDiskLib_FreeConnectParams: Free connection parameters. nbdkit: vddk[2]: debug: vddk: open returned handle (nil) nbdkit: vddk[2]: debug: pread failed: original errno = 108 nbdkit: vddk[2]: debug: retry 5: waiting 32 seconds before retrying nbdkit: vddk[2]: debug: vddk: reopen readonly=1 nbdkit: vddk[2]: debug: vddk: open readonly=1 ..... ..... nbdkit: vddk[2]: debug: VixDiskLib: VixDiskLib_Open: Cannot open disk [esx7.0-matrix] esx7.0-rhel8.2-x86_64/esx7.0-rhel8.2-... nbdkit: vddk[2]: debug: could not recover after 5 retries nbdkit: vddk[2]: debug: sending error reply: Cannot send after transport endpoint shutdown nbdkit: vddk[2]: debug: extents failed: original errno = 0 nbdkit: vddk[2]: debug: retry 1: waiting 2 seconds before retrying ..... 1.3 Reconnect ESXi host on vSphere client, v2v conversion can continue and finish successfully ..... nbdkit: vddk[2]: error: VixDiskLib_Open: [esx7.0-matrix] esx7.0-rhel8.2-x86_64/esx7.0-rhel8.2-x86_64.vmdk: Unknown error ..... nbdkit: vddk[2]: debug: vddk: open returned handle (nil) nbdkit: vddk[2]: debug: pread failed: original errno = 108 nbdkit: vddk[2]: debug: retry 4: waiting 16 seconds before retrying nbdkit: vddk[2]: debug: vddk: reopen readonly=1 nbdkit: vddk[2]: debug: vddk: open readonly=1 ..... nbdkit: vddk[2]: debug: VixDiskLib: VixDiskLib_OpenEx: Open a disk. nbdkit: vddk[2]: debug: NBD_ClientOpen: attempting to create connection to vpxa-nfc://[esx7.0-matrix] esx7.0-rhel8.2-x86_64/esx7.0-rhel8.2-x86_64.vmdk.199.217:902 nbdkit: vddk[2]: debug: Setting NFC log level to 1 ..... [ 285.5] Copying disk 1/1 to qemu URI json:{ "file.driver": "nbd", "file.path": "/tmp/v2vnbdkit.FyQcLu/nbdkit4.sock", "file.export": "/" } (raw) ..... [ 914.2] Finishing off ..... Scenario2: Restart /etc/init.d/hostd on ESXi host during v2v conversion(with vddk) 2.1 Convert a guest from VMware via vddk by virt-v2v #virt-v2v -ic vpx://root.198.169/data/10.73.199.217/?no_verify=1 -it vddk -io vddk-libdir=/home/vmware-vix-disklib-distrib -io vddk-thumbprint=B5:52:1F:B4:21:09:45:24:51:32:56:F6:63:6A:93:5D:54:08:2D:78 esx7.0-rhel8.2-x86_64 -ip /home/passwd -o local -os /home -v -x 2.2 Restart /etc/init.d/hostd on ESXi host during v2v conversion, can find retrying info in v2v debug log and v2v conversion can continue after reconnecting with ESXi host after retrying, then v2v conversion can finish successfully 2.2.1 log into ESXi7.0 host and restart /etc/init.d/hostd #/etc/init.d/hostd restart watchdog-hostd: Terminating watchdog process with PID 266206 hostd stopped. hostd started. 2.2.2 #virt-v2v -ic vpx://root.198.169/data/10.73.199.217/?no_verify=1 -it vddk -io vddk-libdir=/home/vmware-vix-disklib-distrib -io vddk-thumbprint=B5:52:1F:B4:21:09:45:24:51:32:56:F6:63:6A:93:5D:54:08:2D:78 esx7.0-rhel8.2-x86_64 -ip /home/passwd -o local -os /home -v -x .... nbdkit: vddk[3]: error: VixDiskLib_Read: The server refused connection nbdkit: vddk[3]: debug: pread failed: original errno = 5 nbdkit: vddk[3]: debug: retry 1: waiting 2 seconds before retrying nbdkit: vddk[3]: debug: vddk: reopen readonly=1 nbdkit: vddk[3]: debug: vddk: finalize ..... nbdkit: vddk[3]: debug: vddk: open returned handle (nil) nbdkit: vddk[3]: debug: pread failed: original errno = 108 nbdkit: vddk[3]: debug: retry 2: waiting 4 seconds before retrying ..... [ 422.1] Finishing off ..... Scenario3: Restart /etc/init.d/hostd on ESXi host during v2v conversion(without vddk) 3.1 Convert a guest from VMware without vddk by virt-v2v # virt-v2v -ic vpx://root.73.148/data/10.73.72.61/?no_verify=1 esx6.0-rhel6.10-x86_64 -ip /home/passwd -o local -os /home -v -x 3.2 log into ESXi7.0 to restart /etc/init.d/hostd on ESXi host during v2v conversion, #/etc/init.d/hostd restart watchdog-hostd: Terminating watchdog process with PID 266206 hostd stopped. hostd started. 3.3 Can find retrying info in v2v debug log and v2v conversion can continue after reconnecting with ESXi host after retrying, then v2v conversion can finish successfully ..... nbdkit: curl[3]: error: pread: curl_easy_perform: HTTP response code said error: The requested URL returned error: 404 Not Found nbdkit: curl[3]: debug: pread failed: original errno = 5 nbdkit: curl[3]: debug: retry 1: waiting 2 seconds before retrying nbdkit: curl[3]: debug: curl: reopen readonly=1 nbdkit: curl[3]: debug: curl: finalize nbdkit: curl[3]: debug: curl: close nbdkit: curl[3]: debug: curl: open readonly=1 nbdkit: curl[3]: error: problem doing HEAD request to fetch size of URL [https://10.73.73.148/folder/esx6.0-rhel6.10-x86%5f64/esx6.0-rhel6.10-x86%5f64-flat.vmdk?dcPath=data&dsName=esx6.0-matrix]: HTTP response code said error: The requested URL returned error: 503 Service Unavailable ..... nbdkit: curl[3]: debug: retry 5: waiting 32 seconds before retrying nbdkit: curl[3]: debug: curl: reopen readonly=1 nbdkit: curl[3]: debug: curl: open readonly=1 nbdkit: curl[3]: debug: content length: 17179869184 nbdkit: curl[3]: debug: accept range supported (for HTTP/HTTPS) nbdkit: curl[3]: debug: curl: open returned handle 0x7f81700dee30 nbdkit: curl[3]: debug: curl: prepare readonly=1 nbdkit: curl[3]: debug: curl: get_size (100.00/100%) ..... [1921.7] Finishing off ..... Scenario4: Reboot ESXi host during v2v conversion(with vddk) 4.1 Convert a guest from ESXi host directly by virt-v2v #virt-v2v -ic esx://root.199.217/?no_verify=1 -it vddk -io vddk-libdir=/home/vmware-vix-disklib-distrib -io vddk-thumbprint=84:5D:EF:AE:38:66:81:FA:67:B6:EA:2B:73:C1:36:87:B5:D9:45:14 -o rhv-upload -os nfs_data -of raw -b ovirtmgmt -n ovirtmgmt -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -op /home/rhvpasswd -oo rhv-cafile=/home/ca.pem -oo rhv-direct -oo rhv-verifypeer=true -ip /home/esxpw esx7.0-sles15sp1-x86_64 -v -x 4.2 Reboot ESXi host during v2v conversion, can find retrying info in v2v debug log and v2v conversion can continue after reconnecting with ESXi host after retrying, then v2v conversion can finish successfully ..... nbdkit: vddk[3]: error: VixDiskLib_Read: The server refused connection nbdkit: vddk[3]: debug: pread failed: original errno = 5 nbdkit: vddk[3]: debug: retry 1: waiting 2 seconds before retrying nbdkit: vddk[3]: debug: vddk: reopen readonly=1 nbdkit: vddk[3]: debug: vddk: finalize nbdkit: vddk[3]: debug: vddk: close ..... nbdkit: vddk[3]: debug: retry 4: waiting 16 seconds before retrying nbdkit: vddk[3]: debug: vddk: reopen readonly=1 nbdkit: vddk[3]: debug: vddk: open readonly=1 nbdkit: vddk[3]: debug: VDDK call: VixDiskLib_AllocateConnectParams () nbdkit: vddk[3]: debug: VDDK call: VixDiskLib_ConnectEx (h->params, 1, NULL, NULL, &connection) nbdkit: vddk[3]: debug: VixDiskLib: VixDiskLib_ConnectEx: Establish connection using (null). nbdkit: vddk[3]: debug: VixDiskLib: VixDiskLib_Connect: Establish connection. nbdkit: vddk[3]: debug: VixDiskLib: Resolve host. nbdkit: vddk[3]: error: VixDiskLib: VixDiskLib_Connect: Get CEIP status failed, treat CEIP as disabled. Other error encountered: HTTP error response: Bad Request at 4844. nbdkit: vddk[3]: debug: 2020-05-21T08:21:10.452-04:00 warning -[137753] [Originator@6876 sub=transport] SAN transport mode requires a snapshot. nbdkit: vddk[3]: debug: VDDK call: VixDiskLib_Open (connection, [esx7.0-matrix] esx7.0-sles15sp1-x86_64/esx7.0-sles15sp1-x86_64.vmdk, 4, &handle) nbdkit: vddk[3]: debug: VixDiskLib: VixDiskLib_OpenEx: Open a disk. nbdkit: vddk[3]: debug: NBD_ClientOpen: attempting to create connection to ha-nfcssl://[esx7.0-matrix] esx7.0-sles15sp1-x86_64/esx7.0-sles15sp1-x86_64.vmdk.199.217:902 nbdkit: vddk[3]: debug: Setting NFC log level to 1 ..... [ 739.8] Finishing off ..... Scenario5: Stop sshd service and kill sshd process on Xen host during v2v conversion 5.1 Convert a guest from Xen host by virt-v2v # virt-v2v -ic xen+ssh://root.224.33 xen-hvm-rhel7.8-x86_64 -of qcow2 -v -x 5.2 Log into Xen host, stop sshd service and kill the sshd process of v2v connection # service sshd stop # ps -ef|grep sshd root 22672 1 0 10:29 ? 00:00:00 sshd: root@pts/2 root 22849 1 27 10:37 ? 00:00:12 sshd: root@notty root 22899 22674 0 10:38 pts/2 00:00:00 grep sshd # kill -9 22849 5.3 Can find retrying info in v2v debug log ..... nbdkit: ssh[3]: error: read failed: Socket error: disconnected (-1) nbdkit: ssh[3]: debug: pread failed: original errno = 5 nbdkit: ssh[3]: debug: retry 1: waiting 2 seconds before retrying nbdkit: ssh[3]: debug: ssh: reopen readonly=1 nbdkit: ssh[3]: debug: ssh: finalize nbdkit: ssh[3]: debug: ssh: close ..... nbdkit: ssh[3]: error: failed to connect to remote host: 10.73.224.33: Connection refused nbdkit: ssh[3]: debug: ssh: open returned handle (nil) nbdkit: ssh[3]: debug: pread failed: original errno = 108 nbdkit: ssh[3]: debug: retry 5: waiting 32 seconds before retrying nbdkit: ssh[3]: debug: ssh: reopen readonly=1 nbdkit: ssh[3]: debug: ssh: open readonly=1 nbdkit: ssh[3]: error: failed to connect to remote host: 10.73.224.33: Connection refused nbdkit: ssh[3]: debug: ssh: open returned handle (nil) nbdkit: ssh[3]: debug: could not recover after 5 retries nbdkit: ssh[3]: debug: sending error reply: Cannot send after transport endpoint shutdown nbdkit: ssh[3]: debug: pread failed: original errno = 0 nbdkit: ssh[3]: debug: retry 1: waiting 2 seconds before retrying ...... 5.4 Starting sshd service of xen host at different v2v conversion phase will has different result 5.4.1 If start sshd service of Xen host when v2v is retrying less than 5 times before copying disk, v2v can ssh with Xen host successfully and conversion can be continued 5.4.2 If start sshd service of xen host when v2v is retrying more than 5 times before copying disk, v2v can ssh with Xen host successfully and conversion can be continued, then v2v conversion can finish successfully, pls refer to "XEN-retry-5+-before-copying-disk.log" 5.4.3 If start sshd service of Xen host when v2v is retrying less than 5 times during copying disk, v2v can ssh with Xen host successfully and conversion can be continued 5.4.4 If start sshd service of xen host when v2v is retrying more than 5 times during copying disk, v2v can't ssh with Xen host again and v2v conversion will be failed quickly, pls refer to "XEN-retry-5+-during-copying-disk.log" ...... nbdkit: ssh[3]: error: failed to connect to remote host: 10.73.224.33: Connection refused nbdkit: ssh[3]: debug: ssh: open returned handle (nil) nbdkit: ssh[3]: debug: pread failed: original errno = 108 nbdkit: ssh[3]: debug: retry 5: waiting 32 seconds before retrying nbdkit: ssh[3]: debug: ssh: reopen readonly=1 nbdkit: ssh[3]: debug: ssh: open readonly=1 nbdkit: ssh[3]: error: failed to connect to remote host: 10.73.224.33: Connection refused nbdkit: ssh[3]: debug: ssh: open returned handle (nil) nbdkit: ssh[3]: debug: could not recover after 5 retries nbdkit: ssh[3]: debug: sending error reply: Cannot send after transport endpoint shutdown nbdkit: ssh[3]: debug: pread failed: original errno = 0 nbdkit: ssh[3]: debug: retry 1: waiting 2 seconds before retrying qemu-img: error while reading at byte 1424031744: Cannot send after transport endpoint shutdown nbdkit: ssh[3]: debug: ssh: reopen readonly=1 ..... nbdkit: ssh[3]: debug: ssh: finalize nbdkit: ssh[3]: debug: cacheextents: close nbdkit: ssh[3]: debug: readahead: close nbdkit: ssh[3]: debug: retry: close nbdkit: ssh[3]: debug: reopens needed: 12 nbdkit: ssh[3]: debug: ssh: close virt-v2v: error: qemu-img command failed, see earlier errors rm -rf '/tmp/v2vnbdkit.6FqAwx' rm -rf '/var/tmp/null.mOc6yg' ...... Hi rjones, Please help to check the result of step5.4.4, is it expected? Besides, get same result with step5.4.4 when testing with VMware host, pls refer to "VMware-retry-5+-during-copying-disk.log",the results are summarized as follows: (1) If recover the connection with source host when v2v is retrying less than 5 times before copying disk,v2v conversion can continue (2) If recover the connection with source host when v2v is retrying more than 5 times before copying disk, v2v conversion can continue (3) If recover the connection with source host when v2v is retrying less than 5 times during copying disk, v2v conversion can continue (4) If recover the connection with source host when v2v is retrying more than 5 times during copying disk, v2v conversion will be failed quickly once the connection is recovered with source host
Created attachment 1690689 [details] XEN-retry-5+-before-copying-disk.log
Created attachment 1690690 [details] XEN-retry-5+-during-copying-disk.log
Created attachment 1690691 [details] VMware-retry-5+-during-copying-disk.log
First comment (not related to your question): This patch only helps when virt-v2v is using nbdkit to connect to the source. Virt-v2v uses nbdkit during conversion and during copying. However there is a short period before conversion starts (everything before "Opening the overlay") when we're connecting to VMware using libvirt APIs without nbdkit, and this patch does not help with this. So "before conversion" tests are fine, as long as you don't try "before Opening the overlay" which will still fail! (In reply to mxie from comment #7) > Created attachment 1690690 [details] > XEN-retry-5+-during-copying-disk.log nbdkit: ssh[3]: debug: could not recover after 5 retries nbdkit: ssh[3]: debug: sending error reply: Cannot send after transport endpoint shutdown ... qemu-img: error while reading at byte 1424031744: Cannot send after transport endpoint shutdown The SSH connection has been down for more than 64 seconds, and so the retry filter cannot recover. (You said "If start sshd service of xen host when v2v is retrying more than 5 times during copying disk" which I think means that you waited until after the 5th retry, which would mean more than 64 seconds in total). This is expected behaviour, since the retry filter doesn't retry indefinitely, only for 64 seconds. See also the nbdkit retry filter manual: http://libguestfs.org/nbdkit-retry-filter.1.html#DESCRIPTION ("about 1 minute"). (In reply to mxie from comment #8) > Created attachment 1690691 [details] > VMware-retry-5+-during-copying-disk.log nbdkit: vddk[3]: debug: could not recover after 5 retries This is the same as above. I believe this is all working as expected. Note that this filter is only meant to cover short network outages. The problem with making the retry defaults longer is twofold: (1) We'd end up retrying over and over for a real network outage. (2) There are other timeouts in the stack which might kick in, confusing efforts to recover.
According to comment5 and comment9, move the bug from ON_QA to VERIFIED
Hi Richard, Sorry, forgot to test vmx+ssh scenario for the bug, I think the bug is not fixed in this scenario, do I need to move the bug to assigned status or file a new bug for this scenario? Packages: virt-v2v-1.42.0-3.module+el8.3.0+6497+b190d2a5.x86_64 libguestfs-1.42.0-1.module+el8.3.0+6496+d39ac712.x86_64 package libvirt is not installed qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420.x86_64 nbdkit-1.20.2-1.module+el8.3.0+6764+cc503f20.x86_64 Steps: 1.Convert a guest from VMX via ssh # virt-v2v -i vmx -it ssh ssh://root.199.217/vmfs/volumes/esx7.0-matrix/esx7.0-ubuntu18.04-x86_64/esx7.0-ubuntu18.04-x86_64.vmx -v -x 2. Disable ssh function on ESXi host during v2v conversion 3. Found v2v conversion will failed during copying disk without retrying to connect ESXi host, details pls refer to vmx+ssh_retry_filter.log ...... [ 62.2] Copying disk 1/1 to /var/lib/libvirt/images/esx7.0-ubuntu18.04-x86_64-sda (raw) ...... nbdkit: ssh[3]: debug: cacheextents: open readonly=1 nbdkit: ssh[3]: debug: readahead: open readonly=1 nbdkit: ssh[3]: debug: retry: open readonly=1 nbdkit: ssh[3]: debug: ssh: open readonly=1 nbdkit: ssh[3]: error: failed to connect to remote host: 10.73.199.217: Connection refused nbdkit: ssh[3]: debug: ssh: open returned handle (nil) nbdkit: ssh[3]: debug: retry: open returned handle (nil) nbdkit: ssh[3]: debug: readahead: open returned handle (nil) nbdkit: ssh[3]: debug: cacheextents: open returned handle (nil) nbdkit: ssh[3]: debug: write: NBD_OPT_ABORT: Broken pipe qemu-img: Could not open '/var/tmp/v2vovl1ec901.qcow2': Could not open backing file: Requested export not available virt-v2v: error: qemu-img command failed, see earlier errors rm -rf '/tmp/v2vnbdkit.D4XuuL'
Created attachment 1693067 [details] vmx+ssh_retry_filter.log
This should literally not be possible, so it's a new bug (in nbdkit): > nbdkit: ssh[3]: debug: ssh: open returned handle (nil)
Created attachment 1693377 [details] disable-ssh-esxi7.0.png
Thanks rjones for confirmation, use bug1841820 to track the problem of comment11
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5137