Bug 1586198
| Summary: | ovirt-imageio-daemon times out during migration | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Kedar Kulkarni <kkulkarn> | ||||
| Component: | libguestfs | Assignee: | Richard W.M. Jones <rjones> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 7.6 | CC: | juzhou, mxie, mzhan, ptoscano, smallamp, tzheng, xiaodwan | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | V2V | ||||||
| Fixed In Version: | libguestfs-1.38.2-5.el7 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2018-10-30 07:45:35 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 1563278 | ||||||
| Bug Blocks: | |||||||
| Attachments: |
|
||||||
This was fixed upstream with https://github.com/libguestfs/libguestfs/commit/f25404c65f8e078a3ca5bd5a1ab91343edd22506 which is in libguestfs >= 1.39.6. Try to reproduce the bug with builds: virt-v2v-1.36.10-6.11.rhvpreview.el7ev.x86_64 libguestfs-1.36.10-6.11.rhvpreview.el7ev.x86_64 ovirt-imageio-daemon-1.3.0-0.el7ev.noarch rhv:4.2.5-0.1.el7ev Reproduce steps: 1.Convert the guest to rhv4.2 using rhv-upload, the conversion will be failed during copying # virt-v2v -ic vpx://vsphere.local%5cAdministrator.73.141/data/10.73.75.219/?no_verify=1 esx6.7-rhel7.5-x86_64 -o rhv-upload -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -os nfs_data -op /tmp/rhvpasswd -oo rhv-cafile=/home/ca.pem -oo rhv-direct -of raw --password-file /tmp/passwd -b ovirtmgmt -on mxie-reproduce [ 0.1] Opening the source -i libvirt -ic vpx://vsphere.local%5cAdministrator.73.141/data/10.73.75.219/?no_verify=1 esx6.7-rhel7.5-x86_64 [ 13.0] Creating an overlay to protect the source from being modified [ 14.8] Initializing the target -o rhv-upload -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -op /tmp/rhvpasswd -os nfs_data [ 18.7] Opening the overlay [ 64.8] Inspecting the overlay [ 393.6] Checking for sufficient free disk space in the guest [ 393.6] Estimating space required on target for each disk [ 393.6] Converting Red Hat Enterprise Linux Server 7.5 (Maipo) to run on KVM virt-v2v: This guest has virtio drivers installed. [3007.8] Mapping filesystem data to avoid copying unused and blank areas [3011.5] Closing the overlay [3011.7] Checking if the guest needs BIOS or UEFI to boot [3011.7] Assigning disks to buses [3011.7] Copying disk 1/1 to qemu URI json:{ "file.driver": "nbd", "file.path": "/var/tmp/rhvupload.zJOs0x/nbdkit1.sock", "file.export": "/" } (raw) nbdkit: python[1]: error: /var/tmp/rhvupload.zJOs0x/rhv-upload-plugin.py: pwrite: error: could not write sector (2363490304, 16384): 403: Forbidden qemu-img: error while writing sector 4616192: Input/output error nbdkit: python[1]: error: /var/tmp/rhvupload.zJOs0x/rhv-upload-plugin.py: flush: error: could not flush: 403: Forbidden nbdkit: python[1]: error: /var/tmp/rhvupload.zJOs0x/rhv-upload-plugin.py: flush: error: could not flush: 403: Forbidden virt-v2v: error: qemu-img command failed, see earlier errors If reporting bugs, run virt-v2v with debugging enabled and include the complete output: virt-v2v -v -x [...] [root@dell-per630-fc-03 ~]# nbdkit: python[1]: error: /var/tmp/rhvupload.zJOs0x/rhv-upload-plugin.py: close: error: Fault reason is "Operation Failed". Fault detail is "[Cannot remove Virtual Disk. Related operation is currently in progress. Please try again later.]". HTTP response code is 409. 2.Check the error info in /var/log/ovirt-imageio-daemon/daemon.log on registered host of rhv4.2, there is about 2mins gap between the successful request and the following requests that fail. # cat daemon.log .... 2018-07-16 17:14:07,639 INFO (ticket.server) [web] START [/] GET /tickets/4ec501db-2384-4202-91a0-f9ac78cab291 2018-07-16 17:14:07,639 INFO (ticket.server) [tickets] Retrieving ticket 4ec501db-2384-4202-91a0-f9ac78cab291 2018-07-16 17:14:07,641 INFO (ticket.server) [web] FINISH [/] GET /tickets/4ec501db-2384-4202-91a0-f9ac78cab291: [200] 354 (0.00s) 2018-07-16 17:21:54,203 INFO (Thread-595) [web] START [10.16.64.228] PUT /images/4ec501db-2384-4202-91a0-f9ac78cab291 2018-07-16 17:21:54,204 WARNING (Thread-595) [web] ERROR [10.16.64.228] PUT /images/4ec501db-2384-4202-91a0-f9ac78cab291: [403] Ticket u'4ec501db-2384-4202-91a0-f9ac78cab291' expired (0.00s) 2018-07-16 17:21:57,204 INFO (Thread-596) [web] START [10.16.64.228] PATCH /images/4ec501db-2384-4202-91a0-f9ac78cab291 2018-07-16 17:21:57,574 WARNING (Thread-596) [web] ERROR [10.16.64.228] PATCH /images/4ec501db-2384-4202-91a0-f9ac78cab291: [403] Ticket u'4ec501db-2384-4202-91a0-f9ac78cab291' expired (0.37s) 2018-07-16 17:21:59,274 INFO (Thread-597) [web] START [10.16.64.228] PATCH /images/4ec501db-2384-4202-91a0-f9ac78cab291 2018-07-16 17:21:59,646 WARNING (Thread-597) [web] ERROR [10.16.64.228] PATCH /images/4ec501db-2384-4202-91a0-f9ac78cab291: [403] Ticket u'4ec501db-2384-4202-91a0-f9ac78cab291' expired (0.37s) 3.Check inactivity_timeout of ticket "4ec501db-2384-4202-91a0-f9ac78cab291" is 60s in https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api/imagetransfers/ .... <image_transfer href="/ovirt-engine/api/imagetransfers/962a0949-1f1c-48a2-aea6-b5b8fe1a6011" id="962a0949-1f1c-48a2-aea6-b5b8fe1a6011"> .... <direction>upload</direction> <inactivity_timeout>60</inactivity_timeout> <phase>paused_user</phase> <proxy_url>https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com:54323/images/4ec501db-2384-4202-91a0-f9ac78cab291</proxy_url> .... </image_transfers> Verify the bug with builds: virt-v2v-1.38.2-7.el7.x86_64 libguestfs-1.38.2-7.el7.x86_64 ovirt-imageio-daemon-1.3.0-0.el7ev.noarch rhv:4.2.5-0.1.el7ev Steps: 1.Convert the guest to rhv4.2 using rhv-upload # virt-v2v -ic vpx://vsphere.local%5cAdministrator.73.141/data/10.73.75.219/?no_verify=1 esx6.7-rhel7.5-x86_64 -o rhv-upload -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -os nfs_data -op /tmp/rhvpasswd -oo rhv-cafile=/home/ca.pem -oo rhv-direct -of raw --password-file /tmp/passwd -b ovirtmgmt -on v2v-7 [ 0.1] Opening the source -i libvirt -ic vpx://vsphere.local%5cAdministrator.73.141/data/10.73.75.219/?no_verify=1 esx6.7-rhel7.5-x86_64 [ 12.9] Creating an overlay to protect the source from being modified [ 14.7] Initializing the target -o rhv-upload -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -op /tmp/rhvpasswd -os nfs_data [ 19.1] Opening the overlay [ 72.2] Inspecting the overlay [ 410.1] Checking for sufficient free disk space in the guest [ 410.1] Estimating space required on target for each disk [ 410.1] Converting Red Hat Enterprise Linux Server 7.5 (Maipo) to run on KVM virt-v2v: This guest has virtio drivers installed. [2916.4] Mapping filesystem data to avoid copying unused and blank areas [2920.8] Closing the overlay [2921.0] Checking if the guest needs BIOS or UEFI to boot [2921.0] Assigning disks to buses [2921.0] Copying disk 1/1 to qemu URI json:{ "file.driver": "nbd", "file.path": "/var/tmp/rhvupload.MEOTfe/nbdkit1.sock", "file.export": "/" } (raw) (0.00/100%) 2.Check ticket info in /var/log/ovirt-imageio-daemon/daemon.log on registered host of rhv4.2 # cat daemon.log .... 2018-07-16 18:38:46,583 INFO (Thread-821) [web] START [10.16.64.228] PATCH /images/108bb2fd-c440-4ece-877c-ede2592fdc26 2018-07-16 18:38:46,583 INFO (Thread-821) [images] Zeroing 33554432 bytes at offset 7449082368 flush False to /rhev/data-center/mnt/10.66.144.40:_home_nfs__data/484255cb-cf10-4d22-ba81-32fdb29f0d21/images/7e23145d-83d3-4e7d-833c-30f2094fdb79/47ae44ef-4ef6-412b-8c56-0fca09fef6d8 for ticket 108bb2fd-c440-4ece-877c-ede2592fdc26 2018-07-16 18:38:47,625 INFO (Thread-821) [directio] Operation stats: <Clock(total=1.04, write=1.04)> 2018-07-16 18:38:47,626 INFO (Thread-821) [web] FINISH [10.16.64.228] PATCH /images/108bb2fd-c440-4ece-877c-ede2592fdc26: [200] 0 (1.05s) .... 3.Check inactivity_timeout of ticket "108bb2fd-c440-4ece-877c-ede2592fdc26" is 3600s in https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api/imagetransfers/ .... <image_transfer href="/ovirt-engine/api/imagetransfers/4f2f6dcc-3efc-4f05-8624-c44b61ccc1dc" id="4f2f6dcc-3efc-4f05-8624-c44b61ccc1dc"> .... <direction>upload</direction> <inactivity_timeout>3600</inactivity_timeout> <phase>transferring</phase> <proxy_url>https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com:54323/images/108bb2fd-c440-4ece-877c-ede2592fdc26</proxy_url> .... </image_transfers> Result: Virt-v2v has improved inactivity_timeout from 60s to 3600s Verify the bug with builds: virt-v2v-1.38.2-7.el7.x86_64 libguestfs-1.38.2-7.el7.x86_64 ovirt-imageio-daemon-1.4.1-0.el7ev.noarch rhv:4.2.5-0.1.el7ev Steps: 1.Convert the guest to rhv4.2 using rhv-upload # virt-v2v -ic vpx://vsphere.local%5cAdministrator.73.141/data/10.73.75.219/?no_verify=1 esx6.7-rhel7.5-x86_64 -o rhv-upload -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -os nfs_data -op /tmp/rhvpasswd -oo rhv-cafile=/home/ca.pem -oo rhv-direct -of raw --password-file /tmp/passwd -b ovirtmgmt -on latest [ 0.4] Opening the source -i libvirt -ic vpx://vsphere.local%5cAdministrator.73.141/data/10.73.75.219/?no_verify=1 esx6.7-rhel7.5-x86_64 [ 2.6] Creating an overlay to protect the source from being modified [ 3.7] Initializing the target -o rhv-upload -oc https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api -op /tmp/rhvpasswd -os nfs_data [ 5.3] Opening the overlay [ 26.4] Inspecting the overlay [ 190.1] Checking for sufficient free disk space in the guest [ 190.1] Estimating space required on target for each disk [ 190.1] Converting Red Hat Enterprise Linux Server 7.5 (Maipo) to run on KVM virt-v2v: This guest has virtio drivers installed. [1368.9] Mapping filesystem data to avoid copying unused and blank areas [1371.7] Closing the overlay [1372.5] Checking if the guest needs BIOS or UEFI to boot [1372.5] Assigning disks to buses [1372.5] Copying disk 1/1 to qemu URI json:{ "file.driver": "nbd", "file.path": "/var/tmp/rhvupload.JKt9Ba/nbdkit1.sock", "file.export": "/" } (raw) (0.00/100%) 2.Check ticket info in /var/log/ovirt-imageio-daemon/daemon.log on registered host of rhv4.2 # cat daemon.log .... 2018-07-16 21:48:37,447 INFO (Thread-47) [web] START: [10.73.72.77] PATCH /images/bb9607b2-eb29-432c-8c50-b66dd4cbd540 2018-07-16 21:48:37,447 INFO (Thread-47) [images] Zeroing 33554432 bytes at offset 1308622848 flush False to /rhev/data-center/mnt/10.66.144.40:_home_nfs__data/484255cb-cf10-4d22-ba81-32fdb29f0d21/images/0438a88d-dd58-4ae1-a9d5-7c363590d875/9cda0dca-9310-47da-baa4-e9a38dfc61fe for ticket bb9607b2-eb29-432c-8c50-b66dd4cbd540 2018-07-16 21:48:37,981 INFO (Thread-47) [web] FINISH [10.73.72.77] PATCH /images/bb9607b2-eb29-432c-8c50-b66dd4cbd540: [200] 0 [request=0.533445, operation=0.532788, write=0.531389] .... 3.Check inactivity_timeout of ticket "bb9607b2-eb29-432c-8c50-b66dd4cbd540" is 3600s in https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com/ovirt-engine/api/imagetransfers/ .... <image_transfer href="/ovirt-engine/api/imagetransfers/f024bfba-46a1-48ad-a064-9ac53c6f25a3" id="f024bfba-46a1-48ad-a064-9ac53c6f25a3"> .... <direction>upload</direction> <inactivity_timeout>3600</inactivity_timeout> <phase>transferring</phase> <proxy_url>https://ibm-x3250m5-03.rhts.eng.pek2.redhat.com:54323/images/bb9607b2-eb29-432c-8c50-b66dd4cbd540</proxy_url> .... </image_transfers> Result: Virt-v2v has improved inactivity_timeout from 60s to 3600s Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:3021 |
Created attachment 1447951 [details] ovirt-imageio-daemon logs Description of problem: As per Richard W. M. Jones, he investigated the logs and found: End of previous successful write request: 2018-06-04 13:23:37,077 INFO (Thread-5066) [web] FINISH [10.8.196.27] PUT /images/6544e4c7-60e6-4433-bf75-571708ee1e6e: [200] 0 (0.04s) Start of three zero, trim or flush requests which all fail: 2018-06-04 13:25:44,268 INFO (Thread-5067) [web] START [10.8.196.27] PATCH /images/6544e4c7-60e6-4433-bf75-571708ee1e6e 2018-06-04 13:25:44,308 WARNING (Thread-5067) [web] ERROR [10.8.196.27] PATCH /images/6544e4c7-60e6-4433-bf75-571708ee1e6e: [403] Ticket u'6544e4c7-60e6-4433-bf75-571708ee1e6e' expired (0.04s) 2018-06-04 13:27:51,628 INFO (Thread-5068) [web] START [10.8.196.27] PATCH /images/6544e4c7-60e6-4433-bf75-571708ee1e6e 2018-06-04 13:27:51,667 WARNING (Thread-5068) [web] ERROR [10.8.196.27] PATCH /images/6544e4c7-60e6-4433-bf75-571708ee1e6e: [403] Ticket u'6544e4c7-60e6-4433-bf75-571708ee1e6e' expired (0.04s) 2018-06-04 13:29:58,987 INFO (Thread-5069) [web] START [10.8.196.27] PATCH /images/6544e4c7-60e6-4433-bf75-571708ee1e6e 2018-06-04 13:29:59,027 WARNING (Thread-5069) [web] ERROR [10.8.196.27] PATCH /images/6544e4c7-60e6-4433-bf75-571708ee1e6e: [403] Ticket u'6544e4c7-60e6-4433-bf75-571708ee1e6e' expired (0.04s) You'll notice there is a ~2 minute gap between the successful request and the following requests that fail. We really need to kill authentication timeouts - if that's what's happening here - because there can be arbitrarily long delays between requests, eg. if the source disk has a long run of zero blocks. Version-Release number of selected component (if applicable): ovirt-imageio-daemon-1.3.1.2-0.el7ev.noarch RHV 4.2.4-0.1.el7 virt-v2v 1.36.10rhel=7,release=6.11.rhvpreview.el7ev,libvirt nbdkit-plugin-vddk-1.2.2-1.el7.x86_64 nbdkit-plugin-python-common-1.2.2-1.el7ev.x86_64 nbdkit-plugin-python2-1.2.2-1.el7ev.x86_64 nbdkit-devel-1.2.2-1.el7ev.x86_64 nbdkit-1.2.2-1.el7ev.x86_64 v2v-automate branch - "extended_reporting_and_rhv_upload" MIQ Nightly 24th May How reproducible: Believe 100% Steps to Reproduce: Steps to reproduce: 1. Provision ManageIQ appliance which has migration capabilities (as of today 24th May nightly build can be used) 2. Add VMware/RHV providers 3.Select any of the RHV hosts to be conversion host, run the playbooks from RHV Manager to configure the host you want to be conversion host(i.e. to install nbdkit, virt-v2v, etc.) 4. In ManageIQ, select that host, add its root credentials, add tags for conversion hosts. 5. Import v2v-automate domain git repo in manageIQ. 6. Create infrastructure mapping and migration plan, and execute migration plan. 7. Let the plan execute, it may fail, download the ovirt-imageio-daemon logs Actual results: logs show something like what we see in description Expected results: migration should complete, without timeout error Additional info: