## Description of problem: When I try to upload a disk image, it fails with the error: WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-29) [cor-id] Correlation ID: [cor-id], Call Stack: null, Custom Event ID: -1, Message: Unable to upload image to disk <disk-img> due to a network error. Make sure ovirt-imageio-proxy service is installed and configured, and ovirt-engine's certificate is registered as a valid CA in the browser. ## Version-Release number of selected component (if applicable): rhevm-4.0.6.3-0.1.el7ev.noarch ovirt-engine-4.0.6.3-0.1.el7ev.noarch ovirt-imageio-common-0.3.0-0.el7ev.noarch ovirt-imageio-proxy-0.4.0-0.el7ev.noarch ovirt-imageio-proxy-setup-0.4.0-0.el7ev.noarch ## How reproducible: Always ## Steps to Reproduce: 1. import CA 2. click 'disks' -> 'upload' -> start 3. select image to upload ## Actual results: Some traffic can be seen over port 54323 (between client and rhevm), but then the status goes to 'paused by system'. After a log time, originally stated error appears in the event tab ## Expected results: disk image to import, or a more specific error provided as to the cause of the problem. ## Additional info required ports are open, and 'ImageProxyAddress' is correct
Hi Marcus, Please make sure: * The *engine* ca is imported, i.e.: http://<engine_url>/ovirt-engine/services/pki-resource?resource=ca-certificate&format=X509-PEM-CA * Network isn't blocked (firewall, etc) * You're logged to the WebAdmin using https and fqdn.
(In reply to Marcus West from comment #0) Which browser are you using? did you try with another browser?
@Marcus: * Does the client machine have any limitation on upload transfer speed? * Can you please try to reduce the value of UploadImageChunkSizeKB (in vdc_options table) and check whether it mitigates the issue.
Another solution: try increasing the 'imaged_read_timeout_sec' value (e.g. 300), located in the proxy conf file (/etc/ovirt-imageio-proxy/ovirt-imageio-proxy.conf), and restart the proxy service..
> which version of rhel and webob are installed in the working 4.0 env? on rhevmhe1: Red Hat Enterprise Linux Server release 7.3 (Maipo) python-webob-1.2.3-6.el7.noarch eap7-jboss-metadata-web-10.0.0-6.Final_redhat_1.1.ep7.el7.noarch eap7-jboss-websocket-api_1.1_spec-1.1.1-1.Final_redhat_1.1.ep7.el7.noarch ovirt-engine-4.0.6.3-0.1.el7ev.noarch ovirt-imageio-common-0.5.0-0.el7ev.noarch ovirt-imageio-proxy-0.5.0-0.el7ev.noarch ovirt-imageio-proxy-setup-0.5.0-0.el7ev.noarch > Which browser are you using? did you try with another browser? browser: Firefox 45.4.0, on RHEL7 CSB I also have Chrome 54.0.2840.59, will try that also
Nir / Daniel, Anything special in the scenario that should be tested? It seems to me just a regular UI upload..
(In reply to Natalie Gavrielov from comment #26) > Nir / Daniel, > > Anything special in the scenario that should be tested? > It seems to me just a regular UI upload.. The addressed issue here is network bandwidth. The fix was to increase the request timeout to a more 'reasonable' one (from 10 seconds to 120 seconds). So upload should now succeed with less than optimal network conditions (depends on the UploadImageXhrTimeoutInSeconds configuration value).
Created attachment 1279065 [details] logs: engine, imageio-proxy, imageio-daemon, vdsm Daniel, I tried to verify this issue the following way: 1. start a UI upload of a disk. 2. configure: tc qdisk add dev eno1 root netem delay 40s (This means there should be a delay of 40 seconds of each chunk that us being sent to the engine) 3. view log and UI to see the progress of the upload. Note: I tried a few values: 115, 100, 80, 40, 20 ,1 second. the only value that is not being paused by the system is 1 (all the rest gets paused). engine.log shows the following warnings (a warning for a test that was paused): ------------------------------------------------------------------------------- 2017-05-15 19:52:49,793+03 WARN [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (DefaultQuartzScheduler1) [b36c5aac-68bd-4efe-bfbe-1f458df96dea] Transfer paused due to no updates in 65 seconds. Upload disk 'test-upload-with-delay' (id '257639b8-3c35-4d68-8c0a-fc02294fce37') 2017-05-15 20:15:32,702+03 WARN [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (DefaultQuartzScheduler6) [c18d551d-a489-43f4-92dc-b79d458be981] Transfer paused due to no updates in 66 seconds. Upload disk 'test-2' (id '9916ab7b-b3c4-4d77-a06d-376a519406f8') 2017-05-15 20:18:20,943+03 WARN [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (DefaultQuartzScheduler2) [c18d551d-a489-43f4-92dc-b79d458be981] Transfer paused due to no updates in 66 seconds. Upload disk 'test-2' (id '9916ab7b-b3c4-4d77-a06d-376a519406f8') 2017-05-15 20:28:25,883+03 WARN [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (DefaultQuartzScheduler10) [c18d551d-a489-43f4-92dc-b79d458be981] Transfer paused due to no updates in 66 seconds. Upload disk 'test-2' (id '9916ab7b-b3c4-4d77-a06d-376a519406f8') 2017-05-15 20:44:45,336+03 WARN [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (DefaultQuartzScheduler10) [c18d551d-a489-43f4-92dc-b79d458be981] Transfer paused due to no updates in 67 seconds. Upload disk 'test-2' (id '9916ab7b-b3c4-4d77-a06d-376a519406f8') 2017-05-15 20:52:11,883+03 WARN [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (DefaultQuartzScheduler4) [c18d551d-a489-43f4-92dc-b79d458be981] Transfer paused due to no updates in 65 seconds. Upload disk 'test-2' (id '9916ab7b-b3c4-4d77-a06d-376a519406f8') ------------------------------------------------------------------------------- I was expecting the warning to be generated after ~120 seconds, because UploadImageXhrTimeoutInSeconds is configured with 120.. engine=# SELECT * FROM vdc_options where option_name ilike '%UploadImageXhrTimeoutInSeconds%'; option_id | option_name | option_value | version -----------+--------------------------------+--------------+--------- 445 | UploadImageXhrTimeoutInSeconds | 120 | general (1 row) Any idea why this warning pops up after approximately a minute?
The 'Transfer paused due to no updates in 65 seconds' message only means that the ImageTransfer entity wasn't updated for about a minute. The entity update should be triggered by the client every few seconds by invoking TransferImageStatus command (for keepalive reasons). I.e. the status wasn't polled for about a minute, probably during the traffic control execution. Any way, the UploadImageXhrTimeoutInSeconds value is only the timeout value of the request, which can be estimated by the logs [*]. Indeed about 2 minutes were elapsed between resuming and pausing the transfer. [*}: 2017-05-15 20:50:19,590+03 INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (DefaultQuartzScheduler6) [c18d551d-a489-43f4-92dc-b79d458be981] Resuming session for Upload disk 'test-2' (id '9916ab7b-b3c4-4d77-a06d-376a519406f8') 2017-05-15 20:52:11,883+03 WARN [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (DefaultQuartzScheduler4) [c18d551d-a489-43f4-92dc-b79d458be981] Transfer paused due to no updates in 65 seconds. Upload disk 'test-2' (id '9916ab7b-b3c4-4d77-a06d-376a519406f8')
(In reply to Daniel Erez from comment #30) > The 'Transfer paused due to no updates in 65 seconds' message only means > that the ImageTransfer entity wasn't updated for about a minute. The entity > update should be triggered by the client every few seconds by invoking > TransferImageStatus command (for keepalive reasons). I.e. the status wasn't > polled for about a minute, probably during the traffic control execution. > Any way, the UploadImageXhrTimeoutInSeconds value is only the timeout value > of the request, which can be estimated by the logs [*]. Indeed about 2 > minutes were elapsed between resuming and pausing the transfer. > > [*}: > > 2017-05-15 20:50:19,590+03 INFO > [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] > (DefaultQuartzScheduler6) [c18d551d-a489-43f4-92dc-b79d458be981] Resuming > session for Upload disk 'test-2' (id '9916ab7b-b3c4-4d77-a06d-376a519406f8') > > 2017-05-15 20:52:11,883+03 WARN > [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] > (DefaultQuartzScheduler4) [c18d551d-a489-43f4-92dc-b79d458be981] Transfer > paused due to no updates in 65 seconds. Upload disk 'test-2' (id > '9916ab7b-b3c4-4d77-a06d-376a519406f8') I may have described the steps in comment 28 a bit different than how it really was. After the first test (as described in comment 28), I continued the test the following way: 1. remove the delay: sudo tc qdisk del dev eno1 root 2. resume upload. 3. *wait for the upload to continue*, configure the delay again. 4. view log and UI to see the progress of the upload. Now, regarding step 3 - I in some cases I allowed upload to work for some time (a few minutes), before configuring the delay. So basically, these two log messages don't really say much, for example: 2017-05-15 20:37:43,804+03 INFO [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (DefaultQuartzScheduler9) [c18d551d-a489-43f4-92dc-b79d458be981] Resuming session for Upload disk 'test-2' (id '9916ab7b-b3c4-4d77-a06d-376a519406f8') 2017-05-15 20:44:45,336+03 WARN [org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand] (DefaultQuartzScheduler10) [c18d551d-a489-43f4-92dc-b79d458be981] Transfer paused due to no updates in 67 seconds. Upload disk 'test-2' (id '9916ab7b-b3c4-4d77-a06d-376a519406f8') Here we have a 7 minutes.. Any suggestions on how to test this?
It could be a bit tricky to measure the exact time as the client might retry the request a few times, but you could try using the console log in the browser. However, the issue in this bug is not really about the exact timeout period but rather to simply verify that the upload won't fail when network bandwidth isn't optimal. Just an example: upload rate less than 1MB/s. So I think that measuring the exact timing isn't relevant as we rely on both chunk size and timeout values, i.e. we can only roughly estimate the supported network conditions. In short, simply check upload rate of ~1MB/s should do it for this bug.
Verified using: rhevm-4.1.2.2-0.1.el7.noarch ovirt-imageio-proxy-1.0.0-0.el7ev.noarch ovirt-imageio-common-1.0.0-0.el7ev.noarch vdsm-4.19.14-1.el7ev.x86_64 ovirt-imageio-daemon-1.0.0-0.el7ev.noarch Scenario tested: Upload images using python-sdk: a. rate: 1MB/s b. rate 512KB/s Both uploads finished successfully.
(In reply to Natalie Gavrielov from comment #33) > Verified using: > rhevm-4.1.2.2-0.1.el7.noarch > ovirt-imageio-proxy-1.0.0-0.el7ev.noarch > ovirt-imageio-common-1.0.0-0.el7ev.noarch > vdsm-4.19.14-1.el7ev.x86_64 > ovirt-imageio-daemon-1.0.0-0.el7ev.noarch > > Scenario tested: > Upload images using python-sdk: > a. rate: 1MB/s > b. rate 512KB/s > > Both uploads finished successfully. Wrong builds, retested with the following builds: ovirt-engine-4.2.0-0.0.master.20170523140304.git04be891.el7.centos.noarch ovirt-imageio-proxy-1.0.0-0.201705230845.gitff244bb.el7.centos.noarch ovirt-imageio-common-1.0.0-0.201705230845.gitff244bb.el7.centos.noarch vdsm-4.20.0-886.gitf9accf8.el7.centos.x86_64 ovirt-imageio-daemon-1.0.0-0.201705230845.gitff244bb.el7.centos.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1519
BZ<2>Jira Resync