Bug 2066113 - When downloading 4 disks concurrently using SDK, it sometimes fails with: 'Resource temporarily unavailable'
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-imageio
Classification: oVirt
Component: Client
Version: 2.4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.5.0
Target Release: 2.4.2
Assignee: Nir Soffer
QA Contact: Ilia Markelov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-03-20 21:19 UTC by Evelina Shames
Modified: 2022-05-04 07:32 UTC (History)

Fixed In Version: ovirt-imageio-2.4.2-1
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-28 09:26:34 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.5?



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | oVirt ovirt-imageio pull 50 | 0 | None | open | http: Increase listen backlog to 40 | 2022-03-20 22:01:17 UTC
Red Hat Issue Tracker | RHV-45365 | 0 | None | None | None | 2022-03-20 21:28:07 UTC

Description Evelina Shames 2022-03-20 21:19:33 UTC
Description of problem:
As part of our automation TC27605, we try to download 4 snapshot disks in parallel using python SDK: download_disk_snapshot.py

Sometimes, one of the download operations fails with the following error:

Starting multi threaded snapshot disks downloads
[10.35.232.28] Executing command python3 /usr/share/doc/python3-ovirt-engine-sdk4/examples/download_disk_snapshot.py iscsi_1 811727c0-8925-41ea-8d9f-0e7331546ce6 /root/download/811727c0-8925-41ea-8d9f-0e7331546ce6 -c engine
[10.35.232.28] Executing command python3 /usr/share/doc/python3-ovirt-engine-sdk4/examples/download_disk_snapshot.py iscsi_1 dc80b2c9-4866-4e93-8f90-b670eb58c7c1 /root/download/dc80b2c9-4866-4e93-8f90-b670eb58c7c1 -c engine
[10.35.232.28] Executing command python3 /usr/share/doc/python3-ovirt-engine-sdk4/examples/download_disk_snapshot.py iscsi_1 1679debf-1e65-4434-9342-c85f9452b6a4 /root/download/1679debf-1e65-4434-9342-c85f9452b6a4 -c engine
[10.35.232.28] Executing command python3 /usr/share/doc/python3-ovirt-engine-sdk4/examples/download_disk_snapshot.py iscsi_1 ac7b998d-c0e7-4bb2-a012-f596584c8065 /root/download/ac7b998d-c0e7-4bb2-a012-f596584c8065 -c engine
[10.35.232.28] Failed to run command ['python3', '/usr/share/doc/python3-ovirt-engine-sdk4/examples/download_disk_snapshot.py', 'iscsi_1', 'dc80b2c9-4866-4e93-8f90-b670eb58c7c1', '/root/download/dc80b2c9-4866-4e93-8f90-b670eb58c7c1', '-c', 'engine'] ERR: Traceback (most recent call last):
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/download_disk_snapshot.py", line 165, in <module>
    **extra_args)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/client/_api.py", line 186, in download
    name="download")
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/io.py", line 69, in copy
    log.debug("Executor failed")
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/io.py", line 193, in __exit__
    self.stop()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/io.py", line 170, in stop
    raise self._errors[0]
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/io.py", line 235, in _run
    handler = self._handler_factory()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/io.py", line 262, in __init__
    self._src = src_factory()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/http.py", line 82, in clone
    con = self._clone_connection()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/http.py", line 420, in _clone_connection
    return self._create_unix_connection(self.server_address)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/http.py", line 392, in _create_unix_connection
    con.connect()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/http.py", line 627, in connect
    self.sock.connect(self.path)
BlockingIOError: [Errno 11] Resource temporarily unavailable

We start 4 downloads, and each download uses 4 connections, for a total of 16 concurrent connections.

This looks like bug 1925345.

Version-Release number of selected component (if applicable):
Seen for a while now in both 4.4 and 4.5.
After a deep investigation, decided to open a bug.

How reproducible:
Barely.
Once in a while, it is reproduced in our regression runs.

Steps to Reproduce:
1. Clone a VM from a template with disk permutations including FS on a block SD:
virtio_scsicow, virtio_scsiraw, virtioraw, virtiocow
2. Make a snapshot of the VM
3. Using the SDK example script, download the snapshot disks in parallel on the SPM host:
 a. TC1: with finalization of the transfer
 b. TC2: Without finalization
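
Step 3 can be sketched roughly as follows. This is only an illustration of launching the example script several times in parallel, not the automation code itself. The helper names (`build_command`, `run_command`, `download_in_parallel`) are hypothetical, the argument order is copied from the command lines in the log above, and the meaning of the first argument (`sd_name`) is an assumption.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Path of the SDK example script, as it appears in the log above.
SCRIPT = ("/usr/share/doc/python3-ovirt-engine-sdk4/examples/"
          "download_disk_snapshot.py")


def build_command(sd_name, snapshot_id, download_dir="/root/download",
                  config="engine"):
    """Build one command line in the same shape as the logged commands.

    sd_name is assumed to be the storage domain name ("iscsi_1" in the log).
    """
    return ["python3", SCRIPT, sd_name, snapshot_id,
            "{}/{}".format(download_dir, snapshot_id), "-c", config]


def run_command(cmd):
    """Run one download and return (exit code, stderr text)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    return proc.returncode, proc.stderr


def download_in_parallel(sd_name, snapshot_ids, runner=run_command):
    """Start one download per snapshot id at the same time.

    runner is injectable so the launcher can be exercised without a host.
    """
    commands = [build_command(sd_name, sid) for sid in snapshot_ids]
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        results = list(pool.map(runner, commands))
    return dict(zip(snapshot_ids, results))
```

A failed download would show up as a non-zero exit code with the traceback in the stderr text, for example `BlockingIOError: [Errno 11]` as in the log above.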

Actual results:
Sometimes, one of the download operations fails with 'BlockingIOError: [Errno 11] Resource temporarily unavailable'

Expected results:
All download operations should succeed.

Additional info:

Comment 1 Evelina Shames 2022-03-20 21:20:19 UTC
version: ovirt-imageio-client-2.3.0-1.el8ev.x86_64

Comment 2 Nir Soffer 2022-03-20 22:10:03 UTC
Evelina, can you confirm that this build solves the issue?
https://github.com/oVirt/ovirt-imageio/actions/runs/2013146111

To test, you can download this zip file:
https://github.com/oVirt/ovirt-imageio/suites/5730182938/artifacts/189701716

mkdir test
cd test
unzip ../rpm-centos-8.zip
dnf upgrade *.rpm

Running the automated tests starting 4 concurrent downloads should not fail,
maybe run it 10 times to be sure.

With the fix you should be able to download 10 disks in parallel without any
error. This uses 40 concurrent connections to imageio server.
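
The 40 connections figure can be checked at the socket level with a rough sketch like the one below. It is not the imageio test: it opens connections one after another against a Unix socket listener that never calls accept(), so every connection stays in the listen queue. The function name `count_failed_connects` is hypothetical, and setting a timeout on the client socket is an assumption about how the client is configured. On Linux, a full queue is expected to make connect() fail with EAGAIN, the same errno as in the traceback.

```python
import os
import socket
import tempfile


def count_failed_connects(backlog, clients):
    """Count connect() failures against a listener that never accepts."""
    failed = 0
    connected = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sock")
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            listener.bind(path)
            listener.listen(backlog)
            for _ in range(clients):
                client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
                # Assumption: with a timeout set, a full queue raises
                # instead of blocking the caller.
                client.settimeout(1.0)
                try:
                    client.connect(path)
                except OSError:
                    failed += 1
                    client.close()
                else:
                    connected.append(client)
        finally:
            for client in connected:
                client.close()
            listener.close()
    return failed


if __name__ == "__main__":
    # 4 downloads x 4 connections against the old default backlog.
    print("backlog 5, 16 clients:", count_failed_connects(5, 16))
    # 10 downloads x 4 connections against the new backlog.
    print("backlog 40, 40 clients:", count_failed_connects(40, 40))
```

The exact number of failures with a small backlog depends on the operating system, so the sketch prints it rather than assuming a value.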

Comment 3 Nir Soffer 2022-03-20 22:12:48 UTC
Proposing for 4.5.0 since this is a trivial fix and easy to test.

It is unlikely to happen in real usage, so we can also deliver this
in 4.5.1.

Comment 4 Arik 2022-03-21 08:04:47 UTC
(In reply to Nir Soffer from comment #3)
> Proposing for 4.5.0 since this is a trivial fix and easy to test.
> 
> It is unlikely to happen in real usage, so we can also deliver this
> in 4.5.1.

Since, according to Evelina, it affects our automation and the fix is trivial, let's aim for 4.5.0.

Comment 5 Nir Soffer 2022-03-21 13:12:19 UTC
Fix merged, will be available in ovirt-imageio 2.4.2-1.

Comment 7 Evelina Shames 2022-03-24 08:03:08 UTC
(In reply to Nir Soffer from comment #2)
> Evelina, can you confirm that this build solves the issue?
> https://github.com/oVirt/ovirt-imageio/actions/runs/2013146111
> 
> To test, you can download this zip file:
> https://github.com/oVirt/ovirt-imageio/suites/5730182938/artifacts/189701716
> 
> mkdir test
> cd test
> unzip ../rpm-centos-8.zip
> dnf upgrade *.rpm
> 
> Running the automated tests starting 4 concurrent downloads should not fail,
> maybe run it 10 times to be sure.
> 
> With the fix you should be able to download 10 disks in parallel without any
> error. This uses 40 concurrent connections to imageio server.

Will be tested as part of our automation runs as this one is hard to reproduce.

Comment 8 Nir Soffer 2022-03-28 20:52:03 UTC
The issue was that the server inherited the default listen backlog size (5) from
the Python standard library. This never caused a problem since nobody tried to
start more than 5 transfers at the same time.

The QE test starts 4 transfers at the same time, and each of them opens 4
connections at the same time, so it can fail randomly with the default listen
backlog.

The listen backlog was changed to 40 to allow up to 10 transfers to start at the
same time, using the default 4 connections per transfer.

We have a new automated test starting 10 transfers at the same time, so this
is unlikely to break:
https://github.com/oVirt/ovirt-imageio/blob/385e9e460b7487569adf07a00d3405aba71e46d8/test/client_test.py#L1071

Running the current automated tests will be enough for testing this change. 
The tests that used to fail randomly should not fail now.
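
For reference, the default mentioned above comes from the `request_queue_size` class attribute of `socketserver.TCPServer`, which `http.server.HTTPServer` inherits and passes to `listen()`. The sketch below shows one way such a default can be overridden in a subclass. The class name `LargeBacklogServer` and the two constants are hypothetical, and this is not necessarily how the imageio change is implemented.

```python
import http.server
import socketserver

# Values taken from the comment above: 4 connections per transfer by
# default, and up to 10 transfers started at the same time.
CONNECTIONS_PER_TRANSFER = 4
MAX_CONCURRENT_TRANSFERS = 10


class LargeBacklogServer(http.server.HTTPServer):
    # TCPServer.server_activate() calls
    # socket.listen(self.request_queue_size), so overriding the class
    # attribute changes the listen backlog.
    request_queue_size = CONNECTIONS_PER_TRANSFER * MAX_CONCURRENT_TRANSFERS


print(socketserver.TCPServer.request_queue_size)   # 5
print(LargeBacklogServer.request_queue_size)       # 40
```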

Comment 10 Sandro Bonazzola 2022-04-28 09:26:34 UTC
This bugzilla is included in oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 11 Ilia Markelov 2022-05-04 07:32:33 UTC
Verified.

All download operations succeeded.

Versions:
ovirt-engine-4.5.0.2-0.7.el8ev 
ovirt-imageio-2.4.3-1

