Bug 1483400 - Highly populated export domain fails to attach.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.1.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: ---
Assignee: Benny Zlotnik
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-08-21 04:06 UTC by Germano Veit Michel
Modified: 2020-09-10 11:16 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-15 17:43:37 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 3159041 | 0 | None | None | None | 2017-08-22 04:30:45 UTC
Red Hat Product Errata | RHEA-2018:1488 | 0 | None | None | None | 2018-05-15 17:45:35 UTC
oVirt gerrit | 81218 | 0 | master | MERGED | core: optimize AttachStorageDomainToPoolCommand | 2020-06-23 09:02:14 UTC

Description Germano Veit Michel 2017-08-21 04:06:43 UTC
Description of problem:

1. At 13:41:06, the admin attaches the export domain to the DC.

2017-08-21 13:41:06,655+10 INFO  [org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand] (org.ovirt.thread.pool-6-thread-48) [b56aacc1-c32e-49be-8c67-15897edb9783] Running command: AttachStorageDomainToPoolCommand internal: false. Entities affected :  ID: 92db7237-df7e-4e08-bbb3-3c040bfef826 Type: StorageAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN,  ID: 59938be9-0258-02fc-02b4-0000000000aa Type: StoragePoolAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN

2. Then, at 13:41:39, a GetImagesListVDSCommand returns 50+ images (highly populated Export SD).

3. Then a sequence of GetVolumesListVDSCommand and GetImageInfoVDSCommand calls follows for each image in the list from [2].
This loops for 5 minutes, getting the info for all the images sequentially.

4. At the 5-minute mark, at 13:46:39, the query fails:

2017-08-21 13:46:39,036+10 ERROR [org.ovirt.engine.core.bll.storage.disk.image.GetUnregisteredDiskQuery] (org.ovirt.thread.pool-6-thread-48) [b56aacc1-c32e-49be-8c67-15897edb9783] Query 'GetUnregisteredDiskQuery' failed: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000460: Error checking for a transaction
2017-08-21 13:46:39,036+10 ERROR [org.ovirt.engine.core.bll.storage.disk.image.GetUnregisteredDiskQuery] (org.ovirt.thread.pool-6-thread-48) [b56aacc1-c32e-49be-8c67-15897edb9783] Exception: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000460: Error checking for a transaction

If I reduce the number of images so that the loop at [3] takes less than 5 minutes, everything works fine.
The logs are similar to bugzilla #1446878 but I don't see any deadlock in postgres logs.

Does the loop that fetches the volume info for all images run inside a 5-minute transaction that times out?
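
As a quick check of the timing, the timestamps quoted above can be compared directly. This is only a sketch: the 300-second value is an assumed transaction timeout (nothing in the logs above confirms where the limit comes from), and `ASSUMED_TX_TIMEOUT_S` and `parse_engine_ts` are illustrative names, not engine code.

```python
from datetime import datetime

# Assumption: a 5-minute transaction timeout; not confirmed by the logs above.
ASSUMED_TX_TIMEOUT_S = 300


def parse_engine_ts(ts: str) -> datetime:
    """Parse an engine.log timestamp such as '2017-08-21 13:41:06,655+10'.

    Only the first 23 characters are used; the UTC offset is dropped,
    which is fine here because every timestamp shares the same offset.
    """
    return datetime.strptime(ts[:23], "%Y-%m-%d %H:%M:%S,%f")


# Timestamps taken from the description above.
attach_started = parse_engine_ts("2017-08-21 13:41:06,655+10")
# Only second precision is given for this event; ',000' is a placeholder.
images_listed = parse_engine_ts("2017-08-21 13:41:39,000+10")
query_failed = parse_engine_ts("2017-08-21 13:46:39,036+10")

since_attach = (query_failed - attach_started).total_seconds()
since_image_list = (query_failed - images_listed).total_seconds()

print(f"attach start -> failure: {since_attach:.3f} s")
print(f"image list   -> failure: {since_image_list:.3f} s")
print(f"assumed timeout:          {ASSUMED_TX_TIMEOUT_S} s")
```

The gap between the GetImagesListVDSCommand call and the failure comes out at about 300 seconds, which fits a 5-minute limit but does not by itself prove that a transaction timeout is the cause.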

Version-Release number of selected component (if applicable):
rhevm-4.1.4.2-0.1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create NFS Share
2. Create Export Domain on NFS share
3. Maintenance and Detach Export Domain
4. Use virt-v2v to populate it with ~50+ disks.
   # for ((n=0;n<50;n++)); do virt-v2v -o rhev -os 10.64.24.33:/exports/data7 rhel7.3; done
5. Attach Export Domain

* You may need more than 50 images if the Storage/Hosts are fast. Mine is quite slow.
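
A rough way to estimate how many images are needed on a given setup, assuming the attach fails once the sequential per-image scan runs longer than a fixed time limit. The 300-second limit and the per-image durations below are assumptions for illustration, not measured values, and `min_images_to_reproduce` is a hypothetical helper.

```python
import math


def min_images_to_reproduce(timeout_s: float, per_image_s: float) -> int:
    """Smallest image count whose sequential scan takes longer than timeout_s.

    per_image_s is the assumed time spent on one image
    (GetVolumesListVDSCommand plus GetImageInfoVDSCommand).
    """
    if timeout_s <= 0 or per_image_s <= 0:
        raise ValueError("timeout_s and per_image_s must be positive")
    return math.floor(timeout_s / per_image_s) + 1


# Hypothetical per-image durations: a slow setup and a faster one.
slow_setup = min_images_to_reproduce(timeout_s=300, per_image_s=6.0)
fast_setup = min_images_to_reproduce(timeout_s=300, per_image_s=1.5)

print(f"slow setup (6.0 s/image): {slow_setup} images")
print(f"fast setup (1.5 s/image): {fast_setup} images")
```

With faster storage or hosts the per-image time drops, so the number of images needed to cross the limit grows accordingly.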

Actual results:
Export Domain fails to attach

Expected results:
Export Domain attached

Comment 4 Allon Mureinik 2017-10-02 12:45:26 UTC
Benny, the attached patch is merged.
Should this be MODIFIED, or are we waiting for anything else?

Comment 5 Benny Zlotnik 2017-10-02 12:50:51 UTC
Moved to MODIFIED

Comment 6 rhev-integ 2017-11-02 13:38:35 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No relevant external trackers attached]

For more info please contact: rhv-devops

Comment 8 Kevin Alon Goldblatt 2017-12-03 15:50:42 UTC
Verified with the following code:
--------------------------------------
ovirt-engine-4.2.0-0.5.master.el7.noarch
vdsm-4.20.8-53.gitc3edfc0.el7.centos.x86_64

Verified with the following scenario:
--------------------------------------
1. Created 50 VMs with disks and exported them to the export domain
2. Set the export domain to maintenance and detached it
3. Attached the export domain again >>>>> the attach operation has undergone significant performance improvements and it took a few seconds to complete
4. VM import displays the VMs within a few seconds too.



Moving to VERIFIED

Comment 12 errata-xmlrpc 2018-05-15 17:43:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488

Comment 13 Franta Kust 2019-05-16 13:05:35 UTC
BZ<2>Jira Resync

