Bug 1483400

Summary: Highly populated export domain fails to attach.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-engine
Assignee: Benny Zlotnik <bzlotnik>
Status: CLOSED ERRATA
QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.1.4
CC: bzlotnik, ebenahar, lsurette, ratamir, rbalakri, Rhev-m-bugs, srevivo, tnisan, ykaul, ylavi
Target Milestone: ovirt-4.2.0
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-15 17:43:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Germano Veit Michel 2017-08-21 04:06:43 UTC
Description of problem:

1. At 13:41:06, the admin attaches the export domain to the DC.

2017-08-21 13:41:06,655+10 INFO  [org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand] (org.ovirt.thread.pool-6-thread-48) [b56aacc1-c32e-49be-8c67-15897edb9783] Running command: AttachStorageDomainToPoolCommand internal: false. Entities affected :  ID: 92db7237-df7e-4e08-bbb3-3c040bfef826 Type: StorageAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN,  ID: 59938be9-0258-02fc-02b4-0000000000aa Type: StoragePoolAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN

2. Then at 13:41:39, a GetImagesListVDSCommand returns 50+ images (highly populated Export SD).

3. Then a sequence of GetVolumesListVDSCommand and GetImageInfoVDSCommand for the list in [2].
This loops for 5 minutes, getting the info for all the images sequentially.

4. At the 5 minute mark, at 13:46:39, the query fails:

2017-08-21 13:46:39,036+10 ERROR [org.ovirt.engine.core.bll.storage.disk.image.GetUnregisteredDiskQuery] (org.ovirt.thread.pool-6-thread-48) [b56aacc1-c32e-49be-8c67-15897edb9783] Query 'GetUnregisteredDiskQuery' failed: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000460: Error checking for a transaction
2017-08-21 13:46:39,036+10 ERROR [org.ovirt.engine.core.bll.storage.disk.image.GetUnregisteredDiskQuery] (org.ovirt.thread.pool-6-thread-48) [b56aacc1-c32e-49be-8c67-15897edb9783] Exception: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000460: Error checking for a transaction
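
For reference, the timing in steps 1 to 4 can be checked with a small sketch. It is not part of the engine: the helper names are made up for illustration, and the milliseconds for step 2 are assumed to be 000 because the report only gives 13:41:39.

```python
from datetime import datetime

# Engine log timestamps look like "2017-08-21 13:41:06,655+10".
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_engine_timestamp(stamp: str) -> datetime:
    """Parse a log timestamp, dropping the trailing UTC offset (e.g. '+10')."""
    without_offset = stamp.split("+")[0]
    return datetime.strptime(without_offset, LOG_FORMAT)


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two log timestamps that share the same UTC offset."""
    delta = parse_engine_timestamp(end) - parse_engine_timestamp(start)
    return delta.total_seconds()


# Values copied from the report.
ATTACH_STARTED = "2017-08-21 13:41:06,655+10"  # step 1
IMAGES_LISTED = "2017-08-21 13:41:39,000+10"   # step 2, milliseconds assumed
QUERY_FAILED = "2017-08-21 13:46:39,036+10"    # step 4

if __name__ == "__main__":
    print(elapsed_seconds(IMAGES_LISTED, QUERY_FAILED))
    print(elapsed_seconds(ATTACH_STARTED, QUERY_FAILED))
```

With these values the failure lands roughly 300 seconds after the image list was returned, and about 332 seconds after the attach command started.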

If I reduce the number of images so that the loop at [3] takes less than 5 minutes, everything works fine.
The logs are similar to bugzilla #1446878 but I don't see any deadlock in postgres logs.

Is the loop that fetches the volume info for all images running inside a 5-minute transaction that times out?
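
A rough model of that hypothesis, as a sketch only. It assumes a 300-second limit (the 5 minutes observed above) and a hypothetical constant per-image cost; neither value is taken from the engine's configuration or code, and the function names are invented for illustration.

```python
import math

# Assumption: the limit the reporter suspects, not a value read from the engine.
TRANSACTION_TIMEOUT_S = 300


def loop_duration(image_count: int, seconds_per_image: float) -> float:
    """Total time of a strictly sequential per-image info loop."""
    return image_count * seconds_per_image


def fits_in_transaction(image_count: int, seconds_per_image: float,
                        timeout_s: float = TRANSACTION_TIMEOUT_S) -> bool:
    """True if the whole loop finishes before the assumed timeout."""
    return loop_duration(image_count, seconds_per_image) < timeout_s


def max_images_before_timeout(seconds_per_image: float,
                              timeout_s: float = TRANSACTION_TIMEOUT_S) -> int:
    """Largest image count whose sequential loop still finishes in time."""
    return math.ceil(timeout_s / seconds_per_image) - 1


if __name__ == "__main__":
    # Hypothetical slow storage: 6 seconds per image.
    print(fits_in_transaction(50, 6))     # loop takes exactly 300 s
    print(max_images_before_timeout(6))
```

Under this model the failure depends only on image count times per-image latency, which would match the observation that fewer images (or faster storage) make the attach succeed.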

Version-Release number of selected component (if applicable):
rhevm-4.1.4.2-0.1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create an NFS share
2. Create an Export Domain on the NFS share
3. Move the Export Domain to maintenance and detach it
4. Use virt-v2v to populate it with ~50+ disks.
   # for ((n=0;n<50;n++)); do virt-v2v -o rhev -os 10.64.24.33:/exports/data7 rhel7.3; done
5. Attach Export Domain

* You may need more than 50 images if the Storage/Hosts are fast. Mine is quite slow.

Actual results:
Export Domain fails to attach

Expected results:
Export Domain attached

Comment 4 Allon Mureinik 2017-10-02 12:45:26 UTC
Benny, the attached patch is merged.
Should this be MODIFIED, or are we waiting for anything else?

Comment 5 Benny Zlotnik 2017-10-02 12:50:51 UTC
Moved to MODIFIED

Comment 6 rhev-integ 2017-11-02 13:38:35 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No relevant external trackers attached]

For more info please contact: rhv-devops

Comment 8 Kevin Alon Goldblatt 2017-12-03 15:50:42 UTC
Verified with the following code:
--------------------------------------
ovirt-engine-4.2.0-0.5.master.el7.noarch
vdsm-4.20.8-53.gitc3edfc0.el7.centos.x86_64

Verified with the following scenario:
--------------------------------------
1. Created 50 VMs with disks and exported them to the export domain
2. Set the export domain to maintenance and detached it
3. Attached the export domain again >>>>> the attach operation has undergone significant performance improvements and it took a few seconds to complete
4. VM Import displays the VMs within a few seconds too.



Moving to VERIFIED

Comment 12 errata-xmlrpc 2018-05-15 17:43:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488

Comment 13 Franta Kust 2019-05-16 13:05:35 UTC
BZ<2>Jira Resync