Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1483400

Summary:	Highly populated export domain fails to attach.
Product:	Red Hat Enterprise Virtualization Manager	Reporter:	Germano Veit Michel <gveitmic>
Component:	ovirt-engine	Assignee:	Benny Zlotnik <bzlotnik>
Status:	CLOSED ERRATA	QA Contact:	Kevin Alon Goldblatt <kgoldbla>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.1.4	CC:	bzlotnik, ebenahar, lsurette, ratamir, rbalakri, Rhev-m-bugs, srevivo, tnisan, ykaul, ylavi
Target Milestone:	ovirt-4.2.0
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-05-15 17:43:37 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	Storage	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Germano Veit Michel 2017-08-21 04:06:43 UTC

Description of problem:

1. At 13:41:06, the admin attaches the export domain to the DC.

2017-08-21 13:41:06,655+10 INFO  [org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand] (org.ovirt.thread.pool-6-thread-48) [b56aacc1-c32e-49be-8c67-15897edb9783] Running command: AttachStorageDomainToPoolCommand internal: false. Entities affected :  ID: 92db7237-df7e-4e08-bbb3-3c040bfef826 Type: StorageAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN,  ID: 59938be9-0258-02fc-02b4-0000000000aa Type: StoragePoolAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN

2. Then at 13:41:39, a GetImagesListVDSCommand returns 50+ images (highly populated Export SD).

3. Then a sequence of GetVolumesListVDSCommand and GetImageInfoVDSCommand calls follows for the list in [2].
This loops for 5 minutes, getting the info for each image sequentially.

4. At the 5-minute mark, at 13:46:39, the query fails:

2017-08-21 13:46:39,036+10 ERROR [org.ovirt.engine.core.bll.storage.disk.image.GetUnregisteredDiskQuery] (org.ovirt.thread.pool-6-thread-48) [b56aacc1-c32e-49be-8c67-15897edb9783] Query 'GetUnregisteredDiskQuery' failed: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000460: Error checking for a transaction
2017-08-21 13:46:39,036+10 ERROR [org.ovirt.engine.core.bll.storage.disk.image.GetUnregisteredDiskQuery] (org.ovirt.thread.pool-6-thread-48) [b56aacc1-c32e-49be-8c67-15897edb9783] Exception: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000460: Error checking for a transaction

If I reduce the number of images so that the loop at [3] takes less than 5 minutes, everything works fine.
The logs are similar to bugzilla #1446878 but I don't see any deadlock in postgres logs.

Is the loop that gets the volume info for all images running inside a transaction that times out after 5 minutes?
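
The timing behind that question can be checked from the timestamps quoted above. The following is a minimal sketch, not part of the engine code: it parses the three times given in this report and prints the gaps between them. The helper name parse_engine_ts and the SUSPECTED_TIMEOUT constant are illustrative assumptions. The 5-minute value is only the timeout suspected in this report, not a confirmed engine setting.

```python
from datetime import datetime, timedelta

# Timestamps copied from this report. The first and last come from the quoted
# engine log lines; step 2 gives the image-list time to the second only.
ATTACH_STARTED = "2017-08-21 13:41:06,655+10"
IMAGE_LIST_RETURNED = "2017-08-21 13:41:39"
QUERY_FAILED = "2017-08-21 13:46:39,036+10"

# Assumption: the 5-minute transaction timeout suspected in this report.
SUSPECTED_TIMEOUT = timedelta(minutes=5)


def parse_engine_ts(ts: str) -> datetime:
    """Parse an engine-log style timestamp, dropping the '+10' UTC offset.

    Dropping the offset is safe here only because all three timestamps
    share the same offset.
    """
    local = ts.split("+")[0]
    fmt = "%Y-%m-%d %H:%M:%S,%f" if "," in local else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(local, fmt)


attach_to_failure = parse_engine_ts(QUERY_FAILED) - parse_engine_ts(ATTACH_STARTED)
list_to_failure = parse_engine_ts(QUERY_FAILED) - parse_engine_ts(IMAGE_LIST_RETURNED)

print("attach -> failure:    ", attach_to_failure)
print("image list -> failure:", list_to_failure)
print("overshoot past suspected timeout:", list_to_failure - SUSPECTED_TIMEOUT)
```

The failure lands about 300 seconds after the image list was returned (that timestamp is only known to the second), which is consistent with a 5-minute timeout starting around step 2 but does not prove it.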

Version-Release number of selected component (if applicable):
rhevm-4.1.4.2-0.1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create NFS Share
2. Create Export Domain on NFS share
3. Maintenance and Detach Export Domain
4. Use virt-v2v to populate it with 50+ disks.
   # for ((n=0;n<50;n++)); do virt-v2v -o rhev -os 10.64.24.33:/exports/data7 rhel7.3; done
5. Attach Export Domain

* You may need more than 50 images if the Storage/Hosts are fast. Mine are quite slow.
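
To pick an image count for a faster setup, a rough estimate may help. This is a sketch under stated assumptions: the scan of the images is sequential (as observed in step 3 of the description), the failure is triggered once the scan exceeds a 300-second limit (the suspected timeout, not a confirmed setting), and per_image_seconds is a hypothetical average you would measure on your own hosts. The function name images_needed is illustrative.

```python
import math


def images_needed(per_image_seconds: float, timeout_seconds: float = 300.0) -> int:
    """Smallest image count whose sequential scan runs longer than the timeout.

    per_image_seconds is a hypothetical measured average for the
    volume-list plus image-info calls of one image.
    timeout_seconds defaults to the 5 minutes suspected in this report.
    """
    if per_image_seconds <= 0:
        raise ValueError("per_image_seconds must be positive")
    return math.floor(timeout_seconds / per_image_seconds) + 1


# Illustrative values only; measure your own environment.
for secs in (6.0, 2.0, 0.5):
    print(f"{secs} s per image: at least {images_needed(secs)} images")
```

At roughly 6 seconds per image, about 50 images are enough to cross the 5-minute mark, which is in line with the slow setup described here. Faster storage needs proportionally more images.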

Actual results:
Export Domain fails to attach

Expected results:
Export Domain attached

Comment 4 Allon Mureinik 2017-10-02 12:45:26 UTC

Benny, the attached patch is merged.
Should this be MODIFIED, or are we waiting for anything else?

Comment 5 Benny Zlotnik 2017-10-02 12:50:51 UTC

Moved to MODIFIED

Comment 6 rhev-integ 2017-11-02 13:38:35 UTC

INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No relevant external trackers attached]

For more info please contact: rhv-devops

Comment 8 Kevin Alon Goldblatt 2017-12-03 15:50:42 UTC

Verified with the following code:
--------------------------------------
ovirt-engine-4.2.0-0.5.master.el7.noarch
vdsm-4.20.8-53.gitc3edfc0.el7.centos.x86_64

Verified with the following scenario:
--------------------------------------
1. Created 50 VMs with disks and exported them to the export domain
2. Set the export domain to maintenance and detached it
3. Attached the export domain again >>>>> the attach operation has undergone significant performance improvements and took only a few seconds to complete
4. VM Import displays the VMs within a few seconds too.



Moving to VERIFIED

Comment 12 errata-xmlrpc 2018-05-15 17:43:37 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488

Comment 13 Franta Kust 2019-05-16 13:05:35 UTC

BZ<2>Jira Resync