Bug 1665784

Summary: Several "Interrupted attempting lock" during startup on larger scale environment.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-engine
Assignee: Eli Mesika <emesika>
Status: CLOSED WONTFIX
QA Contact: Lukas Svaty <lsvaty>
Severity: medium
Docs Contact:
Priority: high
Version: 4.2.7
CC: emesika, gveitmic, mgoldboi, mperina, ratamir, Rhev-m-bugs, rnori
Target Milestone: ---
Flags: lsvaty: testing_plan_complete-
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-03-12 13:16:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Germano Veit Michel 2019-01-14 02:26:37 UTC
Description of problem:

While diagnosing an RHV environment that is not able to add storage domains, we noticed several problems. One of them may be a bug: while the engine is starting, it logs many "Interrupted attempting lock" SQL exceptions. More DB exceptions appear later, but they seem to be concentrated around engine initialization. For example:

2019-01-10 15:00:28,289Z INFO  [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 41) [] Running ovirt-engine 4.2.7.5-0.1.el7ev

2019-01-10 15:03:30,976Z INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase] (EE-ManagedThreadFactory-engineScheduled-Thread-16) [4575cbf0] Failed to get vds '88d5ae8c-fe39-4f2b-bd29-27b5238d8fd1', error: PreparedStatementCallback; uncategorized SQLException for SQL [select * from  getvdsstaticbyvdsid(?)]; SQL state [null]; error code [0]; IJ031013: Interrupted attempting lock: org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@72c6f1fc; nested exception is java.sql.SQLException: IJ031013: Interrupted attempting lock: org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@72c6f1fc

Several queries are impacted, among them:
getvdsbyvdsid, getvdsstaticbyvdsid, getiscsiifacesbyhostidandstoragetargetid
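
To check which queries are hit and whether the exceptions really cluster around startup, the log can be tallied per stored function and per minute. The following is only a rough sketch: it assumes every affected line has the same layout as the sample above (timestamp first, then "SQL [select * from  <function>(...)]" and the IJ031013 text), and the function name tally_lock_interrupts is made up for illustration.

```python
import re
from collections import Counter

# Timestamp prefix of an engine.log line, e.g. "2019-01-10 15:03:30,976Z";
# the captured group is truncated to the minute.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3}Z")
# Stored-function name in "SQL [select * from  getvdsstaticbyvdsid(?)]".
SQL_FUNCTION = re.compile(r"SQL \[select \* from\s+(\w+)\(")
# Error text quoted in this report.
MARKER = "IJ031013: Interrupted attempting lock"


def tally_lock_interrupts(lines):
    """Count IJ031013 log lines per stored function and per minute.

    Hypothetical helper: assumes the line layout of the sample in this
    report; lines that do not fit are counted under "<unknown>".
    """
    per_function = Counter()
    per_minute = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        func = SQL_FUNCTION.search(line)
        per_function[func.group(1) if func else "<unknown>"] += 1
        stamp = TIMESTAMP.match(line)
        per_minute[stamp.group(1) if stamp else "<unknown>"] += 1
    return per_function, per_minute
```

Passing an open engine.log file object to tally_lock_interrupts returns two counters; sorting the per-minute one by key and comparing it with the "Running ovirt-engine" timestamp shows how far after startup the exceptions occur.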

Version-Release number of selected component (if applicable):
ovirt-engine 4.2.7.5-0.1.el7ev

How reproducible:
Unknown, but this is a large-scale environment:

engine=> select count(*) from vds;
 count 
-------
   127

engine=> select count(*) from storage_domains;
 count 
-------
   354
(1 row)


Additional info:
1. There are several other apparent problems; we are working on possible network and unreachable storage issues. Still, this may be a scalability problem, and these exceptions do not look right.

2. We already raised max_connections. About 110 connections seem to be in use during the sosreport.
/var/opt/rh/rh-postgresql95/lib/pgsql/data/postgresql.conf:max_connections = 250
Value of property 'ENGINE_DB_MAX_CONNECTIONS' is '200'.
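
For reference, the headroom implied by these numbers can be worked out as below. This is a sketch only: it parses the two lines exactly as quoted above, the helper names parse_limits and headroom are hypothetical, and it assumes the approximate figure of 110 counts connections taken from the engine pool.

```python
import re


def parse_limits(text):
    """Extract (PostgreSQL max_connections, ENGINE_DB_MAX_CONNECTIONS).

    Assumes the two line formats quoted in this report.
    """
    pg = re.search(r"max_connections\s*=\s*(\d+)", text)
    engine = re.search(r"'ENGINE_DB_MAX_CONNECTIONS' is '(\d+)'", text)
    if pg is None or engine is None:
        raise ValueError("expected both connection limits in the input")
    return int(pg.group(1)), int(engine.group(1))


def headroom(pg_max, engine_max, in_use):
    """Return free slots in the engine pool and spare slots on the server."""
    return {
        # Connections the engine pool could still hand out.
        "pool_free": engine_max - in_use,
        # Server-side slots left over even if the engine pool were full.
        "server_spare": pg_max - engine_max,
    }


report = (
    "/var/opt/rh/rh-postgresql95/lib/pgsql/data/postgresql.conf:max_connections = 250\n"
    "Value of property 'ENGINE_DB_MAX_CONNECTIONS' is '200'.\n"
)
pg_max, engine_max = parse_limits(report)
# 110 is the approximate usage observed during the sosreport.
print(headroom(pg_max, engine_max, in_use=110))
```

Under those assumptions neither limit was reached when the sosreport was taken, although usage during engine startup may differ.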

Comment 11 Moran Goldboim 2019-03-12 13:16:07 UTC
The sev 1 issue seen is fixed by a hardware upgrade. Closing this bug; please reopen if applicable.