Description of problem:
When the system is idle, we see the following error in engine.log (you can see the full error in the attached log):

2017-12-18 12:14:03,408+02 ERROR [org.ovirt.engine.core.bll.DisplayAllAuditLogEventsCommand] (default task-3) [9b551e4d-82e2-4b80-a7c3-eb8b42502abc] Exception: org.springframework.dao.DeadlockLoserDataAccessException: CallableStatementCallback; SQL [{call displayallauditlogevents(?)}]; ERROR: deadlock detected
PL/pgSQL function displayallauditlogevents(integer) line 3 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: deadlock detected

Version-Release number of selected component (if applicable):
[root@vega10 ovirt-engine]# rpm -qa | grep ovirt-engine
ovirt-engine-websocket-proxy-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
ovirt-engine-vmconsole-proxy-helper-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
ovirt-engine-webadmin-portal-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
ovirt-engine-extensions-api-impl-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
ovirt-engine-metrics-1.1.2-0.0.master.20171204093359.el7.centos.noarch
ovirt-engine-extension-aaa-jdbc-1.1.7-0.0.master.gitbbb2931.el7.centos.noarch
ovirt-engine-tools-backup-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
ovirt-engine-lib-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
ovirt-engine-cli-3.6.9.3-0.1.20170116.gitad0561a.el7.centos.noarch
ovirt-engine-wildfly-overlay-11.0.1-1.el7.centos.noarch
ovirt-engine-dwh-4.2.2-0.0.master.20171129124832.el7.centos.noarch
ovirt-engine-restapi-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
ovirt-engine-tools-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
ovirt-engine-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
ovirt-engine-setup-plugin-websocket-proxy-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.2-0.1.20161204.gite99bbd1.el7.centos.noarch
ovirt-engine-wildfly-11.0.0-1.el7.centos.x86_64
ovirt-engine-setup-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
ovirt-engine-dbscripts-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
ovirt-engine-dashboard-1.2.0-1.20171206git2170060.el7.centos.noarch
ovirt-engine-api-explorer-0.0.3-0.alpha.1.20171205git87608c5.el7.centos.noarch
ovirt-engine-dwh-setup-4.2.2-0.0.master.20171129124832.el7.centos.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
ovirt-engine-backend-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
python-ovirt-engine-sdk4-4.2.2-2.el7.centos.x86_64
ovirt-engine-setup-base-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-4.2.1-0.0.master.20171206161426.git88e9120.el7.centos.noarch

# cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.4 (Maipo)"

How reproducible:
This was noticed only once, during monitoring of our system, which is used for scale tests.
System was idle at that time.

Steps to Reproduce:
We made the following setup and monitored it during the night:
Physical hosts: 6
Nested hosts: 400 (those are basically vms which act as hosts, with the vdsm package)
RHEVM engine: 1 (vega10)
VMs: 1200 (400 of them reserved for the nested hosts)
Storage: ISCSI

Actual results:
See the above ERROR from engine.log

Expected results:
No errors

Additional info:
Attached engine.log
- severity?
- title misses the fact it happens at scale?
- title misses where the deadlock occurred (added).
- was that before or after the DWH change?
- have we seen it in lower scale?
Please add the relevant PG server log; we need it to get the details about the deadlock from the database log.
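In case it helps collect those details: the deadlock participants and their statements normally appear as DETAIL/CONTEXT lines next to the "deadlock detected" ERROR in the PostgreSQL server log, and lock waits can also be inspected live with the standard catalogs. A minimal sketch (standard pg_locks / pg_stat_activity catalogs, nothing oVirt-specific; the ALTER SYSTEM part assumes PostgreSQL 9.4+, on older versions set log_lock_waits in postgresql.conf instead):

-- List sessions currently waiting on a lock, together with the statement they run.
SELECT l.pid,
       l.locktype,
       l.mode,
       a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted;

-- Optionally, have the server log record lock waits longer than deadlock_timeout:
ALTER SYSTEM SET log_lock_waits = on;
SELECT pg_reload_conf();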
(In reply to Yaniv Kaul from comment #2)
> - severity?
> - title misses the fact it happens at scale?
> - title misses where the deadlock occurred (added).
> - was that before or after the DWH change?
> - have we seen it in lower scale?

- severity? --> Still investigating the impact
- title misses the fact it happens at scale? --> changed
- title misses where the deadlock occurred (added).
- was that before or after the DWH change? --> before
- have we seen it in lower scale? --> no
Severity?
Eli, note that the engine.log also contains the following error:

2017-12-18 11:56:08,918+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.MessageBundler] (default task-35) [09f17fff-ccfe-4105-99f1-02507a268e76] Key 'USER_DISPLAY_ALL_AUDIT_LOG' is not translated in 'bundles/AuditLogMessages'

Looking at backend/manager/modules/dal/src/main/resources/bundles/AuditLogMessages.properties, I can't find any message to be printed for the 'USER_DISPLAY_ALL_AUDIT_LOG' event, nor for 'USER_DISPLAY_ALL_AUDIT_LOG_FAILED'.
Verified on ovirt-engine-4.2.2-0.0.master.20180107151411.git59e7855.el7.centos.noarch.
I'm not seeing any such error in the log in my environment. If you are able to reproduce this, please reopen this bug.
This bugzilla is included in oVirt 4.2.1 release, published on Feb 12th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
Created attachment 1401912 [details] Engine, Postgres logs showing exception related to deadlock in audit table on clearing
#Reproducing:
Able to reproduce this by pressing 'clear all' on alerts, then 'clear all' on events in the drop-down tab in the webadmin UI.

#Environment:
Running 4.2.2.1-0.1.el7 on RHEL 7.5 (3.10.0-855.el7.x86_64)
401 hosts and 1542 VMs
The Postgres engine DB is on the same tier as the engine application. The Postgres DB of DWH is on a remote tier. Both pg_data directories are on mounted LUNs.

#Attachment shows:
The UI exception, the engine.log exception, and the corresponding postgres.log during the time of the exception.

<snip>
2018-02-28 11:44:57.881 UTC DETAIL: Process 27691 waits for ShareLock on transaction 69491033; blocked by process 27707.
Process 27707 waits for ExclusiveLock on tuple (22721,5) of relation 16465 of database 16385; blocked by process 22313.
Process 22313 waits for ShareLock on transaction 69491702; blocked by process 27691.
Process 27691: select * from clearallauditlogevents($1) as result
Process 27707: select * from setallauditlogalerts($1, $2) as result
Process 22313: select * from clearallauditlogevents($1) as result
2018-02-28 11:44:57.881 UTC HINT: See server log for query details.
2018-02-28 11:44:57.881 UTC CONTEXT: while updating tuple (22368,1) in relation "audit_log"
SQL statement "UPDATE audit_log SET deleted = true FROM ( SELECT * FROM audit_log WHERE severity != v_severity AND NOT deleted FOR UPDATE) AS s"
PL/pgSQL function clearallauditlogevents(integer) line 4 at SQL statement
2018-02-28 11:44:57.881 UTC STATEMENT: select * from clearallauditlogevents($1) as result
</snip>

Re-opening this bug.
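For context, the cycle shown in the snippet is the textbook case of concurrent transactions locking overlapping rows in different orders. A minimal standalone illustration of the mechanism, using a hypothetical table t rather than the real audit_log, run from two separate psql sessions in the order the comments describe:

-- One-time setup (hypothetical table, not the oVirt schema):
CREATE TABLE t (id int PRIMARY KEY, deleted boolean DEFAULT false);
INSERT INTO t VALUES (1, false), (2, false);

-- Session A:
BEGIN;
UPDATE t SET deleted = true WHERE id = 1;

-- Session B:
BEGIN;
UPDATE t SET deleted = true WHERE id = 2;

-- Session A (blocks, waiting for B's lock on row 2):
UPDATE t SET deleted = true WHERE id = 2;

-- Session B (a lock cycle now exists, so PostgreSQL aborts one transaction):
UPDATE t SET deleted = true WHERE id = 1;
-- ERROR: deadlock detected

In the real case each call to clearallauditlogevents / setallauditlogalerts touches many audit_log rows in whatever order the executor visits them, so two concurrent calls can end up in exactly this situation.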
Eli, might this issue be fixed by the patches which were created for BZ1535864?
(In reply to Martin Perina from comment #13)
> Eli, might this issue be fixed by the patches which were created for BZ1535864?

Probably, because in the current code the whole table is updated while the SELECT ... FOR UPDATE protects only part of the records.
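Just to illustrate that explanation (this is a sketch, not the actual BZ1535864 patch): one common way to avoid this kind of deadlock is to lock all affected rows in a deterministic order before updating them, so concurrent callers cannot acquire row locks in conflicting orders. A minimal sketch against the audit_log columns visible in the log above, with a hypothetical function name:

CREATE OR REPLACE FUNCTION clear_audit_log_events_ordered(v_severity INTEGER)
RETURNS VOID AS $$
BEGIN
    -- Lock the target rows first, in a fixed order (by primary key), so two
    -- concurrent callers always take the row locks in the same sequence and
    -- cannot form a lock cycle.
    PERFORM audit_log_id
      FROM audit_log
     WHERE severity != v_severity
       AND NOT deleted
     ORDER BY audit_log_id
       FOR UPDATE;

    -- Now perform the actual update; the rows it touches are already locked
    -- by this transaction.
    UPDATE audit_log
       SET deleted = true
     WHERE severity != v_severity
       AND NOT deleted;
END;
$$ LANGUAGE plpgsql;

Compare with the statement from the log, "UPDATE audit_log SET deleted = true FROM (SELECT ... FOR UPDATE) AS s": there the FOR UPDATE subquery only covers part of the rows the UPDATE ends up touching, so the remaining locks are taken in scan order and can conflict between concurrent callers.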
Moving to ON_QA as this should be fixed using the same fix as for BZ1535864
Validated on 4.2.2.4-0.1.el7 on our scale setup. Watched both the engine log and the pg log while executing the steps described by Mordechai in comment 12. Also watched those logs while the system was idle; no deadlock error was noticed.
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.