Created attachment 1495423 [details] engine.log Description of problem: I run a cluster with 10 hosts for automated testing purposes. I automatically create VMs, run some workload and delete them again. Today I ran into a situation where I could not create VMs anymore. Messages were: Failed to create VM <vmname> (User: …). Failed to complete VM <vmname> creation. Version-Release number of selected component (if applicable): 4.2.6.4-1.el7 How reproducible: Occurred today for the first time, generated load on hosts by automatically creating vms, running vm workloads, deleting vms Steps to Reproduce: 1. 2. 3. Actual results: VM creation failed. Expected results: Additional info:
Looks like virt/infra issue. Oliver, any errors on the hosts in vdsm.log?
Created attachment 1495445 [details] VDSM logs from hosts and SPM
Nir, Just those: vdsm.log.3.06:2018-10-18 19:50:41,244+0200 ERROR (vmchannels) [virt.vm] (vmId='7d907188-7500-41eb-994b-ca7dd04af4e1') No JSON object could be decoded: 'e": ""}' (guestagent:566) vdsm.log.3.09:2018-10-18 19:49:20,655+0200 ERROR (vmchannels) [virt.vm] (vmId='5e89450b-7f21-42c2-8db1-358201176ee2') No JSON object could be decoded: 'in 10"}' (guestagent:566) vdsm.log.3.13:2018-10-18 19:48:45,017+0200 ERROR (vmchannels) [virt.vm] (vmId='5af0ec86-58e5-4eee-93fb-1555665cfab7') No JSON object could be decoded: 'WSVAGRANT"}' (guestagent:566) vdsm.log.3.15:2018-10-18 19:11:22,660+0200 ERROR (vmchannels) [virt.vm] (vmId='fcc2c898-a289-4a0a-a7e6-32a6bb2d5e01') No JSON object could be decoded: 'in 10"}' (guestagent:566) vdsm.log.3.15:2018-10-18 19:34:10,053+0200 ERROR (vmchannels) [virt.vm] (vmId='cb9b0e6d-8cf2-4fa2-a8ba-b4fea6b870ce') No JSON object could be decoded: 'WSVAGRANT"}' (guestagent:566) vdsm.log.3.15:2018-10-18 19:34:16,925+0200 ERROR (vmchannels) [virt.vm] (vmId='462d5314-2e8d-4438-b379-8acda9818242') No JSON object could be decoded: 'e": ""}' (guestagent:566) vdsm.log.3.17:2018-10-18 19:41:45,433+0200 ERROR (vmchannels) [virt.vm] (vmId='33d55c0b-1235-4145-85d3-6e8faf186cb3') No JSON object could be decoded: 'in 10"}' (guestagent:566) vdsm.log.3.27:2018-10-18 19:42:52,795+0200 ERROR (vmchannels) [virt.vm] (vmId='8f37bf67-c4bf-4269-984e-cdd859955a87') No JSON object could be decoded: 'WSVAGRANT"}' (guestagent:566) vdsm.log.3.27:2018-10-18 19:42:52,795+0200 ERROR (vmchannels) [virt.vm] (vmId='8f37bf67-c4bf-4269-984e-cdd859955a87') Extra data: line 1 column 15 - line 1 column 28 (char 14 - 27): ' "active-user", "name": ""}' (guestagent:566) vdsm.log.3.30:2018-10-18 19:35:10,885+0200 ERROR (vmchannels) [virt.vm] (vmId='c27f1a31-8cf9-4c80-a341-1543d5262e1e') No JSON object could be decoded: 'in 2016"}' (guestagent:566) vdsm.log.3.31:2018-10-18 19:38:03,316+0200 ERROR (vmchannels) [virt.vm] (vmId='5a641993-e67f-4b44-aadc-1e8ac4028ef5') No JSON object could be decoded: 'in 10"}' (guestagent:566) vdsm.log.3.31:2018-10-18 19:51:11,817+0200 ERROR (vmchannels) [virt.vm] (vmId='9c4ac41c-46d8-4a87-a53b-790ab663a640') No JSON object could be decoded: ': "WINDOWSVAGRANT"}' (guestagent:566) I attached VDSM logs from hosts and SPM.
Please attach engine.log
Engine log is already attached. Do you need more entries before the occurrence?
I didn't see it in the attached archive. Let me check again
What actually happens to the VMs? I see that the engine schedules it and hands it off to vdsm, and vdsm starts the VM. The guest agent may not be installed, but there's not much else in the logs. Can you attach a qemu log for one of the VMs which may be having problems? foliocissprint-c29, for example?
Created attachment 1495668 [details] qemu logs qemu logs for vm foliocicsprint-c17 from all hosts in the cluster. Related messages in engine: log_time | severity | message ----------------------------+----------+--------------------------------------------------------------------------------------------------------------------- 2018-10-18 10:31:44.443+02 | 0 | VM foliocicsprint-c17 creation was initiated by username.com.com-authz. 2018-10-18 10:32:00.899+02 | 0 | VM foliocicsprint-c17 creation has been completed. 2018-10-18 10:32:02.91+02 | 0 | VM foliocicsprint-c17 was started by username.com.com-authz (Host: testVMHOST029). 2018-10-18 10:32:52.191+02 | 0 | VM foliocicsprint-c17 started on Host testVMHOST029 2018-10-18 11:23:47.243+02 | 0 | VM foliocicsprint-c17 powered off by username.com.com-authz (Host: testVMHOST029). 2018-10-18 11:23:53.491+02 | 0 | VM foliocicsprint-c17 is down. 2018-10-18 11:24:04.234+02 | 0 | VM foliocicsprint-c17 was successfully removed. 2018-10-18 14:05:20.526+02 | 0 | VM foliocicsprint-c17 creation was initiated by username.com.com-authz. 2018-10-18 14:07:40.559+02 | 0 | VM foliocicsprint-c17 creation has been completed. 2018-10-18 14:07:44.792+02 | 0 | VM foliocicsprint-c17 was started by username.com.com-authz (Host: testVMHOST015). 2018-10-18 14:08:16.516+02 | 0 | VM foliocicsprint-c17 started on Host testVMHOST015 2018-10-18 15:04:10.021+02 | 0 | VM foliocicsprint-c17 powered off by username.com.com-authz (Host: testVMHOST015). 2018-10-18 15:04:21.358+02 | 0 | VM foliocicsprint-c17 is down. 2018-10-18 15:05:01.665+02 | 0 | VM foliocicsprint-c17 was successfully removed. 2018-10-18 19:50:10.159+02 | 2 | Failed to create VM foliocicsprint-c17 (User: username.com.com-authz). 2018-10-18 19:50:17.576+02 | 2 | Failed to complete VM foliocicsprint-c17 creation. 2018-10-18 19:52:18.235+02 | 0 | VM foliocicsprint-c17 creation was initiated by username.com.com-authz. 2018-10-18 19:52:30.049+02 | 0 | VM foliocicsprint-c17 creation has been completed. 2018-10-18 19:52:39.031+02 | 0 | VM foliocicsprint-c17 was started by username.com.com-authz (Host: testVMHOST031). 2018-10-18 19:53:21.57+02 | 0 | VM foliocicsprint-c17 started on Host testVMHOST031 2018-10-18 20:58:54.516+02 | 0 | VM foliocicsprint-c17 powered off by username.com.com-authz (Host: testVMHOST031). 2018-10-18 20:59:02.844+02 | 0 | VM foliocicsprint-c17 is down. 2018-10-18 20:59:42.686+02 | 0 | VM foliocicsprint-c17 was successfully removed. 2018-10-19 05:06:32.552+02 | 0 | VM foliocicsprint-c17 creation was initiated by username.com.com-authz. 2018-10-19 05:06:46.185+02 | 0 | VM foliocicsprint-c17 creation has been completed. 2018-10-19 05:06:51.024+02 | 0 | VM foliocicsprint-c17 was started by username.com.com-authz (Host: testVMHOST031). 2018-10-19 05:07:21.55+02 | 0 | VM foliocicsprint-c17 started on Host testVMHOST031 2018-10-19 06:04:04.024+02 | 0 | VM foliocicsprint-c17 powered off by username.com.com-authz (Host: testVMHOST031). 2018-10-19 06:04:42.193+02 | 0 | VM foliocicsprint-c17 was successfully removed.
The problem is, that vms sometimes (e.g. foliocicsprint-c17 at 2018-10-18 19:50:10.159+02) cannot be created, I don't see vdsm trying to start one of those.
This looks like a storage problem. Is the template locked or otherwise unreachable?
I did not see any storage problem. The template is used by multiple VMs in parallel. Is there a limit?
there's no limit, but your system doesnt' seem to keep up. It timed out after 10 minutes (the example in comment #9) and in the meantime there were a lot of VM starts. How many do you try to run in parallel?
(In reply to Michal Skrivanek from comment #12) > there's no limit, but your system doesnt' seem to keep up. It timed out > after 10 minutes (the example in comment #9) and in the meantime there were > a lot of VM starts. How many do you try to run in parallel? Hi, each job creates around 20-40 thin clones from the same template. (target should be 100-200 if stable) but there can be multiple jobs at the same time creating clones of the template.
Can you throttle this job? It seems as if your storage isn't able to keep up. Why not time it to create 20 at a time (or more, you'd need to test how long until it times out), then continue to the next batch?
(In reply to Ryan Barry from comment #14) > Can you throttle this job? It seems as if your storage isn't able to keep up. > > Why not time it to create 20 at a time (or more, you'd need to test how long > until it times out), then continue to the next batch? I don't think this can be done client side. I can throttle the creation per job, but don't now how many jobs are running somewhere else. Its not the only job running against the same engine and storage. It whould be ok for me if the VMs need longer time for creating if there is some load (engine, storage ...), but i am not ok with getting "Internal Engine Error" from the API. The queing has to be done on oVirt side.
Lowering priority as it's hardware related.
(In reply to Doron Fediuck from comment #16) > Lowering priority as it's hardware related. I don't think you could fix this with hardware, we already have an all flash storage attached with fibrechannel and it happens with 20 clones at the same time.
(In reply to christian.grundmann from comment #15) > > It whould be ok for me if the VMs need longer time for creating if there is > some load (engine, storage ...), but i am not ok with getting "Internal > Engine Error" from the API. The queing has to be done on oVirt side. yes, that's the transaction timeout for sync jobs. For normal creations diska are created separately and asynchronously internally, here it seems the clone operation is part of the synchronous response and that's wrong. What iexactly are you calling? can you post a sample rest api request?
(In reply to Michal Skrivanek from comment #18) > (In reply to christian.grundmann from comment #15) > > > > > It whould be ok for me if the VMs need longer time for creating if there is > > some load (engine, storage ...), but i am not ok with getting "Internal > > Engine Error" from the API. The queing has to be done on oVirt side. > > yes, that's the transaction timeout for sync jobs. For normal creations > diska are created separately and asynchronously internally, here it seems > the clone operation is part of the synchronous response and that's wrong. > What iexactly are you calling? can you post a sample rest api request? I am using vagrant with ovirt4 plugin https://www.ovirt.org/blog/2017/02/using-oVirt-vagrant/ which uses your ruby SDK https://github.com/myoung34/vagrant-ovirt4/blob/master/lib/vagrant-ovirt4/action/create_vm.rb hope this helps
Is this as simple as calling 'vagrant up' 20 times in succession?
(In reply to Ryan Barry from comment #20) > Is this as simple as calling 'vagrant up' 20 times in succession? we are using https://www.vagrantup.com/docs/multi-machine/ but yes it should also be reproduceable if you call vagrant up in parallel
Let's move this to an asynchronous call
Re-targeting to 4.3.1 since it is missing a patch, an acked blocker flag, or both
Created attachment 1588743 [details] engine.log for vm fsc-gkjay1zx-clientwindows1-c7
In some cases when we hit this bug we can see the Job for creating the vm running a long time (10-45 minutes) before it fails. If this happens all other jobs which operate on the same template (delete vm, create vm) are blocked until the long running create fails. I added a log file with multiple retries creating the VM "fsc-gkjay1zx-clientwindows1-c7" from a template A database error can be found in the log: 2019-07-08 12:53:42,572+02 ERROR [org.ovirt.engine.core.bll.AddVmCommand] (default task-49982) [] Command 'org.ovirt.engine.core.bll.AddVmCommand' failed: Failed to obtain JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@6cc2006f[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@258f4e0 connection handles=1 lastReturned=1562583222519 lastValidated=1562582322918 lastCheckedOut=1562583222519 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@4c466379 mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool@387e9380[pool=ENGINEDataSource] xaResource=LocalXAResourceImpl@3def3afd[connectionListener=6cc2006f connectionManager=7388ab63 warned=false currentXid=null productName=PostgreSQL productVersion=10.6 jndiName=java:/ENGINEDataSource] txSync=null] 2019-07-08 12:53:42,573+02 INFO [org.ovirt.engine.core.vdsbroker.SetVmStatusVDSCommand] (default task-50082) [] FINISH, SetVmStatusVDSCommand, return: , log id: f87a880 2019-07-08 12:53:42,572+02 ERROR [org.ovirt.engine.core.bll.AddVmCommand] (default task-49982) [] Exception: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@6cc2006f[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@258f4e0 connection handles=1 lastReturned=1562583222519 lastValidated=1562582322918 lastCheckedOut=1562583222519 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@4c466379 mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool@387e9380[pool=ENGINEDataSource] xaResource=LocalXAResourceImpl@3def3afd[connectionListener=6cc2006f connectionManager=7388ab63 warned=false currentXid=null productName=PostgreSQL productVersion=10.6 jndiName=java:/ENGINEDataSource] txSync=null] at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:81) [spring-jdbc.jar:5.0.4.RELEASE] at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:1045) [spring-jdbc.jar:5.0.4.RELEASE] at org.springframework.jdbc.core.JdbcTemplate.call(JdbcTemplate.java:1104) [spring-jdbc.jar:5.0.4.RELEASE] at org.springframework.jdbc.core.simple.AbstractJdbcCall.executeCallInternal(AbstractJdbcCall.java:414) [spring-jdbc.jar:5.0.4.RELEASE] at org.springframework.jdbc.core.simple.AbstractJdbcCall.doExecute(AbstractJdbcCall.java:374) [spring-jdbc.jar:5.0.4.RELEASE] at org.springframework.jdbc.core.simple.SimpleJdbcCall.execute(SimpleJdbcCall.java:198) [spring-jdbc.jar:5.0.4.RELEASE] at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:135) [dal.jar:] at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:130) [dal.jar:] at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeModification(SimpleJdbcCallsHandler.java:76) [dal.jar:] at org.ovirt.engine.core.dao.DefaultGenericDao.save(DefaultGenericDao.java:93) [dal.jar:] at org.ovirt.engine.core.dao.VmDynamicDaoImpl.save(VmDynamicDaoImpl.java:35) [dal.jar:] at org.ovirt.engine.core.bll.AddVmCommand.addVmDynamic(AddVmCommand.java:1272) [bll.jar:] at org.ovirt.engine.core.bll.AddVmCommand.lambda$executeVmCommand$3(AddVmCommand.java:1003) [bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:202) [utils.jar:] at org.ovirt.engine.core.bll.AddVmCommand.executeVmCommand(AddVmCommand.java:1001) [bll.jar:] at org.ovirt.engine.core.bll.VmCommand.executeCommand(VmCommand.java:172) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1157) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1315) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1964) [bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164) [utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103) [utils.jar:] at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1375) [bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:419) [bll.jar:] at org.ovirt.engine.core.bll.executor.DefaultBackendActionExecutor.execute(DefaultBackendActionExecutor.java:13) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:451) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:433) [bll.jar:] at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:388) [bll.jar:] at sun.reflect.GeneratedMethodAccessor482.invoke(Unknown Source) [:1.8.0_212] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_212] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_212] at org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:509) at org.jboss.as.weld.ejb.DelegatingInterceptorInvocationContext.proceed(DelegatingInterceptorInvocationContext.java:92) [wildfly-weld-ejb-15.0.1.Final.jar:15.0.1.Final] at org.jboss.weld.interceptor.proxy.WeldInvocationContextImpl.interceptorChainCompleted(WeldInvocationContextImpl.java:107) [weld-core-impl-3.0.5.Final.jar:3.0.5.Final] at org.jboss.weld.interceptor.proxy.WeldInvocationContextImpl.proceed(WeldInvocationContextImpl.java:126) [weld-core-impl-3.0.5.Final.jar:3.0.5.Final] at org.ovirt.engine.core.common.di.interceptor.LoggingInterceptor.apply(LoggingInterceptor.java:12) [common.jar:] at sun.reflect.GeneratedMethodAccessor74.invoke(Unknown Source) [:1.8.0_212] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_212] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_212] at org.jboss.weld.interceptor.reader.SimpleInterceptorInvocation$SimpleMethodInvocation.invoke(SimpleInterceptorInvocation.java:73) [weld-core-impl-3.0.5.Final.jar:3.0.5.Final] at org.jboss.weld.interceptor.proxy.WeldInvocationContextImpl.invokeNext(WeldInvocationContextImpl.java:92) [weld-core-impl-3.0.5.Final.jar:3.0.5.Final] at org.jboss.weld.interceptor.proxy.WeldInvocationContextImpl.proceed(WeldInvocationContextImpl.java:124) [weld-core-impl-3.0.5.Final.jar:3.0.5.Final] at org.jboss.weld.bean.InterceptorImpl.intercept(InterceptorImpl.java:105) [weld-core-impl-3.0.5.Final.jar:3.0.5.Final] at org.jboss.as.weld.ejb.DelegatingInterceptorInvocationContext.proceed(DelegatingInterceptorInvocationContext.java:82) [wildfly-weld-ejb-15.0.1.Final.jar:15.0.1.Final] at org.jboss.as.weld.interceptors.EjbComponentInterceptorSupport.delegateInterception(EjbComponentInterceptorSupport.java:60) at org.jboss.as.weld.interceptors.Jsr299BindingsInterceptor.delegateInterception(Jsr299BindingsInterceptor.java:77) at org.jboss.as.weld.interceptors.Jsr299BindingsInterceptor.doMethodInterception(Jsr299BindingsInterceptor.java:89) at org.jboss.as.weld.interceptors.Jsr299BindingsInterceptor.processInvocation(Jsr299BindingsInterceptor.java:102) at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:63) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:509) at org.ovirt.engine.core.bll.interceptors.CorrelationIdTrackerInterceptor.aroundInvoke(CorrelationIdTrackerInterceptor.java:13) [bll.jar:] at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) [:1.8.0_212] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_212] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_212] at org.jboss.as.ee.component.ManagedReferenceLifecycleMethodInterceptor.processInvocation(ManagedReferenceLifecycleMethodInterceptor.java:89) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.as.ejb3.component.invocationmetrics.ExecutionTimeInterceptor.processInvocation(ExecutionTimeInterceptor.java:43) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.as.ee.concurrent.ConcurrentContextInterceptor.processInvocation(ConcurrentContextInterceptor.java:45) [wildfly-ee-15.0.1.Final.jar:15.0.1.Final] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:40) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:53) at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:52) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.as.ejb3.component.singleton.SingletonComponentInstanceAssociationInterceptor.processInvocation(SingletonComponentInstanceAssociationInterceptor.java:53) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:216) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final] at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:418) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final] at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:148) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:509) at org.jboss.weld.module.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:81) [weld-ejb-3.0.5.Final.jar:3.0.5.Final] at org.jboss.as.weld.ejb.EjbRequestScopeActivationInterceptor.processInvocation(EjbRequestScopeActivationInterceptor.java:89) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.as.ejb3.component.invocationmetrics.WaitTimeInterceptor.processInvocation(WaitTimeInterceptor.java:47) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.as.ejb3.security.SecurityContextInterceptor.processInvocation(SecurityContextInterceptor.java:100) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.as.ejb3.deployment.processors.StartupAwaitInterceptor.processInvocation(StartupAwaitInterceptor.java:22) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:67) [wildfly-ejb3-15.0.1.Final.jar:15.0.1.Final] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.invocation.ContextClassLoaderInterceptor.processInvocation(ContextClassLoaderInterceptor.java:60) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.invocation.InterceptorContext.run(InterceptorContext.java:438) at org.wildfly.security.manager.WildFlySecurityManager.doChecked(WildFlySecurityManager.java:618) at org.jboss.invocation.AccessCheckingInterceptor.processInvocation(AccessCheckingInterceptor.java:57) at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422) at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:53) at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:198) at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:185) at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:81) at org.ovirt.engine.core.common.interfaces.BackendLocal$$$view3.runAction(Unknown Source) [common.jar:] at org.ovirt.engine.api.restapi.resource.BackendResource.doAction(BackendResource.java:250) at org.ovirt.engine.api.restapi.resource.AbstractBackendCollectionResource.doCreateEntity(AbstractBackendCollectionResource.java:233) at org.ovirt.engine.api.restapi.resource.AbstractBackendCollectionResource.performCreate(AbstractBackendCollectionResource.java:144) at org.ovirt.engine.api.restapi.resource.AbstractBackendCollectionResource.performCreate(AbstractBackendCollectionResource.java:134) at org.ovirt.engine.api.restapi.resource.AbstractBackendCollectionResource.performCreate(AbstractBackendCollectionResource.java:153) at org.ovirt.engine.api.restapi.resource.BackendVmsResource.addVm(BackendVmsResource.java:568) at org.ovirt.engine.api.restapi.resource.BackendVmsResource.add(BackendVmsResource.java:239) at sun.reflect.GeneratedMethodAccessor1687.invoke(Unknown Source) [:1.8.0_212] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_212] at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_212] at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:139) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.core.ResourceMethodInvoker.internalInvokeOnTarget(ResourceMethodInvoker.java:509) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTargetAfterFilter(ResourceMethodInvoker.java:399) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.core.ResourceMethodInvoker.lambda$invokeOnTarget$0(ResourceMethodInvoker.java:363) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.core.interception.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:355) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:365) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:337) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.core.ResourceLocatorInvoker.invokeOnTargetObject(ResourceLocatorInvoker.java:137) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.core.ResourceLocatorInvoker.invoke(ResourceLocatorInvoker.java:100) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:439) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.core.SynchronousDispatcher.lambda$invoke$4(SynchronousDispatcher.java:229) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.core.SynchronousDispatcher.lambda$preprocess$0(SynchronousDispatcher.java:135) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.core.interception.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:355) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.core.SynchronousDispatcher.preprocess(SynchronousDispatcher.java:138) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:215) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:227) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:56) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:51) [resteasy-jaxrs-3.6.2.Final.jar:3.6.2.Final] at javax.servlet.http.HttpServlet.service(HttpServlet.java:791) [jboss-servlet-api_4.0_spec-1.0.0.Final.jar:1.0.0.Final] at io.undertow.servlet.handlers.ServletHandler.handleRequest(ServletHandler.java:74) at io.undertow.servlet.handlers.FilterHandler.handleRequest(FilterHandler.java:81) at io.undertow.servlet.handlers.security.ServletSecurityRoleHandler.handleRequest(ServletSecurityRoleHandler.java:62) at io.undertow.servlet.handlers.ServletChain$1.handleRequest(ServletChain.java:68) at io.undertow.servlet.handlers.ServletDispatchingHandler.handleRequest(ServletDispatchingHandler.java:36) at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43) at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43) at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43) at io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(ServletInitialHandler.java:274) at io.undertow.servlet.handlers.ServletInitialHandler.dispatchToPath(ServletInitialHandler.java:209) at io.undertow.servlet.spec.RequestDispatcherImpl.forwardImpl(RequestDispatcherImpl.java:223) at io.undertow.servlet.spec.RequestDispatcherImpl.forwardImplSetup(RequestDispatcherImpl.java:149) at io.undertow.servlet.spec.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:111) at org.ovirt.engine.api.restapi.invocation.VersionFilter.doFilter(VersionFilter.java:178) at org.ovirt.engine.api.restapi.invocation.VersionFilter.doFilter(VersionFilter.java:98) at io.undertow.servlet.core.ManagedFilter.doFilter(ManagedFilter.java:61) at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:131) at org.ovirt.engine.api.restapi.invocation.CurrentFilter.doFilter(CurrentFilter.java:117) at org.ovirt.engine.api.restapi.invocation.CurrentFilter.doFilter(CurrentFilter.java:72) at io.undertow.servlet.core.ManagedFilter.doFilter(ManagedFilter.java:61) at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:131) at org.ovirt.engine.core.aaa.filters.RestApiSessionMgmtFilter.doFilter(RestApiSessionMgmtFilter.java:78) [aaa.jar:] at io.undertow.servlet.core.ManagedFilter.doFilter(ManagedFilter.java:61) at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:131) at org.ovirt.engine.core.aaa.filters.EnforceAuthFilter.doFilter(EnforceAuthFilter.java:42) [aaa.jar:] at io.undertow.servlet.core.ManagedFilter.doFilter(ManagedFilter.java:61) at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:131) at org.ovirt.engine.core.aaa.filters.SsoRestApiNegotiationFilter.doFilter(SsoRestApiNegotiationFilter.java:84) [aaa.jar:] at io.undertow.servlet.core.ManagedFilter.doFilter(ManagedFilter.java:61) at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:131) at org.ovirt.engine.core.aaa.filters.SsoRestApiAuthFilter.doFilter(SsoRestApiAuthFilter.java:47) [aaa.jar:] at io.undertow.servlet.core.ManagedFilter.doFilter(ManagedFilter.java:61) at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:131) at org.ovirt.engine.core.aaa.filters.SessionValidationFilter.doFilter(SessionValidationFilter.java:59) [aaa.jar:] at io.undertow.servlet.core.ManagedFilter.doFilter(ManagedFilter.java:61) at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:131) at org.ovirt.engine.core.aaa.filters.RestApiSessionValidationFilter.doFilter(RestApiSessionValidationFilter.java:35) [aaa.jar:] at io.undertow.servlet.core.ManagedFilter.doFilter(ManagedFilter.java:61) at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:131) at org.ovirt.engine.api.restapi.security.CSRFProtectionFilter.doFilter(CSRFProtectionFilter.java:111) at org.ovirt.engine.api.restapi.security.CSRFProtectionFilter.doFilter(CSRFProtectionFilter.java:102) at io.undertow.servlet.core.ManagedFilter.doFilter(ManagedFilter.java:61) at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:131) at org.ovirt.engine.core.utils.servlet.CORSSupportFilter.doFilter(CORSSupportFilter.java:283) [utils.jar:] at io.undertow.servlet.core.ManagedFilter.doFilter(ManagedFilter.java:61) at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:131) at io.undertow.servlet.handlers.FilterHandler.handleRequest(FilterHandler.java:84) at io.undertow.servlet.handlers.security.ServletSecurityRoleHandler.handleRequest(ServletSecurityRoleHandler.java:62) at io.undertow.servlet.handlers.ServletChain$1.handleRequest(ServletChain.java:68) at io.undertow.servlet.handlers.ServletDispatchingHandler.handleRequest(ServletDispatchingHandler.java:36) at org.wildfly.extension.undertow.security.SecurityContextAssociationHandler.handleRequest(SecurityContextAssociationHandler.java:78) at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43) at io.undertow.servlet.handlers.security.SSLInformationAssociationHandler.handleRequest(SSLInformationAssociationHandler.java:132) at io.undertow.servlet.handlers.security.ServletAuthenticationCallHandler.handleRequest(ServletAuthenticationCallHandler.java:57) at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43) at io.undertow.security.handlers.AuthenticationConstraintHandler.handleRequest(AuthenticationConstraintHandler.java:53) at io.undertow.security.handlers.AbstractConfidentialityHandler.handleRequest(AbstractConfidentialityHandler.java:46) at io.undertow.servlet.handlers.security.ServletConfidentialityConstraintHandler.handleRequest(ServletConfidentialityConstraintHandler.java:64) at io.undertow.servlet.handlers.security.ServletSecurityConstraintHandler.handleRequest(ServletSecurityConstraintHandler.java:59) at io.undertow.security.handlers.AuthenticationMechanismsHandler.handleRequest(AuthenticationMechanismsHandler.java:60) at io.undertow.servlet.handlers.security.CachedAuthenticatedSessionHandler.handleRequest(CachedAuthenticatedSessionHandler.java:77) at io.undertow.security.handlers.NotificationReceiverHandler.handleRequest(NotificationReceiverHandler.java:50) at io.undertow.security.handlers.AbstractSecurityContextAssociationHandler.handleRequest(AbstractSecurityContextAssociationHandler.java:43) at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43) at org.wildfly.extension.undertow.security.jacc.JACCContextIdHandler.handleRequest(JACCContextIdHandler.java:61) at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43) at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43) at io.undertow.servlet.handlers.ServletInitialHandler.handleFirstRequest(ServletInitialHandler.java:292) at io.undertow.servlet.handlers.ServletInitialHandler.access$100(ServletInitialHandler.java:81) at io.undertow.servlet.handlers.ServletInitialHandler$2.call(ServletInitialHandler.java:138) at io.undertow.servlet.handlers.ServletInitialHandler$2.call(ServletInitialHandler.java:135) at io.undertow.servlet.core.ServletRequestContextThreadSetupAction$1.call(ServletRequestContextThreadSetupAction.java:48) at io.undertow.servlet.core.ContextClassLoaderSetupAction$1.call(ContextClassLoaderSetupAction.java:43) at org.wildfly.extension.undertow.security.SecurityContextThreadSetupAction.lambda$create$0(SecurityContextThreadSetupAction.java:105) at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1502) at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1502) at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1502) at io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(ServletInitialHandler.java:272) at io.undertow.servlet.handlers.ServletInitialHandler.access$000(ServletInitialHandler.java:81) at io.undertow.servlet.handlers.ServletInitialHandler$1.handleRequest(ServletInitialHandler.java:104) at io.undertow.server.Connectors.executeRootHandler(Connectors.java:360) at io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:830) at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35) at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1985) at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1487) at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1378) at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_212] Caused by: java.sql.SQLException: javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@6cc2006f[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@258f4e0 connection handles=1 lastReturned=1562583222519 lastValidated=1562582322918 lastCheckedOut=1562583222519 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@4c466379 mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool@387e9380[pool=ENGINEDataSource] xaResource=LocalXAResourceImpl@3def3afd[connectionListener=6cc2006f connectionManager=7388ab63 warned=false currentXid=null productName=PostgreSQL productVersion=10.6 jndiName=java:/ENGINEDataSource] txSync=null] at org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:146) at org.jboss.as.connector.subsystems.datasources.WildFlyDataSource.getConnection(WildFlyDataSource.java:64) at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:151) [spring-jdbc.jar:5.0.4.RELEASE] at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:115) [spring-jdbc.jar:5.0.4.RELEASE] at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:78) [spring-jdbc.jar:5.0.4.RELEASE] ... 217 more Caused by: javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@6cc2006f[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@258f4e0 connection handles=1 lastReturned=1562583222519 lastValidated=1562582322918 lastCheckedOut=1562583222519 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@4c466379 mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool@387e9380[pool=ENGINEDataSource] xaResource=LocalXAResourceImpl@3def3afd[connectionListener=6cc2006f connectionManager=7388ab63 warned=false currentXid=null productName=PostgreSQL productVersion=10.6 jndiName=java:/ENGINEDataSource] txSync=null] at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:975) at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:792) at org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:138) ... 221 more Caused by: javax.resource.ResourceException: IJ000461: Could not enlist in transaction on entering meta-aware object at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.managedConnectionReconnected(TxConnectionManagerImpl.java:571) at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:970) ... 223 more Caused by: java.lang.IllegalStateException: Transaction Local transaction (delegate=TransactionImple < ac, BasicAction: 0:ffff0a0a600a:-6fccc7b3:5d05ffb0:3bc987e status: ActionStatus.ABORTED >, owner=Local transaction context for provider JBoss JTA transaction provider) is not active STATUS_ROLLEDBACK at org.jboss.jca.core.connectionmanager.listener.TxConnectionListener.enlist(TxConnectionListener.java:296) at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.managedConnectionReconnected(TxConnectionManagerImpl.java:564) ... 224 more
Checking the logs. Failing to connect to the database should basically never happen, and a brief look shows other operations mostly ok. Martin, is there a known bug around this?
Any Updates on this Bug? We hit this problem hundreds of times a day.
Same error in the logs?
Created attachment 1617034 [details] engine.log from today, 00:00 to 03.00am Attached engine.log with current messages from today 00:00am to 03:00am. re:Database connections: we use ENGINE_DB_MAX_CONNECTIONS=300 in /etc/ovirt-engine/engine.conf.d//99-custom.conf and 400 PostgreSQL connections: engine=# SHOW max_connections; max_connections ----------------- 400
Martin, still jdbc connection errors. Any ideas from the log?
I checked database queries and locks - I seems there is a bottleneck in get_entity_permissions: List of active PostgreSQL queries: pid | query_start | duration | wait_event | wait_event_type | backend_xid | query -------+-------------------------------+------------------+---------------+-----------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------ 9354 | 2019-10-15 14:18:28.288848+02 | 01:10:13.275115 | | | | autovacuum: VACUUM ANALYZE public.vm_interface_statistics 8221 | 2019-10-15 14:35:38.510619+02 | 00:53:03.053344 | | | | autovacuum: VACUUM pg_toast.pg_toast_16876 1200 | 2019-10-15 14:44:51.529132+02 | 00:43:50.034831 | tuple | Lock | 1851321666 | select * from insertvmstatic($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, 32740 | 2019-10-15 15:06:07.899729+02 | 00:22:33.664234 | tuple | Lock | 1851384698 | select * from deletevmstatic($1, $2) as result 309 | 2019-10-15 15:07:10.595158+02 | 00:21:30.968805 | tuple | Lock | 1851387682 | select * from deletevmstatic($1, $2) as result 5613 | 2019-10-15 15:07:13.238539+02 | 00:21:28.325424 | tuple | Lock | 1851387839 | select * from deletevmstatic($1, $2) as result 932 | 2019-10-15 15:07:23.372251+02 | 00:21:18.191712 | tuple | Lock | 1851388326 | select * from deletevmstatic($1, $2) as result 1188 | 2019-10-15 15:08:06.837363+02 | 00:20:34.7266 | tuple | Lock | 1851391569 | select * from insertvmstatic($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, 5628 | 2019-10-15 15:12:53.903368+02 | 00:15:47.660595 | tuple | Lock | 1851406867 | select * from insertvmstatic($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, 734 | 2019-10-15 15:15:44.639126+02 | 00:12:56.924837 | tuple | Lock | 1851415621 | select * from insertvmstatic($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, 27058 | 2019-10-15 15:18:42.728431+02 | 00:09:58.835532 | tuple | Lock | 1851425007 | select * from insertvmstatic($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, 5766 | 2019-10-15 15:19:25.293926+02 | 00:09:16.270037 | tuple | Lock | 1851425962 | select * from deletevmstatic($1, $2) as result 627 | 2019-10-15 15:19:35.298312+02 | 00:09:06.265651 | tuple | Lock | 1851426409 | select * from deletevmstatic($1, $2) as result 32742 | 2019-10-15 15:19:45.552827+02 | 00:08:56.011136 | tuple | Lock | 1851426803 | select * from deletevmstatic($1, $2) as result 386 | 2019-10-15 15:19:47.744071+02 | 00:08:53.819892 | tuple | Lock | 1851426898 | select * from deletevmstatic($1, $2) as result 388 | 2019-10-15 15:19:57.296754+02 | 00:08:44.267209 | tuple | Lock | 1851427299 | select * from deletevmstatic($1, $2) as result 382 | 2019-10-15 15:20:01.688527+02 | 00:08:39.875436 | tuple | Lock | 1851427472 | select * from deletevmstatic($1, $2) as result 5620 | 2019-10-15 15:20:28.162028+02 | 00:08:13.401935 | tuple | Lock | 1851428781 | select * from insertvmstatic($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, 381 | 2019-10-15 15:20:35.125342+02 | 00:08:06.438621 | tuple | Lock | 1851429498 | select * from insertvmstatic($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, 5776 | 2019-10-15 15:20:39.908977+02 | 00:08:01.654986 | tuple | Lock | 1851429508 | select * from insertvmstatic($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, 308 | 2019-10-15 15:20:51.269025+02 | 00:07:50.294938 | tuple | Lock | 1851430190 | select * from insertvmstatic($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, 24290 | 2019-10-15 15:21:27.759719+02 | 00:07:13.804244 | tuple | Lock | 1851431680 | select * from insertvmstatic($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, 32660 | 2019-10-15 15:21:48.720561+02 | 00:06:52.843402 | | | | autovacuum: VACUUM ANALYZE public.vm_statistics 11809 | 2019-10-15 15:22:28.697261+02 | 00:06:12.866702 | tuple | Lock | 1851434923 | select * from insertvmstatic($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, 5621 | 2019-10-15 15:22:31.712517+02 | 00:06:09.851446 | tuple | Lock | 1851435592 | select * from insertvmstatic($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, 27072 | 2019-10-15 15:22:58.671025+02 | 00:05:42.892938 | tuple | Lock | 1851436269 | select * from insertvmstatic($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, 24170 | 2019-10-15 15:26:23.339741+02 | 00:02:18.224222 | transactionid | Lock | 1851447148 | select * from insertvmstatic($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, 4072 | 2019-10-15 15:27:30.742165+02 | 00:01:10.821798 | | | | autovacuum: VACUUM ANALYZE public.disk_image_dynamic 5149 | 2019-10-15 15:28:29.687431+02 | 00:00:11.876532 | | | | autovacuum: VACUUM ANALYZE public.vds_interface_statistics 921 | 2019-10-15 15:28:31.586535+02 | 00:00:09.977428 | | | 1851441692 | select * from get_entity_permissions($1, $2, $3, $4) 4655 | 2019-10-15 15:28:37.609028+02 | 00:00:03.954935 | | | 1851452535 | autovacuum: VACUUM public.ad_groups 26927 | 2019-10-15 15:28:40.95742+02 | 00:00:00.606543 | | | | SELECT * FROM (SELECT * FROM vms WHERE ( vm_guid IN (SELECT distinct vms_with_tags.vm_guid FROM vms_with_tags WHERE ( vms_with_tags.cluster_compatibility_v 5528 | 2019-10-15 15:28:41.457081+02 | 00:00:00.106882 | | | | SELECT * FROM ((SELECT audit_log.* FROM audit_log WHERE audit_log.severity = '10' AND not deleted) ORDER BY audit_log_id DESC ) as T1 OFFSET (1 -1) LIMIT 27059 | 2019-10-15 15:28:41.457081+02 | 00:00:00.106882 | | | | SELECT * FROM ((SELECT audit_log.* FROM audit_log WHERE audit_log.severity = '10' AND not deleted) ORDER BY audit_log_id DESC ) as T1 OFFSET (1 -1) LIMIT 5527 | 2019-10-15 15:28:41.457081+02 | 00:00:00.106882 | | | | SELECT * FROM ((SELECT audit_log.* FROM audit_log WHERE audit_log.severity = '10' AND not deleted) ORDER BY audit_log_id DESC ) as T1 OFFSET (1 -1) LIMIT 19771 | 2019-10-15 15:28:41.468637+02 | 00:00:00.095326 | | | | SELECT * FROM ((SELECT distinct audit_log.* FROM audit_log LEFT OUTER JOIN vms_with_tags ON audit_log.vm_id=vms_with_tags.vm_guid WHERE vms_with_tags.v 395 | 2019-10-15 15:28:41.49595+02 | 00:00:00.068013 | | | | select * from getdisksvmguid($1, $2, $3, $4) 5727 | 2019-10-15 15:28:41.504785+02 | 00:00:00.059178 | | | | select * from getdisksvmguid($1, $2, $3, $4) 925 | 2019-10-15 15:28:41.526192+02 | 00:00:00.037771 | | | | select * from getdisksvmguid($1, $2, $3, $4) 30404 | 2019-10-15 15:28:41.534436+02 | 00:00:00.029527 | | | | select * from getstorage_domains_by_storagepoolid($1, $2, $3) 5537 | 2019-10-15 15:28:41.563963+02 | 00:00:00 | | | | select pid,query_start,now()-query_start,wait_event,wait_event_type,backend_xid,left(query,160) from pg_stat_activity where state = 'active' order by query_star 1189 | 2019-10-15 15:28:41.565652+02 | -00:00:00.001689 | | | 1851452555 | select * from updategroup($1, $2, $3, $4, $5, $6) as result 32739 | 2019-10-15 15:28:41.567017+02 | -00:00:00.003054 | | | 1851452554 | select * from public.updatevmstatistics(CAST ($1 AS numeric),CAST ($2 AS numeric),CAST ($3 AS numeric),CAST ($4 AS int4),CAST ($5 AS int4),CAST ($6 AS int4),CAS All long running queries are directly or indirectly waiting for transactions running "select * from get_entity_permissions($1, $2, $3, $4)" Example: pid | query_start | duration | wait_event | wait_event_type | backend_xid | query -------+-------------------------------+------------------+---------------+-----------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------ 1200 | 2019-10-15 14:44:51.529132+02 | 00:43:50.034831 | tuple | Lock | 1851321666 | select * from insertvmstatic($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, Pid 1200 is waiting for tuple: locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ---------------+----------+------------+------+-------+-------------+---------------+---------+-------+----------+--------------------+-------+--------------------------+---------+---------- relation | 16400 | 34042 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t relation | 16400 | 34041 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t relation | 16400 | 34037 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t relation | 16400 | 893909676 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t relation | 16400 | 893909668 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t relation | 16400 | 34117 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t relation | 16400 | 34110 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t relation | 16400 | 34231 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t relation | 16400 | 34184 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t relation | 16400 | 34183 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t relation | 16400 | 34175 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t relation | 16400 | 16698 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t relation | 16400 | 16689 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t relation | 16400 | 16821 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t relation | 16400 | 16827 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t relation | 16400 | 3915845230 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | t virtualxid | | | | | 296/20574 | | | | | 296/20574 | 1200 | ExclusiveLock | t | t relation | 16400 | 16737 | | | | | | | | 296/20574 | 1200 | RowShareLock | t | f relation | 16400 | 34044 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | f transactionid | | | | | | 1851321666 | | | | 296/20574 | 1200 | ExclusiveLock | t | f relation | 16400 | 34731 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | f relation | 16400 | 34723 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | f relation | 16400 | 34733 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | f relation | 16400 | 34717 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | f relation | 16400 | 34732 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | f relation | 16400 | 34729 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | f relation | 16400 | 34735 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | f relation | 16400 | 34730 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | f relation | 16400 | 34710 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 296/20574 | 1200 | AccessExclusiveLock | f | f ========> WAITING FOR THIS LOCK relation | 16400 | 3412388157 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | f relation | 16400 | 35238 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | f relation | 16400 | 34043 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | f relation | 16400 | 34734 | | | | | | | | 296/20574 | 1200 | AccessShareLock | t | f This lock is held by pid 24170: locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ---------------+----------+------------+------+-------+-------------+---------------+---------+-------+----------+--------------------+-------+--------------------------+---------+---------- tuple | 16400 | 16737 | 323 | 15 | | | | | | 200/105144 | 5776 | AccessExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 113/3389146 | 32742 | ExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 41/4872932 | 5613 | ExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 22/1438873 | 11809 | AccessExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 70/4136129 | 381 | AccessExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 240/38216 | 627 | ExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 280/17506 | 1188 | AccessExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 127/676286 | 27072 | AccessExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 245/54549 | 382 | ExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 104/2122318 | 32740 | ExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 164/93028 | 5628 | AccessExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 196/68074 | 5766 | ExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 157/464766 | 5620 | AccessExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 309/24703 | 24170 | AccessExclusiveLock | t | f ========> HOLDING FOR THIS LOCK tuple | 16400 | 16737 | 323 | 15 | | | | | | 159/690083 | 5621 | AccessExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 296/20574 | 1200 | AccessExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 287/30226 | 932 | ExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 106/2174716 | 27058 | AccessExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 264/43344 | 734 | AccessExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 247/109781 | 388 | ExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 252/45259 | 386 | ExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 187/73408 | 308 | AccessExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 189/85051 | 309 | ExclusiveLock | f | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 325/37193 | 24290 | AccessExclusiveLock | f | f Pid 24170 is waiting for transaction locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ---------------+----------+------------+------+-------+-------------+---------------+---------+-------+----------+--------------------+-------+--------------------------+---------+---------- relation | 16400 | 34042 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t relation | 16400 | 34041 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t relation | 16400 | 34037 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t relation | 16400 | 893909676 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t relation | 16400 | 893909668 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t relation | 16400 | 34117 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t relation | 16400 | 34110 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t relation | 16400 | 34231 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t relation | 16400 | 34184 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t relation | 16400 | 34183 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t relation | 16400 | 34175 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t relation | 16400 | 16698 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t relation | 16400 | 16689 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t relation | 16400 | 16821 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t relation | 16400 | 16827 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t relation | 16400 | 3915845230 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | t virtualxid | | | | | 309/24703 | | | | | 309/24703 | 24170 | ExclusiveLock | t | t transactionid | | | | | | 1851447148 | | | | 309/24703 | 24170 | ExclusiveLock | t | f relation | 16400 | 16737 | | | | | | | | 309/24703 | 24170 | RowShareLock | t | f transactionid | | | | | | 1851441692 | | | | 309/24703 | 24170 | ShareLock | f | f ========> WAITING FOR THIS LOCK relation | 16400 | 34731 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | f relation | 16400 | 34044 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | f relation | 16400 | 34723 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | f relation | 16400 | 34733 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | f relation | 16400 | 34717 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | f relation | 16400 | 34732 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | f relation | 16400 | 34729 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | f relation | 16400 | 34735 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | f relation | 16400 | 34730 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | f relation | 16400 | 34710 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | f tuple | 16400 | 16737 | 323 | 15 | | | | | | 309/24703 | 24170 | AccessExclusiveLock | t | f relation | 16400 | 3412388157 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | f relation | 16400 | 35238 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | f relation | 16400 | 34043 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | f relation | 16400 | 34734 | | | | | | | | 309/24703 | 24170 | AccessShareLock | t | f Transaction 1851441692 is running get_entity_permissions: pid | query_start | duration | wait_event | wait_event_type | backend_xid | query -------+-------------------------------+------------------+---------------+-----------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------ 921 | 2019-10-15 15:28:31.586535+02 | 00:00:09.977428 | | | 1851441692 | select * from get_entity_permissions($1, $2, $3, $4) Please note that the long running query is running for 45 minutes while get_entity_permissions is running for only 10 seconds. From my analysis, all long running querys end waiting for get_entity_permissions queries. For the long running query I saw different waits for tuples, so the query proceeds but as soon as it got a tuple lock it is blocked while trying to access the next page or tuple.
PostgreSQL locking and query analysis was after upgrading to 4.3.6.7-1.el7
I checked into get_entity_permissions and identified engine_sessions as a potential candidate for this problem: - get_entity_permissions calls getUserAndGroupsById for the user_id and queries engine_sessions. - engine_sessions has more than 40.000 rows matching this query and getUserAndGroupsById takes more than 15 seconds to complete. How can we cleanup engine_sessions?
We managed to track the open sessions down to the used vagrant ovirt4 plugin. The plugin doesn't close the ovirt connection, so the sessions stay open after each API call until the timeout (UserSessionTimeOutInterval) We created a pull request https://github.com/myoung34/vagrant-ovirt4/pull/106 to close the connection after the used actions. Now we are down to < 300 sessions per user and no longer have errors or timeouts creating vms. In my opinion there are 2 issues left: - a high number of sessions from the same user must not block other operations. - add APISessionTimeoutInterval option to allow different settings for API and user sessions
(In reply to christian.grundmann from comment #34) > We managed to track the open sessions down to the used vagrant ovirt4 plugin. > The plugin doesn't close the ovirt connection, so the sessions stay open > after each API call until the timeout (UserSessionTimeOutInterval) > We created a pull request > https://github.com/myoung34/vagrant-ovirt4/pull/106 to close the connection > after the used actions. > Now we are down to < 300 sessions per user and no longer have errors or > timeouts creating vms. > > In my opinion there are 2 issues left: > > - a high number of sessions from the same user must not block other > operations. It doesn't block other users, it just overloads your current system, so if you had more powerful server, engine would be responding better > - add APISessionTimeoutInterval option to allow different settings for API > and user sessions It's clearly defined, that clients needs to close their connections (sessions) when they don't need it, otherwise connections (sessions) will be discarded automatically after specified timeout. Introducing different session timeout for RESTAPI and UI would be quite problematic, not sure how it would comply with our OAUTH/OIDC integration. But anyway due to limited resources, closing as deferred.