Bugzilla (bugzilla.redhat.com) will be under maintenance for infrastructure upgrades and will not be available on July 31st between 12:30 AM - 05:30 AM UTC. We appreciate your understanding and patience. You can follow status.redhat.com for details.
Bug 1418648 - Engine fails to revert the compensation context when performing restart during CreateCloneOfTemplateCommand
Summary: Engine fails to revert the compensation context when performing restart durin...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: future
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ovirt-4.2.0
: ---
Assignee: Benny Zlotnik
QA Contact: Raz Tamir
URL:
Whiteboard: storage
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-02-02 12:16 UTC by Kevin Alon Goldblatt
Modified: 2017-11-19 10:36 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-18 18:26:40 UTC
oVirt Team: Storage
rule-engine: ovirt-4.2+


Attachments (Terms of Use)
server, engine and vdsm logs (808.12 KB, application/x-gzip)
2017-02-02 12:49 UTC, Kevin Alon Goldblatt
no flags Details
vdsm, server, engine logs (1.76 MB, application/x-gzip)
2017-08-30 15:55 UTC, Kevin Alon Goldblatt
no flags Details

Description Kevin Alon Goldblatt 2017-02-02 12:16:55 UTC
Description of problem:
Engine fails to recreate session after Create Template failure

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0.3-0.1.el7.noarch
rhevm-4.1.0.3-0.1.el7.noarch
vdsm-4.19.4-1.el7ev.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create VM with 5 disks including block, nfs, ide, thin and preallocated
2. Create Template
3. Create VM from template and restart the engine during the CopyDataCommand on the HSM - engine restarts and fails with restarting the Create Template session

Actual results:
engine restarts and fails with restarting the Create Template session

Expected results:
engine restarts should succeed in restarting the Create Template session after restarting during the CopyDataCommand on the HSM

Additional info:
2017-02-02 12:49:49,086+02 INFO  [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 62) [] Running compensation on startup for Command 'org.ovirt.engine.core.bll.AddVmFromTemplateCommand', Command Id '23eb19a2-0e99-492c-8eb4-245c8e8e6019'
2017-02-02 12:49:49,095+02 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (ServerService Thread Pool -- 62) [] CommandsFactory : Failed to get type information using reflection for Class  'org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand', Command Id 'd2e9864b-5895-4a52-8cac-9a6fc0aa85c8': org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand.<init>(org.ovirt.engine.core.compat.Guid)
2017-02-02 12:49:49,095+02 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (ServerService Thread Pool -- 62) [] Exception: java.lang.NoSuchMethodException: org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand.<init>(org.ovirt.engine.core.compat.Guid)
        at java.lang.Class.getConstructor0(Class.java:3082) [rt.jar:1.8.0_121]
        at java.lang.Class.getDeclaredConstructor(Class.java:2178) [rt.jar:1.8.0_121]

Comment 1 Kevin Alon Goldblatt 2017-02-02 12:49:17 UTC
Created attachment 1247109 [details]
server, engine and vdsm logs

Adding logs

Comment 2 Tal Nisan 2017-02-05 10:05:59 UTC
Kevin, is it really a must to have all those 5 disks? Can it be reproduced with one disk only?

Comment 3 Kevin Alon Goldblatt 2017-03-06 13:50:58 UTC
(In reply to Tal Nisan from comment #2)
> Kevin, is it really a must to have all those 5 disks? Can it be reproduced
> with one disk only?

You don't need 5 disks as long as you can ensure that the restart occurs during the copydata phase. So use a larger disk of say 10g with at least 5g of data

Comment 4 Yaniv Kaul 2017-03-09 09:19:16 UTC
Liron - any updates?

Comment 5 Liron Aravot 2017-03-09 10:37:01 UTC
Hi Yaniv,

That's issue exists since that flow was introduced - unrelated to the SPDM work (it's caused by creation of default compensation context for transactive commands - bz 1399540)
I believe that we should fix it as part of BZ 1399191  https://bugzilla.redhat.com/show_bug.cgi?id=1399191#c10) which will tackle that entire flow - right now the only actual effect of the error is having exception in the logs (Kevin, please correct me if i'm wrong).

Tal, how do we want to proceed with it?

Comment 6 Tal Nisan 2017-03-09 12:51:58 UTC
Postponing for 4.1.2 in the meanwhile since it's a corner case as well, note that bug 1399191 is targeted to 4.2, do we want to move it to 4.1.2 as well?

Comment 7 Liron Aravot 2017-03-22 11:29:15 UTC
Tal, I believe that we should target both to the same version (possibly merge them) to test the flow one time after the changes (instead if multiple times) - it can be either 4.1.2 or 4.2, whatever seems better like a better fit to you.

Comment 8 Benny Zlotnik 2017-08-20 15:52:22 UTC
It seems that this issue was fixed as part of another fix, I cannot reproduce this. 

Can you check if this still happens?

Comment 9 Kevin Alon Goldblatt 2017-08-30 15:52:31 UTC
Ran the scenario again  >>>>> The create vm from template failed

Ran the scenario with the following code:
----------------------------------------------------
ovirt-engine-4.2.0-0.0.master.20170813134654.gitaee967b.el7.centos.noarch
vdsm-4.20.2-77.gite43f776.el7.centos.x86_64


Engine-log:
-----------------------------------
2017-08-30 17:41:23,980+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-2) [77e29e1b-fec3-4868-a27e-f6fdd3d84a32] EVENT_ID: USER_ADD_VM_STARTED(37), VM vm2_from_tp_v
mstar1 creation was initiated by admin@internal-authz.
2017-08-30 17:41:24,006+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler9) [77e29e1b-fec3-4868-a27e-f6fdd3d84a32] Command 'CreateCloneOfTemplate' (id: '0a7bcd
a2-66b2-4ea3-b749-b7f54e202a3b') waiting on child command id: 'd81bb9ca-e4e8-4ee8-8487-3d36dedcce9d' type:'CopyImageGroupWithData' to complete
2017-08-30 17:41:24,021+03 INFO  [org.ovirt.engine.core.bll.AddVmFromTemplateCommand] (default task-2) [77e29e1b-fec3-4868-a27e-f6fdd3d84a32] Lock freed to object 'EngineLock:{exclusiveLocks='[vm2_from_tp_vmstar
1=VM_NAME]', sharedLocks='[9207331b-2e8d-4f0a-9640-17f9d7c5d19e=DISK, 41414829-a55e-4083-95bc-97ffbb3035b6=DISK, b99bb2ff-72a0-4373-a576-732e7e315c71=DISK, 1c0da4dd-3bcc-413a-9de5-56a6ea8c52c4=DISK, abe22005-dde
6-4be8-9709-95aaad8b3816=DISK, be51e72f-8f51-4958-9386-2a6aa095fdad=DISK, 9077f689-a27f-43a0-8dd2-54058d83b096=DISK, 171a09a7-c70f-4600-9e95-29366e9ae2c5=TEMPLATE]'}'
2017-08-30 17:41:24,280+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler9) [77e29e1b-fec3-4868-a27e-f6fdd3d84a32] Error invoking callback method 'doPolling' for 'ACTIVE' 
command '1caf1d41-9462-489a-992b-0d8250e3a493'
2017-08-30 17:41:24,280+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler9) [77e29e1b-fec3-4868-a27e-f6fdd3d84a32] Exception: org.springframework.jdbc.UncategorizedSQLExce
ption: PreparedStatementCallback; uncategorized SQLException for SQL [select * from  getvdsbyvdsid(?, ?, ?)]; SQL state [null]; error code [0]; IJ031013: Interrupted attempting lock: org.jboss.jca.adapters.jdbc.
local.LocalManagedConnection@52b73e3c; nested exception is java.sql.SQLException: IJ031013: Interrupted attempting lock: org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@52b73e3c
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:84) [spring-jdbc.jar:4.3.9.RELEASE]
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81) [spring-jdbc.jar:4.3.9.RELEASE]
        at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:81) [spring-jdbc.jar:4.3.9.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:649) [spring-jdbc.jar:4.3.9.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:684) [spring-jdbc.jar:4.3.9.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:716) [spring-jdbc.jar:4.3.9.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:766) [spring-jdbc.jar:4.3.9.RELEASE]
        at org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall.executeCallInternal(PostgresDbEngineDialect.java:152) [dal.jar:]
        at org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall.doExecute(PostgresDbEngineDialect.java:118) [dal.jar:]
        at org.springframework.jdbc.core.simple.SimpleJdbcCall.execute(SimpleJdbcCall.java:198) [spring-jdbc.jar:4.3.9.RELEASE]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:135) [dal.jar:]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeReadList(SimpleJdbcCallsHandler.java:105) [dal.jar:]
        at org.ovirt.engine.core.dao.VdsDaoImpl.get(VdsDaoImpl.java:56) [dal.jar:]
        at org.ovirt.engine.core.dao.VdsDaoImpl.get(VdsDaoImpl.java:50) [dal.jar:]
        at org.ovirt.engine.core.bll.HostJobCallback.childCommandsExecutionEnded(HostJobCallback.java:46) [bll.jar:]
        at org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:63) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:114) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor65.invoke(Unknown Source) [:1.8.0_131]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_131]
        at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_131]
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:83) [scheduler.jar:]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:55) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_131]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131]
Caused by: java.sql.SQLException: IJ031013: Interrupted attempting lock: org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@52b73e3c
        at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnection.tryLock(BaseWrapperManagedConnection.java:402)
        at org.jboss.jca.adapters.jdbc.WrappedConnection.lock(WrappedConnection.java:156)
        at org.jboss.jca.adapters.jdbc.WrappedStatement.lock(WrappedStatement.java:138)
        at org.jboss.jca.adapters.jdbc.WrappedResultSet.lock(WrappedResultSet.java:6022)
        at org.jboss.jca.adapters.jdbc.WrappedResultSet.next(WrappedResultSet.java:2679)
        at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:92) [spring-jdbc.jar:4.3.9.RELEASE]
        at org.springframework.jdbc.core.RowMapperResultSetExtractor.extractData(RowMapperResultSetExtractor.java:60) [spring-jdbc.jar:4.3.9.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:697) [spring-jdbc.jar:4.3.9.RELEASE]


2017-08-30 17:41:23,889+03 INFO  [org.ovirt.engine.core.bll.AddVmFromTemplateCommand] (default task-2) [77e29e1b-fec3-4868-a27e-f6fdd3d84a32] Command [id=852d3b0b-5c3b-4d56-b184-77b75964abcd]: Compensating TRANS
IENT_ENTITY of org.ovirt.engine.core.common.businessentities.ReleaseMacsTransientCompensation; snapshot: org.ovirt.engine.core.common.businessentities.ReleaseMacsTransientCompensation@5c25d1d5.
2017-08-30 17:41:23,891+03 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (default task-2) [77e29e1b-fec3-4868-a27e-f6fdd3d84a32] transaction rolled back
2017-08-30 17:41:23,905+03 ERROR [org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand] (default task-2) [77e29e1b-fec3-4868-a27e-f6fdd3d84a32] Transaction rolled-back for command 'org.ovirt.engine.core.bll.Cr
eateCloneOfTemplateCommand'.
2017-08-30 17:41:23,905+03 ERROR [org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand] (default task-2) [77e29e1b-fec3-4868-a27e-f6fdd3d84a32] Transaction rolled-back for command 'org.ovirt.engine.core.bll.Cr
eateCloneOfTemplateCommand'.
2017-08-30 17:41:23,905+03 ERROR [org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand] (default task-2) [77e29e1b-fec3-4868-a27e-f6fdd3d84a32] Transaction rolled-back for command 'org.ovirt.engine.core.bll.Cr
eateCloneOfTemplateCommand'.
2017-08-30 17:41:23,905+03 ERROR [org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand] (default task-2) [77e29e1b-fec3-4868-a27e-f6fdd3d84a32] Transaction rolled-back for command 'org.ovirt.engine.core.bll.Cr
eateCloneOfTemplateCommand'.
2017-08-30 17:41:23,905+03 ERROR [org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand] (default task-2) [77e29e1b-fec3-4868-a27e-f6fdd3d84a32] Transaction rolled-back for command 'org.ovirt.engine.core.bll.Cr
eateCloneOfTemplateCommand'.
2017-08-30 17:41:23,905+03 ERROR [org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand] (default task-2) [77e29e1b-fec3-4868-a27e-f6fdd3d84a32] Transaction rolled-back for command 'org.ovirt.engine.core.bll.Cr
eateCloneOfTemplateCommand'.
2017-08-30 17:41:23,905+03 ERROR [org.ovirt.engine.core.bll.CreateCloneOfTemplateCommand] (default task-2) [77e29e1b-fec3-4868-a27e-f6fdd3d84a32] Transaction rolled-back for command 'org.ovirt.engine.core.bll.Cr
eateCloneOfTemplateCommand'.
2017-08-30 17:41:23,905+03 ERROR [org.ovirt.engine.core.bll.AddVmFromTemplateCommand] (default task-2) [77e29e1b-fec3-4868-a27e-f6fdd3d84a32] Transaction rolled-back for command 'org.ovirt.engine.core.bll.AddVmF
romTemplateCommand'.

Comment 10 Kevin Alon Goldblatt 2017-08-30 15:55:12 UTC
Created attachment 1320181 [details]
vdsm, server, engine logs

New logs added

Comment 11 Allon Mureinik 2017-08-31 09:47:40 UTC
Comment 8 is from Aug 20th.

Comment 9 states that it was tested with a build from Aug 13th:
> ovirt-engine-4.2.0-0.0.master.20170813134654.gitaee967b.el7.centos.noarch

Can we please retry this with a minimal scenario and a recent build?
Thanks!

Comment 12 Allon Mureinik 2017-10-18 18:26:40 UTC
(In reply to Allon Mureinik from comment #11)
> Comment 8 is from Aug 20th.
> 
> Comment 9 states that it was tested with a build from Aug 13th:
> > ovirt-engine-4.2.0-0.0.master.20170813134654.gitaee967b.el7.centos.noarch
> 
> Can we please retry this with a minimal scenario and a recent build?
> Thanks!
No reliable reproducer, closing.

Feel free to reopen if you can answer the above needinfo.


Note You need to log in before you can comment on or make changes to this bug.