Description of problem: When snapshot creation is initiated, a vdsm task for volume creation is created, if engine takes too long to process the task it will fail with a transaction timeout, see additional info for a detailed Version-Release number of selected component (if applicable): How reproducible: 100% Steps to Reproduce: 1. Start creating a snapshot 2. Wait for vdsm to reply on the CreateVolumeVDSCommand and make the engine process it longer than the transaction timeout (300 by default), this can be achieved by strategically placing a sleep command in vdsm (possibly volume.py#create) Actual results: The operation will fail Expected results: Engine should patiently wait for vdsm to finish the task Additional info: We start the snapshot here: 2019-03-27 10:55:30,347+02 INFO [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (default task-3) [a6cebda9-38ee-46c3-b69c-01084f42be6f] Lock Acquired to object 'EngineLock:{exclusiveLocks='[9f5e07f2-1664-480c-bc8b-1f28fab56b7a=VM]', sharedLocks=''}' 2019-03-27 10:55:30,655+02 INFO [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (EE-ManagedThreadFactory-engine-Thread-20) [a6cebda9-38ee-46c3-b69c-01084f42be6f] Running command: CreateSnapshotForVmCommand internal: false. Entities affected : ID: 9f5e07f2-1664-480c-bc8b-1f28fab56b7a Type: VMAction group MANIPULATE_VM_SNAPSHOTS with role type USER 2019-03-27 10:55:30,980+02 INFO [org.ovirt.engine.core.bll.snapshots.CreateSnapshotCommand] (EE-ManagedThreadFactory-engine-Thread-20) [a6cebda9-38ee-46c3-b69c-01084f42be6f] Running command: CreateSnapshotCommand internal: true. Entities affected : ID: 00000000-0000-0000-0000-000000000000 Type: Storage 2019-03-27 10:55:31,041+02 INFO We send a create volume command to vdsm: [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (EE-ManagedThreadFactory-engine-Thread-20) [a6cebda9-38ee-46c3-b69c-01084f42be6f] START, CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='d191b902-4f0b-11e9-93ab-507b9dec63c2', ignoreFailoverLimit='false', storageDomainId='d75e9038-54c1-4c24-b79a-776ef336ca36', imageGroupId='62ed902a-056f-4b84-a2eb-c565a522ee96', imageSizeInBytes='1073741824', volumeFormat='COW', newImageId='579053f7-019f-4dbd-be2e-49340c35f7b2', imageType='Sparse', newImageDescription='', imageInitialSizeInBytes='0', imageId='873fc779-fd76-4565-9baf-660105197474', sourceImageGroupId='62ed902a-056f-4b84-a2eb-c565a522ee96'}), log id: 30362717 Vdsm replies within a timely manner: 2019-03-27 10:55:31,392+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (EE-ManagedThreadFactory-engine-Thread-20) [a6cebda9-38ee-46c3-b69c-01084f42be6f] FINISH, CreateVolumeVDSCommand, return: 579053f7-019f-4dbd-be2e-49340c35f7b2, log id: 30362717 The transaction is rolled back because it timed out (30 seconds on my env) 2019-03-27 10:56:00,661+02 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (Transaction Reaper Worker 0) [] Transaction rolled-back for command 'org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand'. Only after the transaction has timed out engine begins to process vdsm's reply 2019-03-27 10:56:14,784+02 INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (EE-ManagedThreadFactory-engine-Thread-20) [a6cebda9-38ee-46c3-b69c-01084f42be6f] Adding task '36cae13f-7a18-46da-9999-278001b44868' (Parent Command 'CreateSnapshot', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters'), polling hasn't started yet.. 2019-03-27 10:56:14,872+02 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-20) [a6cebda9-38ee-46c3-b69c-01084f42be6f] BaseAsyncTask::startPollingTask: Starting to poll task '36cae13f-7a18-46da-9999-278001b44868'. // We fail because we no longer have a transaction available 2019-03-27 10:56:14,896+02 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (EE-ManagedThreadFactory-engine-Thread-20) [a6cebda9-38ee-46c3-b69c-01084f42be6f] Command 'org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand' failed: Failed to obtain JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@6f537e03[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@60a21e07 connection handles=0 lastReturned=1553676974893 lastValidated=1553676811726 lastCheckedOut=1553676973899 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@32c57360 mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool@3d136373[pool=ENGINEDataSource] xaResource=LocalXAResourceImpl@7ec10f33[connectionListener=6f537e03 connectionManager=404b5954 warned=false currentXid=null productName=PostgreSQL productVersion=10.7 jndiName=java:/ENGINEDataSource] txSync=null] 2019-03-27 10:56:14,934+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-20) [a6cebda9-38ee-46c3-b69c-01084f42be6f] EVENT_ID: USER_FAILED_CREATE_SNAPSHOT(117), Failed to create Snapshot s2 for VM vm1 (User: admin@internal-authz).
> Steps to Reproduce: > 1. Start creating a snapshot > 2. Wait for vdsm to reply on the CreateVolumeVDSCommand and make the engine > process it longer than the transaction timeout (300 by default), this can be > achieved by strategically placing a sleep command in vdsm (possibly > volume.py#create) In step 2, I added "from time import sleep;sleep(400)" in "create" method in /usr/lib/python2.7/site-packages/vdsm/storage/volume.py. That's what you meant?
(In reply to Evelina Shames from comment #1) > > Steps to Reproduce: > > 1. Start creating a snapshot > > 2. Wait for vdsm to reply on the CreateVolumeVDSCommand and make the engine > > process it longer than the transaction timeout (300 by default), this can be > > achieved by strategically placing a sleep command in vdsm (possibly > > volume.py#create) > > > In step 2, I added "from time import sleep;sleep(400)" in "create" method in > /usr/lib/python2.7/site-packages/vdsm/storage/volume.py. > > That's what you meant? Verified by this comment. vdsm-4.30.12-1.el7ev.x86_64 ovirt-engine-4.3.3.2-0.1.el7.noarch Tell me if you have any comments.
If the snapshot was successful it should be good
This bugzilla is included in oVirt 4.3.3 release, published on April 16th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
This bug seems to affect version 4.3.8. I have received in my RHV the following message during backup. We have no error failure or disconnection events, with our environment connected at 10GB. The following are the logs: Failed to complete snapshot 'vbk_590677bf-c292-45d5-8587-fb652651a3e5' creation for VM 'CLIGXXX'. VDSM hostXXX.domain command SnapshotVDS failed: Message timeout which can be caused by communication issues I did not identify in the usr/lib/python2.7 file/site-packages/vdsm/storage/volume.py where I should add the timeout option. Could you help me? # Packages: vdsm-4.30.40-1.el7ev.x86_64 ovirt-ansible-engine-setup-1.1.9-1.el7ev.noarch ovirt-hosted-engine-ha-2.3.6-1.el7ev.noarch python2-ovirt-host-deploy-1.8.4-1.el7ev.noarch ovirt-vmconsole-1.0.7-3.el7ev.noarch ovirt-imageio-common-1.5.3-0.el7ev.x86_64 cockpit-ovirt-dashboard-0.13.9-1.el7ev.noarch python2-ovirt-setup-lib-1.2.0-1.el7ev.noarch python-ovirt-engine-sdk4-4.3.2-1.el7ev.x86_64 ovirt-vmconsole-host-1.0.7-3.el7ev.noarch cockpit-machines-ovirt-195-1.el7.noarch python2-ovirt-node-ng-nodectl-4.3.7-0.20191031.0.el7ev.noarch ovirt-provider-ovn-driver-1.2.29-1.el7ev.noarch ovirt-hosted-engine-setup-2.3.12-1.el7ev.noarch ovirt-node-ng-nodectl-4.3.7-0.20191031.0.el7ev.noarch ovirt-ansible-repositories-1.1.5-1.el7ev.noarch ovirt-imageio-daemon-1.5.3-0.el7ev.noarch ovirt-host-4.3.5-1.el7ev.x86_64 ovirt-ansible-hosted-engine-setup-1.0.32-1.el7ev.noarch ovirt-host-dependencies-4.3.5-1.el7ev.x86_64 ovirt-host-deploy-common-1.8.4-1.el7ev.noarch