Created attachment 1131842 [details] Engine Log from around the time of snapshot deletion Description of problem: Deleting a snapshot of a virtual machine with persistent memory has caused the snapshot to remain locked with the host putting the following into the log files: 2016-02-25 10:32:53,272 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-25) [4559dd29] Waiting on Live Merge child commands to complete 2016-02-25 10:32:53,332 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-25) [225b841f] Waiting on Live Merge command step MERGE to complete 2016-02-25 10:32:53,335 INFO [org.ovirt.engine.core.bll.MergeCommandCallback] (DefaultQuartzScheduler_Worker-25) [4559dd29] Waiting on merge command to complete Version-Release number of selected component (if applicable): ovirt-hosted: CentOS 6.6 oVirt 3.5.1.1 ovirt-node: CentOS 7.1.1503 vdsm-4.16.20-0.el7.centos.x86_64 How reproducible: Deleting other snapshots have succeeded successfully. Steps to Reproduce: 1. Delete snapshot of VM. Actual results: Deleting the snapshot fails to complete with the log file filling up. Expected results: Snapshot deletes after a short period. Additional info: Snapshot deleted at 2016-02-24 23:51:17,094
I believe this is fixed already, but pending storage for triage/confirmation
Target release should be placed once a package build is known to fix a issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for a oVirt release.
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.
This should be fixed now. In addition, there is a manual recovery script. See bug 1308501.
(In reply to Ala Hino from comment #4) > This should be fixed now. > > In addition, there is a manual recovery script. See bug 1308501. I've had a look through the engine log, but can't find a line that indicates that the DESTROY_IMAGE_CHECK step failed?
This is also the output from the "VM disk info gathering tool" script. digueti is the VM which has the locked snapshot. [root@ovirt-node-00 tmp]# ./check-disk-info.py VM digueti Disk c8de9fbe-ba6b-4e56-903d-b14c55b274f4 (sd:8d6835a5-7b8f-4cc7-a8bb-3a07926d522d) Volumes: 91171c6a-34cb-42fb-ab86-0d92442aa200 46f09997-a106-48be-b6f3-d20b08b870d2
This engine version doesn't include the DESTROY_IMAGE-CHECK step (patch added in 3.6.2). In the log I see that the merge successfully completes: 2016-02-22 10:06:34,056 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-26) [550c9293] Executing Live Merge command step COMPLETE ... 2016-02-22 10:06:44,598 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-30) [550c9293] Successfully merged snapshot 0e949eb7-eb84-4736-90af-d03742bf2836 images 9cf105f3-74ea-4faf-8e2a-0f712df8d660..747afa04-1ee5-4783-888a-7a417c51d595 This looks the cases we run into were the merge successfully completes at vdsm side but not at the engine side. Specifically, this is related to the fact that we used GetVolumeInfo rather than the new SPMGetVolumeInfo. With latest changes we have in this area, this behavior supposed to be fixed.
(In reply to Ala Hino from comment #7) > This engine version doesn't include the DESTROY_IMAGE-CHECK step (patch > added in 3.6.2). > > In the log I see that the merge successfully completes: > > 2016-02-22 10:06:34,056 INFO > [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] > (DefaultQuartzScheduler_Worker-26) [550c9293] Executing Live Merge command > step COMPLETE > The snapshot deletion wasn't started until 2016-02-24 23:51:18,221 (line 75626). The log file is now full of these errors: 2016-02-25 03:18:43,415 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-66) [4559dd29] VM job c1064551-022f-4549-a9d4-bb915e177e80: In progress, updating 2016-02-25 03:18:45,100 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-40) [4559dd29] Waiting on Live Merge child commands to complete 2016-02-25 03:18:45,209 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-40) [225b841f] Waiting on Live Merge command step MERGE to complete 2016-02-25 03:18:45,212 INFO [org.ovirt.engine.core.bll.MergeCommandCallback] (DefaultQuartzScheduler_Worker-40) [4559dd29] Waiting on merge command to complete [...] 2016-03-02 03:45:52,581 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-44) [4559dd29] VM job c1064551-022f-4549-a9d4-bb915e177e80: In progress (no change) 2016-03-02 03:45:53,539 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-66) [4559dd29] Waiting on Live Merge child commands to complete 2016-03-02 03:45:53,555 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-66) [225b841f] Waiting on Live Merge command step MERGE to complete 2016-03-02 03:45:53,556 INFO [org.ovirt.engine.core.bll.MergeCommandCallback] (DefaultQuartzScheduler_Worker-66) [4559dd29] Waiting on merge command to complete
At some time those "Waiting on merge command to complete" should stop and you should see the final status of the operation. Are still using same build (3.5.z) or upgrade to 3.6.z?
(In reply to Ala Hino from comment #9) > At some time those "Waiting on merge command to complete" should stop and > you should see the final status of the operation. They are still going... (over 6 days later :-( ) 2016-03-02 15:55:32,727 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-42) [4559dd29] VM job c1064551-022f-4549-a9d4-bb915e177e80: In progress, updating 2016-03-02 15:55:35,857 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-89) [4559dd29] VM job c1064551-022f-4549-a9d4-bb915e177e80: In progress (no change) 2016-03-02 15:55:39,136 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-1) [4559dd29] VM job c1064551-022f-4549-a9d4-bb915e177e80: In progress (no change) 2016-03-02 15:55:41,245 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-73) [4559dd29] Waiting on Live Merge child commands to complete 2016-03-02 15:55:41,259 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-73) [225b841f] Waiting on Live Merge command step MERGE to complete 2016-03-02 15:55:41,263 INFO [org.ovirt.engine.core.bll.MergeCommandCallback] (DefaultQuartzScheduler_Worker-73) [4559dd29] Waiting on merge command to complete 2016-03-02 15:55:42,232 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-80) [4559dd29] VM job c1064551-022f-4549-a9d4-bb915e177e80: In progress (no change) > Are still using same build (3.5.z) or upgrade to 3.6.z? Yes: oVirt Engine Version: 3.5.1-1.el6
Verified using builds: rhevm-3.6.3.3-0.1.el6.noarch vdsm-4.17.23-0.el7ev.noarch Scenario: 1. Create VM (install OS and guest-agent). 2. Create snapshots (save memory). 3. Delete snapshots (one by one). All 9 snapshots were deleted within 2-4 minutes. From engine.log: 2016-03-02 20:10:42,016 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-95) [aa1466d] Waiting on Live Merge child commands to complete 2016-03-02 20:10:52,028 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-58) [4971bc63] Waiting on Live Merge command step 'MERGE' to complete 2016-03-02 20:11:18,138 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-55) [4971bc63] Waiting on Live Merge command step 'DESTROY_IMAGE' to complete 2016-03-02 20:12:30,216 INFO [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-62) [4971bc63] Executing Live Merge command step 'DESTROY_IMAGE_CHECK'
(In reply to sha from comment #10) > (In reply to Ala Hino from comment #9) > > At some time those "Waiting on merge command to complete" should stop and > > you should see the final status of the operation. > > They are still going... (over 6 days later :-( ) > > 2016-03-02 15:55:32,727 INFO > [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] > (DefaultQuartzScheduler_Worker-42) [4559dd29] VM job > c1064551-022f-4549-a9d4-bb915e177e80: In progress, updating > 2016-03-02 15:55:35,857 INFO > [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] > (DefaultQuartzScheduler_Worker-89) [4559dd29] VM job > c1064551-022f-4549-a9d4-bb915e177e80: In progress (no change) > 2016-03-02 15:55:39,136 INFO > [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] > (DefaultQuartzScheduler_Worker-1) [4559dd29] VM job > c1064551-022f-4549-a9d4-bb915e177e80: In progress (no change) > 2016-03-02 15:55:41,245 INFO > [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] > (DefaultQuartzScheduler_Worker-73) [4559dd29] Waiting on Live Merge child > commands to complete > 2016-03-02 15:55:41,259 INFO > [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] > (DefaultQuartzScheduler_Worker-73) [225b841f] Waiting on Live Merge command > step MERGE to complete > 2016-03-02 15:55:41,263 INFO > [org.ovirt.engine.core.bll.MergeCommandCallback] > (DefaultQuartzScheduler_Worker-73) [4559dd29] Waiting on merge command to > complete > 2016-03-02 15:55:42,232 INFO > [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] > (DefaultQuartzScheduler_Worker-80) [4559dd29] VM job > c1064551-022f-4549-a9d4-bb915e177e80: In progress (no change) > > > Are still using same build (3.5.z) or upgrade to 3.6.z? > Yes: > oVirt Engine Version: 3.5.1-1.el6 6 days is way too much. Is it an environment that you can upgrade?
(In reply to Ala Hino from comment #12) > (In reply to sha from comment #10) > > (In reply to Ala Hino from comment #9) > > > At some time those "Waiting on merge command to complete" should stop and > > > you should see the final status of the operation. > > > > They are still going... (over 6 days later :-( ) > > > > 2016-03-02 15:55:32,727 INFO > > [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] > > (DefaultQuartzScheduler_Worker-42) [4559dd29] VM job > > c1064551-022f-4549-a9d4-bb915e177e80: In progress, updating > > 2016-03-02 15:55:35,857 INFO > > [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] > > (DefaultQuartzScheduler_Worker-89) [4559dd29] VM job > > c1064551-022f-4549-a9d4-bb915e177e80: In progress (no change) > > 2016-03-02 15:55:39,136 INFO > > [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] > > (DefaultQuartzScheduler_Worker-1) [4559dd29] VM job > > c1064551-022f-4549-a9d4-bb915e177e80: In progress (no change) > > 2016-03-02 15:55:41,245 INFO > > [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] > > (DefaultQuartzScheduler_Worker-73) [4559dd29] Waiting on Live Merge child > > commands to complete > > 2016-03-02 15:55:41,259 INFO > > [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] > > (DefaultQuartzScheduler_Worker-73) [225b841f] Waiting on Live Merge command > > step MERGE to complete > > 2016-03-02 15:55:41,263 INFO > > [org.ovirt.engine.core.bll.MergeCommandCallback] > > (DefaultQuartzScheduler_Worker-73) [4559dd29] Waiting on merge command to > > complete > > 2016-03-02 15:55:42,232 INFO > > [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] > > (DefaultQuartzScheduler_Worker-80) [4559dd29] VM job > > c1064551-022f-4549-a9d4-bb915e177e80: In progress (no change) > > > > > Are still using same build (3.5.z) or upgrade to 3.6.z? > > Yes: > > oVirt Engine Version: 3.5.1-1.el6 > > 6 days is way too much. > Is it an environment that you can upgrade? As long as this issue won't prevent the oVirt hosted engine from being upgraded, I can certainly try it. I shall back everything up and give it a whirl.
(In reply to Ala Hino from comment #12) > (In reply to sha from comment #10) > > (In reply to Ala Hino from comment #9) > > > At some time those "Waiting on merge command to complete" should stop and > > > you should see the final status of the operation. > > > > They are still going... (over 6 days later :-( ) > > > > 2016-03-02 15:55:32,727 INFO > > [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] > > (DefaultQuartzScheduler_Worker-42) [4559dd29] VM job > > c1064551-022f-4549-a9d4-bb915e177e80: In progress, updating > > 2016-03-02 15:55:35,857 INFO > > [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] > > (DefaultQuartzScheduler_Worker-89) [4559dd29] VM job > > c1064551-022f-4549-a9d4-bb915e177e80: In progress (no change) > > 2016-03-02 15:55:39,136 INFO > > [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] > > (DefaultQuartzScheduler_Worker-1) [4559dd29] VM job > > c1064551-022f-4549-a9d4-bb915e177e80: In progress (no change) > > 2016-03-02 15:55:41,245 INFO > > [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] > > (DefaultQuartzScheduler_Worker-73) [4559dd29] Waiting on Live Merge child > > commands to complete > > 2016-03-02 15:55:41,259 INFO > > [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] > > (DefaultQuartzScheduler_Worker-73) [225b841f] Waiting on Live Merge command > > step MERGE to complete > > 2016-03-02 15:55:41,263 INFO > > [org.ovirt.engine.core.bll.MergeCommandCallback] > > (DefaultQuartzScheduler_Worker-73) [4559dd29] Waiting on merge command to > > complete > > 2016-03-02 15:55:42,232 INFO > > [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] > > (DefaultQuartzScheduler_Worker-80) [4559dd29] VM job > > c1064551-022f-4549-a9d4-bb915e177e80: In progress (no change) > > > > > Are still using same build (3.5.z) or upgrade to 3.6.z? > > Yes: > > oVirt Engine Version: 3.5.1-1.el6 > > 6 days is way too much. > Is it an environment that you can upgrade? I am unable to upgrade the ovirt-hosted-engine due to it failing to clear a zombie task: [ ERROR ] Failed to execute stage 'Setup validation': Failed to clear zombie tasks. Please access support in attempt to resolve the problem
Created attachment 1133710 [details] Failed upgrade of ovirt hosted engine
Eli, Sandro, Can you cast an eye and advice? I see following errors in the engine log: 2016-03-07 10:48:52 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.upgrade.asynctasks plugin.executeRaw:878 execute-result: ('/usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh', '-l', '/var/log/ovirt-engine/setup/ovirt-engine-setup-20160307104727-tfmnss.log', '-u', 'engine', '-s', 'localhost', '-p', '5432', '-d', 'engine', '-q', '-R', '-z', '-A'), rc=1 2016-03-07 10:48:52 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.upgrade.asynctasks plugin.execute:936 execute-output: ('/usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh', '-l', '/var/log/ovirt-engine/setup/ovirt-engine-setup-20160307104727-tfmnss.log', '-u', 'engine', '-s', 'localhost', '-p', '5432', '-d', 'engine', '-q', '-R', '-z', '-A') stdout: t 2016-03-07 10:48:52 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.upgrade.asynctasks plugin.execute:941 execute-output: ('/usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh', '-l', '/var/log/ovirt-engine/setup/ovirt-engine-setup-20160307104727-tfmnss.log', '-u', 'engine', '-s', 'localhost', '-p', '5432', '-d', 'engine', '-q', '-R', '-z', '-A') stderr: ERROR: function fn_db_is_table_exists(unknown) does not exist LINE 1: SELECT (fn_db_is_table_exists ('command_entities')) ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. QUERY: SELECT (fn_db_is_table_exists ('command_entities')) CONTEXT: PL/pgSQL function "deleteallcommandentities" line 2 at IF SQL statement "SELECT DeleteAllCommandEntities()" PL/pgSQL function "deleteallentitysnapshot" line 3 at PERFORM FATAL: Cannot execute sql command: --command=SELECT DeleteAllJobs(); SELECT DeleteAllEntitySnapshot();SELECT DeleteAsyncTasksZombies(); 2016-03-07 10:48:52 DEBUG otopi.context context._executeMethod:156 method exception Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/otopi/context.py", line 146, in _executeMethod method['method']() File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/upgrade/asynctasks.py", line 464, in _validateZombies self._clearZombies() File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/upgrade/asynctasks.py", line 162, in _clearZombies 'Failed to clear zombie tasks. ' RuntimeError: Failed to clear zombie tasks. Please access support in attempt to resolve the problem
Seems that you had an unsuccessful upgrade that dropped your functions. Please try to run .../dbscripts/schema.sh -u engine -d <db> -c refresh and try again
I'm assuming I am supposed to be doing this from the hosted engine... [root@ovirt-hosted-00 dbscripts]# /usr/share/ovirt-engine/dbscripts/schema.sh -u engine -d engine -c refresh Saving custom users permissions on database objects... pg_dump: [archiver (db)] connection to database "engine" failed: fe_sendauth: no password supplied FATAL: Cannot execute pg_dump command: --schema-only psql: fe_sendauth: no password supplied FATAL: Cannot execute sql command: --file=/usr/share/ovirt-engine/dbscripts/common_sp.sql I also tried as the postgres user: [postgres@ovirt-hosted-00 dbscripts]$ ./schema.sh -u engine -d engine -c refresh Saving custom users permissions on database objects... pg_dump: [archiver (db)] connection to database "engine" failed: fe_sendauth: no password supplied FATAL: Cannot execute pg_dump command: --schema-only psql: fe_sendauth: no password supplied FATAL: Cannot execute sql command: --file=./common_sp.sql [postgres@ovirt-hosted-00 dbscripts]$ psql -l List of databases Name | Owner | Encoding | Collation | Ctype | Access privileges -----------+----------+----------+-------------+-------------+----------------------- engine | engine | UTF8 | en_US.UTF-8 | en_US.UTF-8 | postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres : postgres=CTc/postgres template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres : postgres=CTc/postgres (4 rows)
(In reply to Ala Hino from comment #16) > Eli, Sandro, > > Can you cast an eye and advice? > > I see following errors in the engine log: > > 2016-03-07 10:48:52 DEBUG > otopi.plugins.ovirt_engine_setup.ovirt_engine.upgrade.asynctasks > plugin.executeRaw:878 execute-result: > ('/usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh', '-l', > '/var/log/ovirt-engine/setup/ovirt-engine-setup-20160307104727-tfmnss.log', > '-u', 'engine', '-s', 'localhost', '-p', '5432', '-d', 'engine', '-q', '-R', > '-z', '-A'), rc=1 > 2016-03-07 10:48:52 DEBUG > otopi.plugins.ovirt_engine_setup.ovirt_engine.upgrade.asynctasks > plugin.execute:936 execute-output: > ('/usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh', '-l', > '/var/log/ovirt-engine/setup/ovirt-engine-setup-20160307104727-tfmnss.log', > '-u', 'engine', '-s', 'localhost', '-p', '5432', '-d', 'engine', '-q', '-R', > '-z', '-A') stdout: > t > > 2016-03-07 10:48:52 DEBUG > otopi.plugins.ovirt_engine_setup.ovirt_engine.upgrade.asynctasks > plugin.execute:941 execute-output: > ('/usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh', '-l', > '/var/log/ovirt-engine/setup/ovirt-engine-setup-20160307104727-tfmnss.log', > '-u', 'engine', '-s', 'localhost', '-p', '5432', '-d', 'engine', '-q', '-R', > '-z', '-A') stderr: > ERROR: function fn_db_is_table_exists(unknown) does not exist > LINE 1: SELECT (fn_db_is_table_exists ('command_entities')) This function was added in 3.5.2. Opened bug 1315744 for this.