Bug 921201

Summary: rhevm-upgrade is failing between si26.4 to si27.4 (3.1.3) in async task cleanup
Product: Red Hat Enterprise Virtualization Manager Reporter: Haim <hateya>
Component: ovirt-engine-setup Assignee: Alex Lourie <alourie>
Status: CLOSED ERRATA QA Contact: Dafna Ron <dron>
Severity: high Docs Contact:
Priority: urgent    
Version: 3.1.3 CC: acathrow, alourie, bazulay, chetan, dyasny, emesika, iheim, italkohe, mgoldboi, Rhev-m-bugs, sgrinber, tvvcox, yeylon, ykaul
Target Milestone: --- Keywords: ZStream
Target Release: 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: integration
Fixed In Version: sf16 Doc Type: Bug Fix
Doc Text:
Upgrading the Manager failed when there were leftover compensation entries during the async tasks cleanup. These entries caused the backend to abort when trying to activate the compensation procedure on system startup. The taskcleaner utility now uses the -A flag, which enables rhevm-upgrade to remove all compensation data from the business_entity_snapshot table during upgrade.
Story Points: ---
Clone Of:
: 949694 Environment:
Last Closed: 2013-06-10 21:36:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 921202    
Bug Blocks: 948448, 949694    
Attachments:
Description Flags
ovirt engine upgrade log none

Description Haim 2013-03-13 16:54:09 UTC
Description of problem:

Running rhevm-upgrade from si26.4 to si27.4 fails when there are leftovers in the async tasks table. The engine tries to stop them, waits for 3 minutes, then shows the following prompt again (in a loop):

[ Mar 13 18:36:11 ] Would you like to proceed and try to stop tasks automatically?

Full upgrade log attached.

[root@rhevm-3 yum.repos.d]# rhevm-upgrade
Loaded plugins: product-id, rhnplugin
This system is receiving updates from RHN Classic or RHN Satellite.

Checking for updates... (This may take several minutes)
11 Updates available:
 * rhevm-3.1.0-50.el6ev.noarch
 * rhevm-backend-3.1.0-50.el6ev.noarch
 * rhevm-config-3.1.0-50.el6ev.noarch
 * rhevm-dbscripts-3.1.0-50.el6ev.noarch
 * rhevm-genericapi-3.1.0-50.el6ev.noarch
 * rhevm-notification-service-3.1.0-50.el6ev.noarch
 * rhevm-restapi-3.1.0-50.el6ev.noarch
 * rhevm-tools-common-3.1.0-50.el6ev.noarch
 * rhevm-userportal-3.1.0-50.el6ev.noarch
 * rhevm-webadmin-portal-3.1.0-50.el6ev.noarch
 * vdsm-bootstrap-4.10.2-1.8.el6ev.noarch

During the upgrade process, RHEV Manager  will not be accessible.
All existing running virtual machines will continue but you will not be able to
start or stop any new virtual machines during the process.

Would you like to proceed? (yes|no): yes
Stopping ovirt-engine service...                         [ DONE ]
Stopping DB related services...                          [ DONE ]
Cleaning async tasks...                                  [ DONE ]

Info: The following tasks have been found running in the system: 

System Tasks:

                        command_type                        |                             entity_type                             
------------------------------------------------------------+---------------------------------------------------------------------
 org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand | org.ovirt.engine.core.common.businessentities.Network
 org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand | org.ovirt.engine.core.common.businessentities.Network
 org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand | org.ovirt.engine.core.common.businessentities.storage_pool_iso_map
 org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand | org.ovirt.engine.core.common.businessentities.storage_domain_static
(4 rows)




[ Mar 13 18:36:11 ] Would you like to proceed and try to stop tasks automatically?
(Answering 'no' will stop the upgrade)? (yes|no): yes

[ Mar 13 18:37:06 ] System will try to clear tasks during the next 3 minutes.


Info: The following tasks have been found running in the system: 

System Tasks:

                        command_type                        |                             entity_type                             
------------------------------------------------------------+---------------------------------------------------------------------
 org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand | org.ovirt.engine.core.common.businessentities.Network
 org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand | org.ovirt.engine.core.common.businessentities.Network
 org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand | org.ovirt.engine.core.common.businessentities.storage_pool_iso_map
 org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand | org.ovirt.engine.core.common.businessentities.storage_domain_static
(4 rows)




[ Mar 13 18:40:15 ] Would you like to proceed and try to stop tasks automatically?
(Answering 'no' will stop the upgrade)? (yes|no): no
Starting DB related services...                          [ DONE ]
Starting ovirt-engine...                                 [ DONE ]

There are still running tasks: 

System Tasks:

                        command_type                        |                             entity_type                             
------------------------------------------------------------+---------------------------------------------------------------------
 org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand | org.ovirt.engine.core.common.businessentities.Network
 org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand | org.ovirt.engine.core.common.businessentities.Network
 org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand | org.ovirt.engine.core.common.businessentities.storage_pool_iso_map
 org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand | org.ovirt.engine.core.common.businessentities.storage_domain_static
(4 rows)



Please make sure that there are no running system tasks before you continue. Please contact GSS for assistance. Stopping upgrade.
Error: Upgrade failed.

please check log at /var/log/ovirt-engine/ovirt-engine-upgrade_2013_03_13_18_33_42.log
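
The loop described above can be sketched as follows. This is a minimal illustration, not the actual rhevm-upgrade code: the function names, the polling interval, and the injectable `sleep`/`clock` parameters are assumptions made for the example. It shows why the prompt repeats: if the task query keeps returning the same rows, each 3-minute wait times out and the user is asked again.

```python
import time
from typing import Callable, Sequence


def wait_for_tasks_to_clear(
    fetch_tasks: Callable[[], Sequence[str]],
    timeout_seconds: float = 180.0,  # the "next 3 minutes" from the prompt
    poll_seconds: float = 5.0,       # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once fetch_tasks() reports nothing, False on timeout."""
    deadline = clock() + timeout_seconds
    while True:
        if not fetch_tasks():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)


def upgrade_cleanup(
    fetch_tasks: Callable[[], Sequence[str]],
    ask_user: Callable[[str], bool],
    **wait_kwargs,
) -> bool:
    """Prompt again after every wait that leaves tasks behind."""
    while fetch_tasks():
        question = "Would you like to proceed and try to stop tasks automatically?"
        if not ask_user(question):
            return False  # answering 'no' stops the upgrade
        wait_for_tasks_to_clear(fetch_tasks, **wait_kwargs)
    return True
```

With leftover entries that never go away, `upgrade_cleanup` can only end when the user answers 'no', which matches the transcript above.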

Comment 1 Haim 2013-03-13 16:56:52 UTC
Created attachment 709718 [details]
ovirt engine upgrade log

Comment 4 Eli Mesika 2013-03-14 12:06:54 UTC
taskcleaner has a -C flag to clean compensation data. Is the installer using it?

taskcleaner.sh [-h] [-s server] [-p PORT]] [-d DATABASE] [-u USERNAME] [-l LOGFILE]  [-t taskId] [-c commandId] [-z] [-R] [-C] [-J] [-q] [-v]

        -s SERVERNAME - The database servername for the database  (def. localhost)
        -p PORT       - The database port for the database        (def. 5432)
        -d DATABASE   - The database name                         (def. engine)
        -u USERNAME   - The admin username for the database.
        -l LOGFILE    - The logfile for capturing output          (def. taskcleaner.sh.log)
        -t TASK_ID    - Removes a task by its Task ID.
        -c COMMAND_ID - Removes all tasks related to the given Command Id.
        -z            - Removes/Displays a Zombie task.
        -R            - Removes all Zombie tasks.
        -C            - Clear related compensation entries.
        -J            - Clear related Job Steps.
        -q            - Quite mode, do not prompt for confirmation.
        -v            - Turn on verbosity                         (WARNING: lots of output)
        -h            - This help text.

Comment 5 Alex Lourie 2013-03-14 12:56:38 UTC
(In reply to comment #4)
> taskcleaner has a -C flag to clean compensation data, is the installer using
> it :
> 
Yes. We run the upgrade with the -zRCJq flags.

Comment 6 Eli Mesika 2013-03-17 15:35:41 UTC
Adding the -A flag in the fix for bug 921202.
Please use -zAJq in the installer instead of -zRCJq.
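
A small sketch of how an installer might assemble the two invocations discussed here. Only the flag strings (-zRCJq, -zAJq) and the -d/-u options come from this report. The helper name, the `script` default (no install path is given in this bug), and the `clear_all` switch are hypothetical.

```python
from typing import List, Optional


def build_taskcleaner_cmd(
    script: str = "taskcleaner.sh",  # hypothetical location; use the real path
    database: str = "engine",        # default database name per the usage text
    username: Optional[str] = None,
    clear_all: bool = True,
) -> List[str]:
    """Build the argv list for taskcleaner without running it."""
    # -zAJq is the flag set requested in this comment; -zRCJq is the old one.
    flags = "-zAJq" if clear_all else "-zRCJq"
    cmd = [script, "-d", database]
    if username:
        cmd += ["-u", username]
    cmd.append(flags)
    return cmd
```

The returned list can be passed to `subprocess.run` as-is. Keeping the flag choice in one place makes the switch from -zRCJq to -zAJq a one-line change.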

Comment 10 Dafna Ron 2013-05-20 14:07:46 UTC
Verified on sf17.

Before the upgrade I had entries in the business_entity_snapshot table, and after the upgrade it was cleared.

before: 

engine=# SELECT * from business_entity_snapshot ;
id              | d7e2e524-c154-11e2-8a0c-001a4a169783
command_id      | e1cb8d0d-f6ae-4223-aa43-1337e9bbdd8b
command_type    | org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand
entity_id       | b31db687-35e8-403c-b61a-ac55bc09d7f8
entity_type     | org.ovirt.engine.core.common.businessentities.storage_domain_static
snapshot_class  | org.ovirt.engine.core.common.businessentities.storage_domain_static
snapshot_type   | 0
insertion_order | 2
started_at      | 2013-05-20 16:54:51.776459+03
entity_snapshot | {
                :   "id" : [ "org.ovirt.engine.core.compat.Guid", {
                :     "uuid" : "b31db687-35e8-403c-b61a-ac55bc09d7f8"
                :   } ],
                :   "storage" : "5jBDV4-MbBY-ZGoz-vOgA-Z26H-fogt-2zwh2e",


after:

engine=# SELECT * from business_entity_snapshot ;
 id | command_id | command_type | entity_id | entity_type | entity_snapshot | snapshot_class | snapshot_type | insertion_order | started_at 
----+------------+--------------+-----------+-------------+-----------------+----------------+---------------+-----------------+------------
(0 rows)

engine=#
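
The before/after check above can also be scripted. The sketch below is an assumption-laden illustration. It takes any Python DB-API connection and counts rows in business_entity_snapshot. The demo uses an in-memory sqlite3 database as a stand-in for the PostgreSQL engine database, with a reduced three-column table and a plain DELETE standing in for the cleanup done during upgrade.

```python
import sqlite3


def count_compensation_entries(conn) -> int:
    """Count rows left in business_entity_snapshot on a DB-API connection."""
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM business_entity_snapshot")
    (count,) = cur.fetchone()
    return count


# Stand-in database: sqlite3 instead of the real PostgreSQL engine database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_entity_snapshot ("
    "id TEXT PRIMARY KEY, command_id TEXT, command_type TEXT)"
)
conn.execute(
    "INSERT INTO business_entity_snapshot VALUES (?, ?, ?)",
    (
        "d7e2e524-c154-11e2-8a0c-001a4a169783",
        "e1cb8d0d-f6ae-4223-aa43-1337e9bbdd8b",
        "org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand",
    ),
)
before = count_compensation_entries(conn)

# Stand-in for the cleanup performed during the upgrade.
conn.execute("DELETE FROM business_entity_snapshot")
after = count_compensation_entries(conn)

print(f"before upgrade: {before} row(s), after upgrade: {after} row(s)")
```

Against a real engine database the same function would be called with a PostgreSQL DB-API connection before and after running the upgrade.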

Comment 11 errata-xmlrpc 2013-06-10 21:36:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0888.html