Bug 877749 - Async tasks should be sampled continuously, not waited in a single long loop
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-setup
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assigned To: Sandro Bonazzola
QA Contact: David Botzer
Whiteboard: integration
Keywords:
Depends On: 921578
Blocks: 917401
Reported: 2012-11-18 09:10 EST by Alex Lourie
Modified: 2014-01-13 19:04 EST
Fixed In Version: sf-9
Doc Type: Bug Fix
Type: Bug


Attachments
first-upgrade-fail (225.00 KB, application/octet-stream)
2013-03-12 09:47 EDT, David Botzer
Second-upgrade-fail-db-connections (224.95 KB, application/octet-stream)
2013-03-12 09:47 EDT, David Botzer
third-upgrade-fail (249.59 KB, application/octet-stream)
2013-03-12 09:48 EDT, David Botzer
No-Reports-Upgrade-Lock-Template (224.24 KB, application/octet-stream)
2013-03-14 07:18 EDT, David Botzer
rhevm-logs-NEW_1-5-2013 (72.50 KB, application/x-gzip)
2013-05-01 10:08 EDT, David Botzer
VDSM_Logs (748.30 KB, application/x-gzip)
2013-05-01 10:09 EDT, David Botzer


External Trackers
Tracker       ID     Priority  Status  Summary  Last Updated
oVirt gerrit  12177  None      None    None     Never

Description Alex Lourie 2012-11-18 09:10:45 EST
Currently, during engine-upgrade there's a loop that always waits the full 3 minutes for System tasks to finish. Tasks could finish well before the 3 minutes are up, leaving the customer waiting for no reason.

The utility should sample the current situation and continue the upgrade as soon as tasks are cleared.
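
The requested behaviour can be sketched roughly as follows. This is only an illustration of "sample and continue as soon as tasks are cleared", not the code from the merged patch. The function name `wait_for_tasks`, the `get_running_tasks` callback and the 5-second sampling interval are assumptions made for this example; the 180-second limit mirrors the 3 minutes mentioned above.

```python
import time
from typing import Callable, Sequence


def wait_for_tasks(
    get_running_tasks: Callable[[], Sequence[str]],  # hypothetical callback
    timeout: float = 180.0,   # the 3-minute limit from this report
    interval: float = 5.0,    # assumed sampling period
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Sample the running tasks until none remain or the timeout expires.

    Returns True as soon as a sample shows no running tasks, and False if
    tasks are still present once the deadline has been reached.
    """
    deadline = clock() + timeout
    while True:
        if not get_running_tasks():
            return True  # tasks cleared: continue immediately
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # still busy after the full timeout
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

With this shape the worst case is still a 3-minute wait, but when the tasks finish early the caller is delayed by at most one sampling interval.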
Comment 1 Sandro Bonazzola 2013-02-27 03:56:16 EST
patch 12177 merged upstream master: http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=commit;h=88142da3d12d2af79434582e47e10458716636fe
Comment 2 David Botzer 2013-03-10 03:59:27 EDT
How should I test this BZ?
Please advise what steps I should take to verify it.

Should I test the upgrade to SF9, or from SF9?
Comment 3 David Botzer 2013-03-10 07:10:45 EDT
Should I upgrade to SF9 or from SF9?

Alex:
You need to start an upgrade with async tasks in the DB. After the upgrade has started, you need to clear the async tasks in less than 3 minutes; you should see the upgrade continue automatically right away.
Comment 6 David Botzer 2013-03-12 09:41:09 EDT
I started the upgrade while an async task was running.
In less than 30 seconds the upgrade failed and only the postgres service was up.

Is this correct?
-----------------

rhevm-upgrade 

Checking for updates... (This may take several minutes)...[ DONE ]
10 Updates available:
 * rhevm-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-backend-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-config-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-dbscripts-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-genericapi-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-notification-service-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-restapi-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-tools-common-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-userportal-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-webadmin-portal-3.2.0-10.14.beta1.el6ev.noarch
Stopping ovirt-engine service...                     [ DONE ]
Stopping DB related services...                      [ DONE ]
Cleaning async tasks...                              [ DONE ]
Pre-upgrade validations...                           [ DONE ]
Backing Up Database...                               [ DONE ]
Rename Database...                                   [ ERROR ]
Error: Database rename failed. Check that there are no active connections to the DB and try again.
Error: Upgrade failed.
please check log at /var/log/ovirt-engine/ovirt-engine-upgrade_2013_03_12_15_38_28.log
[root@dbotzer-upgrade yum.repos.d]# Srvs 
postmaster (pid  16090) is running...
The engine is not running.
/etc/init.d/ovirt-engine-dwhd is stopped
Comment 7 David Botzer 2013-03-12 09:46:44 EDT
The second time I ran the upgrade, it failed saying that connections to the DB exist.

Only after I manually restarted the postgresql & ovirt-engine-dwhd services
could I run the upgrade.
======================================================

rhevm-upgrade 

Checking for updates... (This may take several minutes)...[ DONE ]
10 Updates available:
 * rhevm-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-backend-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-config-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-dbscripts-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-genericapi-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-notification-service-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-restapi-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-tools-common-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-userportal-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-webadmin-portal-3.2.0-10.14.beta1.el6ev.noarch
Stopping ovirt-engine service...                     [ DONE ]
Stopping DB related services...                      [ DONE ]
Cleaning async tasks...                              [ DONE ]
Pre-upgrade validations...                           [ DONE ]
Backing Up Database...                               [ DONE ]
Rename Database...                                   [ ERROR ]
Error: Database rename failed. Check that there are no active connections to the DB and try again.
Error: Upgrade failed.
please check log at /var/log/ovirt-engine/ovirt-engine-upgrade_2013_03_12_15_41_14.log
Comment 8 David Botzer 2013-03-12 09:47:21 EDT
Created attachment 708979
first-upgrade-fail
Comment 9 David Botzer 2013-03-12 09:47:50 EDT
Created attachment 708980
Second-upgrade-fail-db-connections
Comment 10 David Botzer 2013-03-12 09:48:29 EDT
Created attachment 708982
third-upgrade-fail
Comment 11 David Botzer 2013-03-12 09:49:20 EDT
According to the above results,
is this the correct behaviour for the upgrade?
Comment 13 David Botzer 2013-03-12 10:47:11 EDT
Eli:
Can you please advise
why something was connected to the DB while dwhd was stopped and nothing should still be connected to it?
Comment 15 David Botzer 2013-03-13 04:18:37 EDT
When I upgrade while DWH is running, should the upgrade process clear all DB connections
in order for rhevm-upgrade to succeed?

I think it's not quite OK to close Bug 877749: the upgrade identified the async tasks and stopped,
but the error shown was not about async tasks.
It was, as seen in the first notes in this BZ:

Error: Database rename failed. Check that there are no active connections to the DB and try again.
Error: Upgrade failed.
-----------------
So a user cannot understand that it's because async tasks were running.
Comment 16 Eli Mesika 2013-03-13 04:50:30 EDT
(In reply to comment #13)
> Eli:
> Can u please advise, 
> why something was connected to the db while dwhd is stopped and nothing
> should be still connected to it

The fact is that you were using the database with another tool or instance.
The easiest way to deal with that is to

1) restart the postgresql server
or
2) use ps -ef to find who is using the database
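
As a rough illustration of option 2, the sketch below filters `ps -ef` output for PostgreSQL backend processes attached to a given database. It assumes the usual backend process title of the form `postgres: <user> <database> <client> <state>`; the helper name `postgres_backends` is made up for this example, and the database name `engine` is the one used elsewhere in this report.

```python
from typing import List


def postgres_backends(ps_output: str, database: str) -> List[str]:
    """Return the `ps -ef` lines of PostgreSQL backends attached to `database`.

    Assumes backend process titles of the form
        postgres: <user> <database> <client> <state>
    Lines that do not contain such a title are ignored.
    """
    matches = []
    for line in ps_output.splitlines():
        _, sep, title = line.partition("postgres: ")
        if not sep:
            continue  # not a process with a "postgres: ..." title
        fields = title.split()
        # fields[0] is the DB user, fields[1] the database name.
        if len(fields) >= 2 and fields[1] == database:
            matches.append(line)
    return matches
```

Feeding it the captured output of `ps -ef` and the name `engine` would list the sessions still holding the database open, and the client field of each line hints at which tool or host owns the connection.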
Comment 17 David Botzer 2013-03-13 05:16:05 EDT
It seems obvious that DWH is using the DB.
I had this discussion with Alex about whether rhevm-upgrade should do what you suggested,
and integration decided that the user should do it manually before the upgrade.

But what happens when both DWH and async tasks are running?
Comment 18 David Botzer 2013-03-13 06:01:41 EDT
Fixed, 3.2/sf10
rhevm-upgrade stops right away in the case of running async tasks.
!!-- there is room for a proper message explaining why the upgrade was stopped --!!
Because one of the stages shows -->
Cleaning async tasks...                              [ DONE ]
When those tasks are cleared, the upgrade continues.
Fixed, 3.2/sf10

Please answer this last note so I can close this BZ
Comment 19 Sandro Bonazzola 2013-03-13 11:41:03 EDT
If you need to see why it has stopped at the "Cleaning async tasks" step:
while the script is waiting, it writes a debug line to the log: "Still waiting for system tasks to be cleared."
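
A small sketch of how one might check an upgrade log for that debug line. The message text is the one quoted above; the helper name `count_waiting_lines` is invented for this example, and nothing is assumed about the rest of the log line format.

```python
from typing import Iterable

# Debug message quoted in this comment.
WAITING_MESSAGE = "Still waiting for system tasks to be cleared."


def count_waiting_lines(log_lines: Iterable[str]) -> int:
    """Count how many log lines contain the 'still waiting' debug message."""
    return sum(1 for line in log_lines if WAITING_MESSAGE in line)
```

An open file object can be passed directly as `log_lines`, for instance the upgrade log under /var/log/ovirt-engine/ named in the error output. A non-zero count means the script spent time waiting for tasks during that run.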
Comment 20 David Botzer 2013-03-14 07:17:56 EDT
I installed a clean rhevm 3.2/SF9 without Reports and without DWH.

I imported a template from an NFS export domain, and could see it in psql.
I started rhevm-upgrade (SF10) and the async task information was deleted from psql.

See pastebin
http://pastebin.test.redhat.com/132514

To check whether the rhevm-upgrade failure had left "leftovers",
I started the engine service to check the template, but it is locked!

./unlock_entity.sh -t disk -q -s localhost -p 5432 -d engine -u postgres
/usr/share/ovirt-engine/dbscripts /usr/share/ovirt-engine/dbscripts
              entity_id               |               disk_id                
--------------------------------------+--------------------------------------
 3f8d3031-8423-4b4c-8320-63b491328981 | cc0be0f7-649c-4c19-bcb6-5a1fb7e739c3
(1 row)

See Log: ovirt-engine-upgrade_2013_03_14_13_08_25
Comment 21 David Botzer 2013-03-14 07:18:35 EDT
Created attachment 709979
No-Reports-Upgrade-Lock-Template
Comment 22 David Botzer 2013-04-28 05:08:20 EDT
Hi,

Where does this bug stand?
What about the last note?
Comment 24 David Botzer 2013-05-01 10:08:15 EDT
3.2/sf13 -> 3.2/sf14

1.
I upgraded rhevm with an async task running
(creating a template from a VM). After the upgrade the VM is "Image locked" and the template I started to create is "locked" as well.
Is this the correct behaviour?

./unlock_entity.sh -t disk -q -s localhost -p 5432 -d engine -u postgres
/usr/share/ovirt-engine/dbscripts /usr/share/ovirt-engine/dbscripts
              entity_id               |               disk_id                
--------------------------------------+--------------------------------------
 485da855-3faf-4bdf-a35f-0efcb329759a | 830d3180-4785-4ec2-beb6-bae47920b1f3
 d9b089a7-6066-434f-8754-3b0ecabc7d81 | e607fb2d-ad82-436c-8970-0046d2c22de0
(2 rows)

---------------------------
2.
I upgraded from SF13 to SF14 (no Reports, only rhevm).
See below -> Is this the message I should get when upgrading while an async task is running?
Should there be a System Task name, or is that the name itself?
------------------------------------------------------------
Would you like to proceed? (yes|no): y
Stopping ovirt-engine service...                     [ DONE ]
Stopping DB related services...                      [ DONE ]
Cleaning async tasks...                              [ DONE ]

Info: The following tasks have been found running in the system:

System Tasks:




[ May 01 16:52:40 ] Would you like to proceed and try to stop tasks automatically?
(Answering 'no' will stop the upgrade)? (yes|no): y

[ May 01 16:53:13 ] System will try to clear tasks during the next 3 minutes.
-----------------------------------------------------------
Attached logs
Comment 25 David Botzer 2013-05-01 10:08:47 EDT
Created attachment 742219
rhevm-logs-NEW_1-5-2013
Comment 26 David Botzer 2013-05-01 10:09:09 EDT
Created attachment 742220
VDSM_Logs
Comment 27 David Botzer 2013-05-02 06:59:06 EDT
Fixed, 3.2/sf14
Tasks are examined correctly.
The message during upgrade and the stuck template will get new BZs.
Fixed, 3.2/sf14
Comment 28 Itamar Heim 2013-06-11 05:37:11 EDT
3.2 has been released