Bug 877749 - Async tasks should be sampled continuously, not waited in a single long loop
Summary: Async tasks should be sampled continuously, not waited in a single long loop
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-setup
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assignee: Sandro Bonazzola
QA Contact: David Botzer
URL:
Whiteboard: integration
Depends On: 921578
Blocks: 917401
 
Reported: 2012-11-18 14:10 UTC by Alex Lourie
Modified: 2014-01-14 00:04 UTC (History)

Fixed In Version: sf-9
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: ---
Target Upstream Version:
Embargoed:


Attachments
first-upgrade-fail (225.00 KB, application/octet-stream)
2013-03-12 13:47 UTC, David Botzer
Second-upgrade-fail-db-connections (224.95 KB, application/octet-stream)
2013-03-12 13:47 UTC, David Botzer
third-upgrade-fail (249.59 KB, application/octet-stream)
2013-03-12 13:48 UTC, David Botzer
No-Reports-Upgrade-Lock-Template (224.24 KB, application/octet-stream)
2013-03-14 11:18 UTC, David Botzer
rhevm-logs-NEW_1-5-2013 (72.50 KB, application/x-gzip)
2013-05-01 14:08 UTC, David Botzer
VDSM_Logs (748.30 KB, application/x-gzip)
2013-05-01 14:09 UTC, David Botzer


Links
System        ID     Private  Priority  Status  Summary  Last Updated
oVirt gerrit  12177  0        None      None    None     Never

Description Alex Lourie 2012-11-18 14:10:45 UTC
Currently, during engine-upgrade, there is a loop that waits for 3 minutes for system tasks to finish. Tasks could finish in 4 minutes, making the customer wait longer than necessary.

The utility should sample the current situation and continue the upgrade as soon as tasks are cleared.
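
The requested behaviour can be sketched as a polling loop with a deadline. This is only an illustration, not the code from the merged patch: `wait_for_tasks`, `has_running_tasks` and the 5-second interval are hypothetical names and values; only the 3-minute limit comes from this report.

```python
import time


def wait_for_tasks(has_running_tasks, timeout=180.0, interval=5.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until no tasks remain or the timeout expires.

    has_running_tasks is a hypothetical callable that returns True
    while async tasks are still present in the DB.
    Returns True if the tasks cleared in time, False on timeout.
    """
    deadline = clock() + timeout
    while has_running_tasks():
        remaining = deadline - clock()
        if remaining <= 0:
            # Tasks are still present and the time budget is used up.
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
    # Tasks are gone: the caller can continue the upgrade right away.
    return True
```

With this shape the upgrade resumes at most one sampling interval after the tasks are cleared, instead of always waiting out the full 3 minutes.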

Comment 1 Sandro Bonazzola 2013-02-27 08:56:16 UTC
Patch 12177 merged to upstream master: http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=commit;h=88142da3d12d2af79434582e47e10458716636fe

Comment 2 David Botzer 2013-03-10 07:59:27 UTC
How should this BZ be tested?
Please advise what steps I should take to verify it.

Should I test an upgrade to SF9, or from SF9?

Comment 3 David Botzer 2013-03-10 11:10:45 UTC
Should I upgrade to SF9 or from SF9?

Alex:
You need to start an upgrade with async tasks in the DB. After the upgrade has started, you need to clear the async tasks in less than 3 minutes; you should see the upgrade continue automatically right away.

Comment 6 David Botzer 2013-03-12 13:41:09 UTC
I started the upgrade while an async task was running.
In less than 30 seconds the upgrade failed, and only the postgres service was up.

Is this correct?
-----------------

rhevm-upgrade 

Checking for updates... (This may take several minutes)...[ DONE ]
10 Updates available:
 * rhevm-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-backend-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-config-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-dbscripts-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-genericapi-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-notification-service-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-restapi-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-tools-common-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-userportal-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-webadmin-portal-3.2.0-10.14.beta1.el6ev.noarch
Stopping ovirt-engine service...                     [ DONE ]
Stopping DB related services...                      [ DONE ]
Cleaning async tasks...                              [ DONE ]
Pre-upgrade validations...                           [ DONE ]
Backing Up Database...                               [ DONE ]
Rename Database...                                   [ ERROR ]
Error: Database rename failed. Check that there are no active connections to the DB and try again.
Error: Upgrade failed.
please check log at /var/log/ovirt-engine/ovirt-engine-upgrade_2013_03_12_15_38_28.log
[root@dbotzer-upgrade yum.repos.d]# Srvs 
postmaster (pid  16090) is running...
The engine is not running.
/etc/init.d/ovirt-engine-dwhd is stopped

Comment 7 David Botzer 2013-03-12 13:46:44 UTC
The second time I ran the upgrade, it failed saying that connections to the DB exist.

Only after I manually restarted the postgresql and ovirt-engine-dwhd services
could I run the upgrade.
======================================================

rhevm-upgrade 

Checking for updates... (This may take several minutes)...[ DONE ]
10 Updates available:
 * rhevm-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-backend-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-config-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-dbscripts-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-genericapi-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-notification-service-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-restapi-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-tools-common-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-userportal-3.2.0-10.14.beta1.el6ev.noarch
 * rhevm-webadmin-portal-3.2.0-10.14.beta1.el6ev.noarch
Stopping ovirt-engine service...                     [ DONE ]
Stopping DB related services...                      [ DONE ]
Cleaning async tasks...                              [ DONE ]
Pre-upgrade validations...                           [ DONE ]
Backing Up Database...                               [ DONE ]
Rename Database...                                   [ ERROR ]
Error: Database rename failed. Check that there are no active connections to the DB and try again.
Error: Upgrade failed.
please check log at /var/log/ovirt-engine/ovirt-engine-upgrade_2013_03_12_15_41_14.log

Comment 8 David Botzer 2013-03-12 13:47:21 UTC
Created attachment 708979 [details]
first-upgrade-fail

Comment 9 David Botzer 2013-03-12 13:47:50 UTC
Created attachment 708980 [details]
Second-upgrade-fail-db-connections

Comment 10 David Botzer 2013-03-12 13:48:29 UTC
Created attachment 708982 [details]
third-upgrade-fail

Comment 11 David Botzer 2013-03-12 13:49:20 UTC
According to the above results,
is this the correct behaviour for the upgrade?

Comment 13 David Botzer 2013-03-12 14:47:11 UTC
Eli:
Can you please advise
why something was connected to the DB while dwhd was stopped and nothing should still have been connected to it?

Comment 15 David Botzer 2013-03-13 08:18:37 UTC
When I upgrade while DWH is running, should the upgrade process clear all DB connections
in order for rhevm-upgrade to succeed?

I don't think it is quite OK to close Bug 877749: the upgrade identified the async tasks and stopped,
but the error shown was not about async tasks.
It was, as seen in the first notes in this BZ:

Error: Database rename failed. Check that there are no active connections to the DB and try again.
Error: Upgrade failed.
-----------------
So a user cannot tell that the failure happened because async tasks were running.

Comment 16 Eli Mesika 2013-03-13 08:50:30 UTC
(In reply to comment #13)
> Eli:
> Can u please advise, 
> why something was connected to the db while dwhd is stopped and nothing
> should be still connected to it

The fact is that you were using the database with another tool or instance.
The easiest way to deal with that is to either

1) restart the postgresql server
or
2) use ps -ef to find who is using the database
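
Option 2 can be sketched as a small filter over `ps -ef` output. It relies on PostgreSQL naming each backend process `postgres: <user> <database> <client> <activity>`; the function name `db_backends` is hypothetical, and the default database name `engine` is taken from the unlock_entity.sh calls later in this report.

```python
def db_backends(ps_output, database="engine"):
    """Return the `ps -ef` lines of PostgreSQL backends using `database`."""
    hits = []
    for line in ps_output.splitlines():
        _, sep, title = line.partition("postgres: ")
        if not sep:
            # Not a PostgreSQL child process (e.g. the postmaster itself).
            continue
        fields = title.split()
        # Backend titles read "<user> <database> <client> <activity>";
        # helper processes such as "writer process" have a different shape.
        if len(fields) >= 3 and fields[1] == database:
            hits.append(line)
    return hits
```

Feeding it the text produced by `ps -ef` (for example via `subprocess.check_output(["ps", "-ef"], text=True)` on Python 3.7 or later) lists the sessions that would block the database rename step.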

Comment 17 David Botzer 2013-03-13 09:16:05 UTC
It seems obvious that DWH is using the DB.
I had this discussion with Alex about whether rhevm-upgrade should do what you suggested,
and integration decided that the user should do it manually before the upgrade.

But what happens when both DWH and async tasks are running?

Comment 18 David Botzer 2013-03-13 10:01:41 UTC
Fixed, 3.2/sf10
rhevm-upgrade stops right away when async tasks are running.
!!-- There is room for a proper message explaining why the upgrade was stopped --!!
because one of the stages shows -->
Cleaning async tasks...                              [ DONE ]
When those tasks are cleared, the upgrade continues.
Fixed, 3.2/sf10

Please answer this last note so I can close this BZ.

Comment 19 Sandro Bonazzola 2013-03-13 15:41:03 UTC
If you need to see why it has stopped at the "Cleaning async tasks" step:
while the script is waiting, it writes a debug line to the log: "Still waiting for system tasks to be cleared."
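
A quick way to check for that line is to count its occurrences in the upgrade log. This is a minimal sketch: `count_wait_lines` is a hypothetical helper, and only the message text is taken from this comment; the rest of the log line format is not assumed.

```python
WAIT_MARKER = "Still waiting for system tasks to be cleared."


def count_wait_lines(lines):
    """Count log lines that contain the 'still waiting' debug message."""
    return sum(1 for line in lines if WAIT_MARKER in line)
```

It accepts any iterable of lines, so an open log file under /var/log/ovirt-engine/ can be passed directly. A non-zero count means the script spent time waiting for system tasks during that run.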

Comment 20 David Botzer 2013-03-14 11:17:56 UTC
I installed a clean rhevm 3.2/SF9 without Reports and without DWH.

I imported a template from an NFS export domain and could see it in psql.
I started rhevm-upgrade (SF10), and the async task information was deleted from psql.

See pastebin:
http://pastebin.test.redhat.com/132514

To check whether the rhevm-upgrade failure had left "leftovers",
I started the engine service to check the template, but it is locked!

./unlock_entity.sh -t disk -q -s localhost -p 5432 -d engine -u postgres
/usr/share/ovirt-engine/dbscripts /usr/share/ovirt-engine/dbscripts
              entity_id               |               disk_id                
--------------------------------------+--------------------------------------
 3f8d3031-8423-4b4c-8320-63b491328981 | cc0be0f7-649c-4c19-bcb6-5a1fb7e739c3
(1 row)

See Log: ovirt-engine-upgrade_2013_03_14_13_08_25

Comment 21 David Botzer 2013-03-14 11:18:35 UTC
Created attachment 709979 [details]
No-Reports-Upgrade-Lock-Template

Comment 22 David Botzer 2013-04-28 09:08:20 UTC
Hi,

Where does this bug stand?
What about the last note?

Comment 24 David Botzer 2013-05-01 14:08:15 UTC
3.2/sf13 -> 3.2/sf14

1.
I upgraded rhevm with an async task running
(creating a template from a VM). After the upgrade, the VM is "Image locked" and the template I started to create is "locked" as well.
Is this correct behaviour?

./unlock_entity.sh -t disk -q -s localhost -p 5432 -d engine -u postgres
/usr/share/ovirt-engine/dbscripts /usr/share/ovirt-engine/dbscripts
              entity_id               |               disk_id                
--------------------------------------+--------------------------------------
 485da855-3faf-4bdf-a35f-0efcb329759a | 830d3180-4785-4ec2-beb6-bae47920b1f3
 d9b089a7-6066-434f-8754-3b0ecabc7d81 | e607fb2d-ad82-436c-8970-0046d2c22de0
(2 rows)

---------------------------
2.
I upgraded from SF13 to SF14 (no Reports, only rhevm).
See below -> Is this the message I should get when upgrading while an async task is running?
Should a System Task name be listed, or is that the name itself?
------------------------------------------------------------
Would you like to proceed? (yes|no): y
Stopping ovirt-engine service...                     [ DONE ]
Stopping DB related services...                      [ DONE ]
Cleaning async tasks...                              [ DONE ]

Info: The following tasks have been found running in the system:

System Tasks:




[ May 01 16:52:40 ] Would you like to proceed and try to stop tasks automatically?
(Answering 'no' will stop the upgrade)? (yes|no): y

[ May 01 16:53:13 ] System will try to clear tasks during the next 3 minutes.
-----------------------------------------------------------
Attached logs

Comment 25 David Botzer 2013-05-01 14:08:47 UTC
Created attachment 742219 [details]
rhevm-logs-NEW_1-5-2013

Comment 26 David Botzer 2013-05-01 14:09:09 UTC
Created attachment 742220 [details]
VDSM_Logs

Comment 27 David Botzer 2013-05-02 10:59:06 UTC
Fixed, 3.2/sf14
Tasks are examined correctly.
New BZs will be opened for the message shown during upgrade and for the stuck template.
Fixed, 3.2/sf14

Comment 28 Itamar Heim 2013-06-11 09:37:11 UTC
3.2 has been released

Comment 29 Itamar Heim 2013-06-11 09:37:15 UTC
3.2 has been released

Comment 30 Itamar Heim 2013-06-11 09:51:39 UTC
3.2 has been released

