Bug 1119639

Summary: Template creation stuck after upgrade
Product: Red Hat Enterprise Virtualization Manager
Reporter: Petr Beňas <pbenas>
Component: ovirt-engine
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED CURRENTRELEASE
QA Contact: Petr Kubica <pkubica>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.4.0
CC: amureini, bazulay, dfediuck, didi, gklein, lsurette, lveyde, michal.skrivanek, oourfali, pkubica, pstehlik, rbalakri, Rhev-m-bugs, rnori, sbonazzo, stirabos, yeylon, ykaul, ylavi
Target Milestone: ovirt-3.6.1
Keywords: ZStream
Target Release: 3.6.0
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: engine-setup has to wait for async tasks to complete before starting the upgrade procedure. To avoid waiting indefinitely, it first cleans zombie tasks and commands. Due to a bug, it always cleaned all pending tasks, so a template creation in progress got stuck.
Consequence: Template creation gets stuck while upgrading.
Fix: Clean only zombie tasks and commands, and wait for the others to complete.
Result: Template creation is no longer stuck after the upgrade.
Story Points: ---
Clone Of:
: 1197616 (view as bug list) Environment:
Last Closed: 2016-03-11 07:32:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
engine log from upgrade to 3.5 (flags: none)
engine logs (flags: none)
engine-logs 3.5.1 >> 3.6.0 (flags: none)

Description Petr Beňas 2014-07-15 08:02:45 UTC
Description of problem:
Originally discovered as https://bugzilla.redhat.com/show_bug.cgi?id=1110662#c3

Version-Release number of selected component (if applicable):
rhevm-setup-3.4.1-0.25.el6ev

How reproducible:


Steps to Reproduce:
1. Start a template creation on 3.3
2. Stop the engine
3. Upgrade to 3.4

Actual results:
Template creation stuck, disk image locked

Expected results:


Additional info:

Comment 1 Oved Ourfali 2014-07-16 06:52:09 UTC
Ravi - shouldn't the setup job give a warning when doing the setup?
Petr - do you have the upgrade log?

Comment 2 Ravi Nori 2014-07-17 18:12:10 UTC
Upgrading while async tasks are running should show a warning and wait for the tasks to complete

Comment 3 Petr Beňas 2014-07-21 14:03:51 UTC
Unfortunately I don't have the upgrade log, since it was tested on a snapshotted machine and I had to undo the preview to test another bug.

Comment 4 Sandro Bonazzola 2014-07-21 14:35:52 UTC
Looks like async table check did not find the running task and didn't ask about what to do.
Can you reproduce the issue?

Not sure if this is an infra or an integration issue without the logs.
Temporarily reassigning to Simone, who assisted with the latest async tasks issues we had.

Comment 5 Petr Beňas 2014-07-21 15:10:16 UTC
I've tried only once so far.

Comment 6 Sandro Bonazzola 2014-09-02 12:48:25 UTC
Can you reproduce this when upgrading to 3.5? The whole check has been rewritten in 3.5.0, so it may be something that hits only 3.4.z.

Comment 7 Sandro Bonazzola 2014-09-02 12:49:09 UTC
Simone, can you try to reproduce on 3.4.z?

Comment 8 Simone Tiraboschi 2014-09-02 16:46:06 UTC
OK, reproduced as described when upgrading from RHEVM 3.3 to RHEVM 3.4.
It doesn't seem to happen, however, when upgrading from oVirt 3.4 to oVirt 3.5 RC.

Comment 9 Sandro Bonazzola 2014-11-20 13:28:15 UTC
Moving to QA as per comment #8.

Comment 10 Petr Beňas 2014-12-05 10:02:57 UTC
Verified with upgrade from 3.4.4 to 3.5.0-0.23.beta.el6ev

Comment 11 Petr Beňas 2014-12-05 11:07:27 UTC
Sorry for reporting as verified, the issue happens with upgrade to 3.5 as well.

Comment 12 Petr Beňas 2014-12-05 11:10:51 UTC
Created attachment 965077 [details]
engine log from upgrade to 3.5

Comment 13 Sandro Bonazzola 2014-12-05 14:15:46 UTC
Dropping TestOnly as per comment #11.

Comment 14 Ravi Nori 2014-12-05 14:55:02 UTC
What was the content of the async_tasks table when the engine was shut down?

Comment 15 Petr Beňas 2014-12-08 12:11:34 UTC
Created attachment 965818 [details]
engine logs

No idea what's in the table during the upgrade process itself, this state was collected after the upgrade. I can provide you with access to the reproducing setup, if it would help you.

Comment 16 Ravi Nori 2014-12-08 15:46:44 UTC
I see that there are tasks in the db
engine=>  select task_id, status, result from async_tasks
               task_id                | status | result 
--------------------------------------+--------+--------
 6e3c4160-d23a-473e-8d4b-1038b4ddc036 |      2 |      0
(1 row)


But when the upgrade process is started, the task is cleaned up as a zombie task. The issue seems to be in the upgrade process.

[ INFO  ] Stage: Setup validation
[WARNING] Less than 16384MB of memory is available
[ INFO  ] Cleaning stale zombie tasks
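
The behaviour this bug asks for, cleaning only genuinely stale tasks and leaving the rest to be waited on, can be sketched as follows. This is only an illustration and not the engine-setup code: the `Task` class, the `started_at` field, the `split_zombies` helper and the 50-hour threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Task:
    task_id: str
    started_at: datetime  # hypothetical field; the real table layout may differ


def split_zombies(tasks, now, zombie_after):
    """Return (zombies, running); only tasks older than zombie_after are zombies."""
    zombies = [t for t in tasks if now - t.started_at > zombie_after]
    running = [t for t in tasks if now - t.started_at <= zombie_after]
    return zombies, running


# Fixed reference time so the example is deterministic.
now = datetime(2014, 12, 8, 12, 0, 0)
tasks = [
    Task("old-task", now - timedelta(hours=60)),
    Task("6e3c4160-d23a-473e-8d4b-1038b4ddc036", now - timedelta(minutes=10)),
]

# The threshold is arbitrary here; it is not the engine's real default.
zombies, running = split_zombies(tasks, now, zombie_after=timedelta(hours=50))
zombie_ids = [t.task_id for t in zombies]
running_ids = [t.task_id for t in running]
print("clean:", zombie_ids)
print("wait for:", running_ids)
```

With this split, a recently started template creation lands in the "wait for" list instead of being removed together with the stale entries.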

Comment 18 Eyal Edri 2015-02-25 08:43:28 UTC
3.5.1 is already full with bugs (over 80), and since none of these bugs were added as urgent for 3.5.1 release in the tracker bug, moving to 3.5.2

Comment 20 Yaniv Lavi 2015-03-02 08:56:59 UTC
This was probably fixed as part of BZ #1196136. It needs to be tested.

Comment 21 Petr Kubica 2015-04-02 10:51:29 UTC
We don't have a downstream build of rhevm 3.6.0 yet.

Comment 22 Petr Kubica 2015-04-20 07:33:30 UTC
Hi Simone,
waiting for the template creation to complete takes forever. The following messages are printed on screen over and over again:

Waiting for the completion of 1 running tasks during the next 20 seconds.
Press Ctrl+C to interrupt.

After I interrupted the waiting and started ovirt-engine service, I was not able to get into webadmin to check the tasks. 

steps:
1. Install ovirt 3.5.1.1-1.el6 from ovirt.org
2. Start "make a template" of a VM
3. stop the ovirt-engine service
4. upgrade to 3.6.0 (3.6.0-0.0.master.20150419172215.gitb6adbca.el6.noarch)

attached logs from engine

Comment 23 Petr Kubica 2015-04-20 07:35:05 UTC
Created attachment 1016247 [details]
engine-logs 3.5.1 >> 3.6.0

Comment 24 Simone Tiraboschi 2015-04-21 12:22:51 UTC
Thanks Peter,
waiting for a long time is expected: that task is basically stuck, but it still cannot be detected as a zombie because the default time after which a task is identified as a zombie is really long.

The second effect, instead, is an issue. While setup is waiting for running tasks to complete, the engine is in maintenance mode, so you cannot access it.
When you press Ctrl-C it stops waiting, so it should recover the engine from maintenance mode.
The issue is that it seems Ctrl-C is enough to kill the setup abruptly without letting it perform recovery actions.
Checking it, thanks.
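
The recovery problem described in this comment can be illustrated with a small sketch: whatever interrupts the wait, the step that takes the engine out of maintenance mode should still run. This is not the actual setup code; `FakeEngine`, `wait_for_tasks` and the injected `sleep` parameter are hypothetical names used only for the example.

```python
import time


class FakeEngine:
    """Hypothetical stand-in for the engine; not the real setup API."""

    def __init__(self):
        self.maintenance = False

    def enter_maintenance(self):
        self.maintenance = True

    def leave_maintenance(self):
        self.maintenance = False


def wait_for_tasks(engine, count_running, poll_seconds=20, sleep=time.sleep):
    """Wait until count_running() returns 0; always leave maintenance mode."""
    engine.enter_maintenance()
    try:
        while count_running() > 0:
            sleep(poll_seconds)
        return True
    except KeyboardInterrupt:
        # The user pressed Ctrl-C: stop waiting, but still run the recovery below.
        return False
    finally:
        engine.leave_maintenance()


def interrupting_sleep(_seconds):
    # Simulates the user pressing Ctrl-C during the first wait interval.
    raise KeyboardInterrupt


engine = FakeEngine()
completed = wait_for_tasks(engine, lambda: 1, sleep=interrupting_sleep)
print(completed, engine.maintenance)
```

The `finally` clause is what guarantees the recovery step; without it, an interrupt during the sleep would leave the engine in maintenance mode.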

Comment 25 Yedidyah Bar David 2015-04-21 12:29:39 UTC
(In reply to Simone Tiraboschi from comment #24)
> Thanks Peter,
> waiting for a long time is expected: that task is basically stuck, but it
> still cannot be detected as a zombie because the default time after which a
> task is identified as a zombie is really long.
> 
> The second effect, instead, is an issue. While setup is waiting for running
> tasks to complete, the engine is in maintenance mode, so you cannot access it.
> When you press Ctrl-C it stops waiting, so it should recover the engine
> from maintenance mode.
> The issue is that it seems Ctrl-C is enough to kill the setup abruptly
> without letting it perform recovery actions.
> Checking it, thanks.

Yes, see bug 1130764.

As I already told Simone in private: until we solve that bug, we should not tell users to press ^C, but think of some other means to stop setup, such as prompting every N seconds, or whatever.
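
One way the "prompt every N seconds" idea could look is sketched below. It is only an illustration under stated assumptions: `wait_with_prompt`, the `ask` callback and the injected `sleep` function are hypothetical and do not correspond to any existing setup API.

```python
import time


def wait_with_prompt(count_running, ask, poll_seconds=20, sleep=time.sleep):
    """Poll until no tasks remain, asking after each interval whether to go on."""
    while True:
        remaining = count_running()
        if remaining == 0:
            return "completed"
        sleep(poll_seconds)
        if not ask(f"{remaining} task(s) still running. Keep waiting? (yes/no) "):
            return "aborted"


def no_sleep(_seconds):
    # Skip real waiting so the example finishes immediately.
    return None


# Case 1: the task finishes after two polling intervals.
counts = iter([1, 1, 0])
first = wait_with_prompt(lambda: next(counts), ask=lambda msg: True, sleep=no_sleep)

# Case 2: the task never finishes and the user declines on the second prompt.
answers = iter([True, False])
second = wait_with_prompt(lambda: 1, ask=lambda msg: next(answers), sleep=no_sleep)

print(first, second)
```

Because the user answers a prompt instead of sending a signal, the caller gets a normal return value and can run its cleanup in the usual code path.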

Comment 26 Petr Kubica 2015-04-21 14:47:21 UTC
OK, so we have two issues. The one I originally mentioned is the infinite wait for completion.

The "make a template" task shouldn't be a zombie. The engine has a working host and, if I understand it correctly, setup should let this task complete and then continue. The problem in my case is the waiting. Before the upgrade, making a template takes 15 minutes at most, but when I started "make a template", stopped the engine service and started the upgrade, setup kept waiting for completion.

After a very long wait (approximately a few hours) I interrupted it and wanted to check the status. The engine service was stopped, so I started it, but webadmin didn't work. This may be the issue with Ctrl-C during setup.

Comment 27 Yaniv Lavi 2015-10-27 12:57:13 UTC
Can you please open a new bug for the Ctrl-C issue while waiting on tasks, and move this one to VERIFIED?

Comment 28 Petr Kubica 2015-11-20 10:59:04 UTC
Yes, now it's working correctly. The engine service starts up so that the task can complete.

Verified in rhevm-3.6.0.3-0.1.el6.noarch