Description of problem:
Originally discovered as https://bugzilla.redhat.com/show_bug.cgi?id=1110662#c3
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Start a template creation on 3.3
2. Stop the engine
3. Upgrade to 3.4
Template creation stuck, disk image locked
Ravi - shouldn't the setup job give a warning when doing the setup?
Petr - do you have the upgrade log?
Upgrading while async tasks are running should show a warning and wait for the tasks to complete
Unfortunately I don't have the upgrade log, since it was tested on a snapshotted machine and I had to undo the preview to test another bug.
It looks like the async tasks table check did not find the running task and didn't ask what to do.
Can you reproduce the issue?
Not sure if this is an infra or an integration issue without the logs.
Temporarily reassigning to Simone, who assisted with the latest async tasks issues we had.
I've tried only once so far.
Can you reproduce this when upgrading to 3.5? The whole check has been rewritten in 3.5.0, so it may be something that hits only 3.4.z.
Simone, can you try to reproduce on 3.4.z?
ok, reproduced as described upgrading from RHEVM 3.3 to RHEVM 3.4.
It doesn't seem to happen, however, when upgrading from oVirt 3.4 to oVirt 3.5 RC.
Moving to QA as per comment #8.
Verified with upgrade from 3.4.4 to 3.5.0-0.23.beta.el6ev
Sorry for reporting as verified, the issue happens with upgrade to 3.5 as well.
Created attachment 965077
engine log from upgrade to 3.5
Dropping TestOnly as per comment #11.
What was the content of the async_tasks table when the engine was shut down?
Created attachment 965818
No idea what's in the table during the upgrade process itself, this state was collected after the upgrade. I can provide you with access to the reproducing setup, if it would help you.
I see that there are tasks in the db
engine=> select task_id, status, result from async_tasks
               task_id                | status | result
 6e3c4160-d23a-473e-8d4b-1038b4ddc036 |      2 |      0
But when the upgrade process is started the task is cleaned up as a zombie task. The issue seems to be in the upgrade process
[ INFO ] Stage: Setup validation
[WARNING] Less than 16384MB of memory is available
[ INFO ] Cleaning stale zombie tasks
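To illustrate the distinction that matters here, below is a minimal sketch of an age-based zombie check. It is not the engine-setup code: the `split_tasks` helper, the `started_at` field and the `ZOMBIE_AFTER` threshold are all assumptions made for the example, and the real default threshold is not stated in this report.

```python
from datetime import datetime, timedelta

# Hypothetical threshold; the real default is not given in this bug.
ZOMBIE_AFTER = timedelta(hours=50)


def split_tasks(tasks, now, zombie_after=ZOMBIE_AFTER):
    """Split (task_id, started_at) pairs into (running, zombies) by age.

    A task counts as a zombie only once it is at least `zombie_after` old;
    anything younger is treated as still running and should be waited for.
    """
    running, zombies = [], []
    for task_id, started_at in tasks:
        if now - started_at >= zombie_after:
            zombies.append(task_id)
        else:
            running.append(task_id)
    return running, zombies


if __name__ == "__main__":
    now = datetime(2014, 12, 1, 12, 0)  # arbitrary timestamp for the demo
    tasks = [
        ("6e3c4160-d23a-473e-8d4b-1038b4ddc036", now - timedelta(minutes=10)),
    ]
    running, zombies = split_tasks(tasks, now)
    print("running:", running)
    print("zombies:", zombies)
```

Under these assumptions a task started ten minutes before the upgrade lands in `running`, so setup should warn and wait rather than clean it up.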
3.5.1 is already full of bugs (over 80), and since none of these bugs were added as urgent for the 3.5.1 release in the tracker bug, moving to 3.5.2.
This was probably fixed as part of BZ #1196136. It needs to be tested.
We don't have a downstream build of rhevm 3.6.0.
Waiting for the template creation to complete takes forever. The following messages are printed on screen over and over again:
Waiting for the completion of 1 running tasks during the next 20 seconds.
Press Ctrl+C to interrupt.
After I interrupted the waiting and started ovirt-engine service, I was not able to get into webadmin to check the tasks.
1. Install ovirt 18.104.22.168-1.el6 from ovirt.org
2. Start "make a template" of a VM
3. stop the ovirt-engine service
4. upgrade to 3.6.0 (3.6.0-0.0.master.20150419172215.gitb6adbca.el6.noarch)
attached logs from engine
Created attachment 1016247
engine-logs 3.5.1 >> 3.6.0
waiting for a long time is expected: that task is basically stuck, but it still cannot be detected as a zombie because the default time after which a task is identified as a zombie is really long.
The second effect, instead, is an issue. While it's waiting for the running task to complete, the engine is in maintenance mode, so you cannot access it.
When you press Ctrl-C it stops waiting and so it should recover the engine from maintenance mode.
The issue is that it seems Ctrl-C is enough to kill the setup immediately, without letting it perform recovery actions.
Checking it, thanks.
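For reference, the behaviour expected above (leave maintenance mode even when the wait is interrupted) can be sketched with a plain try/finally. This is a generic pattern and not the actual setup code; `wait_for_tasks`, `count_running`, `enter_maintenance` and `leave_maintenance` are hypothetical names used only for illustration.

```python
import time


def wait_for_tasks(count_running, enter_maintenance, leave_maintenance,
                   poll_seconds=20, sleep=time.sleep):
    """Wait until no tasks are running; always leave maintenance mode.

    Returns True if all tasks completed, False if the user pressed Ctrl-C.
    All callables are supplied by the caller (hypothetical hooks).
    """
    enter_maintenance()
    try:
        while count_running() > 0:
            sleep(poll_seconds)
        return True
    except KeyboardInterrupt:
        # The user gave up waiting; report that instead of crashing.
        return False
    finally:
        # Runs on normal completion and on Ctrl-C alike.
        leave_maintenance()
```

The point of the `finally` clause is that the recovery step does not depend on how the wait ended.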
(In reply to Simone Tiraboschi from comment #24)
> Thanks Peter,
> waiting for a long time is expected: that task is basically stuck but it
> still cannot be detected as a zombie because the default time after which a
> task is identified as a zombie is really long.
> The second effect instead is an issue. While it's waiting for running task
> to complete the engine is in maintenance mode and so you cannot access it.
> When you press Ctrl-C it stops waiting and so it should recover the engine
> from maintenance mode.
> The issue is that it seems that Ctrl-C is enough to suddenly kill the setup
> without letting it perform recovery actions.
> Checking it, thanks.
Yes, see bug 1130764.
As I already told Simone in private: until we solve that bug, we should not tell users to press ^C, but think of some other means to stop setup, such as prompting every N seconds, or whatever.
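A rough sketch of the "prompt every N seconds" idea, so that stopping the wait does not rely on ^C at all. This is only an illustration under assumptions: `wait_with_prompt`, `count_running` and `ask` are hypothetical names, not existing setup functions.

```python
import time


def wait_with_prompt(count_running, ask, poll_seconds=20, sleep=time.sleep):
    """Poll for running tasks, asking before each wait whether to go on.

    `ask` is a caller-supplied callable returning True to keep waiting and
    False to stop. Returns True if the tasks finished, False if the user
    chose to stop.
    """
    while count_running() > 0:
        if not ask():
            return False
        sleep(poll_seconds)
    return True
```

In an interactive run, `ask` could be something like `lambda: input("Tasks still running. Keep waiting? (yes/no) ").strip().lower() == "yes"`; returning False lets the caller run its normal cleanup path instead of being killed by a signal.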
Ok, so we have two issues. I originally mentioned the endless wait for completion.
The "make a template" task shouldn't be a zombie. The engine has a working host and, if I understand it correctly, it should complete this task and then continue with the setup. The problem in my case is the waiting. Before the upgrade, making a template takes at most 15 minutes in the engine, but when I started "make a template", stopped the engine service and started the upgrade, setup kept waiting for completion.
After a very long wait (approximately a few hours) I interrupted it and wanted to check the status. The engine service was stopped, so I started it, but webadmin didn't work. This may be the Ctrl-C issue during setup.
Can you please open a new bug with the control c issue while waiting on tasks and move this one to VERIFIED?
Yes, now it's working correctly. The engine service starts up so that the task can complete.
Verified in rhevm-22.214.171.124-0.1.el6.noarch