Bug 1303986 - Creating template from VM with cinder disks never ends "Waiting on CloneCinderDisksCommandCallback child commands to complete"
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.2.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-3.6.3
Target Release: 3.6.3
Assignee: Maor
QA Contact: Aharon Canan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-02-02 15:51 UTC by Natalie Gavrielov
Modified: 2016-03-10 07:28 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-08 11:27:05 UTC
oVirt Team: Storage
Embargoed:
amureini: ovirt-3.6.z?
ylavi: exception?
ylavi: planning_ack+
amureini: devel_ack+
ngavrilo: testing_ack?


Attachments
engine.log, cinder logs, vdsm logs (2.44 MB, application/x-gzip)
2016-02-02 15:51 UTC, Natalie Gavrielov

Description Natalie Gavrielov 2016-02-02 15:51:29 UTC
Created attachment 1120485
engine.log, cinder logs, vdsm logs

Description of problem:

Creating template from VM with cinder disks never ends "Waiting on CloneCinderDisksCommandCallback child commands to complete".


Version-Release number of selected component:
rhevm-3.6.2.6-0.1.el6.noarch

How reproducible:
100%.

Steps to Reproduce:

1. Create a VM with Cinder disks and an OS.
2. Create a template from the VM.

Actual results:
The create-template operation never ends.

Expected results:
The operation should complete and create a template.

Additional info:
Seems to be a regression of bug 1252958.

Comment 1 Maor 2016-02-02 16:42:29 UTC
Thanks for the logs Natalie.
It seems that Cinder has not finished creating those volumes for some reason (see [1]). I'm trying to figure out what the problem with Cinder is, though from the engine side the behavior looks like what it is supposed to be.

I will update the bug once I have new information.


[1]
# cinder list

| ID                                   | Status   |
| 1d6d64d5-0505-4540-9fc0-1bcaf0e91862 | creating |
| dae433d6-2f92-4169-9795-96b6449aa62c | creating |

Comment 2 Maor 2016-02-03 13:08:14 UTC
Trying to figure out the problem with the Cinder server, but from the logs there it is hard to determine exactly what it is.

Natalie, can you please try this on another Cinder server?

Since the problem is in the Cinder server and the engine basically behaves as expected, reducing the severity to medium for now.

Comment 3 Natalie Gavrielov 2016-02-08 10:29:31 UTC
I've performed the test again, with a new Cinder, and a new engine (3.6.3).
This time the test passed.
My guess is that something went wrong on the Cinder side.
In such a case I expect to have the ability to detach the current Cinder and attach a new one.
I was unable to do so, because the CloneCinderDisksCommandCallback was running (no way of stopping it).
I was told there is a timeout, "CoCoLifeInMinutes", which is 50 hours (3000 minutes). Assuming it does work, that's a pretty long time (and if it doesn't work, well, that is a bigger problem).

Comment 4 Maor 2016-02-08 11:17:25 UTC
(In reply to Natalie Gavrielov from comment #3)
> I've performed the test again, with a new Cinder, and a new engine (3.6.3).
> This time the test passed.
> My guess is that something went wrong on Cinder side.
> In such case I expect to have the ability to detach the current Cinder, and
> attach a new one.

I know that there was a discussion about letting the user cancel running tasks or jobs/steps, though I can't seem to find that in Bugzilla.
Moti/Oved, maybe you know if there is an open RFE about it?

> I was unable to do so, because the CloneCinderDisksCommandCallback was
> running (no way of stopping it).

You can log into the Cinder server and change the status of the volume from "creating" to any failed status.
That should make CoCo finish the task with failure.
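
A minimal sketch of that workaround, assuming the `cinder` CLI is installed on the Cinder server and admin credentials are loaded. It only builds the `cinder reset-state` command lines for volumes stuck in "creating"; the helper name `reset_state_commands` is hypothetical, and the volume IDs are the ones from comment 1. Note that `reset-state` only changes the recorded status and does not clean up anything on the storage backend.

```python
import shlex


def reset_state_commands(volumes, stuck_status="creating", new_state="error"):
    """Build one `cinder reset-state` command per volume stuck in stuck_status.

    volumes is a list of (volume_id, status) pairs, e.g. copied by hand
    from the output of `cinder list`. Nothing is executed here.
    """
    return [
        "cinder reset-state --state {} {}".format(
            shlex.quote(new_state), shlex.quote(volume_id)
        )
        for volume_id, status in volumes
        if status == stuck_status
    ]


# Volumes reported as stuck in comment 1.
volumes = [
    ("1d6d64d5-0505-4540-9fc0-1bcaf0e91862", "creating"),
    ("dae433d6-2f92-4169-9795-96b6449aa62c", "creating"),
]

for command in reset_state_commands(volumes):
    print(command)
```

The printed lines can be reviewed and then run by hand on the Cinder server; once the volumes report a failed status, the engine should be able to end the task with failure.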

> I was told there is a timeout, "CoCoLifeInMinutes", which is 50 hours (3000
> minutes).. assuming it does work, that's a pretty long time (in case it
> doesn't work, well.. that is a bigger problem).

You are referring to https://bugzilla.redhat.com/1261733.
Regarding the timeout, basically the user can configure it to be lower, but maybe the default value can be changed to be less than 50 hours.
Infra guys, what do you think?

Comment 5 Maor 2016-02-08 11:27:05 UTC
Natalie,
can you please open a bug against OpenStack Cinder, with all the logs from the Cinder server, for the volumes that stay in the "creating" state forever?

If there will be an issue 
Closing this bug for now as WORKSFORME; please feel free to re-open it as you see fit.

Comment 6 Natalie Gavrielov 2016-02-08 12:01:45 UTC
(In reply to Maor from comment #4)
> You can log into Cinder server and change the status of the volume from
> "creating" to any other failed status.
> That should make CoCo finish the task with failure.

Cinder became unresponsive, so this operation will be impossible.

(In reply to Maor from comment #5)
> Natalie, 
> can u please open a bug with all the logs from the Cinder server on Cinder
> openstack where the volumes are staying in "creating" state forever.
> 
> If there will be an issue 
> Closing this bug for now as workforme, please feel free to re-open it as you
> see right

You mean all the logs that are attached to this issue?

Comment 7 Moti Asayag 2016-02-08 12:23:40 UTC
(In reply to Maor from comment #4)
> (In reply to Natalie Gavrielov from comment #3)
> > I've performed the test again, with a new Cinder, and a new engine (3.6.3).
> > This time the test passed.
> > My guess is that something went wrong on Cinder side.
> > In such case I expect to have the ability to detach the current Cinder, and
> > attach a new one.
> 
> I know that there was a discussion about letting the user cancel running
> tasks or jobs/steps, though I can't seem to find that in Bugzilla.
> Moti/Oved, maybe you know if there is an open RFE about it?
> 

See Bug 879248

> > I was unable to do so, because the CloneCinderDisksCommandCallback was
> > running (no way of stopping it).
> 
> You can log into Cinder server and change the status of the volume from
> "creating" to any other failed status.
> That should make CoCo finish the task with failure.
> 
> > I was told there is a timeout, "CoCoLifeInMinutes", which is 50 hours (3000
> > minutes).. assuming it does work, that's a pretty long time (in case it
> > doesn't work, well.. that is a bigger problem).
> 
> You are referring to https://bugzilla.redhat.com/1261733,
> Regarding the timeout, basically the user can configure it to be less, but
> maybe the default value can be changed to be less than 50 hours.
> infra guys, what do you think?

The 50 hours were originally taken from the async task manager's zombie-task configuration.

The purpose of the command's timeout is to make sure a command doesn't run forever (to avoid a disk/VM being locked), but commands are expected to end eventually either way (success or failure). The timeout can be set to a lower value if commands are expected to be shorter.
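
To illustrate the idea only, here is a rough Python sketch of a callback that polls until its child commands end or a lifetime limit expires. It is not the engine's actual implementation; the function name, the status strings, and the polling interval are assumptions made for the example, and only the 3000-minute default comes from this bug.

```python
import time


def wait_for_children(poll_status, life_in_minutes=3000, poll_interval_s=10.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until the child commands end, or give up after life_in_minutes.

    poll_status is a callable returning a status string; "succeeded" and
    "failed" are treated as final (hypothetical names for this sketch).
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + life_in_minutes * 60
    while clock() < deadline:
        status = poll_status()
        if status in ("succeeded", "failed"):
            return status
        sleep(poll_interval_s)
    # The lifetime expired: end the command so the disk/VM lock is released.
    return "timed_out"


# A child that has already failed ends the wait on the first poll.
print(wait_for_children(lambda: "failed"))
```

With the default of 3000 minutes, a volume that stays in "creating" keeps the loop polling for 50 hours before it gives up, which is why a lower value makes sense when commands are expected to be short.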

Comment 8 Maor 2016-02-08 14:09:29 UTC
(In reply to Natalie Gavrielov from comment #6)
> (In reply to Maor from comment #4)
> > You can log into Cinder server and change the status of the volume from
> > "creating" to any other failed status.
> > That should make CoCo finish the task with failure.
> 
> Cinder became unresponsive, so this operation will be impossible.


I meant logging in to the Cinder server directly (through ssh) and changing the volume status.

> 
> (In reply to Maor from comment #5)
> > Natalie, 
> > can u please open a bug with all the logs from the Cinder server on Cinder
> > openstack where the volumes are staying in "creating" state forever.
> > 
> > If there will be an issue 
> > Closing this bug for now as workforme, please feel free to re-open it as you
> > see right
> 
> You mean all the logs that are attached to this issue?

Basically, only the Cinder logs that are attached to this bug.

Comment 9 Natalie Gavrielov 2016-02-08 14:26:26 UTC
(In reply to Maor from comment #8)
> (In reply to Natalie Gavrielov from comment #6)
> > (In reply to Maor from comment #4)
> > > You can log into Cinder server and change the status of the volume from
> > > "creating" to any other failed status.
> > > That should make CoCo finish the task with failure.
> > 
> > Cinder became unresponsive, so this operation will be impossible.
> 
> 
> I meant login to Cinder directly (through the ssh) and change the volume
> status

Me too.

