Bug 846324 - RHEVM - Backend: Template stuck in Locked status after restart of VDSM during template import
Summary: RHEVM - Backend: Template stuck in Locked status after restart of VDSM during...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: unspecified
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: ---
Assignee: Eli Mesika
QA Contact: vvyazmin@redhat.com
URL:
Whiteboard: infra
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-08-07 13:16 UTC by Daniel Paikov
Modified: 2016-02-10 19:21 UTC (History)
11 users (show)

Fixed In Version: si19
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-12-04 20:06:59 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
engine.log (274.46 KB, application/x-gzip)
2012-08-07 13:16 UTC, Daniel Paikov
no flags Details
vdsm.log (64.53 KB, application/x-gzip)
2012-08-07 13:17 UTC, Daniel Paikov
no flags Details
engine.log (436.37 KB, application/x-compressed-tar)
2012-08-08 13:53 UTC, Daniel Paikov
no flags Details
vdsm.log (230.44 KB, application/x-xz)
2012-08-08 13:53 UTC, Daniel Paikov
no flags Details

Description Daniel Paikov 2012-08-07 13:16:18 UTC
Created attachment 602747 [details]
engine.log

* Import a template from an export domain to a data domain.
* Restart VDSMD.
* After VDSM comes back and becomes SPM, the template (and its disk) remain stuck in Locked status.

Comment 1 Daniel Paikov 2012-08-07 13:17:09 UTC
Created attachment 602748 [details]
vdsm.log

Comment 2 Daniel Paikov 2012-08-07 13:29:45 UTC
As far as I can see, we are stuck in an infinite loop of trying to import the template and failing. I'm seeing an event of "Failed to complete copy of Template" every 10 seconds.

Comment 4 Ayal Baron 2012-08-07 20:14:00 UTC
vdsm log does not contain the restart part or anything that happened before (after?) nor any copyImage/moveImage command.

engine log doesn't seem to contain anything relevant either?
no excerpt here to point at issue seen?

Comment 5 Daniel Paikov 2012-08-08 08:20:01 UTC
(In reply to comment #4)
> vdsm log does not contain the restart part or anything that happened before
> (after?) nor any copyImage/moveImage command.
> 
> engine log doesn't seem to contain anything relevant either?
> no excerpt here to point at issue seen?

I collected the logs a few minutes after it happened. If you don't have enough info in the logs, please have someone contact me to see the hosts themselves.

Even now, a day later, I still have tasks on the VDSM host:

[root@orange-vdsc ~]# vdsClient -s 0 getAllTasksInfo
df6971bb-eb29-4183-b0d3-7116cfd2352e :
        verb = copyImage
        id = df6971bb-eb29-4183-b0d3-7116cfd2352e
f71d3b61-b51e-4bde-bddc-d6e1f7812aa1 :
        verb = deleteImage
        id = f71d3b61-b51e-4bde-bddc-d6e1f7812aa1

Comment 6 Ayal Baron 2012-08-08 12:31:47 UTC
In your reproduction steps you wrote you restarted vdsmd.  That means that log must contain "I am vdsm..." etc (the startup flow).  As far as I can tell, it does not.
Could be that log rotated, but looks like this is the wrong log.

Comment 7 Daniel Paikov 2012-08-08 13:53:02 UTC
Created attachment 603045 [details]
engine.log

Attaching more logs, presumably these are the right ones.

Comment 8 Daniel Paikov 2012-08-08 13:53:44 UTC
Created attachment 603046 [details]
vdsm.log

Comment 9 Ayal Baron 2012-09-16 06:26:40 UTC
After looking at the vdsm.log, there are 2 problems here:

1. after spmStart engine is sending stopTask to a task which is in aborting state (i.e. it's cleaning up) causing the task to stop the cleanup and leave garbage behind.

2. after task stops, engine keeps on calling getAllTasksStatuses which clearly states the task is finished but doesn't do anything about it.

Thread-56::INFO::2012-08-07 16:03:00,833::logUtils::37::dispatcher::(wrapper) Run and protect: getAllTasksStatuses(spUUID=None, options=None)
...
Thread-56::INFO::2012-08-07 16:03:00,835::logUtils::39::dispatcher::(wrapper) Run and protect: getAllTasksStatuses, Return response: {'allTasksStatus': {'df6971bb-eb29-4183-b0d3-7116cfd2352e': {'code': 0, 'message': 'Task is initializing', 'taskState': 'finished', 'taskResult': 'cleanFailure', 'taskID': 'df6971bb-eb29-4183-b0d3-7116cfd2352e'}}}

This is an infra issue.

Comment 10 Eli Mesika 2012-09-19 13:20:34 UTC
http://gerrit.ovirt.org/#/c/8057/

Comment 11 Eli Mesika 2012-09-20 08:13:53 UTC
fixed in commit: f4e4850

Comment 14 vvyazmin@redhat.com 2012-10-07 08:51:16 UTC
Verified on RHEVM 3.1 - SI19

RHEVM: rhevm-3.1.0-18.el6ev.noarch
VDSM: vdsm-4.9.6-36.0.el6_3.x86_64
LIBVIRT: libvirt-0.9.10-21.el6_3.4.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.295.el6_3.2.x86_64
SANLOCK: sanlock-2.3-4.el6_3.x86_64


Note You need to log in before you can comment on or make changes to this bug.