Bug 846324 - RHEVM - Backend: Template stuck in Locked status after restart of VDSM during template import
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: unspecified
Hardware/OS: Unspecified / Unspecified
Priority: high
Severity: high
Assigned To: Eli Mesika
QA Contact: vvyazmin@redhat.com
Whiteboard: infra
Keywords: Regression
Reported: 2012-08-07 09:16 EDT by Daniel Paikov
Modified: 2016-02-10 14:21 EST
Fixed In Version: si19
Doc Type: Bug Fix
Last Closed: 2012-12-04 15:06:59 EST
Type: Bug
oVirt Team: Infra


Attachments
engine.log (274.46 KB, application/x-gzip) - 2012-08-07 09:16 EDT, Daniel Paikov
vdsm.log (64.53 KB, application/x-gzip) - 2012-08-07 09:17 EDT, Daniel Paikov
engine.log (436.37 KB, application/x-compressed-tar) - 2012-08-08 09:53 EDT, Daniel Paikov
vdsm.log (230.44 KB, application/x-xz) - 2012-08-08 09:53 EDT, Daniel Paikov
Description Daniel Paikov 2012-08-07 09:16:18 EDT
Created attachment 602747 [details]
engine.log

* Import a template from an export domain to a data domain.
* Restart VDSMD.
* After VDSM comes back and becomes SPM, the template (and its disk) remain stuck in Locked status (see the check sketched below).
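
A quick way to confirm the stuck state from the SPM host is to list the tasks vdsm still tracks. This is only a hedged sketch using the same vdsClient verbs that appear later in this bug, not an official procedure:

# On the SPM host: list the tasks vdsm still knows about and their states
vdsClient -s 0 getAllTasksInfo
vdsClient -s 0 getAllTasksStatuses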
Comment 1 Daniel Paikov 2012-08-07 09:17:09 EDT
Created attachment 602748 [details]
vdsm.log
Comment 2 Daniel Paikov 2012-08-07 09:29:45 EDT
As far as I can see, we are stuck in an infinite loop of trying to import the template and failing. I'm seeing an event of "Failed to complete copy of Template" every 10 seconds.
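
A hedged way to confirm the loop from the engine side is to grep for the repeating event; the log path below assumes a default RHEV-M 3.1 install and may differ:

# Count the repeating failure event and show the last few occurrences/timestamps
grep -c "Failed to complete copy of Template" /var/log/ovirt-engine/engine.log
grep "Failed to complete copy of Template" /var/log/ovirt-engine/engine.log | tail -n 5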
Comment 4 Ayal Baron 2012-08-07 16:14:00 EDT
The vdsm log does not contain the restart or anything that happened before (or after) it, nor any copyImage/moveImage command.

The engine log doesn't seem to contain anything relevant either; is there an excerpt you can point to for the issue you saw?
Comment 5 Daniel Paikov 2012-08-08 04:20:01 EDT
(In reply to comment #4)
> The vdsm log does not contain the restart or anything that happened before
> (or after) it, nor any copyImage/moveImage command.
> 
> The engine log doesn't seem to contain anything relevant either; is there
> an excerpt you can point to for the issue you saw?

I collected the logs a few minutes after it happened. If you don't have enough info in the logs, please have someone contact me to see the hosts themselves.

Even now, a day later, I still have tasks on the VDSM host:

[root@orange-vdsc ~]# vdsClient -s 0 getAllTasksInfo
df6971bb-eb29-4183-b0d3-7116cfd2352e :
        verb = copyImage
        id = df6971bb-eb29-4183-b0d3-7116cfd2352e
f71d3b61-b51e-4bde-bddc-d6e1f7812aa1 :
        verb = deleteImage
        id = f71d3b61-b51e-4bde-bddc-d6e1f7812aa1
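
If the task really is orphaned on the host, a possible manual cleanup would look like the sketch below; clearTask is a standard vdsClient verb, but clearing a task the engine may still track is only an assumption of what is safe here, not a recommendation from this bug:

# Confirm the task has actually finished, then drop it from vdsm's task list
vdsClient -s 0 getTaskStatus df6971bb-eb29-4183-b0d3-7116cfd2352e
vdsClient -s 0 clearTask df6971bb-eb29-4183-b0d3-7116cfd2352e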
Comment 6 Ayal Baron 2012-08-08 08:31:47 EDT
In your reproduction steps you wrote that you restarted vdsmd. That means the log must contain "I am vdsm..." etc. (the startup flow). As far as I can tell, it does not.
It could be that the log rotated, but this looks like the wrong log.
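
For reference, the vdsm startup flow Ayal refers to can be located with a simple grep; the exact "I am the actual vdsm" wording and the /var/log/vdsm/vdsm.log path are assumptions based on standard vdsm deployments:

# Look for the vdsm startup banner to prove the log covers the restart
grep "I am the actual vdsm" /var/log/vdsm/vdsm.log*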
Comment 7 Daniel Paikov 2012-08-08 09:53:02 EDT
Created attachment 603045 [details]
engine.log

Attaching more logs; presumably these are the right ones.
Comment 8 Daniel Paikov 2012-08-08 09:53:44 EDT
Created attachment 603046 [details]
vdsm.log
Comment 9 Ayal Baron 2012-09-16 02:26:40 EDT
After looking at the vdsm.log, there are 2 problems here:

1. After spmStart, the engine sends stopTask to a task that is in the aborting state (i.e. it is cleaning up), causing the task to stop the cleanup and leave garbage behind.

2. After the task stops, the engine keeps calling getAllTasksStatuses, which clearly reports the task as finished, but does nothing about it.

Thread-56::INFO::2012-08-07 16:03:00,833::logUtils::37::dispatcher::(wrapper) Run and protect: getAllTasksStatuses(spUUID=None, options=None)
...
Thread-56::INFO::2012-08-07 16:03:00,835::logUtils::39::dispatcher::(wrapper) Run and protect: getAllTasksStatuses, Return response: {'allTasksStatus': {'df6971bb-eb29-4183-b0d3-7116cfd2352e': {'code': 0, 'message': 'Task is initializing', 'taskState': 'finished', 'taskResult': 'cleanFailure', 'taskID': 'df6971bb-eb29-4183-b0d3-7116cfd2352e'}}}

This is an infra issue.
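
To make the two failure modes concrete, here is a hedged sketch of the sequence as it looks from the host, using the verbs already quoted in this bug; the engine-side behaviour itself lives in ovirt-engine's async task handling and is only paraphrased in the comments above:

# 1. Engine issues stopTask while the task is already aborting/cleaning up,
#    which interrupts the cleanup and leaves garbage behind.
vdsClient -s 0 stopTask df6971bb-eb29-4183-b0d3-7116cfd2352e
# 2. Engine then keeps polling and gets taskState=finished / taskResult=cleanFailure,
#    but never acts on it, so the template stays Locked.
vdsClient -s 0 getAllTasksStatuses
# The expected end-of-life step once a task reports finished is to clear it,
# so the leftover task no longer keeps the entity locked.
vdsClient -s 0 clearTask df6971bb-eb29-4183-b0d3-7116cfd2352e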
Comment 10 Eli Mesika 2012-09-19 09:20:34 EDT
http://gerrit.ovirt.org/#/c/8057/
Comment 11 Eli Mesika 2012-09-20 04:13:53 EDT
fixed in commit: f4e4850
Comment 14 vvyazmin@redhat.com 2012-10-07 04:51:16 EDT
Verified on RHEVM 3.1 - SI19

RHEVM: rhevm-3.1.0-18.el6ev.noarch
VDSM: vdsm-4.9.6-36.0.el6_3.x86_64
LIBVIRT: libvirt-0.9.10-21.el6_3.4.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.295.el6_3.2.x86_64
SANLOCK: sanlock-2.3-4.el6_3.x86_64
