Bug 1130766 - [engine-backend] Moving SPM to maintenance fails with an unclear error (RHEL7 hosts)
Summary: [engine-backend] Moving SPM to maintenance fails with an unclear error (RHEL7 hosts)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.5
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Liron Aravot
QA Contact: Elad
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2014-08-17 13:03 UTC by Elad
Modified: 2016-02-10 17:07 UTC
CC: 8 users

Fixed In Version: ovirt-3.5.0_rc2
Clone Of:
Environment:
Last Closed: 2014-10-17 12:39:47 UTC
oVirt Team: Storage
Embargoed:


Attachments
engine, vdsm, supervdsm and libvirt logs (1.99 MB, application/x-gzip)
2014-08-17 13:03 UTC, Elad


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 31925 0 master MERGED core: add logging to SpmStopVDSCommand Never
oVirt gerrit 32189 0 ovirt-engine-3.5 MERGED core: add logging to SpmStopVDSCommand Never

Description Elad 2014-08-17 13:03:12 UTC
Created attachment 927479 [details]
engine, vdsm, supervdsm and libvirt logs

Description of problem:
I have 2 RHEL7 hosts attached and running in a 3.5 DC connected to a storage pool. I tried to put the SPM host into maintenance and it failed with an unclear error message from the engine.


Version-Release number of selected component (if applicable):
ovirt-3.5-RC1

engine:
ovirt-engine-3.5.0-0.0.master.20140804172041.git23b558e.el6.noarch

hosts:
RHEL7
vdsm-4.14.13-1.el7ev.x86_64
qemu-kvm-1.5.3-60.el7_0.5.x86_64
libvirt-daemon-1.1.1-29.el7_0.1.x86_64

How reproducible:
Always on my setup

Steps to Reproduce:
1. Have 2 hosts with RHEL7 installed, add them to a shared DC
2. Create a storage pool with several domains of different types in it
3. Put the SPM host into maintenance

Actual results:

The SpmStop command is reported as being sent to vdsm:

2014-08-17 15:45:18,278 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-8-thread-39) [8536bd4] START, SpmStopVDSCommand(HostName = green-vdsa, HostId = 43484b24-89b6-4acb-afdb-c685fe8e9bf0, storagePoolId = 6914dd84-1f1d-44ad-8323-7f68066e4a13), log id: 40951598 


Putting the SPM host into maintenance failed. The engine reports a failure to change the host's status due to a failure to stop the SPM:

2014-08-17 15:45:18,299 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-39) [8536bd4] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Failed to change status of host 'green-vdsa' due to a failure to stop the spm.

In vdsm.log I didn't see the spmStop call at all, so it's unclear why the engine reported that vdsm failed to execute it.


Expected results:
1) If the spmStop command was never executed by vdsm, the engine shouldn't report it as FINISH.
2) The error message should clearly indicate what exactly happened.

Additional info:
engine, vdsm, supervdsm and libvirt logs

Comment 1 Allon Mureinik 2014-08-18 20:29:29 UTC
Elad, is this specific to RHEL 7? Can you reproduce it on RHEL 6.5? Fedora?

Comment 2 Elad 2014-08-19 07:08:03 UTC
Tried to reproduce it with RHEL 6.5 on a 3.5 setup and the host was moved to maintenance successfully.

Comment 3 Liron Aravot 2014-08-25 15:25:56 UTC
The failure to stop the SPM is caused by tasks on the vdsm side that were probably 'unknown' on the engine side (see below).
I added a patch that adds a log message indicating that the SPM wasn't stopped because of that.


Thread-220::DEBUG::2014-08-17 15:45:21,162::task::1185::TaskManager.Task::(prepare) Task=`eb2743a4-e1a0-49dd-bf5f-395c63d32c71`::finished: {'allTasksStatus': {'b422f1c2-1ef3-4990-b6a7-909f40d74cdb': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': 'b422f1c2-1ef3-4990-b6a7-909f40d74cdb'}, '5b5cb243-4956-47d3-be8e-be1370af5f33': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '5b5cb243-4956-47d3-be8e-be1370af5f33'}, 'a5c712bb-93b2-4ae5-a08c-dbedd5eae759': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': 'a5c712bb-93b2-4ae5-a08c-dbedd5eae759'}, '8e27fd49-128b-46a9-9a16-ad03bb4713ff': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '8e27fd49-128b-46a9-9a16-ad03bb4713ff'}, '6b2bac66-b147-4e04-a297-5b2c720180eb': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '6b2bac66-b147-4e04-a297-5b2c720180eb'}, '986e4119-31bb-42e4-9197-400e624bf33c': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '986e4119-31bb-42e4-9197-400e624bf33c'}, 'b88247c4-abc6-4d30-81fd-7d83ff8cec17': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': 'b88247c4-abc6-4d30-81fd-7d83ff8cec17'}, '6604ea5e-4c34-4288-90e7-a9d00721ac85': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '6604ea5e-4c34-4288-90e7-a9d00721ac85'}}}
Thread-220::DEBUG::2014-08-17 15:45:21,162::task::595::TaskManager.Task::(_updateState) Task=`eb2743a4-e1a0-49dd-bf5f-395c63d32c71`::moving from state preparing -> state finished
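
For context, the gerrit patches linked above add logging around this skip path. The sketch below (plain Java with hypothetical class, interface, and method names; not the actual SpmStopVDSCommand source) only illustrates the behavior described in this comment: before issuing spmStop, check whether the pool still has uncleared tasks and, if so, skip the stop and log why instead of failing silently.

// Illustrative sketch only -- not the actual SpmStopVDSCommand code.
// Shows the behavior described in comment 3: skip spmStop and log a
// clear message when the SPM still has uncleared tasks.
import java.util.List;
import java.util.UUID;
import java.util.logging.Logger;

class SpmStopSketch {
    private static final Logger log = Logger.getLogger(SpmStopSketch.class.getName());

    /** Hypothetical lookup of the task IDs vdsm still reports for a pool. */
    interface TaskStatusSource {
        List<UUID> getUnclearedTaskIds(UUID storagePoolId);
    }

    private final TaskStatusSource tasks;

    SpmStopSketch(TaskStatusSource tasks) {
        this.tasks = tasks;
    }

    /** Returns true only if spmStop was actually issued to vdsm. */
    boolean stopSpm(String vdsName, UUID storagePoolId) {
        List<UUID> uncleared = tasks.getUnclearedTaskIds(storagePoolId);
        if (!uncleared.isEmpty()) {
            // The added log line makes the skipped stop visible, similar to
            // the message quoted in comment 4.
            log.info(String.format(
                    "SpmStopVDSCommand::Not stopping SPM on vds %s, pool id %s as there are uncleared tasks %s",
                    vdsName, storagePoolId, uncleared));
            return false;
        }
        log.info(String.format("Stopping SPM on vds %s, pool id %s", vdsName, storagePoolId));
        // ... issue the spmStop verb to vdsm here ...
        return true;
    }
}

With a guard like this, the engine-side log itself explains the "failure to stop the spm" that was previously reported without any detail.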

Comment 4 Elad 2014-09-14 13:03:41 UTC
Tested the following:
- Got into a situation in which there were unknown tasks running on the SPM (uploadImageToStream tasks that are not being polled by the engine due to https://bugzilla.redhat.com/show_bug.cgi?id=1136840)
- Tried to put the host into maintenance

spmStop is not sent to vdsm, and the engine logs a clear message explaining why:

2014-09-14 11:19:48,121 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-6-thread-12) [5c0b4281] SpmStopVDSCommand::Not stopping SPM on vds green-vdsb, pool id b7cf7a2a-4fde-4732-91f8-a9808ff85b93 as there are uncleared tasks


Verified using RHEV 3.5 vt3.1.

Comment 5 Sandro Bonazzola 2014-10-17 12:39:47 UTC
oVirt 3.5 has been released and should include the fix for this issue.

