Bug 1130766

Summary: [engine-backend] Moving SPM to maintenance fails with an unclear error (RHEL7 hosts)
Product: [Retired] oVirt
Reporter: Elad <ebenahar>
Component: ovirt-engine-core
Assignee: Liron Aravot <laravot>
Status: CLOSED CURRENTRELEASE
QA Contact: Elad <ebenahar>
Severity: high
Docs Contact:
Priority: high
Version: 3.5
CC: amureini, bugs, ebenahar, ecohen, gklein, iheim, rbalakri, yeylon
Target Milestone: ---
Target Release: 3.5.0
Hardware: x86_64
OS: Unspecified
Whiteboard: storage
Fixed In Version: ovirt-3.5.0_rc2
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-17 12:39:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  Description: engine, vdsm, supervdsm and libvirt logs
  Flags: none

Description Elad 2014-08-17 13:03:12 UTC
Created attachment 927479
engine, vdsm, supervdsm and libvirt logs

Description of problem:
I have 2 RHEL7 hosts attached and running in a 3.5 DC connected to a storage pool. I tried to put the SPM into maintenance and it failed with an unclear error message from the engine.


Version-Release number of selected component (if applicable):
ovirt-3.5-RC1

engine:
ovirt-engine-3.5.0-0.0.master.20140804172041.git23b558e.el6.noarch

hosts:
RHEL7
vdsm-4.14.13-1.el7ev.x86_64
qemu-kvm-1.5.3-60.el7_0.5.x86_64
libvirt-daemon-1.1.1-29.el7_0.1.x86_64

How reproducible:
Always on my setup

Steps to Reproduce:
1. Have 2 hosts with RHEL7 installed, add them to a shared DC
2. Create a storage pool with several domains of different types in it
3. Put SPM in maintenance

Actual results:

SpmStop command is reported to be sent to vdsm

2014-08-17 15:45:18,278 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-8-thread-39) [8536bd4] START, SpmStopVDSCommand(HostName = green-vdsa, HostId = 43484b24-89b6-4acb-afdb-c685fe8e9bf0, storagePoolId = 6914dd84-1f1d-44ad-8323-7f68066e4a13), log id: 40951598 


Failed to put the SPM to maintenance. Engine reports that there is a failure to change the status of the host due to a failure in spmStop.

2014-08-17 15:45:18,299 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-39) [8536bd4] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Failed to change status of host 'green-vdsa' due to a failure to stop the spm.

In vdsm.log I didn't see the spmStop. I'm not sure why engine reported that vdsm failed to execute it.


Expected results:
1) If the spmStop command was never executed by vdsm, the engine shouldn't report it as FINISH.
2) The error message should state clearly what happened; the current one is unclear and doesn't tell us what exactly went wrong.

Additional info:
engine, vdsm, supervdsm and libvirt logs

Comment 1 Allon Mureinik 2014-08-18 20:29:29 UTC
Elad, is this specific to RHEL 7? Can you reproduce it on RHEL 6.5? Fedora?

Comment 2 Elad 2014-08-19 07:08:03 UTC
Tried to reproduce it with RHEL 6.5 hosts on a 3.5 setup; the host was moved to maintenance successfully.

Comment 3 Liron Aravot 2014-08-25 15:25:56 UTC
The failure to stop the SPM is because of tasks on the vdsm side that were probably 'unknown' on the engine side (see below).
I added a patch that adds a log message to indicate that the SPM wasn't stopped because of that.


Thread-220::DEBUG::2014-08-17 15:45:21,162::task::1185::TaskManager.Task::(prepare) Task=`eb2743a4-e1a0-49dd-bf5f-395c63d32c71`::finished: {'allTasksStatus': {'b422f1c2-1ef3-4990-b6a7-909f40d74cdb': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': 'b422f1c2-1ef3-4990-b6a7-909f40d74cdb'}, '5b5cb243-4956-47d3-be8e-be1370af5f33': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '5b5cb243-4956-47d3-be8e-be1370af5f33'}, 'a5c712bb-93b2-4ae5-a08c-dbedd5eae759': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': 'a5c712bb-93b2-4ae5-a08c-dbedd5eae759'}, '8e27fd49-128b-46a9-9a16-ad03bb4713ff': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '8e27fd49-128b-46a9-9a16-ad03bb4713ff'}, '6b2bac66-b147-4e04-a297-5b2c720180eb': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '6b2bac66-b147-4e04-a297-5b2c720180eb'}, '986e4119-31bb-42e4-9197-400e624bf33c': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '986e4119-31bb-42e4-9197-400e624bf33c'}, 'b88247c4-abc6-4d30-81fd-7d83ff8cec17': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': 'b88247c4-abc6-4d30-81fd-7d83ff8cec17'}, '6604ea5e-4c34-4288-90e7-a9d00721ac85': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '6604ea5e-4c34-4288-90e7-a9d00721ac85'}}}
Thread-220::DEBUG::2014-08-17 15:45:21,162::task::595::TaskManager.Task::(_updateState) Task=`eb2743a4-e1a0-49dd-bf5f-395c63d32c71`::moving from state preparing -> state finished
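
For anyone triaging a similar log: the payload after "finished:" in the vdsm line above appears to be a plain Python dict literal, so once any hard-wrapped lines are rejoined it can be read back with the standard library. The following is only a rough sketch for inspecting such a line; the helper name parse_all_tasks_status is hypothetical and not part of vdsm or the engine, and the sample line is shortened to a single task from the log above.

```python
import ast


def parse_all_tasks_status(log_line):
    """Return the 'allTasksStatus' dict embedded in a vdsm 'finished:' debug line.

    Hypothetical helper for log inspection only; not part of vdsm.
    Assumes the payload after '::finished: ' is a Python dict literal.
    """
    marker = "::finished: "
    _, found, payload = log_line.partition(marker)
    if not found:
        raise ValueError("no 'finished:' payload in line")
    result = ast.literal_eval(payload.strip())
    return result["allTasksStatus"]


# Shortened sample: one task taken from the log above.
line = (
    "Thread-220::DEBUG::2014-08-17 15:45:21,162::task::1185::"
    "TaskManager.Task::(prepare) "
    "Task=`eb2743a4-e1a0-49dd-bf5f-395c63d32c71`::finished: "
    "{'allTasksStatus': {'b422f1c2-1ef3-4990-b6a7-909f40d74cdb': "
    "{'code': 0, 'message': '1 jobs completed successfully', "
    "'taskState': 'finished', 'taskResult': 'success', "
    "'taskID': 'b422f1c2-1ef3-4990-b6a7-909f40d74cdb'}}}"
)

tasks = parse_all_tasks_status(line)
for task_id, info in tasks.items():
    # Tasks listed here still exist on the vdsm side, even when finished.
    print(task_id, info["taskState"], info["taskResult"])
```

Any task that still shows up in this dict exists on the vdsm side, whether or not the engine knows about it.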

Comment 4 Elad 2014-09-14 13:03:41 UTC
Tested the following:
- Got a situation in which I have unknown tasks running on SPM (uploadImageToStream tasks that are not being polled by engine due to https://bugzilla.redhat.com/show_bug.cgi?id=1136840)
- Tried to put the host to maintenance

spmStop is not sent to vdsm. 

2014-09-14 11:19:48,121 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-6-thread-12) [5c0b4281] SpmStopVDSCommand::Not stopping SPM on vds green-vdsb, pool id b7cf7a2a-4fde-4732-91f8-a9808ff85b93 as there are uncleared tasks


Verified using rhev3.5 vt3.1
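
The verified behaviour can be modelled roughly as follows. This is only an illustrative sketch in Python, not the engine code (the engine is Java and its actual implementation is not shown in this bug): the function name spm_stop_decision and its arguments are hypothetical, and it assumes that "uncleared tasks" means any task still listed by vdsm for the pool, finished or not.

```python
def spm_stop_decision(host_name, pool_id, all_tasks_status):
    """Decide whether spmStop should be sent to the host.

    Hypothetical model of the verified behaviour, not the engine code.
    all_tasks_status: dict of task id -> status as reported by vdsm.
    Returns (send_spm_stop, log_message).
    """
    if all_tasks_status:
        # Assumption: any task still listed by vdsm counts as uncleared,
        # so spmStop is skipped and the reason is logged.
        message = (
            "SpmStopVDSCommand::Not stopping SPM on vds %s, pool id %s "
            "as there are uncleared tasks" % (host_name, pool_id)
        )
        return False, message
    return True, None


# No tasks left on the host: spmStop may be sent.
print(spm_stop_decision("green-vdsb", "b7cf7a2a-4fde-4732-91f8-a9808ff85b93", {}))

# One finished but uncleared task: spmStop is skipped with a clear reason.
pending = {"b422f1c2-1ef3-4990-b6a7-909f40d74cdb": {"taskState": "finished"}}
print(spm_stop_decision("green-vdsb", "b7cf7a2a-4fde-4732-91f8-a9808ff85b93", pending))
```

The point of the fix is the second case: spmStop is skipped, and the log states why.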

Comment 5 Sandro Bonazzola 2014-10-17 12:39:47 UTC
oVirt 3.5 has been released and should include the fix for this issue.