Bug 1130766
Summary: | [engine-backend] Moving SPM to maintenance fails with an unclear error (RHEL7 hosts) | ||||||
---|---|---|---|---|---|---|---|
Product: | [Retired] oVirt | Reporter: | Elad <ebenahar> | ||||
Component: | ovirt-engine-core | Assignee: | Liron Aravot <laravot> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Elad <ebenahar> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 3.5 | CC: | amureini, bugs, ebenahar, ecohen, gklein, iheim, rbalakri, yeylon | ||||
Target Milestone: | --- | ||||||
Target Release: | 3.5.0 | ||||||
Hardware: | x86_64 | ||||||
OS: | Unspecified | ||||||
Whiteboard: | storage | ||||||
Fixed In Version: | ovirt-3.5.0_rc2 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2014-10-17 12:39:47 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Elad, is this specific to RHEL 7? Can you reproduce it on RHEL 6.5? Fedora?

Tried to reproduce it with RHEL 6.5 on a 3.5 setup, and the host was moved to maintenance successfully. The failure to stop the SPM is because of tasks on the vdsm side that were probably 'unknown' on the engine side (see below). I added a patch that adds a log message indicating that the SPM wasn't stopped for that reason.

Thread-220::DEBUG::2014-08-17 15:45:21,162::task::1185::TaskManager.Task::(prepare) Task=`eb2743a4-e1a0-49dd-bf5f-395c63d32c71`::finished: {'allTasksStatus': {'b422f1c2-1ef3-4990-b6a7-909f40d74cdb': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': 'b422f1c2-1ef3-4990-b6a7-909f40d74cdb'}, '5b5cb243-4956-47d3-be8e-be1370af5f33': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '5b5cb243-4956-47d3-be8e-be1370af5f33'}, 'a5c712bb-93b2-4ae5-a08c-dbedd5eae759': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': 'a5c712bb-93b2-4ae5-a08c-dbedd5eae759'}, '8e27fd49-128b-46a9-9a16-ad03bb4713ff': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '8e27fd49-128b-46a9-9a16-ad03bb4713ff'}, '6b2bac66-b147-4e04-a297-5b2c720180eb': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '6b2bac66-b147-4e04-a297-5b2c720180eb'}, '986e4119-31bb-42e4-9197-400e624bf33c': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '986e4119-31bb-42e4-9197-400e624bf33c'}, 'b88247c4-abc6-4d30-81fd-7d83ff8cec17': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': 'b88247c4-abc6-4d30-81fd-7d83ff8cec17'}, '6604ea5e-4c34-4288-90e7-a9d00721ac85': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '6604ea5e-4c34-4288-90e7-a9d00721ac85'}}}
Thread-220::DEBUG::2014-08-17 15:45:21,162::task::595::TaskManager.Task::(_updateState) Task=`eb2743a4-e1a0-49dd-bf5f-395c63d32c71`::moving from state preparing -> state finished

Tested the following:
- Got a situation in which I have unknown tasks running on the SPM (uploadImageToStream tasks that are not being polled by engine due to https://bugzilla.redhat.com/show_bug.cgi?id=1136840)
- Tried to put the host to maintenance

spmStop is not sent to vdsm:

2014-09-14 11:19:48,121 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-6-thread-12) [5c0b4281] SpmStopVDSCommand::Not stopping SPM on vds green-vdsb, pool id b7cf7a2a-4fde-4732-91f8-a9808ff85b93 as there are uncleared tasks

Verified using rhev3.5 vt3.1

oVirt 3.5 has been released and should include the fix for this issue. |
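The behaviour described in the comments above (the engine declines to send spmStop while vdsm still lists tasks, and now logs why) can be sketched roughly as follows. This is an illustrative Python sketch, not the actual ovirt-engine code, which is Java. The names `SpmStopOutcome`, `stop_spm` and `send_spm_stop` are hypothetical, and treating any task still present in `allTasksStatus` as "uncleared" is an assumption.

```python
import logging
from enum import Enum

log = logging.getLogger("spm_stop_sketch")


class SpmStopOutcome(Enum):
    """Hypothetical result type: lets callers tell 'stopped' from 'skipped'."""
    STOPPED = "stopped"
    SKIPPED_UNCLEARED_TASKS = "skipped_uncleared_tasks"


def stop_spm(host, pool_id, all_tasks_status, send_spm_stop):
    """Send spmStop only when vdsm reports no remaining tasks.

    all_tasks_status: dict keyed by task ID, shaped like the
        'allTasksStatus' payload in the vdsm log.
    send_spm_stop: callable standing in for the real call to vdsm
        (hypothetical; injected so the sketch stays self-contained).
    """
    # Assumption: any task still listed by vdsm counts as uncleared,
    # regardless of its taskState.
    uncleared = sorted(all_tasks_status)
    if uncleared:
        log.info(
            "Not stopping SPM on vds %s, pool id %s as there are uncleared tasks: %s",
            host, pool_id, ", ".join(uncleared),
        )
        return SpmStopOutcome.SKIPPED_UNCLEARED_TASKS

    send_spm_stop(host, pool_id)
    return SpmStopOutcome.STOPPED
```

Returning a distinct outcome for the skipped case is one way to keep the caller from reporting a stop that was never attempted.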
Created attachment 927479
engine, vdsm, supervdsm and libvirt logs

Description of problem:
I have 2 RHEL7 hosts attached and running in a 3.5 DC connected to a storage pool. I tried to put the SPM into maintenance and it failed with an unclear error message from engine.

Version-Release number of selected component (if applicable):
ovirt-3.5-RC1
engine: ovirt-engine-3.5.0-0.0.master.20140804172041.git23b558e.el6.noarch
hosts: RHEL7
vdsm-4.14.13-1.el7ev.x86_64
qemu-kvm-1.5.3-60.el7_0.5.x86_64
libvirt-daemon-1.1.1-29.el7_0.1.x86_64

How reproducible:
Always on my setup

Steps to Reproduce:
1. Have 2 hosts with RHEL7 installed, add them to a shared DC
2. Create a storage pool with several domains of different types in it
3. Put the SPM in maintenance

Actual results:
The SpmStop command is reported to be sent to vdsm:

2014-08-17 15:45:18,278 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-8-thread-39) [8536bd4] START, SpmStopVDSCommand(HostName = green-vdsa, HostId = 43484b24-89b6-4acb-afdb-c685fe8e9bf0, storagePoolId = 6914dd84-1f1d-44ad-8323-7f68066e4a13), log id: 40951598

Failed to put the SPM into maintenance. Engine reports a failure to change the status of the host due to a failure in spmStop:

2014-08-17 15:45:18,299 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-39) [8536bd4] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Failed to change status of host 'green-vdsa' due to a failure to stop the spm.

In vdsm.log I didn't see the spmStop. I'm not sure why engine reported that vdsm failed to execute it.

Expected results:
1) If the spmStop command was never executed by vdsm, it shouldn't be reported as FINISH by engine.
2) The message is unclear and we can't know what exactly happened.

Additional info:
engine, vdsm, supervdsm and libvirt logs
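To check whether a verb such as spmStop actually reached vdsm, as the description does by reading vdsm.log, the log can be scanned for the verb name. The helper below is a minimal sketch. The function name `find_verb_calls` is hypothetical, and it assumes the verb appears in the log as a standalone word, so that engine-side class names such as `SpmStopVDSCommand` are not counted.

```python
import re


def find_verb_calls(log_lines, verb):
    """Return the log lines that mention `verb` as a whole word.

    The word boundaries keep longer identifiers that merely contain the
    verb (for example 'SpmStopVDSCommand') from matching.
    """
    pattern = re.compile(r"\b" + re.escape(verb) + r"\b", re.IGNORECASE)
    return [line.rstrip("\n") for line in log_lines if pattern.search(line)]
```

It accepts any iterable of lines, so an open file handle for the attached vdsm.log can be passed in directly. An empty result for `spmStop` would be consistent with the observation that the command was never executed on the host.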