Bug 1409125 - SPDM job commands may not end while the performing host is non responsive
Summary: SPDM job commands may not end while the performing host is non responsive
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.1.0-beta
Target Release: 4.1.0.2
Assignee: Liron Aravot
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-12-29 17:30 UTC by Liron Aravot
Modified: 2017-02-15 15:02 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-15 15:02:00 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.1+


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 69384 0 master MERGED vdsbroker: adding UpdateVolumeVDSCommand 2017-01-16 17:33:06 UTC
oVirt gerrit 69386 0 master MERGED core: storage job - undetermined status failure 2017-01-16 17:33:28 UTC
oVirt gerrit 69387 0 master MERGED core: StorageJobCallback - adding support for child commands 2017-01-16 17:33:13 UTC
oVirt gerrit 69388 0 master MERGED core: StorageJobs - persist whether the job started 2017-01-16 17:33:02 UTC
oVirt gerrit 69389 0 master MERGED core: StorageJobs - adding support for job fencing 2017-01-16 17:33:09 UTC
oVirt gerrit 69390 0 master MERGED core: adding support for fencing storage volume job 2017-01-16 17:33:17 UTC
oVirt gerrit 69391 0 master MERGED core: CopyData - use volume job fencing 2017-01-16 17:33:22 UTC
oVirt gerrit 70547 0 ovirt-engine-4.1 ABANDONED core: Add "generation" to map representation of VdsmImageLocationInfo 2017-01-16 14:37:59 UTC
oVirt gerrit 70548 0 ovirt-engine-4.1 MERGED vdsbroker: adding UpdateVolumeVDSCommand 2017-01-17 15:05:29 UTC
oVirt gerrit 70549 0 ovirt-engine-4.1 MERGED core: storage job - undetermined status failure 2017-01-17 15:05:18 UTC
oVirt gerrit 70550 0 ovirt-engine-4.1 MERGED core: StorageJobCallback - adding support for child commands 2017-01-17 15:05:07 UTC
oVirt gerrit 70551 0 ovirt-engine-4.1 MERGED core: StorageJobs - persist whether the job started 2017-01-17 15:05:01 UTC
oVirt gerrit 70552 0 ovirt-engine-4.1 MERGED core: StorageJobs - adding support for job fencing 2017-01-17 15:04:57 UTC
oVirt gerrit 70553 0 ovirt-engine-4.1 MERGED core: adding support for fencing storage volume job 2017-01-17 15:04:53 UTC
oVirt gerrit 70554 0 ovirt-engine-4.1 MERGED core: CopyData - use volume job fencing 2017-01-17 15:10:55 UTC
oVirt gerrit 70555 0 ovirt-engine-4.1 MERGED core: VdsmImagePoller - ILLEGAL status consideration 2017-01-17 15:11:25 UTC
oVirt gerrit 70556 0 ovirt-engine-4.1 MERGED core: Add "generation" to map representation of VdsmImageLocationInfo 2017-01-17 15:10:27 UTC

Description Liron Aravot 2016-12-29 17:30:09 UTC
Description of problem:
When an SPDM job is being executed, the engine polls the performing host until the job ends.
If the host becomes non responsive after the operation has started, the engine may still be able to determine the job status by polling the entity the job is performed on.
But if the host becomes non responsive before the job has started, the engine can't end the command, because the job might still start (or might never start, e.g. if the host was powered off). In that case the engine must wait for the host to become responsive again in order to determine the status of the operation.
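
The cases above can be summarized in a small decision function. This is only an illustrative sketch, not the engine's code: the names `HostJobStatus`, `Decision` and `decide` are hypothetical, and `entity_status` stands in for whatever the engine learns by polling the entity (assumed here to be True for success, False for failure, None when nothing can be determined).

```python
from enum import Enum
from typing import Optional


class HostJobStatus(Enum):
    """What polling the performing host returned (hypothetical names)."""
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"
    UNKNOWN = "unknown"  # host is non responsive, job cannot be polled


class Decision(Enum):
    KEEP_POLLING = "keep_polling"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    WAIT_FOR_HOST = "wait_for_host"


def decide(host_status: HostJobStatus,
           job_started: bool,
           entity_status: Optional[bool] = None) -> Decision:
    """Decide how the command proceeds after one polling round."""
    if host_status is HostJobStatus.DONE:
        return Decision.SUCCEEDED
    if host_status is HostJobStatus.FAILED:
        return Decision.FAILED
    if host_status is HostJobStatus.RUNNING:
        return Decision.KEEP_POLLING

    # Host is non responsive from here on.
    if job_started:
        # The job is known to have started, so the entity may reveal the outcome.
        if entity_status is None:
            return Decision.KEEP_POLLING
        return Decision.SUCCEEDED if entity_status else Decision.FAILED

    # The job may or may not start later; without fencing the engine is stuck.
    return Decision.WAIT_FOR_HOST
```

The last branch is the problematic one described in this bug: with no way to rule out a late start, the command cannot end until the host responds again.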

How reproducible:
Always

Steps to Reproduce:
1. Move a disk in a data center with version >= 4.1.
2. Stop the vdsm service on the performing host before the job starts.

Actual results:
The engine waits for the host to become responsive again in order to determine the status of the operation.

Expected results:
The engine will fence the operation on supporting flows by updating the job entity, so that the job fails before it modifies the entity.
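
One way to picture this fencing is with a generation counter on the entity, a term that appears in the titles of the linked patches. The sketch below is an assumption-laden illustration, not the actual engine or vdsm implementation: `Volume`, `run_copy_job`, `fence_job` and `GenerationMismatch` are hypothetical names, and the rule that a job checks the generation before touching the entity is assumed for the example.

```python
from dataclasses import dataclass


class GenerationMismatch(Exception):
    """The entity's generation differs from the one the job expected."""


@dataclass
class Volume:
    # Hypothetical stand-in for the entity a storage job operates on.
    generation: int = 0
    data: str = "original"


def run_copy_job(volume: Volume, expected_generation: int, new_data: str) -> None:
    """Host side (assumed behavior): verify the generation before modifying."""
    if volume.generation != expected_generation:
        raise GenerationMismatch(
            f"expected {expected_generation}, found {volume.generation}")
    volume.data = new_data
    volume.generation += 1  # assumed: a completed job bumps the generation


def fence_job(volume: Volume, expected_generation: int) -> bool:
    """Engine side: bump the generation so a not-yet-started job fails.

    Returns False if the generation already changed, meaning the job
    ran (or was fenced) before this call and there is nothing to fence.
    """
    if volume.generation != expected_generation:
        return False
    volume.generation += 1
    return True
```

With a scheme like this, the engine no longer has to wait for the host: once the fence succeeds, a job that starts late fails its generation check and leaves the entity untouched.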

Comment 1 Kevin Alon Goldblatt 2017-02-06 12:47:49 UTC
Verified with the following code:
-----------------------------------------------------------------------
Version-Release number of selected component (if applicable):
vdsm-4.19.4-1.el7ev.x86_64
rhevm-4.1.0.3-0.1.el7.noarch
ovirt-engine-4.1.0.3-0.1.el7.noarch

Verified with the following scenario:
-----------------------------------------------------------------------
Steps to Reproduce:
1. Move a disk in a data center with version >= 4.1.
2. Stop the vdsm service on the performing host before the job starts. Result: the jobs fail gracefully.



Moving to VERIFIED!

