Bug 1272083

Summary: Consume fix for "iscsi_session recovery_tmo revert back to default when a path becomes active"
Product: [oVirt] vdsm Reporter: Allon Mureinik <amureini>
Component: General Assignee: Nir Soffer <nsoffer>
Status: CLOSED CURRENTRELEASE QA Contact: Aharon Canan <acanan>
Severity: high Docs Contact:
Priority: high    
Version: 4.14.0 CC: amureini, bazulay, bugs, ecohen, eedri, lsurette, mgoldboi, nsoffer, rbalakri, sbonazzo, ycui, yeylon, ylavi
Target Milestone: ovirt-3.5.6 Flags: rule-engine: planning_ack?
amureini: devel_ack+
rule-engine: testing_ack+
Target Release: 4.16.28   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1253790 Environment:
Last Closed: 2015-12-22 13:24:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1253789, 1253790    
Bug Blocks:    

Description Allon Mureinik 2015-10-15 12:47:19 UTC
+++ This bug was initially created as a clone of Bug #1253790 +++

Description of problem:

The default iSCSI replacement_timeout is 120 seconds, which makes iSCSI
failover in a multipath setup too slow. In vdsm, this may block multiple
unrelated vdsm threads for many minutes when an lvm, multipath, or
SCSI scan operation blocks.

This issue was resolved in multipath (bug 1099932) by setting the iscsi
session recovery_tmo sysfs attribute to the multipath fast_io_fail_tmo
value (5 seconds in the vdsm configuration). However, this setting
reverted to the default 120 seconds after a device went down and up
again, or after a restart of the iscsid daemon (bug 1139038).

This issue was fixed in kernel-3.10.0-295.el7. In this version, setting
the session recovery_tmo using sysfs overrides the default value defined
in the iscsid configuration file.
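
One way to observe the behavior described above is to read recovery_tmo for
every iSCSI session before and after a path goes down and comes back. The
following is a minimal sketch, not part of vdsm: it assumes the sessions are
exposed under /sys/class/iscsi_session/sessionN/recovery_tmo, and the function
names and the expected value of 5 seconds (taken from the vdsm configuration
mentioned above) are illustrative only.

```python
import glob
import os


def read_recovery_tmo(sysfs_root="/sys/class/iscsi_session"):
    """Return {session_name: recovery_tmo_seconds} for every iSCSI session."""
    timeouts = {}
    pattern = os.path.join(sysfs_root, "session*", "recovery_tmo")
    for path in sorted(glob.glob(pattern)):
        # The parent directory name (e.g. "session1") identifies the session.
        session = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            timeouts[session] = int(f.read().strip())
    return timeouts


def sessions_with_unexpected_tmo(expected=5,
                                 sysfs_root="/sys/class/iscsi_session"):
    """List sessions whose recovery_tmo differs from the expected value."""
    current = read_recovery_tmo(sysfs_root)
    return [name for name, tmo in current.items() if tmo != expected]
```

On a host with the fixed kernel, sessions_with_unexpected_tmo() would be
expected to stay empty after a path fails and recovers; on an older kernel,
the affected sessions would be expected to show up with 120.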

Vdsm needs to require a kernel containing the fix for this issue on Fedora
versions that include this fix.
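
For illustration only, a host could be checked against the kernel version
named above (3.10.0-295.el7) with a sketch like the following. This is an
assumption-laden example, not the vdsm change itself, which is a packaging
requirement. The helper names are hypothetical, the comparison is a
simplification of RPM version ordering, and it only handles el7-style release
strings; the Fedora kernel version carrying the fix is not stated in this bug.

```python
import re

# Kernel version named in this bug as containing the fix (RHEL 7).
REQUIRED = "3.10.0-295.el7"


def release_key(release):
    """Parse '3.10.0-295.el7.x86_64' into the tuple (3, 10, 0, 295)."""
    if ".el7" not in release:
        # Other distributions use different numbering; refuse to guess.
        raise ValueError("not an el7 kernel release: %r" % release)
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def kernel_has_fix(release, required=REQUIRED):
    """Return True if the el7 kernel release is at least the required one."""
    return release_key(release) >= release_key(required)
```

On a running host, the release string could be obtained with
platform.release() from the Python standard library and passed to
kernel_has_fix().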

--- Additional comment from Eyal Edri on 2015-10-14 18:38:14 IDT ---

Shouldn't this bug have the 3.5.z? flag set if the target milestone is set to 3.5.6?
Trying to understand how clone candidates are treated in the new classification.

--- Additional comment from Allon Mureinik on 2015-10-15 10:45:06 IDT ---

(In reply to Eyal Edri from comment #1)
> Shouldn't this bug have the 3.5.z? flag set if the target milestone is set to
> 3.5.6?
> Trying to understand how clone candidates are treated in the new
> classification.
Yeah, probably so.
This is waiting for qa-ack+ so we can clone it.
Gil - can you assist?
===============================================================================

Manually cloning, as the job refuses to clone vdsm bugs for some obscure reason.

Comment 1 Red Hat Bugzilla Rules Engine 2015-10-19 11:01:29 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 2 Aharon Canan 2015-11-10 16:37:44 UTC
Aren't this bug and the one below [1] the same? 

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1273421

Comment 3 Nir Soffer 2015-11-10 16:46:41 UTC
(In reply to Aharon Canan from comment #2)
> Aren't this bug and the one below [1] the same? 
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1273421

No. Bug 1273421 is about the multipath fix, which has been in place for some
time but was not enough. Multipath was configuring devices properly, but once
a device became faulty (e.g. due to a network issue) and then active again,
iscsid overrode the device configuration with the default (120 seconds). This
was recently fixed in the kernel, and now we require that kernel.

Comment 4 Aharon Canan 2015-11-11 06:17:42 UTC
But this is exactly what I checked in the other bug, that we require the kernel (please see comment #4 on https://bugzilla.redhat.com/show_bug.cgi?id=1273421).

Anyway, just to be sure about both and not miss anything,
can you please approve and add verification steps?

Comment 5 Nir Soffer 2015-11-12 08:05:14 UTC
(In reply to Aharon Canan from comment #4)
> But this is exactly what I checked in the other bug, that we require the
> kernel (please see comment #4 on
> https://bugzilla.redhat.com/show_bug.cgi?id=1273421).
> 
> Anyway, just to be sure about both and not miss anything,
> can you please approve and add verification steps?

You are correct, it is the same bug, but for different products. This is an
oVirt bug, and bug 1273421 is a RHEV bug.

The fix is the same: requiring the right kernel for RHEL/Fedora.

Comment 6 Aharon Canan 2015-11-12 08:41:40 UTC
Verified using vt18.2

Comment 7 Sandro Bonazzola 2015-12-22 13:24:50 UTC
oVirt 3.5.6 has been released and the bz verified, moving to closed current release.