Bug 1272083 - Consume fix for "iscsi_session recovery_tmo revert back to default when a path becomes active"
Status: CLOSED CURRENTRELEASE
Product: vdsm
Classification: oVirt
Component: General
Version: 4.14.0
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
Target Milestone: ovirt-3.5.6
Target Release: 4.16.28
Assigned To: Nir Soffer
QA Contact: Aharon Canan
storage
:
Depends On: 1253789 1253790
Blocks:
Reported: 2015-10-15 08:47 EDT by Allon Mureinik
Modified: 2017-11-19 05:40 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1253790
Environment:
Last Closed: 2015-12-22 08:24:50 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: planning_ack?
amureini: devel_ack+
rule-engine: testing_ack+




External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 47473 None None None Never

Description Allon Mureinik 2015-10-15 08:47:19 EDT
+++ This bug was initially created as a clone of Bug #1253790 +++

Description of problem:

The default iSCSI replacement_timeout is 120 seconds, which makes iSCSI
failover too slow in a multipath setup. In vdsm, this may block multiple
unrelated vdsm threads for many minutes when an lvm, multipath, or SCSI
scan operation blocks.

This issue was resolved in multipath (bug 1099932) by setting the iSCSI
session recovery_tmo sysfs attribute to the multipath fast_io_fail_tmo
value (5 seconds in the vdsm configuration). However, this setting
reverted to the default 120 seconds after a device went down and up
again, or after a restart of the iscsid daemon (bug 1139038).

This issue was fixed in kernel-3.10.0-295.el7. In this version, setting
session recovery_tmo using sysfs overrides the default value defined in
iscsid configuration file.
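
To check whether a host is affected, one can read recovery_tmo for each
iSCSI session from sysfs and compare it with the expected value. The
following is a minimal sketch, not part of vdsm. It assumes the sessions
are exposed under /sys/class/iscsi_session/session*/recovery_tmo and that
the expected value is the 5 seconds mentioned above. The function names
are hypothetical.

```python
import glob
import os

# Assumed sysfs location of iSCSI session attributes.
SYSFS_ISCSI_SESSIONS = "/sys/class/iscsi_session"


def read_recovery_tmo(root=SYSFS_ISCSI_SESSIONS):
    """Return {session_name: recovery_tmo_seconds} for every session found."""
    timeouts = {}
    pattern = os.path.join(root, "session*", "recovery_tmo")
    for path in sorted(glob.glob(pattern)):
        session = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            timeouts[session] = int(f.read().strip())
    return timeouts


def reverted_sessions(expected=5, root=SYSFS_ISCSI_SESSIONS):
    """Return the sessions whose recovery_tmo differs from `expected`."""
    return {
        name: tmo
        for name, tmo in read_recovery_tmo(root).items()
        if tmo != expected
    }


if __name__ == "__main__":
    for name, tmo in reverted_sessions().items():
        print("%s: recovery_tmo=%d (expected 5)" % (name, tmo))
```

Running this after a path has failed and become active again should print
nothing on a fixed kernel, and should list the sessions that went back to
120 seconds on an affected one.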

Vdsm needs to require a kernel containing the fix for this issue on
Fedora versions that include this fix.
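
The actual requirement belongs in the vdsm package dependencies, but the
same check can be illustrated at runtime. The sketch below compares a
kernel release string with the 3.10.0-295.el7 build named above. It is an
illustration under assumptions: it only understands el7-style release
strings, it returns None for anything else (for example Fedora kernels,
whose fixed version is not stated here), and it does not account for
possible backports to earlier builds. The function names are hypothetical.

```python
import platform
import re

# First el7 kernel build reported to contain the fix (from this bug).
MIN_EL7_KERNEL = (3, 10, 0, 295)


def parse_el7_release(release):
    """Parse e.g. '3.10.0-327.el7.x86_64' into (3, 10, 0, 327).

    Returns None if the string is not an el7-style kernel release.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None or ".el7" not in release:
        return None
    return tuple(int(group) for group in match.groups())


def has_recovery_tmo_fix(release):
    """Return True or False for el7 kernels, None when unknown."""
    parsed = parse_el7_release(release)
    if parsed is None:
        return None
    return parsed >= MIN_EL7_KERNEL


if __name__ == "__main__":
    release = platform.release()
    print("%s: fix present = %s" % (release, has_recovery_tmo_fix(release)))
```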

--- Additional comment from Eyal Edri on 2015-10-14 18:38:14 IDT ---

Shouldn't this bug have the 3.5.z? flag set if the target milestone is set to 3.5.6?
I'm trying to understand how clone candidates are treated in the new classification.

--- Additional comment from Allon Mureinik on 2015-10-15 10:45:06 IDT ---

(In reply to Eyal Edri from comment #1)
> Shouldn't this bug have the 3.5.z? flag set if the target milestone is set
> to 3.5.6?
> I'm trying to understand how clone candidates are treated in the new
> classification.
Yeah, probably so.
This is waiting for qa-ack+ so we can clone it.
Gil - can you assist?
===============================================================================

Manually cloning, as the job refuses to clone vdsm bugs for some obscure reason.
Comment 1 Red Hat Bugzilla Rules Engine 2015-10-19 07:01:29 EDT
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Comment 2 Aharon Canan 2015-11-10 11:37:44 EST
Aren't this bug and the one below [1] the same? 

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1273421
Comment 3 Nir Soffer 2015-11-10 11:46:41 EST
(In reply to Aharon Canan from comment #2)
> Aren't this bug and the one below [1] the same? 
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1273421

No. Bug 1273421 is about the multipath fix, which has been in place for some
time but was not enough. Multipath was configuring devices properly, but once
a device became faulty (e.g. network issue) and then active again, iscsid
overrode the device configuration with the default (120 seconds). This
was fixed recently in the kernel, and now we require that kernel.
Comment 4 Aharon Canan 2015-11-11 01:17:42 EST
But this is exactly what I checked in the other bug, that we require kernel (Please see comment #4 on https://bugzilla.redhat.com/show_bug.cgi?id=1273421)

Anyway, just to be sure about both and not miss something,
can you please approve and add verification steps?
Comment 5 Nir Soffer 2015-11-12 03:05:14 EST
(In reply to Aharon Canan from comment #4)
> But this is exactly what I checked in the other bug, that we require kernel
> (Please see comment #4 on
> https://bugzilla.redhat.com/show_bug.cgi?id=1273421)
> 
> Anyway, just to be sure about both and not miss something,
> can you please approve and add verification steps?

You are correct, it is the same bug, but for different products. This is an
oVirt bug, and bug 1273421 is a RHEV bug.

The fix is the same fix, requiring the right kernel for RHEL/Fedora.
Comment 6 Aharon Canan 2015-11-12 03:41:40 EST
Verified using vt18.2
Comment 7 Sandro Bonazzola 2015-12-22 08:24:50 EST
oVirt 3.5.6 has been released and the bz verified, moving to closed current release.
