1597472 – tcmu-runner: add ring reset support

Summary: tcmu-runner: add ring reset support

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	tcmu-runner
Sub Component:
Version:	cns-3.10
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	CNS 3.10
Assignee:	Xiubo Li
QA Contact:	Sweta Anandpara
Docs Contact:
URL:
Whiteboard:
Depends On:	1599669
Blocks:	1568862

Reported:	2018-07-03 02:08 UTC by Xiubo Li
Modified:	2018-09-12 09:28 UTC
CC List:	11 users
Fixed In Version:	tcmu-runner-1.2.0-23.el7rhgs
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-09-12 09:27:16 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2018:2691	0	None	None	None	2018-09-12 09:28:22 UTC

Description Xiubo Li 2018-07-03 02:08:07 UTC

Description of problem:

The kernel now supports the ring buffer reset feature, which will fix the stuck-command bug described in https://bugzilla.redhat.com/show_bug.cgi?id=1596684#c5.

There are two ways to fix this:
1. Set cmd_time_out to a non-zero value so that the SCSI commands stuck in the ring buffer time out.
2. Reset the ring buffer to clear out the stuck SCSI commands. This BZ covers this case.
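
The first option above can be sketched as a single write to the device's configfs directory. This is a minimal illustration, not the tcmu-runner implementation: it assumes the kernel exposes cmd_time_out (in seconds) under attrib/ in the TCMU device's configfs directory, and the helper name and the dev_dir argument are hypothetical.

```python
from pathlib import Path


def set_cmd_time_out(dev_dir, seconds):
    """Write a non-zero command timeout to the device's configfs attribute.

    dev_dir is the device's configfs directory (hypothetical argument; the
    attrib/cmd_time_out layout is an assumption, not taken from this bug).
    """
    if seconds <= 0:
        # A value of zero leaves the timeout disabled, so stuck commands
        # would never expire.
        raise ValueError("cmd_time_out must be non-zero")
    attr = Path(dev_dir) / "attrib" / "cmd_time_out"
    attr.write_text(f"{seconds}\n")
    return attr
```

On a real system dev_dir would point at the device's directory under the target configfs mount, and the write needs root privileges.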

The tcmu-runner patches are:

commit 5bf4f822c5d195a5588908eefffeae36ef9e0080
Author: Mike Christie <mchristi>
Date:   Thu Dec 14 20:01:00 2017 -0600

    libtcmu: fix unclean shutdown and restart
    
    If we restart a daemon using libtcmu while IO is in flight
    the kernel could have commands partially completed. This
    patch has us block the device so new IO is stopped, and
    then we reset the ring to a clean state.
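
The block-then-reset sequence described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the libtcmu code: it assumes the kernel exposes block_dev and reset_ring under action/ in the device's configfs directory, that writing 1 triggers each action, and that writing 0 to block_dev unblocks the device. The helper name and dev_dir are hypothetical.

```python
from pathlib import Path


def block_and_reset_ring(dev_dir):
    """Block new IO, reset the ring to a clean state, then unblock."""
    action = Path(dev_dir) / "action"
    block = action / "block_dev"
    # Stop new IO from being queued to the ring first.
    block.write_text("1\n")
    try:
        # With the device blocked, ask the kernel to reset the ring.
        (action / "reset_ring").write_text("1\n")
    finally:
        # Always unblock, even if the reset write failed.
        block.write_text("0\n")
```

Blocking first matters because a reset while new commands are still arriving could leave the ring in an inconsistent state again.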


commit 9d5562056953a465e3b21efa7a98048026dcc443
Author: Xiubo Li <xiubli>
Date:   Sat Mar 3 23:08:01 2018 -0500

    libtcmu: add tcmu_is_ring_reset_support support
    
    This keeps compatibility with old kernel versions that do
    not support the ring reset operations.
    
    Signed-off-by: Xiubo Li <xiubli>
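
One plausible way to make such a compatibility check is to test whether the kernel exposes the reset attribute at all. The sketch below assumes the attribute lives at action/reset_ring in the device's configfs directory. The function name is hypothetical and is not the libtcmu function named in the commit title.

```python
from pathlib import Path


def ring_reset_supported(dev_dir):
    """Return True if the device directory exposes a reset_ring action.

    Older kernels without ring reset support would not create this file
    (assumed layout), so callers can skip the reset on those kernels.
    """
    return (Path(dev_dir) / "action" / "reset_ring").exists()
```

A caller would check this once per device and fall back to the old behaviour when it returns False.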


commit 12d551bb6eafa71e91ca702a9f5d790aa3a08c57
Author: Mike Christie <mchristi>
Date:   Thu Dec 14 20:04:39 2017 -0600

    runner/lib: flush device during shutdown
    
    If remove_device is called during device removal then the kernel
    will not have done the REMOVE event until all IO has stopped
    and new IO will not be sent.
    
    If remove_device is called due to the daemon/app being stopped
    then the kernel will not have been notified. This has the lib
    block new IO and wait for the commands in the ring to be
    completed normally before removing the device.
    
    If for some reason we hang, then we can still kill runner
    and the previous patch will handle any cleanup needed.
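
The shutdown flush described in this commit message amounts to blocking new IO and then waiting, with a bound, for the ring to drain. A minimal sketch follows. The callbacks block_new_io and ring_is_empty, the timeout, and the poll interval are all hypothetical stand-ins, not names or values from tcmu-runner.

```python
import time


def flush_device(block_new_io, ring_is_empty, timeout=30.0, poll=0.1,
                 sleep=time.sleep, now=time.monotonic):
    """Block new IO, then wait for in-flight commands to complete.

    Returns True if the ring drained before the deadline, False otherwise.
    On False the caller can still kill the daemon and rely on a ring reset
    at the next start.
    """
    block_new_io()
    deadline = now() + timeout
    while not ring_is_empty():
        if now() >= deadline:
            return False
        sleep(poll)
    return True
```

The sleep and now parameters exist only so the wait can be exercised without real delays.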

...


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Yaniv Kaul 2018-07-08 09:04:12 UTC

Hi Xiubo,
Bug 1562587 is for RHEL 7.6. Are we expecting a backport to 7.5.z that we can rely on, or should we postpone this bug to 3.11 or later to match 7.6?

Comment 19 errata-xmlrpc 2018-09-12 09:27:16 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2691
