Description of problem:

The kernel now supports the ring buffer reset feature, which fixes the stuck-command bug described in https://bugzilla.redhat.com/show_bug.cgi?id=1596684#c5.

There are two ways to fix this:

1. Set cmd_time_out to a non-zero value so that the SCSI commands stuck in the ring buffer time out.
2. Reset the ring buffer to clear the stuck SCSI commands. This BZ covers this case.

The tcmu-runner patches are:

commit 5bf4f822c5d195a5588908eefffeae36ef9e0080
Author: Mike Christie <mchristi>
Date:   Thu Dec 14 20:01:00 2017 -0600

    libtcmu: fix unclean shutdown and restart

    If we restart a daemon using libtcmu while IO is in flight the
    kernel could have commands partially completed. This patch has us
    block the device so new IO is stopped, and then we reset the ring
    to a clean state.

commit 9d5562056953a465e3b21efa7a98048026dcc443
Author: Xiubo Li <xiubli>
Date:   Sat Mar 3 23:08:01 2018 -0500

    libtcmu: add tcmu_is_ring_reset_support support

    This will compatible with the old kernel version which could not
    support the ring reset operations.

    Signed-off-by: Xiubo Li <xiubli>

commit 12d551bb6eafa71e91ca702a9f5d790aa3a08c57
Author: Mike Christie <mchristi>
Date:   Thu Dec 14 20:04:39 2017 -0600

    runner/lib: flush device during shutdown

    If remove_device is called during device removal then the kernel
    will not have done the REMOVE event until all IO has stopped and
    new IO will not be sent.

    If remove_device is called due to the daemon/app being stopped
    then the kernel will not have been notified. This has the lib
    block new IO and wait for the commands in the ring to be completed
    normally before removing the device. If for some reason we hang,
    then we can still kill runner and the previous patch will handle
    any cleaned up needed.

...

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
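For illustration only, the block-then-reset sequence described in the first commit message, together with the compatibility check from the second, can be sketched as below. This is not the libtcmu implementation. It assumes the kernel exposes the per-device configfs attributes action/block_dev and action/reset_ring, and that a missing reset_ring file means the kernel is too old to support ring reset. The helper names (write_attr, ring_reset_supported, reset_stuck_ring) and the dev_dir argument are hypothetical.

```python
import os


def write_attr(path, value):
    # Write a single value to a configfs-style attribute file.
    with open(path, "w") as f:
        f.write(str(value))


def ring_reset_supported(dev_dir):
    # Assumption: older kernels do not create action/reset_ring,
    # so its presence is used as the feature check.
    return os.path.exists(os.path.join(dev_dir, "action", "reset_ring"))


def reset_stuck_ring(dev_dir):
    # Returns False without touching the device when ring reset
    # is not available (old kernel).
    if not ring_reset_supported(dev_dir):
        return False
    action = os.path.join(dev_dir, "action")
    # Block the device first so that no new IO is queued.
    write_attr(os.path.join(action, "block_dev"), 1)
    try:
        # Reset the ring to a clean state.
        write_attr(os.path.join(action, "reset_ring"), 1)
    finally:
        # Unblock the device even if the reset write failed.
        write_attr(os.path.join(action, "block_dev"), 0)
    return True
```

On a real system dev_dir would be the device's directory under the target configfs tree, and the writes would require root. Taking the directory as a parameter keeps the sketch testable against an ordinary temporary directory.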
Hi Xiubo, Bug 1562587 is for RHEL 7.6. Are we expecting a backport to 7.5.z that we can rely on, or should we postpone this bug to 3.11 or later to match 7.6?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2691