Bug 2007687

Summary: [GSS]ESXi hosts regularly losing the iSCSI connections and not able to recover (5.0z1)
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Ilya Dryomov <idryomov>
Component: iSCSI Assignee: Xiubo Li <xiubli>
Status: CLOSED ERRATA QA Contact: Gopi <gpatta>
Severity: high Docs Contact: Mary Frances Hull <mhull>
Priority: high    
Version: 3.3 CC: agunn, ceph-eng-bugs, ceph-qe-bugs, gjose, gpatta, gsitlani, idryomov, kjosy, mhull, pdhange, pnataraj, tserlin, vereddy, xiubli
Target Milestone: ---   
Target Release: 5.0z1   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: tcmu-runner-1.5.4-2.el8cp Doc Type: Bug Fix
Doc Text:
.The RADOS Block Device handler correctly parses configuration strings
Previously, the RADOS Block Device (RBD) handler parsed configuration strings with the `strtok()` function, which is not thread-safe. As a result, the image name could be parsed incorrectly from the configuration string when creating or reopening an image, and the image failed to open. With this release, the RBD handler uses the thread-safe `strtok_r()` function, so configuration strings are parsed correctly.
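The difference between the two functions can be illustrated with a minimal sketch. This is not the tcmu-runner code: the helper name `parse_pool_image()` and the `pool/image[;options]` string layout are assumptions made for illustration only. `strtok()` keeps its scan position in hidden static state shared by all threads, so two threads tokenizing at the same time can corrupt each other's results; `strtok_r()` keeps that position in a caller-supplied pointer instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from tcmu-runner): split a string of the
 * assumed form "pool/image[;options]" into its pool and image names.
 * Returns 0 on success, -1 on a malformed string or a too-small buffer.
 */
static int parse_pool_image(const char *cfg, char *pool, size_t pool_len,
                            char *image, size_t image_len)
{
    char *copy, *save = NULL, *tok;
    int ret = -1;

    assert(cfg && pool && image);

    /* strtok_r() writes NUL bytes into its input, so work on a copy. */
    copy = strdup(cfg);
    if (!copy)
        return -1;

    /* The scan position lives in 'save', local to this call, so
     * concurrent callers cannot disturb each other. */
    tok = strtok_r(copy, "/", &save);
    if (!tok || strlen(tok) >= pool_len)
        goto out;
    strcpy(pool, tok);

    /* Continue from the saved position; stop at the first ';'. */
    tok = strtok_r(NULL, ";", &save);
    if (!tok || strlen(tok) >= image_len)
        goto out;
    strcpy(image, tok);

    ret = 0;
out:
    free(copy);
    return ret;
}
```

Called with a made-up string such as `"mypool/myimage;osd_op_timeout=30"`, the helper fills `pool` with `mypool` and `image` with `myimage`. Replacing the two `strtok_r()` calls with `strtok()` would give the same result in a single thread, but could return tokens from another thread's string under concurrency, which matches the kind of failure described above.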
Story Points: ---
Clone Of: 2003221 Environment:
Last Closed: 2021-11-02 16:39:46 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1959686    

Comment 8 Preethi 2021-10-04 12:37:46 UTC
I was not able to reproduce the issue below with tcmu-runner-1.5.4-2.el8cp and the latest Ceph build.

"2021-09-01 12:27:55.526 3154 [ERROR] tcmu_rbd_open:877 rbd/igw-hdd-01.vmw-hdd-scratch-01: Could not get image name
2021-09-01 12:27:56.529 3154 [ERROR] tcmu_acquire_dev_lock:395 rbd/igw-hdd-01.vmw-hdd-scratch-01: Could not reopen "

Steps followed:
1) Configured iSCSI with 4 gateways collocated with OSDs
2) Created a 100 GB disk
3) Using an ESXi initiator, created a VMFS datastore and ran I/O
4) While I/O was running, performed HA testing

The latest build has passed the standard QE iSCSI tests. After discussing with Ilya Dryomov, I am moving this bug to VERIFIED, because the issue is rarely reproducible and there are no reliable steps to verify it.

Comment 14 errata-xmlrpc 2021-11-02 16:39:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4105