Bug 2239455

Summary: [RHCS-5.X backport] [RFE] BLK/Kernel: Improve protection against running one OSD twice
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Bipin Kunal <bkunal>
Component: RADOS
Assignee: Prashant Dhange <pdhange>
Status: CLOSED ERRATA
QA Contact: skanta
Severity: high
Docs Contact: Ranjini M N <rmandyam>
Priority: high
Version: 3.3
CC: akupczyk, amathuri, bhubbard, ceph-eng-bugs, cephqe-warriors, choffman, ksirivad, lflores, nojha, pdhange, pdhiran, rfriedma, rmandyam, rzarzyns, skanta, sseshasa, tserlin, vereddy, vumrao
Target Milestone: ---
Keywords: FutureFeature
Target Release: 5.3z6
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ceph-16.2.10-240.el8cp
Doc Type: Enhancement
Doc Text:
.Improved protection against running BlueStore twice
Previously, advisory locking was used to protect against running BlueStore twice. This works well on bare-metal deployments. However, in containers, each container creates its own unrelated inode with `mknod b` for the same block device, so an advisory lock taken through one inode does not conflict with a lock taken through another. As a result, two containers could each assume that they had exclusive access to the device, which led to severe errors. With this release, advisory locking is reinforced with the `O_EXCL` open flag for block devices, which the kernel enforces on the device itself rather than on a particular inode. It is no longer possible to open one BlueStore instance twice, so overwrites and corruption no longer occur.
Story Points: ---
Clone Of: 2239449
Environment:
Last Closed: 2024-02-08 16:55:29 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2149453, 2239449
Bug Blocks: 2258797

Description Bipin Kunal 2023-09-18 13:03:15 UTC
+++ This bug was initially created as a clone of Bug #2239449 +++

+++ This bug was initially created as a clone of Bug #2149453 +++

Description of problem:
[RFE] BLK/Kernel: Improve protection against running one OSD twice

https://tracker.ceph.com/issues/58113
https://github.com/ceph/ceph/pull/49132
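
The Doc Text describes reinforcing an advisory lock with `O_EXCL` on the block device. The following C sketch illustrates that general technique on Linux; it is not the code from the linked PR, and the helper name `open_exclusive` is hypothetical. `flock()` locks an open file description on a particular inode, so two containers that each created their own device node with `mknod b` do not see each other's lock. By contrast, `O_EXCL` without `O_CREAT` on a block device asks the kernel to claim the device itself, so a second exclusive open fails with `EBUSY` no matter which device node it goes through.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a device (or, for testing, a regular file) for
 * exclusive use. Returns an fd >= 0 on success or -errno on failure.
 */
int open_exclusive(const char *path)
{
    assert(path != NULL);

    struct stat st;
    if (stat(path, &st) < 0)
        return -errno;

    int flags = O_RDWR | O_CLOEXEC;
    /*
     * O_EXCL without O_CREAT is only defined on Linux for block devices:
     * the kernel claims the device itself, so a second O_EXCL open fails
     * with EBUSY regardless of which device node (inode) it goes through.
     * The stat()/open() pair is not atomic; acceptable for a sketch.
     */
    if (S_ISBLK(st.st_mode))
        flags |= O_EXCL;

    int fd = open(path, flags);
    if (fd < 0)
        return -errno;

    /*
     * Advisory lock: conflicts only with other flock() holders on the
     * same inode, which is why it alone is not enough across containers.
     */
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }
    return fd;
}
```

With a regular file only the `flock()` layer applies, so a second call on the same path returns `-EWOULDBLOCK` until the first descriptor is closed. On a real block device, a conflicting exclusive open would instead fail earlier, with `-EBUSY`.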


Version-Release number of selected component (if applicable):
RHCS 3.x and above

--- Additional comment from Red Hat Bugzilla on 2023-01-01 00:39:44 IST ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2023-01-01 14:08:37 IST ---

Account disabled by LDAP Audit

--- Additional comment from Red Hat Bugzilla on 2023-01-09 13:58:11 IST ---

Account disabled by LDAP Audit for extended failure

--- Additional comment from Radoslaw Zarzynski on 2023-03-27 22:35:46 IST ---

The patchset is merged for Reef (which will become 7.0).
Clone tickets are needed if we want to backport to 6.x; I will create them.

--- Additional comment from Neha Ojha on 2023-07-06 23:54:09 IST ---

Adam, it looks like we have outstanding concerns upstream: https://github.com/ceph/ceph/pull/49132#issuecomment-1429485027?

--- Additional comment from Radoslaw Zarzynski on 2023-09-12 23:17:13 IST ---

The commit is already in Reef. All the status changes here
are related to the comments (linked above) in the PR.

--- Additional comment from Adam Kupczyk on 2023-09-14 18:45:20 IST ---

The concerns noted do not make sense to me, and I decided to ignore them.
There is no logic to the claim that xfs-reclaim would interfere with opening a different block device.
If that is somehow the case and BS (BlueStore) cannot start, then the protection is working as intended.
If xfs-reclaim opens the BS block device, then it is obviously an XFS problem.

Comment 11 errata-xmlrpc 2024-02-08 16:55:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.3 Security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:0745