Bug 2239449

Summary: [RHCS-6.X backport] [RFE] BLK/Kernel: Improve protection against running one OSD twice
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Bipin Kunal <bkunal>
Component: RADOS Assignee: Adam Kupczyk <akupczyk>
Status: CLOSED ERRATA QA Contact: skanta
Severity: high Docs Contact: Disha Walvekar <dwalveka>
Priority: high    
Version: 3.3 CC: akupczyk, amathuri, bhubbard, ceph-eng-bugs, cephqe-warriors, choffman, dwalveka, ksirivad, lflores, nojha, pdhange, pdhiran, rfriedma, rzarzyns, skanta, sseshasa, tserlin, vereddy, vumrao
Target Milestone: --- Keywords: FutureFeature
Target Release: 6.1z3   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ceph-17.2.6-160.el9cp Doc Type: Enhancement
Doc Text:
Feature: Improve protection against running an OSD twice at the same time on one block device by reinforcing advisory locking with the O_EXCL open flag, which has dedicated semantics for block devices. Reason: BlueStore was already protected against running twice by advisory locking. This worked (and still works) very well on bare-metal deployments. However, with containers it is possible to create unrelated inodes that target the same block device ("mknod b"). As a result, two containers can each believe they have exclusive access, which almost always leads to severe errors. Result: It is no longer possible to open one BlueStore instance twice. Some categories of unexplained overwrites and corruption are now eliminated.
Story Points: ---
Clone Of: 2149453
: 2239455 (view as bug list) Environment:
Last Closed: 2023-12-12 13:55:51 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2149453    
Bug Blocks: 2239455, 2247624    
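
The Doc Text above describes combining an advisory lock with an O_EXCL open for block devices. A minimal sketch of that idea follows. It is not the Ceph implementation (which is C++ and lives in the linked upstream PR); the function names `exclusive_open_flags` and `open_exclusive` are hypothetical, and the use of flock() as the advisory lock is an assumption made for illustration. It relies on the Linux behavior documented in open(2): O_EXCL without O_CREAT is honored for block devices and fails with EBUSY when the device is already in use.

```python
import fcntl
import os
import stat


def exclusive_open_flags(mode: int) -> int:
    """Return open(2) flags for a path with the given st_mode.

    O_EXCL is added only for block devices, because Linux defines
    O_EXCL without O_CREAT only for that file type.
    """
    flags = os.O_RDWR
    if stat.S_ISBLK(mode):
        flags |= os.O_EXCL
    return flags


def open_exclusive(path: str) -> int:
    """Open `path` and take a non-blocking advisory lock on it.

    Hypothetical helper: raises OSError if the device is busy (EBUSY
    from O_EXCL) or if another open file description already holds
    the advisory lock.
    """
    flags = exclusive_open_flags(os.stat(path).st_mode)
    fd = os.open(path, flags)
    try:
        # Advisory lock: tied to the inode, so it cannot see a second
        # device node created elsewhere for the same block device.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd
```

On a regular file only the advisory lock applies, so a second `open_exclusive()` call on the same path fails while the first descriptor is held. On a block device the O_EXCL flag is expected to add the kernel-level check that also covers a second device node pointing at the same device; that path needs root and a real device, so it is not exercised here.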

Description Bipin Kunal 2023-09-18 12:56:34 UTC
+++ This bug was initially created as a clone of Bug #2149453 +++

Description of problem:
[RFE] BLK/Kernel: Improve protection against running one OSD twice

https://tracker.ceph.com/issues/58113
https://github.com/ceph/ceph/pull/49132


Version-Release number of selected component (if applicable):
RHCS 3.x and above

--- Additional comment from Radoslaw Zarzynski on 2023-03-27 22:35:46 IST ---

The patchset is merged for Reef (which will become 7.0).
Clone tickets are needed if we want to backport to 6.x; I will create them.

--- Additional comment from Neha Ojha on 2023-07-06 23:54:09 IST ---

Adam, looks like we have outstanding concerns in upstream https://github.com/ceph/ceph/pull/49132#issuecomment-1429485027?

--- Additional comment from Radoslaw Zarzynski on 2023-09-12 23:17:13 IST ---

The commit is already in Reef. All the status changes here
are related to the comments (links above) in the PR.

--- Additional comment from Adam Kupczyk on 2023-09-14 18:45:20 IST ---

The concerns noted do not make sense to me, and I decided to ignore them.
There is no logic in the claim that xfs-reclaim should interfere with opening a different block device.
If somehow this is the case and BS cannot start, then the protection works as intended.
If xfs-reclaim opens the BS block device, then obviously it is an xfs problem.

Comment 13 errata-xmlrpc 2023-12-12 13:55:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:7740

Comment 14 Red Hat Bugzilla 2024-04-11 04:25:04 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days