Bug 2239449 - [RHCS-6.X backport] [RFE] BLK/Kernel: Improve protection against running one OSD twice
Summary: [RHCS-6.X backport] [RFE] BLK/Kernel: Improve protection against running one ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 3.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 6.1z3
Assignee: Adam Kupczyk
QA Contact: skanta
Docs Contact: Disha Walvekar
URL:
Whiteboard:
Depends On: 2149453
Blocks: 2239455 2247624
Reported: 2023-09-18 12:56 UTC by Bipin Kunal
Modified: 2024-04-11 04:25 UTC (History)

Fixed In Version: ceph-17.2.6-160.el9cp
Doc Type: Enhancement
Doc Text:
Feature: Improve protection against running an OSD twice at the same time on one block device. Advisory locking is now reinforced with the O_EXCL open flag, which has dedicated semantics for block devices.
Reason: BlueStore was previously protected against running twice only by advisory locking. This worked (and still works) very well on bare-metal deployments. With containers, however, it is possible to create unrelated inodes that target the same block device (using "mknod" with type "b"). As a result, two containers can each believe they have exclusive access. This almost always leads to severe errors.
Result: It is no longer possible to open one BlueStore instance twice. Some categories of unexplained overwrites and corruption are now eliminated.
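The combination described in the Doc Text can be sketched as follows. This is a minimal illustration, not the Ceph implementation: the helper name open_exclusive is hypothetical, and flock() stands in for whatever advisory lock the real code takes. It relies on the Linux-specific behavior that O_EXCL without O_CREAT is honored for block devices, so a second exclusive open of the same device fails with EBUSY regardless of which device node (inode) was used to reach it.

```python
import fcntl
import os
import stat


def open_exclusive(path):
    """Open `path` read-write and claim it for the caller.

    Hypothetical helper for illustration only (not Ceph code).
    Raises OSError if another opener already holds the claim.
    """
    flags = os.O_RDWR
    if stat.S_ISBLK(os.stat(path).st_mode):
        # Linux-specific: on a block device, O_EXCL without O_CREAT
        # requests an exclusive open of the device itself, so a second
        # exclusive opener gets EBUSY even through a different inode
        # created with mknod.
        flags |= os.O_EXCL
    fd = os.open(path, flags)
    try:
        # The advisory lock is tied to the inode of the opened path.
        # It catches a second opener of the same path, but not one
        # that reaches the device through an unrelated device node.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd
```

On a regular file only the advisory lock applies, which is enough to try the helper without root: a second open_exclusive() call on the same path fails while the first descriptor is open, and succeeds again after it is closed. Exercising the O_EXCL branch requires a real block device (for example a loop device) and sufficient privileges.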
Clone Of: 2149453
Clones: 2239455
Environment:
Last Closed: 2023-12-12 13:55:51 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | ceph ceph pull 53566 | 0 | None | open | quincy: blk/kernel: Add O_EXCL for block devices | 2023-09-20 22:59:38 UTC
Red Hat Issue Tracker | RHCEPH-7479 | 0 | None | None | None | 2023-09-18 12:57:47 UTC
Red Hat Product Errata | RHSA-2023:7740 | 0 | None | None | None | 2023-12-12 13:56:00 UTC

Description Bipin Kunal 2023-09-18 12:56:34 UTC
+++ This bug was initially created as a clone of Bug #2149453 +++

Description of problem:
[RFE] BLK/Kernel: Improve protection against running one OSD twice

https://tracker.ceph.com/issues/58113
https://github.com/ceph/ceph/pull/49132


Version-Release number of selected component (if applicable):
RHCS 3.x and above

--- Additional comment from Radoslaw Zarzynski on 2023-03-27 22:35:46 IST ---

The patchset is merged for Reef (which will become 7.0).
Clone tickets are needed if we want to backport to 6.x; I will create them.

--- Additional comment from Neha Ojha on 2023-07-06 23:54:09 IST ---

Adam, looks like we have outstanding concerns in upstream https://github.com/ceph/ceph/pull/49132#issuecomment-1429485027?

--- Additional comment from Radoslaw Zarzynski on 2023-09-12 23:17:13 IST ---

The commit is already in Reef. All the status changes here
are related to the comments (links above) in the PR.

--- Additional comment from Adam Kupczyk on 2023-09-14 18:45:20 IST ---

The concerns raised do not make sense to me, so I decided to ignore them.
There is no logic in the claim that xfs-reclaim should interfere with opening a different block device.
If that is somehow the case and BlueStore cannot start, then the protection works as intended.
If xfs-reclaim opens the BlueStore block device, then that is obviously an XFS problem.

Comment 13 errata-xmlrpc 2023-12-12 13:55:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:7740

Comment 14 Red Hat Bugzilla 2024-04-11 04:25:04 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

