Bug 2152053 - ceph orchestrator affected by ceph-volume inventory commands that hang and stay in D state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Volume
Version: 5.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 5.3z1
Assignee: Guillaume Abrioux
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-12-09 03:16 UTC by Vasishta
Modified: 2023-02-28 10:06 UTC (History)

Fixed In Version: ceph-16.2.10-98.el8cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-02-28 10:06:16 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 57088 | 0 | None | None | None | 2022-12-09 03:16:19 UTC
Github | ceph ceph pull 47535 | 0 | None | Merged | pacific: ceph-volume: system.get_mounts() refactor | 2023-01-17 11:45:17 UTC
Red Hat Issue Tracker | RHCEPH-5765 | 0 | None | None | None | 2022-12-09 03:44:55 UTC
Red Hat Product Errata | RHSA-2023:0980 | 0 | None | None | None | 2023-02-28 10:06:51 UTC

Description Vasishta 2022-12-09 03:16:19 UTC
Description of problem:
When a network mount is listed in /proc/mounts but the corresponding
server is down for any reason, the ceph-volume function that reads the
mounts (system.get_mounts(), per the linked pull request) hangs forever.
In a cluster deployed with cephadm, the consequence is that the
ceph-volume inventory commands it triggers hang and stay in D state
(uninterruptible sleep).
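
As a rough illustration of where the hang starts, the sketch below parses /proc/mounts-style text and lists the entries backed by a network filesystem, which are the ones that can block when their server is unreachable. It is not the ceph-volume code: the helper names and the set of filesystem types are assumptions made for this example, and the sample text stands in for a real /proc/mounts.

```python
# Hypothetical helpers for illustration only; not part of ceph-volume.
# The list of network filesystem types is an assumption and is not exhaustive.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "ceph", "fuse.ceph-fuse"}


def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mountpoint, fstype) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        entries.append((fields[0], fields[1], fields[2]))
    return entries


def network_mounts(text):
    """Return only the entries whose filesystem type is network-backed."""
    return [entry for entry in parse_mounts(text) if entry[2] in NETWORK_FS_TYPES]


# Sample input standing in for the contents of /proc/mounts.
SAMPLE = """\
/dev/sda1 / xfs rw,relatime 0 0
10.0.0.5:6789:/ /mnt/cephfs ceph rw,relatime,name=admin 0 0
server:/export /mnt/nfs nfs4 rw,relatime 0 0
"""

if __name__ == "__main__":
    for device, mountpoint, fstype in network_mounts(SAMPLE):
        print(f"{fstype}\t{mountpoint}\t{device}")
```

On a real node, the same functions could be given the text read from /proc/mounts. Reading that file does not touch the mounted filesystems; it is the later path resolution on a dead mount that can block.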

Downstream Context:
In our environment, ceph orch upgrade was stuck indefinitely. On examination, we found that one of the 12 nodes *might* have had some stale cephfs mounts, which caused operations such as df -h, df -l, and strace -o df.errors df to hang. The stuck upgrade could have the same cause, since both the ceph-volume inventory check and ceph orch upgrade are blocked.

Contextual Steps to Reproduce:
1. Configure a 5.x Ceph cluster.
2. Have some stale mounts on one of the cluster nodes.
3. Try ceph orch upgrade. Observe that the cluster is not upgraded and gives no indication why, then check that ceph-volume inventory is stuck.
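
To find which mount point is stale without hanging the shell the way df does, one option is to probe each path from a worker thread and give up after a timeout. The sketch below assumes that approach; it is not the upstream fix, and probe_with_timeout is a hypothetical name. Note that a worker blocked on a dead mount may itself stay blocked; the sketch only keeps the caller responsive.

```python
import os
import threading


def probe_with_timeout(probe, path, timeout=2.0):
    """Run probe(path) in a daemon thread.

    Returns True if the call finished (or failed with OSError) within
    `timeout` seconds, False if it is still blocked.
    """
    done = threading.Event()

    def worker():
        try:
            probe(path)
        except OSError:
            pass  # an error is still an answer: the mount is not hanging
        finally:
            done.set()

    threading.Thread(target=worker, daemon=True).start()
    return done.wait(timeout)


if __name__ == "__main__":
    # Example paths; on a real node these would come from /proc/mounts.
    for mountpoint in ("/", "/tmp"):
        ok = probe_with_timeout(os.stat, mountpoint)
        print(mountpoint, "responsive" if ok else "possibly stale")
```

A mount point reported as possibly stale is a candidate for the kind of entry that blocks ceph-volume inventory.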

Version-Release number of selected component (if applicable):
5.3
16.2.10-75

How reproducible:
Once


Actual results:
ceph-volume inventory gets stuck.

Expected results:
ceph-volume inventory should not hang on stale mounts.

Additional info:
Fix is already present in quincy.

Comment 1 Vasishta 2022-12-09 03:23:31 UTC
The fix has also been backported to pacific; this is a tracker for downstream inclusion of the fix.
Because this issue is one of the reasons the upgrade process gets stuck, I created this tracker.

[Workaround is to reboot the node, will try and update further]

Comment 9 Manisha Saini 2023-02-09 09:48:41 UTC
Hi @Guillaume Abrioux, could you please let us know the verification steps for this?

From the description, it looks like we need to upgrade a cluster that has stale mounts in /proc/mounts. I have a few questions:

1. Does verification of this BZ require an upgrade, or can it be tested some other way?

2. If it needs to be tested with an upgrade, how do we create a stale entry for a cephfs volume for verification?

3. Does the upgrade need to be performed from 5.3z1 to 6.0 for verification, and to reproduce the issue do we need to upgrade from 5.3 (LIVE) to 5.3z1 builds?

Comment 14 Manisha Saini 2023-02-14 10:23:42 UTC
Based on comment #11, comment #12, and comment #13, moving this BZ to the verified state.

Comment 15 errata-xmlrpc 2023-02-28 10:06:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 5.3 Bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0980

