Bug 2310385 - Upon CephFS volume recovery network fencing fails on external mode cluster
Summary: Upon CephFS volume recovery network fencing fails on external mode cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.17.0
Assignee: Subham Rai
QA Contact: Joy John Pinto
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-09-06 10:30 UTC by Joy John Pinto
Modified: 2024-10-30 14:33 UTC
CC: 6 users

Fixed In Version: 4.17.0-107
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-10-30 14:33:10 UTC
Embargoed:




Links
Github red-hat-storage/rook pull 733 (open): Bug 2310385: external: mds caps to healthchecker/cephfs users (last updated 2024-09-20 06:46:05 UTC)
Github rook/rook pull 14722 (Draft): external: mds caps to healthchecker/cephfs users (last updated 2024-09-17 04:05:17 UTC)
Red Hat Issue Tracker OCSBZM-9068 (last updated 2024-09-06 10:30:58 UTC)
Red Hat Product Errata RHSA-2024:8676 (last updated 2024-10-30 14:33:13 UTC)

Description Joy John Pinto 2024-09-06 10:30:12 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Upon CephFS volume recovery, network fencing fails on an external mode cluster.


Version of all relevant components (if applicable):
OCP 4.17 and ODF 4.17.0-92


Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
NA


Is there any workaround available to the best of your knowledge?
No


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1


Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Install OpenShift Data Foundation and deploy an app pod on an external mode cluster.
2. Shut down the node on which the CephFS RWO pod is running.
3. Once the node is down, add a taint:
```oc taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute```
Wait for some time (if the application pod and the rook operator are on the same node, wait a bit longer), then check the NetworkFence CR status, as shown below.
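
For reference, a minimal way to inspect the NetworkFence CRs (assuming the csi-addons NetworkFence CRD that ODF uses for fencing; the CR name shown is illustrative):

```
# List the NetworkFence CRs created by the rook operator for the failed node
oc get networkfences.csiaddons.openshift.io

# Inspect a specific CR; the name is typically derived from the node name
oc describe networkfences.csiaddons.openshift.io <networkfence-name>
```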



Actual results:
Network fence creation fails on an external mode cluster with the following error:

2024-09-06 10:14:57.907050 D | op-k8sutil: creating endpoint "rook-ceph-mgr-external". [{[{10.1.160.145  <nil> nil}] [] [{http-external-metrics 9283 TCP <nil>}]}]
2024-09-06 10:14:57.982565 D | exec: Running command: ceph tell mds.fsvol001.osd-0.icwduo client ls --format json --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.healthchecker --keyring=/var/lib/rook/openshift-storage/client.healthchecker.keyring --format json
2024-09-06 10:14:58.140259 E | ceph-cluster-controller: failed to handle node failure. failed to create network fence for node "compute-0".: failed to fence cephfs subvolumes: failed to list watchers for cephfs subvolumeName csi-vol-4bab4cfd-bc94-4886-a2c4-beeecab6dfb2. exit status 13
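
Exit status 13 maps to EACCES (permission denied), which is consistent with the linked rook PRs that add mds caps to the healthchecker/cephfs users: the "ceph tell mds.<fs> client ls" call issued as client.healthchecker fails without an mds capability. A sketch of how the caps could be inspected and amended on the external Ceph cluster (the mon/mgr/osd values below are placeholders, not the exact strings shipped by the fix):

```
# Show the current capabilities of the external-cluster healthchecker user
ceph auth get client.healthchecker

# Illustrative only: "ceph auth caps" replaces ALL caps for the user, so the existing
# mon/mgr/osd caps must be restated alongside the added mds cap. See the linked rook
# PRs for the exact caps applied by the fix.
ceph auth caps client.healthchecker \
    mon '<existing mon caps>' \
    mgr '<existing mgr caps>' \
    osd '<existing osd caps>' \
    mds 'allow *'
```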


Expected results:
Network fence creation should be successful upon tainting the node
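
A quick way to verify the expected outcome (field names per the csi-addons NetworkFence API; confirm against the installed CRD version):

```
# After the taint is applied, the CR should report a fenced state and a successful result
oc get networkfences.csiaddons.openshift.io <networkfence-name> \
    -o jsonpath='{.spec.fenceState}{"\n"}{.status.result}{"\n"}'
```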

Additional info:

Comment 10 Sunil Kumar Acharya 2024-09-27 06:46:45 UTC
Please update the RDT flag/text appropriately.

Comment 12 errata-xmlrpc 2024-10-30 14:33:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:8676

