Bug 2264031

Summary: mds pods continuously switching from "Active to Standby-replay" and "s-r to active" when debug 25 enabled.
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Component: ceph
ceph sub component: CephFS
Version: 4.15
Status: CLOSED DUPLICATE
Severity: medium
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Keywords: Reopened
Reporter: Nagendra Reddy <nagreddy>
Assignee: Kotresh HR <khiremat>
QA Contact: Elad <ebenahar>
CC: bniver, khiremat, muagarwa, pdonnell, sheggodu, sostapov, vshankar
Last Closed: 2024-09-11 13:23:04 UTC
Type: Bug

Description Nagendra Reddy 2024-02-13 13:16:50 UTC
Description of problem (please be as detailed as possible and provide log
snippets):


I observed that the MDS pods continuously switched from "Active --> Standby-Replay" and from "Standby-Replay --> Active" without any pod restart. This was observed only after enabling debug mode on the MDS with "ceph config set mds debug_mds 25".


Version of all relevant components (if applicable):

versions used:
ceph version 17.2.6-194.el9cp (d9f4aedda0fc0d99e7e0e06892a69523d2eb06dc) quincy (stable)
odf: 4.15.0-126
ocp: 4.15.0-0.nightly-2024-01-31-032716
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Set debug mode in ceph tools using the command "ceph config set mds debug_mds 25".
2. Run file-operation IO (creation, modification, etc.) on CephFS continuously until MDS cache utilisation reaches 100% of its limit.
3. Observe that the MDS pods continuously switch from "Active --> Standby-Replay" and from "Standby-Replay --> Active" without a pod restart.
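
To make step 3 measurable rather than a visual impression, the role changes can be counted from periodic samples. The following is only a minimal sketch: it assumes the role of one MDS pod has already been sampled at intervals into a time-ordered list of strings (how the samples are collected is left open), and the function names, role strings, and threshold are hypothetical and not part of any Ceph or ODF tooling.

```python
from typing import Iterable

# Role names as written in this report; the exact strings printed by your
# tooling may differ (assumption).
ACTIVE = "active"
STANDBY_REPLAY = "standby-replay"


def count_role_flips(samples: Iterable[str]) -> int:
    """Count Active <-> Standby-Replay changes in time-ordered role samples."""
    flips = 0
    previous = None
    for role in samples:
        role = role.strip().lower()
        if role not in (ACTIVE, STANDBY_REPLAY):
            # Ignore any other state, e.g. a sample taken mid-transition.
            continue
        if previous is not None and role != previous:
            flips += 1
        previous = role
    return flips


def is_flapping(samples: Iterable[str], threshold: int = 2) -> bool:
    """Hypothetical rule: 'threshold' or more flips in the window is flapping."""
    return count_role_flips(samples) >= threshold


if __name__ == "__main__":
    # Illustrative samples for one MDS pod, oldest first (not real data).
    observed = ["active", "standby-replay", "active", "active", "standby-replay"]
    print(count_role_flips(observed))
    print(is_flapping(observed))
```

A single flip in a window can be an ordinary failover; requiring two or more flips is one possible way to separate that from the continuous switching described here.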


Actual results:
The MDS pods continuously switch from "Active --> Standby-Replay" and from "Standby-Replay --> Active" without a pod restart.

Expected results:
The MDS should not switch continuously between Active and Standby-Replay when the pods have not failed.

Additional info:

Comment 16 Venky Shankar 2024-05-29 06:57:03 UTC
I'm closing this BZ since the issue isn't seen when debug_mds is set to its default. There is a lack of debug data from when the MDS gets restarted with debug_mds set to 25.

Please reopen with debug detail captured if necessary.

Comment 22 Red Hat Bugzilla 2025-01-10 04:25:03 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days