
Bug 2225891

Summary: Ceph FS down flag is not working
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: CephFS
Version: 6.1
Target Milestone: ---
Target Release: 7.1z3
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: unspecified
Status: CLOSED NOTABUG
Reporter: Amarnath <amk>
Assignee: Rishabh Dave <ridave>
QA Contact: Hemanth Kumar <hyelloji>
CC: ceph-eng-bugs, cephqe-warriors, gfarnum, mchangir, pdonnell, ridave, vshankar
Type: Bug
Last Closed: 2024-11-20 13:50:45 UTC

Description Amarnath 2023-07-25 18:00:29 UTC
Description of problem:
The CephFS "down" flag is not working: after setting it back to false, the filesystem does not come back up.

Test steps followed:
1. Created a filesystem.
2. Mounted the filesystem and wrote data to it.
3. Set the down flag to true, which took the FS down as expected:
    ceph fs set cephfs-down-flag down true
4. Unset the flag; the FS did not come back up:
    ceph fs set cephfs-down-flag down false
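
The toggle-and-verify sequence above can be automated with a small polling helper. A minimal sketch in Python, assuming `ceph fs status <fs> --format json` reports per-daemon states in an `mdsmap` list (the exact JSON field layout is an assumption and may vary by Ceph release):

```python
import json
import subprocess
import time

def mds_states(status_json: str) -> list:
    """Extract MDS daemon states from `ceph fs status --format json` output.

    The `mdsmap` field layout is an assumption; adjust for your release.
    """
    status = json.loads(status_json)
    return [d.get("state", "") for d in status.get("mdsmap", [])]

def wait_for_state(fs_name: str, want_active: bool, timeout: int = 300) -> bool:
    """Poll until the filesystem has (or lacks) an active MDS, or time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.check_output(
            ["ceph", "fs", "status", fs_name, "--format", "json"])
        has_active = any(s == "active" for s in mds_states(out.decode()))
        if has_active == want_active:
            return True
        time.sleep(5)
    return False

# Reproduction outline (requires a running cluster):
# subprocess.run(["ceph", "fs", "set", "cephfs-down-flag", "down", "true"], check=True)
# assert wait_for_state("cephfs-down-flag", want_active=False)
# subprocess.run(["ceph", "fs", "set", "cephfs-down-flag", "down", "false"], check=True)
# assert wait_for_state("cephfs-down-flag", want_active=True)  # reported bug: times out
```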

Test run Logs : http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-907CTT/

MDS logs : http://magna002.ceph.redhat.com/ceph-qe-logs/amar/ceph-mds.cephfs-down-flag.ceph-amk-recovery-6-1-jiiduj-node2.whnvfw.log

Ran `ceph fs status` in parallel:
http://magna002.ceph.redhat.com/ceph-qe-logs/amar/mds_logs.txt


[root@ceph-amk-recovery-6-1-jiiduj-node8 cephfs_kerneljw67otoull_1]# ceph versions
{
    "mon": {
        "ceph version 17.2.6-99.el9cp (6869830013a8878a3930e23c75d8b990f6b0c491) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.6-99.el9cp (6869830013a8878a3930e23c75d8b990f6b0c491) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.6-99.el9cp (6869830013a8878a3930e23c75d8b990f6b0c491) quincy (stable)": 12
    },
    "mds": {
        "ceph version 17.2.6-99.el9cp (6869830013a8878a3930e23c75d8b990f6b0c491) quincy (stable)": 7
    },
    "overall": {
        "ceph version 17.2.6-99.el9cp (6869830013a8878a3930e23c75d8b990f6b0c491) quincy (stable)": 24
    }
}

 



Comment 1 Patrick Donnelly 2023-07-25 19:52:10 UTC
Please attach the mon logs when running these commands. Please also include `ceph fs dump` before and after these commands are run.

Comment 3 Patrick Donnelly 2023-08-14 22:24:19 UTC
The rank became damaged, but the MDS log you attached is from a different time period, so I cannot see what happened. Can you reproduce with

debug_mds = 20
debug_ms = 1

and attach the MDS logs please.

Comment 12 Amarnath 2023-09-17 11:42:55 UTC
Hi Venky,

Results for these tests are not consistent; however, they fail most of the time (6 out of 10 runs). The latest run also failed:

http://magna002.ceph.redhat.com/cephci-jenkins/test-runs/18.2.0-20/Weekly/cephfs/9/tier-4_cephfs_recovery/Taking_the_cephfs_down_with_down_flag_0.log

Regards,
Amarnath

Comment 20 Rishabh Dave 2024-06-10 11:53:38 UTC
I tried reproducing this with upstream Quincy and could not reproduce it. I also tried Reef and main; the bug did not reproduce on either.

Eventually, I also wrote 3 tests for this. The first sets "down" to true, confirms the FS is down, sets "down" back to "false", and then waits for the MDS to reach the "up:active" state.

The second test does the same, but creates and uses an FS named "cephfs-down-flag" (since that name was mentioned in the reproduction recipe in the BZ description). The third repeats the first test 100 times. I ran all of these a few times and all of them passed every time, meaning this bug could not be reproduced.
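
In outline, the first and third of those tests boil down to a toggle-and-wait loop. A rough sketch with hypothetical helper names (the real tests live in the upstream test suites; `wait_down` and `wait_active` stand in for pollers that block until the FS is down or has an up:active MDS):

```python
import subprocess

def down_flag_arg(value: bool) -> str:
    """Render a Python bool the way `ceph fs set <fs> down` expects it."""
    return str(value).lower()

def set_down_flag(fs_name: str, value: bool) -> None:
    """Run `ceph fs set <fs> down true|false` (requires a live cluster)."""
    subprocess.run(
        ["ceph", "fs", "set", fs_name, "down", down_flag_arg(value)],
        check=True)

def down_flag_cycle(fs_name, wait_down, wait_active, repeats=1):
    """Test 1 (repeats=1) and test 3 (repeats=100): toggle the down flag
    and verify the MDS comes back to up:active after each cycle.

    `wait_down` / `wait_active` are hypothetical pollers returning True
    once the FS is down / has an up:active MDS, False on timeout."""
    for i in range(repeats):
        set_down_flag(fs_name, True)
        assert wait_down(fs_name), f"cycle {i}: FS never went down"
        set_down_flag(fs_name, False)
        assert wait_active(fs_name), f"cycle {i}: MDS not up:active again"
```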

Comment 36 Red Hat Bugzilla 2025-03-27 04:25:02 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days