Bug 2104968

Summary: MDS metadata damage detected after rebalancing event
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tupper Cole <tcole>
Component: CephFS
Assignee: Venky Shankar <vshankar>
Status: CLOSED COMPLETED
QA Contact: Hemanth Kumar <hyelloji>
Severity: urgent
Priority: unspecified
Version: 4.2
CC: ceph-eng-bugs, cephqe-warriors, gfarnum, gjose, vumrao
Target Milestone: ---
Target Release: 6.1
Hardware: Unspecified
OS: Linux
Last Closed: 2023-03-23 13:54:07 UTC
Type: Bug

Description Tupper Cole 2022-07-07 15:18:07 UTC
Description of problem:
A large ingest of data caused the OSDs backing the MDS metadata pool to fill beyond the backfill_toofull threshold, which forced the MDS to go read-only. After rebalancing, the MDS reports metadata damage. The cluster has recovered all PGs, but the metadata is still reported as damaged.
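
For reference, this kind of damage and the read-only state are usually confirmed with something like the following (daemon name is illustrative; use the active MDS from `ceph status`):

    ceph health detail                         # shows MDS_DAMAGE / MDS_READ_ONLY and any backfillfull OSDs
    ceph tell mds.chhq-supcphmd01 damage ls    # dumps the MDS damage table (dentry/backtrace/dir_frag entries)
    ceph osd dump | grep -E 'full_ratio|backfillfull_ratio'   # the thresholds that were exceeded during the ingest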




Version-Release number of selected component (if applicable):
14.2.22 - upstream.

Red Hat has agreed to help the customer stabilize this cluster long enough to migrate data off of it.



How reproducible:
Ongoing

Current status of the cluster shows no metadata damage, but the MDS is still starting up:

HEALTH_WARN 1 filesystem is degraded; 1 MDSs behind on trimming
FS_DEGRADED 1 filesystem is degraded
    fs cephfs is degraded
MDS_TRIM 1 MDSs behind on trimming
    mds.chhq-supcphmd01(mds.0): Behind on trimming (37827/128) max_segments: 128, num_segments: 37827
  cluster:
    id:     a2f1af2e-3a13-4043-9055-c3e4ea38d715
    health: HEALTH_WARN
            1 filesystem is degraded
            1 MDSs behind on trimming
 
  services:
    mon: 5 daemons, quorum chhq-supcphmn03,chhq-supcphmn02,chhq-supcphmn01,chhq-supcphmd01,chhq-supcphmd02 (age 20h)
    mgr: chhq-supcphmn02(active, since 11m), standbys: chhq-supcphmn01, chhq-supcphmn03
    mds: cephfs:1/1 {0=chhq-supcphmd01=up:replay} 1 up:standby
    osd: 2116 osds: 2112 up (since 22h), 2112 in (since 22h); 4 remapped pgs
    rgw: 15 daemons active (chhq-supcphmn01, chhq-supcphmn02.rgw0, chhq-supcphmn03.rgw0, chhq-supcphmn04.rgw0, chhq-supcphmn04.rgw3, chhq-supcphmn05.rgw0, chhq-supcphmn05.rgw1, chhq-supcphmn06.rgw0, chhq-supcphmn06.rgw1, chhq-supcphmn06.rgw2, chhq-supcphmn06.rgw3, chhq-supcphmn07.rgw0, chhq-supcphmn07.rgw1, chhq-supcphmn07.rgw2, chhq-supcphmn07.rgw3)
 
  task status:
 
  data:
    pools:   22 pools, 53696 pgs
    objects: 9.59G objects, 5.4 PiB
    usage:   14 PiB used, 7.4 PiB / 21 PiB avail
    pgs:     4809625/45003881250 objects misplaced (0.011%)
             53536 active+clean
             144   active+clean+scrubbing+deep
             11    active+clean+scrubbing
             4     active+remapped+backfilling
             1     active+clean+snaptrim
 
  io:
    client:   484 MiB/s rd, 485 MiB/s wr, 4.28k op/s rd, 4.47k op/s wr
    recovery: 3.5 MiB/s, 403 objects/s
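
Note on the trimming warning: the backlog (37827 segments vs. max_segments 128) cannot shrink while the rank is still in up:replay, since the MDS only trims journal segments once it reaches active. A rough way to watch replay progress in this state (daemon name as above, run on the MDS host; exact counters vary by release):

    ceph daemon mds.chhq-supcphmd01 status              # rank state, and on recent builds a replay position
    ceph daemon mds.chhq-supcphmd01 perf dump mds_log   # journal read/expire/write positions and segment counts
    ceph fs status cephfs                               # per-rank state for the filesystem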
 


Additional info: Case 03245586 is being used to gather data.

Debug log on MDS startup is in note https://access.redhat.com/support/cases/#/case/03245586/discussion?attachmentId=a096R00002pE5fTQAS

Comment 1 Tupper Cole 2022-07-07 15:26:22 UTC
MDS startup timed out:

################################################################
2022-07-07 10:22:27.570 7f9e2e60c700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2022-07-07 10:22:27.570 7f9e2e60c700  0 mds.beacon.chhq-supcphmd01 Skipping beacon heartbeat to monitors (last acked 95.5139s ago); MDS internal heartbeat is not healthy!
2022-07-07 10:22:28.070 7f9e2e60c700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2022-07-07 10:22:28.070 7f9e2e60c700  0 mds.beacon.chhq-supcphmd01 Skipping beacon heartbeat to monitors (last acked 96.0139s ago); MDS internal heartbeat is not healthy!
2022-07-07 10:22:28.570 7f9e2e60c700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2022-07-07 10:22:28.570 7f9e2e60c700  0 mds.beacon.chhq-supcphmd01 Skipping beacon heartbeat to monitors (last acked 96.5139s ago); MDS internal heartbeat is not healthy!
2022-07-07 10:22:29.070 7f9e2e60c700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2022-07-07 10:22:29.070 7f9e2e60c700  0 mds.beacon.chhq-supcphmd01 Skipping beacon heartbeat to monitors (last acked 97.0139s ago); MDS internal heartbeat is not healthy!
2022-07-07 10:22:29.570 7f9e2e60c700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2022-07-07 10:22:29.570 7f9e2e60c700  0 mds.beacon.chhq-supcphmd01 Skipping beacon heartbeat to monitors (last acked 97.5139s ago); MDS internal heartbeat is not healthy!
2022-07-07 10:22:30.070 7f9e2e60c700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2022-07-07 10:22:30.070 7f9e2e60c700  0 mds.beacon.chhq-supcphmd01 Skipping beacon heartbeat to monitors (last acked 98.0139s ago); MDS internal heartbeat is not healthy!
2022-07-07 10:22:30.570 7f9e2e60c700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2022-07-07 10:22:30.570 7f9e2e60c700  0 mds.beacon.chhq-supcphmd01 Skipping beacon heartbeat to monitors (last acked 98.5139s ago); MDS internal heartbeat is not healthy!
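
The skipped beacons follow from the internal MDSRank heartbeat timing out during replay; if the monitors stop receiving beacons for long enough they will mark the rank laggy and may replace it, restarting the whole replay. One common stop-gap while a long replay runs (option and syntax assumed to apply to this release; revert once the MDS goes active) is to widen the beacon grace:

    ceph config set mon mds_beacon_grace 600   # monitors wait longer before treating the MDS as laggy
    ceph config set mds mds_beacon_grace 600   # keep the MDS-side value in step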

Comment 2 Tupper Cole 2022-07-07 15:30:29 UTC
The cluster has plenty of available space if needed for recovery. 

cephfs_data: 44 TB used, 1.1 PB max available
cephfs_metadata: 1.0 TB used, 8.6 PB max available
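
(Per-pool headroom of this kind is what `ceph df detail` reports; worth re-checking after each backfill wave, since the OSDs behind the metadata pool are the ones that hit backfill_toofull.)

    ceph df detail   # per-pool USED and MAX AVAIL, including cephfs_data / cephfs_metadata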

Comment 3 Tupper Cole 2022-07-07 15:46:05 UTC
MDS startup continues to fail. It appears to be some kind of timeout, but it is unclear why.

Yesterday the customer ran `cephfs-journal-tool event recover_dentries summary`, which seemed to allow replay to complete before timing out.
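
For the record, the usual shape of that journal-recovery step looks something like the following (not a verbatim record of what the customer ran; rank and paths are illustrative, and a journal export should be taken first):

    cephfs-journal-tool --rank=cephfs:0 journal export /root/mds0-journal.bin    # back up the journal before touching it
    cephfs-journal-tool --rank=cephfs:0 journal inspect                          # report journal integrity
    cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary           # write recovered dentries back to the metadata pool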

Comment 5 Vikhyat Umrao 2022-07-07 20:12:35 UTC
(In reply to Tupper Cole from comment #3)
> MDS startup continues to fail. It appears to be some kind of timeout, but it
> is unclear why.
> 
> Yesterday the customer ran `cephfs-journal-tool event recover_dentries summary`,
> which seemed to allow replay to complete before timing out.

Tupper - can you please capture logs with debug_mds = 20 and debug_ms = 1 from the active MDS?
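
In case it helps, one way to capture that on the MDS host without a restart (daemon name taken from the status output above; runtime settings are lost on respawn, so they may need to go under [mds] in ceph.conf if the daemon keeps restarting):

    ceph daemon mds.chhq-supcphmd01 config set debug_mds 20
    ceph daemon mds.chhq-supcphmd01 config set debug_ms 1
    # reproduce the replay attempt, then collect /var/log/ceph/ceph-mds.chhq-supcphmd01.log (default log path)
    ceph daemon mds.chhq-supcphmd01 config set debug_mds 1/5   # back to the typical defaults afterwards
    ceph daemon mds.chhq-supcphmd01 config set debug_ms 0/5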

Comment 21 Greg Farnum 2023-03-23 13:54:07 UTC
This was a support case and did not in itself reveal any code bugs.