
Bug 2172808

Summary: mds: make num_fwd and num_retry to __u32
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Xiubo Li <xiubli>
Component: CephFS Assignee: Xiubo Li <xiubli>
Status: CLOSED ERRATA QA Contact: Hemanth Kumar <hyelloji>
Severity: medium Docs Contact: Akash Raj <akraj>
Priority: unspecified    
Version: 5.3 CC: akraj, amk, ceph-eng-bugs, cephqe-warriors, hyelloji, kdreyer, tserlin, vdas, vereddy, vshankar
Target Milestone: ---   
Target Release: 5.3z2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
.Client request counters are converted from `__u8` type to `__u32` type and the limit is set to 256 times
Previously, in deployments with multiple active MDSs, if a single request failed in the current MDS, the client would forward the request to another MDS. If no MDS could successfully handle the request, it would bounce between MDSs indefinitely. The old `num_fwd`/`num_retry` counters were of `__u8` type, which would overflow after bouncing 256 times. With this enhancement, the counters are converted from `__u8` type to `__u32` type and the limit for forwarding and retrying is set to 256 times. Client requests now stop forwarding and retrying after 256 attempts and fail directly instead of forwarding and retrying indefinitely.
Story Points: ---
Clone Of: 2172791 Environment:
Last Closed: 2023-04-11 20:07:59 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2172791    
Bug Blocks: 2185621    
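
The bounded forwarding behaviour described in the Doc Text can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Ceph client code: the struct `demo_request`, the constant `DEMO_MAX_ATTEMPTS`, and the function `demo_try_forward` are hypothetical names, and the exact comparison used by the real fix may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit, taken from the "256 times" figure in the Doc Text. */
#define DEMO_MAX_ATTEMPTS 256u

/* Hypothetical request state with the widened 32-bit counters. */
struct demo_request {
    uint32_t num_fwd;
    uint32_t num_retry;
};

/*
 * Returns true and counts the forward if the request may be forwarded
 * once more; returns false once the limit is reached, in which case the
 * caller is expected to fail the request instead of forwarding it again.
 */
static bool demo_try_forward(struct demo_request *req)
{
    if (req->num_fwd >= DEMO_MAX_ATTEMPTS)
        return false;
    req->num_fwd++;
    return true;
}
```

Because the counter is 32 bits wide in this sketch, it can actually reach the limit and be compared against it, which an 8-bit counter could never do.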

Description Xiubo Li 2023-02-23 07:20:40 UTC
+++ This bug was initially created as a clone of Bug #2172791 +++

Description of problem:

The num_fwd in MClientRequestForward is int32_t, while the num_fwd
in ceph_mds_request_head is __u8. This is buggy: once num_fwd
exceeds 255, it is truncated and wraps back to 0, and the client
cannot recognize that this has happened.
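
The truncation described above can be reproduced with a small sketch. The struct `demo_request_head` and the function `demo_store_num_fwd` are hypothetical stand-ins for illustration; they are not the real `ceph_mds_request_head` layout or any Ceph function.

```c
#include <stdint.h>

/* Hypothetical stand-in for a request head with an 8-bit counter. */
struct demo_request_head {
    uint8_t num_fwd;
};

/*
 * Store a 32-bit forward count into the 8-bit field. The conversion to
 * an unsigned 8-bit type keeps only the low 8 bits (the value modulo
 * 256), so the stored value silently wraps around.
 */
static uint8_t demo_store_num_fwd(struct demo_request_head *head,
                                  int32_t num_fwd)
{
    head->num_fwd = (uint8_t)num_fwd;
    return head->num_fwd;
}
```

With this sketch, storing 255 yields 255, but storing 256 yields 0 and storing 257 yields 1, so a reader of the 8-bit field cannot tell a request that was forwarded 256 times from one that was never forwarded.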

--- Additional comment from RHEL Program Management on 2023-02-23 06:07:12 UTC ---

Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 1 RHEL Program Management 2023-02-23 07:20:53 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 8 Amarnath 2023-04-03 12:22:37 UTC
Hi Xiubo,

Can you help with the steps for verifying this BZ?

Regards,
Amarnath

Comment 9 Amarnath 2023-04-05 19:06:51 UTC
Hi All,
I ran the sanity tests and did not observe any breakage.
Logs: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-PGDR73

Moving this to Verified.

[root@ceph-amk-fcntl-pgdr73-node7 ~]# ceph versions
{
    "mon": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 12
    },
    "mds": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 1
    },
    "overall": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 18
    }
}

Regards,
Amarnath

Comment 14 errata-xmlrpc 2023-04-11 20:07:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.3 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:1732