Bug 2172808 - mds: make num_fwd and num_retry to __u32
Summary: mds: make num_fwd and num_retry to __u32
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 5.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 5.3z2
Assignee: Xiubo Li
QA Contact: Hemanth Kumar
Docs Contact: Akash Raj
URL:
Whiteboard:
Depends On: 2172791
Blocks: 2185621
 
Reported: 2023-02-23 07:20 UTC by Xiubo Li
Modified: 2023-05-19 07:10 UTC
CC: 10 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
.Client request counters are converted from `__u8` type to `__u32` type and the limit is set to 256 times
Previously, with multiple active MDSs, if a request failed on the current MDS, the client would forward it to another MDS. If no MDS could successfully handle the request, it would bounce between MDSs indefinitely. The old `num_fwd`/`num_retry` counters are of `__u8` type, which overflows after 256 bounces. With this enhancement, the counters are converted from `__u8` to `__u32` and the forward/retry limit is set to 256. Client requests now stop forwarding and retrying after 256 attempts and fail directly instead of bouncing forever.
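The change amounts to widening the counters and enforcing a hard cap. Below is a minimal sketch of such a capped forward/retry check; the constant, struct, and field names are illustrative stand-ins based on the doc text above, not the actual Ceph client code.

/* Hypothetical sketch of the capped behaviour described above; the constant,
 * struct, and field names are illustrative, not the real Ceph client code. */
#include <stdint.h>
#include <stdbool.h>

#define MAX_FWD_RETRY 256               /* assumed cap taken from the doc text */

struct client_request {
    uint32_t num_fwd;                   /* widened from __u8 to __u32 */
    uint32_t num_retry;
};

/* Allow one more forward/retry, or report that the request should fail
 * immediately instead of bouncing between MDSs forever. */
static bool may_forward(struct client_request *req)
{
    if (req->num_fwd >= MAX_FWD_RETRY || req->num_retry >= MAX_FWD_RETRY)
        return false;
    req->num_fwd++;
    return true;
}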
Clone Of: 2172791
Environment:
Last Closed: 2023-04-11 20:07:59 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 58825 0 None None None 2023-03-29 04:37:03 UTC
Ceph Project Bug Tracker 58853 0 None None None 2023-03-29 04:37:03 UTC
Red Hat Issue Tracker RHCEPH-6174 0 None None None 2023-02-23 07:21:43 UTC
Red Hat Product Errata RHBA-2023:1732 0 None None None 2023-04-11 20:08:56 UTC

Description Xiubo Li 2023-02-23 07:20:40 UTC
+++ This bug was initially created as a clone of Bug #2172791 +++

Description of problem:

The num_fwd in MClientRequestForward is int32_t, while the num_fwd
in ceph_mds_request_head is __u8. This is buggy: once num_fwd exceeds
255 it wraps back to 0, and the client cannot recognize that this
has happened.
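To illustrate the truncation, here is a minimal standalone C sketch; the struct below is a simplified stand-in for ceph_mds_request_head, not the real definition.

/* Minimal, self-contained sketch of the truncation: a 32-bit forward count
 * stored into a __u8-sized field wraps at 256. */
#include <stdint.h>
#include <stdio.h>

struct fake_mds_request_head {
    uint8_t num_fwd;                    /* __u8 on the wire */
};

int main(void)
{
    int32_t forwarded = 256;            /* num_fwd as carried in MClientRequestForward */
    struct fake_mds_request_head head;

    head.num_fwd = (uint8_t)forwarded;  /* 256 wraps to 0, so the client cannot
                                         * tell the 257th forward from the first */
    printf("int32_t num_fwd=%d -> __u8 num_fwd=%u\n", forwarded, (unsigned)head.num_fwd);
    return 0;
}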

Comment 1 RHEL Program Management 2023-02-23 07:20:53 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 8 Amarnath 2023-04-03 12:22:37 UTC
Hi Xiubo,

Can you help with the steps for verifying this BZ?

Regards,
Amarnath

Comment 9 Amarnath 2023-04-05 19:06:51 UTC
Hi All,
I have executed the sanity tests and did not observe any breakage.
Logs: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-PGDR73

Moving this to Verified

[root@ceph-amk-fcntl-pgdr73-node7 ~]# ceph versions
{
    "mon": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 12
    },
    "mds": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 1
    },
    "overall": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 18
    }
}

Regards,
Amarnath

Comment 14 errata-xmlrpc 2023-04-11 20:07:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.3 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:1732

