Bug 2172808

Summary:	mds: make num_fwd and num_retry to __u32
Product:	[Red Hat Storage] Red Hat Ceph Storage	Reporter:	Xiubo Li <xiubli>
Component:	CephFS	Assignee:	Xiubo Li <xiubli>
Status:	CLOSED ERRATA	QA Contact:	Hemanth Kumar <hyelloji>
Severity:	medium	Docs Contact:	Akash Raj <akraj>
Priority:	unspecified
Version:	5.3	CC:	akraj, amk, ceph-eng-bugs, cephqe-warriors, hyelloji, kdreyer, tserlin, vdas, vereddy, vshankar
Target Milestone:	---
Target Release:	5.3z2
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Enhancement
Doc Text:	.Client request counters are converted from `__u8` type to `__u32` type and the limit is set to 256 times. Previously, in deployments with multiple active MDSs, if a single request failed in the current MDS, the client would forward the request to another MDS. If no MDS could successfully handle the request, it would bounce infinitely between MDSs. The old `num_fwd`/`num_retry` counters were `__u8` type, which would overflow after bouncing 256 times. With this enhancement, the counters are converted from `__u8` type to `__u32` type and the limit for forwarding and retrying is set to 256 times. Client requests stop forwarding and retrying after 256 times and fail directly instead of forwarding and retrying infinitely.	Story Points:	---
Clone Of:	2172791	Environment:
Last Closed:	2023-04-11 20:07:59 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	2172791
Bug Blocks:	2185621
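
The behaviour described in the Doc Text above (32-bit counters with a cap of 256) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Ceph or kernel client code: `demo_request`, `demo_note_forward`, and `DEMO_MAX_ATTEMPTS` are hypothetical names, and the cap value is taken from the "256 times" figure in the Doc Text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cap, taken from the "256 times" figure in the Doc Text. */
#define DEMO_MAX_ATTEMPTS 256u

/* Hypothetical per-request state; field names echo the bug summary. */
struct demo_request {
    uint32_t num_fwd;    /* 32-bit, so it no longer wraps at 256 */
    uint32_t num_retry;
};

/*
 * Record one forward. Returns false once the cap has been reached,
 * so the caller can fail the request instead of forwarding it again.
 */
static bool demo_note_forward(struct demo_request *req)
{
    if (req->num_fwd >= DEMO_MAX_ATTEMPTS)
        return false;
    req->num_fwd++;
    return true;
}
```

With a zero-initialised `demo_request`, the first 256 calls succeed and the next call returns false, leaving `num_fwd` at 256 rather than wrapping.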

Description Xiubo Li 2023-02-23 07:20:40 UTC

+++ This bug was initially created as a clone of Bug #2172791 +++

Description of problem:

The num_fwd in MClientRequestForward is int32_t, while the num_fwd
in ceph_mds_request_head is __u8. This is buggy: once num_fwd
exceeds 255, it is truncated and wraps back to 0, and the
client cannot recognize this.
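
A minimal sketch of the truncation described above, assuming a simplified stand-in for the request head. `demo_request_head` and `demo_store_num_fwd` are hypothetical names used for illustration only; the real `ceph_mds_request_head` has many more fields.

```c
#include <stdint.h>

/* Hypothetical stand-in for the narrow on-wire field; not the real struct. */
struct demo_request_head {
    uint8_t num_fwd;   /* same width as the __u8 field described above */
};

/*
 * Store a 32-bit forward count into the 8-bit field, as the old
 * layout effectively did. Only the low 8 bits survive the conversion.
 */
static uint8_t demo_store_num_fwd(struct demo_request_head *head,
                                  int32_t num_fwd)
{
    head->num_fwd = (uint8_t)num_fwd;
    return head->num_fwd;
}
```

Storing 255 keeps 255, but storing 256 yields 0 and 257 yields 1, so a reader of the 8-bit field cannot tell a heavily bounced request from a fresh one.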

--- Additional comment from RHEL Program Management on 2023-02-23 06:07:12 UTC ---

Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 1 RHEL Program Management 2023-02-23 07:20:53 UTC

Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 8 Amarnath 2023-04-03 12:22:37 UTC

Hi Xiubo,

Can you help with the steps for verifying this BZ?

Regards,
Amarnath

Comment 9 Amarnath 2023-04-05 19:06:51 UTC

Hi All,
I have executed the sanity tests and did not observe any breakage.
Logs : http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-PGDR73

Moving this to Verified

[root@ceph-amk-fcntl-pgdr73-node7 ~]# ceph versions
{
    "mon": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 12
    },
    "mds": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 1
    },
    "overall": {
        "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 18
    }
}

Regards,
Amarnath

Comment 14 errata-xmlrpc 2023-04-11 20:07:59 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.3 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:1732