Bug 2217964 - NFSv4.1+ client is not freezing session table upon BADSESSION leading to improper re-use of the slot and the application to fail with EIO
Summary: NFSv4.1+ client is not freezing session table upon BADSESSION leading to impr...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Benjamin Coddington
QA Contact: Zhi Li
URL:
Whiteboard:
Depends On: 2217963
Blocks:
Reported: 2023-06-27 16:38 UTC by Olga Kornieskaia
Modified: 2023-11-07 11:04 UTC (History)
CC List: 4 users

Fixed In Version: kernel-5.14.0-340.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2217963
Environment:
Last Closed: 2023-11-07 08:48:33 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gitlab | redhat/centos-stream/src/kernel centos-stream-9 merge_requests 2797 | 0 | None | opened | NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION | 2023-07-11 14:22:41 UTC
Red Hat Issue Tracker | RHELPLAN-160974 | 0 | None | None | None | 2023-06-27 16:40:11 UTC
Red Hat Product Errata | RHSA-2023:6583 | 0 | None | None | None | 2023-11-07 08:48:50 UTC

Description Olga Kornieskaia 2023-06-27 16:38:50 UTC
+++ This bug was initially created as a clone of Bug #2217963 +++

Description of problem:

During session trunking testing, upon an event where one of the trunked server IPs leaves the trunking group, the application would fail with EIO.

The sequence of events that led to the failure was as follows:
1. The client sends a LOCK request to the server on IP1. The server processes the request and populates the session cache. However, the reply never reaches the client because the connection is reset.
2. At this point, one of the LIFs (IP2) migrates, which triggers a change in trunking group membership.
3. The client retries the request, but this time the request is sent to the server on IP2. This causes the server to return a BADSESSION error.
4. Upon receiving BADSESSION, the client initiates session recovery, which schedules the state manager thread, but that thread doesn't get to run right away. The client releases the slot.
5. A different LOCK operation gets to run. Because the session table isn't frozen until the state manager actually runs, the client re-uses the released slot and sends a LOCK (with different arguments) to the server on IP1. The server replies out of the cache.
6. The client recovers the session and then proceeds to use the locking stateid received in step 5. However, that stateid is bogus for that file handle. The server fails the request with BAD_STATEID, which leads to an EIO error.
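
To make the slot re-use easier to follow, here is a minimal toy model of the steps above. It is only a sketch and not kernel or server code: the class ToyServer, the function simulate_slot_reuse, and the "stateid-for-..." strings are hypothetical names made up for illustration. It assumes the server keeps one cached reply per slot and treats a request with the same slot and sequence number as a retransmission.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical server model: one cached reply per session slot."""
    reply_cache: dict = field(default_factory=dict)  # slot -> (seqid, reply)

    def lock(self, slot: int, seqid: int, filehandle: str) -> str:
        cached = self.reply_cache.get(slot)
        if cached is not None and cached[0] == seqid:
            # Same slot and sequence number: treated as a retransmission,
            # so the cached reply is returned regardless of the arguments.
            return cached[1]
        reply = f"stateid-for-{filehandle}"
        self.reply_cache[slot] = (seqid, reply)
        return reply


def simulate_slot_reuse() -> str:
    server = ToyServer()
    slot, seqid = 0, 1

    # Step 1: LOCK on fileA is processed and cached, but the reply is lost.
    server.lock(slot, seqid, "fileA")

    # Steps 3-4: the retry gets BADSESSION, so the slot's sequence number is
    # not advanced, and the slot is released before the table is frozen.

    # Step 5: a different LOCK re-uses the slot with the same sequence number.
    return server.lock(slot, seqid, "fileB")


stateid = simulate_slot_reuse()
print(stateid)  # prints "stateid-for-fileA"
```

In this model the second LOCK, issued for fileB, receives the stateid that was generated for fileA, which corresponds to the bogus stateid in step 6.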

The fix is in Trond's testing branch and is slated for 6.5.


commit c907e72f58ed979a24a9fdcadfbc447c51d5e509
Author: Olga Kornievskaia <kolga>
Date:   Sun Jun 18 17:32:25 2023 -0400

    NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION
    
    When the client received NFS4ERR_BADSESSION, it schedules recovery
    and start the state manager thread which in turn freezes the
    session table and does not allow for any new requests to use the
    no-longer valid session. However, it is possible that before
    the state manager thread runs, a new operation would use the
    released slot that received BADSESSION and was therefore not
    updated its sequence number. Such re-use of the slot can lead
    the application errors.
    
    Fixes: 5c441544f045 ("NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process()")
    Signed-off-by: Olga Kornievskaia <kolga>
    Signed-off-by: Trond Myklebust <trond.myklebust>
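
The idea in the commit message can be sketched with a small toy model. This is an illustration under stated assumptions, not the actual patch: ToySlotTable, on_sequence_error, next_slot_after_badsession, and the freeze_on_error flag are hypothetical names, and the model simply assumes that a frozen (draining) table hands out no slots until recovery completes.

```python
class ToySlotTable:
    """Hypothetical slot table; `draining` stands in for the frozen state."""

    def __init__(self, nslots: int) -> None:
        self.free = list(range(nslots))
        self.draining = False

    def alloc(self):
        # A frozen table hands out no slots until recovery has finished.
        if self.draining or not self.free:
            return None
        return self.free.pop(0)

    def release(self, slot: int) -> None:
        self.free.insert(0, slot)


def on_sequence_error(table, slot, error, freeze_on_error):
    # With freeze_on_error=True the table is frozen as soon as the error is
    # seen, before the slot is released and before any recovery thread runs.
    if error == "NFS4ERR_BADSESSION" and freeze_on_error:
        table.draining = True
    table.release(slot)


def next_slot_after_badsession(freeze_on_error: bool):
    table = ToySlotTable(nslots=1)
    slot = table.alloc()
    on_sequence_error(table, slot, "NFS4ERR_BADSESSION", freeze_on_error)
    # A new operation asks for a slot before recovery has run.
    return table.alloc()


print(next_slot_after_badsession(False))  # prints 0
print(next_slot_after_badsession(True))   # prints None
```

Without the early freeze, the new operation gets the stale slot 0 back. With it, the new operation gets no slot and has to wait for session recovery.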

Asking for this to be fixed in zstream.



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 13 errata-xmlrpc 2023-11-07 08:48:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6583

