Bug 2240280 - [CephFS-NFS] CEPH_FS_NONBLOCKING_IO is stuck when compiling the Linux kernel in the NFS mount directory
Summary: [CephFS-NFS] CEPH_FS_NONBLOCKING_IO is stuck when compiling the Linux kernel ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 7.0
Assignee: Frank Filz
QA Contact: Manisha Saini
Docs Contact: Rivka Pollack
URL:
Whiteboard:
Depends On:
Blocks: 2237662
Reported: 2023-09-22 20:29 UTC by Frank Filz
Modified: 2024-01-31 09:31 UTC (History)

Fixed In Version: ceph-18.2.0-47.el9cp
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-12-13 15:24:00 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph pull 53407 0 None Merged Client: Fix nonblocking-io zero by read 2023-09-23 01:24:27 UTC
Red Hat Issue Tracker RHCEPH-7529 0 None None None 2023-09-22 20:30:13 UTC
Red Hat Product Errata RHBA-2023:7780 0 None None None 2023-12-13 15:24:04 UTC

Description Frank Filz 2023-09-22 20:29:45 UTC
Description of problem:

Originally reported in https://github.com/nfs-ganesha/nfs-ganesha/issues/988

My Ganesha version is 5.5.
The Ceph version is the latest, and Ganesha is compiled with CEPH_FS_NONBLOCKING_IO enabled.

When my client mounts the NFS directory, ordinary reads and writes work normally.

But when I compile the Linux kernel (or other software) in the mount directory, the build quickly gets stuck.

I analyzed the Ganesha log and found that one of the client's read requests never completed.
For that read request, ceph_ll_nonblocking_readv_writev returned 0, but the Ceph client never called the completion callback.
I then traced the Ceph client and found that the file being read has size 0, while the request has offset 0 and len 8192.
In this case, Ceph returns 0 directly, treats the read as complete, and never calls the callback function.
see:
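
A minimal sketch of the hang described above, written as a toy Python model rather than the actual Ceph client or Ganesha code. Every name in it (ToyFile, toy_nonblocking_read, serve_read, complete_zero_byte_reads) is hypothetical. It assumes the caller treats a return value of 0 as "request accepted, wait for the callback", and it assumes that one plausible remedy is to invoke the callback with 0 bytes for a zero-length read; the actual change is in the upstream pull request linked from this bug.

```python
from typing import Callable, Dict, Optional


class ToyFile:
    """Hypothetical in-memory stand-in for a file on CephFS."""

    def __init__(self, data: bytes = b"") -> None:
        self.data = data


def toy_nonblocking_read(
    f: ToyFile,
    offset: int,
    length: int,
    on_complete: Callable[[int], None],
    complete_zero_byte_reads: bool,
) -> int:
    """Toy model of a nonblocking read; returns 0 when the request is accepted.

    This is NOT the real ceph_ll_nonblocking_readv_writev. A real client
    would fire the callback later from another thread; here it is called
    inline to keep the sketch short.
    """
    if offset >= len(f.data):
        # Nothing to read (for example, a file of size 0).
        if complete_zero_byte_reads:
            # Assumed remedy: still report completion, with 0 bytes.
            on_complete(0)
        # Reported behavior: return 0 without ever calling the callback.
        return 0
    chunk = f.data[offset:offset + length]
    on_complete(len(chunk))
    return 0


def serve_read(
    f: ToyFile, offset: int, length: int, complete_zero_byte_reads: bool
) -> Dict[str, Optional[object]]:
    """Toy caller that only replies to its client once the callback fires."""
    state: Dict[str, Optional[object]] = {"done": False, "nbytes": None}

    def on_complete(nbytes: int) -> None:
        state["done"] = True
        state["nbytes"] = nbytes

    rc = toy_nonblocking_read(f, offset, length, on_complete, complete_zero_byte_reads)
    assert rc == 0  # the request was accepted in every case
    return state


if __name__ == "__main__":
    empty = ToyFile(b"")
    # Reported behavior: rc is 0 but the callback never fires, so the
    # request stays pending forever and the NFS client stalls.
    print(serve_read(empty, 0, 8192, complete_zero_byte_reads=False))
    # Assumed remedy: the callback fires with 0 bytes and the request completes.
    print(serve_read(empty, 0, 8192, complete_zero_byte_reads=True))
```

In this model the request for offset 0 and len 8192 against an empty file is the only case that leaves the caller waiting, which matches the description of a build stalling on a single unfinished read.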

Version-Release number of selected component (if applicable):


How reproducible:


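Steps to Reproduce:
1. Run nfs-ganesha 5.5 built with CEPH_FS_NONBLOCKING_IO enabled against a CephFS export.
2. Mount the NFS export on a client.
3. Compile the Linux kernel (or other software) in the mount directory.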

Actual results: Kernel build stalls


Expected results: Kernel build completes as expected


Additional info:

Comment 1 Frank Filz 2023-09-22 20:32:56 UTC
Upstream fix is available and merged:

https://github.com/ceph/ceph/pull/53407

The patch has been backported and merged into ceph-7.0-rhel-patches.

Comment 8 Frank Filz 2023-10-11 21:05:56 UTC
Because this bug was introduced by the async/nonblocking work, I don't think it requires doc text. Please advise on how to proceed.

Comment 9 errata-xmlrpc 2023-12-13 15:24:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7780

