Bug 1211017

Summary: kernel WARNING nfs_direct_good_bytes upon client<>ds network partition during direct read when mounted using flex-files layout
Product: [Fedora] Fedora
Component: kernel
Version: rawhide
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Jean Spector <jean>
Assignee: nfs-maint
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: gansalmon, itamar, jean, jonathan, kernel-maint, madhu.chinakonda, mchehab
Fixed In Version: kernel-4.0.1-300.fc22
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-04-30 13:58:11 UTC
Attachments: kernel WARNING stack trace (flags: none)

Description Jean Spector 2015-04-12 07:23:13 UTC
Created attachment 1013568
kernel WARNING stack trace

Description of problem:
kernel: WARNING: CPU: 0 PID: 68 at fs/nfs/direct.c:132 nfs_direct_good_bytes+0xa1/0xc0 [nfs]

Version-Release number of selected component (if applicable):
* Client fedora rawhide 4.0.0-0.rc5.git4.1.fc23.x86_64
* Primary Data MDS b630 3.11.9-integration-630-90182a7f

How reproducible:
Reproduced twice

Steps to Reproduce:

Reproduced using an internal tool - the following is the test flow outline:
01. Mount flex-files MDS (configured to return layouts with 2 mirrors)
02. Writing to files to provide data to read later
03. Closing files after writing to them
04. Clearing clients' cache
05. Opening files in read-only mode (O_RDONLY | O_SYNC | O_DIRECT)
06. Performing I/O to trigger LAYOUTGET (is_write: False)
07. Clearing clients' cache
08. Initiating disaster DisasterTypes.NETWORK_PARTITION (Client drops packets going to DS #0)
09. Performing early main I/O - right after the disaster strikes (is_write: False)
10. Sleeping for 420s
11. Ending disaster condition (unblock IP)
12. Waiting for all the clients to complete I/O
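
Step 05 opens the files with O_RDONLY | O_SYNC | O_DIRECT. The internal tool is not public, so the following is only a minimal Python sketch of what such a direct read could look like on a Linux client. The helper name read_block, the 4096-byte alignment, and the flags parameter are assumptions made for illustration and are not taken from the original test.

```python
import mmap
import os

# Assumed alignment for O_DIRECT transfers; the real requirement depends
# on the filesystem and is not stated in this report.
BLOCK = 4096

# Flags from step 05 of the test flow.
READ_FLAGS = os.O_RDONLY | os.O_SYNC | os.O_DIRECT


def read_block(path, offset=0, length=BLOCK, flags=READ_FLAGS):
    """Read `length` bytes at `offset` into a page-aligned buffer.

    Hypothetical helper: O_DIRECT generally needs the buffer, offset and
    length to be aligned, so misaligned requests are rejected up front.
    """
    if offset % BLOCK or length % BLOCK:
        raise ValueError("offset and length must be multiples of BLOCK")
    fd = os.open(path, flags)
    try:
        # An anonymous mmap is page-aligned, which suits O_DIRECT.
        with mmap.mmap(-1, length) as buf:
            n = os.preadv(fd, [buf], offset)
            return bytes(buf[:n])
    finally:
        os.close(fd)
```

With a flex-files mount at a hypothetical path such as /mnt/flexfiles, a call like read_block("/mnt/flexfiles/testfile") would correspond roughly to one read in steps 06 and 09. Whether O_DIRECT is accepted depends on the underlying filesystem.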

Actual results:
kernel: WARNING: CPU: 0 PID: 68 at fs/nfs/direct.c:132 nfs_direct_good_bytes+0xa1/0xc0 [nfs]

Expected results:
* I/O successful
* No warnings emitted
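
To check the "No warnings emitted" expectation automatically, a test harness could scan the client's kernel log for the WARNING line shown above. This is a rough sketch under assumptions: the regular expression is derived only from the single line quoted in this report, and the names WARNING_RE and find_warnings are hypothetical.

```python
import re

# Pattern modelled on the one WARNING line quoted in this report; other
# kernel versions may format the line differently.
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def find_warnings(log_text, func=None):
    """Return one dict per kernel WARNING found in log_text.

    If `func` is given, keep only warnings raised in that function.
    """
    hits = []
    for match in WARNING_RE.finditer(log_text):
        if func is None or match.group("func") == func:
            hits.append(match.groupdict())
    return hits


sample = (
    "kernel: WARNING: CPU: 0 PID: 68 at fs/nfs/direct.c:132 "
    "nfs_direct_good_bytes+0xa1/0xc0 [nfs]"
)
hits = find_warnings(sample, func="nfs_direct_good_bytes")
```

A run would pass this check when find_warnings returns an empty list for the log text collected during the test.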

Comment 1 Jean Spector 2015-04-16 05:43:26 UTC
Fixing patch: http://www.spinics.net/lists/linux-nfs/msg50655.html

Comment 2 Josh Boyer 2015-04-21 12:19:11 UTC
That patch has been sitting uncommented on for a while now.  I poked the upstream thread to see why.

Out of curiosity, did you happen to build a kernel with those and test?

Comment 3 Josh Boyer 2015-04-27 13:26:04 UTC
OK, this patch finally went upstream.  It should be in the 4.1-rc1 build today.

Comment 4 Josh Boyer 2015-04-27 15:15:24 UTC
Fixed in rawhide and in git for F22.

Comment 5 Fedora Update System 2015-04-30 12:24:58 UTC
kernel-4.0.1-300.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/kernel-4.0.1-300.fc22

Comment 6 Jean Spector 2015-04-30 13:29:37 UTC
Verified on rawhide running kernel 4.1.0-0.rc1.git0.1.fc23.x86_64

Comment 7 Fedora Update System 2015-05-03 17:22:29 UTC
kernel-4.0.1-300.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.