Bug 2020857

Summary: generic_file_aio_read returns 0 when interrupted early with a fatal signal [rhel-7.9.z]
Product: Red Hat Enterprise Linux 7
Reporter: Marc Dionne (Auristor) <mdionne>
Component: kernel
Assignee: Carlos Maiolino <cmaiolin>
kernel sub component: File Systems
QA Contact: Murphy Zhou <xzhou>
Status: CLOSED ERRATA
Docs Contact:
Severity: unspecified
Priority: urgent
CC: ajmitchell, aviro, cmaiolin, dhowells, esandeen, ikent, jaltman, jreznik, llong, mdionne, mszeredi, nmurray, rhandlin, swhiteho, tdamato, xzhou
Version: 7.9
Keywords: Triaged, ZStream
Target Milestone:
Flags: pm-rhel: mirror+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: kernel-3.10.0-1160.51.1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-01-11 17:35:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2024081

Attachments:

Description	Flags
Patch to set desc.error so the error code gets passed back up.	none
Patch to set desc->error so the error code gets passed back up.	none

Description Marc Dionne (Auristor) 2021-11-06 14:36:11 UTC

Created attachment 1840453 [details]
Patch to set desc.error so the error code gets passed back up.

Description of problem:

generic_file_aio_read (or generic_file_aio_read2 for ext4) is used by many filesystems, directly or indirectly, to implement their read or aio_read VFS file operations.

Here is the part where it calls do_generic_file_read, a void function, passing it a pointer to the 'desc' structure.  Note that desc.written and desc.error are initialized to 0:

                desc.written = 0;
                desc.arg.buf = iov[seg].iov_base + offset;
                desc.count = iov[seg].iov_len - offset;
                if (desc.count == 0)
                        continue;
                desc.error = 0;
                do_generic_file_read(filp, ppos, &desc, file_read_actor);
                retval += desc.written;
                if (desc.error) {
                        retval = retval ?: desc.error;
                        break;
                }

In do_generic_file_read, there is a check for a pending fatal signal:

                if (fatal_signal_pending(current)) {
                        error = -EINTR;
                        goto out;
                }

and at the out label:

out:
        ra->prev_pos = prev_index;
        ra->prev_pos <<= PAGE_CACHE_SHIFT;
        ra->prev_pos |= prev_offset;

        *ppos = ((loff_t)index << PAGE_CACHE_SHIFT) + offset;
        file_accessed(filp);
}

Note that the local variable 'error' is not returned or assigned to anything, so the -EINTR is lost and never makes it to the caller.  If no data has been read yet (retval == 0), generic_file_aio_read will just return 0, since desc.error is still 0.

This means that many file systems can return 0 bytes from ->read, even though the request is not at EOF and some data might be available.
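The lost error code can be illustrated with a small user-space model. This is only a sketch: struct read_desc_model, model_read_buggy and model_aio_read are hypothetical stand-ins for read_descriptor_t, do_generic_file_read and the per-segment handling in generic_file_aio_read, and the fatal_signal parameter stands in for fatal_signal_pending(current).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the kernel's read_descriptor_t. */
struct read_desc_model {
    size_t written; /* bytes copied so far */
    size_t count;   /* bytes still requested */
    int error;      /* negative errno, or 0 */
};

/*
 * Models the backported do_generic_file_read(): the function is void and
 * the local 'error' is never stored anywhere the caller can see it.
 */
void model_read_buggy(struct read_desc_model *desc, int fatal_signal)
{
    int error = 0;

    assert(desc != NULL);
    if (fatal_signal) {
        error = -EINTR;
        goto out;
    }
    desc->written += desc->count; /* pretend the whole request was served */
    desc->count = 0;
out:
    (void)error;                  /* the -EINTR is dropped here */
}

/* Models how generic_file_aio_read() handles a single segment. */
long model_aio_read(size_t len, int fatal_signal)
{
    struct read_desc_model desc = { .written = 0, .count = len, .error = 0 };
    long retval = 0;

    model_read_buggy(&desc, fatal_signal);
    retval += (long)desc.written;
    if (desc.error)
        retval = retval ? retval : desc.error; /* portable form of ?: */
    return retval;
}
```

In this model, model_aio_read(4096, 1) returns 0 rather than -EINTR, which is indistinguishable from end of file for the caller.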

In the upstream mainline kernel, the fatal signal check was introduced by this commit:

commit 5abf186a30a89d5b9c18a6bf93a2c192c9fd52f6
Author: Michal Hocko <mhocko>
Date:   Fri Feb 3 13:13:29 2017 -0800

    mm, fs: check for fatal signals in do_generic_file_read()

But at that point in the mainline kernel, the function was not void and the out label had:

    return written ? written : error;

so the -EINTR was correctly passed back up to callers.  This looks like an error in the backporting of that patch to 3.10; the same can be seen in the 3.10 stable kernel in 3.10.107 and 3.10.108.
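For comparison, the upstream return convention can be modeled in user space as follows. This is a sketch under assumptions: model_read_upstream and its signal_after_chunks parameter are hypothetical, and the loop only imitates reading in page-sized chunks. It shows that a partial read takes precedence over the error, while an interruption before any data is read yields -EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of a non-void read loop. A fatal signal is simulated
 * once 'signal_after_chunks' chunks have been read; pass -1 for no signal.
 */
long model_read_upstream(size_t len, size_t chunk, int signal_after_chunks)
{
    size_t written = 0;
    int chunks_done = 0;
    int error = 0;

    assert(chunk > 0);
    while (written < len) {
        size_t n;

        if (chunks_done == signal_after_chunks) {
            error = -EINTR;
            goto out;
        }
        n = len - written < chunk ? len - written : chunk;
        written += n;
        chunks_done++;
    }
out:
    /* Same shape as the upstream return statement quoted above. */
    return written ? (long)written : error;
}
```

With a signal before the first chunk the model returns -EINTR; with a signal after one 4096-byte chunk it returns 4096, so the caller sees a short read instead of an error.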

This is probably not noticeable when the read is initiated from userspace through a syscall, as the process is terminated because of the fatal signal, and the error code doesn't matter.

It can, however, cause problems for in-kernel callers that call ->read or ->aio_read directly and assume that a return of 0 indicates that there is no data to be read, as is usually guaranteed.
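The effect on such a caller can be sketched with a user-space model. Everything here is hypothetical (read_fn, struct fake_file, fake_read, read_until_eof, read_exact, demo_total) and is not taken from any real kernel module. The first loop trusts 0 to mean end of file; the second is one possible defensive variant for callers that know the expected size, with -EIO as an arbitrary choice of error code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical read callback: >0 bytes read, 0 at EOF, <0 negative errno. */
typedef long (*read_fn)(void *ctx, size_t want);

struct fake_file {
    size_t size;          /* total bytes in the file */
    size_t pos;           /* current offset */
    int spurious_zero_at; /* call index that wrongly returns 0, or -1 */
    int calls;            /* number of reads issued so far */
};

long fake_read(void *ctx, size_t want)
{
    struct fake_file *f = ctx;
    size_t left = f->size - f->pos;
    size_t n = want < left ? want : left;

    if (f->calls++ == f->spurious_zero_at)
        return 0;         /* models the lost -EINTR */
    f->pos += n;
    return (long)n;
}

/* Reads until the callback returns 0; returns total bytes or negative errno. */
long read_until_eof(read_fn rd, void *ctx, size_t chunk)
{
    long total = 0;

    for (;;) {
        long n = rd(ctx, chunk);

        if (n < 0)
            return n;
        if (n == 0)
            return total; /* trusted as EOF, which the bug violates */
        total += n;
    }
}

/* Defensive variant: a total that differs from the known size is an error. */
long read_exact(read_fn rd, void *ctx, size_t chunk, size_t expected)
{
    long total = read_until_eof(rd, ctx, chunk);

    if (total >= 0 && (size_t)total != expected)
        return -EIO;
    return total;
}

/* Convenience wrapper so each case can be exercised with a single call. */
long demo_total(size_t size, size_t chunk, int spurious_zero_at, int checked)
{
    struct fake_file f = { .size = size, .pos = 0,
                           .spurious_zero_at = spurious_zero_at, .calls = 0 };

    return checked ? read_exact(fake_read, &f, chunk, size)
                   : read_until_eof(fake_read, &f, chunk);
}
```

In the model, a spurious 0 on the second read of a 10000-byte file makes the trusting loop report 4096 bytes as if the file ended there, while the checked variant reports an error instead of silently truncating.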

There's evidence that this can occur with AuriStorFS clients, when the kernel module accesses files that are part of the local disk cache.  Problems have been reported with accesses to the disk cache index file, and data files used to store directory contents.


Version-Release number of selected component (if applicable):

The problem was observed with kernel 3.10.0-1160.31.1.el7.x86_64.  Code inspection suggests it was introduced during the cycle leading up to 3.10.0-862.el7.


How reproducible:

The problem has been observed regularly (~10 times a day) in a large set of servers running the AuriStorFS client.


Steps to Reproduce:

I will see if I can come up with a script/scenario that can reproduce the problem on demand.



Actual results:


Expected results:


Additional info:

See attached proposed patch.

Comment 3 Marc Dionne (Auristor) 2021-11-06 23:33:20 UTC

Created attachment 1840497 [details]
Patch to set desc->error so the error code gets passed back up.

Comment 5 Eric Sandeen 2021-11-08 15:22:52 UTC

Thanks for the clear bug report and the patch.

Comment 6 Carlos Maiolino 2021-11-09 08:50:54 UTC

(In reply to Marc Dionne from comment #3)
> Created attachment 1840497 [details]
> Patch to set desc->error so the error code gets passed back up.

Hi Marc.

I see you've marked the patch you attached as obsolete. It seems
you created it intending to send it upstream to linux-stable, but
I couldn't find any thread where you posted it. Did you manage to
submit it?
I'm just asking so I can backport it and keep your SoB. Otherwise I'll
just submit it directly to RHEL.

Cheers

Comment 7 Marc Dionne (Auristor) 2021-11-09 12:39:42 UTC

(In reply to Carlos Maiolino from comment #6)

Hi Carlos,

I marked the original patch as obsolete because it had an error (desc.error rather than desc->error), and replaced it with the current version, which is correct.  I have a script that reproduces getting an incorrect 0 from ->read on an xfs filesystem, and I have verified that this no longer occurs with a patched kernel.

Looks to me like the 3.10 in linux-stable is no longer maintained, so not sure there's any active upstream that this patch could be sent to.

Thanks,
Marc
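The desc.error versus desc->error distinction mentioned in comment 7 follows from the calling convention shown in the description: generic_file_aio_read owns the structure and passes &desc, so inside do_generic_file_read the descriptor is a pointer and must be accessed with '->'. The attachment itself is not reproduced in this report, so the following is only a user-space sketch of the idea. struct read_desc_fixed_model, model_read_fixed and model_aio_read_fixed are hypothetical names, and the exact placement of the assignment in the real patch is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the kernel's read_descriptor_t. */
struct read_desc_fixed_model {
    size_t written; /* bytes copied so far */
    size_t count;   /* bytes still requested */
    int error;      /* negative errno, or 0 */
};

/* Models a void read helper that reports the error through the descriptor. */
void model_read_fixed(struct read_desc_fixed_model *desc, int fatal_signal)
{
    assert(desc != NULL);
    if (fatal_signal) {
        /* 'desc' is a pointer here, so 'desc.error' would not compile. */
        desc->error = -EINTR;
        return;
    }
    desc->written += desc->count; /* pretend the whole request was served */
    desc->count = 0;
}

/* Models the caller, which owns the struct and therefore uses '.'. */
long model_aio_read_fixed(size_t len, int fatal_signal)
{
    struct read_desc_fixed_model desc = { .written = 0, .count = len,
                                          .error = 0 };
    long retval = 0;

    model_read_fixed(&desc, fatal_signal);
    retval += (long)desc.written;
    if (desc.error)
        retval = retval ? retval : desc.error; /* portable form of ?: */
    return retval;
}
```

With the error stored in the descriptor, the modeled caller returns -EINTR when interrupted before any data is read, instead of 0.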

Comment 8 Carlos Maiolino 2021-11-09 13:49:55 UTC

(In reply to Marc Dionne from comment #7)

Hi,
No worries, Marc. I just wanted to confirm whether you had sent it, since I couldn't
find it anywhere.

Cheers.

Comment 19 Jeffrey Altman 2021-12-09 03:41:56 UTC

Will this fix be included in 3.10.0-957.86.1.el7 (7.6) and 3.10.0-1062.60.1.el7 (7.7) in addition to 3.10.0-1160.51.1.el7 (7.9)?

Comment 24 errata-xmlrpc 2022-01-11 17:35:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: kernel security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0063