Bug 125178 - LTC9119-Random page cache corruption when audit is enabled in rhel 3 kernels
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All Linux
Priority: medium  Severity: high
Assigned To: Peter Martuccelli
Reported: 2004-06-03 09:28 EDT by Peter Martuccelli
Modified: 2007-11-30 17:07 EST
Doc Type: Bug Fix
Last Closed: 2004-09-02 00:31:45 EDT
Description Peter Martuccelli 2004-06-03 09:28:16 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.2)
Gecko/20040301

Description of problem:
Random page cache entries will have a '/' character written to them on
page aligned boundaries.  Suspected kernel routine is do_realpath in
drivers/audit/args.c.  Example of corruption when a kernel compilation
is used to reproduce the issue follows.

Corrupted file drivers/char/epca.c

Code snippet from pristine code base

	pc_callout.name = "cud";
	pc_callout.major = DIGICU_MAJOR;  <== this line is correct
	pc_callout.minor_start = 0;
	pc_callout.init_termios.c_cflag = B9600 | CS8 | CREAD | CLOCAL | HUPCL;
	pc_callout.subtype = SERIAL_TYPE_CALLOUT;

[peterm@redrum char]$ md5sum epca.c
b9f50dcc5528a9fe3db522d7c942024c  epca.c
[peterm@redrum char]$

Code snippet from corrupted file

	pc_callout.name = "cud";
	pc_callout.major = DIGICU/MAJOR;  <== this line has been modified
	pc_callout.minor_start = 0;
	pc_callout.init_termios.c_cflag = B9600 | CS8 | CREAD | CLOCAL | HUPCL;
	pc_callout.subtype = SERIAL_TYPE_CALLOUT;

MD5 sum before reboot:  File has been modified.
[peterm@redrum char]$ md5sum epca.c
76fab5e6a01648ffd808fd0cde8418b9  epca.c

The page is not marked dirty and is never flushed, so it remains
corrupted in the page cache until the next reboot.  Following a reboot
the file reverts to its original state.

MD5 sum after reboot:  File reverts back to pristine version.
[peterm@redrum char]$ md5sum epca.c
b9f50dcc5528a9fe3db522d7c942024c  epca.c


Version-Release number of selected component (if applicable):
kernel-2.4.21-15

How reproducible:
Always

Steps to Reproduce:
1. enable auditing
2. recompile kernel, make modules, etc
3. look for compilation errors; investigate each compilation failure,
looking for a '/' inserted into the source code, header file, etc.
4. reboot and open the corrupted file a second time; the '/' is no
longer present.
5. depending on which page cache entries are corrupted, you may
experience an oops.
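
One way to make step 3 less manual is to compare a suspect file
byte-for-byte against a known-good copy and note where each difference
falls within a page. The following is only a sketch: compare_streams
and ASSUMED_PAGE_SIZE are hypothetical names, and the 4 KiB page size
is an assumption based on the i686 system mentioned above.

```c
#include <stdio.h>

#define ASSUMED_PAGE_SIZE 4096L   /* assumption: 4 KiB pages, as on i686 */

/*
 * Compare two streams byte by byte up to the end of the shorter one.
 * Returns the number of differing bytes; *at_page_end receives how many
 * of those fall on the last byte of an (assumed) page.
 * If out is non-NULL, one line is printed per difference.
 */
static long compare_streams(FILE *good, FILE *suspect, FILE *out,
                            long *at_page_end)
{
    long offset = 0, diffs = 0;
    int a, b;

    *at_page_end = 0;
    while ((a = fgetc(good)) != EOF && (b = fgetc(suspect)) != EOF) {
        if (a != b) {
            long in_page = offset % ASSUMED_PAGE_SIZE;
            int last = (in_page == ASSUMED_PAGE_SIZE - 1);

            diffs++;
            if (last)
                (*at_page_end)++;
            if (out)
                fprintf(out, "offset %ld (page offset %ld%s): 0x%02x -> 0x%02x\n",
                        offset, in_page, last ? ", last byte of page" : "",
                        a, b);
        }
        offset++;
    }
    return diffs;
}
```

To use it, open the known-good copy and the suspect file with
fopen(path, "rb") and pass stdout as out. The known-good copy should
come from somewhere that is not served by the same corrupted page
cache entry, for example a freshly unpacked source tree.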
    

Additional info:
Comment 1 Klaus Weidner 2004-06-03 12:28:39 EDT
note that the addresses where the corruption occurs appear to be the
*last* addresses in a page.

Probably the code in drivers/audit/args.c:do_realpath is walking
backwards too far and inserting a slash before the start of the
string, in a memory page that doesn't belong to it.
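
To illustrate the suspected failure mode, here is a minimal sketch of
building a path right-to-left in a buffer with an explicit lower-bound
check. This is NOT the code from drivers/audit/args.c; the names
prepend_component and build_path are hypothetical and exist only for
this example. The point is the check before the '/' is stored: without
it, the separator could land one byte before the buffer, which would be
the last byte of the preceding page if the buffer is page aligned.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: prepend "/name" in front of the string that
 * currently starts at buf[pos]. Returns the new start index, or -1 if
 * the component plus its '/' separator would not fit.
 */
static ptrdiff_t prepend_component(char *buf, ptrdiff_t pos, const char *name)
{
    size_t len = strlen(name);

    /* Need len bytes for the name plus one byte for the '/'. */
    if (pos < 0 || (size_t)pos < len + 1)
        return -1;          /* refuse rather than write before buf[0] */

    pos -= (ptrdiff_t)len;
    memcpy(buf + pos, name, len);
    buf[--pos] = '/';
    return pos;
}

/*
 * Hypothetical helper: assemble a path from components ordered leaf
 * first, root last. Returns a pointer into buf, or NULL if buf is too
 * small. Never writes outside buf[0..size-1].
 */
static char *build_path(char *buf, size_t size,
                        const char *const *leaf_to_root, size_t n)
{
    ptrdiff_t pos;
    size_t i;

    if (size == 0)
        return NULL;
    pos = (ptrdiff_t)size - 1;
    buf[pos] = '\0';
    for (i = 0; i < n; i++) {
        pos = prepend_component(buf, pos, leaf_to_root[i]);
        if (pos < 0)
            return NULL;
    }
    return buf + pos;
}
```

For example, calling build_path with the components "epca.c", "char",
"drivers" and a sufficiently large buffer yields
"/drivers/char/epca.c"; with a buffer that is too small it returns NULL
instead of storing a slash in front of the buffer.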
Comment 2 IBM Bug Proxy 2004-06-03 18:54:40 EDT
Subject: [Bug 9119] New:  - RH125178-Random page cache corruption when audit is enabled in rhel 3 kernels

https://bugzilla.linux.ibm.com/show_bug.cgi?id=9119

           Summary: RH125178-Random page cache corruption when audit is
                    enabled in rhel 3 kernels
            Vendor: Red Hat Linux
           Version: RHEL3 U2
          Platform: xSeries
      Architecture: All
Submitting Project: Bluefortress
 Customer Priority: --
       Owning Team: LTC
    OSC Acceptance: N/S
   Customer Status: N/S
     Required Date: 0000-00-00 00:00:00
       Target Date: 2000-00-00 00:00:00
     Make External: ---
            Status: OPEN
Technical Severity: high
 Engineer Priority: P2
         Component: Kernel
             Owner: khoa@us.ibm.com
       SubmittedBy: gjlynx@us.ibm.com
         QAContact: khoa@us.ibm.com


Opened by Peter Martuccelli (peterm@redhat.com) on 2004-06-03 09:28

Comment 3 Peter Martuccelli 2004-06-04 17:11:16 EDT
I had three engineers running the patch today with no problems reported. 

I am moving ahead with getting the patch applied to the RHEL3 kernel.
Comment 4 Need Real Name 2004-06-04 18:03:21 EDT
Thanks, Peter. Can you tell me whether the other observed issue was
tested with this patch (and possibly resolved)? That issue was
filesystem corruption: after a reboot, modified files lose their
modifications.
Comment 5 IBM Bug Proxy 2004-06-04 19:26:05 EDT
----- Additional Comments From khoa@us.ibm.com  2004-06-04 19:22 -------
Since a fix is already available and will be accepted by Red Hat, I'd like to
move this bug into the FixedAwaitingTest state.  Thanks.
Comment 6 IBM Bug Proxy 2004-06-07 18:31:00 EDT
----- Additional Comments From khake@us.ibm.com  2004-06-07 18:27 -------
Peter: When can you make this available for the team to use as part of the test
case runs?   Thanks. 
Comment 7 Ernie Petrides 2004-06-15 05:56:23 EDT
A fix for this problem has just been committed to the RHEL3 U3
patch pool this evening (in kernel version 2.4.21-15.11.EL).
Comment 8 Peter Martuccelli 2004-07-19 18:00:09 EDT
Users running LAuS to audit system calls without this fix are at risk
of incurring data corruption.  You only incur persistent data
corruption when the flawed code writes to a dirty page.  
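
The difference between the two cases can be shown with a toy model.
This is a deliberately simplified sketch, not kernel code: struct
toy_page and the toy_* functions are hypothetical, and "reboot" here
just means "write back dirty pages, then drop and re-read the cache".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 8

/* Toy model of one cached page and the disk block behind it. */
struct toy_page {
    char disk[TOY_PAGE_SIZE];   /* contents on disk */
    char cache[TOY_PAGE_SIZE];  /* in-memory copy */
    bool dirty;                 /* set only by legitimate writes */
};

/* Populate the cache from disk; the page starts out clean. */
static void toy_read_in(struct toy_page *p)
{
    memcpy(p->cache, p->disk, TOY_PAGE_SIZE);
    p->dirty = false;
}

/* Legitimate write: modifies the cache and marks the page dirty. */
static void toy_write(struct toy_page *p, size_t off, char c)
{
    p->cache[off] = c;
    p->dirty = true;
}

/* Stray write: modifies the cache without touching the dirty flag. */
static void toy_stray_write(struct toy_page *p, size_t off, char c)
{
    p->cache[off] = c;
}

/* Write back the page only if dirty, then reload the cache from disk. */
static void toy_reboot(struct toy_page *p)
{
    if (p->dirty)
        memcpy(p->disk, p->cache, TOY_PAGE_SIZE);
    toy_read_in(p);
}
```

In this model a stray '/' stored into a clean page disappears at
toy_reboot, because the page is never written back. The same stray
write into a page that was already dirtied by toy_write is carried to
disk together with the legitimate change.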
Comment 9 John Flanagan 2004-09-02 00:31:45 EDT
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-433.html
Comment 10 IBM Bug Proxy 2004-09-15 11:29:01 EDT
----- Additional Comments From markwiz@us.ibm.com  2004-09-15 11:25 EDT -------
IBM - RHEL3 U3 is available and this bug should be fixed. Please test and post
results. 
