Bug 125178

Summary: LTC9119-Random page cache corruption when audit is enabled in rhel 3 kernels
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: medium
Reporter: Peter Martuccelli <peterm>
Assignee: Peter Martuccelli <peterm>
CC: bugproxy, ccb, fenlason, khake, kweidner, petrides, riel, tao
Doc Type: Bug Fix
Last Closed: 2004-09-02 04:31:45 UTC

Description Peter Martuccelli 2004-06-03 13:28:16 UTC

Description of problem:
Random page cache entries have a '/' character written into them at
page-aligned boundaries.  The suspected kernel routine is do_realpath in
drivers/audit/args.c.  The following example shows the corruption,
reproduced by compiling a kernel.

Corrupted file drivers/char/epca.c

Code snippet from pristine code base

	pc_callout.name = "cud";
	pc_callout.major = DIGICU_MAJOR;  <== this line is correct
	pc_callout.minor_start = 0;
	pc_callout.init_termios.c_cflag = B9600 | CS8 | CREAD | CLOCAL | HUPCL;
	pc_callout.subtype = SERIAL_TYPE_CALLOUT;

[peterm@redrum char]$ md5sum epca.c
b9f50dcc5528a9fe3db522d7c942024c  epca.c
[peterm@redrum char]$

Code snippet from corrupted file

	pc_callout.name = "cud";
	pc_callout.major = DIGICU/MAJOR;  <== this line has been modified
	pc_callout.minor_start = 0;
	pc_callout.init_termios.c_cflag = B9600 | CS8 | CREAD | CLOCAL | HUPCL;
	pc_callout.subtype = SERIAL_TYPE_CALLOUT;

MD5 sum before reboot:  File has been modified.
[peterm@redrum char]$ md5sum epca.c
76fab5e6a01648ffd808fd0cde8418b9  epca.c

The page is not marked dirty and is never flushed, so it remains
corrupted in memory until the next reboot.  After a reboot the file
reverts to its original state.

MD5 sum after reboot:  File reverts back to pristine version.
[peterm@redrum char]$ md5sum epca.c
b9f50dcc5528a9fe3db522d7c942024c  epca.c


Version-Release number of selected component (if applicable):
kernel-2.4.21-15

How reproducible:
Always

Steps to Reproduce:
1. Enable auditing.
2. Recompile the kernel (make, make modules, etc.).
3. Look for compilation errors; investigate each failure by looking for a '/' inserted into the source code, a header file, etc.
4. Reboot and open the corrupted file again; the '/' is no longer present.
5. Depending on which page cache entries are corrupted, you may experience an oops.
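Step 3 can be made mechanical when a pristine copy of the tree is available. The sketch below is a hypothetical helper (report_diffs, is_page_final, SKETCH_PAGE_SIZE and the 4096-byte page size are assumptions, not part of the LAuS or kernel sources) that compares a pristine buffer with a suspect one and flags every differing byte whose file offset is the last byte of a page-sized chunk, since a page cache page covers file offsets n*PAGE_SIZE up to (n+1)*PAGE_SIZE - 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed page size (4 KiB, as on i386); adjust for other architectures. */
#define SKETCH_PAGE_SIZE 4096UL

/* Nonzero if a file offset is the last byte of a page cache page. */
static int is_page_final(size_t off)
{
    return off % SKETCH_PAGE_SIZE == SKETCH_PAGE_SIZE - 1;
}

/*
 * Hypothetical helper: compare a pristine copy with a suspect copy of the
 * same file, print every differing byte, and return how many differ.
 * *page_final receives how many of those sit on the last byte of a page.
 */
static size_t report_diffs(const unsigned char *pristine,
                           const unsigned char *suspect, size_t len,
                           size_t *page_final)
{
    size_t ndiff = 0;

    assert(pristine != NULL && suspect != NULL && page_final != NULL);
    *page_final = 0;
    for (size_t off = 0; off < len; off++) {
        if (pristine[off] == suspect[off])
            continue;
        ndiff++;
        if (is_page_final(off))
            (*page_final)++;
        printf("offset %zu: 0x%02x -> 0x%02x%s\n", off,
               (unsigned)pristine[off], (unsigned)suspect[off],
               is_page_final(off) ? " (last byte of a page)" : "");
    }
    return ndiff;
}
```

Reading the two files into memory (for example with fread) is left to the caller. A hit on the last byte of a page is consistent with, but does not by itself prove, the do_realpath overrun suspected in this report.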
    

Additional info:

Comment 1 Klaus Weidner 2004-06-03 16:28:39 UTC
Note that the addresses where the corruption occurs appear to be the
*last* addresses in a page.

Probably the code in drivers/audit/args.c:do_realpath is walking
backwards too far and inserting a slash before the start of the
string, in a memory page that doesn't belong to it.
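The hypothesis above can be illustrated with a small model. The function below is a hypothetical stand-in for a routine that builds a path backward from the end of a buffer; it is not the actual do_realpath code, and build_path and its reserve_slash switch are invented for illustration. With reserve_slash set to 0, the space check forgets the leading '/', so a name that exactly fills the remaining space causes one byte to be written just before the buffer. If that buffer starts on a page boundary, the stray byte is the last byte of the preceding page, which would match the addresses observed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch, NOT the LAuS do_realpath source: build a path such
 * as "/drivers/char/epca.c" at the END of buf by walking from the leaf
 * component up toward the root, copying each name and then prepending '/'.
 * names[] is ordered leaf first: {"epca.c", "char", "drivers"}.
 *
 * reserve_slash = 1: the space check counts the '/' (correct).
 * reserve_slash = 0: the check forgets the '/', so when a name exactly
 *                    fills the remaining space, the '/' lands in buf[-1].
 *
 * Returns the start of the path inside buf, or NULL if it does not fit.
 */
static char *build_path(char *buf, size_t size, const char *const names[],
                        size_t n, int reserve_slash)
{
    char *p;

    assert(buf != NULL && size > 0);
    p = buf + size;
    *--p = '\0';                          /* terminator in the last byte */
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(names[i]);
        size_t need = len + (reserve_slash ? 1 : 0);

        if ((size_t)(p - buf) < need)
            return NULL;                  /* not enough room left */
        p -= len;
        memcpy(p, names[i], len);
        *--p = '/';                       /* writes buf[-1] if the check missed the '/' */
    }
    if (n == 0) {                         /* the root itself is "/" */
        if (p == buf)
            return NULL;
        *--p = '/';
    }
    return p;
}
```

In the kernel, the page just before the buffer could belong to anything, including the page cache, which would explain a stray '/' appearing in unrelated files. To observe the overrun safely in user space, start buf one byte into a larger array so that buf[-1] is a guard byte that can be checked afterward.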

Comment 2 IBM Bug Proxy 2004-06-03 22:54:40 UTC
Subject: [Bug 9119] New:  - RH125178-Random page cache corruption when audit is enabled in rhel 3 kernels
Importance: normal
References: <9119.bugzilla.com>
In-Reply-To: <9119.bugzilla.com>
X-Bugzilla-Reason: Reporter
X-Bugzilla-Family: Distro Service
Message-Id: <20040603215825.4048593B67.ibm.com>
Date: Thu,  3 Jun 2004 17:58:25 -0400 (EDT)

Do not reply to this note.  It was sent by a machine.  Instead append your
comments to the bug at the URL below.

https://bugzilla.linux.ibm.com/show_bug.cgi?id=9119

           Summary: RH125178-Random page cache corruption when audit is
                    enabled in rhel 3 kernels
            Vendor: Red Hat Linux
           Version: RHEL3 U2
          Platform: xSeries
      Architecture: All
Submitting Project: Bluefortress
 Customer Priority: --
       Owning Team: LTC
    OSC Acceptance: N/S
   Customer Status: N/S
     Required Date: 0000-00-00 00:00:00
       Target Date: 2000-00-00 00:00:00
     Make External: ---
            Status: OPEN
Technical Severity: high
 Engineer Priority: P2
         Component: Kernel
             Owner: khoa.com
       SubmittedBy: gjlynx.com
         QAContact: khoa.com


Comment 3 Peter Martuccelli 2004-06-04 21:11:16 UTC
I had three engineers running the patch today with no problems reported. 

I am moving ahead with getting the patch applied to the RHEL3 kernel.


Comment 4 Need Real Name 2004-06-04 22:03:21 UTC
Thanks, Peter. Can you tell me whether the other observed issue was tested
with this patch (and possibly resolved)? That issue was filesystem
corruption: after a reboot, modified files lose their modifications.

Comment 5 IBM Bug Proxy 2004-06-04 23:26:05 UTC
----- Additional Comments From khoa.com  2004-06-04 19:22 -------
Since a fix is already available and will be accepted by Red Hat, I'd like to
move this bug into the FixedAwaitingTest state.  Thanks.

Comment 6 IBM Bug Proxy 2004-06-07 22:31:00 UTC
----- Additional Comments From khake.com  2004-06-07 18:27 -------
Peter: When can you make this available for the team to use as part of the test
case runs?   Thanks. 

Comment 7 Ernie Petrides 2004-06-15 09:56:23 UTC
A fix for this problem has just been committed to the RHEL3 U3
patch pool this evening (in kernel version 2.4.21-15.11.EL).


Comment 8 Peter Martuccelli 2004-07-19 22:00:09 UTC
Users running LAuS to audit system calls without this fix are at risk
of data corruption.  The corruption becomes persistent only when the
flawed code writes to a page that is already dirty, because only dirty
pages are written back to disk.
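The distinction above, together with the revert-on-reboot behavior in the original description, can be captured in a toy model. Everything below (struct toy_page, the 16-byte page, the single-block "disk", and the toy_* helper names) is hypothetical; real page cache and writeback logic is far more involved. The one assumption taken from this report is that the stray write changes page contents without setting the dirty flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model (not kernel code): one cached page backed by one disk block. */
#define TOY_PAGE 16

struct toy_page {
    char data[TOY_PAGE];   /* in-memory page cache copy */
    bool dirty;            /* set only by legitimate writes in this model */
};

static char disk[TOY_PAGE];   /* on-disk contents of the block */

/* Read the block from disk into a clean page. */
static void toy_read(struct toy_page *pg)
{
    memcpy(pg->data, disk, TOY_PAGE);
    pg->dirty = false;
}

/* A legitimate write modifies the page and marks it dirty. */
static void toy_write(struct toy_page *pg, size_t off, char c)
{
    pg->data[off] = c;
    pg->dirty = true;
}

/* A stray write, as assumed for this bug, changes bytes but not the flag. */
static void toy_stray_write(struct toy_page *pg, size_t off, char c)
{
    pg->data[off] = c;
}

/* Writeback flushes the page to disk only if it is dirty. */
static void toy_writeback(struct toy_page *pg)
{
    if (pg->dirty) {
        memcpy(disk, pg->data, TOY_PAGE);
        pg->dirty = false;
    }
}

/* Clean shutdown and reboot: flush dirty pages, then re-read from disk. */
static void toy_reboot(struct toy_page *pg)
{
    toy_writeback(pg);
    toy_read(pg);
}
```

In this model a stray '/' in a clean page is discarded when the page is re-read, while the same stray '/' in a page that was already dirty is written back along with the legitimate change and survives the reboot.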

Comment 9 John Flanagan 2004-09-02 04:31:45 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-433.html


Comment 10 IBM Bug Proxy 2004-09-15 15:29:01 UTC
----- Additional Comments From markwiz.com  2004-09-15 11:25 EDT -------
IBM - RHEL3 U3 is available and this bug should be fixed. Please test and post
results.