Bug 622647 - Reading /proc/locks yields corrupt data
Summary: Reading /proc/locks yields corrupt data
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All
OS: Linux
high
medium
Target Milestone: rc
: ---
Assignee: Jerome Marchand
QA Contact: Boris Ranto
URL:
Whiteboard:
Depends On:
Blocks: 637846
 
Reported: 2010-08-10 03:29 UTC by Fabio Olive Leite
Modified: 2018-11-14 19:26 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 637846 (view as bug list)
Environment:
Last Closed: 2011-07-21 10:09:31 UTC
Target Upstream Version:
Embargoed:


Attachments
Python program that randomly exercises file locks on 1000 files (875 bytes, text/plain)
2010-08-10 03:32 UTC, Fabio Olive Leite
Greps /proc/locks for lines that do not match what a sane line looks like (264 bytes, text/plain)
2010-08-10 03:34 UTC, Fabio Olive Leite
Patch that mostly gets it (7.83 KB, patch)
2010-08-10 03:37 UTC, Fabio Olive Leite
Corrected patch (9.12 KB, patch)
2010-09-27 15:19 UTC, Jerome Marchand
Corrected patch (9.16 KB, patch)
2010-09-30 13:48 UTC, Jerome Marchand


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:1065 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.7 kernel security and bug fix update 2011-07-21 09:21:37 UTC

Description Fabio Olive Leite 2010-08-10 03:29:36 UTC
Description of problem:

RHEL-5's implementation of /proc/locks is old and relies on the file_lock_list remaining static from the first read up to the last one. Reading this proc file while there is heavy file locking/unlocking activity on the host yields broken results (repeated, overwritten, or partial lines, etc.).

Upstream commit 7f8ada98d9edd83d6ebd01e431e15b024a4a3dc4 by Pavel Emelyanov on Mon Oct 1 14:41:15 2007 -0700 changed it to a seq_file implementation. This bug is a request that his work be backported to RHEL-5.

Version-Release number of selected component (if applicable):

kernel-2.6.18-194.3.1.el5

How reproducible:

Needs constant file lock/unlock activity. With that, catting /proc/locks almost always returns some broken lines.

Steps to Reproduce:
1. Run mad-locker.py (randomly locks/unlocks 1000 files in a loop);
2. Run broken-proc-locks-detector.sh (essentially a regex on /proc/locks);
3. If broken-proc-locks-detector.sh returns an error, you had broken lines on /proc/locks.
  
Actual results:

Corrupt /proc/locks contents.

Expected results:

Reliable /proc/locks contents.

Additional info:

Will attach a file lock exerciser, a corrupt /proc/locks detector and my backport from the patch, which still needs some work.

Although everything seems to be correct, with RHEL-5.5's kernel plus this patch I frequently get an infinite loop reading from /proc/locks. When I stop mad-locker.py, reading from /proc/locks finally finishes, as if the check for end of list needs some extra condition. It seems that the list head changes address? How come?

Comment 1 Fabio Olive Leite 2010-08-10 03:32:06 UTC
Created attachment 437752 [details]
Python program that randomly exercises file locks on 1000 files

Use this small python program to create constant activity on file_lock_list and /proc/locks.
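For readers without access to the attachment, the same kind of load can be sketched in C. This is not the attached Python program, only a minimal illustration of the idea: set_file_lock() and exercise_locks() are hypothetical names, and the choice of whole-file POSIX write locks via fcntl(F_SETLK) is an assumption about what is sufficient to keep file_lock_list busy.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Take (want_lock != 0) or release (want_lock == 0) a whole-file POSIX
 * write lock on fd. Returns 0 on success, -1 on error (errno is set by
 * fcntl). Hypothetical helper, not part of the attachment.
 */
int set_file_lock(int fd, int want_lock)
{
    struct flock fl = {0};

    assert(fd >= 0);
    fl.l_type = want_lock ? F_WRLCK : F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "up to end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/*
 * Perform `rounds` random lock or unlock operations spread over the
 * nfds already-open descriptors in fds. Returns the number of fcntl()
 * calls that failed.
 */
long exercise_locks(const int *fds, size_t nfds, long rounds, unsigned seed)
{
    long failures = 0;

    assert(fds != NULL && nfds > 0);
    srand(seed);
    for (long i = 0; i < rounds; i++) {
        int fd = fds[(size_t)rand() % nfds];

        if (set_file_lock(fd, rand() & 1) != 0)
            failures++;
    }
    return failures;
}
```

A caller would open the test files (1000 of them in the original test), pass the descriptors to exercise_locks() in an endless loop, and read /proc/locks from another shell while it runs.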

Comment 2 Fabio Olive Leite 2010-08-10 03:34:10 UTC
Created attachment 437753 [details]
Greps /proc/locks for lines that do not match what a sane line looks like

This is likely incomplete, but good enough for testing /proc/locks data generation while simple (but constant) file locking activity is taking place.
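The check the script performs can be approximated in C with POSIX regex.h. The pattern below is an assumption based on the usual /proc/locks line layout (number, optional "->" for blocked waiters, lock class, mode, access type, pid, device:inode, start, end); it is not the regex from the attachment, and lock_line_is_sane() and count_broken_lines() are hypothetical names.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* ASSUMPTION: approximate shape of one sane /proc/locks line. */
static const char SANE_LINE_RE[] =
    "^[0-9]+: (-> )?(POSIX|FLOCK|LEASE|ACCESS) +"
    "(ADVISORY|MANDATORY|MSNFS|BREAKING|ACTIVE|BREAKER) +"
    "(READ|WRITE|NONE|RW|UNLCK) +"
    "[0-9]+ +[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9]+ +[0-9]+ +([0-9]+|EOF)$";

/*
 * Returns 1 if `line` (without trailing newline) matches the pattern,
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
int lock_line_is_sane(const char *line)
{
    regex_t re;
    int rc;

    assert(line != NULL);
    if (regcomp(&re, SANE_LINE_RE, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/*
 * Reads the stream line by line and counts lines that do not look
 * sane. Returns -1 if the pattern could not be compiled.
 */
long count_broken_lines(FILE *fp)
{
    char buf[512];
    long broken = 0;

    assert(fp != NULL);
    while (fgets(buf, sizeof buf, fp) != NULL) {
        int sane;

        buf[strcspn(buf, "\n")] = '\0';   /* strip the newline */
        sane = lock_line_is_sane(buf);
        if (sane < 0)
            return -1;
        if (!sane)
            broken++;
    }
    return broken;
}
```

Calling count_broken_lines() on a stream opened with fopen("/proc/locks", "r") while the lock exerciser runs should return a nonzero count on an affected kernel, if the pattern really covers every legitimate line on that system.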

Comment 3 Fabio Olive Leite 2010-08-10 03:37:17 UTC
Created attachment 437755 [details]
Patch that mostly gets it

This patch should apply cleanly to recent RHEL-5.5 kernel sources, but it still needs work on detecting that the seq_file has reached the end of the list. Sometimes readers of /proc/locks will loop reading it until activity briefly stops and it detects it has reached the end.

Comment 4 Jerome Marchand 2010-09-23 15:57:09 UTC
The provided patch looks like an accurate backport of upstream.
I haven't seen any infinite loop while testing it. How did you hit that?

On the other hand, I get strange numbering of locks (first field). That's because f->private is reset each time we call read() on the file (which happens several times if there are a lot of locks). Upstream is also affected. I'm working on a solution.

Comment 5 Fabio Olive Leite 2010-09-23 17:29:37 UTC
I just ran the python program that exercises locking on a box with some 4 or 8 CPUs, and I noticed that the grep would at times run for several minutes, and an strace revealed it kept reading the same information over and over from /proc/locks, so perhaps there is some race in there where a lock is added or removed when it is about to detect end-of-list and it doesn't.

If you can't reproduce the loop, then it's OK. Perhaps it was a side effect of other things going on in the test box.

Comment 6 Jerome Marchand 2010-09-24 10:08:05 UTC
I think I know where the problem comes from. In locks_start(), you use *pos-- which decrements the pointer, not the pointed-to value. It should be (*pos)--.
I will test a corrected version ASAP.
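The precedence issue can be shown outside the kernel. In C, postfix -- binds tighter than unary *, so *pos-- parses as *(pos--). The two functions below are a standalone illustration with hypothetical names, not code from the patch; pos_t is a stand-in for the kernel's loff_t.

```c
#include <assert.h>
#include <stddef.h>

typedef long long pos_t;   /* stand-in for the kernel's loff_t */

/*
 * Buggy form. "*pos--" is "*(pos--)": the value is read and discarded,
 * and the local pointer is moved back by one element. The value that
 * pos pointed to is left unchanged. Returns the moved pointer.
 * The caller must pass a pointer that is not the first array element,
 * so that pos - 1 is still a valid address.
 */
pos_t *decrement_buggy(pos_t *pos)
{
    assert(pos != NULL);
    (void)*pos--;
    return pos;
}

/*
 * Intended form. "(*pos)--" decrements the pointed-to value and leaves
 * the pointer itself alone. Returns the unchanged pointer.
 */
pos_t *decrement_fixed(pos_t *pos)
{
    assert(pos != NULL);
    (*pos)--;
    return pos;
}
```

With an array a = {10, 5}, decrement_buggy(&a[1]) leaves a[1] at 5 and returns &a[0], while decrement_fixed(&a[1]) sets a[1] to 4 and returns &a[1].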

Comment 7 Jerome Marchand 2010-09-27 15:19:42 UTC
Created attachment 449923 [details]
Corrected patch

That patch also corrects the numbering.

Comment 8 Jerome Marchand 2010-09-27 15:21:54 UTC
I just sent the patch correcting the numbering upstream. I will post the patch above after I get a good response from upstream.

Comment 9 Jerome Marchand 2010-09-30 13:48:10 UTC
Created attachment 450759 [details]
Corrected patch

The numbering fix patch was not correct. This one is compliant with the corrected upstream version.

Comment 11 RHEL Program Management 2011-02-01 16:56:59 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 17 Jarod Wilson 2011-02-21 20:56:16 UTC
in kernel-2.6.18-245.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 22 errata-xmlrpc 2011-07-21 10:09:31 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html

