Description of problem:
RHEL-5's implementation of /proc/locks is old and relies on the file_lock_list remaining static from the first read up to the last one. Reading this proc file when there is heavy file locking/unlocking activity on the host yields broken results (repeated, overwritten, partial lines, etc). Upstream commit 7f8ada98d9edd83d6ebd01e431e15b024a4a3dc4 by Pavel Emelyanov on Mon Oct 1 14:41:15 2007 -0700 changed it to a seq_file implementation. This bug is a request that his work be backported to RHEL-5.

Version-Release number of selected component (if applicable):
kernel-2.6.18-194.3.1.el5

How reproducible:
Needs constant file lock/unlock activity. With that, catting /proc/locks almost always returns some broken lines.

Steps to Reproduce:
1. Run mad-locker.py (randomly locks/unlocks 1000 files in a loop);
2. Run broken-proc-locks-detector.sh (essentially a regex on /proc/locks);
3. If broken-proc-locks-detector.sh returns an error, you had broken lines in /proc/locks.

Actual results:
Corrupt /proc/locks contents.

Expected results:
Reliable /proc/locks contents.

Additional info:
Will attach a file lock exerciser, a corrupt /proc/locks detector and my backport of the patch, which still needs some work. Although everything seems to be correct, with RHEL-5.5's kernel plus this patch I frequently get an infinite loop reading from /proc/locks. When I stop mad-locker.py, reading from /proc/locks finally finishes, as if the check for end of list needs some extra condition. It seems that the list head changes address? How come?
Created attachment 437752 [details] Python program that randomly exercises file locks on 1000 files Use this small python program to create constant activity on file_lock_list and /proc/locks.
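For readers without access to the attachment, a lock exerciser of this kind might look roughly like the sketch below. This is not the attached mad-locker.py; the function name `exercise_locks`, its parameters, and the `lock-N` file names are made up for illustration, and it assumes POSIX locks taken through Python's `fcntl.lockf` are enough to churn file_lock_list.

```python
import fcntl
import os
import random
import tempfile


def exercise_locks(directory=None, file_count=1000, iterations=10000, seed=None):
    """Randomly lock and unlock files; return the number of lock operations.

    Hypothetical stand-in for the attached exerciser, not the real script.
    """
    if directory is None:
        # No directory given: work in a throwaway one.
        with tempfile.TemporaryDirectory() as tmp:
            return exercise_locks(tmp, file_count, iterations, seed)

    rng = random.Random(seed)
    files = [open(os.path.join(directory, f"lock-{i}"), "w")
             for i in range(file_count)]
    held = set()          # indexes of files this process currently has locked
    operations = 0
    try:
        for _ in range(iterations):
            i = rng.randrange(file_count)
            if i in held:
                fcntl.lockf(files[i], fcntl.LOCK_UN)
                held.remove(i)
            else:
                fcntl.lockf(files[i], fcntl.LOCK_EX | fcntl.LOCK_NB)
                held.add(i)
            operations += 1
    finally:
        # Closing a file also drops any lock still held on it.
        for f in files:
            f.close()
    return operations
```

Calling `exercise_locks()` in a loop from one shell while reading /proc/locks from another should produce the kind of constant activity described here. With 1000 files open at once, the process may need a higher open-file limit than the default.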
Created attachment 437753 [details] Greps /proc/locks for lines that do not match what a sane line looks like This is likely incomplete, but good enough for testing /proc/locks data generation while simple (but constant) file locking activity is taking place.
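The same idea as the attached detector can be sketched in Python. This is not the attached shell script; the pattern below is an assumption based on the /proc/locks layout described in proc(5) for current kernels (ordinal, lock type, mode, access, PID, device and inode, start, end), and older kernels may print additional fields. The names `SANE_LINE`, `find_broken_lines` and `check_proc_locks` are hypothetical.

```python
import re

# Assumed shape of a sane line, e.g.:
#   1: POSIX  ADVISORY  WRITE 1234 08:01:7864448 0 EOF
# Blocked waiters carry a "->" prefix after the ordinal.
SANE_LINE = re.compile(
    r"^\d+:\s+(?:->\s+)?"
    r"(?:POSIX|FLOCK|OFDLCK|LEASE)\s+"
    r"(?:ADVISORY|MANDATORY|ACTIVE|BREAKING|BREAKER)\s+"
    r"(?:READ|WRITE|UNLCK)\s+"
    r"-?\d+\s+"
    r"[0-9a-f]+:[0-9a-f]+:\d+\s+"
    r"\d+\s+(?:\d+|EOF)$"
)


def find_broken_lines(text):
    """Return the lines of text that do not look like a sane lock entry."""
    return [line for line in text.splitlines()
            if line.strip() and not SANE_LINE.match(line)]


def check_proc_locks(path="/proc/locks"):
    """Read the lock table once and return any suspicious lines."""
    with open(path) as handle:
        return find_broken_lines(handle.read())
```

A non-empty result from `check_proc_locks()` while the exerciser is running would correspond to the detector script returning an error.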
Created attachment 437755 [details] Patch that mostly gets it This patch should apply cleanly to recent RHEL-5.5 kernel sources, but it still needs work on detecting that the seq_file has reached the end of the list. Sometimes readers of /proc/locks will loop reading it until activity briefly stops and it detects it has reached the end.
The provided patch looks like an accurate backport of upstream. I haven't seen any infinite loop while testing it. How did you hit that? On the other hand, I get strange numbering of locks (the first field). That's because f->private is reset each time we call read() on the file (which happens several times if there are a lot of locks). Upstream is also affected. I'm working on a solution.
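The numbering symptom can be modelled outside the kernel. The sketch below is a simplified Python model, not kernel code: `read_in_chunks` and its parameters are hypothetical, each chunk stands for one read() call, and the "derive the number from the position" variant is only an assumption about one way to keep the numbering continuous, not a description of the eventual patch.

```python
def read_in_chunks(locks, chunk_size, reset_counter_on_start):
    """Model a listing that is produced over several read() calls.

    locks: the entries to print.
    chunk_size: how many entries fit into one read() call.
    reset_counter_on_start: if True, the per-line counter restarts at 1
        at the beginning of every read(), as when per-file private state
        is reset on each call.
    """
    lines = []
    pos = 0  # file position; this is the state that survives across reads
    while pos < len(locks):
        # Runs once per read() call, like a start() callback would.
        counter = 1 if reset_counter_on_start else pos + 1
        for lock in locks[pos:pos + chunk_size]:
            lines.append(f"{counter}: {lock}")
            counter += 1
        pos += chunk_size
    return lines
```

With five entries and two entries per read, the resetting variant numbers the lines 1, 2, 1, 2, 1, while the position-derived variant numbers them 1 through 5.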
I just ran the Python program that exercises locking on a box with 4 or 8 CPUs and noticed that the grep would at times run for several minutes. An strace revealed that it kept reading the same information over and over from /proc/locks, so perhaps there is a race where a lock is added or removed just as the reader is about to detect end-of-list, and it misses it. If you can't reproduce the loop, then it's OK; perhaps it was a side effect of other things going on in the test box.
I think I know where the problem comes from. In locks_start(), you use *pos--, which decrements the pointer, not the value it points to. It should be (*pos)--. I will test a corrected version ASAP.
Created attachment 449923 [details] Corrected patch That patch also corrects the numbering.
I just sent the patch correcting the numbering upstream. I will post the patch above once I get a good response from upstream.
Created attachment 450759 [details] Corrected patch The numbering fix patch was not correct. This one is compliant with the corrected upstream version.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-245.el5 You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-1065.html