From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2) Gecko/20040803 Galeon/1.3.17

Description of problem:
On a system running kernel-smp-2.4.21-15.0.4.EL with /tmp mounted as tmpfs, the LTP test 'read02' hangs in R state and cannot be cleared short of rebooting the system. The LTP source code I used for the test is ltp-full-20040804/testcases/kernel/syscalls/read/read02.c. The third test is the one that hangs. I wrote the attached C program based on this test, which makes the test case more explicit. This code also hangs when /tmp is mounted as tmpfs, and not otherwise.

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-15.0.4.EL

How reproducible:
Always

Steps to Reproduce:
1. Compile the attached C code
2. Run it.

Actual Results:
Program hangs after printing "About to Try"

Expected Results:
Program should print:
About to try
Didn't hang! rc = -1 errno = 14

Additional info:
Created attachment 102892 [details]
Program hangs on read(2) call

The program inlines the LTP test case "ltp-full-20040804/testcases/kernel/syscalls/read/read02.c". If run on a system where /tmp is mounted as tmpfs, the process will hang in 'R' state.
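
For readers without access to the attachment, here is a minimal sketch of what such a reproducer might look like. It is not the attached program. It assumes the hanging case is a read(2) into an invalid buffer address, which is what the expected errno = 14 (EFAULT on Linux) suggests. The helper name read_into_bad_buffer and the scratch file name are made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the attached program): create a scratch file
 * under `dir`, then call read() on it with an invalid user buffer.
 * Returns read()'s return value and stores the resulting errno in *err.
 * Returns -2 if the scratch file could not be set up.
 */
static long read_into_bad_buffer(const char *dir, int *err)
{
    char path[256];
    /* volatile keeps the compiler from reasoning about the bogus pointer */
    void *volatile bad_buf = (void *)-1;
    long rc;
    int fd;

    snprintf(path, sizeof(path), "%s/bad_read_%ld.tmp", dir, (long)getpid());
    fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0) {
        *err = errno;
        return -2;
    }
    unlink(path);   /* file stays usable through fd, vanishes on close */

    /* The file needs some data, otherwise read() just returns 0 at EOF. */
    if (write(fd, "data", 4) != 4 || lseek(fd, 0, SEEK_SET) != 0) {
        *err = errno;
        close(fd);
        return -2;
    }

    errno = 0;
    rc = read(fd, bad_buf, 1);   /* deliberately invalid destination */
    *err = errno;
    close(fd);
    return rc;
}
```

Called as read_into_bad_buffer("/tmp", &err) from a small main(), a kernel without the bug should return -1 with err set to EFAULT. On an affected kernel with /tmp on tmpfs, the call would presumably never return.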
Created attachment 103008 [details]
changes to shmem.c

A quick look with crash shows the tmpfs test is spinning in do_shmem_file_read(). The loop logic is a bit different from do_generic_file_read(). For the case that is properly returning an error, do_generic_file_read() has the following logic:

    ret = actor(desc, page, offset, nr);
    ...
    if (ret == nr && desc->count)
        continue;
    break;

For the broken tmpfs case that spins, the logic is:

    while (nr && desc->count) {
        ...
        nr = file_read_actor(desc, page, offset, nr);

Simply changing this to be more like the do_generic_file_read() logic resolved the hang problem:

    while (nr == ret && desc->count) {
        ...
        ret = file_read_actor(desc, page, offset, nr);

Diffs of the test changes attached.
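
To illustrate why the "ret == nr" check terminates on a fault, here is a small user-space model of the do_generic_file_read() loop shape quoted above. It is only a toy: struct toy_desc, toy_actor(), toy_read_loop() and the constants are invented stand-ins, not kernel code, and the model does not try to reproduce the exact conditions under which the original tmpfs loop spins.

```c
#include <stddef.h>

#define TOY_PAGE   4096   /* assumed page size for the model */
#define TOY_EFAULT 14     /* EFAULT on Linux */

/* Toy stand-in for the read descriptor; not the kernel structure. */
struct toy_desc {
    size_t count;    /* bytes the caller still wants */
    size_t written;  /* bytes copied so far */
    int error;       /* 0, or a negative errno-style value */
};

/*
 * Toy actor: copies up to nr bytes. Only the first `good` bytes of the
 * destination are writable; a copy that runs past them is cut short and
 * flagged with -TOY_EFAULT, like a faulting copy to user space.
 * Returns the number of bytes actually copied.
 */
static size_t toy_actor(struct toy_desc *d, size_t nr, size_t good)
{
    size_t room = good > d->written ? good - d->written : 0;
    size_t size = nr < d->count ? nr : d->count;

    if (size > room) {
        size = room;
        d->error = -TOY_EFAULT;
    }
    d->count -= size;
    d->written += size;
    return size;
}

/*
 * Loop shaped like the quoted do_generic_file_read() logic: go around
 * again only if the actor copied everything it was asked to copy and the
 * caller still wants more. Returns the number of iterations performed.
 */
static int toy_read_loop(struct toy_desc *d, size_t good)
{
    int iterations = 0;

    for (;;) {
        size_t nr = TOY_PAGE;
        size_t ret = toy_actor(d, nr, good);

        iterations++;
        if (ret == nr && d->count)
            continue;
        break;
    }
    return iterations;
}
```

With a destination that faults immediately (good = 0), the actor copies nothing, ret differs from nr, and the loop exits after one pass with the error recorded in the descriptor.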
Created attachment 103274 [details]
Patch for do_shmem_file_read loop problems

It appears that this problem has already been resolved in later 2.4 kernels. The attached patch file contains the changes to the loop logic found in 2.4.26. These are just the bare essential changes needed to resolve the spinning problem, but there are other changes to do_shmem_file_read() in the later kernels that are probably worth picking up.
I just confirmed, by accident :), that RHL 9 with 2.4.20-31.9 doesn't have this problem. It looks like something similar to the patch in comment 3 was already applied.
A fix for this problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-20.11.EL).
A fix to the prior fix has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-20.12.EL).
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html