Bug 822189 - [RHEL6.3][kernel debug] Connectathon 'Special' test failures NFSv2,3
[RHEL6.3][kernel debug] Connectathon 'Special' test failures NFSv2,3
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel (Show other bugs)
6.3
Unspecified Linux
medium Severity medium
: rc
: ---
Assigned To: Jeff Layton
yanfu,wang
: Regression
Depends On:
Blocks: 826592
  Show dependency treegraph
 
Reported: 2012-05-16 11:11 EDT by Jeff Burke
Modified: 2014-06-18 03:42 EDT (History)
10 users (show)

See Also:
Fixed In Version: kernel-2.6.32-274.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-20 04:43:32 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Logs captures from the testing (2.02 MB, application/x-gzip)
2012-05-16 11:11 EDT, Jeff Burke
no flags Details
patch -- don't call rpc_delay() after -ENOMEM error from xprt_dynamic_alloc_slot() (2.73 KB, patch)
2012-05-16 12:52 EDT, Jeff Layton
no flags Details | Diff
patch -- fix loss of task->tk_status after rpc_delay call in xprt_alloc_slot (3.08 KB, patch)
2012-05-17 16:48 EDT, Jeff Layton
no flags Details | Diff
patch -- fix loss of task->tk_status after rpc_delay call in xprt_alloc_slot (1.84 KB, patch)
2012-05-20 07:06 EDT, Jeff Layton
no flags Details | Diff

  None (edit)
Description Jeff Burke 2012-05-16 11:11:37 EDT
Created attachment 584998 [details]
Logs captures from the testing

Description of problem:
 With the latest kernel we saw two instances of a TCP failure.
NFSv2 against RHEL5 server and NFSv3 against RHEL3 server. Both
happened during the CTHON special test. Both were on the same
host ibm-z10-12.rhts.eng.bos.redhat.com. Using RHEL6.3-20120502.3
Server with kernel-debug 2.6.32-272.el6 s/390x.

Version-Release number of selected component (if applicable):
kernel-debug 2.6.32-272.el6
RHEL6.3-20120502.3

How reproducible:
Unknown

  
Actual results:

rpciod/1: page allocation failure. order:0, mode:0x0
CPU: 1 Tainted: G           ---------------  T 2.6.32-272.el6.s390x.debug #1
Process rpciod/1 (pid: 1277, task: 000000001cae0240, ksp: 000000001e32f530)
000000001e32fa90 000000001e32fa10 0000000000000002 0000000000000000 
       000000001e32fab0 000000001e32fa28 000000001e32fa28 00000000004e6224 
       0000000000000000 0000000000000000 0000000000000000 0000000000869900 
       000000000000000d 000000000000000c 000000001e32fa80 0000000000000000 
       0000000000000000 00000000001056ec 000000001e32fa10 000000001e32fa50 
Call Trace:
([<00000000001055e4>] show_trace+0xf0/0x148)
 [<0000000000217436>] __alloc_pages_nodemask+0x72e/0xacc
 [<0000000000255f7a>] cache_alloc_refill+0x49a/0x974
 [<0000000000257994>] kmem_cache_alloc_notrace+0x22c/0x264
 [<000003c0015c7a4e>] xprt_reserve+0x252/0x3c0 [sunrpc]
 [<000003c0015d0466>] __rpc_execute+0xaa/0x42c [sunrpc]
 [<000000000016a010>] worker_thread+0x1f8/0x36c
 [<000000000017068c>] kthread+0xa4/0xac
 [<000000000010a26e>] kernel_thread_starter+0x6/0xc
 [<000000000010a268>] kernel_thread_starter+0x0/0xc
3 locks held by rpciod/1/1277:
 #0:  (rpciod){+.+.+.}, at: [<0000000000169f90>] worker_thread+0x178/0x36c
 #1:  (&task->u.tk_work){+.+.+.}, at: [<0000000000169f90>] worker_thread+0x178/0x36c
 #2:  (&xprt->reserve_lock){+.+...}, at: [<000003c0015c78ac>] xprt_reserve+0xb0/0x3c0 [sunrpc]
Mem-Info:
DMA per-cpu:
CPU    0: hi:  186, btch:  31 usd:  18
CPU    1: hi:  186, btch:  31 usd:  55
active_anon:7671 inactive_anon:13237 isolated_anon:0
 active_file:15756 inactive_file:19418 isolated_file:0
 unevictable:915 dirty:336 writeback:7258 unstable:0
 free:535 slab_reclaimable:7750 slab_unreclaimable:21566
 mapped:8676 shmem:37 pagetables:383 bounce:0
DMA free:2140kB min:2876kB low:3592kB high:4312kB active_anon:30684kB inactive_anon:52948kB active_file:63024kB inactive_file:77672kB unevictable:3660kB isolated(anon):0kB isolated(file):0kB present:517120kB mlocked:0kB dirty:1344kB writeback:29032kB mapped:34704kB shmem:148kB slab_reclaimable:31000kB slab_unreclaimable:86264kB kernel_stack:2800kB pagetables:1532kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 343*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB = 2140kB
36885 total pagecache pages
742 pages in swap cache
Swap cache stats: add 6338, delete 5596, find 1300/1445
Free swap  = 488908kB
Total swap = 507896kB
131072 pages RAM
37224 pages reserved
67021 pages shared
71894 pages non-shared
Expected results:


Additional info:
Comment 1 Jeff Layton 2012-05-16 11:17:37 EDT
The allocation failure is somewhat expected. What isn't is what happened afterward:

    call_reserveresult: status=0, but no request slot, exiting

...and that means that the rpc_task exits with -EIO. I don't see any obvious way to get to call_reserveresult with tk_status == 0 and tk_rqstp == NULL however.
Comment 3 Jeff Layton 2012-05-16 12:00:37 EDT
I think I may see the bug...

xprt_alloc_slot does this:

        switch (PTR_ERR(req)) {
        case -ENOMEM:
                rpc_delay(task, HZ >> 2);
                dprintk("RPC:       dynamic allocation of request slot "
                                "failed! Retrying\n");
                break;

...rpc_delay calls:

    rpc_sleep_on(&delay_queue, task, __rpc_atrun);

...and __rpc_atrun does this:

static void __rpc_atrun(struct rpc_task *task)
{
        task->tk_status = 0;
}

... __rpc_atrun becomes the task->tk_callback, so it's going to get run after the task gets woken back up and reset the tk_status to 0.

It's not yet clear to me how to fix this. Perhaps we need a rpc_delay variant that doesn't reset the status?
Comment 4 Jeff Layton 2012-05-16 12:17:09 EDT
cc'ing Andy since he did most of the upstream work on this...
Comment 5 Jeff Layton 2012-05-16 12:52:05 EDT
Created attachment 585018 [details]
patch -- don't call rpc_delay() after -ENOMEM error from xprt_dynamic_alloc_slot()

Here's a potential patch which should do the right thing. Untested so far,
but it does build.

Jeff, how reproducible is this? Would it be possible to rerun your tests on these low memory hosts and verify that this fixes the issue? If not, maybe we could try to verify this with a fault injection patch that makes every N-th allocation in xprt_dynamic_alloc_slot fail...
Comment 6 Jeff Burke 2012-05-16 13:12:19 EDT
Jeff,
 I did verify that it was reproducible. Can you push a scratch build into brew. Let me know the taskid I will re-run the tests.

Thanks,
Jeff
Comment 8 Jeff Layton 2012-05-16 13:38:40 EDT
Ok, scratch build kicked off...

Note that we'll also want to suppress those page allocation failure warnings
as well so that this code doesn't spam the logs. I have a patch for that too,
but left it out of this scratch build for now so we could verify that we're
getting some allocation failures during the test run.
Comment 10 Jeff Layton 2012-05-17 16:24:29 EDT
Ok, testing last night was somewhat promising. Instead of failing with EIO, we
got this error from the cthon 30M test instead:

write/read 30 MB file
Error: can't sync bigfile37336: Connection timed out

...which is -ETIMEDOUT. In the kernel log, I see a bunch of page allocation failures and then eventually, this:

call_reserveresult: unrecognized error -110, exiting
call_reserveresult: unrecognized error -110, exiting
call_reserveresult: unrecognized error -110, exiting

...I *think* the -ETIMEDOUT is most likely coming from __rpc_queue_timer_fn(). IOW after we get a number of delays, eventually we hit a major RPC timeout and then end up failing the call.

Probably the right solution is to make call_reserveresult handle that error like it would -EAGAIN. Another idea would be to add a new rpc_action that resets the tk_status back to -EAGAIN when the task is ready to run again, but that doesn't
seem quite as clean to me.
Comment 11 Jeff Layton 2012-05-17 16:48:46 EDT
Created attachment 585310 [details]
patch -- fix loss of task->tk_status after rpc_delay call in xprt_alloc_slot

Ok, attempt #2...

This adds a new rpc_action that simply resets the task->tk_status back to -EAGAIN afterward. That seems to be the right thing to do since rpc_delay does something similar (resetting it back to 0).

Trond may have a different opinion once I send this upstream, but let's see if this takes care of the problem and then we'll worry about the implementation details.
Comment 12 Jeff Layton 2012-05-17 17:01:11 EDT
Note that my analysis in comment #10 is a little off. It looks like we'll always end up with the tk_status getting set to -ETIMEDOUT after the delay due to the fact that the tk_timeout is set to 0 in xprt_reserve.
Comment 15 Jeff Layton 2012-05-18 16:18:28 EDT
Jeff Burke reran the test with the new patch last night and it passed this time. I've sent that patch to the upstream mailing list and am waiting for review. I'll plan to post it for 6.3 next week...
Comment 16 RHEL Product and Program Management 2012-05-18 16:20:20 EDT
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.
Comment 17 Jeff Layton 2012-05-20 07:06:57 EDT
Created attachment 585624 [details]
patch -- fix loss of task->tk_status after rpc_delay call in xprt_alloc_slot

Trond went with a different patch upstream that moves the error handling into
xprt_alloc_slot, so we'll probably go with a same for RHEL6.
Comment 24 Jarod Wilson 2012-05-23 16:11:39 EDT
Patch(es) available on kernel-2.6.32-274.el6
Comment 26 yanfu,wang 2012-05-25 06:04:52 EDT
(In reply to comment #10)
> Ok, testing last night was somewhat promising. Instead of failing with EIO,
> we
> got this error from the cthon 30M test instead:
> 
> write/read 30 MB file
> Error: can't sync bigfile37336: Connection timed out
> 
> ...which is -ETIMEDOUT. In the kernel log, I see a bunch of page allocation
> failures and then eventually, this:
> 
> call_reserveresult: unrecognized error -110, exiting
> call_reserveresult: unrecognized error -110, exiting
> call_reserveresult: unrecognized error -110, exiting
> 
hi Jeff,
I still got I/O error when I run connectathon against secure nfs:
[root@dell-pesc430-03 cthon04]# ./runcthon --server intel-d3x1311-01.rhts.eng.bos.redhat.com --serverdir /nfs --onlykrb5 --onlyv3| tee log
Fri May 25 05:36:11 EDT 2012
serverdir=/nfs
./server -s -o udp,nfsvers=3,sec=krb5 -m /mnt/nfsv3udp -p /nfs/nfsv3udp intel-d3x1311-01.rhts.eng.bos.redhat.com
./server -s -o tcp,nfsvers=3,sec=krb5 -m /mnt/nfsv3tcp -p /nfs/nfsv3tcp intel-d3x1311-01.rhts.eng.bos.redhat.com
Waiting for 's' to finish...
The '-s' test using '-o udp,nfsvers=3,sec=krb5' args to intel-d3x1311-01.rhts.eng.bos.redhat.com: Failed!!
 Done: 05:39:19 AM
./server -s -o udp,nfsvers=3,sec=krb5i -m /mnt/nfsv3udp -p /nfs/nfsv3udp intel-d3x1311-01.rhts.eng.bos.redhat.com
./server -s -o tcp,nfsvers=3,sec=krb5i -m /mnt/nfsv3tcp -p /nfs/nfsv3tcp intel-d3x1311-01.rhts.eng.bos.redhat.com
Waiting for 's' to finish...
The '-s' test using '-o udp,nfsvers=3,sec=krb5i' args to intel-d3x1311-01.rhts.eng.bos.redhat.com: Failed!!
 Done: 05:39:35 AM
./server -s -o udp,nfsvers=3,sec=krb5p -m /mnt/nfsv3udp -p /nfs/nfsv3udp intel-d3x1311-01.rhts.eng.bos.redhat.com
./server -s -o tcp,nfsvers=3,sec=krb5p -m /mnt/nfsv3tcp -p /nfs/nfsv3tcp intel-d3x1311-01.rhts.eng.bos.redhat.com
Waiting for 's' to finish...
The '-s' test using '-o udp,nfsvers=3,sec=krb5p' args to intel-d3x1311-01.rhts.eng.bos.redhat.com: Failed!!
 Done: 05:39:53 AM

From the nfsudp log:
...
write/read 30 MB file
Error: can't sync bigfile17395: Input/output error
special tests failed
Tests failed, leaving /mnt/nfsv3udp mounted
                                                  
No page allocation failure in /var/log/message.

My test server is intel-d3x1311-01.rhts.eng.bos.redhat.com, client is dell-pesc430-03.rhts.eng.bos.redhat.com:
[root@dell-pesc430-03 cthon04]# free -m
             total       used       free     shared    buffers     cached
Mem:           994        258        736          0         15         81
-/+ buffers/cache:        161        833
Swap:         2015          0       2015

Is the same problem with the bug?
Comment 27 Jeff Layton 2012-05-25 06:19:24 EDT
No, this bug is only a problem with TCP and you had a failure over UDP.

It seems like we had a bug reported recently where we were seeing test failures with krb5p mounts, but I can't seem to find it at the moment.
Comment 28 Jeff Layton 2012-05-25 06:21:30 EDT
That may be related to bug 747496...
Comment 29 yanfu,wang 2012-05-28 02:51:34 EDT
(In reply to comment #28)
> That may be related to bug 747496...
yes, confirm the failure is relate to krb5p/krb5i mounts with UDP, thanks.
Comment 30 yanfu,wang 2012-06-04 23:26:42 EDT
Run several connectathon testing on kernel and kernel-debug against 2.6.32-278.el6, also clone the failed job in comment #20, results show good now:
https://beaker.engineering.redhat.com/jobs/242539
https://beaker.engineering.redhat.com/jobs/242541, https://beaker.engineering.redhat.com/jobs/243300
Comment 32 errata-xmlrpc 2012-06-20 04:43:32 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0862.html

Note You need to log in before you can comment on or make changes to this bug.