860399 – RHEL6.3: Oops in rpciod, RIP rpcauth_refreshcred

Summary: RHEL6.3: Oops in rpciod, RIP rpcauth_refreshcred

Keywords:
Status:	CLOSED DUPLICATE of bug 878204
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.3
Hardware:	x86_64
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Jeff Layton
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-09-25 18:10 UTC by Kelsey Cummings
Modified:	2012-12-04 21:46 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-12-04 20:34:25 UTC
Target Upstream Version:
Embargoed:

Description Kelsey Cummings 2012-09-25 18:10:34 UTC

Description of problem:

Kernel oops in rpciod under both 2.6.32-279.5.1.el6.x86_64 and 2.6.32-279.5.2.el6.x86_64. The system is a Dovecot mail server handling about 3,500 connections. At the time of the most recent crash (on 2.6.32-279.5.2), NFS ops/sec had spiked to 750, against a nominal 300, due to load shifting in the cluster.

How reproducible:

Unknown, likely load related.

The backtrace below is from 2.6.32-279.5.1; the 2.6.32-279.5.2 crash is also at rpcauth_refreshcred+158.

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.5.1.el6.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 16
        DATE: Mon Sep  3 10:07:17 2012
      UPTIME: 10 days, 12:45:19
LOAD AVERAGE: 5.83, 5.07, 4.54
       TASKS: 3392
    NODENAME: x.x.sonic.net
     RELEASE: 2.6.32-279.5.1.el6.x86_64
     VERSION: #1 SMP Tue Aug 14 16:11:42 CDT 2012
     MACHINE: x86_64  (2400 Mhz)
      MEMORY: 32 GB
       PANIC: ""
         PID: 1928
     COMMAND: "rpciod/9"
        TASK: ffff880432adb540  [THREAD_INFO: ffff880430af2000]
         CPU: 9
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 1928   TASK: ffff880432adb540  CPU: 9   COMMAND: "rpciod/9"
 #0 [ffff880430af3ae0] machine_kexec at ffffffff8103281b
 #1 [ffff880430af3b40] crash_kexec at ffffffff810ba792
 #2 [ffff880430af3c10] oops_end at ffffffff815013c0
 #3 [ffff880430af3c40] die at ffffffff8100f26b
 #4 [ffff880430af3c70] do_general_protection at ffffffff81500f52
 #5 [ffff880430af3ca0] general_protection at ffffffff81500725
    [exception RIP: rpcauth_refreshcred+158]
    RIP: ffffffffa029a85e  RSP: ffff880430af3d50  RFLAGS: 00010286
    RAX: 6e6967756c705f66  RBX: ffff88032977c6c8  RCX: ffff8804318e1800
    RDX: 0000000000000001  RSI: ffff88011922d3c0  RDI: ffff88032977c6c8
    RBP: ffff880430af3d90   R8: ffff880432b136b8   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000001
    R13: ffff880432b13600  R14: 0000000000000001  R15: ffffffffa028db80
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff880430af3d98] call_refresh at ffffffffa028dbc0 [sunrpc]
 #7 [ffff880430af3db8] __rpc_execute at ffffffffa0298e37 [sunrpc]
 #8 [ffff880430af3e28] rpc_async_schedule at ffffffffa02991c5 [sunrpc]
 #9 [ffff880430af3e38] worker_thread at ffffffff8108c760
#10 [ffff880430af3ee8] kthread at ffffffff81091d66
#11 [ffff880430af3f48] kernel_thread at ffffffff8100c14a

Comment 2 Jeff Layton 2012-10-12 10:31:29 UTC

Would it be possible for you to open a support case with RH support and supply them with the core for analysis?

Comment 3 Jeff Layton 2012-10-12 17:01:27 UTC

(pasting from what was sent via email)

> > Would it be possible for you to open a support case with RH support and supply
> > them with the core for analysis?
> 
> No, but I'd be happy to supply you with the two crash dumps and/or
> perform any analysis for you locally.  New support policies make
> community reported bugs more challenging, eh?
> 

Yep, if you don't have a support contract then you'll need to do some
legwork on your own.

You'll want to track down the place where it crashed and see if you can determine why. Most likely, there's a corrupt pointer someplace that we ended up trying to chase. See if you can determine what was corrupt and the nature of that corruption...
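
One quick first check, sketched here in Python, is to test each register from the oops for two signs of corruption: a non-canonical address (a common cause of a general protection fault on x86_64) and a value whose bytes are all printable ASCII. The helper names are hypothetical, 48-bit virtual addressing is assumed, and nothing in the backtrace alone confirms which register the faulting instruction actually dereferenced; the disassembly around rpcauth_refreshcred+158 would be needed for that.

```python
import struct
from typing import Optional

# Assumption: 48-bit virtual addressing, so bits 47..63 must be all 0 or all 1.
CANONICAL_SHIFT = 47


def is_canonical(addr: int) -> bool:
    """Return True if addr is a canonical x86_64 virtual address."""
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> CANONICAL_SHIFT
    return top in (0, 0x1FFFF)


def as_ascii(value: int) -> Optional[str]:
    """Return the 8 bytes of value, in little-endian memory order, as text
    if every byte is printable ASCII; otherwise return None."""
    raw = struct.pack("<Q", value)
    if all(0x20 <= b <= 0x7E for b in raw):
        return raw.decode("ascii")
    return None


# Register values copied from the backtrace in the description.
registers = {
    "RAX": 0x6E6967756C705F66,
    "RBX": 0xFFFF88032977C6C8,
    "RCX": 0xFFFF8804318E1800,
    "RSI": 0xFFFF88011922D3C0,
    "RDI": 0xFFFF88032977C6C8,
}

for name, value in registers.items():
    status = "canonical" if is_canonical(value) else "NON-CANONICAL"
    print(f"{name} {value:#018x} {status} ascii={as_ascii(value)!r}")
```

With the values above, RAX is the only register flagged as non-canonical, and its bytes read as the text "f_plugin" in memory order. That would be consistent with string data having overwritten a pointer field, but it is only a hint about the nature of the corruption, not a diagnosis.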

Comment 4 Jeff Layton 2012-12-04 20:34:25 UTC

It's likely that this bug is a duplicate of bug 878204. Unfortunately, that bug is marked private and I can't add you to the cc list.

You may want to try pulling in commit a271c5a0de from upstream kernels and see if that fixes the issue for you. If it does, please note it here and I'll close this bug as a duplicate of that one.

I'm going to go ahead and close this bug as a dup of that one. If you find that that commit doesn't help, then please reopen this bug and I'll try to take another look.

*** This bug has been marked as a duplicate of bug 878204 ***