Red Hat Bugzilla – Bug 1038315
RHEL6.5: kernel 2.6.32-431.el6 + openafs 184.108.40.206 panics with RIP cache_alloc_refill called from getname, names_cache corrupted
Last modified: 2014-06-18 03:43:38 EDT
Description of problem:
It is reported that after updating to RHEL6.5, systems crash repeatedly, and further updates have been halted. The oops is shown below; it suggests that the names_cache kmem slab is corrupted.
------------[ cut here ]------------
kernel BUG at mm/slab.c:3069!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/online
Modules linked in: openafs(P)(U) ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vsock(U) dm_mod ppdev parport_pc parport vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom vmxnet3 vmw_pvscsi pata_acpi ata_generic ata_piix [last unloaded: scsi_wait_scan]
Pid: 5843, comm: top Tainted: P --------------- 2.6.32-431.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffff8116ed24>] [<ffffffff8116ed24>] cache_alloc_refill+0x1e4/0x240
RSP: 0018:ffff88023b047e38 EFLAGS: 00010002
RAX: 000000000000000c RBX: ffff88023d820f00 RCX: 0000000000000002
RDX: 000000000000000c RSI: 0000000000000000 RDI: ffff88023fe74880
RBP: ffff88023b047e98 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000000a8 R11: 0000000000000000 R12: ffff88023fe74880
R13: ffff88023fee1b40 R14: 000000000000000c R15: ffff8802386db880
FS: 00007fdccb94f700(0000) GS:ffff880028280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fafb4078000 CR3: 000000023834b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process top (pid: 5843, threadinfo ffff88023b046000, task ffff88023b469500)
ffff880234602c58 0000000000000000 ffff88023fee1b80 000412d0810df51d
<d> ffff88023fee1b60 ffff88023fee1b50 00000001380a0cc0 00007fdccb521d18
<d> 00000000000000d0 ffff88023d820f00 00000000000000d0 0000000000000246
Code: 89 ff e8 70 57 12 00 eb 99 66 0f 1f 44 00 00 41 c7 45 60 01 00 00 00 4d 8b 7d 20 4c 39 7d c0 0f 85 f2 fe ff ff eb 84 0f 0b eb fe <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 eb f4 8b 55 ac 8b 75 bc 31
RIP [<ffffffff8116ed24>] cache_alloc_refill+0x1e4/0x240
Version-Release number of selected component (if applicable):
At least one person can reproduce it.
Steps to Reproduce:
We have a vmcore. The following unsigned modules are noted in the kernel, but at this time I do not have a reason to believe they are related:
openafs(P)(U) vmci(U) vsock(U)
A quick look at the changelog showed the following change from RHEL6.4 to RHEL6.5 in fs/namei.c involving names_cache, which warrants further investigation:
Author: Jeff Layton <firstname.lastname@example.org>
Date: Mon Feb 18 13:33:15 2013 -0500
[fs] vfs: embed struct filename inside of names_cache allocation if possible
O-Subject: [RHEL6.5 PATCH v3 24/55] BZ#678544: vfs: embed struct filename inside of names_cache allocation if possible
RH-Acked-by: J. Bruce Fields <email@example.com>
In the common case where a name is much smaller than PATH_MAX, an extra
allocation for struct filename is unnecessary. Before allocating a
separate one, try to embed the struct filename inside the buffer first.
If it turns out that that's not long enough, then fall back to allocating
a separate struct filename and redoing the copy.
Signed-off-by: Jeff Layton <firstname.lastname@example.org>
Signed-off-by: Al Viro <email@example.com>
Upstream commit: 7950e3852ab86826b7349a535d2e8b0000340d7f
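To make the commit description above concrete, here is a minimal userspace sketch of the "embed first, fall back to a separate allocation" idea. It is not the kernel code: `model_filename`, `model_getname`, `model_putname` and `model_copy` are hypothetical names, `malloc` stands in for the names_cache slab, and a plain loop stands in for the copy from userland.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_PATH_MAX 4096 /* assumed to match PATH_MAX on Linux */

/* Hypothetical stand-in for the kernel's struct filename. */
struct model_filename {
    const char *name; /* NUL-terminated path string */
    bool separate;    /* true if this struct was allocated on its own */
};

#define MODEL_EMBEDDED_MAX (MODEL_PATH_MAX - sizeof(struct model_filename))

/* Bounded copy: returns the string length, or max if no NUL fit in max bytes. */
static size_t model_copy(char *dst, const char *src, size_t max)
{
    size_t i;

    for (i = 0; i < max; i++) {
        dst[i] = src[i];
        if (src[i] == '\0')
            return i;
    }
    return max;
}

struct model_filename *model_getname(const char *src)
{
    /* One PATH_MAX-sized buffer, standing in for a names_cache object. */
    char *buf = malloc(MODEL_PATH_MAX);
    struct model_filename *fn;

    if (!buf)
        return NULL;

    /* First attempt: struct at the front of the buffer, string right after. */
    fn = (struct model_filename *)buf;
    if (model_copy(buf + sizeof(*fn), src, MODEL_EMBEDDED_MAX) < MODEL_EMBEDDED_MAX) {
        fn->name = buf + sizeof(*fn);
        fn->separate = false;
        return fn;
    }

    /* Fallback: allocate the struct separately and redo the copy. */
    fn = malloc(sizeof(*fn));
    if (!fn || model_copy(buf, src, MODEL_PATH_MAX) == MODEL_PATH_MAX) {
        free(fn); /* free(NULL) is a no-op */
        free(buf);
        return NULL; /* out of memory or name too long */
    }
    fn->name = buf;
    fn->separate = true;
    return fn;
}

void model_putname(struct model_filename *fn)
{
    if (!fn)
        return;
    if (fn->separate) {
        free((void *)fn->name); /* string buffer is its own allocation */
        free(fn);
    } else {
        /* Embedded case: the struct itself is the start of the buffer. */
        assert(fn->name == (const char *)fn + sizeof(*fn));
        free(fn);
    }
}
```

In this model the string pointer is not the start of the allocation in the embedded case, so release code has to go through the struct rather than free the string directly.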
Hmmm, I'll take that bet...
I'll wager that this module tries to do some getname() stuff, as that's common for ioctls.
Did they rebuild their modules when they updated to 6.5?
Yeah, looking at the upstream openafs repo here, there was some work done around a year ago to account for the rework of the getname/putname API in mainline kernels:
My guess would be that their openafs module needs to be patched and rebuilt to account for the same change in 6.5.
(In reply to Jeff Layton from comment #5)
> Yeah, looking at the upstream openafs repo here, there was some work done
> around a year ago to account for the rework of the getname/putname API in
> mainline kernels:
> My guess would be that their openafs module needs to be patched and rebuilt
> to account for the same change in 6.5.
In this case openafs 220.127.116.11-1 is being run. It is very recent and should include proper fixes for the changes that went into 3.7.
So I'm not sure openafs is related here, but it may still be at least a contributing factor.
$ git log --oneline | head -1
331f439 Update NEWS for 18.104.22.168
$ git log --oneline | grep putname
c21fded Linux: change test for new putname API
cf33252 Linux: fix afs_putname wrapper for pre-3.7 kernels
5aae6e0 Linux 3.7: putname is no longer exported
Created attachment 833301 [details]
patch -- openafs: fix the afs_putname definition when STRUCT_FILENAME_HAS_NAME is defined
Looks like this is due to bad putname handling in the openafs code. Their code uses getname() to copy the string from userland, but then uses afs_putname to put it.
They have afs_putname() defined wrong and it's causing a double-free on the memory when auditing is enabled. This patch will likely fix it. I don't have much insight into openafs development, so feel free to pass this patch on to them if it'll help.
I tracked down the openafs-devel mailing list and sent them the patch. I'm not a subscriber to the list though, so we'll have to wait for the moderator to approve it.
The best option for submitting patches to OpenAFS is to use
Alternatively, patches can be sent to firstname.lastname@example.org which will open a ticket in the OpenAFS Request Tracker.
Thanks, I'll keep that in mind for the future...
For now, I don't think that patch will help you since putname isn't exported in mainline kernels now. I'm also going to propose a patch upstream soon (once I have a chance to test it) that will unexport getname.
I think what would probably be best for openafs would be to just make an afs_getname that does a names_cache allocation, does a strncpy_from_user into it, and returns that (with proper error handling of course). It looks like all your code cares about is the string anyway. Then you can just keep afs_putname doing the kmem_cache_free and all will be well.
I'll leave this bug open for now in case you have more questions, but we'll plan to eventually close it as NOTABUG.
Will this break any system running the 6.5 kernel? Or is some special configuration required to trigger the problem?
Any comments on http://gerrit.openafs.org/10545 would be most welcome.
You need to have syscall auditing enabled in order to hit it. As far as the patch goes...
That new afs_getname function looks unnecessarily complicated. AFAICT, none of your callers of getname actually do anything with the struct filename. I think you just need to do a PATH_MAX allocation out of some slab (names_cachep or your own, or just kmalloc it). Then strncpy_from_user into that. Then when you do your afs_putname, free it appropriately.
At that point you'll have a kernel char * pointer that you can pass to functions that just want the name.
I also wouldn't worry about ifdef'ing any of that stuff for particular kernel versions. Just make the code unconditionally use the new routines. With that you can also get rid of afs_name_to_string() since you know that there will never be a struct filename involved.
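The suggestion above can be sketched in userspace C as follows. This is an illustration only, not the attached patch or the OpenAFS code: `sketch_getname`, `sketch_putname` and `sketch_copy_from_user` are hypothetical names, `malloc`/`free` stand in for a slab or kmalloc allocation, and the copy helper only imitates the return convention of strncpy_from_user (string length on success, the byte count if no terminating NUL was found).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_PATH_MAX 4096 /* assumed to match PATH_MAX on Linux */

/* Userspace stand-in for strncpy_from_user(): bounded copy with the
 * same "length, or count if it did not fit" return convention. */
static long sketch_copy_from_user(char *dst, const char *src, long count)
{
    long i;

    for (i = 0; i < count; i++) {
        dst[i] = src[i];
        if (src[i] == '\0')
            return i;
    }
    return count;
}

/* Copy a path into a freshly allocated PATH_MAX buffer.
 * Returns 0 and stores the buffer in *out, or a negative errno value. */
int sketch_getname(const char *user_path, char **out)
{
    char *buf;
    long len;

    assert(out != NULL);

    buf = malloc(SKETCH_PATH_MAX);
    if (!buf)
        return -ENOMEM;

    len = sketch_copy_from_user(buf, user_path, SKETCH_PATH_MAX);
    if (len >= SKETCH_PATH_MAX) {
        free(buf); /* no room for the terminating NUL */
        return -ENAMETOOLONG;
    }

    *out = buf; /* plain char * that callers can pass around */
    return 0;
}

/* Release a buffer obtained from sketch_getname(). */
void sketch_putname(char *name)
{
    free(name);
}
```

Because the same code both allocates and frees the buffer, and no struct filename is involved, the release path does not depend on how the kernel's own getname/putname manage their memory.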
...oh and that function seems a little large to be a static inline, doesn't it?
Created attachment 833684 [details]
patch -- stop trying to use getname/putname
Maybe something like this patch instead? Note that it prob needs cleanup -- indentation doesn't follow the openafs style for instance.
You may also want to make your own slabcache as someone in your gerrit tool mentioned.
At this point, I think the openafs folks have a handle on the problem, so I'll go ahead and close this as NOTABUG. Please reopen it if we need to discuss it further.