Red Hat Bugzilla – Full Text Bug Listing
|Summary:||RHEL6.5: kernel 2.6.32-431.el6 + openafs 126.96.36.199 panics with RIP cache_alloc_refill called from getname, names_cache corrupted|
|Product:||Red Hat Enterprise Linux 6||Reporter:||Dave Wysochanski <dwysocha>|
|Component:||kernel||Assignee:||Jeff Layton <jlayton>|
|Status:||CLOSED NOTABUG||QA Contact:||Red Hat Kernel QE team <kernel-qe>|
|Version:||6.5||CC:||dhowells, jaltman, jlayton, marc.c.dionne, rwheeler, smayhew, stephan.wiesand, steved, toracat|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2013-12-09 06:45:42 EST||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
Description Dave Wysochanski 2013-12-04 16:35:02 EST
Description of problem:
A user reports that after updating to RHEL6.5 they get repeated crashes and have halted updates. The oops is shown below; it suggests that the names_cache kmem slab is corrupted.

------------[ cut here ]------------
kernel BUG at mm/slab.c:3069!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/online
CPU 1
Modules linked in: openafs(P)(U) ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vsock(U) dm_mod ppdev parport_pc parport vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom vmxnet3 vmw_pvscsi pata_acpi ata_generic ata_piix [last unloaded: scsi_wait_scan]

Pid: 5843, comm: top Tainted: P --------------- 2.6.32-431.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffff8116ed24>] [<ffffffff8116ed24>] cache_alloc_refill+0x1e4/0x240
RSP: 0018:ffff88023b047e38 EFLAGS: 00010002
RAX: 000000000000000c RBX: ffff88023d820f00 RCX: 0000000000000002
RDX: 000000000000000c RSI: 0000000000000000 RDI: ffff88023fe74880
RBP: ffff88023b047e98 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000000a8 R11: 0000000000000000 R12: ffff88023fe74880
R13: ffff88023fee1b40 R14: 000000000000000c R15: ffff8802386db880
FS: 00007fdccb94f700(0000) GS:ffff880028280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fafb4078000 CR3: 000000023834b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process top (pid: 5843, threadinfo ffff88023b046000, task ffff88023b469500)
Stack:
ffff880234602c58 0000000000000000 ffff88023fee1b80 000412d0810df51d
<d> ffff88023fee1b60 ffff88023fee1b50 00000001380a0cc0 00007fdccb521d18
<d> 00000000000000d0 ffff88023d820f00 00000000000000d0 0000000000000246
Call Trace:
[<ffffffff8116fddf>] kmem_cache_alloc+0x15f/0x190
[<ffffffff81197007>] getname+0x47/0x240
[<ffffffff81185cf2>] do_sys_open+0x32/0x140
[<ffffffff81185e40>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: 89 ff e8 70 57 12 00 eb 99 66 0f 1f 44 00 00 41 c7 45 60 01 00 00 00 4d 8b 7d 20 4c 39 7d c0 0f 85 f2 fe ff ff eb 84 0f 0b eb fe <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 eb f4 8b 55 ac 8b 75 bc 31
RIP [<ffffffff8116ed24>] cache_alloc_refill+0x1e4/0x240
RSP <ffff88023b047e38>

Version-Release number of selected component (if applicable):
kernel-2.6.32-431.el6

How reproducible:
At least one person can reproduce it.

Steps to Reproduce:
Unknown.

Additional info:
We have a vmcore.

The following unsigned modules are noted in the kernel, but at this time I do not have a reason to believe they are related:
openafs(P)(U) vmci(U) vsock(U)

A quick look at the changelog showed the following change from RHEL6.4 to RHEL6.5 in fs/namei.c involving names_cache, so it warrants further investigation:

Author: Jeff Layton <email@example.com>
Date: Mon Feb 18 13:33:15 2013 -0500

[fs] vfs: embed struct filename inside of names_cache allocation if possible

Message-id: <firstname.lastname@example.org>
Patchwork-id: 56947
O-Subject: [RHEL6.5 PATCH v3 24/55] BZ#678544: vfs: embed struct filename inside of names_cache allocation if possible
Bugzilla: 678544
RH-Acked-by: J. Bruce Fields <email@example.com>

In the common case where a name is much smaller than PATH_MAX, an extra allocation for struct filename is unnecessary. Before allocating a separate one, try to embed the struct filename inside the buffer first. If it turns out that that's not long enough, then fall back to allocating a separate struct filename and redoing the copy.

Signed-off-by: Jeff Layton <firstname.lastname@example.org>
Signed-off-by: Al Viro <email@example.com>
Upstream commit: 7950e3852ab86826b7349a535d2e8b0000340d7f
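
The embedding described in the commit message quoted above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: malloc/free stand in for the names_cache slab, and `model_filename`, `model_getname`, `model_putname`, `MODEL_PATH_MAX` and `EMBEDDED_NAME_MAX` are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PATH_MAX 4096 /* stands in for PATH_MAX */

/* Hypothetical, simplified stand-in for the kernel's struct filename. */
struct model_filename {
    const char *name; /* points at the copied string */
    int separate;     /* 1 if this struct was allocated apart from the buffer */
};

/* Room left for the string when the struct sits at the start of the buffer. */
#define EMBEDDED_NAME_MAX (MODEL_PATH_MAX - sizeof(struct model_filename))

struct model_filename *model_getname(const char *src)
{
    size_t len = strlen(src);
    char *buf;

    if (len >= MODEL_PATH_MAX) {
        errno = ENAMETOOLONG;
        return NULL;
    }

    buf = malloc(MODEL_PATH_MAX); /* models the names_cache allocation */
    if (buf == NULL)
        return NULL;

    if (len < EMBEDDED_NAME_MAX) {
        /* Common case: struct at the front, string right behind it. */
        struct model_filename *fn = (struct model_filename *)buf;
        char *kname = buf + sizeof(*fn);

        memcpy(kname, src, len + 1);
        fn->name = kname;
        fn->separate = 0;
        return fn;
    }

    /* Fallback: the string needs the whole buffer, so the struct is
     * allocated separately and the copy is redone from the buffer start. */
    struct model_filename *fn = malloc(sizeof(*fn));
    if (fn == NULL) {
        free(buf);
        return NULL;
    }
    memcpy(buf, src, len + 1);
    fn->name = buf;
    fn->separate = 1;
    return fn;
}

void model_putname(struct model_filename *fn)
{
    assert(fn != NULL);
    if (fn->separate) {
        free((void *)fn->name); /* the buffer */
        free(fn);               /* the separately allocated struct */
    } else {
        free(fn);               /* fn is the start of the buffer */
    }
}
```

Note that in the embedded case the pointer that must be released is the struct itself, not the `name` pointer, which lies inside the same allocation. Code that frees such a name by hand has to know which layout it was given.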
Comment 3 Jeff Layton 2013-12-04 20:53:26 EST
Hmmm, I'll take that bet...

openafs(P)(U)

I'll wager that this module tries to do some getname() stuff, as that's common for ioctls. Did they rebuild their modules when they updated to 6.5?
Comment 5 Jeff Layton 2013-12-05 05:38:10 EST
Yeah, looking at the upstream openafs repo here, there was some work done around a year ago to account for the rework of the getname/putname API in mainline kernels: http://git.openafs.org/?p=openafs.git;a=summary My guess would be that their openafs module needs to be patched and rebuilt to account for the same change in 6.5.
Comment 6 Dave Wysochanski 2013-12-05 07:08:10 EST
(In reply to Jeff Layton from comment #5)
> Yeah, looking at the upstream openafs repo here, there was some work done
> around a year ago to account for the rework of the getname/putname API in
> mainline kernels:
>
> http://git.openafs.org/?p=openafs.git;a=summary
>
> My guess would be that their openafs module needs to be patched and rebuilt
> to account for the same change in 6.5.

Thanks Jeff!
Comment 10 Dave Wysochanski 2013-12-05 13:32:08 EST
In this case it is openafs 188.8.131.52-1 being run, and this is very recent, and should include proper fixes for the changes that went into 3.7. So I'm not sure openafs is related here, but it still may be at least a contributing factor.

http://git.openafs.org/?p=openafs.git;a=commit;h=331f439a25810c3031cb4edb9dcb0afae6039145

$ git log --oneline | head -1
331f439 Update NEWS for 184.108.40.206
$ git log --oneline | grep putname
c21fded Linux: change test for new putname API
cf33252 Linux: fix afs_putname wrapper for pre-3.7 kernels
5aae6e0 Linux 3.7: putname is no longer exported
...
Comment 13 Jeff Layton 2013-12-05 14:13:00 EST
Created attachment 833301 [details]
patch -- openafs: fix the afs_putname definition when STRUCT_FILENAME_HAS_NAME is defined

Looks like this is due to bad putname handling in the openafs code. Their code uses getname() to copy the string from userland, but then uses afs_putname to put it. They have afs_putname() defined wrong and it's causing a double-free on the memory when auditing is enabled.

This patch will likely fix it. I don't have much insight into openafs development, so feel free to pass this patch on to them if it'll help.
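
The kind of mismatch described in this comment can be modeled in userspace as follows. This is a toy sketch, not the kernel or openafs code, and the mechanism it shows is an assumption that the report itself does not spell out: that with auditing enabled, a name obtained via getname is also tracked by the audit context and released at syscall exit, so a release path that frees it immediately leads to a second free. All `toy_*` names are hypothetical, and frees are only counted, never performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TRACKED 8

/* Toy stand-in for a names_cache object; it only counts releases. */
struct toy_name {
    int free_calls;
};

/* Toy stand-in for a per-syscall audit context. */
struct toy_audit_ctx {
    bool enabled;
    struct toy_name *tracked[TOY_MAX_TRACKED];
    size_t count;
};

static void toy_free(struct toy_name *n)
{
    n->free_calls++;
}

/* Models getname: when auditing is on, the name is also recorded. */
void toy_getname(struct toy_audit_ctx *ctx, struct toy_name *n)
{
    n->free_calls = 0;
    if (ctx->enabled) {
        assert(ctx->count < TOY_MAX_TRACKED);
        ctx->tracked[ctx->count++] = n;
    }
}

/* Matching release: with auditing on, the free is left to syscall exit. */
void toy_putname(struct toy_audit_ctx *ctx, struct toy_name *n)
{
    if (ctx->enabled)
        return;
    toy_free(n);
}

/* Mismatched release: frees right away, ignoring the audit context. */
void toy_direct_free(struct toy_name *n)
{
    toy_free(n);
}

/* Models syscall exit: every tracked name is released. */
void toy_syscall_exit(struct toy_audit_ctx *ctx)
{
    size_t i;

    for (i = 0; i < ctx->count; i++)
        toy_free(ctx->tracked[i]);
    ctx->count = 0;
}
```

In this model a name released through `toy_direct_free` is freed twice only when the context is enabled, which would fit a crash that shows up only on systems with auditing turned on.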
Comment 14 Jeff Layton 2013-12-05 14:26:37 EST
I tracked down the openafs-devel mailing list and sent them the patch. I'm not a subscriber to the list though, so we'll have to wait for the moderator to approve it.
Comment 15 Jeffrey Altman 2013-12-05 20:39:48 EST
Jeff,

The best option for submitting patches to OpenAFS is to use http://gerrit.openafs.org/

Alternatively, patches can be sent to firstname.lastname@example.org which will open a ticket in the OpenAFS Request Tracker.

Thanks.

Jeffrey Altman
OpenAFS Gatekeeper
Comment 16 Jeff Layton 2013-12-06 06:09:41 EST
Thanks, I'll keep that in mind for the future...

For now, I don't think that patch will help you since putname isn't exported in mainline kernels now. I'm also going to propose a patch upstream soon (once I have a chance to test it) that will unexport getname.

I think what would probably be best for openafs would be to just make an afs_getname that does a names_cache allocation, does a strncpy_from_user into it, and returns that (with proper error handling of course). It looks like all your code cares about is the string anyway. Then you can just keep afs_putname doing the kmem_cache_free and all will be well.
Comment 17 Jeff Layton 2013-12-06 07:28:06 EST
I'll leave this bug open for now in case you have more questions, but we'll plan to eventually close it as NOTABUG.
Comment 18 Stephan Wiesand 2013-12-06 11:08:27 EST
Thanks. Will this break any system running the 6.5 kernel? Or is some special configuration required to trigger the problem? Any comments on http://gerrit.openafs.org/10545 would be most welcome.
Comment 19 Jeff Layton 2013-12-06 11:38:03 EST
You need to have syscall auditing enabled in order to hit it. As far as the patch goes... That new afs_getname function looks unnecessarily complicated. AFAICT, none of your callers of getname actually do anything with the struct filename. I think you just need to do a PATH_MAX allocation out of some slab (names_cachep or your own, or just kmalloc it). Then strncpy_from_user into that. Then when you do your afs_putname, free it appropriately. At that point you'll have a kernel char * pointer that you can pass to functions that just want the name. I also wouldn't worry about ifdef'ing any of that stuff for particular kernel versions. Just make the code unconditionally use the new routines. With that you can also get rid of afs_name_to_string() since you know that there will never be a struct filename involved.
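
The approach suggested in this comment (a PATH_MAX allocation, a bounded copy from the user pointer, and a matching free) might look roughly like the following userspace sketch. It is not the attached patch and not openafs code: malloc/free stand in for the slab or kmalloc, `sketch_strncpy_from_user` is a hypothetical stand-in that mimics the documented return convention of strncpy_from_user (string length on success, the count if no NUL was found, a negative errno on a bad pointer), and the other `sketch_*` names are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PATH_MAX 4096 /* stands in for PATH_MAX */

/* Hypothetical userspace stand-in for strncpy_from_user. */
static long sketch_strncpy_from_user(char *dst, const char *src, long count)
{
    long i;

    if (src == NULL)
        return -EFAULT; /* models an invalid user pointer */
    for (i = 0; i < count; i++) {
        dst[i] = src[i];
        if (src[i] == '\0')
            return i;   /* length, not counting the NUL */
    }
    return count;       /* no NUL within count bytes */
}

/* Copies a name into a freshly allocated PATH_MAX buffer.
 * Returns 0 and stores the buffer in *out, or a negative errno. */
int sketch_afs_getname(const char *uname, char **out)
{
    char *kname;
    long len;

    assert(out != NULL);
    kname = malloc(SKETCH_PATH_MAX);
    if (kname == NULL)
        return -ENOMEM;

    len = sketch_strncpy_from_user(kname, uname, SKETCH_PATH_MAX);
    if (len < 0) {
        free(kname);
        return (int)len;
    }
    if (len == SKETCH_PATH_MAX) {
        /* The name did not fit, so the buffer is not NUL-terminated. */
        free(kname);
        return -ENAMETOOLONG;
    }

    *out = kname; /* plain char * that callers can pass along */
    return 0;
}

/* Releases a buffer from sketch_afs_getname with the matching free. */
void sketch_afs_putname(char *kname)
{
    free(kname);
}
```

Because the caller only ever sees a plain string buffer that it allocated itself, the release path does not depend on how the kernel lays out or tracks struct filename.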
Comment 20 Jeff Layton 2013-12-06 11:46:43 EST
...oh and that function seems a little large to be a static inline, doesn't it?
Comment 21 Jeff Layton 2013-12-06 12:00:08 EST
Created attachment 833684 [details]
patch -- stop trying to use getname/putname

Maybe something like this patch instead? Note that it probably needs cleanup -- indentation doesn't follow the openafs style, for instance. You may also want to make your own slabcache as someone in your gerrit tool mentioned.
Comment 22 Jeff Layton 2013-12-09 06:45:42 EST
At this point, I think the openafs folks have a handle on the problem, so I'll go ahead and close this as NOTABUG. Please reopen it if we need to discuss it further.