I've been doing some work on Docker, which heavily uses containers, including net namespaces. I recently upgraded to F20, and I've hit this panic at least 3 times now when running the Docker test suite:
https://www.dropbox.com/sc/y6jq2pso2lze82l/k1VfAt0wpV

Some info from the backtrace, typed in by hand:
BUG: unable to handle kernel paging request at ffff...
3.11.2-301.fc20.x86_64
Workqueue: netns_cleanup_net
__nf_ct_ext_destroy
nf_conntrack_free
destroy_conntrack
...
Kernel panic - not syncing: Fatal exception in interrupt
Possible fix ? http://www.spinics.net/lists/netfilter-devel/msg28026.html
(In reply to Alexander Larsson from comment #1)
> Possible fix ?
> http://www.spinics.net/lists/netfilter-devel/msg28026.html

It's possible, yes. Here is a scratch build with the patch included. Could you please test it when it finishes building and let us know if it solves the issue for you?
http://koji.fedoraproject.org/koji/taskinfo?taskID=6030994
OK, I ran the Docker tests for hours with that kernel and got no crash. That is no guarantee, of course, but it's far better than with the other kernels.
(In reply to Alexander Larsson from comment #3)
> Ok, i run hours of the docker tests with that kernel, no crash. This is no
> guarantee of course, but thats far much better than with the other kernels.

Great, thanks. I'll get that patch included.
The same is happening in F19 now btw, did you get it fixed there too?
(In reply to Alexander Larsson from comment #5)
> The same is happening in F19 now btw, did you get it fixed there too?

Yes.
kernel-3.11.4-201.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/kernel-3.11.4-201.fc19
kernel-3.11.4-101.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/kernel-3.11.4-101.fc18
kernel-3.11.4-301.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/kernel-3.11.4-301.fc20
Package kernel-3.11.4-201.fc19: * should fix your issue, * was pushed to the Fedora 19 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing kernel-3.11.4-201.fc19' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2013-18820/kernel-3.11.4-201.fc19 then log in and leave karma (feedback).
kernel-3.11.4-301.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
kernel-3.11.4-201.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.
Ugh, I got this again (same backtrace) with the kernel-3.11.4-300.1.fc20 scratch build above. I've been running that kernel a lot and had not seen it reproduce there before. I wonder what's up with that.
Backtrace: https://www.dropbox.com/sc/yfvcvho9099cjuw/qe4QYSEU3L
kernel-3.11.4-101.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.
Got it again with 3.11.4-302.fc20
OK, reopening. Neil, any ideas on this one?
I'd try this one that made 3.13-rc1 (backtrace slightly different but Pablo had CONFIG_DEBUG_OBJECTS_FREE on in this case):

commit 0c3c6c00c69649f4749642b3e5d82125fde1600c
Author: Pablo Neira Ayuso <pablo>
Date: Mon Nov 18 12:53:59 2013 +0100

netfilter: nf_conntrack: decrement global counter after object release

nf_conntrack_free() decrements our counter (net->ct.count) before releasing the conntrack object. That counter is used in the nf_conntrack_cleanup_net_list path to check if it's time to kmem_cache_destroy our cache of conntrack objects. I think we have a race there that should be easier to trigger (although still hard) with CONFIG_DEBUG_OBJECTS_FREE as object releases become slowier according to the following splat:

[ 1136.321305] WARNING: CPU: 2 PID: 2483 at lib/debugobjects.c:260 debug_print_object+0x83/0xa0()
[ 1136.321311] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
...
[ 1136.321390] Call Trace:
[ 1136.321398] [<ffffffff8160d4a2>] dump_stack+0x45/0x56
[ 1136.321405] [<ffffffff810514e8>] warn_slowpath_common+0x78/0xa0
[ 1136.321410] [<ffffffff81051557>] warn_slowpath_fmt+0x47/0x50
[ 1136.321414] [<ffffffff812f8883>] debug_print_object+0x83/0xa0
[ 1136.321420] [<ffffffff8106aa90>] ? execute_in_process_context+0x90/0x90
[ 1136.321424] [<ffffffff812f99fb>] debug_check_no_obj_freed+0x20b/0x250
[ 1136.321429] [<ffffffff8112e7f2>] ? kmem_cache_destroy+0x92/0x100
[ 1136.321433] [<ffffffff8115d945>] kmem_cache_free+0x125/0x210
[ 1136.321436] [<ffffffff8112e7f2>] kmem_cache_destroy+0x92/0x100
[ 1136.321443] [<ffffffffa046b806>] nf_conntrack_cleanup_net_list+0x126/0x160 [nf_conntrack]
[ 1136.321449] [<ffffffffa046c43d>] nf_conntrack_pernet_exit+0x6d/0x80 [nf_conntrack]
[ 1136.321453] [<ffffffff81511cc3>] ops_exit_list.isra.3+0x53/0x60
[ 1136.321457] [<ffffffff815124f0>] cleanup_net+0x100/0x1b0
[ 1136.321460] [<ffffffff8106b31e>] process_one_work+0x18e/0x430
[ 1136.321463] [<ffffffff8106bf49>] worker_thread+0x119/0x390
[ 1136.321467] [<ffffffff8106be30>] ? manage_workers.isra.23+0x2a0/0x2a0
[ 1136.321470] [<ffffffff8107210b>] kthread+0xbb/0xc0
[ 1136.321472] [<ffffffff81072050>] ? kthread_create_on_node+0x110/0x110
[ 1136.321477] [<ffffffff8161b8fc>] ret_from_fork+0x7c/0xb0
[ 1136.321479] [<ffffffff81072050>] ? kthread_create_on_node+0x110/0x110
[ 1136.321481] ---[ end trace 25f53c192da70825 ]---

Reported-by: Linus Torvalds <torvalds>
Signed-off-by: Pablo Neira Ayuso <pablo>
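
To make the ordering problem in that commit message concrete, here is a toy model in Python. It is not kernel code and only a sketch under stated assumptions: the step names `dec_count` and `release_object`, the function name, and the single-object setup are all hypothetical stand-ins for "decrement net->ct.count" and "release the conntrack object". It replays every point at which the cleanup path could check the counter and reports whether the cache could be destroyed while an object is still allocated from it.

```python
def cleanup_can_destroy_live_cache(free_steps):
    """Return True if some interleaving lets the cleanup path see count == 0
    (and so destroy the cache) while an object is still allocated.

    free_steps is the order of the two steps in the (hypothetical) free path.
    """
    for check_at in range(len(free_steps) + 1):
        count, live = 1, 1  # one tracked entry, still allocated from the cache
        for i in range(len(free_steps) + 1):
            # The cleanup path samples the counter just before step i.
            if i == check_at and count == 0 and live > 0:
                return True  # cache would be torn down under a live object
            if i < len(free_steps):
                if free_steps[i] == "dec_count":
                    count -= 1
                elif free_steps[i] == "release_object":
                    live -= 1
    return False


# Ordering described as racy in the commit message: counter first, then release.
OLD_ORDER = ["dec_count", "release_object"]
# Ordering after the commit: release first, then counter.
NEW_ORDER = ["release_object", "dec_count"]

print(cleanup_can_destroy_live_cache(OLD_ORDER))  # True
print(cleanup_can_destroy_live_cache(NEW_ORDER))  # False
```

The model only captures this one window; it says nothing about other races in the same teardown path.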
So far, unable to reproduce it locally
As per https://github.com/dotcloud/docker/issues/2960#issuecomment-33854171 this still happens in 3.13.1, which has the fix from comment 18, so that did not fix it.
Alex, can you enable kdump and try to recreate?
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs. Fedora 20 has now been rebased to 3.13.4-200.fc20. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel. If you experience different issues, please open a new bug report for those.
I've not tried 3.13.4, but I've seen it in 3.12.10-300.fc20 (I haven't tried later kernels yet), and others have seen it in 3.13.1, so I don't think this is fixed.
Got this in 3.13.5-202.fc20.x86_64 too.
I asked in the Docker meeting today who had seen this: a number of people never had, and some had. The one thing that seemed consistent among those who had not seen the panic was running the kernel in a VM. So maybe this only triggers on bare metal.
I have been encountering this same issue (LXC, non-Docker) on Amazon EC2 (see https://bugzilla.kernel.org/show_bug.cgi?id=65191). So this is definitely happening on PV as well.
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs. Fedora 20 has now been rebased to 3.14.4-200.fc20. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you experience different issues, please open a new bug report for those.
This is still happening. This seems to be the upstream bug: https://bugzilla.kernel.org/show_bug.cgi?id=65191
Seems like this has a possible fix at: https://bugzilla.kernel.org/show_bug.cgi?id=65191
The fix is now in Linus's tree: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=945b2b2d259d1a4364a2799e80e8ff32f8c6ee6f
The upstream netfilter maintainer has said he has the patch queued to be sent to stable. I'll try and get it queued for the 3.15.y rebase we're working on for F20.
Added in Fedora git. F19 will pick it up via the next 3.14.y stable rebase.
kernel-3.15.3-200.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/kernel-3.15.3-200.fc20
Package kernel-3.15.3-200.fc20: * should fix your issue, * was pushed to the Fedora 20 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing kernel-3.15.3-200.fc20' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2014-8017/kernel-3.15.3-200.fc20 then log in and leave karma (feedback).
kernel-3.15.3-200.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
When will the fix be available in RHEL 7/CentOS 7? It's absent in 3.10.0-123.13.2.el7.