Bug 427998 - RHEL4: Can enter no tick idle mode with RCU pending leading to hang
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel-xen
Version: 4.6
Hardware: All Linux
Priority: low  Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Andrew Jones
QA Contact: Virtualization Bugs
Docs Contact:
Depends On:
Blocks: 458302
Reported: 2008-01-08 11:19 EST by Ian Campbell
Modified: 2011-02-16 11:03 EST

Doc Type: Bug Fix
Last Closed: 2011-02-16 11:03:26 EST

Attachments
git 677517771b7b6efaf8617e70f655b16f3cafcc9b backported to 2.6.9-67.EL (2.75 KB, patch)
2008-01-08 11:21 EST, Ian Campbell
Git 986733e01d258c26107f1da9d8d47c718349ad2f backported to 2.6.9-67.EL (2.29 KB, patch)
2008-01-08 11:21 EST, Ian Campbell
xen-unstable.hg 10327:c230dbe793d6 backported to 2.6.9-67.EL (1.90 KB, patch)
2008-01-08 11:22 EST, Ian Campbell
xen-unstable.hg 10532:4b45f7f62dc7 backported to 2.6.9-67.EL (1.36 KB, patch)
2008-01-08 11:23 EST, Ian Campbell

Description Ian Campbell 2008-01-08 11:19:37 EST
We have found that the 2.6.9-67.ELxen kernel can occasionally enter tickless
mode while RCU work is still pending. We've mainly noticed it very early on at
start of day or late during shutdown, when there isn't much other activity going
on. When this triggers it is usually in synchronize_kernel(), which means the
guest essentially hangs until some external event (e.g. a SysRq) unwedges it.
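
To make the failure mode concrete, here is a deliberately simplified toy model in C. It is not kernel code: the struct, the function names and the idea that a single tick completes the RCU work are all assumptions made for illustration (real RCU tracks grace periods across CPUs). It only shows why a waiter never wakes if the CPU stops ticking while RCU work is queued.

```c
#include <stdbool.h>

/* Toy model, not kernel code: one CPU with at most one queued RCU callback. */
struct toy_cpu {
    bool rcu_pending;  /* a callback is queued and needs a tick to progress */
    bool ticking;      /* the periodic timer tick is enabled */
    bool completed;    /* set once the callback ran and the waiter was woken */
};

/*
 * Idle entry. With check_rcu set, the CPU refuses to go tickless while
 * RCU work is pending; without it, the tick is always stopped.
 */
static void toy_enter_idle(struct toy_cpu *cpu, bool check_rcu)
{
    cpu->ticking = check_rcu && cpu->rcu_pending;
}

/* One timer period: RCU work only advances if the tick is still running. */
static void toy_timer_period(struct toy_cpu *cpu)
{
    if (cpu->ticking && cpu->rcu_pending) {
        cpu->rcu_pending = false;
        cpu->completed = true;
    }
}

/* Returns true if a waiter queued before idle is woken within max_periods. */
static bool toy_waiter_wakes(bool check_rcu, int max_periods)
{
    struct toy_cpu cpu = { .rcu_pending = true, .ticking = true, .completed = false };

    toy_enter_idle(&cpu, check_rcu);
    for (int i = 0; i < max_periods && !cpu.completed; i++)
        toy_timer_period(&cpu);
    return cpu.completed;
}
```

In this model, toy_waiter_wakes(false, n) stays false for any n (the hang, absent an external event), while toy_waiter_wakes(true, n) returns true for any n >= 1.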

Usually we see it when loading or unloading iptables modules during startup or
shutdown, for example:
modprobe      D C02AB810  2932  4668   4630                     (NOTLB)
dc939ed4 00000286 dc90d2c0 c02ab810 dfca2da4 c1410320 00004a9a 6bc82a97 
       00000423 c16610e0 df76e170 df76e2dc c01c2e62 dc939f38 dc939f38 dc939ec8 
       c026fdac c027a631 00000000 dc939f38 dc939f3c dc939f38 dc939ef0 dc939f28 
Call Trace:
 [<c01c2e62>] alloc_layer+0x3a/0x40
 [<c026fdac>] __cond_resched+0x14/0x3c
 [<c026f7dd>] wait_for_completion+0x9c/0xd3
 [<c01185cb>] default_wake_function+0x0/0x12
 [<c01185cb>] default_wake_function+0x0/0x12
 [<c01224bd>] unregister_proc_table+0x38/0x69
 [<c012d547>] synchronize_kernel+0x41/0x46
 [<c012d4fa>] wakeme_after_rcu+0x0/0xc
 [<e091e77a>] init_or_cleanup+0x18f/0x20b [ip_conntrack]
 [<e09219c1>] fini+0x7/0x9 [ip_conntrack]
 [<c01325d0>] sys_delete_module+0x13e/0x187
 [<c014f57d>] do_munmap+0x11d/0x129
 [<c014f5d1>] sys_munmap+0x48/0x63
 [<c010740f>] syscall_call+0x7/0xb

The call chain here isn't especially clear; I believe it is something like
sys_delete_module -> fini -> init_or_cleanup -> nf_unregister_hook ->
synchronize_net -> synchronize_kernel. The last few links are optimized into
tail calls, which is why they don't appear in the trace.

It is very tricky to reproduce because it triggers very rarely; we mainly see it
during our automated testing. About the only way I've found is a reboot loop and
an awful lot of patience.

The fix is xen-unstable.hg 10327:c230dbe793d6 and 10532:4b45f7f62dc7 which in
turn require git 677517771b7b6efaf8617e70f655b16f3cafcc9b and
986733e01d258c26107f1da9d8d47c718349ad2f.

http://xenbits.xensource.com/xen-unstable.hg?rev/c230dbe793d6
http://xenbits.xensource.com/xen-unstable.hg?rev/4b45f7f62dc7
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=986733e01d258c26107f1da9d8d47c718349ad2f
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=677517771b7b6efaf8617e70f655b16f3cafcc9b
Comment 1 Ian Campbell 2008-01-08 11:21:19 EST
Created attachment 291063
git 677517771b7b6efaf8617e70f655b16f3cafcc9b backported to 2.6.9-67.EL
Comment 2 Ian Campbell 2008-01-08 11:21:55 EST
Created attachment 291064
Git 986733e01d258c26107f1da9d8d47c718349ad2f backported to 2.6.9-67.EL
Comment 3 Ian Campbell 2008-01-08 11:22:40 EST
Created attachment 291065
xen-unstable.hg 10327:c230dbe793d6 backported to 2.6.9-67.EL
Comment 4 Ian Campbell 2008-01-08 11:23:15 EST
Created attachment 291066
xen-unstable.hg 10532:4b45f7f62dc7 backported to 2.6.9-67.EL
Comment 5 Ian Campbell 2008-01-08 11:24:18 EST
Patches apply in the order:
git-677517771b7b6efaf8617e70f655b16f3cafcc9b
git-986733e01d258c26107f1da9d8d47c718349ad2f
xen-unstable-10327-c230dbe793d6
xen-unstable-10532-4b45f7f62dc7

They are against 2.6.9-67.0.1.EL, not -67.EL as I said above.
Comment 6 Don Dutile 2008-03-06 17:55:03 EST
Ian,

Why is the 4th patch needed?
It states that the problem is a dom0 hang, but
rhel4-xenU is a domU-only kernel.

Comment 7 Ian Campbell 2008-03-14 10:28:51 EDT
Sorry for the delay responding. I could have sworn I replied to this but I
must've forgotten to hit Submit or something.

The problem was initially noticed in domain 0, in different circumstances from
those reported here (all I know about it is what is given in Ack's commit message).

The problem reported here was subsequently seen in domainU and the fix turned
out (coincidentally) to be the same. The issue is that a domain can go tickless
with either RCU events or timers pending. In the latter case
next_timer_interrupt() returns a time in the recent past, hence the changes in 10532.

The original upstream patch at
http://xenbits.xensource.com/xen-unstable.hg?rev/4b45f7f62dc7 has an additional
hunk, which I dropped because hrtimers aren't relevant to RHEL4, but the comment
in that same hunk is probably useful:
++	/*
++	 * If timers are pending, "expires" will be in the recent past
++	 * of "jiffies". If there are no hr_timers registered, "hr_expires"
++	 * will be "jiffies + MAX_JIFFY_OFFSET"; this is *just* short of being
++	 * considered to be before "jiffies". This makes it very likely that
++	 * "hr_expires" *will* be considered to be before "expires".
++	 * So we must check when there are pending timers (expires <= jiffies)
++	 * to ensure that we don't accidently tell the caller that there is
++	 * nothing scheduled until half an epoch (MAX_JIFFY_OFFSET)!
++	 */
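
The wraparound arithmetic that comment relies on can be checked in user space. The sketch below restates the kernel's wrap-safe jiffies comparisons as plain functions; the sketch_ names are hypothetical, and SKETCH_MAX_JIFFY_OFFSET is assumed to be "half the counter range minus one", in line with the "half an epoch" wording above, rather than copied from the RHEL4 headers.

```c
/* Assumed value: just under half of the unsigned long range. */
#define SKETCH_MAX_JIFFY_OFFSET ((~0UL >> 1) - 1)

/*
 * Wrap-safe "a is before b": the unsigned difference is reinterpreted as
 * signed, so values up to half the range apart compare correctly even
 * across counter wraparound. (The unsigned-to-long conversion is
 * implementation-defined in C, but wraps on common platforms.)
 */
static int sketch_time_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Wrap-safe "a is before or equal to b". */
static int sketch_time_before_eq(unsigned long a, unsigned long b)
{
    return (long)(b - a) >= 0;
}

/* The "no hrtimer registered" sentinel described in the comment. */
static unsigned long sketch_no_hrtimer(unsigned long now)
{
    return now + SKETCH_MAX_JIFFY_OFFSET;
}
```

With now = 1000 and a pending timer at expires = 990, sketch_time_before_eq(expires, now) holds, sketch_time_before(sketch_no_hrtimer(now), now) does not hold, yet sketch_time_before(sketch_no_hrtimer(now), expires) does. That last result is the trap the comment describes: the sentinel looks earlier than an already-expired timer, so the pending-timer case has to be checked first.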

Now that I look again, it's possible that I am mistaken and that without hrtimers
the remaining hunk isn't needed either. Our testing has always included all 4 of
the patches, so I'd be reluctant to say that it definitely isn't required.
Comment 11 Paolo Bonzini 2009-06-23 03:22:33 EDT
The fourth patch is wrong, it may use j uninitialized:

	/* Leave ourselves in tick mode if rcu or softirq or timer pending. */
	if (rcu_needs_cpu(cpu) || local_softirq_pending() ||
	    (j = next_timer_interrupt(), time_before_eq(j, jiffies))) {
		cpu_clear(cpu, nohz_cpu_mask);
		j = jiffies + 1;
	}

	if (HYPERVISOR_set_timer_op(jiffies_to_st(j)) != 0)

I'll work on a fix to post upstream.
Comment 12 Chris Lalancette 2009-06-23 03:41:20 EDT
(In reply to comment #11)
> The fourth patch is wrong, it may use j uninitialized:
> 
>  /* Leave ourselves in tick mode if rcu or softirq or timer pending. */
>  if (rcu_needs_cpu(cpu) || local_softirq_pending() ||
>      (j = next_timer_interrupt(), time_before_eq(j, jiffies))) {
>    cpu_clear(cpu, nohz_cpu_mask);
>    j = jiffies + 1;
>   }
> 
>   if (HYPERVISOR_set_timer_op(jiffies_to_st(j)) != 0)

How so?  If rcu_needs_cpu(cpu) or local_softirq_pending() is true, then we enter the if block and set j = jiffies + 1.  If both of those are false, then we enter the third || condition and set j = next_timer_interrupt().  So how can j be uninitialized?

Chris Lalancette
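
The argument above rests on two C guarantees: || evaluates left to right and stops at the first true operand, and the comma operator evaluates its left operand before its right. A standalone sketch of the same control flow follows, with the kernel calls replaced by fields of a hypothetical struct (sketch_idle_inputs and sketch_pick_wakeup are illustrative names, not anything from the patch):

```c
/* Hypothetical stand-ins for rcu_needs_cpu(), local_softirq_pending(),
 * next_timer_interrupt() and jiffies. */
struct sketch_idle_inputs {
    int rcu_pending;
    int softirq_pending;
    unsigned long next_timer;
    unsigned long jiffies;
};

/*
 * Mirrors the shape of the quoted condition. j is deliberately left
 * uninitialised at declaration to show that every path assigns it
 * before it is read.
 */
static unsigned long sketch_pick_wakeup(const struct sketch_idle_inputs *in,
                                        int *stay_ticking)
{
    unsigned long j;

    *stay_ticking = 0;
    if (in->rcu_pending || in->softirq_pending ||
        /* Reached only if both tests above were false; the comma operator
         * assigns j first, then evaluates the wrap-safe comparison. */
        (j = in->next_timer, (long)(in->jiffies - j) >= 0)) {
        *stay_ticking = 1;
        j = in->jiffies + 1;  /* overwritten (or first set) on this path */
    }
    /* Either the block ran, or the third operand ran and assigned j. */
    return j;
}
```

If either pending flag is set, the third operand is skipped but the block assigns j. If neither is set, the third operand assigns j before the comparison, and the block only replaces it when the timer is already due.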
Comment 13 Paolo Bonzini 2009-06-23 05:29:19 EDT
Oops, tricky...  Well, if upstream does it this way I guess we have to do the same.
Comment 15 Andrew Jones 2009-07-01 14:18:16 EDT
This is a difficult bug to recreate, but the proposed patches have been integrated into a test build at http://people.redhat.com/drjones/virttest/1-2/. The build is available for anyone who has seen the bug and would like to test the patches to see if it goes away.
Comment 20 RHEL Product and Program Management 2010-10-12 13:51:32 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 21 Vivek Goyal 2010-10-13 12:11:19 EDT
Committed in 89.42.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Comment 23 Binbin Yu 2011-01-13 00:30:46 EST
Tested with host: xen-3.0.3-120-x86_64.el5, kernel-xen-2.6.18-238.el5
guest: kernel-2.6.9-94.ELxen, 64-bit
[1] No call trace after 2000 reboots with iptables enabled.
[2] Code sanity check is OK; the patch is applied successfully.

So changing this to verified.
Comment 24 errata-xmlrpc 2011-02-16 11:03:26 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html
