Bug 501178
Summary: | RHEL5: NMI lockups seen after enabling cpuspeed on -147.el5 & -148.el5 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Prarit Bhargava <prarit> |
Component: | kernel | Assignee: | Prarit Bhargava <prarit> |
Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 5.4 | CC: | agospoda, dzickus, emcnabb, jburke, mgahagan, prarit, pzijlstr |
Target Milestone: | beta | Keywords: | Regression |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2009-09-02 08:34:04 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Prarit Bhargava
2009-05-17 15:07:53 UTC
Can be replicated using kernel-PAE...-148.el5 & manually starting the cpuspeed service. P. pzijlstr -- if you decide to use the system to debug, I have *disabled* cpuspeed in chkconfig. You have to start it manually. FYI, P. Can be reproduced using kernel-2.6.18-148.el5, and manually restarting cpuspeed service or loading the powernow-k8 module. P. There are reports of slow-downs during boot (see comment #10 in BZ 499999). I don't know if this is related to this BZ, but it definitely could be. P. Backing out cca0fdba55f43de5eed0d60e257c4c17d3e33396 (and making minor changes to set_cyc2ns_scale() ) fix the issue -- ie) the above change is responsible for the strange behavior. ... this feels like a type-casting issue ... P. From the previous patch: @@ -5104,7 +5338,8 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu) idle->cpus_allowed = cpumask_of_cpu(cpu); set_task_cpu(idle, cpu); - spin_lock_irqsave(&rq->lock, flags); + spin_lock(&rq->lock); + rq->curr = rq->idle = idle; #if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW) idle->oncpu = 1; Not 100% sure if this is it but this looks wrong. This changes the lock acquisition to spin_lock ... but the "free'ing" of the lock is still spin_lock_irqrestore, which would use a bogus flags value. (Unless I'm reading the patch wrong) P. Prarit, Your observation in comment #6 might be correct, but the only real difference between spin_lock and spin_lock_irqsave is the call to local_irq_save that is added. Since a call to that is done at the beginning of init_idle it might not be needed. It still seems a bit odd that there is a gap between when interrupts are saved and when the lock is taken (and I don't really see that anywhere upstream). The closest thing out there that is trying to emulate this might be here: commit 5cbd54ef470d880fc37fbe4b21eb514806d51e0d Author: Ingo Molnar <mingo> Date: Wed Nov 12 20:05:50 2008 +0100 sched: fix init_idle()'s use of sched_clock() I'm still not sure this is critical, but it could be interesting to replace this hunk in cca0fdba55f43de5eed0d60e257c4c17d3e33396 @@ -5096,7 +5328,9 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu) struct rq *rq = cpu_rq(cpu); unsigned long flags; - idle->timestamp = sched_clock(); + local_irq_save(flags); + + idle->timestamp = sched_clock_local(); idle->sleep_avg = 0; idle->array = NULL; idle->prio = idle->normal_prio = MAX_PRIO; @@ -5104,7 +5338,8 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu) idle->cpus_allowed = cpumask_of_cpu(cpu); set_task_cpu(idle, cpu); - spin_lock_irqsave(&rq->lock, flags); + spin_lock(&rq->lock); + rq->curr = rq->idle = idle; #if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW) idle->oncpu = 1; with something like this: @@ -5096,7 +5328,9 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu) struct rq *rq = cpu_rq(cpu); unsigned long flags; - idle->timestamp = sched_clock(); + spin_lock_irqsave(&rq->lock, flags); + + idle->timestamp = sched_clock_local(); idle->sleep_avg = 0; idle->array = NULL; idle->prio = idle->normal_prio = MAX_PRIO; @@ -5104,7 +5338,8 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu) idle->cpus_allowed = cpumask_of_cpu(cpu); set_task_cpu(idle, cpu); - spin_lock_irqsave(&rq->lock, flags); - rq->curr = rq->idle = idle; #if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW) idle->oncpu = 1; Just a thought from a casual observer.... Its as Andy writes, I simply moved the irq_save() up a few lines, since sched_clock_local() wants IRQs disabled, but assigning stuff to idle doesn't require rq->lock. That said, reverting this patch removes the whole call path observed in the NMI watchdog: [<c043800f>] ktime_get_ts+0x16/0x44 [<c0420c8c>] get_monotonic_time+0xe/0x1b [<c0420d15>] sched_clock_tick+0x7c/0xb6 [<c0420d80>] scheduler_tick+0x31/0x340 [<c0621d73>] schedule+0xa1f/0xaab [<c042d118>] update_process_times+0x57/0x62 [<c041788d>] smp_apic_timer_interrupt+0x66/0x75 [<c0405aee>] apic_timer_interrupt+0x2a/0x30 [<c0403bf3>] default_idle+0x3e/0x6d [<c0403bf5>] default_idle+0x40/0x6d [<c0403cc6>] cpu_idle+0xa4/0xbe As it includes adding sched_clock_tick() to scheduler_tick(). Now it would be interesting to see where: c043800f is in the code, would anybody be having the debuginfo of that particular kernel build handy? I'm betting that is right smack in the middle of that xtime_lock loop. Which would mean I missed a path where we're holding xtime_lock over update_process_times()... i386-SMP, right? /me goes look. Peter, yup -- i386 smp (well ... i686.config ;) ). P. Peter, From self-compiled 148.el5: crash> dis c043800f 0xc043800f <ktime_get_ts+0x16>: mov 0xc0854320,%edx crash> dis -l ktime_get_ts /home/kernel/kernel/hrtimer.c: 109 0xc0437ff9 <ktime_get_ts>: push %esi 0xc0437ffa <ktime_get_ts+0x1>: mov %eax,%esi 0xc0437ffc <ktime_get_ts+0x3>: push %ebx include/linux/seqlock.h: 88 0xc0437ffd <ktime_get_ts+0x4>: mov 0xc0749c80,%ebx include/linux/seqlock.h: 89 0xc0438003 <ktime_get_ts+0xa>: lfence 0xc0438006 <ktime_get_ts+0xd>: data16 0xc0438007 <ktime_get_ts+0xe>: nop /home/kernel/kernel/hrtimer.c: 115 0xc0438008 <ktime_get_ts+0xf>: mov %esi,%eax 0xc043800a <ktime_get_ts+0x11>: call 0xc042bf5e <getnstimeofday> /home/kernel/kernel/hrtimer.c: 116 0xc043800f <ktime_get_ts+0x16>: mov 0xc0854320,%edx 0xc0438015 <ktime_get_ts+0x1c>: mov 0xc0854324,%ecx include/linux/seqlock.h: 103 0xc043801b <ktime_get_ts+0x22>: lfence 0xc043801e <ktime_get_ts+0x25>: data16 0xc043801f <ktime_get_ts+0x26>: nop /home/kernel/kernel/hrtimer.c: 118 0xc0438020 <ktime_get_ts+0x27>: mov %ebx,%eax 0xc0438022 <ktime_get_ts+0x29>: and $0x1,%ebx 0xc0438025 <ktime_get_ts+0x2c>: xor 0xc0749c80,%eax 0xc043802b <ktime_get_ts+0x32>: or %ebx,%eax 0xc043802d <ktime_get_ts+0x34>: jne 0xc0437ffd <ktime_get_ts+0x4> /home/kernel/kernel/hrtimer.c: 120 0xc043802f <ktime_get_ts+0x36>: add 0x4(%esi),%ecx 0xc0438032 <ktime_get_ts+0x39>: mov %esi,%eax 0xc0438034 <ktime_get_ts+0x3b>: add (%esi),%edx /home/kernel/kernel/hrtimer.c: 122 0xc0438036 <ktime_get_ts+0x3d>: pop %ebx 0xc0438037 <ktime_get_ts+0x3e>: pop %esi /home/kernel/kernel/hrtimer.c: 120 0xc0438038 <ktime_get_ts+0x3f>: jmp 0xc0428b04 <set_normalized_timespec> From IRC chat with Peter: <prarit> Okay -- I think I may have found it. Hold on a few minutes and let me re-review. <prarit> time_cpufreq_notifier <prarit> -> acquires xtime_lock via write_seqlock_irq(&xtime_lock); <prarit> -> set_cyc2ns_scale <prarit> -> sched_clock_idle_wakeup_event <prarit> -> sched_clock_tick <prarit> -> get_monotonic_time <prarit> -> ktime_get_ts <prarit> which attempts a <prarit> seq = read_seqbegin(&xtime_lock) DEADLOCK. The write_seqlock_irq() actually *acquires* the lock via a spin_lock() call. read_seqbegin() only checks to see if the lock is free and barrels on through using a sequence # to determine if any changes to xtime have been made. P. This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. in kernel-2.6.18-150.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1243.html |