Bug 489521 - Disable all cpus' watchdog on error in check_nmi_watchdog()
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.8
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Aristeu Rozanski
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-03-10 15:27 UTC by Prarit Bhargava
Modified: 2010-06-24 21:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-06-24 21:37:02 UTC
Target Upstream Version:
Embargoed:



Description Prarit Bhargava 2009-03-10 15:27:25 UTC
Description of problem:

During code inspection (by dzickus and me), we noticed that when nmi_watchdog == NMI_LOCAL_APIC, the 4.8 kernel's check_nmi_watchdog() disables the watchdog only on the CPU that failed the test, not on all CPUs:

int __init check_nmi_watchdog (void)
{
        int counts[NR_CPUS];
        int cpu;

        if (!atomic_read(&nmi_watchdog_active))
                return 0;

        printk(KERN_INFO "testing NMI watchdog ... ");

        for (cpu = 0; cpu < NR_CPUS; cpu++)
                counts[cpu] = cpu_pda[cpu].__nmi_count; 
        local_irq_enable();
        mdelay((10*1000)/nmi_hz); // wait 10 ticks

        for (cpu = 0; cpu < NR_CPUS; cpu++) {
                if (!cpu_online(cpu))
                        continue;
                if (!per_cpu(wd_enabled, cpu))
                        continue;

                if (cpu_pda[cpu].__nmi_count - counts[cpu] <= 5) {
                        printk("CPU#%d: NMI appears to be stuck (%d)!\n", 
                               cpu,
                               cpu_pda[cpu].__nmi_count);
                        if (atomic_dec_and_test(&nmi_watchdog_active))
                                nmi_active = 0;
                        per_cpu(wd_enabled, cpu) = 0; /* <<< only disables _this_ cpu's watchdog, not all of them */
                        goto error;
                }
        }
        if (!atomic_read(&nmi_watchdog_active)) {
                atomic_set(&nmi_watchdog_active, -1);
                nmi_active = -1;
                goto error;
        }
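
The behaviour asked for in the summary can be sketched as a small user-space model. This is not a kernel patch and not the fix anyone proposed for this bug: struct wd_model, MODEL_NR_CPUS, disable_all_watchdogs() and check_watchdog_model() are hypothetical stand-ins for the per-CPU wd_enabled flags, NR_CPUS and the error path above, and setting nmi_active to 0 unconditionally is a simplification of the atomic counter handling in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_NR_CPUS 4 /* hypothetical; stands in for NR_CPUS */

/* User-space stand-ins for the kernel's per-CPU state (assumptions). */
struct wd_model {
        bool online[MODEL_NR_CPUS];             /* models cpu_online(cpu) */
        bool wd_enabled[MODEL_NR_CPUS];         /* models per_cpu(wd_enabled, cpu) */
        unsigned int nmi_before[MODEL_NR_CPUS]; /* counts sampled before the delay */
        unsigned int nmi_after[MODEL_NR_CPUS];  /* counts sampled after the delay */
        int nmi_active;                         /* models nmi_active */
};

/* Hypothetical helper: switch the watchdog off on every CPU, not just one. */
static void disable_all_watchdogs(struct wd_model *m)
{
        for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
                m->wd_enabled[cpu] = false;
        m->nmi_active = 0; /* simplification: no watchdog is left running */
}

/* Returns 0 if every watched CPU ticked, -1 after disabling all watchdogs. */
static int check_watchdog_model(struct wd_model *m)
{
        assert(m != NULL);

        for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
                if (!m->online[cpu] || !m->wd_enabled[cpu])
                        continue;

                /* Same threshold as the excerpt: 5 or fewer NMIs means stuck. */
                if (m->nmi_after[cpu] - m->nmi_before[cpu] <= 5) {
                        printf("CPU#%d: NMI appears to be stuck (%u)!\n",
                               cpu, m->nmi_after[cpu]);
                        disable_all_watchdogs(m);
                        return -1;
                }
        }
        return 0;
}
```

In this model a single stuck CPU clears wd_enabled for every CPU, which is the difference from the excerpt, where only the failing CPU's flag is cleared before the jump to the error label.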

Comment 1 Don Zickus 2010-06-24 21:37:02 UTC
This is only seen in the error path, and with RHEL-4 nearing the end of its life, I don't think it is worth fixing.

