Red Hat Bugzilla – Bug 644903
Kernel divide by zero in find_busiest_group
Last modified: 2012-04-19 04:56:27 EDT
Description of problem:
When the laptop was unattended this weekend it rebooted. I found the crash information about this. The crash output indicates that it was due to a divide by zero in find_busiest_group.

Version-Release number of selected component (if applicable):
kernel-2.6.32-71.el6.x86_64
firefox-3.6.9-2.el6.x86_64

How reproducible:
Unknown

Steps to Reproduce:
1. Unknown

Actual results:
Panic and reboot

Expected results:
No crash

Additional info:
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-71.el6.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2010-10-16-05:03:17/vmcore  [PARTIAL DUMP]
        CPUS: 4
        DATE: Sat Oct 16 05:03:11 2010
      UPTIME: 8 days, 07:35:47
LOAD AVERAGE: 0.16, 0.17, 0.11
       TASKS: 449
    NODENAME: cannondale
     RELEASE: 2.6.32-71.el6.x86_64
     VERSION: #1 SMP Wed Sep 1 01:33:01 EDT 2010
     MACHINE: x86_64  (2659 Mhz)
      MEMORY: 3.8 GB
       PANIC: ""
         PID: 12405
     COMMAND: "firefox"
        TASK: ffff88010ed54080  [THREAD_INFO: ffff880105174000]
         CPU: 2
       STATE: TASK_INTERRUPTIBLE (PANIC)

crash> bt
PID: 12405  TASK: ffff88010ed54080  CPU: 2  COMMAND: "firefox"
 #0 [ffff880105175670] machine_kexec at ffffffff8103695b
 #1 [ffff8801051756d0] crash_kexec at ffffffff810b8f08
 #2 [ffff8801051757a0] oops_end at ffffffff814cbbd0
 #3 [ffff8801051757d0] die at ffffffff8101733b
 #4 [ffff880105175800] do_trap at ffffffff814cb4a4
 #5 [ffff880105175860] do_divide_error at ffffffff810150af
 #6 [ffff880105175900] divide_error at ffffffff81013efb
    [exception RIP: find_busiest_group+1388]
    RIP: ffffffff81062e9c  RSP: ffff8801051759b8  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: ffff880105175bc4  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000004  RDI: 0000000000000004
    RBP: ffff880105175b38   R8: ffff880028310850   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000001  R12: 00000000ffffff01
    R13: 0000000000016980  R14: ffffffffffffffff  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff8801051759b0] find_busiest_group at ffffffff81062b84
 #8 [ffff880105175b40] thread_return at ffffffff814c864a
 #9 [ffff880105175c00] futex_wait_queue_me at ffffffff810a25a9
#10 [ffff880105175c40] futex_wait at ffffffff810a3678
#11 [ffff880105175dc0] do_futex at ffffffff810a4dc1
#12 [ffff880105175ef0] sys_futex at ffffffff810a581b
#13 [ffff880105175f80] system_call_fastpath at ffffffff81013172
    RIP: 00000032a4a0b7a9  RSP: 00007fb5bf9fec98  RFLAGS: 00010202
    RAX: 00000000000000ca  RBX: ffffffff81013172  RCX: 0000000000000000
    RDX: 0000000001ce4255  RSI: 0000000000000189  RDI: 00007fb5c4a333cc
    RBP: 00007fb5cb8e9690   R8: 00007fb5cb8e9690   R9: 00000000ffffffff
    R10: 00007fb5bf9fed20  R11: 0000000000000206  R12: 0000000000000000
    R13: 0000000000000000  R14: 00007fb5bf9fed20  R15: 0000000001ce4255
    ORIG_RAX: 00000000000000ca  CS: 0033  SS: 002b
crash>
Not sure exactly how this happened, but we should never blindly divide in the kernel without making sure the denominator is not zero!

diff --git a/kernel/sched.c b/kernel/sched.c
index f8e5a25..df7753d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4162,7 +4162,8 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
 	if (sds.this_load >= sds.max_load)
 		goto out_balanced;
 
-	sds.avg_load = (SCHED_LOAD_SCALE * sds.total_load) / sds.total_pwr;
+	sds.avg_load = (SCHED_LOAD_SCALE * sds.total_load) /
+			(sds.total_pwr?sds.total_pwr:1);
 
 	if (sds.this_load >= sds.avg_load)
 		goto out_balanced;
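For anyone reading along, the guard in the patch above can be tried out in user space. This is only a minimal sketch: `LOAD_SCALE` and `avg_load()` are hypothetical stand-ins for SCHED_LOAD_SCALE and the expression in find_busiest_group(), and the value 1024 is an assumption about this kernel's configuration, not something taken from the dump.

```c
/* Assumed stand-in for the kernel's SCHED_LOAD_SCALE; 1024 is an assumption. */
#define LOAD_SCALE 1024UL

/*
 * Hypothetical helper modelled on the patched expression:
 * a zero total_pwr is replaced by 1 so the division can never trap.
 */
static unsigned long avg_load(unsigned long total_load, unsigned long total_pwr)
{
    unsigned long divisor = total_pwr ? total_pwr : 1;

    return (LOAD_SCALE * total_load) / divisor;
}
```

With a zero total_pwr the helper returns LOAD_SCALE * total_load instead of raising a divide error. That hides the symptom but does not explain how total_pwr reached zero in the first place.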
After talking with Peterz & Ingo about this, the problem seems to be that scale_rt_power() can return a negative value to update_cpu_power(), which sets sdg->cpu_power. This potentially allows the arithmetic sum of all sdg->cpu_power values within a sched domain (total_pwr) to be zero. Finally, find_busiest_group() uses total_pwr as the denominator when calculating the load average:

	sds.avg_load = (SCHED_LOAD_SCALE * sds.total_load) / sds.total_pwr;

This fix is part of commit aa483808516ca5cacfa0e5849691f64fec25828e, but we can't include all of it because it relies on other commits that really break the kABI. So we are only including the hunk that prevents scale_rt_power() from returning a negative value.

--------------------------------------------------------------------------------
commit aa483808516ca5cacfa0e5849691f64fec25828e
Author: Venkatesh Pallipadi <venki@google.com>
Date:   Mon Oct 4 17:03:22 2010 -0700

    sched: Remove irq time from available CPU power

    The idea was suggested by Peter Zijlstra here:

      http://marc.info/?l=linux-kernel&m=127476934517534&w=2

    irq time is technically not available to the tasks running on the CPU.
    This patch removes irq time from CPU power piggybacking on
    sched_rt_avg_update().

    Tested this by keeping CPU X busy with a network intensive task having 75%
    of a single CPU irq processing (hard+soft) on a 4-way system. And start seven
    cycle soakers on the system. Without this change, there will be two tasks on
    each CPU. With this change, there is a single task on irq busy CPU X and
    remaining 7 tasks are spread around among other 3 CPUs.

    Signed-off-by: Venkatesh Pallipadi <venki@google.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <1286237003-12406-8-git-send-email-venki@google.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
--------------------------------------------------------------------------------

diff --git a/kernel/sched.c b/kernel/sched.c
index f8e5a25..60ef538 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3611,7 +3611,13 @@ unsigned long scale_rt_power(int cpu)
 	sched_avg_update(rq);
 
 	total = sched_avg_period() + (rq->clock - rq->age_stamp);
-	available = total - rq->rt_avg;
+
+	if (unlikely(total < rq->rt_avg)) {
+		/* Ensures that power won't end up being negative */
+		available = 0;
+	} else {
+		available = total - rq->rt_avg;
+	}
 
 	if (unlikely((s64)total < SCHED_LOAD_SCALE))
 		total = SCHED_LOAD_SCALE;
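To make the failure mode concrete, here is a small user-space sketch of the arithmetic. It is not the kernel code: the function names are hypothetical, and `uint64_t` and `unsigned long` are assumed stand-ins for the types of `total`, `rq->rt_avg` and `sdg->cpu_power`. It shows that the unclamped subtraction wraps when rt_avg exceeds total, that the clamp from the hunk above avoids this, and that a wrapped ("negative") power value can cancel the other groups' power so that the sum is exactly zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Pre-fix arithmetic: unsigned subtraction wraps when rt_avg > total. */
static uint64_t available_unclamped(uint64_t total, uint64_t rt_avg)
{
    return total - rt_avg;
}

/* Post-fix arithmetic, following the hunk above: clamp at zero. */
static uint64_t available_clamped(uint64_t total, uint64_t rt_avg)
{
    if (total < rt_avg)
        return 0;
    return total - rt_avg;
}

/* Hypothetical stand-in for summing sdg->cpu_power across a sched domain. */
static unsigned long total_power(const unsigned long *power, size_t n)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += power[i];
    return sum;
}
```

For instance, available_unclamped(100, 250) read as a signed 64-bit value is -150, while available_clamped(100, 250) is 0. Two groups with power 1024 and (unsigned long)-1024 sum to 0 in total_power(), which is the kind of zero divisor find_busiest_group() ends up dividing by.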
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch(es) available on kernel-2.6.32-89.el6
*** Bug 669795 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0542.html