Bug 644903

Summary: Kernel divide by zero in find_busiest_group
Product: Red Hat Enterprise Linux 6
Reporter: William Cohen <wcohen>
Component: kernel
Assignee: Larry Woodman <lwoodman>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: low
Version: 6.0
CC: arozansk, gbeshers, lwoodman
Target Milestone: rc
Hardware: All
OS: Linux
Fixed In Version: kernel-2.6.32-85.el6
Doc Type: Bug Fix
Last Closed: 2011-05-19 12:40:10 UTC
Bug Blocks: 771416

Description William Cohen 2010-10-20 15:14:09 UTC
Description of problem:

The laptop rebooted while it was unattended this weekend. I found the
saved crash dump afterwards. The crash output indicates that the panic
was due to a divide by zero in find_busiest_group.


Version-Release number of selected component (if applicable):

kernel-2.6.32-71.el6.x86_64
firefox-3.6.9-2.el6.x86_64


How reproducible:

Unknown

Steps to Reproduce:
1. Unknown

  
Actual results:

Panic and reboot


Expected results:

No crash


Additional info:

This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-71.el6.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2010-10-16-05:03:17/vmcore  [PARTIAL DUMP]
        CPUS: 4
        DATE: Sat Oct 16 05:03:11 2010
      UPTIME: 8 days, 07:35:47
LOAD AVERAGE: 0.16, 0.17, 0.11
       TASKS: 449
    NODENAME: cannondale
     RELEASE: 2.6.32-71.el6.x86_64
     VERSION: #1 SMP Wed Sep 1 01:33:01 EDT 2010
     MACHINE: x86_64  (2659 Mhz)
      MEMORY: 3.8 GB
       PANIC: ""
         PID: 12405
     COMMAND: "firefox"
        TASK: ffff88010ed54080  [THREAD_INFO: ffff880105174000]
         CPU: 2
       STATE: TASK_INTERRUPTIBLE (PANIC)

crash> bt
PID: 12405  TASK: ffff88010ed54080  CPU: 2   COMMAND: "firefox"
 #0 [ffff880105175670] machine_kexec at ffffffff8103695b
 #1 [ffff8801051756d0] crash_kexec at ffffffff810b8f08
 #2 [ffff8801051757a0] oops_end at ffffffff814cbbd0
 #3 [ffff8801051757d0] die at ffffffff8101733b
 #4 [ffff880105175800] do_trap at ffffffff814cb4a4
 #5 [ffff880105175860] do_divide_error at ffffffff810150af
 #6 [ffff880105175900] divide_error at ffffffff81013efb
    [exception RIP: find_busiest_group+1388]
    RIP: ffffffff81062e9c  RSP: ffff8801051759b8  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: ffff880105175bc4  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000004  RDI: 0000000000000004
    RBP: ffff880105175b38   R8: ffff880028310850   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000001  R12: 00000000ffffff01
    R13: 0000000000016980  R14: ffffffffffffffff  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff8801051759b0] find_busiest_group at ffffffff81062b84
 #8 [ffff880105175b40] thread_return at ffffffff814c864a
 #9 [ffff880105175c00] futex_wait_queue_me at ffffffff810a25a9
#10 [ffff880105175c40] futex_wait at ffffffff810a3678
#11 [ffff880105175dc0] do_futex at ffffffff810a4dc1
#12 [ffff880105175ef0] sys_futex at ffffffff810a581b
#13 [ffff880105175f80] system_call_fastpath at ffffffff81013172
    RIP: 00000032a4a0b7a9  RSP: 00007fb5bf9fec98  RFLAGS: 00010202
    RAX: 00000000000000ca  RBX: ffffffff81013172  RCX: 0000000000000000
    RDX: 0000000001ce4255  RSI: 0000000000000189  RDI: 00007fb5c4a333cc
    RBP: 00007fb5cb8e9690   R8: 00007fb5cb8e9690   R9: 00000000ffffffff
    R10: 00007fb5bf9fed20  R11: 0000000000000206  R12: 0000000000000000
    R13: 0000000000000000  R14: 00007fb5bf9fed20  R15: 0000000001ce4255
    ORIG_RAX: 00000000000000ca  CS: 0033  SS: 002b
crash>

Comment 2 Larry Woodman 2010-11-16 13:06:04 UTC
Not sure exactly how this happened, but we should never blindly divide
in the kernel without making sure the denominator is not zero!

diff --git a/kernel/sched.c b/kernel/sched.c
index f8e5a25..df7753d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4162,7 +4162,8 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
        if (sds.this_load >= sds.max_load)
                goto out_balanced;

-       sds.avg_load = (SCHED_LOAD_SCALE * sds.total_load) / sds.total_pwr;
+       sds.avg_load = (SCHED_LOAD_SCALE * sds.total_load) /
+                                       (sds.total_pwr?sds.total_pwr:1);

        if (sds.this_load >= sds.avg_load)
                goto out_balanced;
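
The guard in the hunk above can be sketched in user space as follows. This is
only an illustration, not kernel code: avg_load_guarded() is a hypothetical
helper, and the SCHED_LOAD_SCALE value of 1024 is assumed here for the example.

```c
/* Assumed value for illustration; the kernel provides its own definition. */
#define SCHED_LOAD_SCALE 1024UL

/*
 * Hypothetical helper mirroring the patched expression: a zero
 * total_pwr is replaced by 1, so the division cannot raise a
 * divide error.
 */
static unsigned long avg_load_guarded(unsigned long total_load,
                                      unsigned long total_pwr)
{
    return (SCHED_LOAD_SCALE * total_load) / (total_pwr ? total_pwr : 1);
}
```

Note that the guard only avoids the trap. With a zero denominator the result is
simply the scaled total load, so it hides the underlying problem rather than
explaining why total_pwr was zero in the first place.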

Comment 3 Larry Woodman 2010-11-18 15:16:01 UTC
After talking with Peterz & Ingo about this, the problem seems to be that scale_rt_power()
can return a negative value to update_cpu_power(), which sets sdg->cpu_power.
This potentially allows the arithmetic sum of all sdg->cpu_power values within a scheduling
domain (total_pwr) to be zero.  Finally, find_busiest_group() uses total_pwr as the
denominator when calculating the average load:

sds.avg_load = (SCHED_LOAD_SCALE * sds.total_load) / sds.total_pwr;

This fix is part of commit aa483808516ca5cacfa0e5849691f64fec25828e, but we can't
include all of it because it relies on other commits that break the kABI.  So we
are only including the hunk that prevents scale_rt_power() from returning a negative
value.
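
The failure mode described above can be illustrated with a deliberately
simplified sketch. It uses signed arithmetic and made-up per-group power
values; the kernel's actual types and values differ, and sum_power() is a
hypothetical helper that only stands in for the way total_pwr accumulates
sdg->cpu_power across a domain.

```c
#include <stddef.h>

/*
 * Hypothetical helper: adds up per-group power values the way a
 * domain-wide total would be accumulated. Signed long is used here
 * only to make the effect of a negative contribution visible.
 */
static long sum_power(const long *power, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        total += power[i];
    return total;
}
```

With two groups whose powers are 1024 and -1024, the sum is 0, which is exactly
the kind of denominator that makes the avg_load division trap.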

--------------------------------------------------------------------------------------------------
commit aa483808516ca5cacfa0e5849691f64fec25828e
Author: Venkatesh Pallipadi <venki>
Date:   Mon Oct 4 17:03:22 2010 -0700

    sched: Remove irq time from available CPU power

    The idea was suggested by Peter Zijlstra here:

      http://marc.info/?l=linux-kernel&m=127476934517534&w=2

    irq time is technically not available to the tasks running on the CPU.
    This patch removes irq time from CPU power piggybacking on
    sched_rt_avg_update().

    Tested this by keeping CPU X busy with a network intensive task having 75%
    of a single CPU irq processing (hard+soft) on a 4-way system. And start seven
    cycle soakers on the system. Without this change, there will be two tasks on
    each CPU. With this change, there is a single task on irq busy CPU X and
    remaining 7 tasks are spread around among other 3 CPUs.

    Signed-off-by: Venkatesh Pallipadi <venki>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra>
    LKML-Reference: <1286237003-12406-8-git-send-email-venki>
    Signed-off-by: Ingo Molnar <mingo>
--------------------------------------------------------------------------------------------------------------

diff --git a/kernel/sched.c b/kernel/sched.c
index f8e5a25..60ef538 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3611,7 +3611,13 @@ unsigned long scale_rt_power(int cpu)
 	sched_avg_update(rq);
 
 	total = sched_avg_period() + (rq->clock - rq->age_stamp);
-	available = total - rq->rt_avg;
+
+	if (unlikely(total < rq->rt_avg)) {
+		/* Ensures that power won't end up being negative */
+		available = 0;
+	} else {
+		available = total - rq->rt_avg;
+	}
 
 	if (unlikely((s64)total < SCHED_LOAD_SCALE))
 		total = SCHED_LOAD_SCALE;
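
The effect of the hunk above can be sketched in user space, assuming 64-bit
unsigned operands. The function names available_unclamped() and
available_clamped() are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/*
 * Unclamped form: with unsigned operands, total < rt_avg does not
 * produce a negative number; the subtraction wraps around to a
 * very large value.
 */
static uint64_t available_unclamped(uint64_t total, uint64_t rt_avg)
{
    return total - rt_avg;
}

/*
 * Clamped form, following the shape of the patched code: when
 * rt_avg exceeds total, the available time is reported as zero.
 */
static uint64_t available_clamped(uint64_t total, uint64_t rt_avg)
{
    if (total < rt_avg)
        return 0;
    return total - rt_avg;
}
```

For total = 100 and rt_avg = 250, the unclamped form wraps to a value just
below 2^64, while the clamped form returns 0. When total is at least rt_avg the
two forms agree.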

Comment 5 RHEL Program Management 2010-11-19 17:00:17 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 6 Aristeu Rozanski 2010-12-13 15:15:19 UTC
Patch(es) available on kernel-2.6.32-89.el6

Comment 9 George Beshers 2011-03-01 16:22:30 UTC
*** Bug 669795 has been marked as a duplicate of this bug. ***

Comment 11 errata-xmlrpc 2011-05-19 12:40:10 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html