Bug 669795

Summary: RHEL Divide-by-zero in find_busiest_group()
Product: Red Hat Enterprise Linux 6
Reporter: George Beshers <gbeshers>
Component: kernel
Assignee: George Beshers <gbeshers>
Status: CLOSED DUPLICATE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent
Priority: high
Version: 6.2
CC: bugzilla, ctatman, gbeshers, jeast, jwest, loriann, lwoodman, martinez, mmilgram, syeghiay, tee, xwzh2008
Keywords: Reopened
Target Release: 6.4
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-07-19 02:13:24 UTC
Bug Blocks: 782183, 830305, 840683

Attachments:

Description	Flags
Patch -- offsets when applied to 2.6.32-99	none
Patch tested with 2.6.32-117	none

Description George Beshers 2011-01-14 19:32:33 UTC

Created attachment 473574 [details]
Patch -- offsets when applied to 2.6.32-99

Description of problem:
  Case 00402323


When we did an init 0 of a UV1000 system to shut it down, we hit a divide-by-zero.

[root@UV00000140-P000 ~]# init 0
[root@UV00000140-P000 ~]# initctl: Event failed
Shutting down Avahi daemon: [ OK ]
Stopping atd: [ OK ]
Stopping availmon: [ OK ]
...
ACPI: Preparing to enter system sleep state S5
Disabling non-boot CPUs ...
Broke affinity for irq 4
Broke affinity for irq 24
Broke affinity for irq 70
Broke affinity for irq 71
Broke affinity for irq 72
divide error: 0000 [#1] SMP
last sysfs file:
/sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0/host0/target0:1:0/0:1:0:0/block/sda/dev
CPU 194
Modules linked in: sit tunnel4 autofs4 sunrpc ipt_REJECT ip6t_REJECT ipv6
vfat fat dm_mirror dm_region_hash dm_log numatools(U) xpmem(U) xp gru(U)
xvma(U) hwperf(U) uv_mmtimer(U) uinput sg i2c_i801 i2c_core iTCO_wdt
iTCO_vendor_support ioatdma igb dca ext4 mbcache jbd2 sd_mod crc_t10dif
mptsas mptscsih mptbase scsi_transport_sas dm_mod [last unloaded:
nf_conntrack]

Modules linked in: sit tunnel4 autofs4 sunrpc ipt_REJECT ip6t_REJECT ipv6
vfat fat dm_mirror dm_region_hash dm_log numatools(U) xpmem(U) xp gru(U)
xvma(U) hwperf(U) uv_mmtimer(U) uinput sg i2c_i801 i2c_core iTCO_wdt
iTCO_vendor_support ioatdma igb dca ext4 mbcache jbd2 sd_mod crc_t10dif
mptsas mptscsih mptbase scsi_transport_sas dm_mod [last unloaded:
nf_conntrack]
Pid: 32704, comm: kstop/194 Not tainted 2.6.32-71.el6.x86_64 #1 Stoutland
Platform
RIP: 0010:[<ffffffff81062e9c>] [<ffffffff81062e9c>]
find_busiest_group+0x56c/0xb40
RSP: 0018:ffff881076c2dbf0 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff881076c2ddfc RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000000000001c0 RDI: 0000000000000000
RBP: ffff881076c2dd70 R08: ffff89801c431b68 R09: 00000000000001c0
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000016980 R14: ffffffffffffffff R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff89801c440000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f4f1b2f2d80 CR3: 0000000001001000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kstop/194 (pid: 32704, threadinfo ffff881076c2c000, task
ffff8810761aaaf0)
Stack:
ffff881076c2dd10 ffff881076c2dc80 ffff89801c451908 ffff881076c2dde8
<0> 00ff881076c2dda0 00ffffff00000002 000000c200000000 00000000ffffffff
<0> ffff89801c451800 0000000000000008 0000000000016980 0000000000016980
Call Trace:
[<ffffffff814c864a>] thread_return+0x412/0x778
[<ffffffff81013c8e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff810c5fe0>] ? stop_cpu+0x0/0xf0
[<ffffffff8108c69c>] worker_thread+0x1fc/0x2a0
[<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108c4a0>] ? worker_thread+0x0/0x2a0
[<ffffffff81091936>] kthread+0x96/0xa0
[<ffffffff810141ca>] child_rip+0xa/0x20
[<ffffffff810918a0>] ? kthread+0x0/0xa0
[<ffffffff810141c0>] ? child_rip+0x0/0x20
Code: ff c7 85 bc fe ff ff 01 00 00 00 e9 8d fc ff ff 0f 1f 80 00 00 00 00
48 8b 95 e0 fe ff ff 48 8b 45 a8 8b 4a 08 48 c1 e0 0a 31 d2 <48> f7 f1 48
8b 4d b0 48 89 45 a0 31 c0 48 85 c9 74 0c 48 8b 45
RIP [<ffffffff81062e9c>] find_busiest_group+0x56c/0xb40
RSP <ffff881076c2dbf0>
---[ end trace 7ba40d9e8ca26a63 ]---

Many threads hit the same failure.

We looked into the problem and found that CPUs are disabled as part of shutdown, which exposes a race condition in the load-balancing code: if the OS tries to load balance a domain that has no CPUs, it divides by zero.

We have a patch to fix this bug in RHEL6. It guards against the divide by zero and the page faults that we've seen. The current community sched.c code is very different from RHEL6's. Our fix changes a function and a structure which no longer exist in the community code. There are no divides by cpu_power in 2.6.37.
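
The failure mode described above can be modeled in userspace. This is a minimal sketch, not the attached patch: it assumes the faulting expression has the shape load * SCHED_LOAD_SCALE / cpu_power, and the names group_model and scaled_avg_load are hypothetical stand-ins invented for illustration.

```c
/* Assumed to mirror the kernel's 1 << 10 load scale; userspace model only. */
#define SCHED_LOAD_SHIFT 10
#define SCHED_LOAD_SCALE (1UL << SCHED_LOAD_SHIFT)

/* Hypothetical stand-in for a scheduler group; only the relevant field. */
struct group_model {
    unsigned int cpu_power;   /* 0 once the group has lost all of its CPUs */
};

/*
 * Scale a group's load by its capacity.
 * Returns 0 and stores the result in *avg_load on success.
 * Returns -1 and leaves *avg_load untouched if cpu_power is zero,
 * which is the case where an unguarded division would trap.
 */
int scaled_avg_load(const struct group_model *g, unsigned long group_load,
                    unsigned long *avg_load)
{
    if (g->cpu_power == 0)
        return -1;

    *avg_load = (group_load * SCHED_LOAD_SCALE) / g->cpu_power;
    return 0;
}
```

A caller in this model would skip a group for which scaled_avg_load() returns -1 rather than using a stale or undefined average. Whether skipping is the right policy in the real scheduler is a separate question that this sketch does not answer.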

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 2 George Beshers 2011-01-17 17:27:26 UTC

Requesting this be an exception for RHEL6.1.
The patch eliminates the divide-by-zero conditions
that have caused the kernel panic.

Comment 4 George Beshers 2011-02-28 15:32:47 UTC

Created attachment 481395 [details]
Patch tested with 2.6.32-117


Posted patch.

Comment 5 George Beshers 2011-03-01 16:22:30 UTC

I will reopen if we see this again --- testing on a 4-rack
machine at the end of the week.

George

*** This bug has been marked as a duplicate of bug 644903 ***

Comment 6 George Beshers 2011-04-28 16:54:11 UTC

Not seeing this on snap4.

Comment 7 George Beshers 2011-06-14 15:39:24 UTC

Reopening.

We are seeing this on a system that is about to ship to a customer
with RHEL 6.1 GA installed.

I am investigating what might be special about the machine in
manufacturing relative to the test machines.

======================================================


Linux version 2.6.32-131.0.15.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) )
 #1 SMP Tue May 10 15:42:40 EDT 2011


ACPI: Preparing to enter system sleep state S5
Disabling non-boot CPUs ...
Broke affinity for irq 112
Broke affinity for irq 4
Broke affinity for irq 105
Broke affinity for irq 18
Broke affinity for irq 106
Broke affinity for irq 21
Broke affinity for irq 107
Broke affinity for irq 24
Broke affinity for irq 108
Broke affinity for irq 109
Broke affinity for irq 110
Broke affinity for irq 97
Broke affinity for irq 98
Broke affinity for irq 99
Broke affinity for irq 100
Broke affinity for irq 101
Broke affinity for irq 102
Broke affinity for irq 103
Broke affinity for irq 104
divide error: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:1a.7/usb1/1-4/1-4:1.0/host1/target1:0:0/1:0:0:0/block/sr0/dev
CPU 644 
Modules linked in: autofs4 sunrpc ip6t_REJECT ipv6 vfat fat dm_mirror dm_region_hash dm_log numatools(U) xvma(U) uv_mmtimer(U) hwperf(U) uinput microcode ghes hed ixgbe mdio i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma sg igb dca ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif usb_storage mptsas mptscsih mptbase scsi_transport_sas dm_mod [last unloaded: ip_tables]

Modules linked in: autofs4 sunrpc ip6t_REJECT ipv6 vfat fat dm_mirror dm_region_hash dm_log numatools(U) xvma(U) uv_mmtimer(U) hwperf(U) uinput microcode ghes hed ixgbe mdio i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma sg igb dca ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif usb_storage mptsas mptscsih mptbase scsi_transport_sas dm_mod [last unloaded: ip_tables]
Pid: 1936, comm: migration/644 Tainted: G        W  ----------------   2.6.32-131.0.15.el6.x86_64 #1 Stoutland Platform
RIP: 0010:[<ffffffff81053b65>]  [<ffffffff81053b65>] find_busiest_group+0x5c5/0xb20
RSP: 0018:ffff88087bf0fba0  EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff88087bf0fdbc RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000300 RDI: 0000000000000000
RBP: ffff88087bf0fd30 R08: ffff8a800e450be0 R09: 0000000000000300
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000015f80 R14: ffffffffffffffff R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8a800e500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f2b233c7098 CR3: 0000000001a25000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process migration/644 (pid: 1936, threadinfo ffff88087bf0e000, task ffff88087bf080c0)
Stack:
 ffff88087bf0fcd0 ffff88087bf0fc40 0000000000000000 0000000000000000
<0> 0000000000000000 ffff88087bf0fda8 0000000000000000 0000028400000002
<0> ffffffff00000000 ffff8a800e5108c0 0000000000000000 0000000000000008
Call Trace:
 [<ffffffff814db693>] thread_return+0x3aa/0x777
 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff810c3690>] ? stop_machine_cpu_stop+0x0/0xe0
 [<ffffffff810c3605>] cpu_stopper_thread+0x125/0x1b0
 [<ffffffff814db337>] ? thread_return+0x4e/0x777
 [<ffffffff8105dc72>] ? default_wake_function+0x12/0x20
 [<ffffffff810c34e0>] ? cpu_stopper_thread+0x0/0x1b0
 [<ffffffff8108ddf6>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff8108dd60>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20

Comment 9 Marc Milgram 2011-06-30 13:59:25 UTC

In the case I saw, the customer was running linux-2.6.32-71.7.1.el6.x86_64. As indicated, that might be fixed by 644903.

Comment 10 gbeshers 2011-06-30 16:14:24 UTC

Yes, that caught another problem, but
the one I reopened for is in 6.1.

George

Comment 11 Jason 2011-07-22 16:16:09 UTC

We're also seeing this bug with KVM on RHEL 6.1 and a RHEL 6.1 guest on shutdown.



Halting system...
md: stopping all md devices.
ACPI: Preparing to enter system sleep state S5
Disabling non-boot CPUs ...
BUG: soft lockup - CPU#0 stuck for 67s! [migration/0:5]
Modules linked in: nfs lockd fscache(T) nfs_acl auth_rpcgss autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mod [last unloaded: speedstep_lib]
CPU 0:
Modules linked in: nfs lockd fscache(T) nfs_acl auth_rpcgss autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mod [last unloaded: speedstep_lib]
Pid: 5, comm: migration/0 Tainted: G           ---------------- T 2.6.32-131.0.15.el6.x86_64 #1 KVM
RIP: 0010:[<ffffffff810c36ff>]  [<ffffffff810c36ff>] stop_machine_cpu_stop+0x6f/0xe0
RSP: 0018:ffff880198dabdd0  EFLAGS: 00000293
RAX: 0000000000000001 RBX: ffff880198dabdf0 RCX: ffff8800282111e8
RDX: 0000000000000000 RSI: ffff88019752c040 RDI: ffff880197161d28
RBP: ffffffff8100bc8e R08: ffff880198daa000 R09: 0000000000000001
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff814db337 R14: ffff880198dabdf0 R15: ffff8801936e8f00
FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fff295baeb0 CR3: 000000019362b000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 [<ffffffff810c3690>] ? stop_machine_cpu_stop+0x0/0xe0
 [<ffffffff810c35ba>] ? cpu_stopper_thread+0xda/0x1b0
 [<ffffffff814db337>] ? thread_return+0x4e/0x777
 [<ffffffff8105dc72>] ? default_wake_function+0x12/0x20
 [<ffffffff810c34e0>] ? cpu_stopper_thread+0x0/0x1b0
 [<ffffffff8108ddf6>] ? kthread+0x96/0xa0
 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
 [<ffffffff8108dd60>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20

Comment 13 George Beshers 2012-07-18 18:58:12 UTC

I have not seen this on rhel6.3 in all of our testing.
Is anyone else still seeing this?  Jason?

George

Comment 14 Larry Woodman 2012-07-18 20:11:59 UTC

George, I found the actual cause of this and fixed it in RHEL6.3.  Sorry I didn't find and close this BZ as a DUP of BZ785959; I think you should do that now...


>>>email sent to rhkernel-list by me on 02/23/2012 05:51 PM

[RHEL6.3 Patch] Fix Kernel divide by zero panic in find_busiest_group()
--------------------------------------------------------------------------------
RHEL6 is missing the attached upstream patch, which at first glance appears to
be just a cleanup.  However, this patch changes init_sched_groups_power()
to call update_group_power(), which in turn calls update_cpu_power().  The
update_cpu_power() function ensures that sched_group->cpu_power is never
set to zero, a check that init_sched_groups_power() did not perform.
Since find_busiest_group() uses sched_group->cpu_power in the denominator
of a fraction, it must not be zero!

-----------------------------------------------------------------------------------------------------
static void update_cpu_power(struct sched_domain *sd, int cpu)
{
...
        if (sched_feat(ARCH_POWER))
                power *= arch_scale_freq_power(sd, cpu);
        else
                power *= default_scale_freq_power(sd, cpu);

        power >>= SCHED_LOAD_SHIFT;

        power *= scale_rt_power(cpu);
        power >>= SCHED_LOAD_SHIFT;

>>>if (!power)
>>>        power = 1;

        sdg->cpu_power = power;
}
-------------------------------------------------------------------------------------------------------
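
The effect of the highlighted clamp can be tried outside the kernel. The following is a rough userspace model of the scaling chain in the excerpt, not the kernel function itself: model_cpu_power, freq_scale, rt_scale, and the clamp flag are hypothetical names, and the starting value of SCHED_LOAD_SCALE is an assumption because the excerpt elides the beginning of the function.

```c
/* Assumed to mirror the kernel's 1 << 10 load scale; userspace model only. */
#define SCHED_LOAD_SHIFT 10
#define SCHED_LOAD_SCALE (1UL << SCHED_LOAD_SHIFT)

/*
 * freq_scale and rt_scale are hypothetical inputs standing in for the
 * values returned by the kernel's scaling helpers. A value of
 * SCHED_LOAD_SCALE means "no reduction". Pass clamp = 0 to see the
 * result without the zero check, clamp = 1 to apply it.
 */
unsigned long model_cpu_power(unsigned long freq_scale, unsigned long rt_scale,
                              int clamp)
{
    unsigned long power = SCHED_LOAD_SCALE;   /* assumed starting value */

    power *= freq_scale;
    power >>= SCHED_LOAD_SHIFT;

    power *= rt_scale;
    power >>= SCHED_LOAD_SHIFT;

    /* Each right shift truncates, so small scale factors can reach 0. */
    if (clamp && !power)
        power = 1;

    return power;
}
```

In this model, model_cpu_power(512, 1, 0) truncates to 0, while the same inputs with clamp set to 1 yield 1, so a later division by the result cannot fault.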

The attached patch fixes this panic and BZ785959.  Since I cannot reproduce
this problem, I'm building a kernel in brew for the customers to test and verify
the fix.  I'll respond with the results once they verify it.



rhel6-cpu_power_init.patch

commit d274cb30f4a08045492d3f0c47cdf1a25668b1f5
Author: Peter Zijlstra <a.p.zijlstra>
Date:   Thu Apr 7 14:09:43 2011 +0200

    sched: Simplify ->cpu_power initialization
    
    The code in update_group_power() does what init_sched_groups_power()
    does and more, so remove the special init_ code and call the generic
    code instead.
    
    Also move the sd->span_weight initialization because
    update_group_power() needs it.
    
    Signed-off-by: Peter Zijlstra <a.p.zijlstra>
    Cc: Mike Galbraith <efault>
    Cc: Nick Piggin <npiggin>
    Cc: Linus Torvalds <torvalds>
    Cc: Andrew Morton <akpm>
    Link: http://lkml.kernel.org/r/20110407122941.875856012@chello.nl
    Signed-off-by: Ingo Molnar <mingo>

diff --git a/kernel/sched.c b/kernel/sched.c
index 3ce2ab6..071cf49 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -8618,9 +8618,6 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 	struct rq *rq = cpu_rq(cpu);
 	struct sched_domain *tmp;
 
-	for (tmp = sd; tmp; tmp = tmp->parent)
-		tmp->span_weight = cpumask_weight(sched_domain_span(tmp));
-
 	/* Remove the sched domains which do not contribute to scheduling. */
 	for (tmp = sd; tmp; ) {
 		struct sched_domain *parent = tmp->parent;
@@ -9098,46 +9095,12 @@ static void free_sched_groups(const struct cpumask *cpu_map,
  */
 static void init_sched_groups_power(int cpu, struct sched_domain *sd)
 {
-	struct sched_domain *child;
-	struct sched_group *group;
-	long power;
-	int weight;
-
 	WARN_ON(!sd || !sd->groups);
 
 	if (cpu != group_first_cpu(sd->groups))
 		return;
 
-	child = sd->child;
-
-	sd->groups->cpu_power = 0;
-
-	if (!child) {
-		power = SCHED_LOAD_SCALE;
-		weight = cpumask_weight(sched_domain_span(sd));
-		/*
-		 * SMT siblings share the power of a single core.
-		 * Usually multiple threads get a better yield out of
-		 * that one core than a single thread would have,
-		 * reflect that in sd->smt_gain.
-		 */
-		if ((sd->flags & SD_SHARE_CPUPOWER) && weight > 1) {
-			power *= sd->smt_gain;
-			power /= weight;
-			power >>= SCHED_LOAD_SHIFT;
-		}
-		sd->groups->cpu_power += power;
-		return;
-	}
-
-	/*
-	 * Add cpu_power of each child group to this groups cpu_power.
-	 */
-	group = child->groups;
-	do {
-		sd->groups->cpu_power += group->cpu_power;
-		group = group->next;
-	} while (group != child->groups);
+	update_group_power(sd, cpu);
 }
 
 /*
@@ -9444,7 +9407,7 @@ static int __build_sched_domains(const struct cpumask *cpu_map,
 {
 	enum s_alloc alloc_state = sa_none;
 	struct s_data d;
-	struct sched_domain *sd;
+	struct sched_domain *sd, *tmp;
 	int i;
 #ifdef CONFIG_NUMA
 	d.sd_allnodes = 0;
@@ -9467,6 +9430,9 @@ static int __build_sched_domains(const struct cpumask *cpu_map,
 		sd = __build_book_sched_domain(&d, cpu_map, attr, sd, i);
 		sd = __build_mc_sched_domain(&d, cpu_map, attr, sd, i);
 		sd = __build_smt_sched_domain(&d, cpu_map, attr, sd, i);
+
+		for (tmp = sd; tmp; tmp = tmp->parent)
+			tmp->span_weight = cpumask_weight(sched_domain_span(tmp));
 	}
 
 	for_each_cpu(i, cpu_map) {

Comment 15 George Beshers 2012-07-19 00:31:21 UTC

John or Larry,

I can't access 784959 so I can't close this as a duplicate.

I agree it should be closed.

Comment 16 Larry Woodman 2012-07-19 02:13:24 UTC


*** This bug has been marked as a duplicate of bug 784959 ***