Bug 473258 - [4.7] ethtool operation to the slave device of bonding makes the system hang up.
[4.7] ethtool operation to the slave device of bonding makes the system hang up.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel (Show other bugs)
4.7
All Linux
urgent Severity high
: rc
: ---
Assigned To: Andy Gospodarek
Martin Jenner
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2008-11-27 07:51 EST by Flavio Leitner
Modified: 2014-06-29 19:00 EDT (History)
8 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-05-18 15:05:00 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
output of debug kernel 2.6.9-78.10.EL.IT231780.1 (72.63 KB, text/plain)
2008-12-29 12:30 EST, Flavio Leitner
no flags Details
e1000-ethtool-fix.patch (2.71 KB, patch)
2009-02-09 20:04 EST, Andy Gospodarek
no flags Details | Diff

  None (edit)
Description Flavio Leitner 2008-11-27 07:51:46 EST
Description of Problem:
 When the ethtool command invoked for getting the information of the
 slave device under the bonding device, the system hung occurred.

 We think that the system hung occurred at the obtaining process of
 the semaphore in the e1000 driver.

 This problem is the same as IT#221541/BZ#445951 that occurs by RHEL5.1.

Version-Release number of selected component:

 Red Hat Enterprise Linux Version Number: RHEL4
 Release Number: 4.7
 Architecture: x86
 Kernel Version: 2.6.9-78.ELsmp
 Related Package Version: kernel-2.6.9-78.ELsmp
 Related Middleware / Application: /sbin/ethtool

Drivers or hardware or architecture dependency:
 depend on following LAN controller and driver.
  - Intel 80003ES2LAN Gigabit Ethernet Controller
  - e1000 driver
  - bonding driver

How reproducible:
 Sometimes

Step to Reproduce:
  - create a bonding device using Intel 80003ES2LAN Gigabit Ethernet Cnotroller
      [device configuration]
       bond0 ---+--- eth0 (80003ES2LAN)
                |
                +--- eth1 (80003ES2LAN)

      1) Add the following to /etc/modprobe.conf
           alias bond0 bonding
           options bond0 mode=1 miimon=1 primary=eth0 updelay=1000
      2) Set the following to /etc/sysconfig/network-scripts/ifcfg-bond0
           DEVICE=bond0
           BOOTPROTO=none
           BROADCAST=192.168.0.255
           IPADDR=192.168.0.5
           NETMASK=255.255.255.0
           NETWORK=192.168.0.0
           ONBOOT=yes
           USERCTL=no
      3) Set the following to /etc/sysconfig/network-scripts/ifcfg-ethX
           DEVICE=ethX
           BOOTPROTO=none
           HWADDR=XX:XX:XX:XX:XX:XX
           MASTER=bond0
           SLAVE=yes
           USERCTL=no
           ONBOOT=yes

  - execute a ethtool command to slave device.
      # /sbin/ethtool eth0
    This may happen easily under the multiple invocation of command.

Actual Results:
 The system hung up.

Expected Results:
 The system does not hung up.

Hardware configuration:

 Model: PRIMERGY RX600S4
 CPU Info: Intel Xeon Processor 2.93GHz x 1
 Memory Info: 2[GB]
 Hardware Component Information:
   M/B: Intel 7300
   LAN: Intel 80003ES2LAN Gigabit Ethernet Controller (onbord)


Call Trace:
 [<c021203c>] __handle_sysrq+0x62/0xd9
 [<c020cd4c>] kbd_event+0x83/0xb0
 [<c02737e9>] input_event+0x331/0x351
 [<c0271bd8>] hidinput_hid_event+0x1d0/0x208
 [<c0224142>] get_device+0xe/0x14
 [<c026e1a5>] hid_process_event+0x28/0x52
 [<c026e47b>] hid_input_field+0x2ac/0x2f9
 [<c026e541>] hid_input_report+0x79/0x98
 [<c026e5f6>] hid_irq_in+0x96/0xf2
 [<c026577e>] usb_hcd_giveback_urb+0x14/0x3e
 [<f88aeceb>] uhci_finish_urb+0x27/0x32 [uhci_hcd]
 [<f88aed28>] uhci_finish_completion+0x32/0x38 [uhci_hcd]
 [<f88aef00>] uhci_irq+0x19b/0x240 [uhci_hcd]
 [<c02657ce>] usb_hcd_irq+0x26/0x4b
 [<c01074d2>] handle_IRQ_event+0x25/0x4f
 [<c0107a32>] do_IRQ+0x11c/0x1ae
 =======================
 [<c02e13b4>] common_interrupt+0x18/0x20
 [<c0112e1e>] delay_pmtmr+0xd/0x13
 [<c01c6ca1>] __delay+0x9/0xa
 [<f896e267>] e1000_get_software_semaphore+0x6c/0x7d [e1000]
 [<f896e184>] e1000_get_hw_eeprom_semaphore+0x17/0x66 [e1000]
 [<f896a091>] e1000_swfw_sync_acquire+0x42/0xdb [e1000]
 [<f896a4a3>] e1000_write_kmrn_reg+0x2d/0x62 [e1000]
 [<f8969305>] e1000_configure_kmrn_for_10_100+0x1e/0x78 [e1000]
 [<f8969eae>] e1000_get_speed_and_duplex+0xd4/0xf8 [e1000]
 [<f896eab0>] e1000_get_settings+0x9c/0xdb [e1000]
 [<f88f8679>] bond_update_speed_duplex+0x38/0xea [bonding1]
 [<c011e838>] __wake_up_common+0x36/0x51
 [<f88fa31e>] bond_mii_monitor+0x366/0x3f7 [bonding1]
 [<f88f9fb8>] bond_mii_monitor+0x0/0x3f7 [bonding1]
 [<c012a939>] run_timer_softirq+0x123/0x145
 [<c0126d84>] __do_softirq+0x4c/0xb1
 [<c01081a3>] do_softirq+0x4f/0x56
 =======================
 =======================
 [<c01174c8>] smp_apic_timer_interrupt+0x9a/0x9c
 [<c02e1436>] apic_timer_interrupt+0x1a/0x20
 [<f896a09d>] e1000_swfw_sync_acquire+0x4e/0xdb [e1000]
 [<f896a198>] e1000_read_phy_reg+0x2e/0xc4 [e1000]
 [<f8969330>] e1000_configure_kmrn_for_10_100+0x49/0x78 [e1000]
 [<f8969eae>] e1000_get_speed_and_duplex+0xd4/0xf8 [e1000]
 [<f896eab0>] e1000_get_settings+0x9c/0xdb [e1000]
 [<c028de3e>] ethtool_get_settings+0x2f/0x85
 [<c028f438>] dev_ethtool+0xc6/0x27a
 [<c028d318>] dev_ioctl+0x2c7/0x4ab
 [<c02c0ae7>] udp_ioctl+0x0/0xc5
 [<c02c6c8e>] inet_ioctl+0xa0/0xa5
 [<c0283bcd>] sock_ioctl+0x28c/0x2b4
 [<c016cc5e>] sys_ioctl+0x227/0x269
 [<c02e09db>] syscall_call+0x7/0xb
Comment 1 Flavio Leitner 2008-11-27 09:43:34 EST
The last trace shows the CPU still looping:
e1000_swfw_sync_acquire+0x4e/0xdb
...
3414     while (timeout) {
3415             if (e1000_get_hw_eeprom_semaphore(hw))
3416                 return -E1000_ERR_SWFW_SYNC;
3417
3418             swfw_sync = E1000_READ_REG(hw, SW_FW_SYNC);
** CPU was around here ** 
3419             if (!(swfw_sync & (fwmask | swmask))) {
3420                 break;
3421             }
3422
3423             /* firmware currently using resource (fwmask) */
3424             /* or other software thread currently using resource (swmask
        ) */
3425             e1000_put_hw_eeprom_semaphore(hw);
3426             mdelay(5);
3427             timeout--;
3428     }

then it was interrupted and do_softirq() was called and also went thru e1000_swfw_sync_acquire(). There it tries to get software_semaphore which
depends on a HW register E1000_SWSM_SMBI. I'd say this should deadlock
but it is looping with a timeout, so after some time it should fail and 
moves on normally.

The stack trace with do_softirq() code path shows the CPU is looping, 
waiting on mdelay().

The delay on e1000_swfw_sync_acquire() can be 1sec and on  e1000_get_hw_eeprom_semaphore() it is hw->eeprom.word_size + 1
This board doesn't have eeprom.word_size ready as other models, it reads 
from eeprom and I have no idea yet how big it can be.

An easy way to fix this would be adding local_bh_{dis,ena}able() at
e1000_swfw_sync_acquire() to avoid do_softirq() to get in the middle,
but looking at Linus tree or from sf.net this board is supported only
by e1000e and e1000_get_settings() didn't call anything else, so this 
bug shouldn't happen there.

next questions to solve:
1) Is this really hung? Or after some time it moves on?

2) What is the size of this eeprom? Perhaps the timeout is just too big.

3) Is there any plan for e1000/e1000e driver on RHEL4u7? I think if
we move this board to e1000e and update this driver, it should fix too.

Flavio
Comment 3 Issue Tracker 2008-12-15 11:40:51 EST
Hello,

This is the output from console log:

-> e1000_swfw_sync_acquire: timeout=200
  -> e1000_get_software_semaphore: timeout=2049
  <- e1000_get_software_semaphore: timeout=2049
  -> e1000_get_hw_eeprom_semaphore: timeout=2049
  <- e1000_get_hw_eeprom_semaphore: timeout=2049
<- e1000_swfw_sync_acquire: timeout=200

and it is repeating this part, so it's not a deadlock neither a HW lock
problem. The stack trace also looks good to me.

Well, I'm thinking if bond_mii_monitor() is too fast then, so I went check

your bonding options and I found these lines below:

  $ grep bond rx600s4-372001/etc/modprobe.conf
  alias bond0 bonding
  options bond0 mode=1 miimon=1 primary=eth2 updelay=1000
  install bond1 /sbin/modprobe --ignore-install bonding -o bonding1 mode=1

  miimon=1 primary=eth3 updelay=2000 

See the miimon=1 (default is 100), perhaps it is too fast.

Can you get 5 shots of sysrq+t followed by sysrq+w while reproducing it?
i.e:
1. <sysrq+t>
2. <sysrq+t>
3. wait 1 sec
4. repeat this 5x.
there is no need to use debug kernel.
 
Can you change that to miimon to 100 and see if it helps?

thanks,
Flavio

Internal Status set to 'Waiting on Support'

This event sent from IssueTracker by fleitner 
 issue 231780
Comment 4 Issue Tracker 2008-12-22 08:40:10 EST
Hi,

The traces shows CPU looping but it didn't match with previous log
provided.
Can you confirm if the data provided with the test kernel
kernel-2.6.9-78.10.EL.IT231780.1.src.rpm was gathered while the problem 
was happening?

I'm referring to this update:
---
I booted by this kernel, and gathered the console log.
I append it.
Could you check the contents?
---

If not, can you get it again while the problem is happening?

thanks,
Flavio

Internal Status set to 'Waiting on Support'

This event sent from IssueTracker by fleitner 
 issue 231780
Comment 5 Flavio Leitner 2008-12-29 12:30:42 EST
Created attachment 327926 [details]
output of debug kernel 2.6.9-78.10.EL.IT231780.1

Hi,

The problem here seems to be the big loop with delay on e1000_get_software_semaphore().

e1000_get_software_semaphore(struct e1000_hw *hw)
{
    int32_t timeout = hw->eeprom.word_size + 1;

    while (timeout) {
         ...
    
        mdelay(1);
        timeout--;
    }
}

The timeout value there is 2049, which means almost 2 seconds inside of
that while() {}. The bonding timer expires every 100ms, (in customer 
case was 1ms), then it calls bond_mii_monitor() and ends up there again
for more 2 seconds leaving ethtool stuck.

This is the ethtool stack interrupted:
 [<c02e13b4>] common_interrupt+0x18/0x20
 [<c0112e1e>] delay_pmtmr+0xd/0x13
 [<c01c6ca1>] __delay+0x9/0xa
 [<f896e267>] e1000_get_software_semaphore+0x6c/0x7d [e1000]
 [<f896e184>] e1000_get_hw_eeprom_semaphore+0x17/0x66 [e1000]
 [<f896a091>] e1000_swfw_sync_acquire+0x42/0xdb [e1000]
 [<f896a4a3>] e1000_write_kmrn_reg+0x2d/0x62 [e1000]
 [<f8969305>] e1000_configure_kmrn_for_10_100+0x1e/0x78 [e1000]
 [<f8969eae>] e1000_get_speed_and_duplex+0xd4/0xf8 [e1000]
 [<f896eab0>] e1000_get_settings+0x9c/0xdb [e1000]
 [<f88f8679>] bond_update_speed_duplex+0x38/0xea [bonding1]
 [<c011e838>] __wake_up_common+0x36/0x51
 [<f88fa31e>] bond_mii_monitor+0x366/0x3f7 [bonding1]
 [<f88f9fb8>] bond_mii_monitor+0x0/0x3f7 [bonding1]
 [<c012a939>] run_timer_softirq+0x123/0x145
 [<c0126d84>] __do_softirq+0x4c/0xb1
 [<c01081a3>] do_softirq+0x4f/0x56
 =======================
 [<c01174c8>] smp_apic_timer_interrupt+0x9a/0x9c
 [<c02e1436>] apic_timer_interrupt+0x1a/0x20
 [<f896a09d>] e1000_swfw_sync_acquire+0x4e/0xdb [e1000]
 [<f896a198>] e1000_read_phy_reg+0x2e/0xc4 [e1000]
 [<f8969330>] e1000_configure_kmrn_for_10_100+0x49/0x78 [e1000]
 [<f8969eae>] e1000_get_speed_and_duplex+0xd4/0xf8 [e1000]
 [<f896eab0>] e1000_get_settings+0x9c/0xdb [e1000]
 [<c028de3e>] ethtool_get_settings+0x2f/0x85
 [<c028f438>] dev_ethtool+0xc6/0x27a
 [<c028d318>] dev_ioctl+0x2c7/0x4ab
 [<c02c0ae7>] udp_ioctl+0x0/0xc5
 [<c02c6c8e>] inet_ioctl+0xa0/0xa5
 [<c0283bcd>] sock_ioctl+0x28c/0x2b4
 [<c016cc5e>] sys_ioctl+0x227/0x269
 [<c02e09db>] syscall_call+0x7/0xb

Well, I can't say if this big delay is really needed or not.
Thoughts?


(attaching the console output of debug kernel showing it looping)
Flavio
Comment 6 Flavio Leitner 2009-01-13 13:15:30 EST
Hello,

Here is another call trace leading to the same problem:

- This problem occurs when SW interrupt for MII monitor occurs while
 acquiring HW semaphore in SNMP state acquisition processing.
 At that time, the MII monitor cannot acquire the semaphore, because 
 the SNMP has already aquired the semaphore, and it must wait for 16 
 seconds for timeout. Therefore the system stops for 16 seconds.

 -----
 sys_ioctl
  ->e1000_get_speed_and_duplex
    ->e1000_configure_kmrn_for_1000
      ->e1000_write_kmrn_reg
        ->e1000_swfw_sync_acquire
          ->e1000_get_hw_eeprom_semaphore
            * SW interrupt by MII monitor occurs while dealing with this function.
            ->bond_mii_monitor
              ->e1000_get_speed_and_duplex
                ->e1000_configure_kmrn_for_1000
                  ->e1000_write_kmrn_reg
                    ->e1000_swfw_sync_acquire
                      ->e1000_get_hw_eeprom_semaphore
                        ->e1000_get_software_semaphore
                        * It is made to wait until timeout(16 seconds) in this function.


Flavio
Comment 9 Andy Gospodarek 2009-02-09 15:55:01 EST
Flavio's analysis is absolutely correct.  It's interesting that concurrent calls to check link status have such long timeouts.  Whether the mii monitoring time-out is set to be too small or not (I'd say 1ms is too small), there is a chance for deadlock with anything set, so we should try and come up with a fix.

As I look at what is done here

I'm not convinced that these bits:

3203 
3204     if ((hw->mac_type == e1000_80003es2lan) &&
3205         (hw->media_type == e1000_media_type_copper)) {
3206         if (*speed == SPEED_1000)
3207             ret_val = e1000_configure_kmrn_for_1000(hw);
3208         else
3209             ret_val = e1000_configure_kmrn_for_10_100(hw, *duplex);
3210         if (ret_val)
3211             return ret_val;
3212     }
3213 

need to be run on every call to e1000_get_speed_and_duplex().  I can see how they are valuable when the link first comes up since they make sure the interpacket gap and it appears that they disable some 'false carrier detection' bit (I haven't looked at the datasheet to determine what the false-carrier stuff actually does).

As I see it there are a couple of options:

1.  Break out the code above so that it's in a separate function and only called when needed.  This would mean we would drop it from e1000_get_speed_and_duplex and have to place some calls do the new function after the calls to e1000_get_speed_and_duplex.

2.  Add a new conditional argument to e1000_get_speed_and_duplex so the offending code is not called when we deem it not necessary.  This would require the same level of research to determine where this call is needed and where it is not.

3.  Do not modify the function signature and make sure the code above doesn't get called when e1000_get_speed_and_duplex is called in irq context.  This is a bit hacky, but would certainly work around the issue since the call would not me made by miimon (since timers run in softirq context).  This would allow it to be called in the watchdog though since the watchdog runs as a workqueue in our code so it's a normal, non-irq thread.

4. Other.

Any way you look at it, I don't see how calling e1000_configure_kmrn_for_1000 or e1000_configure_kmrn_for_10_100 is necessary when simply checking link status.  I'm hoping to get someone at Intel to verify this.
Comment 10 Andy Gospodarek 2009-02-09 16:20:09 EST
Places where we should not call 'questionable code' from comment #9 since it will be deadlock or delay prone are:

e1000_get_settings
e1000_check_for_link
e1000_config_fc_after_link_up
e1000_config_dsp_after_link_change

Places where we could call 'questionable code' from comment #9 since it will not be as deadlock prone:

e1000_watchdog_task

Since there is only one case where we should really call this I am going to suggest creating a new function to deal with this and only calling it there.

Upon inspecting the e1000e driver it appears the same basic code it only called is called in functions equivalent to e1000_watchdog_task and e1000_config_fc_after_link_up, so I will do some looking around and try to determine if this is necessary in both places in e1000 or just in e1000_watchdog_task.
Comment 11 Andy Gospodarek 2009-02-09 17:30:17 EST
Jesse, you can read the whole bug if you want, but if you start at comment #9 you will get the information you need.  I pinged Jeff K about this too, but he appears to be in a meeting and does not appear to have a bugzilla account his Intel address (at least not the one I know!).
Comment 12 Jesse Brandeburg 2009-02-09 19:21:18 EST
I agree with your assertion that it is not important to make those deep call paths from the ethtool code.

I believe they fixed these issues in newer bonding by never calling ethtool (IOCTL) calls from interrupt context (softirq) which seems like it was always a bad idea once it is explained that all ethtool calls were designed with the assumption that they were being called from *user* context, not interrupt context.
Comment 13 Andy Gospodarek 2009-02-09 19:47:34 EST
Jesse, that was one of the benefits of the move from timers to workqueues with bonding.  Those changes are in RHEL5, but I don't think we'll ever see them in RHEL4.

I'll whip up a patch and post it here soon.
Comment 14 Andy Gospodarek 2009-02-09 20:04:41 EST
Created attachment 331385 [details]
e1000-ethtool-fix.patch

First pass at a patch to resolve this.
Comment 15 Issue Tracker 2009-02-10 15:09:31 EST
Hi,

Engineering has a patch moving the offending function to another context.
Could you give a try and report back your results?

thanks,
Flavio

Internal Status set to 'Waiting on Support'

This event sent from IssueTracker by fleitner 
 issue 249481
Comment 16 Andy Gospodarek 2009-02-12 07:34:51 EST
My test kernels have been updated to include a patch for this bugzilla.

http://people.redhat.com/agospoda/#rhel4

Please test them and report back your results.  Without immediate
feedback there is a good chance this or any other fix for this driver
will not be included in the upcoming update.
Comment 17 Issue Tracker 2009-02-14 07:51:42 EST
From FJ:
---
I checked the same test on kernel-smp-2.6.9-81.EL.gtest.59 again.

And, it worked just fine.
Thank you for your fix.
---

Internal Status set to 'Waiting on Engineering'

This event sent from IssueTracker by moshiro@redhat.com 
 issue 231780
Comment 19 RHEL Product and Program Management 2009-02-20 14:43:34 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 21 Vivek Goyal 2009-02-26 11:47:20 EST
Committed in 82.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/
Comment 29 errata-xmlrpc 2009-05-18 15:05:00 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html

Note You need to log in before you can comment on or make changes to this bug.