Bug 241719
| Summary: | bonding causes kernel issue. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Jeremiah Johnson <jeremiah.johnson> | 
| Component: | kernel | Assignee: | Andy Gospodarek <agospoda> | 
| Status: | CLOSED DUPLICATE | QA Contact: | Martin Jenner <mjenner> | 
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 5.0 | CC: | adrian, k.georgiou, peterm, tjb, ville.skytta | 
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2008-01-30 18:20:08 UTC | Type: | --- | 
| Regression: | --- | Mount Type: | --- | 
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| 
Description  Jeremiah Johnson  2007-05-29 19:38:05 UTC

This is related to bug 210577 and fixes planned for that one should resolve this issue.

Same on Fedora 7 with e1000 on x86_64.

(In reply to comment #2)
> Same on Fedora 7 with e1000 on x86_64.

Not surprising since the fix isn't upstream yet. Also, there are test kernels that contain a patch that should resolve this issue. You can get them here: http://people.redhat.com/agospoda/#rhel5 Any testing you can do would be appreciated!

gospo, I don't see any test kernels available. If one becomes available today I will test it. I updated to 2.6.18-8.1.6.el5 on a server that did not yet have bonding configured, but has the exact same hardware. We're still seeing the error, so I don't think this problem was resolved in the new kernel update.

Jeremiah, you can download the test kernels here: http://people.redhat.com/agospoda/#rhel5 They are ones I've built for testing fixes and are not officially supported.

Hrm, I went to that link earlier and nothing was displaying. Maybe a browser issue. Anyway, with 2.6.18-22.el5.gtest.18 on my server I don't see any traces when I restart networking with bonding enabled.

Glad to hear that someone else gets the same results that I get. :-) Feel free to put that test kernel through any test cycles you like since I'd like to make sure others agree with me that the bugs are worked out.

We will continue to run your kernel until our testing phase for this system has finished. If we run into any other problems we will let you know.

What are the effects of this issue? It still seems to be present in the latest EL5 kernels, but despite the assertion failure messages, bonding-alb appears to work in some quick tests (tested with two tg3 interfaces).

We hit similar problems with OVZ kernels (based on RHEL5.1):
The call trace is:
bond_mii_monitor
  write_lock(&bond->curr_slave_lock);
  bond_select_active_slave
    bond_change_active_slave
      bond_alb_handle_active_change
        alb_set_slave_mac_addr
           dev_set_mac_address
dev_set_mac_address will (sooner or later) call the device notifier chain, which
should be run under rtnl_lock exclusively, without any other locks held.
The mainline kernel has fixed this problem with a complete locking rewrite.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;f=drivers/net/bonding/bond_main.c;h=49a198206e3de901a74f34fc40694bb056ad922c;hb=HEAD
See the patches from Jay Vosburgh, 2007-10-24.
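
The locking rule described above can be illustrated with a small single-threaded userspace model. This is only a sketch under stated assumptions: the locks are plain flags rather than real kernel locks, every `model_` and `monitor_` name is hypothetical, and the "deferred" variant shows one generic way to satisfy the rule (decide under the lock, act after dropping it). It is not the actual rewrite from the mainline patches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel lock state; plain flags, not real locks. */
static bool rtnl_held;            /* models rtnl_lock being held */
static bool curr_slave_lock_held; /* models write_lock(&bond->curr_slave_lock) */

static int notifier_calls;   /* how often the modelled notifier chain ran */
static int rule_violations;  /* how often it ran in the wrong lock context */

/* Models dev_set_mac_address(): it ends up running the device notifier
 * chain, which expects rtnl to be held and no other lock to be held. */
static void model_set_mac_address(void)
{
    if (!rtnl_held || curr_slave_lock_held)
        rule_violations++;  /* a real kernel would log an assertion failure */
    notifier_calls++;
}

/* Shape of the reported call trace: the notifier chain is reached while
 * curr_slave_lock is write-held and rtnl is not held. */
void monitor_buggy(void)
{
    curr_slave_lock_held = true;
    model_set_mac_address();
    curr_slave_lock_held = false;
}

/* One possible restructuring (an assumption, not the mainline fix): record
 * the decision under the lock, drop it, then act under rtnl only. */
void monitor_deferred(void)
{
    bool need_mac_change;

    curr_slave_lock_held = true;
    need_mac_change = true;  /* pretend a new active slave was selected */
    curr_slave_lock_held = false;

    if (need_mac_change) {
        rtnl_held = true;
        model_set_mac_address();
        rtnl_held = false;
    }
}
```

Calling `monitor_buggy()` increments `rule_violations`, while `monitor_deferred()` runs the modelled notifier without violating the rule.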
Kirill, this is fixed already in the latest rhel5 dev kernels. You can also check my test kernels: http://people.redhat.com/agospoda/#rhel5 for some bonding updates that I hope to get included in 5.2. Any feedback you can provide is helpful.

*** This bug has been marked as a duplicate of 251902 *** |