Bug 1357738
| Summary: | The device's master is unset when downed outside NetworkManager | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Zhengtong <zhengtli> |
| Component: | NetworkManager | Assignee: | Lubomir Rintel <lrintel> |
| Status: | CLOSED ERRATA | QA Contact: | Desktop QE <desktop-qa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.3 | CC: | aloughla, atragler, bgalvani, dcbw, knoel, lrintel, qzhang, rkhan, sukulkar, thaller, thuth, vbenes, virt-maint, weliao, zhengtli |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | NetworkManager-1.4.0-3.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1523572 | Environment: | |
| Last Closed: | 2016-11-03 19:24:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1523572 | | |
Description
Zhengtong
2016-07-19 03:40:41 UTC
Does this bug also occur on x86, or only on ppc64le? Did this bug also occur on RHEL7.2 (host), or is it a regression in RHEL7.3?

It only happens on ppc64le. The network can resume on x86.

Observation: When taking the link down, the tap interface is removed from the bridge:

    # brctl show
    bridge name     bridge id               STP enabled     interfaces
    virbr0          8000.9a616a17e4a0       yes             tap0
    # ip link set tap0 down
    # brctl show
    bridge name     bridge id               STP enabled     interfaces
    virbr0          8000.000000000000       yes

And after enabling the interface again, tap0 is not automatically connected again:

    # ip link set tap0 up
    # brctl show
    bridge name     bridge id               STP enabled     interfaces
    virbr0          8000.000000000000       yes

I can manually connect the tap0 interface to the bridge again:

    # brctl addif virbr0 tap0
    # brctl show
    bridge name     bridge id               STP enabled     interfaces
    virbr0          8000.9a616a17e4a0       yes             tap0

... and after doing so, the network in the guest is working properly again!

Now the question is: Why is there a difference between x86 and ppc64?

I've now checked this on an x86_64 host, too, and I get the very same behavior there as on ppc64le: after setting the link down, the tap interface is removed from the bridge, so the guest network does not come up automatically again when doing the "ip link set tap0 up". I also had to execute "brctl addif virbr0 tap0" manually there to get it working again.

Zhengtong, could you please describe your setup on x86 (where it was working) in more detail? Which host kernel version and qemu version did you use on x86? What do you get when you execute "brctl show" there before/after each step?

The configuration I tested with is:

    Host:  3.10.0-370.el7.x86_64
           qemu-kvm-rhev-2.6.0-17.el7
    Guest: 3.10.0-327.10.1.el7.x86_64

That's interesting. After I set tap0 down, the tap0 device is still attached to the virbr0 bridge.

Steps and result:

Step 1. After the guest boots up, keep pinging virbr0 (192.168.122.1).

Step 2. On the host, set tap0 down.

    [root@dhcp-9-217 ~]# ip link set tap0 down
    [root@dhcp-9-217 ~]# brctl show
    bridge name     bridge id               STP enabled     interfaces
    switch          8000.00151736e1c5       no              enp1s0f1
                                                            enp2s0
    virbr0          8000.525400d3c77f       no              tap0
                                                            virbr0-nic

And the pinging process paused.

Step 3. On the host, set tap0 up again.

    [root@dhcp-9-217 ~]# ip link set tap0 up
    [root@dhcp-9-217 ~]# brctl show
    bridge name     bridge id               STP enabled     interfaces
    switch          8000.00151736e1c5       no              enp1s0f1
                                                            enp2s0
    virbr0          8000.525400d3c77f       no              tap0
                                                            virbr0-nic

And the pinging process resumed.

I didn't use the latest host and guest kernels, but I think that linking back automatically is the natural behaviour.

OK, thanks a lot for the information! ... I think we're on the right track here: Yesterday, when I was seeing the failure on x86, too, I was using the latest snapshot of RHEL 7.3 on the host (kernel 3.10.0-481.el7.x86_64, qemu-kvm-rhev-2.6.0-18). Today, I installed RHEL 7.2 on the x86 host (kernel 3.10.0-327.18.2.el7.x86_64, qemu-kvm-1.5.3-105.el7_2.3.x86_64), and now I get the same behavior as you, i.e. the guest network continues to work after setting up the link again! So it seems like there has been a modification between the two versions that has introduced this different behavior ... I'll do more tests to isolate the exact problem ...

I've now also installed RHEL 7.2 (with kernel 3.10.0-327.18.2.el7.ppc64le) with qemu-kvm-rhev-2.6.0-18.el7.ppc64le on our POWER8 servers, and the guest network continues to work there, too, after setting the link up again. So this is definitely a regression from RHEL 7.2 to the current version of 7.3. Since I used qemu-kvm-rhev-2.6.0 on both RHEL 7.2 and RHEL 7.3 when testing on ppc, I think the problem is likely not in QEMU itself.
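Note: the "brctl show" checks in the comments above can also be scripted through sysfs, which may be handy when repeating the down/up test many times. The following is only a minimal sketch; the function name bridge_ports and its optional second argument (an alternate sysfs root, used for dry runs) are hypothetical and not part of any tool mentioned in this report. It relies on the kernel exposing one entry per bridge port under /sys/class/net/<bridge>/brif/.

```shell
#!/bin/sh
# List the ports currently enslaved to a bridge, one per line.
# $1: bridge name (e.g. virbr0)
# $2: optional sysfs root (hypothetical parameter, defaults to /sys/class/net)
bridge_ports() {
    bridge=$1
    sysfs=${2:-/sys/class/net}
    for port in "$sysfs/$bridge/brif"/*; do
        # An empty brif directory leaves the glob unexpanded; skip it.
        [ -e "$port" ] || [ -L "$port" ] || continue
        basename "$port"
    done
}

# Example (run on the host around the "ip link set tap0 down/up" steps):
#   bridge_ports virbr0
```

If tap0 disappears from this list after "ip link set tap0 down", the host shows the behavior reported here.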
So I've now done an additional test, too: I've installed kernel 3.10.0-481 on the RHEL 7.2 installation and tried again after booting it. However, the guest network then still continues to work after setting the link up again, so the problem is likely also not in the kernel... not sure what else can be the culprit... I'll keep on searching...

As mentioned earlier, the problem can also be reproduced on x86 (I just tried the RHEL-7.3-20160802 snapshot again and reproduced it there), so I'm changing the "Hardware" field to "All".

I think I've now found the component that is causing the problems: NetworkManager. If I disable NetworkManager before the test, tap0 does _not_ get removed from the bridge when setting the interface down, and the guest network continues to work after setting the link up again! So I'm re-assigning this ticket to the NetworkManager component for further investigation.

Quick question, why does the tap device need to be "down"? I'm not saying there is no NM bug here, just curious.

(In reply to Dan Williams from comment #12)
> Quick question, why does the tap device need to be "down"? I'm not saying
> there is no NM bug here, just curious.

I think this is a test for product robustness. It is possible that we want to do some debugging or testing that involves setting the tap device down, but that's not certain.

I can see what is wrong. Working on the fix.

For QE testing:

    ip link add br0 type bridge
    ip addr add 192.0.2.1/24 dev br0
    ip tuntap add tap0 mode tap
    ip addr add 192.0.2.2/24 dev tap0
    ip link set tap0 master br0
    ip link set tap0 up
    ip link set br0 up

    # The device has a master...
    sleep 1
    ip link show tap0
    nmcli c

    nmcli c down tap0

    # ...and now it does not.
    sleep 1
    ip link show tap0
    nmcli c

    ip link del br0
    ip link del tap0

Fixed upstream: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=3127fb0d17bff0b250218c7bf82b4335b5290825

The bridge slave correctly preserves its IP address and master settings when toggled with the "ip link set dev $dev up/down" command.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2581.html
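Note for anyone automating the QE reproducer above: rather than reading the "ip link show tap0" output by eye, the "master" keyword can be extracted from it. This is a rough sketch only; the helper name master_from_ip_link is made up for illustration, and it assumes the one-line output format of "ip -o link show", where an enslaved device carries "master <name>" among its attributes.

```shell
#!/bin/sh
# Read "ip -o link show dev <dev>" output on stdin and print the name
# of the device's master, or "(none)" if no master keyword is present.
master_from_ip_link() {
    awk '
        {
            for (i = 1; i < NF; i++)
                if ($i == "master") { print $(i + 1); found = 1; exit }
        }
        END { if (!found) print "(none)" }
    '
}

# Example, before and after "nmcli c down tap0" in the reproducer:
#   ip -o link show dev tap0 | master_from_ip_link
```

On an affected NetworkManager build, the helper would be expected to report br0 before the "nmcli c down tap0" step and "(none)" after it; on a fixed build, the master should stay in place.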