Bug 1064467
| Summary: | Detaching a network that shares ovirtmgmt's gateway fails with 'connectivity check failed' | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Elad <ebenahar> | ||||||||
| Component: | General | Assignee: | Edward Haas <edwardh> | ||||||||
| Status: | CLOSED NOTABUG | QA Contact: | Michael Burman <mburman> | ||||||||
| Severity: | high | Docs Contact: | |||||||||
| Priority: | unspecified | ||||||||||
| Version: | --- | CC: | acanan, bazulay, bugs, danken, ebenahar, gklein, mburman, mgoldboi, myakove, rbalakri, srevivo, ylavi | ||||||||
| Target Milestone: | ovirt-4.1.0-alpha | Flags: | danken: ovirt-4.1? rule-engine: planning_ack? rule-engine: devel_ack+ rule-engine: testing_ack? | ||||||||
| Target Release: | --- | ||||||||||
| Hardware: | x86_64 | ||||||||||
| OS: | Unspecified | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2016-07-25 08:27:21 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | Network | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Attachments: | ||||||||||
The interesting bits are hiding in supervdsm.log. Could you attach it?
The removed network shares the same subnet and gateway as ovirtmgmt. I guess the bug relates to this.
'ISCSI_network1': {'addr': '10.35.102.72',
'bridged': True,
'cfg': {'BOOTPROTO': 'dhcp',
'DELAY': '0',
'DEVICE': 'ISCSI_network1',
'NM_CONTROLLED': 'no',
'ONBOOT': 'yes',
'STP': 'no',
'TYPE': 'Bridge'},
'gateway': '10.35.102.254',
'iface': 'ISCSI_network1',
'ipv6addrs': ['fe80::214:5eff:fe17:cfd2/64'],
'ipv6gateway': '::',
'mtu': '1500',
'netmask': '255.255.255.0',
'ports': ['eth1'],
'stp': 'off'},
'ovirtmgmt': {'addr': '10.35.102.11',
'bridged': True,
'cfg': {'BOOTPROTO': 'dhcp',
'DEFROUTE': 'yes',
'DELAY': '0',
'DEVICE': 'ovirtmgmt',
'NM_CONTROLLED': 'no',
'ONBOOT': 'yes',
'STP': 'no',
'TYPE': 'Bridge'},
'gateway': '10.35.102.254',
'iface': 'ovirtmgmt',
'ipv6addrs': ['fe80::214:5eff:fe17:cfd0/64'],
'ipv6gateway': '::',
'mtu': '1500',
'netmask': '255.255.255.0',
'ports': ['eth0', 'vnet0'],
'stp': 'off'}}
Created attachment 862651 [details]
supervdsm.log
Elad, could you repeat the test with two networks that do not share the same subnet and gateway?

Can reproduce with networks that do not share the same subnet and gateway.

(In reply to Meni Yakove from comment #4)
> Can reproduce with networks that do not share the same subnet and gateway.

Please explain the reproduction: clearly not EVERY removal of a network breaks another's connectivity. If the network was configured after the ovirtmgmt network and defaultRoute=False was not passed for it, the following may have happened:
configure ovirtmgmt
ovirtmgmt sets a default gateway
configure 2nd-net
2nd-net overrides the default gateway
remove 2nd-net
2nd-net removes its gateway so there's no default gateway on the system.
If we don't already do so (I have to check), we should treat all networks where defaultRoute is not specified as if they had defaultRoute=False.
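The defaulting suggested above can be sketched as follows. This is a minimal illustration, assuming a networks dict shaped like the getCapabilities dump earlier in this report; the helper name is hypothetical and this is not vdsm's actual code:

```python
def normalize_default_route(networks):
    """Treat networks that do not specify defaultRoute as defaultRoute=False,
    so only an explicitly flagged network may install a default gateway.
    (Hypothetical helper; sketch only, not vdsm's actual API.)"""
    for attrs in networks.values():
        # setdefault leaves an explicit True/False untouched and fills
        # in False only where the caller omitted the flag entirely.
        attrs.setdefault('defaultRoute', False)
    return networks
```

With this in place, a second network set up without the flag would never override the gateway that ovirtmgmt installed.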
....
Okay, I just checked. That is not the case; the defaultRoute flags are correctly set. What happens, though, is that both networks have the same subnet, which makes the ip rule for the sourceRouting of 2nd-net override that of ovirtmgmt. When removing 2nd-net, we remove its routing table, even though the ovirtmgmt rule still points there. So that's the bug ;-) There should be reference counting for tables, I guess.
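The fix direction suggested here, reference counting the per-network routing tables, can be sketched like this. Names are hypothetical and this is only an illustration of the idea, not vdsm's actual implementation:

```python
from collections import defaultdict

class RouteTableRefcount:
    """Sketch: reference-count routing tables so a table shared by several
    source-routing ip rules is only flushed when its last user is removed.
    (Hypothetical class; not vdsm code.)"""

    def __init__(self):
        self._refs = defaultdict(int)

    def acquire(self, table_id):
        # A network whose ip rule points at table_id takes a reference.
        self._refs[table_id] += 1

    def release(self, table_id):
        # Removing a network drops its reference; the table may be flushed
        # only when no other rule (e.g. ovirtmgmt's) still points at it.
        self._refs[table_id] -= 1
        if self._refs[table_id] <= 0:
            del self._refs[table_id]
            return True   # safe to flush the table now
        return False
```

In the scenario from this bug, ovirtmgmt and 2nd-net share a subnet and thus the same table; removing 2nd-net would then return False from release() and leave the table, and ovirtmgmt's rule, intact.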
Meni, could you explain comment 4? How do you reproduce this bug?

Hi Dan. Regarding Meni's comment 4, I can't reproduce this bug on:
- rhevm-3.4.2-0.2.el6ev.noarch with vdsm-4.14.13-2.el6ev.x86_64
- ovirt-engine-3.5.0-0.0.master.20140821064931.gitb794d66.el6.noarch with vdsm-4.16.2-1.gite8cba75.el6.x86_64

However, while trying to reproduce this bug I lost connectivity on both systems. On one of them the host (3.4.2) recovered, but the second host (3.5 rc_1.1) did not recover and stayed in the non-operational state, which is very bad. We need to investigate this.

Dan, attaching relevant logs from my host: the host that lost connectivity, did not recover, and stayed non-operational, all while trying to reproduce this bug.
Best regards, Michael

Created attachment 931343 [details]
Logs
Cannot reproduce with ovirt-engine-3.5.0-0.0.master.20140821064931.gitb794d66.el6.noarch and vdsm-4.16.2-1.gite8cba75.el6.x86_64 and different subnets.

This is an automated message. This Bugzilla report has been opened on a version which is not maintained anymore. Please check if this bug is still relevant in oVirt 3.5.4. If it's not relevant anymore, please close it (you may use the EOL or CURRENT RELEASE resolution). If it's an RFE, please update the version to 4.0 if still relevant.

This bug is still relevant on 3.5.4.2-1.3.el6ev; at least I believe it's the same bug.

engine log -->
2015-09-08 16:59:58,414 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (org.ovirt.thread.pool-7-thread-13) Exception during connection
2015-09-08 16:59:58,414 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (ajp-/127.0.0.1:8702-18) [104809e9] java.util.concurrent.ExecutionException: org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcRunTimeException: Connection issues during send request
2015-09-08 16:59:58,416 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (ajp-/127.0.0.1:8702-18) [104809e9] Command PollVDSCommand(HostName = orchid-vds2.qa.lab.tlv.redhat.com, HostId = 768978c0-b3f2-41bd-a9a0-41c2d697ce71) execution failed. Exception: RuntimeException: java.util.concurrent.ExecutionException: org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcRunTimeException: Connection issues during send request

The network can't be removed/detached: the setup networks command hangs forever, and no error message is displayed in the end because there is no timeout. The engine also fails to refresh the capabilities of the host. Reproducible only when the second network has the same subnet and gateway as the management network.

Any news on this issue? Are we going to handle it in 4.0?

With no progress for 2 years and no customer tickets, I'd CLOSE-WONTFIX. Yes, we're slow. But this is annoying, and should be fixed in 4.y in my opinion.
Recent work on default gateway detection and its proper setting has revealed the same problem. Setting up two networks with the same subnet, while only one has the defaultRoute flag set, causes the default route to appear on both networks. VDSM had better not support such a network setup at all and fail it on validation. With the newer default gateway handling I would not expect the management network to be lost, but that requires additional testing.

Could you please try and check whether this has been resolved in 4.0? (Patch linked to this bug.)

I actually can't test it. I can't attach a network to the host that shares the same subnet and gateway as the management network, because I lose connectivity to the host:
Error while executing action HostSetupNetworks: Network error during communication with the Host
So I'm not sure how I can check that.

In that case, I would suggest closing this ticket and documenting that we do not at the moment support the creation of host networks with overlapping subnets.

We do not support at the moment the creation of host networks with overlapping subnets.
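The validation proposed in the closing comments, failing a setup whose host networks have overlapping subnets, could look roughly like this. This is a sketch with hypothetical names, not vdsm's actual code; the input is assumed to be shaped like the getCapabilities dump earlier in this report:

```python
import ipaddress

def validate_no_overlapping_subnets(networks):
    """Reject a setup where two host networks share (or overlap) a subnet,
    the configuration this bug shows to be unsupported. `networks` maps a
    network name to a dict with 'addr' and 'netmask' strings.
    (Hypothetical helper; sketch only.)"""
    subnets = {}
    for name, attrs in networks.items():
        # ip_network accepts an address/netmask pair; strict=False lets us
        # pass a host address and get back its containing network.
        net = ipaddress.ip_network(
            '%s/%s' % (attrs['addr'], attrs['netmask']), strict=False)
        for other_name, other_net in subnets.items():
            if net.overlaps(other_net):
                raise ValueError(
                    'network %s overlaps the subnet of %s (%s)'
                    % (name, other_name, other_net))
        subnets[name] = net
    return subnets
```

Fed the two networks from the capabilities dump above (10.35.102.72/24 and 10.35.102.11/24), such a check would reject the setup up front instead of letting the removal break ovirtmgmt's source routing later.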
Created attachment 862416 [details]
logs and screenshot

Description of problem:
I tried to detach a network from a host which was in maintenance. It failed with a ConfigNetworkError error message in vdsm.log.

Version-Release number of selected component (if applicable):
vdsm-4.14.1-3.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a new network under the DC
2. Put the host in maintenance
3. Attach the new network to the host
4. Detach the network from the host

Actual results:
Detaching the network fails with this message in vdsm.log:

Thread-41199::ERROR::2014-02-12 18:12:37,917::API::1409::vds::(_rollback) connectivity check failed
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1407, in _rollback
    yield rollbackCtx
  File "/usr/share/vdsm/API.py", line 1294, in setupNetworks
    supervdsm.getProxy().setupNetworks(networks, bondings, options)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
ConfigNetworkError: (10, 'connectivity check failed')

On engine.log:
2014-02-12 18:02:38,476 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SetupNetworksVDSCommand] (ajp--127.0.0.1-8702-2) [1fc476d1] Command SetupNetworksVDSCommand(HostName = green-vdsb, HostId = 6c3c4675-4294-44ca-ab26-522d16ca5b69, force=false, checkConnectivity=true, conectivityTimeout=120, networks=[], bonds=[], interfaces=[eth0 {id=a7cd0400-1e40-4e1c-9406-58ffa56102b0, vdsId=6c3c4675-4294-44ca-ab26-522d16ca5b69, name=eth0, macAddress=00:14:5e:17:cf:d0, networkName=ovirtmgmt, bondName=null, bootProtocol=DHCP, address=10.35.102.11, subnet=255.255.255.0, gateway=10.35.102.254, mtu=1500, bridged=true, speed=1000, type=2, networkImplementationDetails={inSync=true, managed=true}}, eth1 {id=fc9204ed-dfdc-4bbe-b331-b066a039b78a, vdsId=6c3c4675-4294-44ca-ab26-522d16ca5b69, name=eth1, macAddress=00:14:5e:17:cf:d2, networkName=null, bondName=null, bootProtocol=null, address=null, subnet=null, gateway=null, mtu=1500, bridged=true, speed=1000, type=0, networkImplementationDetails=null}], removedNetworks=[ISCSI_network1], removedBonds=[]) execution failed. Exception: RuntimeException: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException

Expected results:
Detaching a network should succeed

Additional info:
engine.log, vdsm.log and screenshot