
Bug 1064467

Summary: detaching a network that shares ovirtmgmt's gateway fails with 'connectivity check failed'
Product: [oVirt] vdsm
Reporter: Elad <ebenahar>
Component: General
Assignee: Edward Haas <edwardh>
Status: CLOSED NOTABUG
QA Contact: Michael Burman <mburman>
Severity: high
Docs Contact:
Priority: unspecified
Version: ---
CC: acanan, bazulay, bugs, danken, ebenahar, gklein, mburman, mgoldboi, myakove, rbalakri, srevivo, ylavi
Target Milestone: ovirt-4.1.0-alpha
Flags: danken: ovirt-4.1?
       rule-engine: planning_ack?
       rule-engine: devel_ack+
       rule-engine: testing_ack?
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-25 08:27:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description            Flags
logs and screenshot    none
supervdsm.log          none
Logs                   none

Description Elad 2014-02-12 16:28:48 UTC
Created attachment 862416
logs and screenshot

Description of problem:
I tried to detach a network from a host that was in maintenance. It failed with a ConfigNetworkError in vdsm.log.

Version-Release number of selected component (if applicable):
vdsm-4.14.1-3.el6.x86_64

How reproducible:
Always 

Steps to Reproduce:
1. create a new network under DC
2. put host in maintenance
3. attach the new network to the host
4. detach the network from the host 

Actual results:
Detaching the network fails with this message in vdsm.log:

Thread-41199::ERROR::2014-02-12 18:12:37,917::API::1409::vds::(_rollback) connectivity check failed
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1407, in _rollback
    yield rollbackCtx
  File "/usr/share/vdsm/API.py", line 1294, in setupNetworks
    supervdsm.getProxy().setupNetworks(networks, bondings, options)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
ConfigNetworkError: (10, 'connectivity check failed')


On engine.log:

2014-02-12 18:02:38,476 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SetupNetworksVDSCommand] (ajp--127.0.0.1-8702-2) [1fc476d1] Command SetupNetworksVDSCommand(HostName = green-vdsb, HostId = 6c3c4675-4294-44ca-ab26-522d16ca5b69, force=false, checkConnectivity=true, conectivityTimeout=120,
        networks=[],
        bonds=[],
        interfaces=[eth0 {id=a7cd0400-1e40-4e1c-9406-58ffa56102b0, vdsId=6c3c4675-4294-44ca-ab26-522d16ca5b69, name=eth0, macAddress=00:14:5e:17:cf:d0, networkName=ovirtmgmt, bondName=null, bootProtocol=DHCP, address=10.35.102.11, subnet=255.255.255.0, gateway=10.35.102.254, mtu=1500, bridged=true, speed=1000, type=2, networkImplementationDetails={inSync=true, managed=true}},
                eth1 {id=fc9204ed-dfdc-4bbe-b331-b066a039b78a, vdsId=6c3c4675-4294-44ca-ab26-522d16ca5b69, name=eth1, macAddress=00:14:5e:17:cf:d2, networkName=null, bondName=null, bootProtocol=null, address=null, subnet=null, gateway=null, mtu=1500, bridged=true, speed=1000, type=0, networkImplementationDetails=null}],
        removedNetworks=[ISCSI_network1],
        removedBonds=[]) execution failed. Exception: RuntimeException: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException

Expected results:
Detaching a network should succeed

Additional info: engine.log, vdsm.log, and a screenshot are attached.

Comment 1 Dan Kenigsberg 2014-02-12 18:11:00 UTC
The interesting bits are hiding in supervdsm.log. Could you attach it?

The removed network shares the same subnet and gateway as ovirtmgmt. I guess the bug relates to this.

 'ISCSI_network1': {'addr': '10.35.102.72',
                    'bridged': True,
                    'cfg': {'BOOTPROTO': 'dhcp',
                            'DELAY': '0',
                            'DEVICE': 'ISCSI_network1',
                            'NM_CONTROLLED': 'no',
                            'ONBOOT': 'yes',
                            'STP': 'no',
                            'TYPE': 'Bridge'},
                    'gateway': '10.35.102.254',
                    'iface': 'ISCSI_network1',
                    'ipv6addrs': ['fe80::214:5eff:fe17:cfd2/64'],
                    'ipv6gateway': '::',
                    'mtu': '1500',
                    'netmask': '255.255.255.0',
                    'ports': ['eth1'],
                    'stp': 'off'},
 'ovirtmgmt': {'addr': '10.35.102.11',
               'bridged': True,
               'cfg': {'BOOTPROTO': 'dhcp',
                       'DEFROUTE': 'yes',
                       'DELAY': '0',
                       'DEVICE': 'ovirtmgmt',
                       'NM_CONTROLLED': 'no',
                       'ONBOOT': 'yes',
                       'STP': 'no',
                       'TYPE': 'Bridge'},
               'gateway': '10.35.102.254',
               'iface': 'ovirtmgmt',
               'ipv6addrs': ['fe80::214:5eff:fe17:cfd0/64'],
               'ipv6gateway': '::',
               'mtu': '1500',
               'netmask': '255.255.255.0',
               'ports': ['eth0', 'vnet0'],
               'stp': 'off'}}
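
A quick way to confirm the overlap from a dump like the one above. This is only a sketch: the dictionary is a trimmed copy of the two entries quoted in this comment, and group_by_subnet_and_gateway is a hypothetical helper, not a VDSM function.

```python
import ipaddress
from collections import defaultdict

# Trimmed copy of the two entries quoted above; only the fields used here.
networks = {
    "ISCSI_network1": {"addr": "10.35.102.72",
                       "netmask": "255.255.255.0",
                       "gateway": "10.35.102.254"},
    "ovirtmgmt": {"addr": "10.35.102.11",
                  "netmask": "255.255.255.0",
                  "gateway": "10.35.102.254"},
}


def group_by_subnet_and_gateway(nets):
    """Map (subnet, gateway) to the sorted list of network names using it."""
    groups = defaultdict(list)
    for name, attrs in nets.items():
        # strict=False accepts a host address and masks it down to the subnet.
        subnet = ipaddress.ip_network(
            "{}/{}".format(attrs["addr"], attrs["netmask"]), strict=False)
        groups[(str(subnet), attrs["gateway"])].append(name)
    return {key: sorted(names) for key, names in groups.items()}


# Keep only the (subnet, gateway) pairs used by more than one network.
shared = {key: names
          for key, names in group_by_subnet_and_gateway(networks).items()
          if len(names) > 1}
print(shared)
```

Both networks land in 10.35.102.0/24 behind 10.35.102.254, which matches the guess above.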

Comment 2 Elad 2014-02-13 07:41:39 UTC
Created attachment 862651
supervdsm.log

Comment 3 Dan Kenigsberg 2014-02-13 18:49:12 UTC
Elad, could you repeat the test with two networks that do not share the same subnet and gateway?

Comment 4 Meni Yakove 2014-07-27 06:42:04 UTC
Can reproduce with networks that do not share the same subnet and gateway.

Comment 5 Dan Kenigsberg 2014-07-28 08:27:36 UTC
(In reply to Meni Yakove from comment #4)
> Can reproduce with networks that do not share the same subnet and gateway.

Please explain the reproduction: clearly not EVERY removal of a network breaks another's connectivity.

Comment 6 Antoni Segura Puimedon 2014-08-08 21:57:23 UTC
If the network was set up after the ovirtmgmt network and defaultRoute=False was not passed for it, the following might have happened:

configure ovirtmgmt
    ovirtmgmt sets a default gateway
configure 2nd-net
    2nd-net overrides the default gateway

remove 2nd-net
    2nd-net removes its gateway so there's no default gateway on the system.


If we don't already do so (I have to check), we should treat all the nets where defaultRoute is not specified as if they had defaultRoute=False.
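
The hypothesised sequence can be replayed with a toy model. Everything here is an assumption made for illustration: the Host class is hypothetical, it tracks only a single system-wide default gateway, and it is not how VDSM stores or applies routes.

```python
class Host:
    """Toy model: one system-wide default gateway and nothing else."""

    def __init__(self):
        self.default_gw = None  # (network name, gateway) or None

    def add_network(self, name, gateway, default_route=False):
        # A network configured with default_route=True takes over the default.
        if default_route:
            self.default_gw = (name, gateway)

    def remove_network(self, name):
        # Removing the network that owns the default gateway clears it.
        if self.default_gw is not None and self.default_gw[0] == name:
            self.default_gw = None


def replay(second_net_default_route):
    """Configure ovirtmgmt, then 2nd-net, then remove 2nd-net."""
    host = Host()
    host.add_network("ovirtmgmt", "10.35.102.254", default_route=True)
    host.add_network("2nd-net", "10.35.102.254",
                     default_route=second_net_default_route)
    host.remove_network("2nd-net")
    return host.default_gw


print(replay(True))   # None
print(replay(False))  # ('ovirtmgmt', '10.35.102.254')
```

With the second network treated as defaultRoute=False, the ovirtmgmt gateway survives the removal in this model.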

....
Okay, I just checked. That is not the case; the defaultRoute values are correctly set. What happens, though, is that both networks have the same subnet, which makes the ip rule for source routing of 2nd-net override that of ovirtmgmt. When removing 2nd-net, we remove the table, even though the ovirtmgmt rule still points there. So that's the bug ;-) There should be refcounting for tables, I guess.
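
A minimal sketch of what refcounting for tables could look like. This is a toy in-memory model under stated assumptions, not VDSM code: the RoutingTables class and the table name "table-2nd-net" are hypothetical, and no real ip rule or ip route commands are issued.

```python
from collections import Counter


class RoutingTables:
    """Toy model of source-routing tables with reference counts."""

    def __init__(self):
        self.refs = Counter()  # table id -> number of rules pointing at it
        self.rules = {}        # network name -> table id its rule points at

    def add_rule(self, network, table):
        self.rules[network] = table
        self.refs[table] += 1

    def remove_rule(self, network):
        """Drop the rule; return True only if the table itself was dropped."""
        table = self.rules.pop(network)
        self.refs[table] -= 1
        if self.refs[table] == 0:
            del self.refs[table]  # last user gone, safe to drop the table
            return True
        return False              # another rule still points here, keep it


tables = RoutingTables()
tables.add_rule("2nd-net", "table-2nd-net")
# Same subnet: the ovirtmgmt rule ends up pointing at the same table.
tables.add_rule("ovirtmgmt", "table-2nd-net")
print(tables.remove_rule("2nd-net"))    # False
print(tables.remove_rule("ovirtmgmt"))  # True
```

In this model, removing 2nd-net leaves the table in place because the ovirtmgmt rule still references it.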

Comment 7 Dan Kenigsberg 2014-08-26 17:15:25 UTC
Meni, could you explain comment 4? How do you reproduce this bug?

Comment 8 Michael Burman 2014-08-27 06:59:05 UTC
Hi Dan.

About comment 4 of Meni,

I can't reproduce this bug on - 
rhevm-3.4.2-0.2.el6ev.noarch
vdsm-4.14.13-2.el6ev.x86_64

And not on - 
ovirt-engine-3.5.0-0.0.master.20140821064931.gitb794d66.el6.noarch
vdsm-4.16.2-1.gite8cba75.el6.x86_64

But I can tell you that while trying to reproduce this bug I lost connectivity on both systems. On one of them, the host (3.4.2) enrolled back; the second host (3.5 rc_1.1) did not enroll back and stayed in the non-operational state, and I have to say that this is very bad. We need to investigate this.

Comment 9 Michael Burman 2014-08-27 08:52:07 UTC
Dan,

Attaching the relevant logs from my host. This is the host that lost connectivity, did not enroll back, and stayed in the non-operational state.
All of that happened while trying to reproduce this bug.

Best regards,

Michael

Comment 10 Michael Burman 2014-08-27 08:52:51 UTC
Created attachment 931343
Logs

Comment 11 Martin Pavlik 2014-08-27 09:09:53 UTC
Cannot reproduce with
ovirt-engine-3.5.0-0.0.master.20140821064931.gitb794d66.el6.noarch
vdsm-4.16.2-1.gite8cba75.el6.x86_64

and different subnets.

Comment 12 Sandro Bonazzola 2015-09-04 08:59:46 UTC
This is an automated message.
This Bugzilla report has been opened on a version which is not maintained anymore.
Please check if this bug is still relevant in oVirt 3.5.4.
If it's not relevant anymore, please close it (you may use EOL or CURRENT RELEASE resolution)
If it's an RFE please update the version to 4.0 if still relevant.

Comment 13 Michael Burman 2015-09-08 14:25:37 UTC
This bug is still relevant on 3.5.4.2-1.3.el6ev.
At least I believe it's the same bug.


engine log -->
2015-09-08 16:59:58,414 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (org.ovirt.thread.pool-7-thread-13) Exception during connection
2015-09-08 16:59:58,414 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (ajp-/127.0.0.1:8702-18) [104809e9] java.util.concurrent.ExecutionException: org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcRunTimeException: Connection issues during send request
2015-09-08 16:59:58,416 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (ajp-/127.0.0.1:8702-18) [104809e9] Command PollVDSCommand(HostName = orchid-vds2.qa.lab.tlv.redhat.com, HostId = 768978c0-b3f2-41bd-a9a0-41c2d697ce71) execution failed. Exception: RuntimeException: java.util.concurrent.ExecutionException: org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcRunTimeException: Connection issues during send request

Can't remove/detach the network; the setup networks command hangs forever without any timeout. No error message is displayed in the end, because there is no timeout.
Reproducible only when the second network has the same subnet and gateway as the management network.
Failed to refresh the capabilities of host

Comment 15 Yaniv Kaul 2016-03-10 10:41:18 UTC
Any news on this issue? Are we going to handle it in 4.0? With no progress for 2 years and no customer tickets, I'd CLOSE-WONTFIX.

Comment 16 Dan Kenigsberg 2016-03-13 08:22:26 UTC
Yes, we're slow. But this is annoying, and should be fixed in 4.y in my opinion.

Comment 17 Edward Haas 2016-05-29 14:14:36 UTC
Recent work on default gateway detection and its proper setting has revealed the same problem.

Setting up two networks with the same subnet, while one has the defaultRoute flag set, will cause the default route to appear on both networks.

VDSM had better not support such a network setup at all, and should fail it on validation.
With newer default gateway handling, I would not expect to see the management network lost, but that requires additional testing.
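
A rough idea of what failing on validation could look like. This is only a sketch under assumptions: validate_no_overlapping_subnets and OverlappingSubnetsError are hypothetical names, not part of VDSM, and the input format (network name mapped to "address/netmask") is invented for the example.

```python
import ipaddress
from itertools import combinations


class OverlappingSubnetsError(ValueError):
    """Hypothetical error type for this sketch; not a VDSM exception."""


def validate_no_overlapping_subnets(nets):
    """nets maps network name -> 'address/netmask'; raise on the first overlap."""
    parsed = {name: ipaddress.ip_network(cidr, strict=False)
              for name, cidr in nets.items()}
    for (name_a, net_a), (name_b, net_b) in combinations(sorted(parsed.items()), 2):
        # Only compare subnets of the same IP version.
        if net_a.version == net_b.version and net_a.overlaps(net_b):
            raise OverlappingSubnetsError(
                "{} ({}) overlaps {} ({})".format(name_a, net_a, name_b, net_b))


# Addresses taken from the networks quoted earlier in this bug.
requested = {
    "ovirtmgmt": "10.35.102.11/255.255.255.0",
    "ISCSI_network1": "10.35.102.72/255.255.255.0",
}
try:
    validate_no_overlapping_subnets(requested)
except OverlappingSubnetsError as err:
    print("rejected:", err)
```

Using overlaps() rather than plain equality also rejects a subnet nested inside another one with a different prefix length.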

Comment 18 Edward Haas 2016-07-24 14:51:42 UTC
Could you please try and check if this has been resolved in 4.0?
(Patch linked to this bug)

Comment 19 Michael Burman 2016-07-25 07:30:25 UTC
I actually can't test it. 
I can't attach a network that shares the same subnet and gateway as the management network to the host, because I'm losing connectivity to the host.

Error while executing action HostSetupNetworks: Network error during communication with the Host

So I'm not sure how I can check that.

Comment 20 Edward Haas 2016-07-25 08:09:41 UTC
In this case, I would suggest closing this ticket and documenting that we do not currently support the creation of host networks with overlapping subnets.

Comment 21 Edward Haas 2016-07-25 08:27:21 UTC
We do not currently support the creation of host networks with overlapping subnets.