Bug 1313586 - Host unreachable if vdsm-network fails to set up
Summary: Host unreachable if vdsm-network fails to set up
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: vdsm
Classification: oVirt
Component: Services
Version: 4.18.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.0.0-beta
Target Release: ---
Assignee: Edward Haas
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-03-02 00:43 UTC by Badalyan Vyacheslav
Modified: 2016-06-21 20:44 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-22 13:29:19 UTC
oVirt Team: Network
Embargoed:
ykaul: ovirt-4.0.0?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?



Description Badalyan Vyacheslav 2016-03-02 00:43:36 UTC
Description of problem:

If a host hits an error during network setup and cannot finish the job, the host becomes inaccessible.
Possible errors:
1. The Ethernet driver does not accept MTU 9000.
2. The host already has oVirt interfaces and a VLAN bridge attached to them. VDSM adds a new bridge and we get a bad loop in the provider.
3. After the network change, the host is unreachable because of a routing or iptables issue.

Expected results:

Roll back the configuration to the last working one and send an error with details to the Dashboard.
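
The expected behaviour described above can be sketched as an apply, verify, and roll back sequence. This is only a minimal illustration, not vdsm's code: `setup_with_rollback`, `SetupError`, `apply_config` and `check_connectivity` are hypothetical names, and the configuration is modelled as a plain dictionary.

```python
import copy


class SetupError(Exception):
    """Raised when a network change fails and the old config was restored."""


def setup_with_rollback(current, desired, apply_config, check_connectivity):
    """Apply `desired`; restore `current` if applying or the check fails.

    All names here are hypothetical; this is not vdsm's API.
    """
    backup = copy.deepcopy(current)
    try:
        apply_config(desired)
        if not check_connectivity():
            raise RuntimeError("management connectivity lost after change")
    except Exception as exc:
        # Roll back to the last working configuration, then report details.
        apply_config(backup)
        raise SetupError("setup failed, rolled back: %s" % exc) from exc
    return desired
```

The caller would catch `SetupError` and forward its message to whatever reports errors to the user, so a rejected MTU or a lost connection leaves the host on its previous, reachable configuration.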

Comment 1 Badalyan Vyacheslav 2016-03-02 01:04:40 UTC
My configuration

2-4 ETH -> Bonding LACP -> VLANs (3-5 VLANs, with MTU 1500 and 900. Management is in a VLAN with MTU 9000)

Comment 2 Badalyan Vyacheslav 2016-03-02 01:51:35 UTC
Also here

MainProcess|jsonrpc.Executor/5::ERROR::2016-02-17 23:26:59,345::supervdsmServer::118::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer", line 116, in wrapper
    res = func(*args, **kwargs)
  File "/usr/share/vdsm/supervdsmServer", line 241, in setupNetworks
    return setupNetworks(networks, bondings, **options)
  File "/usr/share/vdsm/network/api.py", line 939, in setupNetworks
    logger, _netinfo)
  File "/usr/share/vdsm/network/api.py", line 768, in _add_missing_networks
    implicitBonding=True, _netinfo=_netinfo, **d)
  File "/usr/share/vdsm/network/api.py", line 222, in wrapped
    ret = func(**attrs)
  File "/usr/share/vdsm/network/api.py", line 314, in _addNetwork
    _netinfo=_netinfo, configurator=configurator, opts=options)
  File "/usr/share/vdsm/network/api.py", line 138, in _objectivizeNetwork
    nics, mtu, _netinfo, implicitBonding)
  File "/usr/share/vdsm/network/models.py", line 301, in objectivize
    destroyOnMasterRemoval=destroyOnMasterRemoval)
  File "/usr/share/vdsm/network/models.py", line 209, in __init__
    self.validateOptions(options)
  File "/usr/share/vdsm/network/models.py", line 341, in validateOptions
    'valid bonding option' % key)
ConfigNetworkError: (25, "'miinmon' is not a valid bonding option")
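
The traceback above ends in a validation step that rejects the mistyped option name `miinmon` (the Linux bonding option is `miimon`). A minimal sketch of that kind of early check follows. It is an assumption-laden illustration, not the code in `network/models.py`: `validate_bonding_options` and `KNOWN_BONDING_OPTIONS` are hypothetical names, and the option set is a small illustrative subset of the bonding driver's options.

```python
# Illustrative subset of Linux bonding driver option names (not exhaustive).
KNOWN_BONDING_OPTIONS = frozenset([
    "mode", "miimon", "lacp_rate", "xmit_hash_policy",
    "updelay", "downdelay", "primary",
])


def validate_bonding_options(options):
    """Parse 'key=value key=value' and reject unknown keys.

    Returns the parsed options as a dict; raises ValueError otherwise.
    Hypothetical helper, not vdsm's implementation.
    """
    parsed = {}
    for pair in options.split():
        key, sep, value = pair.partition("=")
        if not sep or not value:
            raise ValueError("%r is not a valid bonding option pair" % pair)
        if key not in KNOWN_BONDING_OPTIONS:
            raise ValueError("%r is not a valid bonding option" % key)
        parsed[key] = value
    return parsed
```

Running such a check before anything is written to the host means a typo is reported to the caller while the existing network configuration is still untouched.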

Comment 3 Dan Kenigsberg 2016-03-02 09:37:01 UTC
Please include the complete supervdsm.log, as a rollback should have taken place, and specify the precise vdsm version and release (4.18.0 has not yet been released).

Comment 4 Badalyan Vyacheslav 2016-03-13 16:53:58 UTC
I recreated all the configurations and got it working.

1. If I have 2 networks with gateways, vdsm creates bad routing tables. If the host does forwarding and a packet is received from NETWORK1, ip route looks at routing table 1 and does not see the second network. The routing tables must know about ALL networks.

2. If I save a network, I see many PING HOSTS entries in the vdsm logs and the network is not saved. In the Engine, the Setup Networks task runs forever. Only rebooting the engine helps. If I do not check "test connection to engine", everything completes normally!

3. If the default GW or some cluster host is unreachable from the engine host, the Engine keeps checking forever. But the GW is an HA VM on one of the hosts. I have to emulate the GW in the network so that the engine starts and runs the VM!

Point 3 is VERY bad! The (hosted) Engine MUST be able to start in degraded mode without all storage domains online! It must check only the storage domain needed to start the ENGINE. I cannot access the Engine web UI if one of the hosts with storage is offline. This causes a lot of trouble when starting the datacenter after a mass power-down!
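
To make point 1 concrete, here is a rough sketch of per-network routing tables in which every table lists the subnets of all networks, which is what the point asks for. It only builds route strings and does not configure anything. `build_route_tables`, the table numbering, and the device names and addresses in the usage example are all hypothetical and do not describe what vdsm generates.

```python
import ipaddress


def build_route_tables(networks):
    """Return {table_id: [route, ...]} with every subnet in every table.

    `networks` maps a name to (device, address/prefix, gateway).
    Hypothetical helper; it only builds text, it does not touch the host.
    """
    # Link routes for all networks, shared by every table.
    links = []
    for name in sorted(networks):
        dev, cidr, _gateway = networks[name]
        subnet = ipaddress.ip_interface(cidr).network
        links.append("%s dev %s" % (subnet, dev))

    # One table per network: all link routes plus that network's default.
    tables = {}
    for table_id, name in enumerate(sorted(networks), start=1):
        dev, _cidr, gateway = networks[name]
        tables[table_id] = links + ["default via %s dev %s" % (gateway, dev)]
    return tables
```

For example, with the made-up networks `{"net1": ("bond0.10", "10.0.10.5/24", "10.0.10.1"), "net2": ("bond0.20", "10.0.20.5/24", "10.0.20.1")}`, both tables contain the routes for `10.0.10.0/24` and `10.0.20.0/24`, and each has its own default gateway.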

Comment 6 Dan Kenigsberg 2016-03-14 08:09:45 UTC
Badalyan, this bug was opened about Vdsm's failure to handle the mistyped "miinmon" at boot time. Vdsm should have rejected this bond option in the first place, hence my requests in comment 3. Please provide the information that was requested there.


Regarding your three new points: I must admit that I do not understand the worries expressed there, but they seem unrelated to this bug. Would you open a fresh bug for each, and remember to include package versions and all relevant logs?

Comment 7 Sandro Bonazzola 2016-05-02 10:05:16 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.

Comment 8 Dan Kenigsberg 2016-05-22 13:29:19 UTC
Please reopen when the information rerequested in comment 6 is available.

Comment 9 Badalyan Vyacheslav 2016-06-21 20:44:20 UTC
Fixed in 3.6.6

