Bug 1313586 - Host unreachable if vdsm-network fails to set up
Status: CLOSED UPSTREAM
Product: vdsm
Classification: oVirt
Component: Services
4.18.0
x86_64 Linux
unspecified Severity high (vote)
: ovirt-4.0.0-beta
: ---
Assigned To: Edward Haas
Meni Yakove
:
Depends On:
Blocks:
Reported: 2016-03-01 19:43 EST by Badalyan Vyacheslav
Modified: 2016-06-21 16:44 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-22 09:29:19 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
ykaul: ovirt-4.0.0?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments: None
Description Badalyan Vyacheslav 2016-03-01 19:43:36 EST
Description of problem:

If the host hits an error and cannot finish the job, the host becomes inaccessible.
Possible errors include:
1. The Ethernet driver does not accept MTU 9000.
2. The host already has oVirt interfaces and a VLAN bridge on them; VDSM adds a new bridge and we get a bad loop in the provider.
3. After a network change the host is unreachable due to a routing or iptables issue.

Expected results:

Roll back the configuration to the last working one and report the error with details to the Dashboard.
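The expected rollback behaviour can be sketched roughly as below. This is an illustrative toy, not vdsm's real API: the names `apply_config`, `setup_networks`, `NetworkError`, and `report_error` are all hypothetical, and the "driver rejects a too-large MTU" check merely stands in for any setup failure.

```python
# Hypothetical sketch: apply a new network configuration transactionally
# and roll back to the last working one on any failure.

class NetworkError(Exception):
    pass


def apply_config(config, state):
    """Pretend to apply a config; reject an invalid MTU like a driver would."""
    if config.get("mtu", 1500) > state["max_mtu"]:
        raise NetworkError("driver does not accept MTU %d" % config["mtu"])
    state["active"] = config


def setup_networks(new_config, state, report_error):
    last_good = state["active"]          # snapshot before touching anything
    try:
        apply_config(new_config, state)
    except NetworkError as exc:
        apply_config(last_good, state)   # roll back to the last working config
        report_error(str(exc))           # surface details (e.g. to a dashboard)
        return False
    return True
```

The point of the snapshot-then-restore shape is that a failed setup never leaves the host on a half-applied configuration, which is what makes it unreachable in the reported scenarios.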
Comment 1 Badalyan Vyacheslav 2016-03-01 20:04:40 EST
My configuration

2-4 Ethernet interfaces -> LACP bond -> VLANs (3-5 VLANs, MTU 1500 and 9000; management in a VLAN with MTU 9000)
Comment 2 Badalyan Vyacheslav 2016-03-01 20:51:35 EST
Also here

MainProcess|jsonrpc.Executor/5::ERROR::2016-02-17 23:26:59,345::supervdsmServer::118::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer", line 116, in wrapper
    res = func(*args, **kwargs)
  File "/usr/share/vdsm/supervdsmServer", line 241, in setupNetworks
    return setupNetworks(networks, bondings, **options)
  File "/usr/share/vdsm/network/api.py", line 939, in setupNetworks
    logger, _netinfo)
  File "/usr/share/vdsm/network/api.py", line 768, in _add_missing_networks
    implicitBonding=True, _netinfo=_netinfo, **d)
  File "/usr/share/vdsm/network/api.py", line 222, in wrapped
    ret = func(**attrs)
  File "/usr/share/vdsm/network/api.py", line 314, in _addNetwork
    _netinfo=_netinfo, configurator=configurator, opts=options)
  File "/usr/share/vdsm/network/api.py", line 138, in _objectivizeNetwork
    nics, mtu, _netinfo, implicitBonding)
  File "/usr/share/vdsm/network/models.py", line 301, in objectivize
    destroyOnMasterRemoval=destroyOnMasterRemoval)
  File "/usr/share/vdsm/network/models.py", line 209, in __init__
    self.validateOptions(options)
  File "/usr/share/vdsm/network/models.py", line 341, in validateOptions
    'valid bonding option' % key)
ConfigNetworkError: (25, "'miinmon' is not a valid bonding option")
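The failing check in the traceback is a whitelist validation of bonding option names, so the typo "miinmon" (instead of the real bonding option "miimon") is rejected. A minimal sketch of that shape, assuming an illustrative subset of options rather than vdsm's real table:

```python
# Illustrative subset of valid Linux bonding option names; vdsm's real
# list is larger and derived from the kernel bonding driver.
VALID_BONDING_OPTIONS = {"mode", "miimon", "lacp_rate", "xmit_hash_policy"}


class ConfigNetworkError(Exception):
    def __init__(self, errcode, message):
        super(ConfigNetworkError, self).__init__(errcode, message)
        self.errcode = errcode
        self.message = message


def validate_options(options):
    """Reject any bonding option whose name is not in the whitelist."""
    for key in options:
        if key not in VALID_BONDING_OPTIONS:
            raise ConfigNetworkError(
                25, "'%s' is not a valid bonding option" % key)
```

Dan's point in the next comment is that this validation should have fired when the bond was first configured, not later at boot time.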
Comment 3 Dan Kenigsberg 2016-03-02 04:37:01 EST
Please include the complete supervdsm.log, as a rollback should have taken place, and specify the precise vdsm version and release (4.18.0 has not yet been released).
Comment 4 Badalyan Vyacheslav 2016-03-13 12:53:58 EDT
I recreated the whole configuration and got it working.

1. If I have two networks with gateways, vdsm creates the route tables badly. If the host does forwarding and a packet arrives from NETWORK1, ip route looks only into route table 1 and does not see the second network. The route tables must know about ALL networks.

2. If I save the network, I see many PING HOSTS entries in the vdsm logs and the change is not saved. In the Engine the SetUp Network task runs forever; only rebooting the Engine helps. If I do not check "test connection to engine", everything completes normally!

3. If the default GW or some cluster host is unreachable from the engine host, the Engine checks forever. But the GW is an HA VM on one of the hosts, so I must emulate the GW on the network before the Engine will start and run the VM!

Part 3 is VERY bad! The (hosted) Engine MUST be able to start in degraded mode without all storage domains online; it should check only the storage domain needed to start the Engine. I cannot access the Engine web UI if one of the hosts with storage is offline. This causes many troubles when starting the datacenter after a mass power-down!
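Point 1 above is a source-based (policy) routing problem: each per-network route table must carry routes to ALL local subnets, not only its own, or forwarded replies toward the other network get lost. A hedged sketch that only builds the `ip rule`/`ip route` command strings (it does not run them, and the table numbering is illustrative):

```python
def build_routing_commands(networks):
    """networks: list of dicts with 'subnet', 'dev', 'gateway' keys."""
    cmds = []
    for table, net in enumerate(networks, start=1):
        # Select this table for traffic sourced from this network.
        cmds.append("ip rule add from %s table %d" % (net["subnet"], table))
        # Every table must know how to reach every local subnet...
        for other in networks:
            cmds.append("ip route add %s dev %s table %d"
                        % (other["subnet"], other["dev"], table))
        # ...plus its own default gateway.
        cmds.append("ip route add default via %s table %d"
                    % (net["gateway"], table))
    return cmds
```

With only the per-network routes (the behaviour complained about), table 1 would lack the route to NETWORK2's subnet and forwarding between the two networks breaks.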
Comment 5 Badalyan Vyacheslav 2016-03-13 12:54:35 EDT
(Duplicate of comment 4.)
Comment 6 Dan Kenigsberg 2016-03-14 04:09:45 EDT
Badalyan, this bug was opened about Vdsm's failure to handle a mistyped "miinmon" at boot time. Vdsm should have rejected this bond option in the first place, hence my request in comment 3. Please provide the information that was requested there.


Regarding your three new points: I must admit that I do not understand the worries expressed there, but they seem unrelated to this bug. Would you please open a fresh bug for each, and remember to include package versions and all relevant logs.
Comment 7 Sandro Bonazzola 2016-05-02 06:05:16 EDT
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.
Comment 8 Dan Kenigsberg 2016-05-22 09:29:19 EDT
Please reopen when the information requested in comment 6 is available.
Comment 9 Badalyan Vyacheslav 2016-06-21 16:44:20 EDT
Fixed in 3.6.6
