Description of problem:

Hosted Engine installation fails during the ovirtmgmt interface configuration if a VLAN exists on the same interface.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Configure a VLAN on the same interface as the future ovirtmgmt interface
2. Try to deploy the hosted engine and wait for it to fail

Actual results:
Unable to deploy/install the hosted engine

Expected results:
Hosted engine deployed correctly

Additional info:
I'm using a VLAN on the same interface to get better networking for the hosted engine's shared storage, since it's not possible to use multiple paths with iSCSI yet. I was considering LACP with NFS on a separate VLAN just for the Hosted Engine, but this appears to be unsupported or broken.

Here is the relevant interface configuration:

11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 40:f2:e9:f3:5c:62 brd ff:ff:ff:ff:ff:ff
    inet 146.164.37.103/24 brd 146.164.37.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::42f2:e9ff:fef3:5c62/64 scope link
       valid_lft forever preferred_lft forever
13: bond0.10@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 40:f2:e9:f3:5c:62 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.3/28 brd 192.168.10.15 scope global bond0.10
       valid_lft forever preferred_lft forever
    inet6 fe80::42f2:e9ff:fef3:5c62/64 scope link
       valid_lft forever preferred_lft forever

Finally, here are the two attempts: the first using the NFS storage on the VLAN interface, and the second on the management interface while keeping the VLAN interface configured.

FIRST TRY (NFSv4 on VLAN interface):

          --== CONFIGURATION PREVIEW ==--

          Bridge interface                   : bond0
          Engine FQDN                        : ovirt.cc.if.ufrj.br
          Bridge name                        : ovirtmgmt
          Host address                       : ovirt3.cc.if.ufrj.br
          SSH daemon port                    : 22
          Firewall manager                   : iptables
          Gateway address                    : 146.164.37.1
          Storage Domain type                : nfs4
          Image size GB                      : 50
          Host ID                            : 1
          Storage connection                 : 192.168.10.14:/mnt/pool0/ovirt/he
          Memory size MB                     : 16384
          Console type                       : vnc
          Number of CPUs                     : 4
          MAC address                        : 00:16:3e:6a:7a:f9
          OVF archive (for disk boot)        : /usr/share/ovirt-engine-appliance/ovirt-engine-appliance-4.2-20171219.1.el7.centos.ova
          Appliance version                  : 4.2-20171219.1.el7.centos
          Restart engine VM after engine-setup: True
          Engine VM timezone                 : America/Sao_Paulo
          CPU Type                           : model_SandyBridge

          Please confirm installation settings (Yes, No)[Yes]:
[ INFO ] Stage: Transaction setup
[ INFO ] Stage: Misc configuration
[ INFO ] Stage: Package installation
[ INFO ] Stage: Misc configuration
[ INFO ] Configuring libvirt
[ INFO ] Configuring VDSM
[ INFO ] Starting vdsmd
[ INFO ] Configuring the management bridge
[ ERROR ] Failed to execute stage 'Misc configuration': Failed to setup networks {'ovirtmgmt': {'bonding': 'bond0', 'bootproto': 'dhcp', 'blockingdhcp': True, 'defaultRoute': True}}. Error: "Command Host.setupNetworks with args {'bondings': {}, 'options': {'connectivityCheck': False}, 'networks': {'ovirtmgmt': {'bonding': 'bond0', 'bootproto': 'dhcp', 'blockingdhcp': True, 'defaultRoute': True}}} failed: (code=-32603, message=Internal JSON-RPC error: {'reason': "Attempt to call function: <bound method Global.setupNetworks of <vdsm.API.Global object at 0x3fbb0d0>> with arguments: ({u'ovirtmgmt': {u'bonding': u'bond0', u'bootproto': u'dhcp', u'blockingdhcp': True, u'defaultRoute': True}}, {}, {u'connectivityCheck': False}) error: 'NoneType' object is not iterable"})"
[ INFO ] Yum Performing yum transaction rollback
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180103232130.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue, fix and redeploy
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180103210452-utpu4b.log

SECOND TRY (NFSv4 directly on the management interface, with the VLAN still enabled):

          --== CONFIGURATION PREVIEW ==--

          Bridge interface                   : bond0
          Engine FQDN                        : ovirt.cc.if.ufrj.br
          Bridge name                        : ovirtmgmt
          Host address                       : ovirt3.cc.if.ufrj.br
          SSH daemon port                    : 22
          Firewall manager                   : iptables
          Gateway address                    : 146.164.37.1
          Storage Domain type                : nfs4
          Image size GB                      : 50
          Host ID                            : 1
          Storage connection                 : 146.164.37.20:/mnt/pool0/ovirt/he
          Memory size MB                     : 16384
          Console type                       : vnc
          Number of CPUs                     : 4
          MAC address                        : 00:16:3e:71:19:ab
          OVF archive (for disk boot)        : /usr/share/ovirt-engine-appliance/ovirt-engine-appliance-4.2-20171219.1.el7.centos.ova
          Appliance version                  : 4.2-20171219.1.el7.centos
          Restart engine VM after engine-setup: True
          Engine VM timezone                 : America/Sao_Paulo
          CPU Type                           : model_SandyBridge

          Please confirm installation settings (Yes, No)[Yes]:
[ INFO ] Stage: Transaction setup
[ INFO ] Stage: Misc configuration
[ INFO ] Stage: Package installation
[ INFO ] Stage: Misc configuration
[ INFO ] Configuring libvirt
[ INFO ] Configuring VDSM
[ INFO ] Starting vdsmd
[ INFO ] Configuring the management bridge
[ ERROR ] Failed to execute stage 'Misc configuration': Failed to setup networks {'ovirtmgmt': {'bonding': 'bond0', 'bootproto': 'dhcp', 'blockingdhcp': True, 'defaultRoute': True}}.
Error: "Command Host.setupNetworks with args {'bondings': {}, 'options': {'connectivityCheck': False}, 'networks': {'ovirtmgmt': {'bonding': 'bond0', 'bootproto': 'dhcp', 'blockingdhcp': True, 'defaultRoute': True}}} failed: (code=-32603, message=Internal JSON-RPC error: {'reason': "Attempt to call function: <bound method Global.setupNetworks of <vdsm.API.Global object at 0x2e9ad50>> with arguments: ({u'ovirtmgmt': {u'bonding': u'bond0', u'bootproto': u'dhcp', u'blockingdhcp': True, u'defaultRoute': True}}, {}, {u'connectivityCheck': False}) error: 'NoneType' object is not iterable"})"
[ INFO ] Yum Performing yum transaction rollback
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180104000405.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue, fix and redeploy
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180104000050-o3a91g.log

Additional log files are available at this link: http://www.if.ufrj.br/~ferrao/ovirt
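For context, a VLAN-on-bond layout like the one described above is typically expressed in initscripts configuration along these lines. This is only a sketch reconstructed from the ip output and the gateway shown in the configuration preview; the actual ifcfg files on the host were not captured, so the contents are assumptions:

```ini
# /etc/sysconfig/network-scripts/ifcfg-bond0  (sketch, contents assumed)
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
ONBOOT=yes
BOOTPROTO=none
IPADDR=146.164.37.103
PREFIX=24
GATEWAY=146.164.37.1

# /etc/sysconfig/network-scripts/ifcfg-bond0.10  (sketch, contents assumed)
DEVICE=bond0.10
VLAN=yes
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.10.3
PREFIX=28
```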
Created attachment 1376680 [details]
ovirt-hosted-engine-setup-20180103210452-utpu4b.log

Created attachment 1376681 [details]
ovirt-hosted-engine-setup-20180104000050-o3a91g.log
I attached the two logs from http://www.if.ufrj.br/~ferrao/ovirt .

Please also attach /var/log/vdsm/* . Thanks!
Created attachment 1376982 [details]
vdsm.log

Created attachment 1376983 [details]
upgrade.log

Created attachment 1376984 [details]
supervdsm.log

Created attachment 1376985 [details]
mom.log
This seems to be related to this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1523661

If you have the option to check the fix in your scenario, that would be great: https://gerrit.ovirt.org/#/c/85243
I'm not sure if it's the same problem, although it is definitely similar. In my case I don't lose the connection to the oVirt host; it just fails when trying to deploy the Hosted Engine. The ovirtmgmt network is untagged, while the storage traffic for the HE is tagged on the same interface.

But anyway, I can test with the patches. I'm only a bit lost about what to do. I can't find an .iso built from master on the repo: http://resources.ovirt.org/pub/ovirt-master-snapshot

On the other thread Bernhard Seidl said something about ovirt-node-ng-installer-master-2018010109. Can you point me to the file?

Thanks,
V.
(In reply to Vinícius Ferrão from comment #9)
> Can you point the directions to the file?

Please see this page for links to iso files for all versions:
https://www.ovirt.org/node/
Thanks Didi. Just to add more information: I'm downloading the ISO right now. I don't know what happened, but the download speed is at 9KB/s, so it will take a full day to complete if things don't get better.

In the meantime I've applied the patch from https://gerrit.ovirt.org/#/c/85243/ by hand on my existing installation, and it failed at exactly the same point with the same characteristics. I'll try again on the master release, but I don't know if that's still necessary.

Thanks,
V.
(In reply to Vinícius Ferrão from comment #11)
> In the meantime I've applied the patch from
> https://gerrit.ovirt.org/#/c/85243/ by hand on my existing installation and
> it failed exactly on the same point with the same characteristics. I'll be
> trying again on the master release but I don't know if stills necessary.

Have you restarted the supervdsmd service after the change?
Hello Edward, I’ve rebooted the machine just to be safe.
Hello, I've installed the master release and the issue persisted:

[root@ovirt3 vdsm]# imgbase w
You are on ovirt-node-ng-4.2.1-0.20180108.0+1

[ INFO ] Configuring the management bridge
[ ERROR ] Failed to execute stage 'Misc configuration': Failed to setup networks {'ovirtmgmt': {'bonding': 'bond0', 'ipaddr': u'146.164.37.103', 'netmask': u'255.255.255.0', 'defaultRoute': True, 'gateway': u'146.164.37.1'}}. Error: "Command Host.setupNetworks with args {'bondings': {}, 'options': {'connectivityCheck': False}, 'networks': {'ovirtmgmt': {'bonding': 'bond0', 'ipaddr': u'146.164.37.103', 'netmask': u'255.255.255.0', 'defaultRoute': True, 'gateway': u'146.164.37.1'}}} failed: (code=-32603, message=Internal JSON-RPC error: {'reason': "Attempt to call function: <bound method Global.setupNetworks of <vdsm.API.Global object at 0x4077390>> with arguments: ({u'ovirtmgmt': {u'bonding': u'bond0', u'ipaddr': u'146.164.37.103', u'netmask': u'255.255.255.0', u'defaultRoute': True, u'gateway': u'146.164.37.1'}}, {}, {u'connectivityCheck': False}) error: 'NoneType' object is not iterable"})"
[ INFO ] Yum Performing yum transaction rollback
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180108141407.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue, fix and redeploy
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180108133646-hippw5.log
Created attachment 1378588 [details]
Log files

Attaching all the related log files from version ovirt-node-ng-4.2.1-0.20180108.0+1.
There are two patches that should fix the problem you see. Could you please try them out?
https://gerrit.ovirt.org/86131
https://gerrit.ovirt.org/86138
Created attachment 1379273 [details]
New logfiles

I've deployed the patches and they handled the situation better, but they unset the IP address on my VLAN interface, which is needed for the storage network traffic and for deploying the Hosted Engine.

In the end it fails with unreachable network storage, which is expected since bond0.10@bond0 is unset. I'm attaching the new logs.
(In reply to Vinícius Ferrão from comment #17)
> Created attachment 1379273 [details]
> News logfiles
>
> I've deployed the patches and it handled the situation better, but it unsets
> the IP address on my VLAN interface, which is needed for storage network
> traffic and deploying the Hosted Engine.
>
> And the end it fails with unreachable network storage, which is expected
> since bond0.10@bond0 is unset.

This problem is beyond a simple bug; it is a scenario which has never been supported so far.
VDSM acquires bond0, but is not instructed to acquire bond0.10. In such a case, bond0.10 is not under VDSM control but probably under NetworkManager, and then things start to get weird. It is a grey area between NetworkManager-controlled and ifcfg-controlled devices.

It could have been solved if the VLAN/storage network were created by oVirt and not externally, because then we would acquire the VLAN as well.
Hello Edward, thanks for the quick response.

What I'm trying to achieve is some isolation of the storage network, exclusively for the Hosted Engine, from the ovirtmgmt interface itself. I wasn't aware that's not a supported scenario. With this in mind, some questions come up:

1. Even if this remains unsupported, yesterday's patch is still relevant, since it keeps the installation from breaking.

2. Can this become a supported scenario? Should this ticket be changed to an RFE instead?

3. I'm missing the point where you said it could have been solved if the VLAN/storage network was created by oVirt. Since I'm in the installation phase, trying to deploy the Hosted Engine, is there a way to do this? If yes, what should be edited is the documentation.

Thanks,
V.
(In reply to Vinícius Ferrão from comment #19)
> Hello Edward, thanks for the quick response.
>
> What I’m trying to achieve is some isolation from the storage network
> exclusively for the Hosted Engine from the ovirtmgmt interface itself. I
> wasn’t aware that’s not a supported scenario. With this in mind some points
> come to question:
>
> 1. Even if this continues as unsupported the yesterday patch is still
> relevant, since it stops breaking the installation.

Right, this BZ exposed several problems with acquiring NetworkManager devices.

> 2. Can this be a supported scenario? This ticket can be changed for a RFE
> instead?

I suggest opening a new BZ with a request to optionally define the storage network during the setup stage.

> 3. I’m missing the point when you said: It could have been solved if The was
> created by oVirt. Since I’m on the installation phase, trying to deploy the
> Hosted engine, there’s a way to do this? If yes what should be edited is the
> documentation.

What you can try and do is something like this:
- Start VDSM (the vdsmd service).
- Set the storage network using vdsm:
    vdsm-client -f network.json Host setupNetworks
- Start the setup as done so far.

cat network.json:
{"networks": {"storage": {"bonding": "bond0", "bridged": False, "vlan": 10, "ipaddr": "192.168.10.3", "netmask": "255.255.255.240", "gateway": "x.x.x.x", "defaultRoute": False}}, "bondings": {}, "options": {"connectivityCheck": false}}

I have not tried it myself.
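A side note on the sample file above: it mixes Python-style `False` with JSON `false`, and strict JSON only accepts the lowercase form. A sketch of generating a strictly valid network.json (values taken from this thread; the "gateway" key is dropped here on the assumption that the storage VLAN is non-routed, as the reporter describes):

```python
import json

# Build the setupNetworks payload with Python booleans; json.dump will
# serialize them as lowercase JSON false, which vdsm-client accepts.
network_setup = {
    "networks": {
        "storage": {
            "bonding": "bond0",
            "bridged": False,
            "vlan": 10,
            "ipaddr": "192.168.10.3",
            "netmask": "255.255.255.240",
            "defaultRoute": False,
        }
    },
    "bondings": {},
    "options": {"connectivityCheck": False},
}

with open("network.json", "w") as f:
    json.dump(network_setup, f, indent=2)

# Round-trip to prove the file is valid JSON
print(json.load(open("network.json"))["networks"]["storage"]["vlan"])  # 10
```

The resulting file would then be fed to `vdsm-client -f network.json Host setupNetworks` as described above.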
Created attachment 1379651 [details]
Screenshot and patches

Edward, following your recommendation on setting the storage network with VDSM solved the issue, and I was able to deploy a functional hosted engine.

I removed the gateway from the example JSON and fixed the capitalisation of "False", because it was not accepted by vdsm-client until I rewrote false in lowercase.

There's only one caveat: after the hosted engine was successfully deployed, the storage network appears as unmanaged, as you can see in the image that I'm attaching with the respective logs. Is there anything that I should do, or should this be handled by the oVirt Hosted Engine installation script?

Thanks,
V.
Additional findings: if I create a network with the exact same name inside the oVirt Hosted Engine, it will be defined as handled by oVirt/VDSM and no longer appears as unmanaged.

To do this, I went to Network -> Networks and clicked New, matched the information about the name and VLAN tagging, and clicked OK. It worked automatically.

So my guess is: this should be handled during the HE deploy phase. If additional screenshots are needed, please let me know.

Thanks,
V.
(In reply to Vinícius Ferrão from comment #22)
> Additional findings: if I create a network with the exact same name inside
> oVirt Hosted Engine it will be defined as handled by oVirt/VDSM and does not
> appears as unmanaged anymore.
>
> To do this, I've gone to Network -> Networks and clicked on New. Matched the
> information about the name and VLAN tagging and clicked OK. It worked
> automatically.
>
> So my guessing is: this should be handled during the HE deploy phase. If
> additional screenshots are needed please let me know.

Yes, this is the correct path to take in order to make it "managed".

Regarding embedding all this in the deploy phase, it should probably be discussed under the RFE hat. I am not sure if the deploy phase can (or wants to) touch Engine configuration.
I suggest finalizing this BZ with the given patches and a manual workaround of creating a network over the VLAN.

Please open an RFE proposing to support this scenario as part of the deployment phase.
Thanks Edward. I'm closing the issue with "NEXTRELEASE"; I'm not sure if this is what I should do. But to confirm, oVirt 4.2.1 will have your new patches, right?

Thanks,
V.
(In reply to Edward Haas from comment #24)
> I suggest finalizing this BZ with the given patches and a manual workaround
> of creating a network over the vlan.
>
> Please open an RFE for proposing to support this scenario as part of the
> deployment phase.

AFAIK it should already be there; it's just a matter of using it.
In my opinion the issue is just here:

2018-01-03 23:17:24,735-0200 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please indicate a nic to set ovirtmgmt bridge on: (bond0, bond0.10, eno3, enp0s29u1u1u5, eno4) [bond0]:
2018-01-03 23:17:29,307-0200 DEBUG otopi.context context.dumpEnvironment:821 ENVIRONMENT DUMP - BEGIN
2018-01-03 23:17:29,308-0200 DEBUG otopi.context context.dumpEnvironment:831 ENV OVEHOSTED_NETWORK/bridgeIf=str:'bond0'

and, in the other attempt:

2018-01-04 00:01:10,132-0200 DEBUG otopi.plugins.gr_he_common.network.bridge bridge._customization:219 Nics valid: bond0,bond0.10,eno3,enp0s29u1u1u5,eno4
2018-01-04 00:01:10,136-0200 DEBUG otopi.plugins.otopi.dialog.human human.queryString:158 query ovehosted_bridge_if
2018-01-04 00:01:10,136-0200 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please indicate a nic to set ovirtmgmt bridge on: (bond0, bond0.10, eno3, enp0s29u1u1u5, eno4) [bond0]:
2018-01-04 00:01:11,344-0200 DEBUG otopi.context context.dumpEnvironment:821 ENVIRONMENT DUMP - BEGIN
2018-01-04 00:01:11,344-0200 DEBUG otopi.context context.dumpEnvironment:831 ENV OVEHOSTED_NETWORK/bridgeIf=str:'bond0'

If you choose bond0, hosted-engine setup will try to create the management bridge over bond0 with no VLAN.

If you enter bond0.10, hosted-engine setup will try to create the management bridge over bond0.10, correctly setting the VLAN tag also in the definition of the logical network in the engine.
Could you please retry, entering bond0.10 at this question:

   Please indicate a nic to set ovirtmgmt bridge on: (bond0, bond0.10, eno3, enp0s29u1u1u5, eno4) [bond0]:

Please reopen the bug if it doesn't work when you choose bond0.10, since in that case it's a regression.
(In reply to Simone Tiraboschi from comment #26)
> If you choose bond0, hosted-engine setup will try to create the management
> bridge over bond0 with no vlan.
>
> If you enter bond0.10, hosted-engine setup will try to create the management
> bridge over bond0.10 correctly setting the vlan tag also in the definition
> of the logical network in the engine.

This is not the scenario in question here.
He has bond0, on which the management will be created, and bond0.10, on which storage communication is performed.
He chooses bond0 as intended and, as a consequence, loses the bond0.10 IP settings.
(bond0.10 is not acquired by oVirt at any stage, and the workaround proposed is there to take care of that.)
Hello Simone, I don't think this will work. The oVirt management interface should only be bond0. Later the setup asks for the Hosted Engine IP address, and I use an address from this network: 146.164.37.100/24, as you can see in the logs.

The VLAN interface on top of bond0 is used exclusively for the Hosted Engine NFS storage, so it's a non-routed network; it talks only with the other oVirt hosts and the storage, and it's not a management interface. It has the IP address 192.168.10.3/28.

Thanks,
V.

PS: I was writing this during Edward's posting, sorry if it's redundant.
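For clarity, a quick check of the addressing described above with Python's ipaddress module (all addresses taken from the ip output and configuration previews in this bug):

```python
import ipaddress

# Storage address on bond0.10, as shown in the ip output
storage = ipaddress.ip_interface("192.168.10.3/28")
print(storage.network)                    # 192.168.10.0/28
print(storage.network.broadcast_address)  # 192.168.10.15, matching brd in the ip output
print(storage.network.num_addresses - 2)  # 14 usable host addresses

# The NFS server from the first try lives inside the same /28
assert ipaddress.ip_address("192.168.10.14") in storage.network

# The management network on bond0 is a separate, routed /24
mgmt = ipaddress.ip_interface("146.164.37.103/24")
assert not storage.network.overlaps(mgmt.network)
```

This just confirms that the storage VLAN subnet and the management network are disjoint, consistent with the reporter's description.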
(In reply to Edward Haas from comment #27)
> (In reply to Simone Tiraboschi from comment #26)
> > If you choose bond0, hosted-engine setup will try to create the management
> > bridge over bond0 with no vlan.
> >
> > If you enter bond0.10, hosted-engine setup will try to create the management
> > bridge over bond0.10 correctly setting the vlan tag also in the definition
> > of the logical network in the engine.
>
> This is not the scenario in question here.
> He has bond0 on which the management will be created and bond0.10 on which
> storage communication is performed.
> He chooses bond0 as intended and as a consequence looses bond0.10 IP
> settings.
> (bond0.10 is not acquired by ovirt at any stage and the workaround proposed
> is there to take care of that)

Ok, sorry, so let's reopen https://bugzilla.redhat.com/show_bug.cgi?id=1533624