Bug 1317125 - hosted engine deployment fails when ovirtmgmt bridge is manually created
Summary: hosted engine deployment fails when ovirtmgmt bridge is manually created
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-setup
Version: 3.6.3
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-03-12 07:33 UTC by Roman Hodain
Modified: 2022-04-16 09:23 UTC
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-20 14:19:34 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1586280 0 low CLOSED SetupNetworks fails to create the management bridge over vlan over a bond if the untagged bond is configured with dhcp b... 2022-06-27 07:52:45 UTC
Red Hat Issue Tracker RHV-45746 0 None None None 2022-04-16 09:23:45 UTC
Red Hat Knowledge Base (Solution) 2199501 0 None None None 2016-03-15 14:28:48 UTC

Internal Links: 1586280

Description Roman Hodain 2016-03-12 07:33:05 UTC
Description of problem:
When the ovirtmgmt bridge is manually created by the user, for example via ifcfg-* files, hosted engine deployment fails.
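
For illustration only, such a manually created bridge might be defined with ifcfg files roughly as follows (device names and addresses are examples, not taken from the reported environment):

  # /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
  DEVICE=ovirtmgmt
  TYPE=Bridge
  BOOTPROTO=none
  IPADDR=192.168.1.10
  PREFIX=24
  GATEWAY=192.168.1.1
  DELAY=0
  ONBOOT=yes

  # /etc/sysconfig/network-scripts/ifcfg-em1 (physical NIC enslaved to the bridge)
  DEVICE=em1
  BRIDGE=ovirtmgmt
  ONBOOT=yes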

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-1.3.3.4-1.el7ev

How reproducible:
100%

Steps to Reproduce:
1. Fresh installation of RHEL7
2. Install required packages for hosted engine
3. Configure the ovirtmgmt bridge via ifcfg-* files (as in the example above)
4. Run hosted-engine --deploy and provide the requested information so the deployment can start

Actual results:
2016-03-11 13:58:01 DEBUG otopi.context context._executeMethod:156 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 146, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/network/bridge.py", line 295, in _get_hostname_from_bridge_if
    raise RuntimeError(_('Cannot acquire bridge address'))
RuntimeError: Cannot acquire bridge address
2016-03-11 13:58:01 ERROR otopi.context context._executeMethod:165 Failed to execute stage 'Setup validation': Cannot acquire bridge address

Expected results:
There are two possible approaches:

   - Hosted-engine-setup will handle this properly and finish the installation.

   - Hosted-engine-setup will exit with reasonable error message.

Additional info:

This issue is caused by the fact that hosted-engine-setup detects that the ovirtmgmt bridge exists and therefore expects VDSM to be aware of a network called ovirtmgmt. VDSM is not aware of any network called ovirtmgmt, as the bridge was set up manually by an admin; the list of networks provided by VDSM is empty at this stage.
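
For illustration, the mismatch can be observed on the host along these lines (commands and output are indicative; the exact VDSM query syntax depends on the installed version):

  # The bridge exists at the operating system level:
  ip link show ovirtmgmt
  # ...but VDSM itself reports no configured networks; on a 3.6-era host
  # something like the following would show an empty 'networks' entry:
  vdsClient -s 0 getVdsCaps | grep -A1 networks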

Comment 1 Yaniv Kaul 2016-03-13 07:50:03 UTC
Roman - why would the bridge be set manually? Is that supported?

Comment 2 Roman Hodain 2016-03-15 11:57:33 UTC
(In reply to Yaniv Kaul from comment #1)
> Roman - why would the bridge be set manually? Is that supported?
Well, we do not ask the user to create the bridge, but we also do not say whether it is supported or not. The problem is that it is really not clear why the deployment failed. The tool should react properly and provide proper information about why it fails. This is a corner case, but it should still be handled.

Comment 3 Yaniv Kaul 2016-03-15 12:21:17 UTC
(In reply to Roman Hodain from comment #2)
> (In reply to Yaniv Kaul from comment #1)
> > Roman - why would the bridge be set manually? Is that supported?
> Well, we do not ask the user to create the bridge, but we also do not say
> whether it is supported or not. The problem is that it is really not clear
> why the deployment failed. The tool should react properly and provide proper
> information about why it fails. This is a corner case, but it should still
> be handled.

I'm in favor of agreeing it's unsupported. There are so many unsupported scenarios - we can't list them all.

Comment 4 Simone Tiraboschi 2016-03-15 13:09:07 UTC
By the way, running hosted-engine-setup on a host where the management bridge has been created by a previous attempt is not an issue: VDSM, and so hosted-engine-setup, recognizes it.

The issue arises only when the bridge has been manually created and, for some other reason, is not recognized by VDSM.

Comment 8 Yaniv Lavi 2016-03-20 14:19:34 UTC
(In reply to Yaniv Kaul from comment #3)
> (In reply to Roman Hodain from comment #2)
> > (In reply to Yaniv Kaul from comment #1)
> > > Roman - why would the bridge be set manually? Is that supported?
> > Well, we do not ask the user to create the bridge, but we also do not say
> > whether it is supported or not. The problem is that it is really not clear
> > why the deployment failed. The tool should react properly and provide
> > proper information about why it fails. This is a corner case, but it
> > should still be handled.
> 
> I'm in favor of agreeing it's unsupported. There are so many unsupported
> scenarios - we can't list them all.

ack. closing.

Comment 9 Ari Lemmke 2017-09-10 11:17:30 UTC
First of all, if you have this kind of thing built in, you
should _always_ check preconditions.
(I do know it is a mental thing. Pun intended.)

Our case is that the only connection to that blade is through
ovirtmgmt, which is on bond0.2190, which is on bond0, which is on
enp2s0f0 and enp2s0f1, and the system is closed. There is no other way to
connect using a network, and that is going to be the only way to connect to
the system.

//arl        Ari Lemmke (look Wikipedia)

(self censored most of text I would have written, sorry)

Comment 10 Yaniv Kaul 2017-09-10 13:12:06 UTC
Perhaps we should improve the error message and suggest changing the value of OVEHOSTED_NETWORK/bridgeIf?

Comment 11 Sandro Bonazzola 2017-09-20 08:38:00 UTC
(In reply to Yaniv Kaul from comment #10)
> Perhaps we should improve the error message and suggest changing the value
> of OVEHOSTED_NETWORK/bridgeIf?

No problem for me adding better error text. Simone, can you handle it in a separate BZ?

(In reply to Ari Lemmke from comment #9)

> Our case is that the only connection to that blade is through
> ovirtmgmt which is on bond0.2190 which is on bond0 which is on
> enp2s0f0 and enp2s0f1 and the system is closed. No other way to connect
> using a network. and that is going to be the only way to connect to
> the system.

Simone, is there a chance that this case can be handled by the engine if we deploy hosted engine as in Jenny's POC?

Comment 12 Simone Tiraboschi 2017-09-20 09:40:47 UTC
(In reply to Yaniv Kaul from comment #10)
> Perhaps we should improve the error message and suggest changing the value
> of OVEHOSTED_NETWORK/bridgeIf?

It's not just that: with OVEHOSTED_NETWORK/bridgeName we could honor a custom management bridge name if we are going to restore a backup of a previously existing engine where the management bridge has been customized, but it will not force a custom name onto a freshly deployed engine.

See:
https://bugzilla.redhat.com/show_bug.cgi?id=1231799#c24


With OVEHOSTED_NETWORK/bridgeIf, instead, we can choose the interface to build the management bridge on, but I don't see any reason to force it via the answer file, since bond0.2190 is a valid name and hosted-engine-setup should be able to detect it and propose it as the default choice.
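
For reference, an answer-file fragment setting those keys could look like the following (values are illustrative; which keys are honored depends on the hosted-engine-setup version):

  [environment:default]
  OVEHOSTED_NETWORK/bridgeName=str:ovirtmgmt
  OVEHOSTED_NETWORK/bridgeIf=str:bond0.2190

It would then be passed to the setup with something like hosted-engine --deploy --config-append=/path/to/answers.conf.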

Comment 13 Simone Tiraboschi 2017-09-20 09:42:47 UTC
(In reply to Sandro Bonazzola from comment #11)
> Simone, is there a chance that this case can be handled by the engine if we
> deploy hosted engine as in Jenny's POC?

AFAIK no, since the bootstrap VM should still be started over a management bridge in order to be able to use the engine there to deploy the host.
Better to wait for Jenny's confirmation here.

Comment 14 Jenny Tokar 2017-09-24 07:53:24 UTC
(In reply to Simone Tiraboschi from comment #13)
> AFAIK no, since the bootstrap VM should still be started over a management
> bridge in order to be able to use the engine there to deploy the host.
> Better to wait for Jenny's confirmation here.

Actually, since adding the ovirtmgmt bridge manually causes issues with engine deployment, the bootstrap VM will be started over the libvirt default bridge, and the engine's host setup will handle the network and add the ovirtmgmt bridge to the host.

Comment 15 Fernando 2018-05-21 06:24:26 UTC
I really couldn't understand the reason to rush and close a bug report like this, as was done before all the discussion that followed.

I am facing EXACTLY the same issue as Ari Lemmke in comment 9, with apparently no solution.
I had to create the bridge manually. It sits on a bond0.1234 interface, which in turn sits on bond0, which is made of ens15f0 and ens15f1. There is NO OTHER way to connect to the network other than this (as there are no other interfaces that can be used in the server). And hosted-engine --deploy doesn't recognize bond0.1234, only bond0, which is just a trunk interface carrying all the VLAN interfaces on top and therefore the bridges linked to them.

I have already tried to edit the answer file and force the custom bridge name and interface (bond0.1234), but it keeps complaining and fails with the error message:

[ ERROR ] Failed to execute stage 'Environment customization': Cannot acquire nic/bridge address

Is it that VDSM is not aware of the existing bridge configuration, or does the issue remain?

Comment 16 Yaniv Kaul 2018-05-21 06:31:27 UTC
(In reply to Fernando from comment #15)
> I really couldn't understand the reason to rush and close a bug report like
> this, as was done before all the discussion that followed.
> 
> I am facing EXACTLY the same issue as Ari Lemmke in comment 9, with
> apparently no solution.
> I had to create the bridge manually. It sits on a bond0.1234 interface,
> which in turn sits on bond0, which is made of ens15f0 and ens15f1. There is
> NO OTHER way to connect to the network other than this (as there are no
> other interfaces that can be used in the server). And hosted-engine --deploy
> doesn't recognize bond0.1234, only bond0, which is just a trunk interface
> carrying all the VLAN interfaces on top and therefore the bridges linked to
> them.

So the specific issue we'd like to handle is either (or both):
1. Hosted-Engine deployment should recognize bond0.1234 (which I assume is a VLAN over a bonded interface, right?)
2. The ability to configure such a setup beforehand. I assume 'manually' should be replaced by the Cockpit-based UI to configure the network. Is there anything not supported there?

> 
> I have already tried to edit the answer file and force the custom bridge
> name and interface (bond0.1234), but it keeps complaining and fails with
> the error message:
> 
> [ ERROR ] Failed to execute stage 'Environment customization': Cannot
> acquire nic/bridge address
> 
> Is it that VDSM is not aware of the existing bridge configuration, or does
> the issue remain?

The answer file properties are probably not well documented or supported. As we've moved to Ansible-based deployment, our next step would be to tidy it, document it and provide the ability to cleanly run it via pure Ansible execution. This will take a while.

Comment 17 Simone Tiraboschi 2018-05-21 07:56:44 UTC
(In reply to Fernando from comment #15)
> I had to create the bridge manually. It sits on a bond0.1234 interface,
> which in turn sits on bond0, which is made of ens15f0 and ens15f1. There is
> NO OTHER way to connect to the network other than this (as there are no
> other interfaces that can be used in the server). And hosted-engine --deploy
> doesn't recognize bond0.1234, only bond0, which is just a trunk interface
> carrying all the VLAN interfaces on top and therefore the bridges linked to
> them.

The setup is able to detect a VLAN over a bonded interface (if it doesn't, maybe it's just because the bond mode is not allowed for VMs: you have to choose mode 1, 2, 3 or 4, see https://www.ovirt.org/documentation/admin-guide/chap-Logical_Networks/#bonding-modes). I don't see why you need to manually create the management bridge.
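
As a quick check (bond0 here is just an example name), the bond mode in use can be read directly from the kernel:

  grep "Bonding Mode" /proc/net/bonding/bond0
  # e.g. "Bonding Mode: IEEE 802.3ad Dynamic link aggregation" corresponds to mode 4
  cat /sys/class/net/bond0/bonding/mode
  # e.g. "802.3ad 4"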

Can you please attach your setup logs?

Comment 18 Fernando 2018-05-21 15:20:15 UTC
Yaniv:

1: Yes, it is a VLAN over a bonded interface.
2: Cockpit could be an option; I don't see a problem with having to use it. But are you saying that using Cockpit, instead of doing it manually, could make VDSM aware of the bridge, and therefore hosted-engine --deploy would recognize it correctly in order to connect the self-hosted engine?

Simone:
Not in my case nor in Ari's. It detected each individual physical interface and bond0, but not bond0.1234.
The bond mode used is the only one I use in all setups, mode 4 (802.3ad).
There are a few reasons to create the management bridge manually:
- In my case I don't want it to be called ovirtmgmt, as in the other clusters I have a proper network name to standardize on, and once the host is up and running the Engine doesn't allow changing it.
- As mentioned by Ari, and my case is the same scenario, I have only 2 x 10Gb interfaces, so on top of them I have to pass management traffic in one VLAN, VM traffic in another, storage traffic in a third one, and so on. It's much better to put everything on top of the 2 physical interfaces; it helps with cabling and troubleshooting. So not in all scenarios can we put the management network on top of dedicated physical interfaces.

With regard to the last attempt, I have checked it and there is nothing significant there, only informational messages. The only error message is the one above:
[ ERROR ] Failed to execute stage 'Environment customization': Cannot acquire nic/bridge address

I am wondering if there is a more manual way to deploy the self-hosted engine in this scenario that would make it possible for the self-hosted engine to connect to the existing bridge.

Comment 20 Simone Tiraboschi 2018-05-21 16:13:29 UTC
(In reply to Fernando from comment #18)
> Yaniv:
> 
> 1: Yes, it is a VLAN over a bonded interface.
> 2: Cockpit could be an option; I don't see a problem with having to use it.
> But are you saying that using Cockpit, instead of doing it manually, could
> make VDSM aware of the bridge, and therefore hosted-engine --deploy would
> recognize it correctly in order to connect the self-hosted engine?
> 
> Simone:
> Not in my case nor in Ari's. It detected each individual physical interface
> and bond0, but not bond0.1234.

For me it worked as expected.
Did you configure an IPv4 address (statically or via DHCP) on that interface?

 [root@c74he20180302h1 ~]# ip a
 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
     inet6 ::1/128 scope host 
        valid_lft forever preferred_lft forever
 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
     link/ether 00:1a:4a:16:01:6d brd ff:ff:ff:ff:ff:ff
     inet 192.168.1.8/24 brd 192.168.1.255 scope global dynamic eth0
        valid_lft 21168sec preferred_lft 21168sec
     inet6 fe80::21a:4aff:fe16:16d/64 scope link 
        valid_lft forever preferred_lft forever
 3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
     link/ether 00:1a:4a:16:01:68 brd ff:ff:ff:ff:ff:ff
 4: eth2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
     link/ether 00:1a:4a:16:01:68 brd ff:ff:ff:ff:ff:ff
 20: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
     link/ether 06:2e:61:6a:c4:4b brd ff:ff:ff:ff:ff:ff
 33: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
     link/ether 00:1a:4a:16:01:68 brd ff:ff:ff:ff:ff:ff
     inet6 fe80::f889:6b5a:75fe:1655/64 scope link 
        valid_lft forever preferred_lft forever
 34: bond0.1234@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
     link/ether 00:1a:4a:16:01:68 brd ff:ff:ff:ff:ff:ff
     inet 192.168.2.123/24 brd 192.168.2.255 scope global bond0.1234
        valid_lft forever preferred_lft forever
     inet6 fe80::e9ad:58f7:9218:5c41/64 scope link 
        valid_lft forever preferred_lft forever
 [root@c74he20180302h1 ~]# ip -d link show bond0.1234
 34: bond0.1234@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
     link/ether 00:1a:4a:16:01:68 brd ff:ff:ff:ff:ff:ff promiscuity 0 
     vlan protocol 802.1Q id 1234 <REORDER_HDR> addrgenmode none 
 [root@c74he20180302h1 ~]# ip -d link show bond0
 33: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
     link/ether 00:1a:4a:16:01:68 brd ff:ff:ff:ff:ff:ff promiscuity 0 
     bond mode 802.3ad miimon 100 updelay 0 downdelay 0 use_carrier 1 arp_interval 0 arp_validate none arp_all_targets any primary_reselect always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select stable ad_aggregator 1 ad_num_ports 1 ad_actor_key 0 ad_partner_key 1 ad_partner_mac 00:00:00:00:00:00 ad_actor_sys_prio 65535 ad_user_port_key 0 ad_actor_system 00:00:00:00:00:00:00:00 tlb_dynamic_lb 0 addrgenmode none 
 [root@c74he20180302h1 ~]# 
 [root@c74he20180302h1 ~]# hosted-engine --deploy
 [ INFO  ] Stage: Initializing
 [ INFO  ] Stage: Environment setup
           During customization use CTRL-D to abort.
           Continuing will configure this host for serving as hypervisor and create a local VM with a running engine.
           The locally running engine will be used to configure a storage domain and create a VM there.
           At the end the disk of the local VM will be moved to the shared storage.
           Are you sure you want to continue? (Yes, No)[Yes]: 
           Configuration files: []
           Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180521180401-8w1h35.log
           Version: otopi-1.7.8_master (otopi-1.7.8-0.0.master.20180219140230.git5e8bf5a.el7.centos)
 [ INFO  ] Stage: Environment packages setup
 [ INFO  ] Stage: Programs detection
 [ INFO  ] Stage: Environment setup
 [ INFO  ] Stage: Environment customization
          
           --== STORAGE CONFIGURATION ==--
          
          
           --== HOST NETWORK CONFIGURATION ==--
          
           Please indicate a pingable gateway IP address [192.168.1.1]: 192.168.2.1
 [ INFO  ] TASK [Gathering Facts]
 [ INFO  ] ok: [localhost]
 [ INFO  ] TASK [Detecting interface on existing management bridge]
 [ INFO  ] skipping: [localhost]
 [ INFO  ] TASK [Get all active network interfaces]
 [ INFO  ] TASK [Filter bonds with bad naming]
 [ INFO  ] TASK [Generate output list]
 [ INFO  ] ok: [localhost]
           Please indicate a nic to set ovirtmgmt bridge on: (bond0.1234, eth0) [bond0.1234]:

Comment 33 Simone Tiraboschi 2018-06-05 21:30:09 UTC
I tried to deploy hosted-engine over a VLAN over a bond (bond0.123) and it worked correctly when both the tagged interface (bond0.123) and the untagged bond (bond0) are properly configured with an IPv4 address.
SetupNetworks instead fails if bond0.123 is correctly configured while bond0 lacks an IPv4 address (although it is configured with IPV4_FAILURE_FATAL=no).
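
For reference, a minimal ifcfg sketch of the working layout described above (device names, bond options and addresses are illustrative):

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  TYPE=Bond
  BONDING_MASTER=yes
  BONDING_OPTS="mode=802.3ad miimon=100"
  BOOTPROTO=none
  IPADDR=192.168.1.8
  PREFIX=24
  ONBOOT=yes

  # /etc/sysconfig/network-scripts/ifcfg-bond0.123
  DEVICE=bond0.123
  VLAN=yes
  BOOTPROTO=none
  IPADDR=192.168.2.123
  PREFIX=24
  ONBOOT=yes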

I just opened https://bugzilla.redhat.com/show_bug.cgi?id=1586280

