Description of problem:
1. User creates a new logical network (e.g. evil_network).
2. Go to the Hosted-Engine VM and add a NIC with the evil_network profile.
3. Engine updates the OVFs on hosted_storage with this new NIC.
4. ovirt-ha-agent extracts the OVF, finds the new NIC, and rewrites vm.conf.
5. Now vm.conf looks like this:
   nicModel:pv,macAddr:00:16:3E:6A:7A:F9,linkActive:true,network:ovirtmgmt...
   nicModel:pv,macAddr:00:1a:4a:16:01:02,linkActive:true,network:evil_network...
6. When starting the HE VM, evil_network ends up in the VM XML:
   <interface type='bridge'>
     <source bridge='evil_network'/>

PROBLEM:
A) evil_network is a network_name, not a vdsm_name. This network does not exist on the host:
   # brctl show | cut -f1
   bridge name
   ;vdsmdummy;
   ond2e03dd5745d4   <--- this is evil_network
   ovirtmgmt
B) The VM fails to start; in this case the specific name is also invalid:
   libvirtError: Network interface name 'evil_network' is too long: Numerical result out of range

To make things worse, I don't see vdsm_name in the OVF, just network_name.

Version-Release number of selected component (if applicable):
4.1.6 (not confirmed - HE currently down)
4.2.0 (reproducer)

How reproducible:
100%

Steps to Reproduce:
As above

Actual results:
HostedEngine will not start, environment is down.

Expected results:
HostedEngine Up
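A quick way to spot problem A on a host is to compare the network: values in vm.conf with the net devices that actually exist. The Python sketch below is illustrative only: it assumes each NIC appears in vm.conf as comma-separated key:value pairs as shown in step 5 (possibly wrapped in braces), and the helpers nic_networks() and missing_bridges() are hypothetical, not part of ovirt-hosted-engine-ha.

```python
import os
import re

# Assumption: NIC entries in vm.conf contain "...,network:<name>,..." as in step 5.
# The field must start a line or follow "," or "{" so other keys cannot match.
_NETWORK_RE = re.compile(r"(?:^|[,{])network:([^,}\s]+)", re.MULTILINE)


def nic_networks(vm_conf_text):
    """Return the network names referenced by NIC entries, in file order."""
    return _NETWORK_RE.findall(vm_conf_text)


def missing_bridges(vm_conf_text, host_netdevs=None):
    """Return referenced networks that have no net device of that name on this host.

    Bridges are listed under /sys/class/net like any other net device.
    """
    if host_netdevs is None:
        host_netdevs = set(os.listdir("/sys/class/net"))
    return [name for name in nic_networks(vm_conf_text) if name not in host_netdevs]
```

On an affected host, passing the contents of /var/run/ovirt-hosted-engine-ha/vm.conf to missing_bridges() would be expected to list the logical name (evil_network here) whose bridge is actually called ond2e03dd5745d4.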
Adding a workaround to bring the HE VM up. Basically, replace <network_name> with the logical network name and <vdsm_name> with the bridge name shown by brctl show, then:
# hosted-engine --set-maintenance --mode=global
# sed 's/<network_name>/<vdsm_name>/' /var/run/ovirt-hosted-engine-ha/vm.conf > /root/vm.conf.fixed
# hosted-engine --vm-start --vm-conf=/root/vm.conf.fixed
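The sed call above treats the name as a regular expression and matches it anywhere in a line, so it could also hit a longer network name that merely contains it (e.g. my_evil_network). A slightly stricter variant, sketched in Python under the same vm.conf format assumption (NIC fields written as network:<name>), only rewrites whole network: values listed in an explicit mapping. rewrite_networks() and the mapping are hypothetical; the on-host names still have to be read from brctl show on the host.

```python
import re

# Assumption: the network field looks like "network:<name>" and ends at
# ",", "}" or whitespace, as in the vm.conf excerpt in the description.
_NETWORK_FIELD = re.compile(r"((?:^|[,{])network:)([^,}\s]+)", re.MULTILINE)


def rewrite_networks(vm_conf_text, name_map):
    """Swap logical network names for on-host bridge names.

    name_map maps logical name -> on-host name (e.g. taken from `brctl show`).
    Names not in name_map are left untouched; matching is literal, not regex.
    """
    def _replace(match):
        prefix, name = match.group(1), match.group(2)
        return prefix + name_map.get(name, name)

    return _NETWORK_FIELD.sub(_replace, vm_conf_text)
```

The result would then be written to /root/vm.conf.fixed and used with hosted-engine --vm-start --vm-conf=/root/vm.conf.fixed, exactly as in the workaround.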
hosted-engine should be fixed to query the Engine for the on-host identifier (vdsm_name) of the network. Until it is fixed, the best workaround is to attach only networks with short, plain ASCII names to the Hosted Engine VM.
In 4.1 we create a configuration dictionary for VDSM by parsing the VM OVF from the OVF_STORE volume. In 4.2 we will instead directly consume the libvirt XML created by the engine, as per https://bugzilla.redhat.com/1504606, but due to another issue (1560666) we temporarily reverted that change.
(In reply to Germano Veit Michel from comment #0)
> To make things worse, I don't see vdsm_name in the OVF, just network_name.

ovirt-ha-agent is simply consuming it from there.

Not reproducible for me with 4.2 beta3:
[root@c74he20180302h1 ~]# virsh -r dumpxml HostedEngine | grep bridge
    <interface type='bridge'>
      <source bridge='ovirtmgmt'/>
    <interface type='bridge'>
      <source bridge='evil_network'/>
[root@c74he20180302h1 ~]# brctl show | cut -f1
bridge name
;vdsmdummy;
evil_network
ovirtmgmt
virbr0

Germano, how did you manage to get a bridge with a name different from the logical network?
(In reply to Simone Tiraboschi from comment #5)
> Germano, how did you managed to get a bridge with a name different from the
> logical network?

Try something with special characters, or longer than the kernel netdev name limit (IFNAMSIZ). That should force it to use a different vdsm_name; my_long_evil_network should do it.

In my tests this network was renamed from gluster_internal. That may be why evil_network on its own doesn't make the two names differ.
I'll just clarify how and when the vdsm name (the "on-host identifier") is generated.

When a network's name is either longer than IFNAMSIZ allows or contains unicode characters, it cannot be used to name a Linux bridge, so the bridge on the host will be named differently from the network.

In this scenario, the vdsm name consists of the prefix "on" and the first 13 characters of the network's GUID, e.g. "ond2e03dd5745d4".

"evil_network" should indeed not cause the bridge's name to differ from "evil_network".
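The naming rule above can be sketched in Python. This is an illustrative re-implementation, not the engine's code: vdsm_name() is a hypothetical helper, and dropping the GUID's dashes before taking 13 characters is an assumption inferred from the "ond2e03dd5745d4" example (a dashed GUID would have a "-" at position 9). The 15-character limit follows from IFNAMSIZ being 16 bytes on Linux, including the terminating NUL. The engine may apply further checks not covered here.

```python
import uuid

IFNAMSIZ = 16                  # Linux limit, counting the terminating NUL byte
MAX_IFNAME_LEN = IFNAMSIZ - 1  # so interface names have at most 15 characters


def vdsm_name(network_name, network_id):
    """Return the bridge name a host would use for a logical network.

    Sketch of the rule described above: keep the logical name if it is short
    enough and pure ASCII; otherwise use "on" + the first 13 GUID characters.
    """
    if network_name.isascii() and len(network_name) <= MAX_IFNAME_LEN:
        return network_name
    # Assumption: the GUID's dashes are dropped before taking 13 characters.
    guid_hex = uuid.UUID(str(network_id)).hex
    return "on" + guid_hex[:13]
```

With a made-up GUID such as d2e03dd5-745d-4a1b-8c9d-0123456789ab, "my_long_evil_network" maps to "ond2e03dd5745d4" (exactly 15 characters), while "evil_network" (12 characters) is kept as-is.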
(In reply to Leon Goldberg from comment #7)
> I'll just clarify about how/when vdsm-name (the "on-host identifier") is
> generated.
>
> In a scenario where a network's name is either longer than IFNAMSIZ or
> contains unicode characters, the bridge on the host will be different from the
> network as it can not be used to name a Linux bridge.
>
> In this scenario, vdsm name will consist of the prefix "on" and the first 13
> characters of the network's GUID, e.g. "ond2e03dd5745d4".
>
> "evil_network" should not indeed cause the bridge's name to be different
> than "evil_network".

However, "my_long_evil_network" should, since its 20 characters exceed the 15-character interface name limit (IFNAMSIZ is 16 bytes, including the terminating NUL).
OK, reproduced with 'very_very_very_very_evil_network_is_it_evil_enough_?', which became 'on9ce625dccc824' on my host.

In the OVF on the OVF_STORE we have:
<NetworkSection>
  <Info>List of networks</Info>
  <Network ovf:name="ovirtmgmt" />
  <Network ovf:name="very_very_very_very_evil_network_is_it_evil_enough_?" />
</NetworkSection>
...
<Item>
  <rasd:Caption>Ethernet adapter on very_very_very_very_evil_network_is_it_evil_enough_?</rasd:Caption>
  <rasd:InstanceId>f7cfe938-eaa2-4352-8ae3-dd510f44de58</rasd:InstanceId>
  <rasd:ResourceType>10</rasd:ResourceType>
  <rasd:OtherResourceType>very_very_very_very_evil_network_is_it_evil_enough_?</rasd:OtherResourceType>
  <rasd:ResourceSubType>3</rasd:ResourceSubType>
  <rasd:Connection>very_very_very_very_evil_network_is_it_evil_enough_?</rasd:Connection>
  <rasd:Linked>true</rasd:Linked>
  <rasd:Name>nic1</rasd:Name>
  <rasd:ElementName>nic1</rasd:ElementName>
  <rasd:MACAddress>00:1a:4a:16:01:00</rasd:MACAddress>
  <rasd:speed>1000</rasd:speed>
  <Type>interface</Type>
  <Device>bridge</Device>
  <rasd:Address>{type=pci, slot=0x09, bus=0x00, domain=0x0000, function=0x0}</rasd:Address>
  <BootOrder>0</BootOrder>
  <IsPlugged>true</IsPlugged>
  <IsReadOnly>false</IsReadOnly>
  <Alias>net1</Alias>
  <SpecParams>
    <inbound />
    <outbound />
  </SpecParams>
</Item>

The libvirt XML generated by the engine, on the other hand, looks safe, since it contains:
<interface type="bridge">
  <model type="virtio" />
  <link state="up" />
  <source bridge="on9ce625dccc824" />
  <alias name="ua-f7cfe938-eaa2-4352-8ae3-dd510f44de58" />
  <address bus="0x00" domain="0x0000" function="0x0" slot="0x09" type="pci" />
  <mac address="00:1a:4a:16:01:00" />
  <filterref filter="vdsm-no-mac-spoofing" />
  <bandwidth />
</interface>

However, we still cannot consume it because of https://bugzilla.redhat.com/1560666.
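This is why consuming the engine-generated libvirt XML avoids the problem: its <source bridge=...> attribute already carries the on-host name, whereas the OVF only has the logical name in rasd:Connection. A minimal Python sketch using the standard library's xml.etree.ElementTree that lists the bridges referenced by such XML; bridge_names() is a hypothetical helper, not ovirt-hosted-engine-ha code.

```python
import xml.etree.ElementTree as ET


def bridge_names(domain_xml):
    """Return the source bridge of every bridge-type <interface>, in order.

    Works on a full <domain> document or on a bare <interface> fragment,
    because Element.iter() also visits the root element itself.
    """
    root = ET.fromstring(domain_xml)
    bridges = []
    for iface in root.iter("interface"):
        if iface.get("type") != "bridge":
            continue
        source = iface.find("source")
        if source is not None and source.get("bridge"):
            bridges.append(source.get("bridge"))
    return bridges
```

Each returned name can then be checked against /sys/class/net on the host before the VM is started.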
Looks good, Simone. It seems a bit too much to backport to 4.1, so I'm fine with having this fixed only in 4.2. Support can easily handle this via a KCS article, as the workaround to bring HE up is quite simple. If you want to make this BZ depend on the other one, that's fine.
Just FYI, the libvirtxml support was re-enabled for hosted engine last week.
Based on comment 13, there should be no problem in 4.2.z (but let's let QA test that). Based on comment 10, we are not going to fix the bug in 4.1.z.
Meital, can you please ack this bug?
Bug was added manually to the Erratum by Leon.
The SHE VM with a vNIC on a network whose name is longer than 15 characters starts on the ha-hosts after being shut down. Moving to VERIFIED.

Works for me on these components:
ovirt-engine-setup-4.2.3.2-0.1.el7.noarch
rhvm-appliance-4.2-20180420.0.el7.noarch
ovirt-hosted-engine-setup-2.2.18-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

I named the new network "asdfghjklqwertyuio" (18 characters long); a vNIC profile with the same name was created along with it. I mapped the new network to both ha-hosts and attached a new vNIC on the new network to the engine VM. I moved the environment to global maintenance and ran vm-shutdown on the ha-host hosting the engine. I then exited global maintenance and waited for ha-agent to start the engine. The engine started and I was able to log in.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1472
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days