Bug 1561850 - long network_name causes a different vdsm_name that breaks starting HostedEngine VM
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-ha
Version: 4.1.6
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ovirt-4.2.3
Assignee: Leon Goldberg
QA Contact: Nikolai Sednev
Reported: 2018-03-29 04:01 UTC by Germano Veit Michel
Modified: 2023-09-15 00:07 UTC
Last Closed: 2018-05-15 17:32:29 UTC
oVirt Team: Network

Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 3394971 | 0 | None | None | None | 2018-03-29 04:39:34 UTC
Red Hat Product Errata | RHBA-2018:1472 | 0 | None | None | None | 2018-05-15 17:33:55 UTC

Description Germano Veit Michel 2018-03-29 04:01:53 UTC
Description of problem:

1. User creates a new logical network (e.g. evil_network)
2. Go to the Hosted-Engine VM and add a NIC with the evil_network profile
3. Engine updates the OVFs on the hosted_storage with this new NIC.
4. ovirt-ha-agent extracts the OVF, finds the new NIC, and rewrites vm.conf
5. Now vm.conf looks like this:
   nicModel:pv,macAddr:00:16:3E:6A:7A:F9,linkActive:true,network:ovirtmgmt...
   nicModel:pv,macAddr:00:1a:4a:16:01:02,linkActive:true,network:evil_network...
6. When starting the HE VM, evil_network will end up in the VM XML:
   <interface type='bridge'>
      <source bridge='evil_network'/>

PROBLEM:
A) evil_network is a network_name, not a vdsm_name. No bridge with this name exists on the host:
   # brctl show | cut -f1
   bridge name
   ;vdsmdummy;
   ond2e03dd5745d4   <--- this is evil network
   ovirtmgmt

B) The VM fails to start; in this case the name is also invalid for libvirt:
libvirtError: Network interface name 'evil_network' is too long: Numerical result out of range

To make things worse, I don't see vdsm_name in the OVF, just network_name.
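
The mismatch in (A) can be checked mechanically before the VM is started. The following is a minimal Python sketch, not part of ovirt-hosted-engine-ha. It assumes that NIC entries in vm.conf are comma-separated key:value pairs as in the excerpt above and that the bridge list is the output of `brctl show | cut -f1`. All function names and the sample data are hypothetical.

```python
def nic_networks(vm_conf_lines):
    """Collect the value of the network: field from every NIC entry."""
    networks = []
    for line in vm_conf_lines:
        if "nicModel:" not in line:
            continue  # not a NIC entry
        for field in line.strip().split(","):
            key, _, value = field.partition(":")
            if key == "network":
                networks.append(value)
    return networks


def host_bridges(brctl_output):
    """Parse `brctl show | cut -f1` output into a set of bridge names."""
    names = (line.strip() for line in brctl_output.splitlines())
    return {name for name in names if name and name != "bridge name"}


def missing_bridges(vm_conf_lines, brctl_output):
    """Return NIC networks that have no bridge of the same name on the host."""
    bridges = host_bridges(brctl_output)
    return [net for net in nic_networks(vm_conf_lines) if net not in bridges]


# Hypothetical sample data modelled on the excerpts above.
VM_CONF = [
    "nicModel:pv,macAddr:00:16:3E:6A:7A:F9,linkActive:true,network:ovirtmgmt",
    "nicModel:pv,macAddr:00:1a:4a:16:01:02,linkActive:true,"
    "network:my_long_evil_network",
]
BRCTL = "bridge name\n;vdsmdummy;\nond2e03dd5745d4\novirtmgmt\n"

print(missing_bridges(VM_CONF, BRCTL))
```

Any name this reports is a network_name that the host only knows under a different vdsm_name.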

Version-Release number of selected component (if applicable):
4.1.6 (not confirmed - HE currently down)
4.2.0 (reproducer)

How reproducible:
100%

Steps to Reproduce:
As above

Actual results:
HostedEngine will not start, environment is down.

Expected results:
HostedEngine Up

Comment 1 Germano Veit Michel 2018-03-29 04:37:24 UTC
Adding workaround to bring the HE VM up. Basically:

# hosted-engine --set-maintenance --mode=global
# sed 's/<network_name>/<vdsm_name>/' /var/run/ovirt-hosted-engine-ha/vm.conf > /root/vm.conf.fixed
# hosted-engine --vm-start --vm-conf=/root/vm.conf.fixed
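
The sed one-liner replaces the first occurrence of the name anywhere on each line. A more targeted variant, sketched here in Python, rewrites only the network: field. It assumes the comma-separated key:value layout shown in the description; `rewrite_network` and `fix_vm_conf` are hypothetical helpers, and the names passed in stand for `<network_name>` and `<vdsm_name>` above.

```python
def rewrite_network(line, network_name, vdsm_name):
    """Replace network:<network_name> with network:<vdsm_name> in one line."""
    fields = line.rstrip("\n").split(",")
    for i, field in enumerate(fields):
        if field == "network:" + network_name:
            fields[i] = "network:" + vdsm_name
    return ",".join(fields)


def fix_vm_conf(src_path, dst_path, network_name, vdsm_name):
    """Write a copy of vm.conf with the NIC network name substituted."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(rewrite_network(line, network_name, vdsm_name) + "\n")
```

For example, `fix_vm_conf("/var/run/ovirt-hosted-engine-ha/vm.conf", "/root/vm.conf.fixed", ...)` would produce the file that the `--vm-conf` option consumes in the last command.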

Comment 2 Dan Kenigsberg 2018-03-29 06:41:46 UTC
hosted-engine should be fixed to query Engine for the on-host identifier of the network.

Until fixed, the best workaround is to connect only standard short-named networks to your Hosted Engine.

Comment 4 Simone Tiraboschi 2018-03-29 08:04:35 UTC
In 4.1 we create a configuration dictionary for VDSM by parsing the VM OVF from the OVF_STORE volume.

In 4.2 we will instead directly consume the libvirt XML as created by the engine, as per https://bugzilla.redhat.com/1504606, but due to another issue (1560666) we temporarily reverted it.

Comment 5 Simone Tiraboschi 2018-03-29 09:04:07 UTC
(In reply to Germano Veit Michel from comment #0)
> To make things worse, I don't see vdsm_name in the OVF, just network_name.

ovirt-ha-agent is simply consuming it from there.


Not reproducible for me with 4.2 beta3:

[root@c74he20180302h1 ~]# virsh -r dumpxml HostedEngine | grep bridge
    <interface type='bridge'>
      <source bridge='ovirtmgmt'/>
    <interface type='bridge'>
      <source bridge='evil_network'/>

[root@c74he20180302h1 ~]# brctl show | cut -f1
bridge name
;vdsmdummy;
evil_network
ovirtmgmt
virbr0


Germano, how did you manage to get a bridge with a name different from the logical network?

Comment 6 Germano Veit Michel 2018-03-29 09:25:45 UTC
(In reply to Simone Tiraboschi from comment #5)
> Germano, how did you managed to get a bridge with a name different from the
> logical network?

Try something with special characters or longer than the kernel netdev IFNAMSIZ. That should force it to use a different vdsm_name; my_long_evil_network should do it. This was renamed from gluster_internal in my tests. Maybe that's why evil_network itself doesn't make the two names differ.

Comment 7 Leon Goldberg 2018-03-29 09:36:17 UTC
I'll just clarify about how/when vdsm-name (the "on-host identifier") is generated.

In a scenario where a network's name is either longer than IFNAMSIZ allows or contains unicode characters, the bridge name on the host will differ from the network name, as the network name cannot be used to name a Linux bridge.

In this scenario, the vdsm name will consist of the prefix "on" and the first 13 characters of the network's GUID, e.g. "ond2e03dd5745d4".

"evil_network" should indeed not cause the bridge's name to differ from "evil_network".

Comment 8 Leon Goldberg 2018-03-29 09:38:08 UTC
(In reply to Leon Goldberg from comment #7)
> I'll just clarify about how/when vdsm-name (the "on-host identifier") is
> generated.
> 
> In a scenario where a network's name is either longer than IFNAMSIZ or
> contains unicode characters, the bridge on the host will different from the
> network as it can not be used to name a Linux bridge.
> 
> In this scenario, vdsm name will consist of the prefix "on" and the first 13
> characters of the network's GUID, e.g. "ond2e03dd5745d4".
> 
> "evil_network" should not indeed cause the bridge's name to be different
> than "evil_network".

"my_long_evil_network" however should as it is longer than IFNAMSIZ (15)

Comment 9 Simone Tiraboschi 2018-03-29 09:49:53 UTC
OK, reproduced with 'very_very_very_very_evil_network_is_it_evil_enough_?', which became 'on9ce625dccc824' on my host.
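
The renaming seen here is consistent with the rule described in comment 7. As a rough Python sketch of that rule (not the engine's actual code): it assumes the 15-character limit from comment 8, treats "unicode characters" as any non-ASCII character, and assumes the GUID's dashes are dropped before the first 13 characters are taken, which is inferred from the example names in this report. `vdsm_name` is a hypothetical function, and only the first 13 hex digits of the sample GUID come from this report; the rest is made up.

```python
import uuid

# Longest usable Linux interface name (IFNAMSIZ is 16 including the NUL).
MAX_BRIDGE_NAME_LEN = 15


def vdsm_name(network_name, network_id):
    """Return the on-host identifier for a logical network (sketch)."""
    is_ascii = all(ord(ch) < 128 for ch in network_name)
    if is_ascii and len(network_name) <= MAX_BRIDGE_NAME_LEN:
        return network_name  # usable as a bridge name as-is
    hex_id = uuid.UUID(str(network_id)).hex  # GUID without dashes
    return "on" + hex_id[:13]


# Hypothetical GUID: only its first 13 hex digits appear in this report.
SAMPLE_GUID = "9ce625dc-cc82-4000-8000-000000000000"

for name in ("evil_network",
             "my_long_evil_network",
             "very_very_very_very_evil_network_is_it_evil_enough_?"):
    print(name, "->", vdsm_name(name, SAMPLE_GUID))
```

The generated identifier is always 15 characters long ("on" plus 13 hex digits), so it fits the interface name limit.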

In the OVF on the OVF_STORE we have:

   <NetworkSection>
      <Info>List of networks</Info>
      <Network ovf:name="ovirtmgmt" />
      <Network ovf:name="very_very_very_very_evil_network_is_it_evil_enough_?" />
   </NetworkSection>
...
         <Item>
            <rasd:Caption>Ethernet adapter on very_very_very_very_evil_network_is_it_evil_enough_?</rasd:Caption>
            <rasd:InstanceId>f7cfe938-eaa2-4352-8ae3-dd510f44de58</rasd:InstanceId>
            <rasd:ResourceType>10</rasd:ResourceType>
            <rasd:OtherResourceType>very_very_very_very_evil_network_is_it_evil_enough_?</rasd:OtherResourceType>
            <rasd:ResourceSubType>3</rasd:ResourceSubType>
            <rasd:Connection>very_very_very_very_evil_network_is_it_evil_enough_?</rasd:Connection>
            <rasd:Linked>true</rasd:Linked>
            <rasd:Name>nic1</rasd:Name>
            <rasd:ElementName>nic1</rasd:ElementName>
            <rasd:MACAddress>00:1a:4a:16:01:00</rasd:MACAddress>
            <rasd:speed>1000</rasd:speed>
            <Type>interface</Type>
            <Device>bridge</Device>
            <rasd:Address>{type=pci, slot=0x09, bus=0x00, domain=0x0000, function=0x0}</rasd:Address>
            <BootOrder>0</BootOrder>
            <IsPlugged>true</IsPlugged>
            <IsReadOnly>false</IsReadOnly>
            <Alias>net1</Alias>
            <SpecParams>
               <inbound />
               <outbound />
            </SpecParams>
         </Item>


The libvirt XML generated by the engine, by contrast, looks safe, since it contains:
      <interface type="bridge">
         <model type="virtio" />
         <link state="up" />
         <source bridge="on9ce625dccc824" />
         <alias name="ua-f7cfe938-eaa2-4352-8ae3-dd510f44de58" />
         <address bus="0x00" domain="0x0000" function="0x0" slot="0x09" type="pci" />
         <mac address="00:1a:4a:16:01:00" />
         <filterref filter="vdsm-no-mac-spoofing" />
         <bandwidth />
      </interface>

but we still cannot consume it because of https://bugzilla.redhat.com/1560666
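
To confirm which bridge names a given libvirt XML references (for instance the output of `virsh -r dumpxml HostedEngine` used in comment 5), a short Python sketch can list the `<source bridge=...>` values and flag any that exceed 15 characters. The sample document is a trimmed, hypothetical wrapper around the interface element quoted above; `bridge_names` and `too_long` are illustrative names.

```python
import xml.etree.ElementTree as ET


def bridge_names(domain_xml):
    """Return the bridge of every <interface type="bridge"> element."""
    root = ET.fromstring(domain_xml)
    names = []
    for iface in root.iter("interface"):
        if iface.get("type") != "bridge":
            continue
        source = iface.find("source")
        if source is not None and source.get("bridge"):
            names.append(source.get("bridge"))
    return names


def too_long(names, limit=15):
    """Return the names that cannot be used as Linux interface names."""
    return [name for name in names if len(name) > limit]


# Hypothetical, trimmed domain XML built around the element quoted above.
SAMPLE_XML = """
<domain type="kvm">
  <devices>
    <interface type="bridge">
      <source bridge="ovirtmgmt" />
    </interface>
    <interface type="bridge">
      <source bridge="on9ce625dccc824" />
      <mac address="00:1a:4a:16:01:00" />
    </interface>
  </devices>
</domain>
"""

print(bridge_names(SAMPLE_XML))
print(too_long(bridge_names(SAMPLE_XML)))
```

An empty second list means every referenced bridge name fits the interface name limit, which is what the engine-generated XML achieves by using the vdsm name.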

Comment 10 Germano Veit Michel 2018-03-30 01:09:18 UTC
Looks good Simone.

Seems a bit too much to backport to 4.1, so I'm fine with having this fixed only in 4.2. Support can easily handle this via KCS as the workaround to bring HE up is quite simple.

If you want to make this BZ depend on the other one, that's fine.

Comment 13 Martin Sivák 2018-04-09 08:25:38 UTC
Just FYI, the libvirtxml support was reenabled for hosted engine last week.

Comment 14 Dan Kenigsberg 2018-04-10 13:28:25 UTC
Based on comment 13, there should be no problem in 4.2.z (but let's let QA test that).

Based on comment 10, we will not fix the bug in 4.1.z.

Comment 16 Michael Burman 2018-04-11 07:52:20 UTC
Meital, can you please ack this bug?

Comment 18 Meni Yakove 2018-04-23 08:36:56 UTC
Bug was added manually to the Erratum by Leon.

Comment 19 Nikolai Sednev 2018-04-23 10:09:40 UTC
SHE-VM with network names longer than 15 characters gets started on ha-hosts after being shut down.
Moving to verified.
Works for me on these components:
ovirt-engine-setup-4.2.3.2-0.1.el7.noarch
rhvm-appliance-4.2-20180420.0.el7.noarch
ovirt-hosted-engine-setup-2.2.18-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)


I named the new network "asdfghjklqwertyuio" (18 characters long); a vNIC profile with the same name was created along with it.
I mapped the new network to both ha-hosts.
I attached a new vNIC on the new network to the engine VM.
I moved the environment to global maintenance and ran vm-shutdown on the hosting ha-host.
I left global maintenance and waited for ha-agent to start the engine.
The engine started and I was able to log in.

Comment 23 errata-xmlrpc 2018-05-15 17:32:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1472

Comment 24 Franta Kust 2019-05-16 13:09:37 UTC
BZ<2>Jira Resync

Comment 25 Red Hat Bugzilla 2023-09-15 00:07:13 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

