Bug 1512478

Summary: Fedora host invalid error: Host $HOSTNAME does not comply with the cluster $CLUSTER networks, the following networks are missing on host: 'ovirtmgmt'
Product: [oVirt] ovirt-engine
Component: Host-Deploy
Version: 4.2.0
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Reporter: John Boero <boeroboy>
Assignee: Edward Haas <edwardh>
QA Contact: Daniel Gur <dagur>
CC: boeroboy, bugs, danken, lsvaty, mailinglists, pasik, ylavi
Flags: sbonazzo: ovirt-4.3-
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Type: Bug
oVirt Team: Network
Bug Blocks: 1460625
Last Closed: 2018-08-08 08:10:16 UTC

Attachments:
engine.log
vdsm.log (last 60 lines including host reinstall)
supervdsm.log

Description John Boero 2017-11-13 10:25:39 UTC
Description of problem:
I've successfully run my Fedora 25 hosts with earlier master snapshots, but new builds fail with "the following networks are missing on host: 'ovirtmgmt'", even though the ovirtmgmt bridge is correctly configured.


Version-Release number of selected component (if applicable):
vdsm
Version     : 4.20.3
Release     : 47.git4801123.fc25

How reproducible:
Always
  


Steps to Reproduce:
1. Add a F25 host to a functional cluster using management network "ovirtmgmt".
2. If necessary, try again after creating the ovirtmgmt bridge manually and adding the network device to it (a sketch of this follows the list).
3. Observe the failure late in the configuration events: the host is reported as non-compliant and missing the network "ovirtmgmt".
4. Verify the ovirtmgmt network is configured.
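
For reference, a minimal sketch of the manual bridge creation mentioned in step 2, assuming the NIC is enp1s0 and that the bridge takes its address via DHCP (interface names and addressing are illustrative, not taken from this report):

# Create the ovirtmgmt bridge and enslave the NIC (illustrative; adjust names).
ip link add name ovirtmgmt type bridge
ip link set enp1s0 master ovirtmgmt
ip link set ovirtmgmt up
# Move the host's address onto the bridge, e.g. via DHCP.
dhclient ovirtmgmt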

Actual results:
Host goes unresponsive because the engine detects it as missing ovirtmgmt.  Host details still populate, since vdsmd and its dependent services are running fine.

Expected results:
Host should go active and ovirtmgmt should be detected.

Additional info:
I've tried to delve into the network detection code, but it's hard to follow just by tracing the error, which is defined as VDS_SET_NONOPERATIONAL_NETWORK.  Any help from the devs would be great.  I know it's Fedora, but please don't just tell me to go use EL7; I'm trying to forge the way ahead.
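
Since the engine raises VDS_SET_NONOPERATIONAL_NETWORK based on the networks the host reports back, one way to narrow this down is to look at what VDSM itself claims to have. A hedged sketch, assuming vdsm-client is available on the host; the grep filter is only illustrative:

# Dump the capabilities VDSM reports to the engine.
vdsm-client Host getCapabilities > caps.json
# If "ovirtmgmt" does not appear in the networks section here, the engine will
# flag the host as missing it even though the Linux bridge itself exists.
grep -B2 -A5 'ovirtmgmt' caps.json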

 jboero  z600    ~  $  ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
    link/ether 78:e7:d1:c5:d9:6f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7ae7:d1ff:fec5:d96f/64 scope link
       valid_lft forever preferred_lft forever
18: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 78:e7:d1:c5:d9:6f brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.25/24 brd 192.168.2.255 scope global dynamic ovirtmgmt
       valid_lft 85665sec preferred_lft 85665sec
    inet6 fe80::7ae7:d1ff:fec5:d96f/64 scope link
       valid_lft forever preferred_lft forever
19: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:70:63:c7:bc brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
20: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 16:ba:c8:d5:e6:39 brd ff:ff:ff:ff:ff:ff

Comment 1 John Boero 2017-11-13 10:26:37 UTC
I have an earlier host with the same configuration and a previous master-snapshot build, and it is working fine.

Comment 2 Sandro Bonazzola 2017-11-13 10:57:12 UTC
Please note Fedora 25 is not supported on master.
Right now, only CentOS 7 is supported for master.

Comment 3 John Boero 2017-11-13 11:26:45 UTC
Is Fedora 25 supported at all on the 4.2 alpha?

Comment 4 Sandro Bonazzola 2017-11-13 12:26:36 UTC
(In reply to John Boero from comment #3)
> Is Fedora 25 supported at all on the 4.2 alpha?

No

Comment 5 John Boero 2017-11-13 14:11:45 UTC
Then why is there a Fedora build out there for 4.2 master?  Just trying to help.  Not using this in production.

Comment 6 Sandro Bonazzola 2017-11-13 14:23:27 UTC
(In reply to John Boero from comment #5)
> Then why is there a Fedora build out there for 4.2 master?  Just trying to
> help.  Not using this in production.

And this is more than welcome! We'd like to be able to run oVirt on Fedora too, but we lack the resources to keep pace with the changes happening in Fedora.
So any contribution toward supporting Fedora is more than welcome.

We keep building packages for various Fedora releases so that people like you can test them and help get oVirt running on Fedora.

Comment 7 Dan Kenigsberg 2017-11-17 21:41:35 UTC
Please supply {super,}vdsm.log from the added host, and engine.log from Engine.

Comment 8 John Boero 2017-11-19 21:38:14 UTC
Hi, I'll upload these this week.  Please wait a bit.
Thanks

Comment 9 John Boero 2017-11-21 14:12:23 UTC
Created attachment 1356715 [details]
engine.log

FQDN has been sanitized.

Comment 10 John Boero 2017-11-21 14:13:51 UTC
Created attachment 1356716 [details]
vdsm.log (last 60 lines including host reinstall)

Comment 11 John Boero 2017-11-21 14:15:07 UTC
Created attachment 1356717 [details]
supervdsm.log

FQDN has been scrubbed.

Comment 12 John Boero 2018-04-03 07:44:48 UTC
Further on this, I've encountered another instance on a CentOS host:

! Network ovirtmgmt is not attached to any interface on host $HOST.
X Host $HOST installation failed. Failed to configure management network on the host.
! Network ovirtmgmt is not attached to any interface on host $HOST.



I have 4x10GbE NICs teamed and attached as the only interface on the management network, which carries the default route.  The Ansible playbooks don't seem to care.  Anybody?



~  brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;             8000.000000000000       no
ovirtmgmt               8000.001b21da472b       yes             team0
~  ip route show
default via 192.168.2.1 dev ovirtmgmt proto static metric 425 
169.254.0.0/16 dev enp1s0 scope link metric 1002 
192.168.2.0/24 dev enp1s0 proto kernel scope link src 192.168.2.110 
192.168.2.0/24 dev ovirtmgmt proto kernel scope link src 192.168.2.100 metric 425

Comment 13 Sam McLeod 2018-05-04 00:47:01 UTC
I've been having this exact same issue with oVirt 4.2 on CentOS 7 for the past 2-3 weeks and haven't been able to put my finger on it.

Comment 14 Edward Haas 2018-05-04 01:26:29 UTC
(In reply to John Boero from comment #12)

> I have 4x10GBE Teamed and attached as the only interface to management
> network, which is the default route.  The Ansible playbooks don't seem to
> care.  Anybody?
> 
> ~  brctl show
> bridge name     bridge id               STP enabled     interfaces
> ;vdsmdummy;             8000.000000000000       no
> ovirtmgmt               8000.001b21da472b       yes             team0

The original issue in this bug had to do with a failure to define the ovirtmgmt network. The supervdsm logs showed that after adding the bridge over the NIC, connectivity to the host was lost (perhaps it got a different IP address from DHCP).
Attempting to create the bridge manually is not enough, as VDSM also persists network data and uses some of that information to report back to Engine.
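
To make the distinction concrete, a hedged illustration: the hand-made bridge is visible to the kernel, but the data VDSM reports to Engine comes from its own configuration store (the persistence path below is given from memory and should be verified on the affected host):

# Kernel view: the manually created bridge is visible here.
ip -d link show ovirtmgmt
# VDSM view: networks it set up and persisted live under its own store;
# a hand-made bridge leaves this empty (path assumed, verify locally).
ls /var/lib/vdsm/persistence/netconf/nets/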

Regarding the last attempt: unfortunately, we still do not support team link aggregation; only bonding is supported.
We have had no practical requirement to add support for it so far; I think all users' needs have been answered by bonding.
Could you try this with bonds?
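
For anyone retrying with a bond instead of a team, a minimal sketch using initscripts-style ifcfg files (device names, the 802.3ad mode, and DHCP addressing are illustrative assumptions, not values from this bug):

# /etc/sysconfig/network-scripts/ifcfg-bond0 (illustrative)
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=802.3ad miimon=100"
BOOTPROTO=dhcp
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-enp1s0 (repeat for each member NIC)
DEVICE=enp1s0
MASTER=bond0
SLAVE=yes
ONBOOT=yes

With the bond in place, host deploy should be able to create ovirtmgmt on top of bond0, so the bridge itself should not need to be built by hand.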

Comment 15 John Boero 2018-05-08 07:33:55 UTC
That's too bad.  Teams have much better 802.3ad performance capabilities and are finally supported by NetworkManager.  It feels like the more automation there is around oVirt installation, the less control there is over the configuration.  If the core unit of oVirt networking is a bridge, and a bridge can transparently sit on top of a team or a bond, why does this matter?

Looking forward to teams being supported - actually, looking forward to oVirt finally dumping its archaic network config for NetworkManager.  I'll have to re-install with bonds for now, which is frustrating.

Thanks folks.

Comment 16 Yaniv Lavi 2018-08-08 08:10:16 UTC
We do not currently plan to support teams.
If we move to NM-based networking, we will be able to reconsider this.