Bug 1667411

Summary: ovirtmgmt always out of sync on ovirt node
Product: [oVirt] ovirt-engine
Reporter: Netbulae <info>
Component: General
Assignee: Dominik Holler <dholler>
Status: CLOSED DUPLICATE
QA Contact: Michael Burman <mburman>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.3.0
CC: bugs, dholler, info
Target Milestone: ---
Keywords: Regression
Target Release: ---
Flags: rule-engine: ovirt-4.3+, rule-engine: blocker+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-18 14:46:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
supervdsm.log (flags: none)
vdsm.log (flags: none)

Description Netbulae 2019-01-18 11:56:28 UTC
Description of problem:

Hi,

We're switching from CentOS 7 hosts to oVirt Node NG (tried 4.2.7, 4.3rc1 and 4.3rc2), and after adding them to oVirt (currently on 4.3rc2) the ovirtmgmt interface is always out of sync.

I tried synchronizing the network and refreshing the host capabilities. I also tried removing ovirtmgmt from the interface and adding it to a bond, but then I get this error:

Cannot setup Networks. The following Network definitions on the Network Interface are different than those on the Logical Network. Please synchronize the Network Interface before editing network ovirtmgmt. The non-synchronized values are: ${OUTAVERAGELINKSHARE} ${HOST_OUT_OF_SYNC} - null, ${DC_OUT_OF_SYNC} - 50

I can set up the bond at install time so that ovirtmgmt uses it, which lets me use the host, but I'm hesitant to do this in production because I cannot change the interface afterwards.
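
For reference, here is a minimal sketch (not from this report) that asks the engine which properties of a host's network attachments are out of sync, using the oVirt Python SDK (ovirtsdk4). The engine URL, credentials and host name are placeholders, and the in_sync/reported_configurations attribute names are taken from the v4 API model, so treat this as an assumption rather than a verified recipe.

# Sketch: list out-of-sync network attachments for one host via ovirtsdk4.
# Connection details and the host name are placeholders.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder engine URL
    username='admin@internal',
    password='secret',
    ca_file='ca.pem',
)
try:
    system = connection.system_service()
    host = system.hosts_service().list(search='name=node1.example.com')[0]
    attachments = system.hosts_service().host_service(host.id) \
        .network_attachments_service().list()
    for att in attachments:
        if att.in_sync:
            continue
        net = system.networks_service().network_service(att.network.id).get()
        print('Network %s is out of sync:' % net.name)
        for rc in att.reported_configurations or []:
            if not rc.in_sync:
                # e.g. the QoS value expected by the DC vs. the value on the host
                print('  %s: expected=%s actual=%s'
                      % (rc.name, rc.expected_value, rc.actual_value))
finally:
    connection.close()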

Comment 1 Dominik Holler 2019-01-18 12:04:05 UTC
Would you please share the vdsm.log from the affected host?
Especially the lines containing hostQos or setupNetworks are relevant,
but best would be the whole vdsm.log and supervdsm.log files, and the relevant part of engine.log.

If possible, can you please check what happens if you change the 50 to another value
in the Weighted Share field of the QoS under
Compute > Data Centers > xxx > Host Network?
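
As an aside, the same change can probably be made outside the UI; below is a minimal sketch with the oVirt Python SDK (ovirtsdk4), assuming a host-network QoS entry defined on the data center. The engine URL, credentials, data-center name and the outbound_average_linkshare attribute (taken here to be the "Weighted Share" field) are assumptions/placeholders.

# Sketch: bump a host-network QoS "Weighted Share" from 50 to 60 via ovirtsdk4.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder
    username='admin@internal',
    password='secret',
    ca_file='ca.pem',
)
try:
    dcs = connection.system_service().data_centers_service()
    dc = dcs.list(search='name=Default')[0]          # placeholder DC name
    qoss = dcs.data_center_service(dc.id).qoss_service()
    for qos in qoss.list():
        if qos.type == types.QosType.HOSTNETWORK and qos.outbound_average_linkshare == 50:
            # Change the weighted share of this QoS entry from 50 to 60.
            qoss.qos_service(qos.id).update(types.Qos(outbound_average_linkshare=60))
            print('Updated QoS %s' % qos.name)
finally:
    connection.close()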

Comment 2 Netbulae 2019-01-18 13:45:47 UTC
(In reply to Dominik Holler from comment #1)
> Would you please share the vdsm.log from the affected host?
> Especially the lines containing hostQos or setupNetworks are relevant,
> but best would be the whole vdsm.log and supervdsm.log files, and the
> relevant part of engine.log.
> 

vdsm.log and supervdsm.log attached.

Part of the engine.log:

2019-01-18 14:38:53,772+01 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engine-Thread-2919) [36476bcb] Failed to acquire lock and wait lock 'HostEngineLock:{exclusiveLocks='[3592e846-1470-468d-986c-3594af9527cc=VDS_INIT]', sharedLocks=''}'
2019-01-18 14:38:53,872+01 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engine-Thread-2919) [36476bcb] Failed to acquire lock and wait lock 'HostEngineLock:{exclusiveLocks='[3592e846-1470-468d-986c-3594af9527cc=VDS_INIT]', sharedLocks=''}'
2019-01-18 14:38:53,966+01 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engine-Thread-2919) [36476bcb] Failed to acquire lock and wait lock 'HostEngineLock:{exclusiveLocks='[3592e846-1470-468d-986c-3594af9527cc=VDS_INIT]', sharedLocks=''}'
2019-01-18 14:38:53,998+01 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engine-Thread-2919) [36476bcb] Failed to acquire lock and wait lock 'HostEngineLock:{exclusiveLocks='[3592e846-1470-468d-986c-3594af9527cc=VDS_INIT]', sharedLocks=''}'
2019-01-18 14:38:54,002+01 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engine-Thread-2919) [36476bcb] Failed to acquire lock and wait lock 'HostEngineLock:{exclusiveLocks='[3592e846-1470-468d-986c-3594af9527cc=VDS_INIT]', sharedLocks=''}'
2019-01-18 14:38:54,335+01 INFO  [org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand] (default task-354) [c2c9ff12-a918-479b-aaf8-af419a1d81e2] Lock Acquired to object 'EngineLock:{exclusiveLocks='[HOST_NETWORK25dff3be-f2b7-4d42-8240-fd2959d84f5c=HOST_NETWORK]', sharedLocks=''}'
2019-01-18 14:38:54,397+01 WARN  [org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand] (default task-354) [c2c9ff12-a918-479b-aaf8-af419a1d81e2] Validation of action 'HostSetupNetworks' failed for user bla@******-authz. Reasons: VAR__ACTION__SETUP,VAR__TYPE__NETWORKS,OUTAVERAGELINKSHARE,HOST_OUT_OF_SYNC,DC_OUT_OF_SYNC,$OUT_OF_SYNC_VALUES ${OUTAVERAGELINKSHARE} ${HOST_OUT_OF_SYNC} - null, ${DC_OUT_OF_SYNC} - 50,NETWORK_NOT_IN_SYNC,$NETWORK_NOT_IN_SYNC ovirtmgmt
2019-01-18 14:38:54,398+01 INFO  [org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand] (default task-354) [c2c9ff12-a918-479b-aaf8-af419a1d81e2] Lock freed to object 'EngineLock:{exclusiveLocks='[HOST_NETWORK25dff3be-f2b7-4d42-8240-fd2959d84f5c=HOST_NETWORK]', sharedLocks=''}'
2019-01-18 14:38:54,401+01 INFO  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedThreadFactory-engine-Thread-2919) [36476bcb] Failed to acquire lock and wait lock 'HostEngineLock:{exclusiveLocks='[3592e846-1470-468d-986c-3594af9527cc=VDS_INIT]', sharedLocks=''}'

> If possible, can you please check what happens if you change the 50 to
> another value in the Weighted Share field of the QoS under
> Compute > Data Centers > xxx > Host Network?

It's a clone from production and we have about 10 QoS policies on it. So I changed the MGMT QoS from 50 to 60, but it didn't help. That was the only QoS set to 50.

Comment 3 Netbulae 2019-01-18 14:23:23 UTC
Created attachment 1521552 [details]
supervdsm.log

noob mistake, forgot to attach :-&

Comment 4 Netbulae 2019-01-18 14:23:45 UTC
Created attachment 1521554 [details]
vdsm.log

Comment 5 Dominik Holler 2019-01-18 14:46:17 UTC
Thanks for the log files. They show that the cluster has switch type OVS.
Unfortunately, QoS is not yet supported on clusters with switch type OVS.
This is tracked in bug 1380271 and documented in
https://ovirt.org/develop/release-management/features/network/openvswitch/native-openvswitch.html#limitations .

To avoid the out-of-sync state, either the switch type of the cluster has to be changed or the QoS has to be removed from the management network.

*** This bug has been marked as a duplicate of bug 1380271 ***
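
To make the conflicting combination easier to spot, here is a minimal sketch (an assumption, not part of the original comment) that lists OVS-switch-type clusters whose data center also contains networks carrying a host-network QoS. The network.qos link is taken from the v4 API model, and all connection details are placeholders.

# Sketch: find OVS clusters that share a DC with QoS-carrying networks.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder
    username='admin@internal',
    password='secret',
    ca_file='ca.pem',
)
try:
    system = connection.system_service()
    ovs_clusters = [c for c in system.clusters_service().list()
                    if c.switch_type == types.SwitchType.OVS]
    qos_networks = [n for n in system.networks_service().list()
                    if n.qos is not None]   # networks with a host-network QoS
    for cluster in ovs_clusters:
        for net in qos_networks:
            if net.data_center.id == cluster.data_center.id:
                print('Cluster %s (OVS) is in the same DC as QoS network %s'
                      % (cluster.name, net.name))
finally:
    connection.close()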

Comment 6 Netbulae 2019-01-18 15:06:48 UTC
Hmm, that will be a problem for us in production.

We have two clusters, an old and a new one.

The old cluster has limited network bandwidth and we have the customer VLANs on the same interface as management. That's why we use the QoS.

Also, because of all the switch configuration changes for customer networks, we would like to use the OVS setup in the new cluster.

Currently, as we cannot have both QoS and OVS in the same DC, we will have problems in the old cluster with customers able to use all the bandwidth of the management NIC, or the scaling issues with the LEGACY network mode...

Why is the QoS only for networks in the complete DC and not for each cluster?

Comment 7 Dominik Holler 2019-01-18 15:13:52 UTC
(In reply to Netbulae from comment #6)
> Hmm, that will be a problem for us in production.
> 
> We have two clusters, an old and a new one.
> 
> The old cluster has limited network bandwidth and we have the customer VLANs
> on the same interface as management. That's why we use the QoS.
> 
> Also, because of all the switch configuration changes for customer networks,
> we would like to use the OVS setup in the new cluster.
> 
> Currently, as we cannot have both QoS and OVS in the same DC, we will have
> problems in the old cluster with customers able to use all the bandwidth of
> the management NIC, or the scaling issues with the LEGACY network mode...
> 
> Why is the QoS only for networks in the complete DC and not for each cluster?

Would assigning the Management role to a new logical network (instead of ovirtmgmt)
just in the OVS cluster help?
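
For illustration, here is a minimal sketch of what "assigning the Management role to a new logical network just in the OVS cluster" could look like via the oVirt Python SDK; the update call on the cluster network sub-service, the cluster name and the network name ovirtmgmt-OVS (the name used later in this report) are assumptions/placeholders.

# Sketch: move the Management usage to another logical network in one cluster.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder
    username='admin@internal',
    password='secret',
    ca_file='ca.pem',
)
try:
    system = connection.system_service()
    cluster = system.clusters_service().list(search='name=ovs-cluster')[0]  # placeholder
    cluster_nets = system.clusters_service().cluster_service(cluster.id).networks_service()
    for net in cluster_nets.list():
        if net.name == 'ovirtmgmt-OVS':   # the new management network
            # Assumed to replace the usages of this network within the cluster.
            cluster_nets.network_service(net.id).update(
                types.Network(usages=[types.NetworkUsage.MANAGEMENT,
                                      types.NetworkUsage.VM])
            )
finally:
    connection.close()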

Comment 8 Netbulae 2019-01-18 16:54:31 UTC
Yes, thanks that works!

I had to remove all the nodes and reinstall them but I do that many times when testing anyway ;-)

Comment 9 Dominik Holler 2019-01-19 09:24:30 UTC
(In reply to Netbulae from comment #8)
> Yes, thanks that works!
> 
> I had to remove all the nodes and reinstall them but I do that many times
> when testing anyway ;-)

Thanks for the feedback!

Comment 10 Netbulae 2019-01-24 12:24:16 UTC
As this is a duplicate of a bug that has been closed for years, can we at least have QoS for OVS back on the planning?

If it's a "won't fix" then sombody should update the page at least:

https://ovirt.org/develop/release-management/features/network/openvswitch/native-openvswitch.html

Network Entities

In the following table, networking entities are listed with their implementation options.

Entity | Kernel | OVS | Remark
QoS    | X      | X   |


And the link in "Host QoS - OvS supports different parameters for configuring QoS, which are incompatible with the ones used for a Linux bridge." brings me back to bug 1380271:

Dan Kenigsberg 2016-12-05 06:24:36 UTC
Deferring OvS bugs until they are deemed a priority again.

Comment 11 Netbulae 2019-01-24 12:55:24 UTC
Also, I have an issue removing the ovirtmgmt network assignment from the cluster: "Error while executing action: The Management Network ('ovirtmgmt') is mandatory and cannot be removed".

I have it set as non-required now, but it's still assigned.

Comment 12 Dominik Holler 2019-01-26 21:40:09 UTC
(In reply to Netbulae from comment #11)
> Also, I have an issue removing the ovirtmgmt network assignment from the
> cluster: "Error while executing action: The Management Network ('ovirtmgmt')
> is mandatory and cannot be removed".
> 
> I have it set as non-required now, but it's still assigned.

Please ensure that the Management role is assigned to another network in
Compute > Clusters > xxx > Logical Networks > Manage Networks.
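
For completeness, a minimal sketch (an assumption, not from the comment) that prints which logical network currently carries the Management role in each cluster, mirroring the check under Compute > Clusters > xxx > Logical Networks > Manage Networks. Connection details are placeholders.

# Sketch: show the management network per cluster via ovirtsdk4.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder
    username='admin@internal',
    password='secret',
    ca_file='ca.pem',
)
try:
    system = connection.system_service()
    for cluster in system.clusters_service().list():
        nets = system.clusters_service().cluster_service(cluster.id) \
            .networks_service().list()
        mgmt = [n.name for n in nets
                if n.usages and types.NetworkUsage.MANAGEMENT in n.usages]
        print('%s: management network(s) %s' % (cluster.name, mgmt or 'none'))
finally:
    connection.close()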

Comment 13 Netbulae 2019-01-27 16:34:00 UTC
Yes, I have it assigned to ovirtmgmt-OVS, and I want to remove the original ovirtmgmt from this cluster so people don't mistakenly use that one when deploying more nodes.