Bug 1528853 - [TEXT] Host becomes non-operational if it has an un-synced network with vm<>non-VM difference
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Network
Version: 4.2.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ovirt-4.2.2
Target Release: ---
Assignee: Alona Kaplan
QA Contact: Michael Burman
URL:
Whiteboard:
Duplicates: 1285785
Depends On:
Blocks:
Reported: 2017-12-24 15:43 UTC by Michael Burman
Modified: 2018-03-29 11:18 UTC (History)

Fixed In Version: ovirt-engine-4.2.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-29 11:18:35 UTC
oVirt Team: Network
Embargoed:
rule-engine: ovirt-4.2+
ylavi: blocker-
ylavi: exception+


Attachments
Logs (1.18 MB, application/x-gzip)
2017-12-24 15:43 UTC, Michael Burman
engine log, new reproduction (391.33 KB, application/x-gzip)
2018-01-14 12:26 UTC, Michael Burman
failed qa engine log (478.71 KB, application/x-gzip)
2018-02-18 12:16 UTC, Michael Burman


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 87175 0 master MERGED engine: Expand 'unsync vm<->no-vm' message 2018-02-08 12:40:34 UTC
oVirt gerrit 87337 0 ovirt-engine-4.2 MERGED engine: Expand 'unsync vm<->no-vm' message 2018-02-08 13:58:17 UTC

Description Michael Burman 2017-12-24 15:43:14 UTC
Created attachment 1371898 [details]
Logs

Description of problem:
Host becomes non-operational if it has an un-synced network with vm<>non-VM difference.

It's not possible to activate a host that has an attached network which is un-synced because of a difference in the bridge property (false/true).

If you try to activate the host before syncing the network, the host becomes non-operational.

Version-Release number of selected component (if applicable):
4.2.0.2-0.1.el7
vdsm-4.20.9.3-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create 2 DCs, run host in DC1
2. Create network called 'net1' on both DCs, on DC1 set it as VM network, on DC2 set it as non-VM network. Attach net1 to host in DC1.
3. Set the host to maintenance and move it to DC2 (net1 is now out-of-sync, since it is a non-VM network in DC2)
4. Try to activate host

Actual results:
Host becomes non-operational. It can be brought up again only after the network is synced.

Expected results:
Host should be up and operational even if the network has not been synced yet. It shouldn't be non-operational.

Comment 1 Dan Kenigsberg 2017-12-24 19:21:49 UTC
is "net1" a required network? If it is, I think this is not a bug.

Comment 2 Red Hat Bugzilla Rules Engine 2017-12-24 19:21:55 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 3 Michael Burman 2017-12-25 06:36:02 UTC
(In reply to Red Hat Bugzilla Rules Engine from comment #2)
> This bug report has Keywords: Regression or TestBlocker.
> Since no regressions or test blockers are allowed between releases, it is
> also being identified as a blocker for this release. Please resolve ASAP.

(In reply to Dan Kenigsberg from comment #1)
> is "net1" a required network? If it is, I think this is not a bug.

I should have mentioned this important detail: 'net1' isn't a required network, and that is why it is a bug.

Comment 4 Michael Burman 2017-12-25 06:37:02 UTC
It's not possible to make the host operational without syncing the network.

Comment 5 Dan Kenigsberg 2017-12-27 09:21:59 UTC
Taking back to 4.2 because it hurts our QE automation.

Comment 6 Alona Kaplan 2018-01-08 10:07:45 UTC
Hi Michael,

I'm closing the bug since you and me cannot reproduce it on the latest engine.

If you manage to reproduce it, please reopen.

Comment 7 Michael Burman 2018-01-14 12:24:39 UTC
(In reply to Alona Kaplan from comment #6)
> Hi Michael,
> 
> I'm closing the bug since you and me cannot reproduce it on the latest
> engine.
> 
> If you manage to reproduce it, please reopen.

Managed to reproduce on 4.2.1.1-0.1.el7 with the same steps described in comment #0.

Comment 8 Michael Burman 2018-01-14 12:26:00 UTC
Created attachment 1380986 [details]
engine log, new reproduction

Comment 9 Alona Kaplan 2018-01-21 14:10:32 UTC
To reproduce the issue, the network should be a 'vm' network in the DC and a 'non-vm' network on the host.

(Step 2 in the bug description should be changed to - Create network called 'net1' on both DCs, on DC1 set it as non-VM network, on DC2 set it as VM network. Attach net1 to host in DC1.)

In this case, the host is marked as non-operational, no matter whether the network is required or not.
Maybe we should consider keeping this logic only for required networks, but it doesn't seem to me like a regression; this code hasn't changed since 2014.
Are you sure in previous versions the behaviour was different?
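
The behaviour described in this comment can be modelled with a small sketch. This is not the ovirt-engine code: `Network`, `vm_networks_set_as_non_vm` and `host_status` are hypothetical names invented for illustration, and the sketch only models the one direction named above (VM network in the DC/cluster, non-VM network on the host), with the `required` flag deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    """Hypothetical model of a network definition; not an ovirt-engine class."""
    name: str
    vm_network: bool        # True for a VM network, False for a non-VM network
    required: bool = False  # carried along only to show that it plays no role here


def vm_networks_set_as_non_vm(cluster_nets, host_nets):
    """Names of networks that are VM networks in the cluster but non-VM on the host."""
    on_host = {n.name: n for n in host_nets}
    return sorted(
        c.name
        for c in cluster_nets
        if c.vm_network and c.name in on_host and not on_host[c.name].vm_network
    )


def host_status(cluster_nets, host_nets):
    """Any such mismatch, required or not, yields NonOperational in this model."""
    return "NonOperational" if vm_networks_set_as_non_vm(cluster_nets, host_nets) else "Up"


# Corrected step 2: net1 is non-VM on the host (attached in DC1) and VM in DC2.
cluster = [Network("net1", vm_network=True, required=False)]
host = [Network("net1", vm_network=False)]
print(host_status(cluster, host))

# Opposite direction (VM on the host, non-VM in the DC) is not flagged by this check.
print(host_status([Network("net1", vm_network=False)], [Network("net1", vm_network=True)]))
```

Restricting the check to required networks, as suggested above, would amount to adding `c.required` to the filter condition in this model.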

Comment 10 Michael Burman 2018-01-21 14:54:18 UTC
Hi Alona,

I understand, you are right. This happens on 4.1.9 as well, so it's not a regression.

Currently the audit log will show - 
- Host navy-vds1.qa.lab.tlv.redhat.com does not comply with the cluster Cluster1 networks, the following VM networks are non-VM networks: 'net-1'
- Host navy-vds1.qa.lab.tlv.redhat.com's following network(s) are not synchronized with their Logical Network configuration: net-1'
- Status of host navy-vds1.qa.lab.tlv.redhat.com was set to NonOperational.

Meni, how would you like to proceed?

Comment 11 Meni Yakove 2018-01-21 15:01:20 UTC
We can leave it as is. If the host cannot run VMs, it should be NonOperational.
But we need to make sure that the NonOperational host status is tied to the VM/non-VM network errors.

Comment 12 Dan Kenigsberg 2018-01-22 13:44:35 UTC
Moved to 4.2.1 by mistake.

Comment 13 Alona Kaplan 2018-02-05 16:45:54 UTC
Changed the first message to -

'Host navy-vds1.qa.lab.tlv.redhat.com does not comply with the cluster Cluster1 networks, the following VM networks are non-VM networks: 'net-1'. The host will become NonOperational.'
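
As a rough illustration of the expanded text, the message can be assembled as below. The function name `non_compliance_message` is hypothetical, and joining several network names with a comma inside the quotes is an assumption; only the single-network wording is taken from this comment.

```python
def non_compliance_message(host, cluster, networks):
    """Build the expanded audit text.

    Assumption: multiple network names are comma-separated inside the quotes;
    the bug report only shows the single-network case.
    """
    names = ", ".join(networks)
    return (
        f"Host {host} does not comply with the cluster {cluster} networks, "
        f"the following VM networks are non-VM networks: '{names}'. "
        "The host will become NonOperational."
    )


print(non_compliance_message("navy-vds1.qa.lab.tlv.redhat.com", "Cluster1", ["net-1"]))
```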

Comment 14 Alona Kaplan 2018-02-11 07:39:31 UTC
*** Bug 1285785 has been marked as a duplicate of this bug. ***

Comment 15 Michael Burman 2018-02-18 09:12:50 UTC
The new first message doesn't appear in our latest d/s build 4.2.2-0.1.el7

Still see the old message - 
"Host orchid-vds1.qa.lab.tlv.redhat.com does not comply with the cluster Cluster1 networks, the following VM networks are non-VM networks: 'net-3'"

Comment 16 Alona Kaplan 2018-02-18 11:15:22 UTC
Hi Michael,
Please attach the engine log.

Comment 17 Michael Burman 2018-02-18 12:16:13 UTC
Created attachment 1397561 [details]
failed qa engine log

Comment 18 Alona Kaplan 2018-02-18 14:25:25 UTC
Hi Michael,

According to the attached logs the version of your engine is 4.2.1.6-0.1.el7 and not 4.2.2-0.1.el7. Please upgrade your engine.

2018-02-18 11:20:43,637+02 INFO  [org.ovirt.engine.core.uutils.config.ShellLikeConfd] (ServerService Thread Pool -- 61) [] Value of property 'PACKAGE_DISPLAY_VERSION' is '4.2.1.6-0.1.el7'.
2018-02-18 11:20:43,638+02 INFO  [org.ovirt.engine.core.uutils.config.ShellLikeConfd] (ServerService Thread Pool -- 61) [] Value of property 'PACKAGE_NAME' is 'ovirt-engine'.
2018-02-18 11:20:43,638+02 INFO  [org.ovirt.engine.core.uutils.config.ShellLikeConfd] (ServerService Thread Pool -- 61) [] Value of property 'PACKAGE_VERSION' is '4.2.1.6'.
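
A quick way to check which engine version produced a log is to pull these property lines out with a regular expression. The sketch below assumes the "Value of property '...' is '...'." wording seen in the three lines above; the helper name `engine_properties` is made up for this example.

```python
import re

# Matches the "Value of property 'NAME' is 'VALUE'." tail of the log lines above.
PROPERTY_RE = re.compile(r"Value of property '(?P<name>[^']+)' is '(?P<value>[^']*)'\.")


def engine_properties(log_lines):
    """Collect property name/value pairs from engine log lines; later lines win."""
    props = {}
    for line in log_lines:
        match = PROPERTY_RE.search(line)
        if match:
            props[match.group("name")] = match.group("value")
    return props


# The three lines quoted in this comment.
sample = [
    "2018-02-18 11:20:43,637+02 INFO  [org.ovirt.engine.core.uutils.config.ShellLikeConfd] (ServerService Thread Pool -- 61) [] Value of property 'PACKAGE_DISPLAY_VERSION' is '4.2.1.6-0.1.el7'.",
    "2018-02-18 11:20:43,638+02 INFO  [org.ovirt.engine.core.uutils.config.ShellLikeConfd] (ServerService Thread Pool -- 61) [] Value of property 'PACKAGE_NAME' is 'ovirt-engine'.",
    "2018-02-18 11:20:43,638+02 INFO  [org.ovirt.engine.core.uutils.config.ShellLikeConfd] (ServerService Thread Pool -- 61) [] Value of property 'PACKAGE_VERSION' is '4.2.1.6'.",
]

props = engine_properties(sample)
print(props["PACKAGE_DISPLAY_VERSION"])
```

In practice the same function could be fed the lines of an unpacked engine log file instead of the `sample` list.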

Comment 19 Michael Burman 2018-02-18 15:03:47 UTC
Yes, I had an issue on my RHVM env and now it's fixed. Retesting.

Comment 20 Michael Burman 2018-02-18 15:11:38 UTC
Verified on - 4.2.2-0.1.el7

"Host orchid-vds2.qa.lab.tlv.redhat.com does not comply with the cluster Cluster1 networks, the following VM networks are non-VM networks: 'net-3'. The host will become NonOperational."

Comment 21 Sandro Bonazzola 2018-03-29 11:18:35 UTC
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

