Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1920330

Summary: [OVS] [RHV] Reboot of host with ovs interfaces leaves the interfaces in activating state
Product: [oVirt] ovirt-engine
Reporter: shopin
Component: ovirt-host-deploy-ansible
Assignee: Ales Musil <amusil>
Status: CLOSED CURRENTRELEASE
QA Contact: Michael Burman <mburman>
Severity: high
Docs Contact:
Priority: high
Version: 4.4.4.7
CC: amusil, bugs, dfodor, dholler, mperina, shopin
Target Milestone: ovirt-4.4.8
Keywords: TestOnly
Target Release: ---
Flags: pm-rhel: ovirt-4.4+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ovirt-engine-4.4.8
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-08-19 06:22:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1861296, 1891437, 1921107, 1955594
Bug Blocks:
Attachments (Description: Flags):
Ovirt Event: none
engine.log: none
host-deploy.log: none
supervdsm.log: none
journalctl: none

Description shopin 2021-01-26 05:25:38 UTC
Created attachment 1750760 [details]
Ovirt Event

I have deployed a new cluster with the following versions:
3 hosts: CentOS 8.3 with the latest updates
oVirt: 4.4.4.7-1.el8

Cluster properties: switch - OVS; Firewall - Firewalld

During the initial deployment of the host in such a cluster, everything is successfully installed, all virtual networks and bridges are created, and the host is activated.

But after that, firewalld does not work correctly. For example, running the command
firewall-cmd --reload
returns:
Error: COMMAND_FAILED: 'python-nftables' failed:
JSON blob:
{"nftables": [{"metainfo": {"json_schema_version": 1}}, {"add": {"table": {"family": "inet", "name": "firewalld_policy_drop"}}}, {"add": {"chain": {"family": "inet", "table": "firewalld_policy_drop", "name": "filter_input", "type": "filter", "hook": "input", "prio": 9, "policy": "drop"}}}, {"add": {"chain": {"family": "inet", "table": "firewalld_policy_drop", "name": "filter_forward", "type": "filter", "hook": "forward", "prio": 9, "policy": "drop"}}}, {"add": {"chain": {"family": "inet", "table": "firewalld_policy_drop", "name": "filter_output", "type": "filter", "hook": "output", "prio": 9, "policy": "drop"}}}, {"add": {"rule": {"family": "inet", "table": "firewalld_policy_drop", "chain": "filter_input", "expr": [{"match": {"left": {"ct": {"key": "state"}}, "op": "in", "right": {"set": ["established", "related"]}}}, {"accept": null}]}}}, {"add": {"rule": {"family": "inet", "table": "firewalld_policy_drop", "chain": "filter_forward", "expr": [{"match": {"left": {"ct": {"key": "state"}}, "op": "in", "right": {"set": ["established", "related"]}}}, {"accept": null}]}}}, {"add": {"rule": {"family": "inet", "table": "firewalld_policy_drop", "chain": "filter_output", "expr": [{"match": {"left": {"ct": {"key": "state"}}, "op": "in", "right": {"set": ["established", "related"]}}}, {"accept": null}]}}}]}

After that, all of the oVirt Ansible playbooks start to run incorrectly.

For example this error is thrown every time I try to reinstall the host:

VDSM SMnode02 command CollectVdsNetworkDataAfterInstallationVDS failed: Internal JSON-RPC error: {'reason': 'management'}

I understand that oVirt executes its scripts correctly, and that this problem is related to firewalld, openvswitch, and NetworkManager.

Is there some workaround so that at least ovirt can re-install hosts?

Comment 1 shopin 2021-01-26 05:45:04 UTC
Created attachment 1750762 [details]
engine.log

Comment 2 shopin 2021-01-26 05:45:48 UTC
Created attachment 1750763 [details]
host-deploy.log

Comment 3 RHEL Program Management 2021-01-26 07:53:52 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 4 Martin Perina 2021-01-26 07:56:52 UTC
Do you have any custom firewalld rules defined on your setup?

https://www.ovirt.org/documentation/administration_guide/#Configuring_Host_Firewall_Rules

If so, does everything work fine, when you remove those custom rules?

Comment 5 shopin 2021-01-26 08:05:35 UTC
(In reply to Martin Perina from comment #4)
> Do you have any custom firewalld rules defined on your setup?
> 
> https://www.ovirt.org/documentation/administration_guide/
> #Configuring_Host_Firewall_Rules
> 
> If so, does everything work fine, when you remove those custom rules?

No, I don't have any custom rules. All the rules in firewalld are now default and those that ovirt sets. As I understand it, the problem is that now firewalld incorrectly processes ovs-port and ovs-bridge. During the installation and configuration process, ovirt actively interacts with firewalld, and this causes the scripts to fail.

All versions used are the latest stable, from centos 8 and ovirt 4.4

Comment 8 Ales Musil 2021-01-26 11:14:24 UTC
Hi, 

can you please provide supervdsm.log from the affected host and possibly output from journalctl?

Comment 9 Dominik Holler 2021-01-26 11:26:31 UTC
(In reply to shopin from comment #0)
> 
> Is there some workaround so that at least ovirt can re-install hosts?

Could you let us know if the following flow does work for you:
1. Add a fresh host to a cluster with the linux-bridge switch type
2. Set the host in maintenance mode
3. Change the cluster of the host to the OVS one
4. Activate the host

Comment 10 shopin 2021-01-26 11:28:27 UTC
Created attachment 1750862 [details]
supervdsm.log

Comment 11 shopin 2021-01-26 11:41:50 UTC
(In reply to Dominik Holler from comment #9)
> (In reply to shopin from comment #0)
> > 
> > Is there some workaround so that at least ovirt can re-install hosts?
> 
> Could you let us know if the following flow does work for you:
> 1. Add a fresh host to a cluster with the linux-bridge switch type
> 2. Set the host in maintenance mode
> 3. Change the cluster of the host to the OVS one
> 4. Activate the host

I discovered this problem in December. At that time I confirmed that there are no problems when the linux-bridge switch type is used: everything works as it should, both installing the host and reinstalling it from the GUI.

If I switch the host from linux-bridge to OVS, everything is OK too, since this is the initial OVS configuration on the host.

In OVS mode, I can only install a fresh host (it is important that there is no information about this host in oVirt and that there is no ovs-bridge on the host). If I then try to reinstall it, the reinstall fails with an error. Moreover, the reinstall error appears whether the configure firewall settings checkbox is turned on or off.

Comment 12 shopin 2021-01-26 11:43:53 UTC
(In reply to shopin from comment #11)
> (In reply to Dominik Holler from comment #9)
> > (In reply to shopin from comment #0)
> > > 
> > > Is there some workaround so that at least ovirt can re-install hosts?
> > 
> > Could you let us know if the following flow does work for you:
> > 1. Add a fresh host to a cluster with the linux-bridge switch type
> > 2. Set the host in maintenance mode
> > 3. Change the cluster of the host to the OVS one
> > 4. Activate the host
> 

The problem with switching clusters is that this host is in a cluster with a glusterfs volume.

Comment 13 Ales Musil 2021-01-26 11:48:59 UTC
Did you reboot the host in the process? After installing it in ovs cluster?

Comment 14 shopin 2021-01-26 11:49:24 UTC
Created attachment 1750869 [details]
journalctl

Comment 15 Dana 2021-01-26 12:15:07 UTC
Can you also attach the log from the host deploy that failed after executing firewall-cmd --reload?

Comment 16 shopin 2021-01-26 12:20:52 UTC
(In reply to Dana from comment #15)
> Can you also attach the log from the host deploy that failed after executing
> firewall-cmd --reload?

[root@smnode02 ~]# firewall-cmd --reload
Error: COMMAND_FAILED: 'python-nftables' failed:
JSON blob:
{"nftables": [{"metainfo": {"json_schema_version": 1}}, {"add": {"table": {"family": "inet", "name": "firewalld_policy_drop"}}}, {"add": {"chain": {"family": "inet", "table": "firewalld_policy_drop", "name": "filter_input", "type": "filter", "hook": "input", "prio": 9, "policy": "drop"}}}, {"add": {"chain": {"family": "inet", "table": "firewalld_policy_drop", "name": "filter_forward", "type": "filter", "hook": "forward", "prio": 9, "policy": "drop"}}}, {"add": {"chain": {"family": "inet", "table": "firewalld_policy_drop", "name": "filter_output", "type": "filter", "hook": "output", "prio": 9, "policy": "drop"}}}, {"add": {"rule": {"family": "inet", "table": "firewalld_policy_drop", "chain": "filter_input", "expr": [{"match": {"left": {"ct": {"key": "state"}}, "op": "in", "right": {"set": ["established", "related"]}}}, {"accept": null}]}}}, {"add": {"rule": {"family": "inet", "table": "firewalld_policy_drop", "chain": "filter_forward", "expr": [{"match": {"left": {"ct": {"key": "state"}}, "op": "in", "right": {"set": ["established", "related"]}}}, {"accept": null}]}}}, {"add": {"rule": {"family": "inet", "table": "firewalld_policy_drop", "chain": "filter_output", "expr": [{"match": {"left": {"ct": {"key": "state"}}, "op": "in", "right": {"set": ["established", "related"]}}}, {"accept": null}]}}}]}

Comment 17 shopin 2021-01-26 12:26:35 UTC
(In reply to Ales Musil from comment #13)
> Did you reboot the host in the process? After installing it in ovs cluster?

The host was not rebooted during the installation process, but it was restarted afterwards. I'm still trying different options right now. There is one working way to activate the host: completely clean its network configuration and install it from the GUI. For this I also need to remove all information about it from oVirt, including the glusterfs volume.

Comment 18 Ales Musil 2021-01-26 12:47:50 UTC
(In reply to shopin from comment #17)
> (In reply to Ales Musil from comment #13)
> > Did you reboot the host in the process? After installing it in ovs cluster?
> 
> During the installation process the host was not rebooted. And then, yes, it
> was restarted. I'm still trying different options right now. And there is a
> working option to activate the host, this is to completely clean it from
> network configurations and install it in GUI, also for this I need to clean
> all the information about it in ovirt, including about glusterfs volume.

OK, then the reboot is causing the issues that can be seen in supervdsm.log,
as well as the state in which the host is basically without any connection.
This is a known issue [0]. The workaround to enable those interfaces is to restart
the physical port: run "nmcli con down ovs-port-$INTERFACE" and then, after a few seconds, "nmcli con up ovs-port-$INTERFACE",
where $INTERFACE is your physical interface connected to ovirtmgmt, e.g. "enp1s0".

Comment 19 Ales Musil 2021-01-26 12:48:39 UTC
(In reply to Ales Musil from comment #18)
> (In reply to shopin from comment #17)
> > (In reply to Ales Musil from comment #13)
> > > Did you reboot the host in the process? After installing it in ovs cluster?
> > 
> > During the installation process the host was not rebooted. And then, yes, it
> > was restarted. I'm still trying different options right now. And there is a
> > working option to activate the host, this is to completely clean it from
> > network configurations and install it in GUI, also for this I need to clean
> > all the information about it in ovirt, including about glusterfs volume.
> 
> OK then the reboot is causing issues that can be seen in supervdsm.log,
> also the state in which the host is basically without any connection. 
> This is a known issue [0]. The workaround to enable those interfaces is to
> restart
> the physical port "nmcli con down ovs-port-$INTERFACE" and then after few
> secs "nmcli con up ovs-port-$INTERFACE".
> $INTERFACE being your physical interface connected to ovirtmgmt e.g "enp1s0".

[0] https://bugzilla.redhat.com/1891437

Comment 20 shopin 2021-01-26 12:56:07 UTC
(In reply to Ales Musil from comment #18)
> (In reply to shopin from comment #17)
> > (In reply to Ales Musil from comment #13)
> > > Did you reboot the host in the process? After installing it in ovs cluster?
> > 
> > During the installation process the host was not rebooted. And then, yes, it
> > was restarted. I'm still trying different options right now. And there is a
> > working option to activate the host, this is to completely clean it from
> > network configurations and install it in GUI, also for this I need to clean
> > all the information about it in ovirt, including about glusterfs volume.
> 
> OK then the reboot is causing issues that can be seen in supervdsm.log,
> also the state in which the host is basically without any connection. 
> This is a known issue [0]. The workaround to enable those interfaces is to
> restart
> the physical port "nmcli con down ovs-port-$INTERFACE" and then after few
> secs "nmcli con up ovs-port-$INTERFACE".
> $INTERFACE being your physical interface connected to ovirtmgmt e.g "enp1s0".

Yes, I know about the NetworkManager bug, and I have already used the workaround. But there is another problem, associated with firewalld: firewalld does not work correctly with OVS, and this affects the oVirt scripts.

Comment 21 Ales Musil 2021-01-26 14:05:45 UTC
Another important thing that I forgot to mention: if the host has multiple networks configured, this workaround
has to be applied to every physical interface.
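
A minimal sketch of applying the workaround from comment 18 to several interfaces in one go, assuming the NetworkManager connections follow the "ovs-port-<interface>" naming used there. The function name restart_ovs_ports and the NMCLI and DELAY variables are hypothetical and exist only for this illustration; the 5-second default is a guess at "a few seconds", not a tested value.

```shell
#!/bin/sh
# Sketch only: restart the OVS port connection of each given physical interface.
# NMCLI and DELAY are hypothetical knobs (not part of nmcli itself).
NMCLI="${NMCLI:-nmcli}"   # command used to talk to NetworkManager
DELAY="${DELAY:-5}"       # seconds to wait between "down" and "up" (assumed value)

restart_ovs_ports() {
    for iface in "$@"; do
        # "con down" may report an error if the connection is not active;
        # the sketch continues and tries to bring the connection up anyway.
        "$NMCLI" con down "ovs-port-$iface"
        sleep "$DELAY"
        "$NMCLI" con up "ovs-port-$iface"
    done
}

# Example call (interface names are placeholders for your own):
#   restart_ovs_ports enp1s0 enp2s0
```

The interfaces are passed explicitly rather than discovered automatically, so that only the ports you name are taken down.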

Comment 22 Martin Perina 2021-01-27 06:09:17 UTC
Rebooting a host is an important part of the host deploy process and is required for several use cases (for example, switching the host from iptables to firewalld or updating Linux kernel command line parameters). This is a bug in NetworkManager, which should survive a reboot of the host without any issues. So on the oVirt/RHV side we don't have any bug; we just need to wait for the NetworkManager team to fix BZ1891437.

In the meantime, users need to use the workaround mentioned in Comment 18, and it's important to add that it needs to be applied to all physical interfaces of the host.

Comment 23 Martin Perina 2021-01-27 06:10:58 UTC
(In reply to Ales Musil from comment #21)
> Another important thing that I forgot to mention. If the host has multiple
> networks configured this workaround
> has to be applied to every physical interface.

Did the workaround work for you if you apply it to all physical network interfaces of the host?

Comment 24 shopin 2021-01-27 07:51:46 UTC
(In reply to Martin Perina from comment #23)
> (In reply to Ales Musil from comment #21)
> > Another important thing that I forgot to mention. If the host has multiple
> > networks configured this workaround
> > has to be applied to every physical interface.
> 
> Did the workaround work for you if you apply it to all physical network
> interfaces of the host?

This workaround works when I restart the host. But once again, these are different problems. There is a bug in NetworkManager, and there is a separate problem with firewalld, which I have identified: firewalld does not work correctly with OVS. Since oVirt uses these technologies, I want to point out that this problem exists, although it is rather a matter for the developers of firewalld or openvswitch.

Comment 26 Michael Burman 2021-07-18 09:48:55 UTC
Verified on - rhvm-4.4.8-0.19.el8ev.noarch with:

NetworkManager-1.30.0-9.el8_4.x86_64
vdsm-4.40.80.1-1.el8ev.x86_64
nmstate-1.0.2-13.el8_4.noarch

Comment 27 Sandro Bonazzola 2021-08-19 06:22:59 UTC
This bugzilla is included in oVirt 4.4.8 release, published on August 19th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.8 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.