Bug 1920330
| Summary: | [OVS] [RHV] Reboot of host with ovs interfaces lefts the interfaces in activating state | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | shopin |
| Component: | ovirt-host-deploy-ansible | Assignee: | Ales Musil <amusil> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Michael Burman <mburman> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.4.4.7 | CC: | amusil, bugs, dfodor, dholler, mperina, shopin |
| Target Milestone: | ovirt-4.4.8 | Keywords: | TestOnly |
| Target Release: | --- | Flags: | pm-rhel: ovirt-4.4+ |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-engine-4.4.8 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-08-19 06:22:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Network | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1861296, 1891437, 1921107, 1955594 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Created attachment 1750762 [details]
engine.log
Created attachment 1750763 [details]
host-deploy.log
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Do you have any custom firewalld rules defined on your setup?
https://www.ovirt.org/documentation/administration_guide/#Configuring_Host_Firewall_Rules
If so, does everything work fine, when you remove those custom rules?

(In reply to Martin Perina from comment #4)
> Do you have any custom firewalld rules defined on your setup?
>
> https://www.ovirt.org/documentation/administration_guide/#Configuring_Host_Firewall_Rules
>
> If so, does everything work fine, when you remove those custom rules?

No, I don't have any custom rules. All the rules in firewalld are either the defaults or those that ovirt sets. As I understand it, the problem is that firewalld now processes ovs-port and ovs-bridge incorrectly. During the installation and configuration process, ovirt actively interacts with firewalld, and this causes the scripts to fail. All versions used are the latest stable, from centos 8 and ovirt 4.4.

Hi, can you please provide supervdsm.log from the affected host and possibly output from journalctl?

(In reply to shopin from comment #0)
>
> Is there some workaround so that at least ovirt can re-install hosts?

Could you let us know if the following flow does work for you:
1. Add a fresh host to a cluster with linux-bridge switchtype
2. Set the host in maintenance mode
3. Change the cluster of the host to the OVS one
4. Activate the host

Created attachment 1750862 [details]
supervdsm.log
(In reply to Dominik Holler from comment #9)
> (In reply to shopin from comment #0)
> >
> > Is there some workaround so that at least ovirt can re-install hosts?
>
> Could you let us know if the following flow does work for you:
> 1. Add a fresh host to a cluster with linux-bridge switchtype
> 2. Set the host in maintenance mode
> 3. Change the cluster of the host to the OVS one
> 4. Activate the host

I discovered this problem in December. At that time I made sure that there are no problems if you use the linux-bridge switch type. Everything really works as it should, both installing the host and reinstalling it from the GUI. If I change the host from linux-bridge to OVS, then everything is OK too, since this is the primary ovs configuration on the host. In OVS mode, I can only install a fresh host (it is important that there is no information about this host in ovirt and there is no ovs-bridge on the host), and then if I want to reinstall it, the reinstall fails with an error. Moreover, the reinstall error appears both with the "configure firewall settings" checkbox turned on and with it turned off.

(In reply to shopin from comment #11)
> (In reply to Dominik Holler from comment #9)
> > (In reply to shopin from comment #0)
> > >
> > > Is there some workaround so that at least ovirt can re-install hosts?
> >
> > Could you let us know if the following flow does work for you:
> > 1. Add a fresh host to a cluster with linux-bridge switchtype
> > 2. Set the host in maintenance mode
> > 3. Change the cluster of the host to the OVS one
> > 4. Activate the host
>

The cluster switching issue is that this host is in a cluster with glusterfs volume.

Did you reboot the host in the process? After installing it in ovs cluster?

Created attachment 1750869 [details]
journalctl
Can you also attach the log from the host deploy that failed after executing firewall-cmd --reload?

(In reply to Dana from comment #15)
> Can you also attach the log from the host deploy that failed after executing
> firewall-cmd --reload?

[root@smnode02 ~]# firewall-cmd --reload
Error: COMMAND_FAILED: 'python-nftables' failed: JSON blob: {"nftables": [{"metainfo": {"json_schema_version": 1}}, {"add": {"table": {"family": "inet", "name": "firewalld_policy_drop"}}}, {"add": {"chain": {"family": "inet", "table": "firewalld_policy_drop", "name": "filter_input", "type": "filter", "hook": "input", "prio": 9, "policy": "drop"}}}, {"add": {"chain": {"family": "inet", "table": "firewalld_policy_drop", "name": "filter_forward", "type": "filter", "hook": "forward", "prio": 9, "policy": "drop"}}}, {"add": {"chain": {"family": "inet", "table": "firewalld_policy_drop", "name": "filter_output", "type": "filter", "hook": "output", "prio": 9, "policy": "drop"}}}, {"add": {"rule": {"family": "inet", "table": "firewalld_policy_drop", "chain": "filter_input", "expr": [{"match": {"left": {"ct": {"key": "state"}}, "op": "in", "right": {"set": ["established", "related"]}}}, {"accept": null}]}}}, {"add": {"rule": {"family": "inet", "table": "firewalld_policy_drop", "chain": "filter_forward", "expr": [{"match": {"left": {"ct": {"key": "state"}}, "op": "in", "right": {"set": ["established", "related"]}}}, {"accept": null}]}}}, {"add": {"rule": {"family": "inet", "table": "firewalld_policy_drop", "chain": "filter_output", "expr": [{"match": {"left": {"ct": {"key": "state"}}, "op": "in", "right": {"set": ["established", "related"]}}}, {"accept": null}]}}}]}

(In reply to Ales Musil from comment #13)
> Did you reboot the host in the process? After installing it in ovs cluster?

During the installation process the host was not rebooted. And then, yes, it was restarted. I'm still trying different options right now. And there is a working option to activate the host, this is to completely clean it from network configurations and install it in GUI, also for this I need to clean all the information about it in ovirt, including about glusterfs volume.

(In reply to shopin from comment #17)
> (In reply to Ales Musil from comment #13)
> > Did you reboot the host in the process? After installing it in ovs cluster?
>
> During the installation process the host was not rebooted. And then, yes, it
> was restarted. I'm still trying different options right now. And there is a
> working option to activate the host, this is to completely clean it from
> network configurations and install it in GUI, also for this I need to clean
> all the information about it in ovirt, including about glusterfs volume.

OK then the reboot is causing issues that can be seen in supervdsm.log, also the state in which the host is basically without any connection. This is a known issue [0]. The workaround to enable those interfaces is to restart the physical port "nmcli con down ovs-port-$INTERFACE" and then after few secs "nmcli con up ovs-port-$INTERFACE". $INTERFACE being your physical interface connected to ovirtmgmt e.g "enp1s0".

(In reply to Ales Musil from comment #18)
> (In reply to shopin from comment #17)
> > (In reply to Ales Musil from comment #13)
> > > Did you reboot the host in the process? After installing it in ovs cluster?
> >
> > During the installation process the host was not rebooted. And then, yes, it
> > was restarted. I'm still trying different options right now. And there is a
> > working option to activate the host, this is to completely clean it from
> > network configurations and install it in GUI, also for this I need to clean
> > all the information about it in ovirt, including about glusterfs volume.
>
> OK then the reboot is causing issues that can be seen in supervdsm.log,
> also the state in which the host is basically without any connection.
> This is a known issue [0]. The workaround to enable those interfaces is to restart
> the physical port "nmcli con down ovs-port-$INTERFACE" and then after few
> secs "nmcli con up ovs-port-$INTERFACE".
> $INTERFACE being your physical interface connected to ovirtmgmt e.g "enp1s0".

[0] https://bugzilla.redhat.com/1891437

(In reply to Ales Musil from comment #18)
> (In reply to shopin from comment #17)
> > (In reply to Ales Musil from comment #13)
> > > Did you reboot the host in the process? After installing it in ovs cluster?
> >
> > During the installation process the host was not rebooted. And then, yes, it
> > was restarted. I'm still trying different options right now. And there is a
> > working option to activate the host, this is to completely clean it from
> > network configurations and install it in GUI, also for this I need to clean
> > all the information about it in ovirt, including about glusterfs volume.
>
> OK then the reboot is causing issues that can be seen in supervdsm.log,
> also the state in which the host is basically without any connection.
> This is a known issue [0]. The workaround to enable those interfaces is to restart
> the physical port "nmcli con down ovs-port-$INTERFACE" and then after few
> secs "nmcli con up ovs-port-$INTERFACE".
> $INTERFACE being your physical interface connected to ovirtmgmt e.g "enp1s0".

Yes, I know about the NetworkManager bug, and I already used a workaround. But there is another problem associated with firewalld. Firewalld does not work correctly with ovs, and this affects ovirt scripts.

Another important thing that I forgot to mention. If the host has multiple networks configured this workaround has to be applied to every physical interface.

Rebooting a host is an important part of the host deploy process and it's required for several use cases (for example switching the host from iptables to firewalld or updating linux kernel command line parameters). This is a bug in NetworkManager, which should survive a reboot of the host without any issues. So on the oVirt/RHV side we don't have any bug, we just need to wait for the NetworkManager team to fix BZ1891437. In the meantime users need to use the workaround mentioned in Comment 18, and it's important to add that it needs to be applied to all physical interfaces of the host.

(In reply to Ales Musil from comment #21)
> Another important thing that I forgot to mention. If the host has multiple
> networks configured this workaround
> has to be applied to every physical interface.

Did the workaround work for you if you apply it to all physical network interfaces of the host?

(In reply to Martin Perina from comment #23)
> (In reply to Ales Musil from comment #21)
> > Another important thing that I forgot to mention. If the host has multiple
> > networks configured this workaround
> > has to be applied to every physical interface.
>
> Did the workaround work for you if you apply it to all physical network
> interfaces of the host?

This workaround works when I restart the host. But once again, these are different problems. There is a bug in NetworkManager. And there is a problem with firewalld, which I have identified. Firewalld does not work correctly with ovs. Since ovirt uses these technologies, I want to indicate that there is such a problem, but it rather applies to the developers of firewalld or openvswitch.

Verified on - rhvm-4.4.8-0.19.el8ev.noarch with:
NetworkManager-1.30.0-9.el8_4.x86_64
vdsm-4.40.80.1-1.el8ev.x86_64
nmstate-1.0.2-13.el8_4.noarch

This bugzilla is included in oVirt 4.4.8 release, published on August 19th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.8 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
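For hosts still affected by the NetworkManager bug, the workaround from comment 18, repeated for every physical interface as comment 21 asks, could be scripted roughly as follows. This is only a sketch: the function name `restart_ovs_ports` and the `DRY_RUN` and `WAIT_SECS` variables are assumptions made for illustration, and the 5 second default stands in for the "few secs" mentioned in the comment. The `ovs-port-$INTERFACE` profile naming is taken from comment 18.

```shell
#!/bin/sh
# Sketch of the comment 18 workaround, looped over several interfaces.
# restart_ovs_ports, DRY_RUN and WAIT_SECS are hypothetical names, not part
# of oVirt or NetworkManager.
restart_ovs_ports() {
    for iface in "$@"; do
        # Profile name pattern as given in comment 18.
        con="ovs-port-$iface"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the commands that would be run.
            echo "nmcli con down $con"
            echo "nmcli con up $con"
            continue
        fi
        nmcli con down "$con"
        # Comment 18 says to wait a few seconds; 5 is an arbitrary default.
        sleep "${WAIT_SECS:-5}"
        nmcli con up "$con"
    done
}
```

Running `DRY_RUN=1 restart_ovs_ports enp1s0 enp2s0` prints the four nmcli commands without touching the network; dropping `DRY_RUN` executes them. The interface names here are examples only.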
Created attachment 1750760 [details]
Ovirt Event

I have deployed a new cluster on versions:
3 Hosts: Centos 8.3 with the latest updates
Ovirt: 4.4.4.7-1.el8
Cluster properties: switch - OVS; Firewall - Firewalld

During the initial deployment of the host in such a cluster, everything is successfully installed, all virtual networks and bridges are created, and the host is activated. But after that, Firewalld does not work correctly. For example, executing the command

firewall-cmd --reload

returns:

Error: COMMAND_FAILED: 'python-nftables' failed: JSON blob: {"nftables": [{"metainfo": {"json_schema_version": 1}}, {"add": {"table": {"family": "inet", "name": "firewalld_policy_drop"}}}, {"add": {"chain": {"family": "inet", "table": "firewalld_policy_drop", "name": "filter_input", "type": "filter", "hook": "input", "prio": 9, "policy": "drop"}}}, {"add": {"chain": {"family": "inet", "table": "firewalld_policy_drop", "name": "filter_forward", "type": "filter", "hook": "forward", "prio": 9, "policy": "drop"}}}, {"add": {"chain": {"family": "inet", "table": "firewalld_policy_drop", "name": "filter_output", "type": "filter", "hook": "output", "prio": 9, "policy": "drop"}}}, {"add": {"rule": {"family": "inet", "table": "firewalld_policy_drop", "chain": "filter_input", "expr": [{"match": {"left": {"ct": {"key": "state"}}, "op": "in", "right": {"set": ["established", "related"]}}}, {"accept": null}]}}}, {"add": {"rule": {"family": "inet", "table": "firewalld_policy_drop", "chain": "filter_forward", "expr": [{"match": {"left": {"ct": {"key": "state"}}, "op": "in", "right": {"set": ["established", "related"]}}}, {"accept": null}]}}}, {"add": {"rule": {"family": "inet", "table": "firewalld_policy_drop", "chain": "filter_output", "expr": [{"match": {"left": {"ct": {"key": "state"}}, "op": "in", "right": {"set": ["established", "related"]}}}, {"accept": null}]}}}]}

Now all the ovirt ansible playbooks start to execute incorrectly. For example, this error is thrown every time I try to reinstall the host:

VDSM SMnode02 command CollectVdsNetworkDataAfterInstallationVDS failed: Internal JSON-RPC error: {'reason': ’management’}

I understand that ovirt executes its scripts correctly, and that this problem is related to the applications Firewalld, openvswitch, NetworkManager.

Is there some workaround so that at least ovirt can re-install hosts?
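To capture the reload failure described above in a form that can be attached to a report, something like the following could be used. It is a minimal sketch, not part of any oVirt tooling: `capture_reload` is a hypothetical helper name, and it simply runs `firewall-cmd --reload` (or any command passed after the log path) and saves the combined output.

```shell
#!/bin/sh
# Hypothetical helper: run a command (firewall-cmd --reload by default),
# save its stdout and stderr to a file, and report the exit status.
capture_reload() {
    log_file=$1
    shift
    # Fall back to the command quoted in the report when none is given.
    if [ "$#" -eq 0 ]; then
        set -- firewall-cmd --reload
    fi
    if "$@" >"$log_file" 2>&1; then
        echo "reload ok"
    else
        status=$?
        echo "reload failed with exit status $status, output saved to $log_file"
        return "$status"
    fi
}
```

For example, `capture_reload /tmp/firewalld-reload.log` would leave the COMMAND_FAILED text in that file; the path is only an example. The optional command argument exists so the helper can be tried out without firewalld installed.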