Bug 1441530
| Field | Value |
|---|---|
| Summary | setupNetworks fails after firewalld creates .bak files |
| Product | [oVirt] vdsm |
| Component | Core |
| Status | CLOSED CURRENTRELEASE |
| Severity | medium |
| Priority | high |
| Reporter | RamaKasturi <knarra> |
| Assignee | Edward Haas <edwardh> |
| QA Contact | RamaKasturi <knarra> |
| CC | bugs, cshao, danken, dguo, ipilcher, knarra, rbarry, sabose, stirabos, surs, ylavi, yzhao |
| Target Milestone | ovirt-4.1.4 |
| Keywords | Regression, Reopened |
| Flags | sabose: ovirt-4.1?, sbonazzo: blocker?, sabose: planning_ack?, rule-engine: devel_ack+, rule-engine: testing_ack+ |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | v4.19.23 |
| Cloned to | 1443169 (view as bug list) |
| Bug Blocks | 1411323, 1443169, 1485863 |
| oVirt Team | Network |
| Type | Bug |
| Last Closed | 2017-07-28 14:10:40 UTC |
Description (RamaKasturi, 2017-04-12 07:55:14 UTC)

Logs?

Created attachment 1271112 [details]
uploading the log file
Uploaded the log file.

I'm not able to reproduce this with a kickstart, an interactive install, or by changing the IP with Cockpit. Can you please provide complete steps to reproduce, including installation details?

Below are the steps I performed when I hit the bug above:

1) Download the latest RHV-H ISO.
2) On the hypervisor, use virtual storage and plug in the ISO.
3) I have two NICs on my system; turn the network on for both of these NICs.
4) Click the configure button -> General -> select "network when available" -> save.
5) Provide the password and install.
6) Reboot the system.
7) Once the system is up, click Virtualization -> Hosted Engine -> select the radio button 'Hosted Engine with Gluster'.
8) Finish the Gluster deployment.
9) Once the Gluster deployment is over, a button called 'Continue with Hosted Engine setup' appears.
10) Click on that button, which takes the user to HE deployment.
11) Provide the answers to the questions asked.
12) When it reaches the step where the ovirtmgmt bridge is configured, it fails with the error mentioned above.

Above are the exact steps I follow while doing the setup.

I didn't hit the issue with NFS storage during configuration of the ovirtmgmt bridge.

Test version:
[root@dell-per730-35 ~]# imgbase w
[INFO] You are on rhvh-4.1-0.20170403.0+1
[root@dell-per730-35 ~]# rpm -qa | grep cockpit
cockpit-bridge-126-1.el7.x86_64
cockpit-ws-126-1.el7.x86_64
cockpit-shell-126-1.el7.noarch
cockpit-storaged-126-1.el7.noarch
cockpit-ovirt-dashboard-0.10.7-0.0.16.el7ev.noarch

Test steps:
1. Install RHVH 4.1.
2. Add the host to the engine (this automatically configures the ovirtmgmt bridge).
3. Log in to Cockpit and redeploy HE with NFS storage.

After step 2:
[root@dell-per730-35 ~]# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP qlen 1000
    link/ether 24:6e:96:19:b9:a4 brd ff:ff:ff:ff:ff:ff
3: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 24:6e:96:19:b9:a5 brd ff:ff:ff:ff:ff:ff
4: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 24:6e:96:19:b9:a6 brd ff:ff:ff:ff:ff:ff
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 24:6e:96:19:b9:a7 brd ff:ff:ff:ff:ff:ff
6: p7p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether a0:36:9f:9d:3b:fe brd ff:ff:ff:ff:ff:ff
7: p7p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether a0:36:9f:9d:3b:ff brd ff:ff:ff:ff:ff:ff
8: p5p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether a0:36:9f:ae:9f:50 brd ff:ff:ff:ff:ff:ff
9: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 56:16:42:af:ec:79 brd ff:ff:ff:ff:ff:ff
10: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 24:6e:96:19:b9:a4 brd ff:ff:ff:ff:ff:ff
    inet 10.73.131.65/23 brd 10.73.131.255 scope global dynamic ovirtmgmt
       valid_lft 43029sec preferred_lft 43029sec
    inet6 2620:52:0:4982:266e:96ff:fe19:b9a4/64 scope global mngtmpaddr dynamic
       valid_lft 2591992sec preferred_lft 604792sec
    inet6 fe80::266e:96ff:fe19:b9a4/64 scope link
       valid_lft forever preferred_lft forever
After step 3, HE redeploys successfully.

[root@dell-per730-35 opt]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : dell-per730-35.lab.eng.pek2.redhat.com
Host ID                : 1
Engine status          : {"health": "good", "vm": "up", "detail": "up"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : 08b14375
local_conf_timestamp   : 3560
Host timestamp         : 3545
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3545 (Wed Apr 12 08:39:05 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=3560 (Wed Apr 12 08:39:20 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False

I also can't reproduce this. Steps:
- Install a system in interactive mode with multiple NICs.
- Configure both NICs to start at boot.
- Install and reboot. No .bak files.
- Log into Cockpit. Start HE setup. No .bak files.

I don't have a Gluster test environment. Can you please check when these .bak files appear and provide exact steps to reproduce up to that point?

Hi Ryan,

I did some testing and found that the .bak files are generated during Gluster deployment, at the firewall configuration step. Below are the steps that happen during firewall configuration using gdeploy:

1) Since firewalld is already started, gdeploy does not start it.
2) Add the glusterfs service to the firewalld rules.
3) Reload the firewall.
4) Open/close firewalld ports.
5) Reload the firewall.

I am not sure which of the above steps causes the .bak files to be created. Do you have any idea?

I'll read through the gdeploy code, but I'd expect it to be there. Are these created when using gdeploy on a RHGS host?

Just to follow up, I didn't find references to .bak in gdeploy, but I don't know the codebase well. All of those steps are strictly related to gdeploy, though. Can you grab the generated gdeploy config file?

I didn't either, but comment #8 is a very good indicator that they are. Do these appear on RHGS hosts?

(In reply to Ryan Barry from comment #12)
> I didn't either, but comment#8 is a very good indicator that they are.
>
> Do these appear on rhgs hosts?

By RHGS hosts do you mean systems installed using the RHGS ISO? If yes, I am not sure about that, but I can check it out.

Sachi, can you check if the .bak files are created as part of the gdeploy process as per comment 8?

Sahina, gdeploy does not create any .bak files. I had a discussion about this with Kasturi, and we both went through the config. I have asked her to loop me in while she tries again.

Hi Ryan,

Sachi and I checked the setup and found that if the ifcfg files are changed and firewalld is reloaded, .bak files are generated on the node. Is this behaviour expected with RHV-H? We performed the same test with RHGS and we do not see .bak files being generated when the ifcfg files are changed and the firewall is reloaded.

Thanks,
kasturi

(In reply to RamaKasturi from comment #16)
> Hi Ryan,
>
> Me and sachi checked the setup and found that, if ifcfg files are changed
> and firewalld is reloaded we see that .bak files are getting generated on
> the node. Is this behaviour expected with RHV-H?

This is irrespective of whether gdeploy/ansible is used or not.

No, this is not expected, and I can't reproduce that.

We don't make any changes relative to the platform.

"systemctl restart firewalld.service" does not result in .bak files.

Deploying hosted engine does not result in .bak files.
What are the steps to reproduce?

This appears to come straight from firewalld, and should be equally reproducible (or not) on all systems (RHEL and RHVH):
https://github.com/t-woerner/firewalld/commit/fe6cf16e5a5ef3e49cdb554af8cf18024371554a

(In reply to Ryan Barry from comment #18)
> No, this is not expected, and I can't reproduce that.
>
> We don't make any changes from platform.
>
> "systemctl restart firewalld.service" does not result in .bak files.
>
> Deploying hosted engine does not result in .bak files.
>
> What are the steps to reproduce?

It is easily reproducible with a minimum of 3 steps:

1) Open the file /etc/sysconfig/network-scripts/ifcfg-<interface> and add the line 'ZONE=public' at the end.
2) Restart the firewalld service by running "systemctl restart firewalld.service".
3) .bak files are seen.

(A scripted version of these steps is sketched at the end of this comment block.)

This should also be reproducible on RHGS and RHEL. I'll see if there's a way to disable this.

(In reply to Ryan Barry from comment #21)
> This should also be reproducible on RHGS and RHEL.
>
> I'll see if there's a way to disable this.

Cool, thanks Ryan.

Per https://bugzilla.redhat.com/show_bug.cgi?id=1381314#c21, NetworkManager and network.service both ignore files ending in .bak, which is probably why this was never seen before. If we can't work around this in RHV-H (I didn't find a good option to disable the creation of backup files, though I didn't go through the code), can otopi/ovirt-hosted-engine-setup behave the same way as NM/network.service?

(In reply to Ryan Barry from comment #23)
> If we can't work around this in RHV-H (I didn't find a good option to
> disable the creation of backup files, though I didn't go through the code),
> can otopi/ovirt-hosted-engine-setup behave the same way as
> NM/network.service?

hosted-engine-setup just calls setupNetworks on vdsm/supervdsm; it does not deal with ifcfg files at all. The exception comes from vdsm.

Dguo, since yzhao is on holiday, could you help reproduce this issue in the QE environment? Thanks.

Moving this bug to VDSM. The network subsystem needs to be fixed to cope with the new firewalld behavior.

Cancelling the needinfo flag due to the component change.

Please attach {super,}vdsm.log and the content of the relevant ifcfg files. Why do you add ZONE to ifcfg? Modifying ifcfg files is not supported, as they may be overwritten on boot by vdsm.

Ran sosreport; it is available at the location below:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1441530/

Content of the ifcfg files:

[root@rhsqa-grafton6 ~]# cat /etc/sysconfig/network-scripts/ifcfg-enp4s0f0
# Generated by VDSM version 4.19.15-1.el7ev
DEVICE=enp4s0f0
ONBOOT=yes
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
ZONE=public

[root@rhsqa-grafton6 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens3f0
# Generated by VDSM version 4.19.15-1.el7ev
DEVICE=ens3f0
BRIDGE=ovirtmgmt
ONBOOT=yes
MTU=1500
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=no

ZONE in the ifcfg files is added by gdeploy, AFAIK.
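For reference, the three reproduction steps above (append ZONE=public to an ifcfg file, restart firewalld, look for .bak files) can be scripted. This is a minimal sketch for a throwaway test host, assuming a root shell; the interface name eno1 is only a placeholder and not taken from this bug:

```python
import glob
import subprocess

# Placeholder interface; substitute a NIC that actually exists on the test host.
IFCFG = "/etc/sysconfig/network-scripts/ifcfg-eno1"

# Step 1: append ZONE=public to the ifcfg file (the same change gdeploy makes).
with open(IFCFG, "a") as f:
    f.write("ZONE=public\n")

# Step 2: restart firewalld so it picks up the zone binding from the ifcfg file.
subprocess.check_call(["systemctl", "restart", "firewalld.service"])

# Step 3: check whether firewalld left .bak copies next to the ifcfg files.
print(glob.glob("/etc/sysconfig/network-scripts/ifcfg-*.bak"))
```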
(In reply to RamaKasturi from comment #29)
> Ran sosreport and it is present in the location below:
> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1441530/
>
> content from ifcfg files:
> [root@rhsqa-grafton6 ~]# cat /etc/sysconfig/network-scripts/ifcfg-enp4s0f0
> # Generated by VDSM version 4.19.15-1.el7ev
> DEVICE=enp4s0f0
> ONBOOT=yes
> BOOTPROTO=dhcp
> MTU=1500
> DEFROUTE=no
> NM_CONTROLLED=no
> IPV6INIT=yes
> IPV6_AUTOCONF=yes
> ZONE=public
> ZONE to ifcfg files is added by gdeploy, AFAIK.

We do not support ZONE in the ifcfg files; it can be overwritten by VDSM. VDSM does not expect anyone to touch the ifcfg files, and doing so is an integration challenge.

Regarding the logs, please attach the supervdsm logs from the time the problem occurs; I cannot figure out which is the relevant log file.

Moving to 4.1.5, since we cannot block 4.1.4 for too long.

Hi Dan,

I have tested with the latest bits of RHV-H and RHV 4.1.3 and I do not see the issue any more. With .bak files present, hosted engine deployment succeeds and no issues are seen. You can close this bug for now; I will reopen it, with all the relevant logs, if I see the issue again.

Thanks,
kasturi

Reopening the bug, as I have seen an unexpected exception during setupNetworks because of the existence of .bak files; below is what I see in the supervdsm and vdsm logs. I am not seeing the issue with hosted engine deployment, but with adding a host and attaching glusternw to the interface.

Error seen in the supervdsm.log file:

MainProcess|jsonrpc/6::INFO::2017-07-12 12:14:01,707::legacy_switch::215::root::(_add_network) Adding network glusternw with vlan=None, bonding=None, nic=ens4f0, mtu=1500, bridged=False, defaultRoute=False, options={u'switch': u'legacy', u'bootproto': u'dhcp'}
MainProcess|jsonrpc/6::INFO::2017-07-12 12:14:01,707::legacy_switch::242::root::(_add_network) Configuring device ens4f0
MainProcess|jsonrpc/6::ERROR::2017-07-12 12:14:01,734::supervdsmServer::97::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer", line 95, in wrapper
    res = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 217, in setupNetworks
    _setup_networks(networks, bondings, options)
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 238, in _setup_networks
    netswitch.setup(networks, bondings, options, in_rollback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py", line 132, in setup
    _setup_legacy(legacy_nets, legacy_bonds, options, in_rollback)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py", line 153, in _setup_legacy
    bondings, _netinfo)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 464, in add_missing_networks
    _netinfo=_netinfo, **attrs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 182, in wrapped
    return func(network, configurator, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 243, in _add_network
    net_ent_to_configure.configure(**options)
  File "/usr/lib/python2.7/site-packages/vdsm/network/models.py", line 100, in configure
    self.configurator.configureNic(self, **opts)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 201, in configureNic
    IfcfgAcquire.acquire_device(nic.name)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg_acquire.py", line 38, in acquire_device
    IfcfgAcquireNMonline.acquire_device(device)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg_acquire.py", line 64, in acquire_device
    fpath = IfcfgAcquireNMonline._ifcfg_file_lookup(active_connection.uuid)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg_acquire.py", line 75, in _ifcfg_file_lookup
    uuid, _ = networkmanager.ifcfg2connection(ifcfg_file)
  File "/usr/lib/python2.7/site-packages/vdsm/network/nm/networkmanager.py", line 86, in ifcfg2connection
    return nm_ifcfg.ifcfg2connection(ifcfg_file_path)
  File "/usr/lib/python2.7/site-packages/vdsm/network/nm/nmdbus/__init__.py", line 65, in ifcfg2connection
    con_info = self.ifcfg.GetIfcfgDetails(ifcfg_path)
  File "/usr/lib64/python2.7/site-packages/dbus/proxies.py", line 70, in __call__
    return self._proxy_method(*args, **keywords)
  File "/usr/lib64/python2.7/site-packages/dbus/proxies.py", line 145, in __call__
    **keywords)
  File "/usr/lib64/python2.7/site-packages/dbus/connection.py", line 651, in call_blocking
    message, timeout)
DBusException: org.freedesktop.NetworkManager.Settings.InvalidConnection: ifcfg path '/etc/sysconfig/network-scripts/ifcfg-eno2.bak' is not an ifcfg base file
MainProcess|jsonrpc/0::DEBUG::2017-07-12 12:14:01,929::commands::93::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0

Following is the traceback seen in the vdsm.log file:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 572, in _handle_request
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 198, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 1575, in setupNetworks
    supervdsm.getProxy().setupNetworks(networks, bondings, options)
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm.py", line 53, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm.py", line 51, in <lambda>
    **kwargs)
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
DBusException: org.freedesktop.NetworkManager.Settings.InvalidConnection: ifcfg path '/etc/sysconfig/network-scripts/ifcfg-eno2.bak' is not an ifcfg base file

Copied the supervdsm and vdsm logs here:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1441530/

Verified and works fine with build Red Hat Virtualization Manager Version 4.1.4.2-0.1.el7. When .bak files are present on the system, I do not see any exceptions while adding a host and attaching glusternw to the interface. I do see that the 'Setup Host Networks' dialog hangs for a long time; I will try to reproduce that and file another issue.

I just hit this while trying to deploy RHHI. In my case, it is complaining about ifcfg-bond0.210.bak. bond0.210 is my Gluster "backend" network, and it appears that something (Gluster setup?) added "ZONE=public" to ifcfg-bond0.210 and created the .bak file.

(In reply to Ian Pilcher from comment #36)
> I just hit this while trying to deploy RHHI. In my case, it is complaining
> about ifcfg-bond0.210.bak. bond0.210 is my Gluster "backend" network, and
> it appears that something (Gluster setup?) added "ZONE=public" to
> ifcfg-bond0.210 and created the .bak file.

This issue has been fixed in oVirt 4.1.4; if you are using that oVirt version, you should not be seeing the issue.
gdeploy adds the line "ZONE=public" to the ifcfg files and restarts firewalld, which creates the .bak files. To work around the issue, you can remove the .bak files before deploying.
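A minimal sketch of that workaround, run as root before starting the deployment; it assumes the standard network-scripts directory used on RHEL 7 / RHV-H:

```python
import glob
import os

# Delete the backup copies firewalld left next to the ifcfg files so that
# setupNetworks does not stumble over them during deployment.
for bak in glob.glob("/etc/sysconfig/network-scripts/ifcfg-*.bak"):
    print("removing", bak)
    os.remove(bak)
```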