Bug 1272161
Summary: | network lost after update of console managed storage node | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Martin Bukatovic <mbukatov>
Component: | vdsm | Assignee: | Ramesh N <rnachimu>
Status: | CLOSED ERRATA | QA Contact: | Sachin <sashinde>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | rhgs-3.1 | CC: | amukherj, danken, edwardh, ltrilety, mbukatov, mzywusko, nlevinki, rcyriac, rhinduja, rhs-bugs, rnachimu, sabose, sankarshan, sashinde
Target Milestone: | --- | Keywords: | ZStream
Target Release: | RHGS 3.1.3 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | vdsm-4.16.30-1.4 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-06-23 05:27:11 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1299184 | |
Attachments: | | |
Description
Martin Bukatovic
2015-10-15 15:27:00 UTC
Created attachment 1083307
Output of yum update
This looks similar to Bug 1277951, which was closed due to lack of repro data. Triveni, Martin, can you check if 3.1.2 vdsm solves the issue? We rebased 3.1.2 vdsm to 4.16.30 to solve some of the network issues.

Using my archive of libvirt snapshots, I restored machines from Sep 08 2015, re-registered them back into current stable cdn channels and updated storage servers (following steps to reproduce from this BZ) to:

1) current stable version from cdn (RHGS 3.1.1): vdsm-4.16.20-1.2.el6rhs -> vdsm-4.16.20-1.3.el6rhs (this is the same situation as described in this BZ)
2) latest version from qe puddle repo (RHGS 3.1.2): vdsm-4.16.20-1.2.el6rhs -> vdsm-4.16.30-1.3.el6rhs

I was able to reproduce the issue in both cases.

From supervdsm.log after reboot:

restore-net::INFO::2016-01-08 10:10:33,243::ifcfg::423::root::(_loadBackupFiles) Loaded /var/lib/vdsm/netconfback/ifcfg-eth0
restore-net::INFO::2016-01-08 10:10:33,243::ifcfg::423::root::(_loadBackupFiles) Loaded /var/lib/vdsm/netconfback/ifcfg-ovirtmgmt

But it looks like the contents of these files are empty, which causes the files to be removed further down in the log:

restore-net::DEBUG::2016-01-08 10:10:34,060::ifcfg::377::root::(restoreAtomicBackup) Removing empty configuration backup /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
restore-net::DEBUG::2016-01-08 10:10:34,060::ifcfg::377::root::(restoreAtomicBackup) Removing empty configuration backup /etc/sysconfig/network-scripts/ifcfg-eth0

Dan, what would cause the backup files to be empty?

Additional information, state of vdsm netconfback files before update and reboot:

~~~
# ls -l /var/lib/vdsm/netconfback/
total 8
-rw-r--r--. 1 vdsm root 30 Jan 13 14:45 ifcfg-eth0
-rw-r--r--. 1 vdsm root 30 Jan 13 14:45 ifcfg-ovirtmgmt
# cat /var/lib/vdsm/netconfback/ifcfg-eth0
# original file did not exist
# cat /var/lib/vdsm/netconfback/ifcfg-ovirtmgmt
# original file did not exist
~~~

While actual ifcfg files were there:

~~~
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Generated by VDSM version 4.16.20-1.2.el6rhs
DEVICE=eth0
HWADDR=52:54:00:70:7a:8c
BRIDGE=ovirtmgmt
ONBOOT=yes
MTU=1500
NM_CONTROLLED=no
# cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
# Generated by VDSM version 4.16.20-1.2.el6rhs
DEVICE=ovirtmgmt
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
HOTPLUG=no
~~~

Additional information: using libvirt and guestfish tools, I scripted extraction of the entire /var/lib/vdsm/ directory into a tarball for each snapshot of node1 I have. Then I searched for netconfback files in each tarball. It turned out that this issue is likely related to nagios configuration (at least in my case). See excerpt of my snapshot list:

~~~
 Name                      Creation Time              State
------------------------------------------------------------
 ...
 w37_01_rhgsinstalled      2015-09-08 10:57:23 +0200  shutoff
 w37_02_volumedefined      2015-09-08 11:28:25 +0200  shutoff
 w37_03_nagiosconfigured   2015-09-08 17:21:14 +0200  shutoff
 ...
~~~

Almost empty netconfback files appeared in the w37_03_nagiosconfigured snapshot for the first time. For every snapshot before that, the directory was empty. Note that w37_03_nagiosconfigured is the snapshot I restored when I hit this issue for the first time (as I noted in comment 6).

I don't think Nagios has anything to do with network configurations. I feel there is something wrong with vdsm during upgrade. Maybe edwardh or Dan can help us here. Have you faced similar issues with VDSM in RHEV-M?

Looks like this problem is caused by "net_persistence = ifcfg" in /etc/vdsm/vdsm.conf.
Somehow we have a state where /var/lib/vdsm/netconfback/ifcfg-eth0 and /var/lib/vdsm/netconfback/ifcfg-ovirtmgmt have the content '# original file did not exist' and "net_persistence = ifcfg" is set in /etc/vdsm/vdsm.conf. Rebooting the system in this state removes the ifcfg files at /var/lib/vdsm/netconfback/ and /etc/sysconfig/network-scripts/ and leaves the system without any network. If I remove the field "net_persistence = ifcfg" from /etc/vdsm/vdsm.conf and reboot, then this problem is not seen. This problem is reproducible in vdsm-4.16.20-1.2.el6rhs.x86_64.

Edward Haas: Do you have any idea why the network-config backup files in folder /var/lib/vdsm/netconfback/* don't have any valid data? Do you have a similar issue in RHEV-M?

Similar bz#1263979 was fixed in vdsm-4.16.28-1.el6ev.x86_64 for RHEV-M 3.5.6. But I am able to reproduce this issue in 4.16.30-1.

We have added "net_persistence=ifcfg" as a workaround for bz#1203422. It was suggested by Dan in "https://bugzilla.redhat.com/show_bug.cgi?id=1215011#c2". But bz#1203422 is fixed now. So can we remove this config?

Regards,
Ramesh

We would prefer moving to the unified persistent mode (the default), so removing the config is recommended.

If moving back to the unified persistent mode solves your issue, would it satisfy this bug? We would prefer not to touch the older ifcfg persistent mode too much, keeping our focus on the unified mode.

(In reply to Edward Haas from comment #16)
> We have added "net_persistence=ifcfg" as a workaround for bz#1203422. It
> was suggested by Dan in
> "https://bugzilla.redhat.com/show_bug.cgi?id=1215011#c2". But bz#1203422 is
> fixed now. So can we remove this config?
>
> Regards,
> Ramesh
>
>
> We would prefer moving to the unified persistent mode (the default), so
> removing the config is recommended.

We can move to unified persistent mode. We don't see any issue with that.
But I would like to understand the scenario in which /var/lib/vdsm/netconfback/ifcfg-eth0 and /var/lib/vdsm/netconfback/ifcfg-ovirtmgmt will have the content '# original file did not exist'. This will help me to understand the seriousness of this issue.

Regards,
Ramesh

>
> If moving back to the unified persistent mode solves your issue, would it
> satisfy this bug? We would prefer not to touch the older ifcfg persistent
> mode too much, keeping our focus on the unified mode.

"net_persistence=ifcfg" was added as a workaround for bz#1203422. bz#1203422 is already fixed, so we can remove "net_persistence=ifcfg" from /etc/vdsm/vdsm.conf.

Note: Though "net_persistence=ifcfg" was added during rpm install/update, it is getting removed during setupHostNetworks.

Reverted the patch as mentioned in comment 18.

I have verified the bug and found no issues with the Fixed In Version vdsm-4.16.30-1.4 on both RHEL7 and RHEL6 nodes.

Steps followed:

1. Installed an RHGS 3.1.1 node and then added it to RHSC.
2. Found that "net_persistence=ifcfg" is not present.
3. Updated the vdsm package to 3.1.3 and put the node in maintenance mode.
4. Rebooted the node and confirmed that the network configs are proper.
5. Activated the host on RHSC and found no issues.
6. Followed the same steps for 3.1.2 as well.

[root@dhcp35-139 ~]# cat /var/lib/vdsm/netconfback/ifcfg-ovirtmgmt
cat: /var/lib/vdsm/netconfback/ifcfg-ovirtmgmt: No such file or directory
[root@dhcp35-139 ~]# cat /etc/vdsm/vdsm.conf | grep ifcfg
[root@dhcp35-139 ~]#
[root@dhcp35-139 ~]# cat /etc/vdsm/vdsm.conf
[vars]
ssl = true

[addresses]
management_port = 54321
[root@dhcp35-139 ~]#

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1242
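
For anyone who wants to check a node before rebooting it, the failure precondition discussed in the comments (placeholder backups in netconfback combined with "net_persistence = ifcfg" in vdsm.conf) could be detected with a rough sketch like the one below. The function names are hypothetical and not part of vdsm; only the marker text and the paths mentioned in the usage note come from this report. It assumes the placeholder file contains nothing but the marker line, as in the `cat` output shown earlier.

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration, not vdsm tooling.

# Marker text seen in the netconfback files in this report.
MARKER='# original file did not exist'

# Succeeds if FILE exists and contains only the placeholder marker line.
is_placeholder_backup() {
    [ -f "$1" ] && [ "$(cat "$1")" = "$MARKER" ]
}

# Succeeds if CONF sets net_persistence to ifcfg (spaces around '=' optional).
uses_ifcfg_persistence() {
    grep -Eq '^[[:space:]]*net_persistence[[:space:]]*=[[:space:]]*ifcfg[[:space:]]*$' "$1" 2>/dev/null
}

# Prints each live ifcfg file whose backup is only a placeholder.
# Returns 1 if at least one such file is found, 0 otherwise.
# Reports nothing when the config does not use ifcfg persistence.
at_risk_ifcfg_files() {
    backup_dir=$1
    live_dir=$2
    conf=$3
    uses_ifcfg_persistence "$conf" || return 0
    found=0
    for backup in "$backup_dir"/ifcfg-*; do
        [ -e "$backup" ] || continue          # glob matched nothing
        live="$live_dir/$(basename "$backup")"
        if is_placeholder_backup "$backup" && [ -s "$live" ]; then
            echo "$live"
            found=1
        fi
    done
    return "$found"
}
```

On a node like the ones in this report, it would be called as `at_risk_ifcfg_files /var/lib/vdsm/netconfback /etc/sysconfig/network-scripts /etc/vdsm/vdsm.conf`; any path it prints is a live ifcfg file that, per the analysis above, may be removed on the next reboot.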