Bug 1144639

Summary: vdsm-reg: interfaces removed only lo remains
Product: Red Hat Enterprise Virtualization Manager
Reporter: Douglas Schilling Landgraf <dougsland>
Component: vdsm
Assignee: Barak <bazulay>
Status: CLOSED ERRATA
QA Contact: Martin Pavlik <mpavlik>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.5.0
CC: bazulay, cshao, danken, dougsland, ecohen, fdeutsch, gklein, hadong, huiwa, iheim, leiwang, lpeer, lsurette, mpavlik, myakove, nyechiel, yaniwang, ycui, yeylon
Target Milestone: ---
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: network
Fixed In Version: 4.16.6
Doc Type: Bug Fix
Doc Text:
Previously, the VDSM registration service was not updated to include the new VDSM persistence scheme, meaning that VDSM registration would not persist its bridge configuration. VDSM registration now uses the new persistence scheme, so that after registration the management bridge persists as it should.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-02-11 21:12:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1094719, 1147536, 1164308, 1164311
Attachments:
Description    Flags
messages       none
vdsm_logs      none

Description Douglas Schilling Landgraf 2014-09-20 03:33:06 UTC
Description of problem:

After the RHEV-H host is approved, the 'rhevm' and 'ens3' interfaces are removed and only the lo interface remains. The deployment cannot be completed.
  
Version-Release number of selected component (if applicable):

rhev-hypervisor7-7.0-20140904.0.el7ev
vdsm-4.16.3-2.el7.x86_64
vdsm-reg-4.16.3-2.el7.noarch

How reproducible:

- Install RHEV-H 7.0-20140904.0
- Configure the network with a static IP address
- Register with oVirt Engine
- Approve the host

Additional info:

Neither /etc/sysconfig/network-scripts/ifcfg-rhevm nor ifcfg-ens3 exists.

From /var/log/messages:
============================
Sep 20 02:34:13 localhost vdsmd_init_common.sh: vdsm: Running upgrade_300_nets
Sep 20 02:34:14 localhost ntpd[3571]: Deleting interface #6 rhevm, fe80::5054:ff:fe01:60f8#123, interface stats: received=0, sent=0, dropped=0, active_time=60 secs
Sep 20 02:34:14 localhost ntpd[3571]: Deleting interface #5 ens3, fe80::5054:ff:fe01:60f8#123, interface stats: received=0, sent=0, dropped=0, active_time=1112 secs
Sep 20 02:34:14 localhost ntpd[3571]: Deleting interface #3 rhevm, 192.168.100.166#123, interface stats: received=0, sent=0, dropped=0, active_time=1112 secs
Sep 20 02:34:14 localhost systemd: Started Virtual Desktop Server Manager.

Comment 1 Douglas Schilling Landgraf 2014-09-20 03:34:56 UTC
Created attachment 939474 [details]
messages

Comment 2 Douglas Schilling Landgraf 2014-09-20 03:59:56 UTC
Created attachment 939476 [details]
vdsm_logs

Comment 3 Antoni Segura Puimedon 2014-09-22 09:29:44 UTC
Hi Douglas,

does /var/lib/vdsm/upgrade consist of:
    ➜  ~  find /var/lib/vdsm/upgrade/
    /var/lib/vdsm/upgrade/
    /var/lib/vdsm/upgrade/upgrade-unified-persistence

It looks to me as if the upgrade to unified persistence didn't run, so rhevm
was not created under /var/lib/vdsm/persistence/netconf/nets/rhevm. Then,
when the network restoration script ran, it restored to {} (an empty network
configuration) because there were no saved nets in /var/lib/vdsm/persistence/netconf.

Comment 5 Douglas Schilling Landgraf 2014-09-24 19:32:39 UTC
(In reply to Antoni Segura Puimedon from comment #3)
> Hi Douglas,
> 
> does /var/lib/vdsm/upgrades consist of:
>     ➜  ~  find /var/lib/vdsm/upgrade/
>     /var/lib/vdsm/upgrade/
>     /var/lib/vdsm/upgrade/upgrade-unified-persistence
> 
> It looks to me as if the upgrade to unified persistence didn't run, so rhevm
> was not created under /var/lib/vdsm/persistence/netconf/nets/rhevm and then
> when the network restoration script ran it restored to {} because there were no
> saved nets in /var/lib/vdsm/persistence/netconf

Hi Toni,

Here is the data:

# cat /var/lib/vdsm/upgrade/upgrade-unified-persistence
#

# cd /
# find . -name upgrade-unified-persistence
./var/lib/stateless/writable/var/lib/vdsm/upgrade/upgrade-unified-persistence
./var/lib/vdsm/upgrade/upgrade-unified-persistence
./config/var/lib/vdsm/upgrade/upgrade-unified-persistence

There is a broken symlink for netconf:
# ls -la /var/lib/vdsm/persistence/netconf -> /var/lib/vdsm/persistence/netconf.1411585891117873466

Please let me know if you need any additional data.
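The checks requested in Comment 3, plus the broken netconf symlink reported here, can be gathered in one pass. This is a minimal sketch, not a vdsm tool: it assumes only the paths quoted in this report, and the helper name `check_persistence` and its `root` parameter (which lets it run against a copy of a host's filesystem) are hypothetical.

```python
import os
import sys

# Paths quoted in this bug report, relative to a (hypothetical) root prefix.
UPGRADE_MARKER = "var/lib/vdsm/upgrade/upgrade-unified-persistence"
NETCONF_LINK = "var/lib/vdsm/persistence/netconf"


def check_persistence(root="/", net="rhevm"):
    """Hypothetical helper: report the state asked about in Comment 3."""
    marker = os.path.join(root, UPGRADE_MARKER)
    netconf = os.path.join(root, NETCONF_LINK)
    return {
        # Did the unified-persistence upgrade leave its marker file?
        "upgrade_marker": os.path.exists(marker),
        # Broken symlink: islink() is True, but exists() follows the
        # link and is False because the target is missing.
        "netconf_broken_link": os.path.islink(netconf)
        and not os.path.exists(netconf),
        # Is there a persisted definition for the management network?
        "net_persisted": os.path.exists(os.path.join(netconf, "nets", net)),
    }


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/"
    for key, value in check_persistence(root).items():
        print(f"{key}: {value}")
```

On a host in the state described above, one would expect the marker to be present, the netconf link to be broken, and no persisted rhevm net.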

Comment 6 Douglas Schilling Landgraf 2014-09-25 19:23:23 UTC
I configured a static IP address on rhev-hypervisor7-7.0-20140925.0.iso [1] and hit the same issue.

[1] https://brewweb.devel.redhat.com/taskinfo?taskID=8025131

Comment 7 Antoni Segura Puimedon 2014-09-26 15:45:41 UTC
The problem was that vdsm-reg created the rhevm bridge with addNetwork, which wrote /var/run/vdsm/nets/rhevm before vdsmd had ever been started.

This means that when RHEV-M approved the vdsm-reg request and asked vdsmd to start, vdsmd saw that /var/run/vdsm/nets_restored did not exist and therefore tried to restore networks, i.e., delete the networks in /var/run/vdsm/nets and replace them with those in /var/lib/vdsm/persistence/netconf/nets (which did not exist at that point). RHEV-H thus lost all the connectivity the rhevm bridge had been providing, which is this bug.

The solution I propose, and will send a patch for, is that vdsm-reg, after running addNetwork for rhevm, calls /usr/share/vdsm/vdsm-store-net-config "unified" (or "ifcfg", depending on the net_persistence setting in the vdsm config).
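The failure and the proposed ordering fix can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not vdsm's actual code: plain dicts stand in for /var/run/vdsm/nets and /var/lib/vdsm/persistence/netconf/nets, a boolean stands in for the /var/run/vdsm/nets_restored marker, only the "unified" persistence case is modelled, and `add_network`, `store_net_config`, `vdsmd_start`, and `register` are hypothetical stand-ins for addNetwork, vdsm-store-net-config, vdsmd's startup restoration, and vdsm-reg.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Toy model of the network state on disk (hypothetical, not vdsm code)."""
    running: dict = field(default_factory=dict)    # ~ /var/run/vdsm/nets
    persisted: dict = field(default_factory=dict)  # ~ .../persistence/netconf/nets
    nets_restored: bool = False                    # ~ /var/run/vdsm/nets_restored


def add_network(host, name, attrs):
    # Stand-in for vdsm-reg's addNetwork: only the running config is written.
    host.running[name] = dict(attrs)


def store_net_config(host):
    # Stand-in for vdsm-store-net-config: copy running nets to persistence.
    host.persisted = {k: dict(v) for k, v in host.running.items()}


def vdsmd_start(host):
    # First start: the marker is missing, so running nets are dropped and
    # replaced by whatever was persisted (possibly nothing at all).
    if not host.nets_restored:
        host.running = {k: dict(v) for k, v in host.persisted.items()}
        host.nets_restored = True
    return host.running


def register(persist):
    host = Host()
    # Illustrative attributes, taken from the logs in this report.
    add_network(host, "rhevm", {"nic": "ens3", "ipaddr": "192.168.100.166"})
    if persist:  # the proposed fix: persist right after addNetwork
        store_net_config(host)
    return vdsmd_start(host)


print("without fix:", register(persist=False))
print("with fix:   ", register(persist=True))
```

Without the persist step, the first vdsmd start restores an empty configuration and rhevm disappears; with it, the restoration reproduces the bridge vdsm-reg just created.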

Comment 9 Antoni Segura Puimedon 2014-09-29 21:31:57 UTC
After solving the issue above, we noticed a problem with persisting the positional arguments passed to the addNetwork script that vdsm-reg uses. I just submitted a patch for that, and it solved the issue on a repeated registration. We'll test on a clean installation as soon as possible.

Comment 13 Martin Pavlik 2014-10-22 10:41:15 UTC
VERIFIED

Comment 17 errata-xmlrpc 2015-02-11 21:12:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0159.html