Bug 1222139

Summary: [network] upgrade to vdsm 4.16.13.1-1.el6ev does not recognize rhevm logical network
Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Version: 3.5.1
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: high
Target Release: 3.5.4
Hardware: Unspecified
OS: Unspecified
Whiteboard: network
Doc Type: Bug Fix
Type: Bug
oVirt Team: Network
Reporter: Marina Kalinin <mkalinin>
Assignee: Ido Barkan <ibarkan>
QA Contact: Meni Yakove <myakove>
CC: bazulay, danken, ecohen, fdeutsch, gklein, lpeer, lsurette, mkalinin, yeylon, ylavi
Last Closed: 2015-06-29 08:41:07 UTC
Bug Depends On: 1203422, 1249396, 1249397
Attachments:
supervdsm.log
ip route show table all

Description Marina Kalinin 2015-05-15 21:03:43 UTC
Description of problem:
I upgraded my RHEL host from vdsm-4.14.18-4.el6ev to vdsm-4.16.13.1-1.el6ev.
After the upgrade, the rhevm logical network disappeared.
Checking the config files, the network had been written out with ONBOOT=no and DEFROUTE=no.
~~~
# cat /etc/sysconfig/network-scripts/ifcfg-rhevm
# Generated by VDSM version 4.16.13.1-1.el6ev
DEVICE=rhevm
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=no
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=no
NM_CONTROLLED=no
HOTPLUG=no
~~~
~~~
# cat /var/lib/vdsm/persistence/netconf/nets/rhevm
{"nic": "eth0", "mtu": 1500, "bootproto": "dhcp", "stp": false, "bridged": true, "defaultRoute": false}
~~~

After modifying the persistence settings and re-running the restore script, the network came back up:
~~~
# vim /var/lib/vdsm/persistence/netconf/nets/rhevm
# rm -f /var/run/vdsm/nets_restored
# /usr/share/vdsm/vdsm-restore-net-config
~~~
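
The exact edit to the persistence file is not recorded here; presumably it was flipping the default-route flag, along the lines of this sketch:
~~~
# (hypothetical; the actual change made to the file is not recorded in this bug)
# cat /var/lib/vdsm/persistence/netconf/nets/rhevm
{"nic": "eth0", "mtu": 1500, "bootproto": "dhcp", "stp": false, "bridged": true, "defaultRoute": true}
~~~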

I will attach some logs now.

Comment 1 Marina Kalinin 2015-05-15 21:08:14 UTC
Created attachment 1026060 [details]
supervdsm.log

Comment 2 Marina Kalinin 2015-05-15 21:09:40 UTC
The upgrade happened sometime after 2015-05-14 16:32:23.

Comment 3 Dan Kenigsberg 2015-05-17 19:00:36 UTC
Any chance you can provide the output of `route -n` and `ip route show table all` from prior to the upgrade?

/var/log/vdsm/upgrade.log is a must for debugging upgrade-related issues.

Comment 5 Marina Kalinin 2015-05-18 16:09:08 UTC
Created attachment 1026764 [details]
ip route show table all

Comment 6 Dan Kenigsberg 2015-05-18 21:39:21 UTC
0.0.0.0         10.10.183.254   0.0.0.0         UG    0      0        0 rhevm

seems perfect.

The lack of upgrade.log is most disturbing. Have you ever seen an upgraded 3.5 setup lacking it?

/var/log/messages-20150517 has disturbing logs. Could it be that vdsm was upgraded while it was running? Was the host put into maintenance beforehand? We try hard to keep this working, but it is not the recommended way.

May 14 17:01:59 cisco-b200m3-01 abrt: detected unhandled Python exception in '/usr/share/vdsm/supervdsmServer'
May 14 17:01:59 cisco-b200m3-01 abrtd: New client connected
May 14 17:01:59 cisco-b200m3-01 abrtd: Directory 'pyhook-2015-05-14-17:01:59-28780' creation detected
May 14 17:01:59 cisco-b200m3-01 abrt-server[28786]: Saved Python crash dump of pid 28780 to /var/spool/abrt/pyhook-2015-05-14-17:01:59-28780
May 14 17:01:59 cisco-b200m3-01 respawn: slave '/usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock --pidfile /var/run/vdsm/supervdsmd.pid' died too quickly, respawning slave

and lots of

May 15 16:22:09 cisco-b200m3-01 kernel: Neighbour table overflow.
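
"Neighbour table overflow" normally means the kernel's ARP/neighbour cache hit its gc_thresh limits. A quick way to check whether that is plausible (standard iproute2/sysctl commands, not output from this host):
~~~
# ip -4 neigh show | wc -l
# sysctl net.ipv4.neigh.default.gc_thresh1 net.ipv4.neigh.default.gc_thresh2 net.ipv4.neigh.default.gc_thresh3
~~~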

Comment 7 Marina Kalinin 2015-05-18 22:04:59 UTC
The host was in maintenance, of course.
However, I could not get to it until long after the upgrade, and I could not reach the console (UCS blades...), so I had to reboot it using power management. Once it came back, I could ssh into it but could not reach the storage. That is how I discovered it did not have a default route.

Comment 8 Yaniv Lavi 2015-06-29 08:41:07 UTC
Please reopen if you have a reproduction environment.