Description of problem:

I upgraded my RHEL host from vdsm-4.14.18-4.el6ev to vdsm-4.16.13.1-1.el6ev. After the upgrade, the rhevm logical network disappeared. Checking the config files, the network had been written out with ONBOOT=no and DEFROUTE=no:

~~~
# cat /etc/sysconfig/network-scripts/ifcfg-rhevm
# Generated by VDSM version 4.16.13.1-1.el6ev
DEVICE=rhevm
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=no
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=no
NM_CONTROLLED=no
HOTPLUG=no
~~~

~~~
# cat /var/lib/vdsm/persistence/netconf/nets/rhevm
{"nic": "eth0", "mtu": 1500, "bootproto": "dhcp", "stp": false, "bridged": true, "defaultRoute": false}
~~~

After modifying the persistence settings and running the restore script, the network came back up:

~~~
# vim /var/lib/vdsm/persistence/netconf/nets/rhevm
# rm -f /var/run/vdsm/nets_restored
# /usr/share/vdsm/vdsm-restore-net-config
~~~

I will attach some logs now.
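For reference, the manual edit to the persistence file boils down to flipping the defaultRoute flag in that JSON blob. A minimal sketch (assuming, as in my case, that defaultRoute was the only key that needed changing; fix_default_route is just an illustrative helper name):

```python
import json

def fix_default_route(conf_text):
    """Return a vdsm net persistence JSON blob with defaultRoute set to true."""
    conf = json.loads(conf_text)
    conf["defaultRoute"] = True
    return json.dumps(conf)

# The persistence content from this report, before the fix:
before = ('{"nic": "eth0", "mtu": 1500, "bootproto": "dhcp", '
          '"stp": false, "bridged": true, "defaultRoute": false}')
after = fix_default_route(before)
```

After writing the result back to /var/lib/vdsm/persistence/netconf/nets/rhevm, removing /var/run/vdsm/nets_restored and rerunning vdsm-restore-net-config restored the network as shown above.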
Created attachment 1026060 [details] supervdsm.log
The upgrade happened somewhere after 2015-05-14 16:32:23.
Any chance you can provide the output of `route -n` and `ip route show table all` from prior to the upgrade? /var/log/vdsm/upgrade.log is a must for debugging upgrade-related issues.
Created attachment 1026764 [details] ip route show table all
~~~
0.0.0.0 10.10.183.254 0.0.0.0 UG 0 0 0 rhevm
~~~

seems perfect. The lack of upgrade.log is most disturbing. Have you ever seen an upgraded 3.5 setup lacking it?

/var/log/messages-20150517 has disturbing entries. Could it be that vdsm was upgraded while it was running? Was the host put into maintenance beforehand? We try hard to keep that working, but it is not the recommended way.

~~~
May 14 17:01:59 cisco-b200m3-01 abrt: detected unhandled Python exception in '/usr/share/vdsm/supervdsmServer'
May 14 17:01:59 cisco-b200m3-01 abrtd: New client connected
May 14 17:01:59 cisco-b200m3-01 abrtd: Directory 'pyhook-2015-05-14-17:01:59-28780' creation detected
May 14 17:01:59 cisco-b200m3-01 abrt-server[28786]: Saved Python crash dump of pid 28780 to /var/spool/abrt/pyhook-2015-05-14-17:01:59-28780
May 14 17:01:59 cisco-b200m3-01 respawn: slave '/usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock --pidfile /var/run/vdsm/supervdsmd.pid' died too quickly, respawning slave
~~~

and many repetitions of:

~~~
May 15 16:22:09 cisco-b200m3-01 kernel: Neighbour table overflow.
~~~
The host was in maintenance, of course. However, I could not get to it until long after the upgrade, and I could not reach its console (UCS blades...), so I had to reboot it (using power management). Once it came back, I could ssh into it but could not reach the storage. That is how I discovered it did not have a default route.
Please reopen if you have a reproduction environment.