Bug 1222139

Summary: [network] upgrade to vdsm 4.16.13.1-1.el6ev does not recognize rhevm logical network
Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Version: 3.5.1
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: high
Target Release: 3.5.4
Hardware: Unspecified
OS: Unspecified
Whiteboard: network
Doc Type: Bug Fix
Type: Bug
oVirt Team: Network
Reporter: Marina Kalinin <mkalinin>
Assignee: Ido Barkan <ibarkan>
QA Contact: Meni Yakove <myakove>
CC: bazulay, danken, ecohen, fdeutsch, gklein, lpeer, lsurette, mkalinin, yeylon, ylavi
Last Closed: 2015-06-29 08:41:07 UTC
Bug Depends On: 1203422, 1249396, 1249397
Attachments:
supervdsm.log
ip route show table all

Description Marina Kalinin 2015-05-15 21:03:43 UTC
Description of problem:
I upgraded my RHEL host from vdsm-4.14.18-4.el6ev to vdsm-4.16.13.1-1.el6ev.
After the upgrade, the rhevm logical network disappeared.
Checking the config files, the network had been written out with ONBOOT=no and DEFROUTE=no.
~~~
# cat /etc/sysconfig/network-scripts/ifcfg-rhevm
# Generated by VDSM version 4.16.13.1-1.el6ev
DEVICE=rhevm
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=no
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=no
NM_CONTROLLED=no
HOTPLUG=no
~~~
~~~
# cat /var/lib/vdsm/persistence/netconf/nets/rhevm
{"nic": "eth0", "mtu": 1500, "bootproto": "dhcp", "stp": false, "bridged": true, "defaultRoute": false}
~~~

After modifying the persistence settings and re-running the restore script, the network came back up:
~~~
# vim /var/lib/vdsm/persistence/netconf/nets/rhevm
# rm -f /var/run/vdsm/nets_restored
# /usr/share/vdsm/vdsm-restore-net-config
~~~
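
The exact edit to the persistence file is not recorded here; presumably it was flipping the default-route flag, along the lines of this sketch:
~~~
# (hypothetical; the actual change made to the file is not recorded in this bug)
# cat /var/lib/vdsm/persistence/netconf/nets/rhevm
{"nic": "eth0", "mtu": 1500, "bootproto": "dhcp", "stp": false, "bridged": true, "defaultRoute": true}
~~~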

I will attach some logs now.

Comment 1 Marina Kalinin 2015-05-15 21:08:14 UTC
Created attachment 1026060 [details]
supervdsm.log

Comment 2 Marina Kalinin 2015-05-15 21:09:40 UTC
The upgrade happened sometime after 2015-05-14 16:32:23.

Comment 3 Dan Kenigsberg 2015-05-17 19:00:36 UTC
Any chance you can provide the output of `route -n` and `ip route show table all` from prior to the upgrade?

/var/log/vdsm/upgrade.log is a must for debugging upgrade-related issues.

Comment 5 Marina Kalinin 2015-05-18 16:09:08 UTC
Created attachment 1026764 [details]
ip route show table all

Comment 6 Dan Kenigsberg 2015-05-18 21:39:21 UTC
0.0.0.0         10.10.183.254   0.0.0.0         UG    0      0        0 rhevm

seems perfect.

The lack of upgrade.log is most disturbing. Have you ever seen an upgraded 3.5 setup lacking it?

/var/log/messages-20150517 has disturbing logs. Could it be that vdsm was upgraded while it was running? Was the host put into maintenance beforehand? We try hard to keep this working, but it is not the recommended way.

May 14 17:01:59 cisco-b200m3-01 abrt: detected unhandled Python exception in '/usr/share/vdsm/supervdsmServer'
May 14 17:01:59 cisco-b200m3-01 abrtd: New client connected
May 14 17:01:59 cisco-b200m3-01 abrtd: Directory 'pyhook-2015-05-14-17:01:59-28780' creation detected
May 14 17:01:59 cisco-b200m3-01 abrt-server[28786]: Saved Python crash dump of pid 28780 to /var/spool/abrt/pyhook-2015-05-14-17:01:59-28780
May 14 17:01:59 cisco-b200m3-01 respawn: slave '/usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock --pidfile /var/run/vdsm/supervdsmd.pid' died too quickly, respawning slave

and lots of

May 15 16:22:09 cisco-b200m3-01 kernel: Neighbour table overflow.
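
"Neighbour table overflow" normally means the kernel's ARP/neighbour cache hit its gc_thresh limits. A quick way to check whether that is plausible (standard iproute2/sysctl commands, not output from this host):
~~~
# ip -4 neigh show | wc -l
# sysctl net.ipv4.neigh.default.gc_thresh1 net.ipv4.neigh.default.gc_thresh2 net.ipv4.neigh.default.gc_thresh3
~~~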

Comment 7 Marina Kalinin 2015-05-18 22:04:59 UTC
The host was in maintenance, of course.
However, I could not get to it until long after the upgrade, and I could not reach the console (UCS blades...), so I had to reboot it using power management. Once it came back, I could ssh into it but could not reach the storage. That is how I discovered it did not have a default route.

Comment 8 Yaniv Lavi 2015-06-29 08:41:07 UTC
Please reopen if you have a reproduction environment.