Bug 1403945 - ovsdb-server is dead after upgrade from 2.5.0-14 to 2.5.0-22
Summary: ovsdb-server is dead after upgrade from 2.5.0-14 to 2.5.0-22
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Open vSwitch development team
QA Contact: Network QE
URL:
Whiteboard:
Depends On: 1403080
Blocks:
 
Reported: 2016-12-12 16:51 UTC by Ihar Hrachyshka
Modified: 2016-12-29 16:57 UTC
CC: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1403080
Environment:
Last Closed: 2016-12-12 17:06:57 UTC
Target Upstream Version:
Embargoed:



Comment 1 Ihar Hrachyshka 2016-12-12 16:59:33 UTC
Cloned the original bug to the openvswitch component because it does not seem OSP-related.

The reduced steps to reproduce are (starting with RHEL 7.3):

- yum install openvswitch-2.5.0-14.git20160727.el7fdp.x86_64
- systemctl start openvswitch
- check that both ovsdb-server and vswitchd processes are up
- yum install openvswitch-2.5.0-22.git20160727.el7fdp.x86_64

Observe that only the vswitchd process is up.
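The check in the steps above can be scripted. This is a hypothetical helper (not shipped with openvswitch; the function name is my own) that counts the OVS monitor processes in `ps` output read from stdin; a healthy install should report 2, the broken post-upgrade state 1:

```shell
# Hypothetical helper: count OVS daemon monitor processes in `ps` output
# piped on stdin. Healthy: both ovsdb-server and ovs-vswitchd -> prints 2.
# Broken state after the upgrade: only ovs-vswitchd -> prints 1.
count_ovs_daemons() {
  grep -Ec 'ovs(db-server|-vswitchd): monitoring'
}
```

Usage: `ps ax | count_ovs_daemons`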

Before package update:

[root@localhost ~]# ps ax|grep ovs
 3363 ?        S<s    0:00 ovsdb-server: monitoring pid 3364 (healthy)
 3364 ?        S<     0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
 3373 ?        S<s    0:00 ovs-vswitchd: monitoring pid 3374 (healthy)
 3374 ?        S<Ll   0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
 3424 pts/1    S+     0:00 grep --color=auto ovs

After package update:

[root@localhost ~]# ps ax|grep ovs
 3563 ?        S<s    0:00 ovs-vswitchd: monitoring pid 3564 (healthy)
 3564 ?        S<Ll   0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
 3666 pts/1    S+     0:00 grep --color=auto ovs

In ovsdb-server.log, the only message after update is:
2016-12-12T16:44:09.528Z|00002|daemon_unix(monitor)|INFO|pid 3554 died, exit status 0, exiting

In vswitchd log:

2016-12-12T16:44:09.528Z|00036|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection closed by peer
2016-12-12T16:44:10.528Z|00037|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-12T16:44:10.528Z|00038|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2016-12-12T16:44:10.528Z|00039|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 2 seconds before reconnect
2016-12-12T16:44:12.528Z|00040|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-12T16:44:12.528Z|00041|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2016-12-12T16:44:12.528Z|00042|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 4 seconds before reconnect
2016-12-12T16:44:16.529Z|00043|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-12T16:44:16.530Z|00044|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2016-12-12T16:44:16.530Z|00045|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 8 seconds before reconnect
2016-12-12T16:44:24.530Z|00046|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-12T16:44:24.530Z|00047|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2016-12-12T16:44:24.530Z|00048|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 8 seconds before reconnect
2016-12-12T16:44:32.530Z|00049|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-12T16:44:32.530Z|00050|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2016-12-12T16:44:32.530Z|00051|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 8 seconds before reconnect
...
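The waits in the log above follow a doubling-with-cap backoff (2s, 4s, 8s, 8s, ...). A minimal sketch of that policy, for illustration only (this is not the actual OVS reconnect code, and the 8-second cap is assumed from this log):

```shell
# Sketch of the exponential backoff visible in the log: the retry
# interval doubles and is capped. Illustration only, not OVS source.
next_backoff() {
  cur=$1
  max=${2:-8}                # cap assumed from the log above
  next=$(( cur * 2 ))
  if [ "$next" -gt "$max" ]; then next=$max; fi
  echo "$next"
}
```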

In systemctl status openvswitch:

● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
...
Dec 12 11:44:09 localhost.localdomain systemd[1]: Stopping Open vSwitch...
Dec 12 11:44:09 localhost.localdomain systemd[1]: Starting Open vSwitch...
Dec 12 11:44:09 localhost.localdomain systemd[1]: Started Open vSwitch.
Dec 12 11:44:09 localhost.localdomain systemd[1]: Stopping Open vSwitch...
Dec 12 11:44:09 localhost.localdomain systemd[1]: Stopped Open vSwitch.

^^ NOTE THE ORDER OF MESSAGES

Does this suggest that we stop the service after it has started, or in the middle of starting it? It's not completely clear.

The state can be fixed with 'systemctl restart openvswitch', which brings ovsdb-server back.
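That recovery step could be wrapped in a small guard. A hedged sketch (the function name and the use of pgrep are my own, not from this bug): restart the unit only when ovsdb-server is gone while ovs-vswitchd is still up, i.e. the exact broken state described above.

```shell
# Hypothetical recovery guard: if ovsdb-server died but ovs-vswitchd
# is still running (the state seen after the upgrade), restart the
# openvswitch unit to bring the database server back.
maybe_restart_ovs() {
  if ! pgrep -x ovsdb-server >/dev/null && pgrep -x ovs-vswitchd >/dev/null; then
    systemctl restart openvswitch
  fi
}
```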

Environment: Red Hat Enterprise Linux Server release 7.3 Beta (Maipo)

Comment 2 Aaron Conole 2016-12-12 17:01:18 UTC
Is it possible to try with -15?  If you don't have access, the RPMs are available at:

http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch/2.5.0/15.git20160727.el7fdb/

-15 introduced a change whereby we don't restart openvswitch on upgrade, due to DHCP interface acquisition errors.

Comment 3 Ihar Hrachyshka 2016-12-12 17:02:09 UTC
Also in /var/log/messages, I see:

Dec 12 11:44:09 localhost yum[3581]: Updated: openvswitch-2.5.0-22.git20160727.el7fdp.x86_64
Dec 12 11:44:09 localhost systemd: Reloading.
Dec 12 11:44:09 localhost systemd: Stopping Open vSwitch...
Dec 12 11:44:09 localhost systemd: Starting Open vSwitch Database Unit...
Dec 12 11:44:09 localhost systemd: Starting Open vSwitch...
Dec 12 11:44:09 localhost systemd: Started Open vSwitch.
Dec 12 11:44:09 localhost ovs-ctl: ovsdb-server is already running.
Dec 12 11:44:09 localhost ovs-ctl: Enabling remote OVSDB managers [  OK  ]
Dec 12 11:44:09 localhost systemd: Stopping Open vSwitch...
Dec 12 11:44:09 localhost systemd: Stopped Open vSwitch.
Dec 12 11:44:09 localhost ovs-ctl: Exiting ovsdb-server (3554) [  OK  ]
Dec 12 11:44:09 localhost systemd: Stopped Open vSwitch Database Unit.

Comment 4 Jiri Benc 2016-12-12 17:06:57 UTC
When reading a bug description, it should be clear what the problem is and what the relevant information is. After hitting "clone", the text offered by Bugzilla should be taken only as a base and hand-edited to be useful for the future bug assignee.

If you look at the description of this bug, it is anything but helpful: lots of poorly formatted blobs with no information whatsoever, lots of unrelated noise, etc. Remember, several people are going to read this (devel, QE, PM, team leads, etc.) and would have to painfully go through all of this churn.

Please do not do this. I'm closing this bug; please file a comprehensive, clean bug report.

Comment 5 Ihar Hrachyshka 2016-12-12 17:17:41 UTC
OK, checked with -15 (I needed to use 'yum downgrade' instead of just 'yum install' since the package carries an Epoch).

Not sure which case you wanted to check, so I checked several things.

1. If I upgrade -14 to -15 then to -22, then everything works fine.
2. If I upgrade -14 straight to -22, then ovsdb-server dies.

So I suspect some change between -15 and -22 is unsafe in combination with the post-update restart (which I believe was still present in -14).

Comment 6 Ihar Hrachyshka 2016-12-12 17:24:46 UTC
Ack, reported a hopefully better one: https://bugzilla.redhat.com/show_bug.cgi?id=1403958

