The reduced steps to reproduce are (starting with RHEL 7.3):
- yum install openvswitch-2.5.0-14.git20160727.el7fdp.x86_64
- systemctl start openvswitch
- check that both ovsdb-server and vswitchd processes are up
- yum install openvswitch-2.5.0-22.git20160727.el7fdp.x86_64

Observe that only the vswitchd process is up.

Before package update:

[root@localhost ~]# ps ax|grep ovs
 3363 ?        S<s    0:00 ovsdb-server: monitoring pid 3364 (healthy)
 3364 ?        S<     0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
 3373 ?        S<s    0:00 ovs-vswitchd: monitoring pid 3374 (healthy)
 3374 ?        S<Ll   0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
 3424 pts/1    S+     0:00 grep --color=auto ovs

After package update:

[root@localhost ~]# ps ax|grep ovs
 3563 ?        S<s    0:00 ovs-vswitchd: monitoring pid 3564 (healthy)
 3564 ?        S<Ll   0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
 3666 pts/1    S+     0:00 grep --color=auto ovs

In ovsdb-server.log, the only message after the update is:

2016-12-12T16:44:09.528Z|00002|daemon_unix(monitor)|INFO|pid 3554 died, exit status 0, exiting

In the vswitchd log:

2016-12-12T16:44:09.528Z|00036|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection closed by peer
2016-12-12T16:44:10.528Z|00037|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-12T16:44:10.528Z|00038|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2016-12-12T16:44:10.528Z|00039|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 2 seconds before reconnect
2016-12-12T16:44:12.528Z|00040|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-12T16:44:12.528Z|00041|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2016-12-12T16:44:12.528Z|00042|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 4 seconds before reconnect
2016-12-12T16:44:16.529Z|00043|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-12T16:44:16.530Z|00044|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2016-12-12T16:44:16.530Z|00045|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 8 seconds before reconnect
2016-12-12T16:44:24.530Z|00046|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-12T16:44:24.530Z|00047|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2016-12-12T16:44:24.530Z|00048|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 8 seconds before reconnect
2016-12-12T16:44:32.530Z|00049|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-12T16:44:32.530Z|00050|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2016-12-12T16:44:32.530Z|00051|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 8 seconds before reconnect
...

In systemctl status openvswitch:

● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
...
Dec 12 11:44:09 localhost.localdomain systemd[1]: Stopping Open vSwitch...
Dec 12 11:44:09 localhost.localdomain systemd[1]: Starting Open vSwitch...
Dec 12 11:44:09 localhost.localdomain systemd[1]: Started Open vSwitch.
Dec 12 11:44:09 localhost.localdomain systemd[1]: Stopping Open vSwitch...
Dec 12 11:44:09 localhost.localdomain systemd[1]: Stopped Open vSwitch.

^^ NOTE THE ORDER OF MESSAGES

Does it suggest that we stop the service after it's started, or in the middle of starting it? It's not completely clear.

The state can be fixed by 'systemctl restart openvswitch', which brings ovsdb-server back.

Also in /var/log/messages, I see:

Dec 12 11:44:09 localhost yum[3581]: Updated: openvswitch-2.5.0-22.git20160727.el7fdp.x86_64
Dec 12 11:44:09 localhost systemd: Reloading.
Dec 12 11:44:09 localhost systemd: Stopping Open vSwitch...
Dec 12 11:44:09 localhost systemd: Starting Open vSwitch Database Unit...
Dec 12 11:44:09 localhost systemd: Starting Open vSwitch...
Dec 12 11:44:09 localhost systemd: Started Open vSwitch.
Dec 12 11:44:09 localhost ovs-ctl: ovsdb-server is already running.
Dec 12 11:44:09 localhost ovs-ctl: Enabling remote OVSDB managers [  OK  ]
Dec 12 11:44:09 localhost systemd: Stopping Open vSwitch...
Dec 12 11:44:09 localhost systemd: Stopped Open vSwitch.
Dec 12 11:44:09 localhost ovs-ctl: Exiting ovsdb-server (3554) [  OK  ]
Dec 12 11:44:09 localhost systemd: Stopped Open vSwitch Database Unit.

Finally, I checked with the -15 version as found in:
http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch/2.5.0/15.git20160727.el7fdb/

The result is:
1. If I upgrade -14 to -15 and then to -22, everything works fine.
2. If I upgrade -14 straight to -22, ovsdb-server dies.

So I suspect something introduced between -15 and -22 is unsafe when combined with the post-update restart that is still present (and that I believe was present in -14).

Environment: Red Hat Enterprise Linux Server release 7.3 Beta (Maipo)

Note: The bug is the result of broken OSPd upgrades as seen in https://bugzilla.redhat.com/show_bug.cgi?id=1403080
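For quick verification of the reproducer, something along these lines can be used (a minimal sketch, assuming a clean RHEL 7.3 host; the pgrep checks are just one way to see whether the daemons survived):

# install the known-good build and start the service
yum install -y openvswitch-2.5.0-14.git20160727.el7fdp.x86_64
systemctl start openvswitch

# both ovsdb-server and ovs-vswitchd should show up here
pgrep -a 'ovsdb-server|ovs-vswitchd'

# update to the affected build
yum install -y openvswitch-2.5.0-22.git20160727.el7fdp.x86_64

# after the update only ovs-vswitchd is left
pgrep -a ovsdb-server || echo "ovsdb-server is not running"

# restarting the service brings ovsdb-server back
systemctl restart openvswitch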
Sorry, "Note: The bug is the result of broken OSPd upgrades" should be read as "Note: The bug is the cause of broken OSPd upgrades"
@Aaron, using -23 indeed fixes the update; openvswitch is up and running. The processes are new, so it was restarted after the update; I believe that's expected? I remember we had some other problem with a process restart happening in some previous package versions, which is why I am asking.
Note: someone from TripleO is also checking the -23 package in our upgrade scope, to see whether it also solves the OSPd issue. I also suggested testing the 14 -> 15 -> 22 package update path; they may do that afterwards. I will ask them to report their results here.
OK, the previous issue that we had with a restart on package update was https://bugzilla.redhat.com/show_bug.cgi?id=1385096

I think we later worked around it for tripleo by using rpm --nopostun:
https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh#L102-L114

So maybe it indeed now makes sense to revert the patch; that's on you folks to decide.
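For context, the tripleo workaround linked above boils down to updating the package with the old package's %postun scriptlet suppressed and then reloading systemd. A rough sketch only (the exact package NVR and download method are environment-specific assumptions, not what the script literally does):

# download the target rpm without installing it (requires yum-utils)
yumdownloader openvswitch
# upgrade in place, skipping the currently installed package's %postun restart
rpm -U --replacepkgs --nopostun ./openvswitch-*.x86_64.rpm
# pick up the updated unit files without restarting the services
systemctl daemon-reload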
Hi, so the rpm was installed like this by the upgrade script (confirmed by the log on the platform, attached here):

rpm -U --replacepkgs --nopostun ./openvswitch-2.5.0-23.git20160727.el7fdb.x86_64.rpm

And then on the working upgraded platform we had the correct package:

$ rpm -qa | grep 'openvswitch-2.5.0-23.git20160727.el7fdb.x86_64'
openvswitch-2.5.0-23.git20160727.el7fdb.x86_64

Is that incorrect? I don't really get the comment about going back to bug 1385096.
Created attachment 1231765 [details]
Log of the controller upgrade.

This is the log of the controller installation where we can see the rpm installation of openvswitch, attached to the bz.
I've done a successful upgrade of openvswitch-2.5.0-14.git20160727 to openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64. I set up a local repo on the undercloud node and installed this repo on all nodes prior to the upgrade, and it was successfully picked up during the upgrade procedure. The whole upgrade went very smoothly with this package.

Here is the output after the final step of the upgrade:

[stack@undercloud-0 ~]$ rpm -q openvswitch
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
[stack@undercloud-0 ~]$ for i in {7..13}; do ssh heat-admin.2.$i "hostname; rpm -q openvswitch"; done
ceph-0.localdomain
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
compute-1.localdomain
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
compute-0.localdomain
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
compute-2.localdomain
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
controller-1.localdomain
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
controller-0.localdomain
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
controller-2.localdomain
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
Flavio, see comment 29, as well as the email thread "openvswitch 14 -> 22 upgrade issue", where Amit Ugol reported that your proposed OVS .rpm fixes the issue. The next step would be for the OVS team to supply this as an official build on Brew; at that point notify me and I'll contact OpenStack release delivery to bump our dependency from 14 to whatever version that ends up as.
When upgrading from 2.5.0-2 to 2.6.1, openvswitch is restarted because of the %postun in 2.5.0-2. Although updating 2.5.0-2 to fbl's 2.5.0-22 from comment 28 and then updating to 2.6.1 resolves the issue, I'm not sure how we can actually be sure that people have upgraded to the latest 2.5 with the fix before updating to 2.6.1 (since %postun is run from the currently installed package).

Also, re: using rpm -U --nopostun from 2.5.0-2 to 2.6.1: although this doesn't restart openvswitch, it does require one to manually run systemctl daemon-reload, and ovsdb-server fails to start upon the first systemctl restart openvswitch. Successive systemctl restart openvswitch calls succeed, though.

Output:

[terry@aio ~]$ pgrep ovsdb-server
10710
[terry@aio ~]$ sudo yum install --downloadonly --downloaddir . openvswitch
...
--> Running transaction check
---> Package openvswitch.x86_64 0:2.5.0-2.el7 will be updated
---> Package openvswitch.x86_64 0:2.6.1-0.el7 will be an update
--> Finished Dependency Resolution
...
[terry@aio ~]$ sudo rpm -Uvh --nopostun openvswitch-2.6.1-0.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:openvswitch-2.6.1-0.el7          ################################# [ 50%]
Cleaning up / removing...
   2:openvswitch-2.5.0-2.el7          ################################# [100%]
[terry@aio ~]$ pgrep ovsdb-server
10710
[terry@aio ~]$ sudo systemctl restart openvswitch
Warning: openvswitch.service changed on disk. Run 'systemctl daemon-reload' to reload units.
[terry@aio ~]$ pgrep ovsdb-server
25871
[terry@aio ~]$ sudo systemctl daemon-reload
[terry@aio ~]$ pgrep ovsdb-server
25871
[terry@aio ~]$ sudo systemctl restart openvswitch
[terry@aio ~]$ pgrep ovsdb-server
[terry@aio ~]$

Doing a stop followed by a start has identical results.

Output from /var/log/messages for the restart:

Jan 20 13:34:26 aio systemd: Reloading.
Jan 20 13:34:26 aio systemd: [/usr/lib/systemd/system/epmd@.service:18] Failed to parse resource value, ignoring: 0
Jan 20 13:34:34 aio systemd: Stopping Open vSwitch...
Jan 20 13:34:34 aio systemd: Starting Open vSwitch Database Unit...
Jan 20 13:34:34 aio systemd: Starting Open vSwitch...
Jan 20 13:34:34 aio systemd: Started Open vSwitch.
Jan 20 13:34:34 aio ovs-ctl: ovsdb-server is already running.
Jan 20 13:34:34 aio systemd: Stopping Open vSwitch...
Jan 20 13:34:34 aio ovs-ctl: Enabling remote OVSDB managers [  OK  ]
Jan 20 13:34:34 aio systemd: Stopped Open vSwitch.
Jan 20 13:34:34 aio ovs-ctl: Killing ovsdb-server (10710) [  OK  ]
Jan 20 13:34:34 aio systemd: Stopped Open vSwitch Database Unit.
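Side note on the %postun behaviour discussed above: on an upgrade, rpm runs the %postun of the package being removed (the old, currently installed one), so the scriptlet that actually fires can be compared between the installed package and the rpm about to be installed. A sketch only, using the 2.6.1 rpm from the output above as an example filename:

# scriptlets of the currently installed package; its %postun is what
# runs during the upgrade
rpm -q --scripts openvswitch

# scriptlets shipped in the rpm we are about to install
rpm -qp --scripts ./openvswitch-2.6.1-0.el7.x86_64.rpm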