The reduced steps to reproduce are (starting with RHEL 7.3):
- yum install openvswitch-2.5.0-14.git20160727.el7fdp.x86_64
- systemctl start openvswitch
- check that both ovsdb-server and vswitchd processes are up
- yum install openvswitch-2.5.0-22.git20160727.el7fdp.x86_64

Observe that only the vswitchd process is up.

Before package update:

[root@localhost ~]# ps ax|grep ovs
 3363 ?        S<s    0:00 ovsdb-server: monitoring pid 3364 (healthy)
 3364 ?        S<     0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
 3373 ?        S<s    0:00 ovs-vswitchd: monitoring pid 3374 (healthy)
 3374 ?        S<Ll   0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
 3424 pts/1    S+     0:00 grep --color=auto ovs

After package update:

[root@localhost ~]# ps ax|grep ovs
 3563 ?        S<s    0:00 ovs-vswitchd: monitoring pid 3564 (healthy)
 3564 ?        S<Ll   0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
 3666 pts/1    S+     0:00 grep --color=auto ovs

In ovsdb-server.log, the only message after the update is:

2016-12-12T16:44:09.528Z|00002|daemon_unix(monitor)|INFO|pid 3554 died, exit status 0, exiting

In the vswitchd log:

2016-12-12T16:44:09.528Z|00036|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection closed by peer
2016-12-12T16:44:10.528Z|00037|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-12T16:44:10.528Z|00038|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2016-12-12T16:44:10.528Z|00039|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 2 seconds before reconnect
2016-12-12T16:44:12.528Z|00040|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-12T16:44:12.528Z|00041|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2016-12-12T16:44:12.528Z|00042|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 4 seconds before reconnect
2016-12-12T16:44:16.529Z|00043|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-12T16:44:16.530Z|00044|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2016-12-12T16:44:16.530Z|00045|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 8 seconds before reconnect
2016-12-12T16:44:24.530Z|00046|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-12T16:44:24.530Z|00047|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2016-12-12T16:44:24.530Z|00048|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 8 seconds before reconnect
2016-12-12T16:44:32.530Z|00049|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2016-12-12T16:44:32.530Z|00050|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2016-12-12T16:44:32.530Z|00051|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 8 seconds before reconnect
...

In systemctl status openvswitch:

● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
...
Dec 12 11:44:09 localhost.localdomain systemd[1]: Stopping Open vSwitch...
Dec 12 11:44:09 localhost.localdomain systemd[1]: Starting Open vSwitch...
Dec 12 11:44:09 localhost.localdomain systemd[1]: Started Open vSwitch.
Dec 12 11:44:09 localhost.localdomain systemd[1]: Stopping Open vSwitch...
Dec 12 11:44:09 localhost.localdomain systemd[1]: Stopped Open vSwitch.

^^ NOTE THE ORDER OF MESSAGES

Does it suggest that we stop the service after it's started, or in the middle of starting it? It's not completely clear.

The state can be fixed by 'systemctl restart openvswitch', which brings ovsdb-server back.

Also in /var/log/messages, I see:

Dec 12 11:44:09 localhost yum[3581]: Updated: openvswitch-2.5.0-22.git20160727.el7fdp.x86_64
Dec 12 11:44:09 localhost systemd: Reloading.
Dec 12 11:44:09 localhost systemd: Stopping Open vSwitch...
Dec 12 11:44:09 localhost systemd: Starting Open vSwitch Database Unit...
Dec 12 11:44:09 localhost systemd: Starting Open vSwitch...
Dec 12 11:44:09 localhost systemd: Started Open vSwitch.
Dec 12 11:44:09 localhost ovs-ctl: ovsdb-server is already running.
Dec 12 11:44:09 localhost ovs-ctl: Enabling remote OVSDB managers [  OK  ]
Dec 12 11:44:09 localhost systemd: Stopping Open vSwitch...
Dec 12 11:44:09 localhost systemd: Stopped Open vSwitch.
Dec 12 11:44:09 localhost ovs-ctl: Exiting ovsdb-server (3554) [  OK  ]
Dec 12 11:44:09 localhost systemd: Stopped Open vSwitch Database Unit.

Finally, I checked with the -15 version as found in:
http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch/2.5.0/15.git20160727.el7fdb/

The result is:
1. If I upgrade -14 to -15 and then to -22, everything works fine.
2. If I upgrade -14 straight to -22, ovsdb-server dies.

So I suspect something introduced between -15 and -22 is unsafe when combined with the post-update restart that is still present (and that I believe was present in -14).

Environment: Red Hat Enterprise Linux Server release 7.3 Beta (Maipo)

Note: The bug is the result of broken OSPd upgrades as seen in https://bugzilla.redhat.com/show_bug.cgi?id=1403080
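For quick verification of the reproducer, something along these lines can be used (a minimal sketch, assuming a clean RHEL 7.3 host; the pgrep checks are just one way to see whether the daemons survived):

# install the known-good build and start the service
yum install -y openvswitch-2.5.0-14.git20160727.el7fdp.x86_64
systemctl start openvswitch

# both ovsdb-server and ovs-vswitchd should show up here
pgrep -a 'ovsdb-server|ovs-vswitchd'

# update to the affected build
yum install -y openvswitch-2.5.0-22.git20160727.el7fdp.x86_64

# after the update only ovs-vswitchd is left
pgrep -a ovsdb-server || echo "ovsdb-server is not running"

# restarting the service brings ovsdb-server back
systemctl restart openvswitch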
Sorry, "Note: The bug is the result of broken OSPd upgrades" should be read as "Note: The bug is the cause of broken OSPd upgrades"
@Aaron, using -23 indeed fixes the update; openvswitch is up and running. The processes are new, so it was restarted after the update; I believe that's expected? I remember we had some other problem with a process restart happening in some previous package versions, which is why I am asking.
Note: someone from TripleO is also checking the -23 package in our upgrade scope, to see whether it also solves the OSPd issue. I also suggested testing the 14 -> 15 -> 22 package update path; they may do that afterwards. I will ask them to report their results here.
OK, the previous issue that we had with a restart on package update was https://bugzilla.redhat.com/show_bug.cgi?id=1385096

I think we later worked around it for tripleo by using rpm --nopostun:
https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh#L102-L114

So maybe it indeed now makes sense to revert the patch; that's on you folks to decide.
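For context, the tripleo workaround linked above boils down to updating the package with the old package's %postun scriptlet suppressed and then reloading systemd. A rough sketch only (the exact package NVR and download method are environment-specific assumptions, not what the script literally does):

# download the target rpm without installing it (requires yum-utils)
yumdownloader openvswitch
# upgrade in place, skipping the currently installed package's %postun restart
rpm -U --replacepkgs --nopostun ./openvswitch-*.x86_64.rpm
# pick up the updated unit files without restarting the services
systemctl daemon-reload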
Hi, so the rpm was installed like this by the upgrade script (confirmed by the log on the platform, attached here):

rpm -U --replacepkgs --nopostun ./openvswitch-2.5.0-23.git20160727.el7fdb.x86_64.rpm

And then on the working upgraded platform we had the correct package:

$ rpm -qa | grep 'openvswitch-2.5.0-23.git20160727.el7fdb.x86_64'
openvswitch-2.5.0-23.git20160727.el7fdb.x86_64

Is that incorrect? I don't really get the comment about going back to bug 1385096.
Created attachment 1231765 [details]
Log of the controller upgrade.

This is the log of the controller installation where we can see the rpm installation of openvswitch, attached to the bz.
I've done a successful upgrade of openvswitch-2.5.0-14.git20160727 to openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64. I set up a local repo on the undercloud node and installed this repo on all nodes prior to the upgrade, and it was successfully picked up during the upgrade procedure. The whole upgrade went very smoothly with this package.

Here is the output after the final step of the upgrade:

[stack@undercloud-0 ~]$ rpm -q openvswitch
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
[stack@undercloud-0 ~]$ for i in {7..13}; do ssh heat-admin.2.$i "hostname; rpm -q openvswitch"; done
ceph-0.localdomain
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
compute-1.localdomain
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
compute-0.localdomain
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
compute-2.localdomain
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
controller-1.localdomain
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
controller-0.localdomain
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
controller-2.localdomain
openvswitch-2.5.0-22.git20160727.el7fdp.bz1403958.fbl.2.x86_64
Flavio, see comment 29, as well as the email thread "openvswitch 14 -> 22 upgrade issue", where Amit Ugol reported that your proposed OVS .rpm fixes the issue. The next step would be for the OVS team to supply this as an official build on Brew; at that point notify me and I'll contact OpenStack release delivery to bump our dependency from 14 to whatever version that ends up as.
When upgrading from 2.5.0-2 to 2.6.1, openvswitch is restarted because of the %postun in 2.5.0-2. Although updating 2.5.0-2 to fbl's 2.5.0-22 from comment 28 and then updating to 2.6.1 resolves the issue, I'm not sure how we can actually be sure that people have upgraded to the latest 2.5 with the fix before updating to 2.6.1 (since %postun is run from the currently installed package).

Also, re: using rpm -U --nopostun from 2.5.0-2 to 2.6.1: although this doesn't restart openvswitch, it does require one to manually run systemctl daemon-reload, and ovsdb-server fails to start upon the first systemctl restart openvswitch. Successive systemctl restart openvswitch calls succeed, though.

Output:

[terry@aio ~]$ pgrep ovsdb-server
10710
[terry@aio ~]$ sudo yum install --downloadonly --downloaddir . openvswitch
...
--> Running transaction check
---> Package openvswitch.x86_64 0:2.5.0-2.el7 will be updated
---> Package openvswitch.x86_64 0:2.6.1-0.el7 will be an update
--> Finished Dependency Resolution
...
[terry@aio ~]$ sudo rpm -Uvh --nopostun openvswitch-2.6.1-0.el7.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:openvswitch-2.6.1-0.el7          ################################# [ 50%]
Cleaning up / removing...
   2:openvswitch-2.5.0-2.el7          ################################# [100%]
[terry@aio ~]$ pgrep ovsdb-server
10710
[terry@aio ~]$ sudo systemctl restart openvswitch
Warning: openvswitch.service changed on disk. Run 'systemctl daemon-reload' to reload units.
[terry@aio ~]$ pgrep ovsdb-server
25871
[terry@aio ~]$ sudo systemctl daemon-reload
[terry@aio ~]$ pgrep ovsdb-server
25871
[terry@aio ~]$ sudo systemctl restart openvswitch
[terry@aio ~]$ pgrep ovsdb-server
[terry@aio ~]$

Doing a stop followed by a start has identical results.

Output from /var/log/messages for the restart:

Jan 20 13:34:26 aio systemd: Reloading.
Jan 20 13:34:26 aio systemd: [/usr/lib/systemd/system/epmd@.service:18] Failed to parse resource value, ignoring: 0
Jan 20 13:34:34 aio systemd: Stopping Open vSwitch...
Jan 20 13:34:34 aio systemd: Starting Open vSwitch Database Unit...
Jan 20 13:34:34 aio systemd: Starting Open vSwitch...
Jan 20 13:34:34 aio systemd: Started Open vSwitch.
Jan 20 13:34:34 aio ovs-ctl: ovsdb-server is already running.
Jan 20 13:34:34 aio systemd: Stopping Open vSwitch...
Jan 20 13:34:34 aio ovs-ctl: Enabling remote OVSDB managers [  OK  ]
Jan 20 13:34:34 aio systemd: Stopped Open vSwitch.
Jan 20 13:34:34 aio ovs-ctl: Killing ovsdb-server (10710) [  OK  ]
Jan 20 13:34:34 aio systemd: Stopped Open vSwitch Database Unit.
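Side note on the %postun behaviour discussed above: on an upgrade, rpm runs the %postun of the package being removed (the old, currently installed one), so the scriptlet that actually fires can be compared between the installed package and the rpm about to be installed. A sketch only, using the 2.6.1 rpm from the output above as an example filename:

# scriptlets of the currently installed package; its %postun is what
# runs during the upgrade
rpm -q --scripts openvswitch

# scriptlets shipped in the rpm we are about to install
rpm -qp --scripts ./openvswitch-2.6.1-0.el7.x86_64.rpm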