Bug 1589836 - OVS is dead OVSNeutronAgent will keep running and checking OVS status periodically
Summary: OVS is dead OVSNeutronAgent will keep running and checking OVS status periodically
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director-images
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ga
Target Release: 13.0 (Queens)
Assignee: Lon Hohberger
QA Contact: Omri Hochman
URL:
Whiteboard:
Duplicates: 1593770
Depends On: 1584946
Blocks:
 
Reported: 2018-06-11 13:37 UTC by Maxim Babushkin
Modified: 2022-03-13 15:39 UTC
CC List: 27 users

Fixed In Version: rhosp-director-images-13.0-20180619.1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-27 13:24:35 UTC
Target Upstream Version:
Embargoed:


Attachments
sosreport_undercloud-0_part_1 (18.00 MB, application/x-xz), 2018-06-11 13:39 UTC, Maxim Babushkin
sosreport_undercloud-0_part_2 (10.72 MB, application/octet-stream), 2018-06-11 13:40 UTC, Maxim Babushkin
sosreport_controller-0_part_1 (18.00 MB, application/x-xz), 2018-06-11 13:41 UTC, Maxim Babushkin
sosreport_controller-0_part_2 (17.35 MB, application/octet-stream), 2018-06-11 13:42 UTC, Maxim Babushkin
sosreport_controller-1_part_1 (18.00 MB, application/x-xz), 2018-06-11 13:44 UTC, Maxim Babushkin
sosreport_controller-1_part_2 (14.76 MB, application/octet-stream), 2018-06-11 13:45 UTC, Maxim Babushkin
sosreport_controller-2_part_1 (18.00 MB, application/x-xz), 2018-06-11 13:46 UTC, Maxim Babushkin
sosreport_controller-2_part_2 (14.52 MB, application/octet-stream), 2018-06-11 13:47 UTC, Maxim Babushkin
sosreport_computeovsdpdk-0 (13.39 MB, application/x-xz), 2018-06-11 13:48 UTC, Maxim Babushkin
sosreport_computeovsdpdk-1 (12.40 MB, application/x-xz), 2018-06-11 13:49 UTC, Maxim Babushkin


Links
Red Hat Issue Tracker OSP-13647, last updated 2022-03-13 15:39:50 UTC
Red Hat Product Errata RHEA-2018:2083, last updated 2018-06-27 13:26:04 UTC

Description Maxim Babushkin 2018-06-11 13:37:47 UTC
Description of problem:
OVS is dead. OVSNeutronAgent will keep running and checking OVS status periodically.

After an idle period with no instances, the /var/run/openvswitch directory disappears on the controllers.

[root@controller-0 ~]# ll /var/run/openvswitch
ls: cannot access /var/run/openvswitch: No such file or directory
[root@controller-0 ~]# 
[root@controller-0 ~]# ovs-vsctl show
ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)

2018-06-11 10:56:42.594 102859 WARNING ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: send error: Connection refused
2018-06-11 10:56:42.595 102859 WARNING ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: connection dropped (Connection refused)
2018-06-11 10:56:50.597 102859 WARNING ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: send error: Connection refused
2018-06-11 10:56:50.597 102859 WARNING ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: connection dropped (Connection refused)
2018-06-11 10:56:58.600 102859 WARNING ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: send error: Connection refused
2018-06-11 10:56:58.601 102859 WARNING ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: connection dropped (Connection refused)
2018-06-11 10:57:06.249 102859 ERROR neutron.agent.linux.async_process [-] Error received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: ovsdb-client: failed to connect to "tcp:127.0.0.1:6640" (Connection refused)
2018-06-11 10:57:06.249 102859 ERROR neutron.agent.linux.async_process [-] Process [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json] dies due to the error: ovsdb-client: failed to connect to "tcp:127.0.0.1:6640" (Connection refused)
2018-06-11 10:57:06.603 102859 WARNING ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: send error: Connection refused
2018-06-11 10:57:06.604 102859 WARNING ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: connection dropped (Connection refused)
2018-06-11 10:57:07.778 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [req-c76421a3-0d63-4284-85f9-a7cfffa743ff - - - - -] Switch connection timeout
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int [req-c76421a3-0d63-4284-85f9-a7cfffa743ff - - - - -] Failed to communicate with the switch: RuntimeError: Switch connection timeout
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int Traceback (most recent call last):
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/br_int.py", line 52, in check_canary_table
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int     flows = self.dump_flows(constants.CANARY_TABLE)
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py", line 141, in dump_flows
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int     (dp, ofp, ofpp) = self._get_dp()
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_bridge.py", line 67, in _get_dp
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int     self._cached_dpid = new_dpid
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int     self.force_reraise()
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int     six.reraise(self.type_, self.value, self.tb)
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_bridge.py", line 50, in _get_dp
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int     dp = self._get_dp_by_dpid(self._cached_dpid)
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py", line 69, in _get_dp_by_dpid
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int     raise RuntimeError(m)
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int RuntimeError: Switch connection timeout
2018-06-11 10:57:07.780 102859 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.br_int 
2018-06-11 10:57:07.781 102859 WARNING neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c76421a3-0d63-4284-85f9-a7cfffa743ff - - - - -] OVS is dead. OVSNeutronAgent will keep running and checking OVS status periodically.


Version-Release number of selected component (if applicable):
OSP 13
Puddle 2018-06-06.3

How reproducible:
Deploy an environment.
Leave the environment for a day without any instances running.
Try to boot an instance and check the status of ovs and the /var/run/openvswitch directory.

SOS reports attached.

Comment 1 Maxim Babushkin 2018-06-11 13:39:35 UTC
Created attachment 1450036 [details]
sosreport_undercloud-0_part_1

Comment 2 Maxim Babushkin 2018-06-11 13:40:36 UTC
Created attachment 1450037 [details]
sosreport_undercloud-0_part_2

Comment 3 Maxim Babushkin 2018-06-11 13:41:49 UTC
Created attachment 1450038 [details]
sosreport_controller-0_part_1

Comment 4 Maxim Babushkin 2018-06-11 13:42:55 UTC
Created attachment 1450039 [details]
sosreport_controller-0_part_2

Comment 5 Maxim Babushkin 2018-06-11 13:44:16 UTC
Created attachment 1450040 [details]
sosreport_controller-1_part_1

Comment 6 Maxim Babushkin 2018-06-11 13:45:27 UTC
Created attachment 1450041 [details]
sosreport_controller-1_part_2

Comment 7 Maxim Babushkin 2018-06-11 13:46:31 UTC
Created attachment 1450042 [details]
sosreport_controller-2_part_1

Comment 8 Maxim Babushkin 2018-06-11 13:47:23 UTC
Created attachment 1450043 [details]
sosreport_controller-2_part_2

Comment 9 Maxim Babushkin 2018-06-11 13:48:38 UTC
Created attachment 1450044 [details]
sosreport_computeovsdpdk-0

Comment 10 Maxim Babushkin 2018-06-11 13:49:28 UTC
Created attachment 1450045 [details]
sosreport_computeovsdpdk-1

Comment 11 Maxim Babushkin 2018-06-11 13:52:20 UTC
Sosreport logs divided into two parts should be combined as follows:

cat sosreport-undercloud-0.redhat.local-20180611080635.tar.xz* > sosreport-undercloud-0.redhat.local-20180611080635.tar.xz

Comment 12 Saravanan KR 2018-06-12 08:29:25 UTC
ovs-vswitchd.log
----------------
2018-06-10T14:44:02.200Z|00379|connmgr|INFO|br-int<->tcp:127.0.0.1:6633: 11 flow_mods in the 2 s starting 57 s ago (11 deletes)
2018-06-11T03:25:01.887Z|00380|vlog|WARN|failed to open /var/log/openvswitch/ovs-vswitchd.log for logging: Permission denied


[root@controller-0 heat-admin]# ll /var/log/openvswitch/
total 48
-rw-r-----. 1 openvswitch hugetlbfs     0 Jun 12 03:26 ovsdb-server.log
-rw-r-----. 1 openvswitch hugetlbfs   484 Jun 10 11:11 ovsdb-server.log-20180611
-rw-r-----. 1 openvswitch hugetlbfs     0 Jun 11 03:25 ovsdb-server.log-20180612
-rw-r-----. 1 openvswitch hugetlbfs     0 Jun 12 03:26 ovs-vswitchd.log
-rw-r-----. 1 openvswitch hugetlbfs 42895 Jun 10 14:44 ovs-vswitchd.log-20180611
-rw-r-----. 1 openvswitch hugetlbfs     0 Jun 11 03:25 ovs-vswitchd.log-20180612
[root@controller-0 heat-admin]# ll /var/log/openvswitch/ -d
drwxr-x---. 2 root root 186 Jun 12 08:08 /var/log/openvswitch/


As per the logs, once the deployment completed, all the tests passed, which implies that the permissions were fine during the deployment. But when left overnight, the ownership of the log folder /var/log/openvswitch changes to root:root. This results in permission-denied errors in the ovsdb-server and ovs-vswitchd processes, and those processes fail, resulting in this error.

Looking into the possibility that a log management tool is affecting the ownership.

Comment 13 Saravanan KR 2018-06-12 11:40:42 UTC
The ownership of the directory /var/log/openvswitch is never changed to openvswitch:hugetlbfs; it always stays root:root, as it is created by systemd. After the log files are rotated, the new log files cannot be opened by the OVS daemons. This results in a permission-denied error being logged in the older log file (after logrotate).
----
2018-06-11T03:25:01.887Z|00380|vlog|WARN|failed to open /var/log/openvswitch/ovs-vswitchd.log for logging: Permission denied
----

Though this permission-denied error is an issue to be looked into, it does not explain why the ovsdb-server and ovs-vswitchd services are going into an error state.
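
A quick way to confirm this on an affected node (a diagnostic sketch, not output taken from the sosreports) is to check whether the openvswitch user can create a file in the log directory at all:

---------
# Directory ownership as reported above: drwxr-x--- root root
ls -ld /var/log/openvswitch

# If this fails with "Permission denied", the daemons cannot create the new
# log files after logrotate either.
sudo -u openvswitch touch /var/log/openvswitch/write-test.log \
    && rm -f /var/log/openvswitch/write-test.log
---------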

Comment 14 Saravanan KR 2018-06-12 12:59:36 UTC
Here are the steps with which I can reproduce it.

-------------
[root@computeovsdpdk-0 heat-admin]# logrotate -f /etc/logrotate.conf 

[root@computeovsdpdk-0 heat-admin]# ll /var/log/openvswitch/ 
total 296
-rw-r-----. 1 openvswitch hugetlbfs      0 Jun 12 12:55 ovsdb-server.log
-rw-r-----. 1 openvswitch hugetlbfs    352 Jun 10 11:25 ovsdb-server.log-20180611.gz
-rw-r-----. 1 openvswitch hugetlbfs    168 Jun 12 12:49 ovsdb-server.log-20180612.gz
-rw-r-----. 1 openvswitch hugetlbfs      0 Jun 12 12:55 ovs-vswitchd.log
-rw-r-----. 1 openvswitch hugetlbfs 289322 Jun 10 14:43 ovs-vswitchd.log-20180611
-rw-r-----. 1 openvswitch hugetlbfs    167 Jun 12 12:49 ovs-vswitchd.log-20180612.gz

[root@computeovsdpdk-0 heat-admin]# ps aux  | grep ovs
openvsw+    1465  0.0  0.0  55088  2944 ?        S<s  12:38   0:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
openvsw+    1554  198  0.1 2904664 160420 ?      S<Lsl 12:38  35:14 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswitch:hugetlbfs --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
42435       3358  0.0  0.0  11684  1392 ?        Ss   12:39   0:00 /bin/bash /neutron_ovs_agent_launcher.sh
42435       4002  0.0  0.0  52656  2836 ?        S    12:39   0:00 ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json

[root@computeovsdpdk-0 heat-admin]# docker exec -it logrotate_crond bash
()[root@01b72ee362c4 /]#  /sbin/lsof -nPs +L1 +D /var/log 2>&1|    awk '/\S+\s+[0-9]+\s.*\/var\/log\/.*\(deleted\)/ {print $2}' |    /bin/xargs -n1 -r -t kill -HUP
kill -HUP 1465 
kill -HUP 1554 
()[root@01b72ee362c4 /]# 

------------------------------

Comment 15 Saravanan KR 2018-06-12 13:13:03 UTC
To explain the behavior: since the /var/log/openvswitch directory has root:root ownership, after logrotate the OVS processes are not able to open the new files. Because of this, the lsof command lists those rotated files as "(deleted)".

---------
[root@computeovsdpdk-0 heat-admin]# /sbin/lsof -nPs +L1 +D /var/log 
ovsdb-ser 1465 openvswitch   22w   REG    8,2      220     0  14127046 /var/log/openvswitch/ovsdb-server.log-20180612 (deleted)
ovs-vswit 1554 openvswitch   83w   REG    8,2      220     0  14127063 /var/log/openvswitch/ovs-vswitchd.log-20180612 (deleted)
--------

Now the "logrotate_crond" container, which runs periodically to kill the processes which have the "(deleted)" log files, kill the ovs processes, which results in the ovsdb-server and ovs-vswitchd service error.

There are 2 issues here:
1) After logrotate, OVS logs are not captured because of the permission-denied error - Aaron, could you comment on it?
2) The logrotate_crond container is killing the OVS processes, which is unexpected even if logging is failing - I will continue to look into it (see the read-only diagnostic sketch below).
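
For reference, the check that logrotate_crond performs can be run by hand in read-only form to see which daemons would be signalled, without actually sending anything (the same lsof invocation as in comment 14, minus the xargs kill):

---------
# List processes still writing to rotated-and-deleted files under /var/log;
# these are the PIDs the cron job would send SIGHUP to.
/sbin/lsof -nPs +L1 +D /var/log | grep '(deleted)'
---------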

Comment 16 Aaron Conole 2018-06-12 15:22:52 UTC
For 1)

In the Open vSwitch spec file, we do set the log directory ownership:

%attr(750,openvswitch,openvswitch) /var/log/openvswitch

Which RPM version is this?
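
One way to cross-check this on a node (a sketch, not output from the sosreports) is to compare the ownership recorded in the installed package with the on-disk state:

---------
# Ownership recorded in the package metadata for the log directory
rpm -qlv openvswitch | grep /var/log/openvswitch

# Verify installed files against the package; 'U'/'G' flags in the output
# indicate a user/group mismatch on disk
rpm -V openvswitch | grep /var/log/openvswitch
---------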

Comment 17 Alan Pevec 2018-06-12 21:16:24 UTC
> Which RPM version is this?

From bz description: OSP 13 Puddle 2018-06-06.3

http://download.devel.redhat.com/rcm-guest/puddles/OpenStack/13.0-RHEL-7/2018-06-06.3

which includes cross-tagged openvswitch-2.9.0-19.el7fdp.1 build.

Comment 18 Timothy Redaelli 2018-06-13 12:33:59 UTC
This is strange since bz1583744 is marked as verified with the correct openvswitch/openvswitch owner.

Comment 19 Saravanan KR 2018-06-13 12:58:03 UTC
I have not looked at the directory ownership right after deployment, as the issue happened after a couple of days. If it is correct right after deployment, is there a possibility that it gets changed overnight? Re-deploying now.

Comment 20 Saravanan KR 2018-06-13 13:01:15 UTC
Checked on another OSP13 deployment (with latest puddle), found that the directory permission is still root:root.

--------------
[root@overcloud-controller-0 heat-admin]# ll /var/log/openvswitch/ -d
drwxr-x---. 2 root root 54 Jun 13 08:56 /var/log/openvswitch/
[root@overcloud-controller-0 heat-admin]# yum list openvswitch
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
Installed Packages
openvswitch.x86_64                                                                     2.9.0-19.el7fdp.1                                                                     @rhos-13.0-signed
[root@overcloud-controller-0 heat-admin]# ll /var/log/openvswitch/
total 12
-rw-r-----. 1 openvswitch hugetlbfs  359 Jun 13 08:56 ovsdb-server.log
-rw-r-----. 1 openvswitch hugetlbfs 4747 Jun 13 09:23 ovs-vswitchd.log
---------------

Comment 21 Saravanan KR 2018-06-13 13:09:33 UTC
Checking on the overcloud-full image (where openvswitch is pre-installed), the directory ownership is root:root. Does it happen when openvswitch is updated rather than installed?

-------
 [root@undercloud01 images]# ll mount_image/var/log/openvswitch/ -d
drwxr-x---. 2 root root 6 May 30 09:59 mount_image/var/log/openvswitch/
-------

Comment 22 Jon Schlueter 2018-06-13 13:28:07 UTC
Sounds related to the bug we had to fix for /var/log/lastlog in overcloud-full images.

This seems to be either a virt-sysprep interaction issue or an issue in the logrotate config; it has to make sure the process is bumped (SIGHUP?) when log files are rotated, so that it starts logging to the new file instead of the old, deleted one.

Fixing the permissions will give correct ownership, but the log rotation configuration has to be clean, or this will continue to happen.

Comment 23 Lon Hohberger 2018-06-13 13:58:11 UTC
The problem isn't /var/log/lastlog directly - the virt-sysprep line is killing everything in /var/log; we then have hacks to put back lastlog.

If this works outside of deployment images, then it's likely a virt-sysprep issue and I know how to fix it.
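
If the image build does invoke virt-sysprep, the operation that cleans /var/log can be inspected and skipped on a test image; this is a sketch under that assumption, not taken from the actual image-build scripts:

---------
# The "logfiles" operation is the one that removes files under /var/log
virt-sysprep --list-operations

# Example: run the default operations but keep the log files on a test image
virt-sysprep -a overcloud-full.qcow2 --operations defaults,-logfiles
---------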

Comment 24 Timothy Redaelli 2018-06-13 14:01:52 UTC
The logrotate configuration we ship with openvswitch is fine: ovsdb-server and ovs-vswitchd run as the openvswitch user, and so they can create the new logs inside /var/log/openvswitch (with the correct owner:group) after the logs are rotated (/etc/logrotate.d/openvswitch).

Comment 25 Alexander Chuzhoy 2018-06-14 18:10:52 UTC
Reproduced.
[root@newplan-controller-0 ~]# ovs-vsctl show
ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)



After rebooting controllers it works again.

[root@newplan-controller-1 ~]# ls -l /var/run/ |grep openvswitch
drwxr-xr-x.  2 openvswitch hugetlbfs  240 Jun 14 17:57 openvswitch

Comment 26 Michele Baldessari 2018-06-15 07:52:47 UTC
(In reply to Saravanan KR from comment #15)
> To explain the behavior, since the /var/log/openvswitch directory has
> root:root ownership, after logrotate, ovs processes is not able to open the
> new files. Because of which lsof command lists those rotated files as
> "(deleted)".
> 
> ---------
> [root@computeovsdpdk-0 heat-admin]# /sbin/lsof -nPs +L1 +D /var/log 
> ovsdb-ser 1465 openvswitch   22w   REG    8,2      220     0  14127046
> /var/log/openvswitch/ovsdb-server.log-20180612 (deleted)
> ovs-vswit 1554 openvswitch   83w   REG    8,2      220     0  14127063
> /var/log/openvswitch/ovs-vswitchd.log-20180612 (deleted)
> --------
> 
> Now the "logrotate_crond" container, which runs periodically to kill the
> processes which have the "(deleted)" log files, kill the ovs processes,
> which results in the ovsdb-server and ovs-vswitchd service error.
> 
> There are 2 issue here:
> 1) after logrotate, ovs logs are not capture because of the permission
> denied error - Aaron, could you comment on it
> 2) logrotate_crond container is killing ovs process, which is unexpected
> even though error in logging - I will continue to look in to it

If indeed 2) is one of the root causes for this problem, then the fix for that is being tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1584946

Comment 27 Saravanan KR 2018-06-15 08:54:12 UTC
(In reply to Michele Baldessari from comment #26)
> (In reply to Saravanan KR from comment #15)
> > 2) logrotate_crond container is killing ovs process, which is unexpected
> > even though error in logging - I will continue to look in to it
> 
> If indeed 2) is one of the root causes for this problem, then the fix for
> that is being tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1584946

Indeed, (2) is the cause of the OVS processes being killed, resulting in the ovs service error and the disappearance of /var/run/openvswitch. Thanks for pointing it out.

And (1) should also be fixed, because after logrotate the OVS logs are no longer collected after a day, and they are one of the main debugging logs for an OVS-DPDK deployment.

Comment 29 Jon Schlueter 2018-06-15 12:54:51 UTC
This might be helped by https://review.openstack.org/#/c/575525 in upstream stable/queens, which should send SIGHUP when logrotate is run.

Comment 34 Damien Ciabrini 2018-06-18 12:02:59 UTC
(In reply to Saravanan KR from comment #27)
> (In reply to Michele Baldessari from comment #26)
> > (In reply to Saravanan KR from comment #15)
> > > 2) logrotate_crond container is killing ovs process, which is unexpected
> > > even though error in logging - I will continue to look in to it
> > 
> > If indeed 2) is one of the root causes for this problem, then the fix for
> > that is being tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1584946
> 
> Indeed, (2) is the cause of killing the ovs processes resulting in ovs
> service error and disappearing of /var/run/openvswitch. Thanks for pointing
> out. 
> 
> And (1) also should be fixed, because logrotate of ovs logs will no more
> collect logs after a day, which is one of the main debugging log for
> OVS-DPDK deployment.


As per https://bugzilla.redhat.com/show_bug.cgi?id=1584946#c19, it seems that the OVS service no longer stops due to log rotation once https://review.openstack.org/#/c/575525 is applied.

Comment 36 Lon Hohberger 2018-06-19 12:34:24 UTC
There's an ordering problem with the spec file. When creating images, the package is installed exactly once; however, at that point the group/user for OVS doesn't exist:

Preparing...                          ################################# [100%]
Updating / installing...
   1:openvswitch-2.9.0-19.el7fdp.1    warning: user openvswitch does not exist - using root
warning: group openvswitch does not exist - using root
warning: user openvswitch does not exist - using root
warning: group openvswitch does not exist - using root
################################# [100%]
warning: user openvswitch does not exist - using root
warning: group openvswitch does not exist - using root
[root@localhost ~]# ls -ld /var/log/openvswitch/
drwxr-x---. 2 root root 6 May 30 09:59 /var/log/openvswitch/

After deleting the RPM and reinstalling it without deleting the user, things work:

Preparing...                          ################################# [100%]
Updating / installing...
   1:openvswitch-2.9.0-19.el7fdp.1    ################################# [100%]
[root@localhost ~]# ls -ld /var/log/openvswitch/
drwxr-x---. 2 openvswitch openvswitch 6 May 30 09:59 /var/log/openvswitch/
[root@localhost ~]#

Comment 37 Lon Hohberger 2018-06-19 12:37:23 UTC
The 19.1 spec file has the useradd/groupadd in %post, as Jon noticed, which is the reason for the ordering problem.
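
For context, the conventional packaging fix for such an ordering problem is to create the user and group in %pre, so they already exist when RPM installs the files and applies the %attr ownership. A minimal sketch of such a scriptlet (not the actual openvswitch spec change):

---------
%pre
# Create the account before any files are laid down, so that
# %attr(750,openvswitch,openvswitch) /var/log/openvswitch resolves to the
# real user/group instead of falling back to root.
getent group openvswitch >/dev/null || groupadd -r openvswitch
getent passwd openvswitch >/dev/null || \
    useradd -r -g openvswitch -d / -s /sbin/nologin -c "Open vSwitch" openvswitch
exit 0
---------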

Comment 39 Lon Hohberger 2018-06-19 13:03:09 UTC
Workaround (chown openvswitch.openvswitch /var/log/openvswitch) added to the image build step.
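
The exact build change is not shown here, but as an illustration, such a chown could be applied to the overcloud image with virt-customize, assuming libguestfs tooling is used:

---------
# Sketch only; the real rhosp-director-images build step may do this differently
virt-customize -a overcloud-full.qcow2 \
    --run-command 'chown openvswitch:openvswitch /var/log/openvswitch'
---------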

Comment 46 Yariv 2018-06-21 12:09:33 UTC
[root@computeovsdpdk-0 ~]# ls -ltrd /var/log/openvswitch
drwxr-x---. 2 openvswitch openvswitch 54 Jun 21 10:05 /var/log/openvswitch

Comment 47 Andrew Austin 2018-06-21 14:43:17 UTC
*** Bug 1593770 has been marked as a duplicate of this bug. ***

Comment 49 errata-xmlrpc 2018-06-27 13:24:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2083

