Bug 1307025 - Open vSwitch service resilience test
Summary: Open vSwitch service resilience test
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Eelco Chaudron
QA Contact: ovs-qe
URL:
Whiteboard:
Depends On: 1335865
Blocks:
 
Reported: 2016-02-12 13:29 UTC by Flavio Leitner
Modified: 2017-03-20 12:31 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-28 08:33:59 UTC
Target Upstream Version:
Embargoed:



Description Flavio Leitner 2016-02-12 13:29:49 UTC
Description of problem:

When ovs-vswitchd segfaults for some reason, the monitor thread is responsible for restarting it to bring the service back online. However, when the bridge includes a DPDK port, the restart doesn't work: the new process fails to open the DPDK devices with "Cannot allocate memory".

2016-02-12T13:19:20.142Z|00003|daemon_unix(monitor)|ERR|1 crashes: pid 78009 died, killed (Segmentation fault), core dumped, restarting
2016-02-12T13:19:20.172Z|00004|ovs_numa|INFO|Discovered 24 CPU cores on NUMA node 0
2016-02-12T13:19:20.172Z|00005|ovs_numa|INFO|Discovered 1 NUMA nodes and 24 CPU cores
2016-02-12T13:19:20.172Z|00006|memory|INFO|108952 kB peak resident set size after 47.5 seconds
2016-02-12T13:19:20.172Z|00007|reconnect|INFO|unix:/usr/local/var/run/openvswitch/db.sock: connecting...
2016-02-12T13:19:20.172Z|00008|reconnect|INFO|unix:/usr/local/var/run/openvswitch/db.sock: connected
2016-02-12T13:19:20.183Z|00009|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
2016-02-12T13:19:20.183Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: MPLS label stack length probed as 3
2016-02-12T13:19:20.183Z|00011|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports unique flow ids
2016-02-12T13:19:20.183Z|00012|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_state
2016-02-12T13:19:20.183Z|00013|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_zone
2016-02-12T13:19:20.183Z|00014|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_mark
2016-02-12T13:19:20.183Z|00015|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath does not support ct_label
2016-02-12T13:19:20.189Z|00016|bridge|WARN|could not open network device dpdk1 (Cannot allocate memory)
2016-02-12T13:19:20.189Z|00017|bridge|WARN|could not open network device dpdk0 (Cannot allocate memory)
2016-02-12T13:19:20.200Z|00018|bridge|INFO|bridge ovsbr0: added interface ovsbr0 on port 65534
2016-02-12T13:19:20.201Z|00019|bridge|INFO|bridge ovsbr0: using datapath ID 000006b9d7c27d4f
2016-02-12T13:19:20.201Z|00020|connmgr|INFO|ovsbr0: added service controller "punix:/usr/local/var/run/openvswitch/ovsbr0.mgmt"
2016-02-12T13:19:20.227Z|00021|bridge|WARN|could not open network device dpdk1 (Cannot allocate memory)
2016-02-12T13:19:20.228Z|00022|bridge|WARN|could not open network device dpdk0 (Cannot allocate memory)
2016-02-12T13:19:20.229Z|00023|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.5.0
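The log lines above use pipe-separated fields (timestamp, sequence number, module, level, message). As a rough aid for spotting the failing devices after a restart, here is a minimal Python sketch that splits such lines and keeps only WARN/ERR/EMER records. The names `VlogRecord`, `parse_vlog_line`, and `problems` are made up for this illustration and are not part of Open vSwitch.

```python
from typing import List, NamedTuple, Optional


class VlogRecord(NamedTuple):
    timestamp: str
    sequence: int
    module: str
    level: str
    message: str


def parse_vlog_line(line: str) -> Optional[VlogRecord]:
    """Split one 'timestamp|seq|module|level|message' line; None if it does not fit."""
    # Limit to 4 splits so any '|' inside the message text is preserved.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5 or not parts[1].isdigit():
        return None
    timestamp, sequence, module, level, message = parts
    return VlogRecord(timestamp, int(sequence), module, level, message)


def problems(text: str) -> List[VlogRecord]:
    """Return the records whose level is WARN, ERR, or EMER."""
    records = (parse_vlog_line(line) for line in text.splitlines())
    return [r for r in records if r is not None and r.level in {"WARN", "ERR", "EMER"}]


# Three lines copied from the log in this report.
SAMPLE = """\
2016-02-12T13:19:20.142Z|00003|daemon_unix(monitor)|ERR|1 crashes: pid 78009 died, killed (Segmentation fault), core dumped, restarting
2016-02-12T13:19:20.189Z|00016|bridge|WARN|could not open network device dpdk1 (Cannot allocate memory)
2016-02-12T13:19:20.200Z|00018|bridge|INFO|bridge ovsbr0: added interface ovsbr0 on port 65534
"""

if __name__ == "__main__":
    for record in problems(SAMPLE):
        print(record.level, record.module, record.message)
```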



Version-Release number of selected component (if applicable):
2.5.0


How reproducible:
Always


Steps to Reproduce:
1. Do something that causes an OVS thread to segfault.
2. Watch the monitor thread fail to restart OVS.

Expected results:
The monitor thread should be able to restart the service.

Comment 1 Panu Matilainen 2016-06-23 11:19:15 UTC
For physical ports this has been fixed in upstream development branches, probably part in dpdk and part in ovs. Haven't dug out the exact commits (yet), and stable branch situation needs testing too.

What does not work in upstream OVS is restarting vhostuser ports; they fail with:
VHOST_CONFIG: socket created, fd:50
VHOST_CONFIG: fail to bind fd:50: remove file:/var/run/openvswitch/<path> and try again.

The vhostuser sockets are registered for cleanup on fatal signals, but the problem is that lib/fatal-signal.c treats only { SIGTERM, SIGINT, SIGHUP, SIGALRM } as fatal. The file cleanup therefore never runs on an actual crash such as a segmentation fault, which is why the vhostuser ports fail on restart.
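To make the mechanism concrete, here is a small Python sketch (not OVS code, Unix only) of a process that installs a cleanup handler for only those four signals. The file name `vhost-user.sock` and the helper `socket_left_behind` are hypothetical, and a plain file stands in for the socket. A child process creates the file and then signals itself, and the parent checks whether the file survived.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Child program: installs cleanup for a fixed signal set, creates the file,
# then sends itself the signal given on the command line.
CHILD = """
import os, resource, signal, sys, time

path, signum = sys.argv[1], int(sys.argv[2])
resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # avoid writing a core file

HANDLED = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP, signal.SIGALRM)

def cleanup_then_die(sig, frame):
    os.unlink(path)                     # the registered file cleanup
    signal.signal(sig, signal.SIG_DFL)  # restore default action and re-raise
    os.kill(os.getpid(), sig)

for sig in HANDLED:
    signal.signal(sig, cleanup_then_die)

open(path, "w").close()                 # stand-in for the vhostuser socket
os.kill(os.getpid(), signum)
time.sleep(5)                           # give a pending handler time to run
"""


def socket_left_behind(sig: signal.Signals) -> bool:
    """Run the child, kill it with `sig`, and report whether the file remains."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vhost-user.sock")
        subprocess.run([sys.executable, "-c", CHILD, path, str(int(sig))], check=False)
        return os.path.exists(path)


if __name__ == "__main__":
    for sig in (signal.SIGTERM, signal.SIGSEGV):
        print(sig.name, "file left behind:", socket_left_behind(sig))
```

Under these assumptions the SIGTERM run removes the file, while the SIGSEGV run leaves it in place, which matches the "fail to bind ... remove file" error seen on restart.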

Comment 11 Eelco Chaudron 2017-02-07 12:47:41 UTC
Sent a patch upstream to restart ovsdb-server or ovs-vswitchd on failure.

https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/328546.html

Comment 12 Eelco Chaudron 2017-02-28 08:32:30 UTC
The changes have been accepted:

https://github.com/openvswitch/ovs/commit/c19bf36d848cbdf755c6760fad1726c95e4377f1
https://github.com/openvswitch/ovs/commit/090cc60c08a513047cf0fcc8c7c63ffb42e8fef9

They will be available in the next 2.7 release, probably 2.7.1.

Comment 14 Nilesh 2017-03-20 10:53:43 UTC
I modified the file as per comment #10 and hit the error below.


[root@compute-1 log]# systemctl daemon-reload
[root@compute-1 log]# systemctl restart openvswitch
Failed to restart openvswitch.service: Unit is not loaded properly: Invalid argument.
See system logs and 'systemctl status openvswitch.service' for details.
[root@compute-1 log]# systemctl restart openvswitch
Failed to restart openvswitch.service: Unit is not loaded properly: Invalid argument.
See system logs and 'systemctl status openvswitch.service' for details.
[root@compute-1 log]# systemctl status openvswitch.service -l
● openvswitch.service - Open vSwitch
   Loaded: error (Reason: Invalid argument)
   Active: active (exited) since Mon 2017-03-20 21:35:12 +03; 3min 22s ago
 Main PID: 986151 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/openvswitch.service

Mar 20 21:35:12 compute-1.localdomain systemd[1]: Starting Open vSwitch...
Mar 20 21:35:12 compute-1.localdomain systemd[1]: Started Open vSwitch.
Mar 20 21:35:16 compute-1.localdomain systemd[1]: openvswitch.service has Restart= setting other than no, which isn't allowed for Type=oneshot services. Refusing.
Mar 20 21:37:57 compute-1.localdomain systemd[1]: openvswitch.service has Restart= setting other than no, which isn't allowed for Type=oneshot services. Refusing.
[root@compute-1 log]#

Comment 16 Nilesh 2017-03-20 12:15:50 UTC
I modified the file below:

vi /etc/systemd/system/multi-user.target.wants/openvswitch.service

~~~
[Unit]
Description=Open vSwitch
After=syslog.target network.target openvswitch-nonetwork.service
Requires=openvswitch-nonetwork.service

Requires=ovsdb-server.service  <<<<< Added 
Requires=ovs-vswitchd.service  <<<<< Added

[Service]
Type=oneshot
ExecStart=/bin/true
ExecStop=/bin/true
RemainAfterExit=yes
Restart=on-failure    <<<<< Added


[Install]
WantedBy=multi-user.target

~~~     


ovs_version: "2.5.0"

rpm -qa |grep systemd
systemd-219-30.el7_3.6.x86_64
systemd-libs-219-30.el7_3.6.x86_64
systemd-sysv-219-30.el7_3.6.x86_64


RHOSP-10



[root@compute-1 log]# systemctl daemon-reload
[root@compute-1 log]# systemctl restart openvswitch
Failed to restart openvswitch.service: Unit is not loaded properly: Invalid argument.
See system logs and 'systemctl status openvswitch.service' for details.
[root@compute-1 log]# systemctl restart openvswitch
Failed to restart openvswitch.service: Unit is not loaded properly: Invalid argument.
See system logs and 'systemctl status openvswitch.service' for details.
[root@compute-1 log]# systemctl status openvswitch.service -l
● openvswitch.service - Open vSwitch
   Loaded: error (Reason: Invalid argument)
   Active: active (exited) since Mon 2017-03-20 21:35:12 +03; 3min 22s ago
 Main PID: 986151 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/openvswitch.service

Mar 20 21:35:12 compute-1.localdomain systemd[1]: Starting Open vSwitch...
Mar 20 21:35:12 compute-1.localdomain systemd[1]: Started Open vSwitch.
Mar 20 21:35:16 compute-1.localdomain systemd[1]: openvswitch.service has Restart= setting other than no, which isn't allowed for Type=oneshot services. Refusing.
Mar 20 21:37:57 compute-1.localdomain systemd[1]: openvswitch.service has Restart= setting other than no, which isn't allowed for Type=oneshot services. Refusing.
[root@compute-1 log]#

Comment 17 Eelco Chaudron 2017-03-20 12:31:29 UTC
Hi Nilesh,

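Do not add Restart=on-failure to the openvswitch.service file. That unit is Type=oneshot, and systemd refuses Restart= settings other than "no" for oneshot services, which is the error in your log. Add it only to the ovs-vswitchd.service and ovsdb-server.service files.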

See upstream patch: https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/328546.html
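As a quick pre-check before running systemctl daemon-reload, the rule from the journal messages above can be sketched in Python. This is a simplified illustration rather than a systemd tool. The function names are made up, a missing Type= is treated as "not oneshot", and it mirrors the systemd 219 behaviour shown in this report (later systemd versions may differ). The HYPOTHETICAL_DAEMON_UNIT text is an invented fragment, not the real ovs-vswitchd.service.

```python
from typing import Dict, List


def parse_unit(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Parse unit-file text into {section: {key: [values, ...]}}."""
    sections: Dict[str, Dict[str, List[str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is not None and "=" in line:
            key, value = line.split("=", 1)
            # Keys may repeat (e.g. Requires=), so keep every value.
            current.setdefault(key.strip(), []).append(value.strip())
    return sections


def oneshot_restart_conflict(text: str) -> bool:
    """True if [Service] combines Type=oneshot with Restart= other than 'no'."""
    service = parse_unit(text).get("Service", {})
    svc_type = (service.get("Type") or [""])[-1]     # last assignment wins
    restart = (service.get("Restart") or ["no"])[-1]
    return svc_type == "oneshot" and restart != "no"


# The [Service] section from comment 16, without the "<<<<< Added" markers.
BROKEN_UNIT = """\
[Service]
Type=oneshot
ExecStart=/bin/true
ExecStop=/bin/true
RemainAfterExit=yes
Restart=on-failure
"""

# Invented fragment for a long-running daemon unit (assumption, not the real file).
HYPOTHETICAL_DAEMON_UNIT = """\
[Service]
Type=forking
Restart=on-failure
"""

if __name__ == "__main__":
    print("openvswitch.service conflict:", oneshot_restart_conflict(BROKEN_UNIT))
    print("daemon unit conflict:", oneshot_restart_conflict(HYPOTHETICAL_DAEMON_UNIT))
```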

