RDO tickets are now tracked in Jira https://issues.redhat.com/projects/RDO/issues/
Bug 1209003 - ovs-vswitchd segfault on boot leaving server with no network connectivity
Summary: ovs-vswitchd segfault on boot leaving server with no network connectivity
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RDO
Classification: Community
Component: openvswitch
Version: Juno
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: Kilo
Assignee: Lon Hohberger
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-04-05 02:00 UTC by Vasiliy Fet
Modified: 2017-07-24 17:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-01-05 18:31:06 UTC
Embargoed:


Attachments
core file (300.90 KB, application/x-gzip)
2015-04-10 19:49 UTC, Vasiliy Fet

Description Vasiliy Fet 2015-04-05 02:00:16 UTC
Description of problem: Since the latest upgrade (and a subsequent full reinstall), my RDO (all-in-one) server boots with no network connectivity, and I find this in the kernel log:
[   41.431919] revalidator_5[4399]: segfault at 0 ip 00007f2f10966ab0 sp 00007f2f07ffcb38 error 4 in ovs-vswitchd[7f2f108f8000+153000]
[   48.527338] revalidator_5[4541]: segfault at 0 ip 00007f2f10966ab0 sp 00007f2f07ffc8f8 error 4 in ovs-vswitchd[7f2f108f8000+153000]
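
For readers triaging similar reports, the kernel lines above can be decoded mechanically. The following is a minimal sketch, not part of the original report: the function name is hypothetical, and it assumes the usual x86 page-fault error-code bits (bit 0 = page present, bit 1 = write, bit 2 = user mode) and that the bracketed value is the start of the mapping containing the instruction pointer.

```python
import re

# Matches the x86 kernel "segfault at ..." message format seen in this report.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Decode one kernel segfault line; returns None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # the kernel prints this field in hex
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "thread": m.group("comm"),
        "object": m.group("obj"),
        "fault_address": int(m.group("addr"), 16),
        # Offset of the faulting instruction inside the mapped object.
        "offset_in_object": ip - base,
        # Assumed x86 error-code bit meanings (see lead-in).
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "user_mode": bool(err & 4),
    }


LINE = ("[   41.431919] revalidator_5[4399]: segfault at 0 ip 00007f2f10966ab0 "
        "sp 00007f2f07ffcb38 error 4 in ovs-vswitchd[7f2f108f8000+153000]")
info = decode_segfault(LINE)
print(info["thread"], hex(info["offset_in_object"]), info["access"])
```

Under those assumptions, the first line reads as a user-mode read of address 0 (a NULL pointer dereference) in the revalidator_5 thread. With matching debuginfo installed, the computed offset may be usable with a symbolizer such as addr2line to find the faulting function.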

Interestingly, stopping and then restarting openvswitch fixes the issue until the next reboot.


Version-Release number of selected component (if applicable):

CentOS Linux release 7.1.1503 (Core)

[root@live-server-1 ~]# ovs-vsctl --version
ovs-vsctl (Open vSwitch) 2.1.3
Compiled Oct 10 2014 21:29:30
Linux live-server-1.localnet 3.10.0-229.1.2.el7.x86_64 #1 SMP Fri Mar 27 03:04:26 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

openstack-cinder.noarch                2014.2.2-1.el7            @openstack-juno
openstack-dashboard.noarch             2014.2.2-1.el7            @openstack-juno
openstack-glance.noarch                2014.2.2-1.el7            @openstack-juno
openstack-keystone.noarch              2014.2.2-1.el7            @openstack-juno
openstack-neutron.noarch               2014.2.2-1.el7            @openstack-juno
openstack-neutron-ml2.noarch           2014.2.2-1.el7            @openstack-juno
openstack-neutron-openvswitch.noarch   2014.2.2-1.el7            @openstack-juno
openstack-nova-api.noarch              2014.2.2-1.el7            @openstack-juno
openstack-nova-cert.noarch             2014.2.2-1.el7            @openstack-juno
openstack-nova-common.noarch           2014.2.2-1.el7            @openstack-juno
openstack-nova-compute.noarch          2014.2.2-1.el7            @openstack-juno
openstack-nova-conductor.noarch        2014.2.2-1.el7            @openstack-juno
openstack-nova-console.noarch          2014.2.2-1.el7            @openstack-juno
openstack-nova-novncproxy.noarch       2014.2.2-1.el7            @openstack-juno
openstack-nova-scheduler.noarch        2014.2.2-1.el7            @openstack-juno
openstack-packstack.noarch             2014.2-0.18.dev1462.gbb05296.el7
                                                                 @openstack-juno
openstack-packstack-puppet.noarch      2014.2-0.18.dev1462.gbb05296.el7
                                                                 @openstack-juno
openstack-puppet-modules.noarch        2014.2.11-1.el7           @openstack-juno
openstack-selinux.noarch               0.5.19-2.el7ost           @openstack-juno
openstack-utils.noarch                 2014.2-1.el7.centos       @openstack-juno
openvswitch.x86_64                     2.1.2-2.el7.centos.1      @openstack-juno

How reproducible:

I think it's related to the configuration in my neutron/plugin.ini:
[ml2]
type_drivers = vxlan,vlan,flat

tenant_network_types = vxlan,vlan,flat

mechanism_drivers =openvswitch


[ml2_type_flat]
flat_networks =*

[ml2_type_vlan]
network_vlan_ranges = physnet1
bridge_mappings = physnet1:br-ex

[ml2_type_gre]

[ml2_type_vxlan]
vni_ranges =10:100

vxlan_group =224.0.0.1

[securitygroup]
enable_security_group = True
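
A configuration like the one above can be sanity-checked with a short script. This is only a sketch under a stated assumption: that bridge_mappings is normally read by the Open vSwitch agent from an [ovs] section rather than from [ml2_type_vlan]. The function name and the lookup table are hypothetical, and nothing here claims the placement is what causes the crash.

```python
import configparser

# Assumption (not confirmed in this report): the OVS agent expects
# bridge_mappings in an [ovs] section, so other placements are flagged.
EXPECTED_SECTION = {"bridge_mappings": "ovs"}


def misplaced_options(ini_text):
    """Return (section, option, expected_section) for options found elsewhere."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    found = []
    for section in parser.sections():
        for option in parser.options(section):
            expected = EXPECTED_SECTION.get(option)
            if expected is not None and section != expected:
                found.append((section, option, expected))
    return found


# The relevant part of the plugin.ini quoted in this report.
SAMPLE = """\
[ml2]
type_drivers = vxlan,vlan,flat
tenant_network_types = vxlan,vlan,flat
mechanism_drivers =openvswitch

[ml2_type_flat]
flat_networks =*

[ml2_type_vlan]
network_vlan_ranges = physnet1
bridge_mappings = physnet1:br-ex

[ml2_type_gre]

[ml2_type_vxlan]
vni_ranges =10:100
vxlan_group =224.0.0.1
"""

for section, option, expected in misplaced_options(SAMPLE):
    print(f"{option} is in [{section}], expected [{expected}]")
```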

Steps to Reproduce:
1. Install RDO on CentOS 7.x.
2. Follow the guide on "configure RDO with existing network".
3. Update plugin.ini to the above config.
4. Reboot: no network connectivity.
5. Stop the openvswitch service, then start it: works until the next reboot.

Actual results:
On reboot, ovs-vswitchd segfaults.

Expected results:
ovs-vswitchd should not segfault.

Additional info:

Comment 2 Vasiliy Fet 2015-04-10 19:49:41 UTC
Created attachment 1013266
core file

Comment 3 Vasiliy Fet 2015-04-11 02:16:02 UTC
Booting into a prior kernel fixes the issue!
This works:
[root@live-server-1 ~]# uname -a
Linux live-server-1.localnet 3.10.0-123.el7.x86_64 #1 SMP Mon Jun 30 12:09:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

This doesn't: 3.10.0-229.

Comment 6 Donny Davis 2016-02-09 06:07:42 UTC
I am having the same issue. I thought I had done something wrong, so I set up a new piece of hardware, with the same results:

network kernel: neutron-server[21419]: segfault at c ip 00007f89c59e7945 sp 00007ffdab873cf8 error 6 in libffi.so.6.0.1[7f89c59e2000+7000]



3.10.0-327.4.5.el7.x86_64 #1 SMP Mon Jan 25 22:07:14 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


ovs-vsctl (Open vSwitch) 2.4.0
Compiled Oct  7 2015 18:01:06
DB Schema 7.12.1

Comment 7 frjaraur 2016-03-02 18:03:13 UTC
I get the same errors, but with neutron-linuxbridge-agent.service.


# systemctl start neutron-linuxbridge-agent.service

The system log /var/log/messages shows what is happening:

Mar  2 18:57:34 compute1 systemd: Starting OpenStack Neutron Linux Bridge Agent...
Mar  2 18:57:34 compute1 kernel: neutron-linuxbr[11260]: segfault at c ip 00007f684caa8945 sp 00007fff84178508 error 6 in libffi.so.6.0.1[7f684caa3000+7000]
Mar  2 18:57:34 compute1 systemd: neutron-linuxbridge-agent.service: main process exited, code=killed, status=11/SEGV
Mar  2 18:57:34 compute1 systemd: Unit neutron-linuxbridge-agent.service entered failed state.
Mar  2 18:57:34 compute1 systemd: neutron-linuxbridge-agent.service failed.


# systemctl status neutron-linuxbridge-agent.service
● neutron-linuxbridge-agent.service - OpenStack Neutron Linux Bridge Agent
   Loaded: loaded (/usr/lib/systemd/system/neutron-linuxbridge-agent.service; enabled; vendor preset: disabled)
   Active: failed (Result: signal) since mié 2016-03-02 18:57:34 CET; 1min 48s ago
  Process: 11260 ExecStart=/usr/bin/neutron-linuxbridge-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/linuxbridge_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-linuxbridge-agent --log-file /var/log/neutron/linuxbridge-agent.log (code=killed, signal=SEGV)
 Main PID: 11260 (code=killed, signal=SEGV)

mar 02 18:57:34 compute1 systemd[1]: Started OpenStack Neutron Linux Bridge Agent.
mar 02 18:57:34 compute1 systemd[1]: Starting OpenStack Neutron Linux Bridge Agent...
mar 02 18:57:34 compute1 systemd[1]: neutron-linuxbridge-agent.service: main process exited, code=killed, status=11/SEGV
mar 02 18:57:34 compute1 systemd[1]: Unit neutron-linuxbridge-agent.service entered failed state.
mar 02 18:57:34 compute1 systemd[1]: neutron-linuxbridge-agent.service failed.


But I can start neutron-linuxbridge-agent manually, using the same command line as the service unit, without any errors:

# /usr/bin/neutron-linuxbridge-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/linuxbridge_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-linuxbridge-agent --log-file /var/log/neutron/linuxbridge-agent.log
No handlers could be found for logger "oslo_config.cfg"
# /usr/bin/neutron-linuxbridge-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/linuxbridge_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-linuxbridge-agent --log-file /var/log/neutron/linuxbridge-agent.log
No handlers could be found for logger "oslo_config.cfg"


kernel 3.10.0-327.10.1.el7.x86_64

libffi-3.0.13-16.el7.x86_64

openstack-neutron-linuxbridge-7.0.1-1.el7.noarch

Many thanks for your help,
Javier R.

Comment 8 stijnvdb 2016-03-23 12:43:16 UTC
@Javier

I had the same issue: the service would not start, but the agent ran fine from the command line. Two packages were missing, which I installed:
yum install python-openstackclient
yum install openstack-selinux

I also needed to remove the log file /var/log/neutron/linuxbridge-agent.log because of permission problems.

After that, the service starts normally.
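
Before deleting the log file, it may help to confirm that permissions really are the problem. The sketch below is not from the thread: the function names are hypothetical, it looks only at the classic owner/group/other mode bits (ACLs and SELinux are ignored), and it assumes the agent runs as a user named neutron. One plausible cause, also an assumption, is that running the agent manually as root created the log file with root ownership.

```python
import os
import pwd
import stat


def write_access(path, uid, gids):
    """Return (allowed, reason) based on mode bits only; ACLs/SELinux are ignored."""
    st = os.stat(path)
    if uid == 0:
        return True, "root bypasses mode bits"
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IWUSR), "owner bits"
    if st.st_gid in gids:
        return bool(st.st_mode & stat.S_IWGRP), "group bits"
    return bool(st.st_mode & stat.S_IWOTH), "other bits"


def write_access_for_user(path, username):
    """Look up a user's uid and groups, then apply write_access()."""
    entry = pwd.getpwnam(username)  # raises KeyError if the user does not exist
    gids = os.getgrouplist(username, entry.pw_gid)
    return write_access(path, entry.pw_uid, gids)
```

For example, write_access_for_user("/var/log/neutron/linuxbridge-agent.log", "neutron") returns a (False, ...) tuple if the service user could not write to the file according to its mode bits.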

