Bug 1569557
Summary: OVS daemon got lots of segfaults caused by "fork child died before signaling startup" ERR

| Field | Value |
| --- | --- |
| Product | Red Hat Enterprise Linux 7 |
| Component | openvswitch |
| Version | 7.5 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED UPSTREAM |
| Severity | high |
| Priority | high |
| Reporter | Jean-Tsung Hsiao <jhsiao> |
| Assignee | Eelco Chaudron <echaudro> |
| QA Contact | Jean-Tsung Hsiao <jhsiao> |
| CC | atragler, ctrautma, jhsiao, kzhang, louis.peens, pablo.cascon, pieter.jansenvanvuuren, qding, simon.horman, tredaelli |
| Target Milestone | rc |
| Type | Bug |
| Last Closed | 2018-06-01 13:45:47 UTC |
Description
Jean-Tsung Hsiao
2018-04-19 13:33:14 UTC
From the core dump:

(gdb) p dev->data
$3 = (struct rte_eth_dev_data *) 0x7f8f7ffb9f90
(gdb) p *dev->data
Cannot access memory at address 0x7f8f7ffb9f90
(gdb) info proc mappings
Mapped address spaces:

      Start Addr         End Addr        Size     Offset  objfile
  0x5646f10e3000   0x5646f1544000    0x461000        0x0  /usr/sbin/ovs-vswitchd
  0x5646f1743000   0x5646f1760000     0x1d000   0x460000  /usr/sbin/ovs-vswitchd
  0x5646f1760000   0x5646f17a3000     0x43000   0x47d000  /usr/sbin/ovs-vswitchd
  0x7f8e80000000   0x7f8ec0000000  0x40000000        0x0  /dev/hugepages/rtemap_7
  0x7f8ec0000000   0x7f8f00000000  0x40000000        0x0  /dev/hugepages/rtemap_6
  0x7f8f00000000   0x7f8f40000000  0x40000000        0x0  /dev/hugepages/rtemap_5
  0x7f8f40000000   0x7f8f80000000  0x40000000        0x0  /dev/hugepages/rtemap_4

The memory accessed is huge page memory, which is not included in the core dumps, as DPDK allocates it as shared.

# cat /proc/$(pgrep ovs-vswitchd)/maps | grep rtemap
7f70c0000000-7f7100000000 rw-s 00000000 00:24 7276557  /dev/hugepages/rtemap_7
7f7100000000-7f7140000000 rw-s 00000000 00:24 7276556  /dev/hugepages/rtemap_6
7f7140000000-7f7180000000 rw-s 00000000 00:24 7276555  /dev/hugepages/rtemap_5
7f7180000000-7f71c0000000 rw-s 00000000 00:24 7276554  /dev/hugepages/rtemap_4
7f7500000000-7f7540000000 rw-s 00000000 00:24 7276553  /dev/hugepages/rtemap_3
7f7540000000-7f7580000000 rw-s 00000000 00:24 7276552  /dev/hugepages/rtemap_2
7f7580000000-7f75c0000000 rw-s 00000000 00:24 7276551  /dev/hugepages/rtemap_1
7f75c0000000-7f7600000000 rw-s 00000000 00:24 7225873  /dev/hugepages/rtemap_0

To include it in the next core dump you need to adjust the core dump flags in /proc/<pid>/coredump_filter:

[root@netqe8 ~]# cat /proc/$(pgrep ovs-vswitchd)/coredump_filter
00000033
[root@netqe8 ~]# echo 0x73 > /proc/$(pgrep ovs-vswitchd)/coredump_filter
[root@netqe8 ~]# cat /proc/$(pgrep ovs-vswitchd)/coredump_filter
00000073

Without the full memory dump it's even harder to see what data was corrupted, but it looks like it is some data passed to the driver:

int16_t nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id], rx_pkts, nb_pkts);

So dev->data->rx_queues[queue_id], with queue_id = 0:

(gdb) p *dev
$2 = {rx_pkt_burst = 0x5646f1275db0 <nfp_net_recv_pkts>, tx_pkt_burst = 0x5646f1276fa0 <nfp_net_xmit_pkts>, tx_pkt_prepare = 0x0, data = 0x7f8f7ffb9f90, dev_ops = 0x5646f174ff40 <nfp_net_eth_dev_ops>, device = 0x5646f304f0d0, intr_handle = 0x5646f304f1a8, link_intr_cbs = {tqh_first = 0x0, tqh_last = 0x5646f1a6f1b8 <rte_eth_devices+33080>}, post_rx_burst_cbs = {0x0 <repeats 1024 times>}, pre_tx_burst_cbs = {0x0 <repeats 1024 times>}, state = RTE_ETH_DEV_ATTACHED, security_ctx = 0x0}

Please find a reproducer with saving of shared hugepages enabled. You might need to add coredump_filter=0x73 to the kernel command line.

Thanks for the bug report. BZ 1558567 could potentially explain this. How is OVS configuring the PMD? Are jumbo frames involved?

Pablo, it looks like this issue is related to enabling jumbo framing. However, it still should not crash.
The odd thing is that it will only crash on the first start after the configuration change, not on any subsequent restart:
$ ovs-vsctl set Interface dpdk0 mtu_request=1500
$ systemctl restart openvswitch
>> Here we are running fine...
$ ovs-vsctl set Interface dpdk0 mtu_request=4600
$ systemctl stop openvswitch
$ systemctl start openvswitch
Now it will crash and create a core dump...
$ systemctl restart openvswitch
$ systemctl restart openvswitch
$ systemctl restart openvswitch
...
...
No more crashes or core dump :(
NOTE: This is all with traffic being sent towards the card!
In the SIGTERM case we see that the driver's data has not been initialized:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f2ba57fa700 (LWP 8765)]
0x00005591bde708a4 in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7f2ba57f9770,
queue_id=0, port_id=0)
at /usr/src/debug/openvswitch-2.9.0/dpdk-17.11/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:2897
2897 int16_t nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
(gdb) p dev->data->rx_queues
$1 = (void **) 0x0
On the restart after the crash it's the same:
(gdb) p *rte_eth_devices[0]->data->rx_queues
Cannot access memory at address 0x0
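
To make the failure mode concrete: rte_eth_rx_burst() dereferences dev->data->rx_queues[queue_id] without any check, so a PMD thread polling a port whose queues were never (re)configured faults at address 0. Below is a minimal sketch assuming DPDK 17.11 headers; the safe_rx_burst() wrapper is purely illustrative and is not part of DPDK or OVS.

```c
#include <stdint.h>
#include <stddef.h>
#include <rte_ethdev.h>   /* assumes DPDK 17.11 development headers */

/* Illustrative wrapper, not part of DPDK/OVS: shows why the crash above
 * happens and what a defensive check would look like. */
static inline uint16_t
safe_rx_burst(uint16_t port_id, uint16_t queue_id,
              struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
{
    struct rte_eth_dev *dev = &rte_eth_devices[port_id];

    /* In the crashing core dump dev->data->rx_queues == NULL, so the
     * unchecked dev->data->rx_queues[queue_id] dereference inside
     * rte_eth_rx_burst() reads address 0 and segfaults. */
    if (dev->data == NULL || dev->data->rx_queues == NULL
        || dev->data->rx_queues[queue_id] == NULL) {
        return 0;   /* nothing configured to poll yet */
    }

    return rte_eth_rx_burst(port_id, queue_id, rx_pkts, nb_pkts);
}
```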
My guess is that it's the poll list in OVS which is set up incorrectly, but only for the first restart after the configuration change...
Working:
(gdb) p poll_cnt
$1 = <optimized out>
(gdb) p poll_list[0]
$2 = {rxq = 0x5579894ece50, port_no = 1}
(gdb) p poll_list[1]
$3 = {rxq = 0x20, port_no = 37}
Crashing case:
(gdb) p poll_cnt
$8 = <optimized out>
(gdb) p poll_list[0]
$9 = {rxq = 0x5646ada61ac0, port_no = 1}
(gdb) p poll_list[1]
$10 = {rxq = 0x5646ad99d010, port_no = 3}
So in the crashing case, both devices are added to the list of queues to poll.
Why?
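
For readers unfamiliar with the PMD internals, here is a simplified, hypothetical sketch of what the poll_list above represents; the struct and function names are illustrative, not the exact OVS 2.9 definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the entries shown in the gdb output above
 * (one rxq pointer plus a datapath port number per entry). */
struct polled_queue {
    void    *rxq;       /* per-queue receive context */
    uint32_t port_no;   /* datapath port number */
};

/* Hypothetical receive hook; in OVS this path ends in netdev_rxq_recv()
 * and, for DPDK ports, rte_eth_rx_burst(). */
static void
receive_from_queue(void *rxq, uint32_t port_no)
{
    (void) rxq;
    (void) port_no;
}

/* A PMD thread simply walks its poll list every iteration. If a port
 * whose DPDK (re)configuration failed is still listed here, the receive
 * call reaches the NULL rx_queues array and segfaults, which matches the
 * "crashing case" entry for port_no 3 above. */
static void
pmd_poll_once(struct polled_queue *poll_list, size_t poll_cnt)
{
    for (size_t i = 0; i < poll_cnt; i++) {
        receive_from_queue(poll_list[i].rxq, poll_list[i].port_no);
    }
}
```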
Hi Eelco,

Yes, I did change the MTU size for a dpdk interface from 1500 to 9000. Then, after restart, ovs-vswitchd crashed.

Thanks!
Jean

Here is my exact reproduction based on the command line history:

1. OVS-DPDK Geneve tunnelling working with dpdk0 mtu=1500.
2. [root@netqe7 jhsiao]# ovs-vsctl set interface dpdk0 mtu_request=9000
3. ping failed to work from both sides.
4. On netqe7, ran "systemctl restart openvswitch"; it failed and cores started appearing:

[root@netqe7 jhsiao]# systemctl restart openvswitch
A dependency job for openvswitch.service failed. See 'journalctl -xe' for details.

Apr 20 09:44:24 netqe7.knqe.lab.eng.bos.redhat.com ovs-vswitchd[12379]: ovs|00065|netdev_dpdk|ERR|Interface dpdk0(rxq:1 txq:3) configure
Apr 20 09:44:24 netqe7.knqe.lab.eng.bos.redhat.com ovs-vswitchd[12379]: ovs|00066|dpif_netdev|ERR|Failed to set interface dpdk0 new conf
Apr 20 09:44:24 netqe7.knqe.lab.eng.bos.redhat.com ovs-vswitchd[12379]: ovs|00073|dpdk|ERR|VHOST_CONFIG: failed to bind to /var/run/open
Apr 20 09:44:24 netqe7.knqe.lab.eng.bos.redhat.com ovs-vswitchd[12379]: ovs|00074|netdev_dpdk|ERR|rte_vhost_driver_start failed for vhos
Apr 20 09:44:25 netqe7.knqe.lab.eng.bos.redhat.com kernel: pmd54[12469]: segfault at 0 ip 0000561797fa08a4 sp 00007fa8497f16e0 error 4 i
Apr 20 09:44:25 netqe7.knqe.lab.eng.bos.redhat.com ovs-vswitchd[12378]: ovs|00002|daemon_unix|ERR|fork child died before signaling start
Apr 20 09:44:25 netqe7.knqe.lab.eng.bos.redhat.com ovs-vswitchd[12378]: ovs|00003|daemon_unix|EMER|could not detach from foreground sess
Apr 20 09:44:25 netqe7.knqe.lab.eng.bos.redhat.com ovs-ctl[12351]: Starting ovs-vswitchd ovs-vswitchd: could not detach from foreground
Apr 20 09:44:25 netqe7.knqe.lab.eng.bos.redhat.com ovs-ctl[12351]: [FAILED]
Apr 20 09:44:25 netqe7.knqe.lab.eng.bos.redhat.com systemd[1]: ovs-vswitchd.service: control process exited, code=exited status=1
Apr 20 09:44:25 netqe7.knqe.lab.eng.bos.redhat.com systemd[1]: Failed to start Open vSwitch Forwarding Unit.
-- Subject: Unit ovs-vswitchd.service has failed

5. Core dumping continued until "systemctl stop openvswitch":

[root@netqe7 jhsiao]# ll /core.*
-rw-------. 1 root root 233848832 Apr 20 09:43 /core.12022
-rw-------. 1 root root 233906176 Apr 20 09:43 /core.12141
-rw-------. 1 root root 233902080 Apr 20 09:44 /core.12259
-rw-------. 1 root root 233902080 Apr 20 09:44 /core.12379
-rw-------. 1 root root 233902080 Apr 20 09:44 /core.12506
-rw-------. 1 root root 233873408 Apr 20 09:44 /core.12626
-rw-------. 1 root root 233902080 Apr 20 09:45 /core.12744
-rw-------. 1 root root 233902080 Apr 20 09:45 /core.12864
-rw-------. 1 root root 233902080 Apr 20 09:45 /core.12991
-rw-------. 1 root root 233902080 Apr 20 09:46 /core.13109

Created attachment 1424523 [details]
ovs-vswitchd.log from restart to stop
A complete log from "systemctl restart openvswitch" to "systemctl stop openvswitch"
This issue is not NFP driver related, but a general OVS infrastructure problem. I've sent the following patch upstream: https://mail.openvswitch.org/pipermail/ovs-dev/2018-April/346480.html

Thanks Eelco for fixing this. Glad to see this is not NFP related. FWIW, I was able to replicate the issue quite straightforwardly with no patches. With the fix for BZ 1558567 (v3 in the OVS mailing list) I can't seem to replicate it even after a few iterations of OVS restart. Perhaps the bug you are fixing let the other bug happen? Will try with your patch applied as well and report back here.

Your fix helps here, thanks Eelco.

Sent out the v2 patch upstream: https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/347005.html

The cause of the crash has been fixed by 2.9.0-41, BZ 1578324. The root cause, due to invalid failure handling, has been committed to master and backported to 2.8 and 2.9: https://github.com/openvswitch/ovs/commit/606f665072507497e663e41478a6b902fec416ac

This will be picked up when we re-align our FDP to 2.9.2.
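
For context, here is a minimal sketch of the "invalid failure handling" idea described above, under the assumption that the fix amounts to tearing down a port's rx queues when its reconfiguration fails. This is not the actual upstream commit; all struct, field, and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified port state; not the real OVS port layout. */
struct port {
    void  **rxqs;              /* rx queues handed out to PMD threads */
    size_t  n_rxq;
    bool    needs_reconfigure;
};

/* Stubs standing in for the real netdev layer (illustrative only). */
static int  netdev_reconfigure_stub(struct port *port) { (void) port; return -1; }
static void port_free_rxqs_stub(struct port *port) { port->rxqs = NULL; }

static void
reconfigure_port(struct port *port)
{
    int err = netdev_reconfigure_stub(port);

    if (err) {
        /* The buggy path left the stale rxqs in place, so PMD threads
         * still polled the half-configured port and crashed. On failure
         * the queues must be torn down so the port is skipped when the
         * PMD poll lists are rebuilt. */
        port_free_rxqs_stub(port);
        port->n_rxq = 0;
    }

    port->needs_reconfigure = false;
}
```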