Bug 1569557
Summary: OVS daemon got lots of segfaults caused by "fork child died before signaling startup" ERR

| Field | Value |
| --- | --- |
| Product | Red Hat Enterprise Linux 7 |
| Component | openvswitch |
| Version | 7.5 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED UPSTREAM |
| Severity | high |
| Priority | high |
| Reporter | Jean-Tsung Hsiao <jhsiao> |
| Assignee | Eelco Chaudron <echaudro> |
| QA Contact | Jean-Tsung Hsiao <jhsiao> |
| CC | atragler, ctrautma, jhsiao, kzhang, louis.peens, pablo.cascon, pieter.jansenvanvuuren, qding, simon.horman, tredaelli |
| Target Milestone | rc |
| Type | Bug |
| Last Closed | 2018-06-01 13:45:47 UTC |
Description
Jean-Tsung Hsiao
2018-04-19 13:33:14 UTC
From the core dump:

(gdb) p dev->data
$3 = (struct rte_eth_dev_data *) 0x7f8f7ffb9f90
(gdb) p *dev->data
Cannot access memory at address 0x7f8f7ffb9f90
(gdb) info proc mappings
Mapped address spaces:

      Start Addr         End Addr        Size     Offset  objfile
  0x5646f10e3000   0x5646f1544000    0x461000        0x0  /usr/sbin/ovs-vswitchd
  0x5646f1743000   0x5646f1760000     0x1d000   0x460000  /usr/sbin/ovs-vswitchd
  0x5646f1760000   0x5646f17a3000     0x43000   0x47d000  /usr/sbin/ovs-vswitchd
  0x7f8e80000000   0x7f8ec0000000  0x40000000        0x0  /dev/hugepages/rtemap_7
  0x7f8ec0000000   0x7f8f00000000  0x40000000        0x0  /dev/hugepages/rtemap_6
  0x7f8f00000000   0x7f8f40000000  0x40000000        0x0  /dev/hugepages/rtemap_5
  0x7f8f40000000   0x7f8f80000000  0x40000000        0x0  /dev/hugepages/rtemap_4

The memory accessed is huge page memory, which is not included in the core dumps, as DPDK allocates it as shared.

# cat /proc/$(pgrep ovs-vswitchd)/maps | grep rtemap
7f70c0000000-7f7100000000 rw-s 00000000 00:24 7276557  /dev/hugepages/rtemap_7
7f7100000000-7f7140000000 rw-s 00000000 00:24 7276556  /dev/hugepages/rtemap_6
7f7140000000-7f7180000000 rw-s 00000000 00:24 7276555  /dev/hugepages/rtemap_5
7f7180000000-7f71c0000000 rw-s 00000000 00:24 7276554  /dev/hugepages/rtemap_4
7f7500000000-7f7540000000 rw-s 00000000 00:24 7276553  /dev/hugepages/rtemap_3
7f7540000000-7f7580000000 rw-s 00000000 00:24 7276552  /dev/hugepages/rtemap_2
7f7580000000-7f75c0000000 rw-s 00000000 00:24 7276551  /dev/hugepages/rtemap_1
7f75c0000000-7f7600000000 rw-s 00000000 00:24 7225873  /dev/hugepages/rtemap_0

To include it in the next core dump you need to adjust the core dump flags in /proc/<pid>/coredump_filter:

[root@netqe8 ~]# cat /proc/$(pgrep ovs-vswitchd)/coredump_filter
00000033
[root@netqe8 ~]# echo 0x73 > /proc/$(pgrep ovs-vswitchd)/coredump_filter
[root@netqe8 ~]# cat /proc/$(pgrep ovs-vswitchd)/coredump_filter
00000073

Without the full memory dump it's even harder to see what data was corrupted, but it looks like it is some data passed to the driver:

int16_t nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id], rx_pkts, nb_pkts);

So dev->data->rx_queues[queue_id], with queue_id = 0:

(gdb) p *dev
$2 = {rx_pkt_burst = 0x5646f1275db0 <nfp_net_recv_pkts>, tx_pkt_burst = 0x5646f1276fa0 <nfp_net_xmit_pkts>, tx_pkt_prepare = 0x0, data = 0x7f8f7ffb9f90, dev_ops = 0x5646f174ff40 <nfp_net_eth_dev_ops>, device = 0x5646f304f0d0, intr_handle = 0x5646f304f1a8, link_intr_cbs = {tqh_first = 0x0, tqh_last = 0x5646f1a6f1b8 <rte_eth_devices+33080>}, post_rx_burst_cbs = {0x0 <repeats 1024 times>}, pre_tx_burst_cbs = {0x0 <repeats 1024 times>}, state = RTE_ETH_DEV_ATTACHED, security_ctx = 0x0}

Please find a reproducer with saving of shared hugepages enabled. You might need to add coredump_filter=0x73 to the kernel command line.

Thanks for the bug report. BZ 1558567 could potentially explain this. How is OVS configuring the PMD? Are jumbo frames involved?

Pablo, it looks like this issue is related to enabling jumbo framing. However, it still should not crash.
The odd thing is that it will only crash on the first start after the configuration change, not on any subsequent restart:
$ ovs-vsctl set Interface dpdk0 mtu_request=1500
$ systemctl restart openvswitch
>> Here we are running fine...
$ ovs-vsctl set Interface dpdk0 mtu_request=4600
$ systemctl stop openvswitch
$ systemctl start openvswitch
Now it will crash and create a core dump...
$ systemctl restart openvswitch
$ systemctl restart openvswitch
$ systemctl restart openvswitch
...
...
No more crashes or core dump :(
NOTE: This is all with traffic being sent towards the card!
In the SIGTERM case we see that the driver's data has not been initialized:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f2ba57fa700 (LWP 8765)]
0x00005591bde708a4 in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7f2ba57f9770,
queue_id=0, port_id=0)
at /usr/src/debug/openvswitch-2.9.0/dpdk-17.11/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:2897
2897 int16_t nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
(gdb) p dev->data->rx_queues
$1 = (void **) 0x0
On the restart after the crash it's the same:
(gdb) p *rte_eth_devices[0]->data->rx_queues
Cannot access memory at address 0x0
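
To make the failure mode concrete: rte_eth_rx_burst() dereferences dev->data->rx_queues[queue_id] without any check, so a PMD thread polling a port whose queues were never (re)configured faults at address 0. Below is a minimal sketch assuming DPDK 17.11 headers; the safe_rx_burst() wrapper is purely illustrative and is not part of DPDK or OVS.

```c
#include <stdint.h>
#include <stddef.h>
#include <rte_ethdev.h>   /* assumes DPDK 17.11 development headers */

/* Illustrative wrapper, not part of DPDK/OVS: shows why the crash above
 * happens and what a defensive check would look like. */
static inline uint16_t
safe_rx_burst(uint16_t port_id, uint16_t queue_id,
              struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
{
    struct rte_eth_dev *dev = &rte_eth_devices[port_id];

    /* In the crashing core dump dev->data->rx_queues == NULL, so the
     * unchecked dev->data->rx_queues[queue_id] dereference inside
     * rte_eth_rx_burst() reads address 0 and segfaults. */
    if (dev->data == NULL || dev->data->rx_queues == NULL
        || dev->data->rx_queues[queue_id] == NULL) {
        return 0;   /* nothing configured to poll yet */
    }

    return rte_eth_rx_burst(port_id, queue_id, rx_pkts, nb_pkts);
}
```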
My guess is that it's the poll list in OVS which is set up incorrectly, but only for the first restart after the configuration change...
Working:
(gdb) p poll_cnt
$1 = <optimized out>
(gdb) p poll_list[0]
$2 = {rxq = 0x5579894ece50, port_no = 1}
(gdb) p poll_list[1]
$3 = {rxq = 0x20, port_no = 37}
Crashing case:
(gdb) p poll_cnt
$8 = <optimized out>
(gdb) p poll_list[0]
$9 = {rxq = 0x5646ada61ac0, port_no = 1}
(gdb) p poll_list[1]
$10 = {rxq = 0x5646ad99d010, port_no = 3}
So in the crashing case, both devices are added to the list of queues to poll.
Why?
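
For readers unfamiliar with the PMD internals, here is a simplified, hypothetical sketch of what the poll_list above represents; the struct and function names are illustrative, not the exact OVS 2.9 definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the entries shown in the gdb output above
 * (one rxq pointer plus a datapath port number per entry). */
struct polled_queue {
    void    *rxq;       /* per-queue receive context */
    uint32_t port_no;   /* datapath port number */
};

/* Hypothetical receive hook; in OVS this path ends in netdev_rxq_recv()
 * and, for DPDK ports, rte_eth_rx_burst(). */
static void
receive_from_queue(void *rxq, uint32_t port_no)
{
    (void) rxq;
    (void) port_no;
}

/* A PMD thread simply walks its poll list every iteration. If a port
 * whose DPDK (re)configuration failed is still listed here, the receive
 * call reaches the NULL rx_queues array and segfaults, which matches the
 * "crashing case" entry for port_no 3 above. */
static void
pmd_poll_once(struct polled_queue *poll_list, size_t poll_cnt)
{
    for (size_t i = 0; i < poll_cnt; i++) {
        receive_from_queue(poll_list[i].rxq, poll_list[i].port_no);
    }
}
```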
Hi Eelco,

Yes, I did change the MTU size for a dpdk interface from 1500 to 9000. Then, after restart, ovs-vswitchd crashed.

Thanks!
Jean

Here is my exact reproduction based on the command line history:

1. OVS-DPDK Geneve tunnelling working with dpdk0 mtu=1500.
2. [root@netqe7 jhsiao]# ovs-vsctl set interface dpdk0 mtu_request=9000
3. ping failed to work from both sides.
4. On netqe7, ran "systemctl restart openvswitch"; it failed and cores started appearing:

[root@netqe7 jhsiao]# systemctl restart openvswitch
A dependency job for openvswitch.service failed. See 'journalctl -xe' for details.

Apr 20 09:44:24 netqe7.knqe.lab.eng.bos.redhat.com ovs-vswitchd[12379]: ovs|00065|netdev_dpdk|ERR|Interface dpdk0(rxq:1 txq:3) configure
Apr 20 09:44:24 netqe7.knqe.lab.eng.bos.redhat.com ovs-vswitchd[12379]: ovs|00066|dpif_netdev|ERR|Failed to set interface dpdk0 new conf
Apr 20 09:44:24 netqe7.knqe.lab.eng.bos.redhat.com ovs-vswitchd[12379]: ovs|00073|dpdk|ERR|VHOST_CONFIG: failed to bind to /var/run/open
Apr 20 09:44:24 netqe7.knqe.lab.eng.bos.redhat.com ovs-vswitchd[12379]: ovs|00074|netdev_dpdk|ERR|rte_vhost_driver_start failed for vhos
Apr 20 09:44:25 netqe7.knqe.lab.eng.bos.redhat.com kernel: pmd54[12469]: segfault at 0 ip 0000561797fa08a4 sp 00007fa8497f16e0 error 4 i
Apr 20 09:44:25 netqe7.knqe.lab.eng.bos.redhat.com ovs-vswitchd[12378]: ovs|00002|daemon_unix|ERR|fork child died before signaling start
Apr 20 09:44:25 netqe7.knqe.lab.eng.bos.redhat.com ovs-vswitchd[12378]: ovs|00003|daemon_unix|EMER|could not detach from foreground sess
Apr 20 09:44:25 netqe7.knqe.lab.eng.bos.redhat.com ovs-ctl[12351]: Starting ovs-vswitchd ovs-vswitchd: could not detach from foreground
Apr 20 09:44:25 netqe7.knqe.lab.eng.bos.redhat.com ovs-ctl[12351]: [FAILED]
Apr 20 09:44:25 netqe7.knqe.lab.eng.bos.redhat.com systemd[1]: ovs-vswitchd.service: control process exited, code=exited status=1
Apr 20 09:44:25 netqe7.knqe.lab.eng.bos.redhat.com systemd[1]: Failed to start Open vSwitch Forwarding Unit.
-- Subject: Unit ovs-vswitchd.service has failed

5. Core dumping continued until "systemctl stop openvswitch":

[root@netqe7 jhsiao]# ll /core.*
-rw-------. 1 root root 233848832 Apr 20 09:43 /core.12022
-rw-------. 1 root root 233906176 Apr 20 09:43 /core.12141
-rw-------. 1 root root 233902080 Apr 20 09:44 /core.12259
-rw-------. 1 root root 233902080 Apr 20 09:44 /core.12379
-rw-------. 1 root root 233902080 Apr 20 09:44 /core.12506
-rw-------. 1 root root 233873408 Apr 20 09:44 /core.12626
-rw-------. 1 root root 233902080 Apr 20 09:45 /core.12744
-rw-------. 1 root root 233902080 Apr 20 09:45 /core.12864
-rw-------. 1 root root 233902080 Apr 20 09:45 /core.12991
-rw-------. 1 root root 233902080 Apr 20 09:46 /core.13109

Created attachment 1424523 [details]
ovs-vswitchd.log from restart to stop
A complete log from "systemctl restart openvswitch" to "systemctl stop openvswitch"
This issue is not NFP driver related, but a general OVS infrastructure problem. I've sent the following patch upstream: https://mail.openvswitch.org/pipermail/ovs-dev/2018-April/346480.html

Thanks Eelco for fixing this. Glad to see this is not NFP related. FWIW, I was able to replicate the issue quite straightforwardly with no patches. With the fix for BZ 1558567 (v3 in the OVS mailing list) I can't seem to replicate it even after a few iterations of OVS restart. Perhaps the bug you are fixing let the other bug happen? Will try with your patch applied as well and report back here.

Your fix helps here, thanks Eelco.

Sent out the v2 patch upstream: https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/347005.html

The cause of the crash has been fixed by 2.9.0-41, BZ 1578324. The root cause, due to invalid failure handling, has been committed to master and backported to 2.8 and 2.9: https://github.com/openvswitch/ovs/commit/606f665072507497e663e41478a6b902fec416ac

This will be picked up when we re-align our FDP to 2.9.2.
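
For context, here is a minimal sketch of the "invalid failure handling" idea described above, under the assumption that the fix amounts to tearing down a port's rx queues when its reconfiguration fails. This is not the actual upstream commit; all struct, field, and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified port state; not the real OVS port layout. */
struct port {
    void  **rxqs;              /* rx queues handed out to PMD threads */
    size_t  n_rxq;
    bool    needs_reconfigure;
};

/* Stubs standing in for the real netdev layer (illustrative only). */
static int  netdev_reconfigure_stub(struct port *port) { (void) port; return -1; }
static void port_free_rxqs_stub(struct port *port) { port->rxqs = NULL; }

static void
reconfigure_port(struct port *port)
{
    int err = netdev_reconfigure_stub(port);

    if (err) {
        /* The buggy path left the stale rxqs in place, so PMD threads
         * still polled the half-configured port and crashed. On failure
         * the queues must be torn down so the port is skipped when the
         * PMD poll lists are rebuilt. */
        port_free_rxqs_stub(port);
        port->n_rxq = 0;
    }

    port->needs_reconfigure = false;
}
```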