Bug 1821185 - ovs-vswitchd crashes with segmentation fault
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch2.13
Version: FDP 20.C
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Eelco Chaudron
QA Contact: ovs-qe
URL:
Whiteboard:
Duplicates: 1823174 1823178 1824847 (view as bug list)
Depends On:
Blocks: 1825334
 
Reported: 2020-04-06 09:20 UTC by Jakub Libosvar
Modified: 2020-05-04 07:47 UTC (History)
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned To: 1825334 (view as bug list)
Environment:
Last Closed: 2020-05-04 07:47:50 UTC
Target Upstream Version:
Embargoed:


Attachments
core dump (1.85 MB, application/x-lz4)
2020-04-06 09:20 UTC, Jakub Libosvar
logs in debug mode (3.87 MB, text/plain)
2020-04-09 08:32 UTC, Jakub Libosvar
OVS conf.db from an OVN reproducer (47.09 KB, text/plain)
2020-04-21 15:01 UTC, Dumitru Ceara
OVS flows generated by OVN reproducer (574.91 KB, text/plain)
2020-04-21 15:03 UTC, Dumitru Ceara

Description Jakub Libosvar 2020-04-06 09:20:23 UTC
Created attachment 1676523 [details]
core dump

Description of problem:
I tried to verify ovs2.13 running with the OSP16 OVN driver, and ovs-vswitchd crashes during Tempest testing. A compressed core file is attached.

Version-Release number of selected component (if applicable):
openvswitch2.13-2.13.0-9.el8fdp.x86_64

How reproducible:
Always

Steps to Reproduce:
Not yet known

Actual results:


Expected results:


Additional info:
I'll try to provide exact steps on how to reproduce the crash. The core dump is from a controller; there are 3 controllers in the setup, 2 crashed at the same time and 1 a little later, which could be related to failover.

Comment 1 Jakub Libosvar 2020-04-06 09:36:56 UTC
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfi'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000561c797d7d01 in classifier_lookup__ (cls=0x561c7bcda5c8, version=version@entry=1618, flow=flow@entry=0x7fa982a32580, wc=wc@entry=0x7fa982a57670,
    allow_conjunctive_matches=allow_conjunctive_matches@entry=true) at ../lib/classifier.c:941
941     {
[Current thread is 1 (Thread 0x7fa982a78700 (LWP 636127))]


Thread 3 (Thread 0x7fa982475700 (LWP 636131)):
#0  0x00007fa98579b2f5 in __pthread_mutex_unlock_usercnt (mutex=mutex@entry=0x561c7be4bc08, decr=decr@entry=1) at pthread_mutex_unlock.c:56
#1  0x00007fa98579b3ce in __GI___pthread_mutex_unlock (mutex=mutex@entry=0x561c7be4bc08) at pthread_mutex_unlock.c:356
#2  0x0000561c7987a68f in ovs_mutex_unlock (l_=l_@entry=0x561c7be4bc08) at ../lib/ovs-thread.c:130
#3  0x0000561c7979f63d in rule_dpif_credit_stats (rule=rule@entry=0x561c7be4bae0, stats=<optimized out>, offloaded=offloaded@entry=false) at ../ofproto/ofproto-dpif.c:4247
#4  0x0000561c797b7e1e in xlate_recursively (actions_xlator=0x561c797bd8c0 <do_xlate_actions>, is_last_action=false, deepens=<optimized out>, rule=0x561c7be4bae0, ctx=0x7fa98242e6e0)
    at ../ofproto/ofproto-dpif-xlate.c:4296
#5  xlate_table_action (xlator=0x561c797bd8c0 <do_xlate_actions>, is_last_action=false, with_ct_orig=<optimized out>, honor_table_miss=<optimized out>, may_packet_in=<optimized out>, table_id=<optimized out>,
    in_port=<optimized out>, ctx=0x7fa98242e6e0) at ../ofproto/ofproto-dpif-xlate.c:4434
#6  xlate_table_action (ctx=0x7fa98242e6e0, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>,
    is_last_action=false, xlator=0x561c797bd8c0 <do_xlate_actions>) at ../ofproto/ofproto-dpif-xlate.c:4379
#7  0x0000561c797bed1d in xlate_ofpact_resubmit (is_last_action=<optimized out>, resubmit=0x561c7bd1b158, ctx=0x7fa98242e6e0) at ../ofproto/ofproto-dpif-xlate.c:4745
#8  do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=<optimized out>, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ../ofproto/ofproto-dpif-xlate.c:6881
#9  0x0000561c797b7e5e in xlate_recursively (actions_xlator=0x561c797bd8c0 <do_xlate_actions>, is_last_action=false, deepens=<optimized out>, rule=0x561c7bd81300, ctx=0x7fa98242e6e0)
    at ../ofproto/ofproto-dpif-xlate.c:4305
#10 xlate_table_action (xlator=0x561c797bd8c0 <do_xlate_actions>, is_last_action=false, with_ct_orig=<optimized out>, honor_table_miss=<optimized out>, may_packet_in=<optimized out>, table_id=<optimized out>,
    in_port=<optimized out>, ctx=0x7fa98242e6e0) at ../ofproto/ofproto-dpif-xlate.c:4434
#11 xlate_table_action (ctx=0x7fa98242e6e0, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>,
    is_last_action=false, xlator=0x561c797bd8c0 <do_xlate_actions>) at ../ofproto/ofproto-dpif-xlate.c:4379
#12 0x0000561c797bed1d in xlate_ofpact_resubmit (is_last_action=<optimized out>, resubmit=0x561c7bc59928, ctx=0x7fa98242e6e0) at ../ofproto/ofproto-dpif-xlate.c:4745
#13 do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=<optimized out>, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ../ofproto/ofproto-dpif-xlate.c:6881
#14 0x0000561c797b7e5e in xlate_recursively (actions_xlator=0x561c797bd8c0 <do_xlate_actions>, is_last_action=false, deepens=<optimized out>, rule=0x561c7be04a10, ctx=0x7fa98242e6e0)
    at ../ofproto/ofproto-dpif-xlate.c:4305

... snip ...

#9043 xlate_table_action (xlator=0x561c797bd8c0 <do_xlate_actions>, is_last_action=false, with_ct_orig=<optimized out>, honor_table_miss=<optimized out>, may_packet_in=<optimized out>, table_id=<optimized out>, in_port=<optimized out>, ctx=0x7fa982a316e0) at ../ofproto/ofproto-dpif-xlate.c:4434
#9044 xlate_table_action (ctx=0x7fa982a316e0, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=false, xlator=0x561c797bd8c0 <do_xlate_actions>) at ../ofproto/ofproto-dpif-xlate.c:4379
#9045 0x0000561c797bed1d in xlate_ofpact_resubmit (is_last_action=<optimized out>, resubmit=0x561c7be48848, ctx=0x7fa982a316e0) at ../ofproto/ofproto-dpif-xlate.c:4745
#9046 do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=<optimized out>, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ../ofproto/ofproto-dpif-xlate.c:6881
#9047 0x0000561c797c1b41 in clone_xlate_actions (actions=0x561c7be487c8, actions_len=144, ctx=0x7fa982a316e0, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ../ofproto/ofproto-dpif-xlate.c:5714
#9048 0x0000561c797b7e5e in xlate_recursively (actions_xlator=0x561c797c1950 <clone_xlate_actions>, is_last_action=true, deepens=<optimized out>, rule=0x561c7be48620, ctx=0x7fa982a316e0) at ../ofproto/ofproto-dpif-xlate.c:4305
#9049 xlate_table_action (xlator=0x561c797c1950 <clone_xlate_actions>, is_last_action=true, with_ct_orig=<optimized out>, honor_table_miss=<optimized out>, may_packet_in=<optimized out>, table_id=<optimized out>, in_port=<optimized out>, ctx=0x7fa982a316e0) at ../ofproto/ofproto-dpif-xlate.c:4434
#9050 xlate_table_action (ctx=0x7fa982a316e0, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=true, xlator=0x561c797c1950 <clone_xlate_actions>) at ../ofproto/ofproto-dpif-xlate.c:4379
#9051 0x0000561c797c1371 in patch_port_output (ctx=ctx@entry=0x7fa982a316e0, out_dev=0x561c7be81bf0, in_dev=<optimized out>, in_dev=<optimized out>) at ../ofproto/ofproto-dpif-xlate.c:3858
#9052 0x0000561c797ba843 in compose_output_action__ (ctx=ctx@entry=0x7fa982a316e0, ofp_port=2, xr=0x0, check_stp=check_stp@entry=true, truncate=truncate@entry=false, is_last_action=false) at ../ofproto/ofproto-dpif-xlate.c:4138
#9053 0x0000561c797bc24d in compose_output_action (truncate=false, is_last_action=false, xr=<optimized out>, ofp_port=<optimized out>, ctx=0x7fa982a316e0) at ../ofproto/ofproto-dpif-xlate.c:4282
#9054 output_normal (ctx=0x7fa982a316e0, out_xbundle=<optimized out>, xvlan=<optimized out>) at ../ofproto/ofproto-dpif-xlate.c:2484
#9055 0x0000561c797bc81e in xlate_normal_flood (ctx=ctx@entry=0x7fa982a316e0, in_xbundle=in_xbundle@entry=0x561c7be80ee0, xvlan=xvlan@entry=0x7fa982a3068c) at ../ofproto/ofproto-dpif-xlate.c:2925
#9056 0x0000561c797bd2b8 in xlate_normal (ctx=0x7fa982a316e0) at ../ofproto/ofproto-dpif-xlate.c:3166
#9057 xlate_output_action (ctx=ctx@entry=0x7fa982a316e0, port=<optimized out>, controller_len=<optimized out>, may_packet_in=may_packet_in@entry=true, is_last_action=<optimized out>, truncate=truncate@entry=false, group_bucket_action=false) at ../ofproto/ofproto-dpif-xlate.c:5190
#9058 0x0000561c797bdbf0 in do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=<optimized out>, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ../include/openvswitch/ofp-actions.h:1302
#9059 0x0000561c797c3b23 in xlate_actions (xin=xin@entry=0x7fa982a32570, xout=xout@entry=0x7fa982a57618) at ../ofproto/ofproto-dpif-xlate.c:7699
#9060 0x0000561c797b2956 in upcall_xlate (wc=0x7fa982a57670, odp_actions=0x7fa982a57630, upcall=0x7fa982a575b0, udpif=0x561c7bcaf420) at ../ofproto/ofproto-dpif-upcall.c:1204
#9061 process_upcall (udpif=udpif@entry=0x561c7bcaf420, upcall=upcall@entry=0x7fa982a575b0, odp_actions=odp_actions@entry=0x7fa982a57630, wc=wc@entry=0x7fa982a57670) at ../ofproto/ofproto-dpif-upcall.c:1420
#9062 0x0000561c797b3553 in recv_upcalls (handler=<optimized out>, handler=<optimized out>) at ../ofproto/ofproto-dpif-upcall.c:842
#9063 0x0000561c797b3a1c in udpif_upcall_handler (arg=0x561c7bd12040) at ../ofproto/ofproto-dpif-upcall.c:759
#9064 0x0000561c7987b1d3 in ovsthread_wrapper (aux_=<optimized out>) at ../lib/ovs-thread.c:383
#9065 0x00007fa9857972de in start_thread (arg=<optimized out>) at pthread_create.c:486
#9066 0x00007fa984c09133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comment 2 Jakub Libosvar 2020-04-09 08:32:32 UTC
Created attachment 1677473 [details]
logs in debug mode

Attaching ovs-vswitchd logs in debug mode, the log ends at the time of the crash

Comment 3 Eelco Chaudron 2020-04-15 13:53:37 UTC
Here we go with a full analysis of the core dump:

First, this SIGSEGV is caused by the system stack being exhausted:

  (gdb) frame 0
  #0  0x0000561c797d7d01 in classifier_lookup__ (cls=0x561c7bcda5c8, version=version@entry=1618, flow=flow@entry=0x7fa982a32580, wc=wc@entry=0x7fa982a57670, 
      allow_conjunctive_matches=allow_conjunctive_matches@entry=true) at ../lib/classifier.c:941
  941	{

  (gdb) display/i $pc
  1: x/i $pc
  => 0x561c797d7d01 <classifier_lookup__+17>:	mov    %rcx,0x30(%rsp)

  (gdb) p $rsp
  $21 = (void *) 0x7fa982878ea0

  (gdb) p $rsp + 0x30
  $22 = (void *) 0x7fa982878ed0

  (gdb) maintenance info sections
  ...
   [88]     0x7fa982677000->0x7fa982677000 at 0x020ec000: load43 ALLOC READONLY
   [89]     0x7fa982678000->0x7fa982878000 at 0x020ec000: load44 ALLOC LOAD HAS_CONTENTS
   [90]     0x7fa982878000->0x7fa982878000 at 0x022ec000: load45 ALLOC READONLY
   [91]     0x7fa982879000->0x7fa982a79000 at 0x022ec000: load46 ALLOC LOAD HAS_CONTENTS
   [92]     0x7fa982a79000->0x7fa982a79000 at 0x024ec000: load47 ALLOC READONLY

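A quick stdlib-only sanity check of the arithmetic above (the addresses are copied from the gdb session; load45/load46 are the section names from `maintenance info sections`):

```python
# Addresses from the gdb session above.
rsp = 0x7fa982878ea0                   # %rsp at the faulting instruction
fault_addr = rsp + 0x30                # mov %rcx,0x30(%rsp) writes here

guard_lo, guard_hi = 0x7fa982878000, 0x7fa982879000  # load45: READONLY guard page
stack_lo, stack_hi = 0x7fa982879000, 0x7fa982a79000  # load46: the thread's stack

# The write lands in the read-only guard page just below the stack mapping,
# which is exactly what stack exhaustion looks like as a SIGSEGV.
assert guard_lo <= fault_addr < guard_hi
assert stack_hi - stack_lo == 2 * 1024 * 1024        # the 2M per-thread stack
```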
The other odd thing is the stack frame depth: about 9k frames.
This looks like a lot of recirculation is happening during the flow lookup.
This is confirmed by looking at the context data:

  (gdb) frame 15
  #15 xlate_table_action (ctx=0x7fa982a316e0, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=false, 
      xlator=0x561c797bd8c0 <do_xlate_actions>) at ../ofproto/ofproto-dpif-xlate.c:4379
  4379	xlate_table_action(struct xlate_ctx *ctx, ofp_port_t in_port, uint8_t table_id,
  (gdb) p ctx->resubmits
  $25 = 2657

The OVS code has multiple protections against excessive recirculation; see the xlate_resubmit_resource_check() function.
But we do not hit any of them yet:

  ctx->depth >= MAX_DEPTH           [64]
  ctx->resubmits >= MAX_RESUBMITS   [MAX_DEPTH * MAX_DEPTH = 4096]
  ctx->odp_actions->size > UINT16_MAX
  ctx->stack.size >= 65536

  (gdb) p ctx->resubmits
  $25 = 2657
  (gdb) p ctx->depth
  $26 = 59
  (gdb) p ctx->resubmits
  $27 = 2657
  (gdb) p ctx->odp_actions->size
  $28 = 344
  (gdb) p ctx->stack.size
  $29 = 87

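Plugged into a quick sketch of those checks (a Python paraphrase of the limits listed above, not the actual OVS C code), the core-dump values confirm that none of the limits had tripped:

```python
# Paraphrase of the limits enforced by xlate_resubmit_resource_check();
# constants follow the list above, not the actual C source.
MAX_DEPTH = 64
MAX_RESUBMITS = MAX_DEPTH * MAX_DEPTH    # 4096
MAX_ODP_ACTIONS = 0xFFFF                 # UINT16_MAX
MAX_STACK = 65536

def resubmit_allowed(depth, resubmits, odp_actions_size, stack_size):
    """True if another resubmit would still be allowed."""
    return (depth < MAX_DEPTH
            and resubmits < MAX_RESUBMITS
            and odp_actions_size <= MAX_ODP_ACTIONS
            and stack_size < MAX_STACK)

# Values read out of the core dump:
print(resubmit_allowed(depth=59, resubmits=2657,
                       odp_actions_size=344, stack_size=87))  # True
```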
So up until we ran out of stack space, we were still doing OK...

Looking at one of the most common stack frame sequences:

  #9040 0x0000561c797bed1d in xlate_ofpact_resubmit (is_last_action=<optimized out>, resubmit=0x561c7bd5e678, ctx=0x7fa982a316e0) at ../ofproto/ofproto-dpif-xlate.c:4745
  #9041 do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=<optimized out>, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ../ofproto/ofproto-dpif-xlate.c:6881
  #9042 0x0000561c797b7e5e in xlate_recursively (actions_xlator=0x561c797bd8c0 <do_xlate_actions>, is_last_action=false, deepens=<optimized out>, rule=0x561c7bdaebb0, ctx=0x7fa982a316e0) at ../ofproto/ofproto-dpif-xlate.c:4305
  #9043 xlate_table_action (xlator=0x561c797bd8c0 <do_xlate_actions>, is_last_action=false, with_ct_orig=<optimized out>, honor_table_miss=<optimized out>, may_packet_in=<optimized out>, table_id=<optimized out>, in_port=<optimized out>, ctx=0x7fa982a316e0) at ../ofproto/ofproto-dpif-xlate.c:4434
  #9044 xlate_table_action (ctx=0x7fa982a316e0, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=false, xlator=0x561c797bd8c0 <do_xlate_actions>) at ../ofproto/ofproto-dpif-xlate.c:4379

This eats around 846 bytes and is repeated around 1833 times, which gives a total of about 1.5M of stack space.
RHEL configures 2M of stack space per thread through systemd, based on this patch from Flavio:

    [ovs-dev] [PATCH] rhel: limit stack size to 2M.
    The default stack size in Fedora/RHEL is 8M, which means when ovs-vswitchd
    daemon starts and uses --mlockall (default), it will dirty all memory
    regions for all threads which is proportionally to the number of CPUs.

    On a big host this increases memory usage to many hundreds of megabytes
    while OVS actually requires much less.

    This patch relies on systemd to limit to 2M/thread. That is much more
    than the minimum documented at function ovs_thread_create():

        /* Some small systems use a default stack size as small as 80 kB, but OVS
         * requires approximately 384 kB according to the following analysis:
         * https://mail.openvswitch.org/pipermail/ovs-dev/2016-January/308592.html
         *
         * We use 512 kB to give us some margin of error. */

    Signed-off-by: Flavio Leitner <fbl at sysclose.org>
    ---
     rhel/usr_lib_systemd_system_ovs-vswitchd.service.in | 1 +
     1 file changed, 1 insertion(+)

    diff --git a/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
    index 525deae0b..317aa993c 100644
    --- a/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
    +++ b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
    @@ -14,6 +14,7 @@ Environment=XDG_RUNTIME_DIR=/var/run/openvswitch
     EnvironmentFile=/etc/openvswitch/default.conf
     EnvironmentFile=-/etc/sysconfig/openvswitch
     EnvironmentFile=-/run/openvswitch/useropts
    +LimitSTACK=2M
     @begin_dpdk@
     ExecStartPre=-/bin/sh -c '/usr/bin/chown :$${OVS_USER_ID##*:} /dev/hugepages'
     ExecStartPre=-/usr/bin/chmod 0775 /dev/hugepages
    -- 
    2.20.1
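A back-of-the-envelope check shows how close the observed stack usage gets to that 2M limit (both figures are estimates taken from the backtrace analysis above):

```python
seq_bytes = 846              # estimated stack used by one repeated frame sequence
repeats = 1833               # estimated repetitions of that sequence in the backtrace
limit = 2 * 1024 * 1024      # LimitSTACK=2M from the systemd unit

used = seq_bytes * repeats
print(used, f"bytes, {used / 2**20:.2f} MiB of the {limit // 2**20} MiB stack")
# ~1.48 MiB of the 2 MiB stack, before counting any other frames on the stack
```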

The packet received by the OVS thread is a MISS_UPCALL:

  (gdb) p upcall->type
  $1 = MISS_UPCALL

The packet itself is an IPv6 multicast packet:

  (gdb) p upcall->packet->source
  $8 = DPBUF_STUB

  (gdb) p upcall->packet->mbuf.buf_addr + upcall->packet->mbuf.data_off
  $5 = (void *) 0x7fa982a37e60

  (gdb) p upcall->packet->mbuf.pkt_len
  $6 = 78

  (gdb)  x /78b 0x7fa982a37e60
  0x7fa982a37e60:	0x33	0x33	0xff	0x91	0x13	0x6e	0xfa	0x16
  0x7fa982a37e68:	0x3e	0xca	0x8c	0x8d	0x86	0xdd	0x60	0x00
  0x7fa982a37e70:	0x00	0x00	0x00	0x18	0x3a	0x8e	0x00	0x00
  0x7fa982a37e78:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
  0x7fa982a37e80:	0x00	0x00	0x00	0x00	0x00	0x00	0xff	0x02
  0x7fa982a37e88:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
  0x7fa982a37e90:	0x00	0x01	0xff	0x91	0x13	0x6e	0x87	0x00
  0x7fa982a37e98:	0xef	0xd8	0x00	0x00	0x00	0x00	0x20	0x01
  0x7fa982a37ea0:	0x0d	0xb8	0x00	0x00	0x00	0x01	0xf8	0x16
  0x7fa982a37ea8:	0x3e	0xff	0xfe	0x91	0x13	0x6e

  hex2pcapRAW "
      0x33	0x33	0xff	0x91	0x13	0x6e	0xfa	0x16
      0x3e	0xca	0x8c	0x8d	0x86	0xdd	0x60	0x00
      0x00	0x00	0x00	0x18	0x3a	0x8e	0x00	0x00
      0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
      0x00	0x00	0x00	0x00	0x00	0x00	0xff	0x02
      0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
      0x00	0x01	0xff	0x91	0x13	0x6e	0x87	0x00
      0xef	0xd8	0x00	0x00	0x00	0x00	0x20	0x01
      0x0d	0xb8	0x00	0x00	0x00	0x01	0xf8	0x16
      0x3e	0xff	0xfe	0x91	0x13	0x6e
  "

  Frame 1: 78 bytes on wire (624 bits), 78 bytes captured (624 bits)
      [Protocols in frame: eth:ethertype:ipv6:icmpv6]
  Ethernet II, Src: fa:16:3e:ca:8c:8d (fa:16:3e:ca:8c:8d), Dst: IPv6mcast_ff:91:13:6e (33:33:ff:91:13:6e)
      Destination: IPv6mcast_ff:91:13:6e (33:33:ff:91:13:6e)
          Address: IPv6mcast_ff:91:13:6e (33:33:ff:91:13:6e)
          .... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
          .... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
      Source: fa:16:3e:ca:8c:8d (fa:16:3e:ca:8c:8d)
          Address: fa:16:3e:ca:8c:8d (fa:16:3e:ca:8c:8d)
          .... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
          .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
      Type: IPv6 (0x86dd)
  Internet Protocol Version 6, Src: ::, Dst: ff02::1:ff91:136e
      0110 .... = Version: 6
      .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT)
          .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
          .... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0)
      .... .... .... 0000 0000 0000 0000 0000 = Flow Label: 0x00000
      Payload Length: 24
      Next Header: ICMPv6 (58)
      Hop Limit: 142
      Source: ::
      Destination: ff02::1:ff91:136e
  Internet Control Message Protocol v6
      Type: Neighbor Solicitation (135)
      Code: 0
      Checksum: 0xefd8 [correct]
      [Checksum Status: Good]
      Reserved: 00000000
      Target Address: 2001:db8::1:f816:3eff:fe91:136e
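hex2pcapRAW appears to be a local helper script; the headline facts of the decode can be re-checked with nothing but the Python stdlib on the raw bytes from the dump (offsets assume an untagged Ethernet frame):

```python
import ipaddress

# The 78 raw bytes from the hex dump above.
pkt = bytes.fromhex(
    "3333ff91136efa163eca8c8d86dd6000"
    "000000183a8e00000000000000000000"
    "000000000000ff020000000000000000"
    "0001ff91136e8700efd8000000002001"
    "0db800000001f8163efffe91136e"
)

assert len(pkt) == 78
assert int.from_bytes(pkt[12:14], "big") == 0x86DD   # EtherType: IPv6

ip6 = pkt[14:]                       # IPv6 header starts after the 14-byte L2 header
assert ip6[6] == 58                  # Next Header: ICMPv6
src = ipaddress.IPv6Address(ip6[8:24])
dst = ipaddress.IPv6Address(ip6[24:40])
icmp_type = ip6[40]                  # first byte of the ICMPv6 header

print(src, "->", dst, "ICMPv6 type", icmp_type)
# prints: :: -> ff02::1:ff91:136e ICMPv6 type 135   (Neighbor Solicitation)
```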

Looking at some of the other information, we notice the packet ingresses on br-ext, port ens5, and we end up in a loop sending the packet to the patch port patch-provnet-d9475c4f-b78d-43c8-8bae-31efb873b196-to-br-int.

Now the questions are:

1) It looks like a 2M stack is not enough to support 4K recirculations. Do we need to double the maximum stack size, or decrease the max number of iterations?
2) Is the OVN pipeline valid in the case of this crash, i.e. are we facing a recirculation loop, or is this a valid 2K+ recirculation?

Next actions:

- Engineering to take a look at 1)
- QA to replicate the issue again (or at least get the setup into the state where the crash previously happened) and capture the output of the OVS and OVN flow tables. With this, we can determine whether the recirculation is supposed to happen or not.
If the setup is ready, you can try to force the core by sending the above packet into interface ens5.

If you have the setup ready in a state where you can replicate it, please keep it so we can do some hands-on debugging.

Comment 4 Eelco Chaudron 2020-04-15 13:54:56 UTC
Jakub, please take a look at the second next action and let me know when you can do this.

Comment 6 Dumitru Ceara 2020-04-15 15:00:13 UTC
Jakub, could you please also attach the NB/SB/OVS databases and/or sos reports?

This way we can confirm whether the OVN pipeline is correct or if somehow we end up looping for specific packets like the multicast one in https://bugzilla.redhat.com/show_bug.cgi?id=1821185#c3 above.

Thanks,
Dumitru

Comment 7 Eelco Chaudron 2020-04-16 07:53:10 UTC
One more question: you mention this is a regression. Can you let us know which previous versions of OVN and OVS were working on OSP16 with this test? Also, was this the same version of the OSP16 OVN plugin, or a different version as well?

Comment 8 Jakub Libosvar 2020-04-16 08:06:57 UTC
(In reply to Eelco Chaudron from comment #3)
> ...
> 
> Now the questions are:
> 
> 2) Is the OVN pipeline valid in the case of this crash, i.e. are we facing a
> recirculation loop, or is this a valid 2K+ recirculation?

I didn't have a chance to observe the flows. I will capture the flows every second and then I can compare if the recirc is valid.

> 
> Next actions:
> 
> - Engineering to take a look at 1)
> - QA to replicate the issue again (or at least get the setup in a state

QA are not involved in testing, I will try to reproduce.

> 
> If you have the setup ready in a state where you can replicate it, please
> keep it so we can do some hands-on debugging.

(In reply to Dumitru Ceara from comment #6)
> Jakub, could you please also attach the NB/SB/OVS databases and/or sos
> reports?

I will once I have the env back up and running. The problem is that I can't reproduce easily and it happens when running multiple tests in parallel, so I don't know yet what exactly causes the crash.
> 
> This way we can confirm whether the OVN pipeline is correct or if somehow we
> end up looping for specific packets like the multicast one in
> https://bugzilla.redhat.com/show_bug.cgi?id=1821185#c3 above.
> 
> Thanks,
> Dumitru


(In reply to Eelco Chaudron from comment #7)
> One more question, you mention this is a regression, can you let us know
> what previous version of OVN and OVS where working on OSP16 with this test?

The crash doesn't happen with openvswitch2.11-2.11.0-48.el8fdp.x86_64


> Also was this the same version of the OSP16 OVN plugin, or was this a
> different version also?

OSP16 networking-ovn hasn't changed. However, OVN itself has also changed to ovn2.13, so I can try to test ovs2.11 with ovn2.13 and ovs2.13 with ovn2.11 to see if it crashes in either case.

Comment 9 Eelco Chaudron 2020-04-16 10:03:11 UTC
FYI, I found the bug that led to scaling down the stack size:

https://bugzilla.redhat.com/show_bug.cgi?id=1572797

Comment 10 Roman Safronov 2020-04-16 12:16:22 UTC
*** Bug 1823178 has been marked as a duplicate of this bug. ***

Comment 11 Roman Safronov 2020-04-16 12:19:40 UTC
*** Bug 1823174 has been marked as a duplicate of this bug. ***

Comment 12 Daniel Alvarez Sanchez 2020-04-17 12:38:05 UTC
*** Bug 1824847 has been marked as a duplicate of this bug. ***

Comment 13 Jakub Libosvar 2020-04-17 15:01:56 UTC
It seems the crash is caused by one of the IPv6 router advertisements getting into a loop on the provider network. At one point, there are about 116,000 identical RA packets within a second or two:

12:34:06.821480 fa:16:3e:7e:af:34 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 142: (class 0xc0, hlim 158, next-header ICMPv6 (58) payload length: 88) fe80::5054:ff:fe58:ccc4 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 88
        hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 1800s, reachable time 0ms, retrans time 0ms
          prefix info option (3), length 32 (4): 2620:52:0:13b8::/64, Flags [onlink], valid time 3600s, pref. time 3600s
            0x0000:  4080 0000 0e10 0000 0e10 0000 0000 2620
            0x0010:  0052 0000 13b8 0000 0000 0000 0000
          mtu option (5), length 8 (1):  1500
            0x0000:  0000 0000 05dc
          source link-address option (1), length 8 (1): 52:54:00:58:cc:c4
            0x0000:  5254 0058 ccc4
          rdnss option (25), length 24 (3):  lifetime 3600s, addr: fe80::5054:ff:fe58:ccc4
            0x0000:  0000 0000 0e10 fe80 0000 0000 0000 5054
            0x0010:  00ff fe58 ccc4

Comment 14 Jakub Libosvar 2020-04-17 15:02:46 UTC
And just to add: this is likely an OVN bug and not OVS; OVS just gets a SIGSEGV because of the stack depth.

Comment 15 Dumitru Ceara 2020-04-17 18:02:36 UTC
(In reply to Jakub Libosvar from comment #14)
> And just to add, this is likely an OVN bug and not OVS, OVS just gets SEGV
> signal because of the depth in stack.

Agreed, this was introduced by https://github.com/ovn-org/ovn/commit/677a3ba4d66b10300ce6050c82b07bb152027bd5

I'm working on an OVN fix for the routing loop, but I think the stack size in OVS should be adjusted too, in order to avoid crashing and instead just drop packets when they reach the 4K resubmit limit.
I'll clone this bug to track the fix in OVN2.13.

Thanks,
Dumitru

Comment 16 Dumitru Ceara 2020-04-21 15:01:03 UTC
Created attachment 1680583 [details]
OVS conf.db from an OVN reproducer

Comment 17 Dumitru Ceara 2020-04-21 15:03:10 UTC
Created attachment 1680584 [details]
OVS flows generated by OVN reproducer

Steps to replicate the issue with flows generated by an OVN deployment:

1. Restart ovs with attached conf.db
2. Load flows on br-int:

ovs-ofctl -OOpenFlow15 add-flows br-int br-int.flows.dump

3. Run ofproto-trace on patch port from br-ex:

in_port=$(ovs-vsctl --bare --columns ofport list interface patch-br-int-to-ln-public)
flow="icmp6,dl_src=40:54:00:00:00:03,dl_dst=33:33:ff:00:00:0b,ipv6_src=fe80::5254:ff:fe00:3,ipv6_dst=ff02::2,nw_tos=0,nw_ecn=0,nw_ttl=254,icmpv6_type=135,nd_target=1000::b,nd_sll=50:54:00:00:00:03"
ovs-appctl ofproto/trace br-int in_port=${in_port},$flow

Comment 18 Dumitru Ceara 2020-04-22 08:50:20 UTC
(In reply to Dumitru Ceara from comment #17)
> Created attachment 1680584 [details]
> OVS flows generated by OVN reproducer
> 
> Steps to replicate the issue with flows generated by an OVN deployment:
> 
> 1. Restart ovs with attached conf.db
> 2. Load flows on br-int:
> 
> ovs-ofctl -OOpenFlow15 add-flows br-int br-int.flows.dump
> 
> 3. Run ofproto-trace on patch port from br-ex:
> 
> in_port=$(ovs-vsctl --bare --columns ofport list interface
> patch-br-int-to-ln-public)
> flow="icmp6,dl_src=40:54:00:00:00:03,dl_dst=33:33:ff:00:00:0b,ipv6_src=fe80::
> 5254:ff:fe00:3,ipv6_dst=ff02::2,nw_tos=0,nw_ecn=0,nw_ttl=254,icmpv6_type=135,
> nd_target=1000::b,nd_sll=50:54:00:00:00:03"
> ovs-appctl ofproto/trace br-int in_port=${in_port},$flow

The above packet trace will hit the max resubmit depth of 64. If instead we want to hit the max number of resubmits, we need to limit the depth of the packet-processing tree. One way to do that is to lower the TTL to 10:

in_port=$(ovs-vsctl --bare --columns ofport list interface patch-br-int-to-ln-public)
flow="icmp6,dl_src=40:54:00:00:00:03,dl_dst=33:33:ff:00:00:0b,ipv6_src=fe80::5254:ff:fe00:3,ipv6_dst=ff02::2,nw_tos=0,nw_ecn=0,nw_ttl=10,icmpv6_type=135,nd_target=1000::b,nd_sll=50:54:00:00:00:03"
ovs-appctl ofproto/trace br-int in_port=${in_port},$flow

Comment 19 Eelco Chaudron 2020-04-22 14:07:27 UTC
Looking at this in general, for now it makes no sense to change the default stack size. I will research a bit more whether it makes sense to increase it slightly to support OVN better. However, it would make more sense to increase the stack size dynamically based on the deployment of OVS. This can, for example, be done using the systemd drop-in feature:

$ mkdir -p /etc/systemd/system/ovs-vswitchd.service.d/
$ echo -e "[Service]\nLimitSTACK=8M" > /etc/systemd/system/ovs-vswitchd.service.d/limitstack.conf
$ systemctl daemon-reload
$ systemctl restart openvswitch
$ cat /proc/$(pidof ovs-vswitchd)/limits | grep stack -
Max stack size            8388608              8388608              bytes

Comment 20 Eelco Chaudron 2020-05-04 07:47:50 UTC
Closing BZ; see comment #19. Sent an email upstream to see if we can optimize the stack usage of the xlate code path, but as this is at the core of OVS it might need more thought, and just increasing the stack on an as-needed basis seems to be the solution for now.

https://mail.openvswitch.org/pipermail/ovs-dev/2020-April/369776.html

