Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

The FDP team is no longer accepting new bugs in Bugzilla. Please report your issues under FDP project in Jira. Thanks.

Bug 1964582

Summary:	[21.E RHEL-8] Fast Datapath Release
Product:	Red Hat Enterprise Linux Fast Datapath	Reporter:	Timothy Redaelli <tredaelli>
Component:	openvswitch2.13	Assignee:	Timothy Redaelli <tredaelli>
Status:	CLOSED ERRATA	QA Contact:	ovs-qe
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	FDP 21.E	CC:	ctrautma, jhsiao, kfida, ralongi
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	openvswitch2.13-2.13.0-114.el8fdp	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-06-21 14:24:43 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Timothy Redaelli 2021-05-25 18:24:29 UTC

commit 160bb80a321538141b48ba2b4bcff262f2635e5d
Author: Adrian Moreno <amorenoz>
Date:   Tue May 18 09:50:27 2021 +0200

    ofp_actions: Fix set_mpls_tc formatting.
    
    Apart from a cut-and-paste typo, the man page claims that mpls_labels
    can be provided in hexadecimal format but that's currently not the case.
    
    Fix mpls ofp-action formatting, add size checks on ofp-action parsing
    and add some unit tests.
    
    Signed-off-by: Adrian Moreno <amorenoz>
    Signed-off-by: Ilya Maximets <i.maximets>

commit 32a197f845106da48833dd6194abcce01114cbe4
Author: Ilya Maximets <i.maximets>
Date:   Wed Apr 21 15:48:16 2021 +0200

    dpif-netdev: Remove meter rate from the bucket size calculation.
    
    Implementation of meters supposed to be a classic token bucket with 2
    typical parameters: rate and burst size.
    
    Burst size in this schema is the maximum number of bytes/packets that
    could pass without being rate limited.
    
    Recent changes to userspace datapath made meter implementation to be
    in line with the kernel one, and this uncovered several issues.
    
    The main problem is that maximum bucket size for unknown reason
    accounts not only burst size, but also the numerical value of rate.
    This creates a lot of confusion around behavior of meters.
    
    For example, if rate is configured as 1000 pps and burst size set to 1,
    this should mean that meter will tolerate bursts of 1 packet at most,
    i.e. not a single packet above the rate should pass the meter.
    However, current implementation calculates maximum bucket size as
    (rate + burst size), so the effective bucket size will be 1001.  This
    means that first 1000 packets will not be rate limited and average
    rate might be twice as high as the configured rate.  This also makes
    it practically impossible to configure meter that will have burst size
    lower than the rate, which might be a desirable configuration if the
    rate is high.
    
    Inability to configure low values of a burst size and overall inability
    for a user to predict what will be a maximum and average rate from the
    configured parameters of a meter without looking at the OVS and kernel
    code might be also classified as a security issue, because drop meters
    are frequently used as a way of protection from DoS attacks.
    
    This change removes rate from the calculation of a bucket size, making
    it in line with the classic token bucket algorithm and essentially
    making the rate and burst tolerance being predictable from a users'
    perspective.
    
    Same change will be proposed for the kernel implementation.
    Unit tests changed back to their correct version and enhanced.
    
    Signed-off-by: Ilya Maximets <i.maximets>
    Acked-by: Eelco Chaudron <echaudro>
    Reviewed-by: Tonghao Zhang <xiangxia.m.yue>

commit 1b4d8fe1d9cd1b558f2f093f909090c49fcda6b4
Author: Dumitru Ceara <dceara>
Date:   Mon May 17 11:22:27 2021 +0200

    ovsdb-idl: Consider all tables when computing expected cond seqno.
    
    In ovsdb_idl_db_set_condition(), take into account all pending
    condition changes for all tables when computing the db->cond_seqno at
    which the monitor is expected to be updated.
    
    In the following scenario, with two tables, A and B, the old code
    performed the following steps:
    1. Initial db->cond_seqno = X.
    2. Client changes condition for table A:
       - A->new_cond gets set
       - expected cond seqno returned to the client: X + 1
    3. ovsdb-idl sends the monitor_cond_change for table A
       - A->req_cond <- A->new_cond
    4. Client changes condition for table B:
       - B->new_cond gets set
       - expected cond seqno returned to the client: X + 1
       - however, because the condition change at step 3 is still not replied
         to, table B's monitor_cond_change request is not sent yet.
    5. ovsdb-idl receives the reply for the condition change at step 3:
       - db->cond_seqno <- X + 1
    6. ovsdb-idl sends the monitor_cond_change for table B
    7. ovsdb-idl receives the reply for the condition change at step 6:
      - db->cond_seqno <- X + 2
    
    The client was incorrectly informed that it will have all relevant
    updates for table B at seqno X + 1 while actually that happens later, at
    seqno X + 2.
    
    Fixes: 46437c5232bd ("ovsdb-idl: Enhance conditional monitoring API")
    Acked-by: Ben Pfaff <blp>
    Signed-off-by: Dumitru Ceara <dceara>
    (cherry picked from commit b5bb044fbe4c1395dcde5cc7d5081ef0099bb8b3)
    Signed-off-by: Ilya Maximets <i.maximets>

commit 9acbf2cff7df1389e069eeb74a78c7f625a2459f
Author: Wang Yibo <bobxxwang>
Date:   Thu May 6 11:14:07 2021 +0800

    ovs-ofctl: Fix coredump when using "add-groups" command.
    
    When using ovs-ofctl add-groups with only "switch" argument, a coredump
    is generated.  The main reason is that the command "ovs-ofctl add-groups"
    need two arguments but the limitation of min-args of this command is
    set to 1.
    
    Fixes: 7395c05254df ("Implement OpenFlow 1.1+ "groups" protocol.")
    Submitted-at: https://github.com/openvswitch/ovs/pull/360
    Signed-off-by: Wang Yibo <bobxxwang>
    Signed-off-by: Ilya Maximets <i.maximets>

commit 058702e3dcc61700bd587621d750d80985660a54
Author: Ilya Maximets <i.maximets>
Date:   Thu May 6 14:47:31 2021 +0200

    raft: Transfer leadership before creating snapshots.
    
    With a big database writing snapshot could take a lot of time, for
    example, on one of the systems compaction of 300MB database takes
    about 10 seconds to complete.  For the clustered database, 40% of this
    time takes conversion of the database to the file transaction json
    format, the rest of time is formatting a string and writing to disk.
    Of course, this highly depends on the disc and CPU speeds.  300MB is
    the very possible database size for the OVN Southbound DB, and it might
    be even bigger than that.
    
    During compaction the database is not available and the ovsdb-server
    doesn't do any other tasks.  If leader spends 10-15 seconds writing a
    snapshot, the cluster is not functional for that time period.  Leader
    also, likely, has some monitors to serve, so the one poll interval may
    be 15-20 seconds long in the end.  Systems with so big databases
    typically has very high election timers configured (16 seconds), so
    followers will start election only after this significant amount of
    time.  Once leader is back to the operational state, it will
    re-connect and try to join the cluster back.  In some cases, this might
    also trigger the 'connected' state flapping on the old leader
    triggering a re-connection of clients.  This issue has been observed
    with large-scale OVN deployments.
    
    One of the methods to improve the situation is to transfer leadership
    before compacting.  This allows to keep the cluster functional,
    while one of the servers writes a snapshot.
    
    Additionally logging the time spent for compaction if it was longer
    than 1 second.  This adds a bit of visibility to 'unreasonably long
    poll interval's.
    
    Reported-at: https://bugzilla.redhat.com/1960391
    Signed-off-by: Ilya Maximets <i.maximets>
    Acked-by: Dumitru Ceara <dceara>

commit 3bc41e2b6a170dcd6b5c728a4ed3410a6ce0865d
Author: Hariprasad Govindharajan <hariprasad.govindharajan>
Date:   Fri Apr 23 13:16:37 2021 +0100

    dpdk: Use DPDK 19.11.8 release.
    
    Modify ci linux build script to use the latest DPDK stable release.
    Modify Documentation to use the latest DPDK stable release 19.11.8
    Update NEWS file to reflect the latest DPDK stable release.
    
    Building DPDK releases 19.11.6 and 19.11.7,with meson was
    found to break the OvS build.
    DPDK 19.11.8 was released to fix the above said issue.
    Link to the discussion about the issue can be found below:
    http://mails.dpdk.org/archives/stable/2021-April/029786.html
    
    Signed-off-by: Hariprasad Govindharajan <hariprasad.govindharajan>
    Acked-by: Sunil Pai G <sunil.pai.g>
    Signed-off-by: Ian Stokes <ian.stokes>

commit fbebe7377dc00bd0627588f48d37b9b2658b6532
Author: Ilya Maximets <i.maximets>
Date:   Tue May 11 15:20:49 2021 +0200

    github: Fix up malformed /etc/hosts.
    
    For some reason /etc/hosts in GHA now contains a plain text line
    like this:
    
      Note: Don't Delete this file. Also, don't remove this line. ...
    
    This breaks libunbound and makes a series of unit tests to emit
    following warning:
      |00001|dns_resolve|WARN|Failed to read etc/hosts: syntax error
    
    Working around this issue by removing a bad line from /etc/hosts
    until this fixed properly by GitHub team.  This in combination
    with other fixes should unblock CI.
    
    Bug for virtual-environments:
      https://github.com/actions/virtual-environments/issues/3353
    
    Signed-off-by: Ilya Maximets <i.maximets>
    Reviewed-by: David Marchand <david.marchand>
    Acked-by: Aaron Conole <aconole>

commit 2a3d1f497152505e2d27febf1ff5329e3a277164
Author: Ilya Maximets <i.maximets>
Date:   Mon May 10 17:57:45 2021 +0200

    doc: automake: Add support for sphinx 4.0.
    
    File layout for man pages in sphinx 4 by default changed [1] from:
    
      Documentation/_ref/man/page.section
    
     to:
    
      Documentation/_ref/man/section/page.section
    
    Ajusting our build scripts so they will be able to locate files
    in new places.  This fixes our CI build.
    
    [1] https://github.com/sphinx-doc/sphinx/issues/7996
    
    Signed-off-by: Ilya Maximets <i.maximets>
    Acked-by: Ben Pfaff <blp>
    Reviewed-by: David Marchand <david.marchand>
    Reviewed-by: Aaron Conole <aconole>

commit 4ab5524dbb6f71232230a4a7cd7e13313c5db28b
Author: Ilya Maximets <i.maximets>
Date:   Thu May 6 14:27:23 2021 +0200

    cirrus: Look up existing versions of python dependencies.
    
    FreeBSD sometimes changes the base version of python3 that is
    used for packages.  This affects package names.  For example,
    currently CI is broken, because there is no more py37- versions
    of sphinx and openssl available, only py38- ones:
    
      pkg: No packages available to install matching 'py37-openssl'
           have been found in the repositories
      pkg: No packages available to install matching 'py37-sphinx'
           have been found in the repositories
    
    We had the same issue last year with 3.6 -> 3.7 transition:
      dfa2e3d04948 ("cirrus: Use python 3.7 packages on FreeBSD.")
    
    Fixing that by searching for a package instead of using a specific
    version.  This should help to avoid same issues in the future.
    
    Signed-off-by: Ilya Maximets <i.maximets>
    Reviewed-by: Aaron Conole <aconole>
    Acked-by: Kevin Traynor <ktraynor>
    Reviewed-by: David Marchand <david.marchand>

commit ea98fe0743d6902ab4b521e54cfa8edc467e703d
Author: Ben Pfaff <blp>
Date:   Thu May 6 10:41:40 2021 -0700

    ofp-group: Use big-enough buffer in ofputil_format_group().
    
    GCC 11 pointed out that ofputil_group_to_string()'s prototype asks for
    a buffer with one byte more than supplied.  This fixes the problem.
    
    This wasn't a buffer overflow because ofputil_group_to_string() honors
    the buffer size passed in, which was correct.  The worst that could
    happen was truncating the last byte of a group name.
    
    Signed-off-by: Ben Pfaff <blp>
    Acked-by: Paolo Valerio <pvalerio>

commit 798a40ffb6b8702630c8c1298a43bcd9ec399259
Author: Gavin Li <gavinl>
Date:   Mon Apr 19 10:26:04 2021 +0300

    ofproto/ofproto-dpif-sflow: Check sflow agent in case of race
    
    sFlow agent may not exist while calling dpif_sflow_received. The sFlow
    may be deleted in another thread, eg, by user, which will cause crash.
    
    Signed-off-by: Gavin Li <gavinl>
    Reviewed-by: Gavi Teitz <gavi>
    Signed-off-by: Simon Horman <simon.horman>

commit 5641bf9d72c0bb4a56a4a84dfb9d673ef60b42ab
Author: Timothy Redaelli <tredaelli>
Date:   Tue Apr 20 17:23:25 2021 +0200

    Fix typo in rh-mock-srpm
    
    Thanks fbl for reporting

commit 9984d6112495934bdb3ec668830ddcd0df7399c8
Author: Ilya Maximets <i.maximets>
Date:   Sun Apr 4 14:43:52 2021 +0200

    dpif: Fix use of uninitialized execute hash.
    
    'dpif_execute_helper_cb' doesn't initilalize the 'hash' field that
    may be passed down to datapath and might cause execution of a different
    set of actions, e.g. on recirculation.
    
     Thread 6 handler27:
     Conditional jump or move depends on uninitialised value(s)
        at 0x53A2C2: dpif_netlink_encode_execute (dpif-netlink.c:1841)
        by 0x53A2C2: dpif_netlink_operate__ (dpif-netlink.c:1919)
        by 0x53A82D: dpif_netlink_operate_chunks (dpif-netlink.c:2238)
        by 0x53A82D: dpif_netlink_operate (dpif-netlink.c:2297)
        by 0x48135F: dpif_operate (dpif.c:1366)
        by 0x481923: dpif_execute.part.24 (dpif.c:1320)
        by 0x481C46: dpif_execute (dpif.c:1312)
        by 0x481C46: dpif_execute_helper_cb (dpif.c:1243)
        by 0x4AE943: odp_execute_actions (odp-execute.c:865)
        by 0x47F272: dpif_execute_with_help (dpif.c:1296)
        by 0x4812FF: dpif_operate (dpif.c:1422)
        by 0x442226: handle_upcalls (ofproto-dpif-upcall.c:1617)
        by 0x442226: recv_upcalls.isra.36 (ofproto-dpif-upcall.c:855)
        by 0x442351: udpif_upcall_handler (ofproto-dpif-upcall.c:755)
        by 0x4FDE2C: ovsthread_wrapper (ovs-thread.c:383)
        by 0x5E19159: start_thread (in /usr/lib64/libpthread-2.28.so)
        by 0x69ECF72: clone (in /usr/lib64/libc-2.28.so)
      Uninitialised value was created by a stack allocation
        at 0x481966: dpif_execute_helper_cb (dpif.c:1159)
    
    Additionally added a missing comment to the 'struct dpif_execute'.
    
    Fixes: 0442bfb11d6c ("ofproto-dpif-upcall: Echo HASH attribute back to datapath.")
    Acked-by: Tonghao Zhang <xiangxia.m.yue>
    Signed-off-by: Ilya Maximets <i.maximets>

commit 43c2a10922b9a0efee4b1c12fbefb257fc242707
Author: Ilya Maximets <i.maximets>
Date:   Sun Apr 4 14:32:29 2021 +0200

    odp-util: Fix use of uninitialized erspan metadata.
    
    'struct erspan_metadata' contains union with fields of different
    sizes, hence not all the memory initiliazed.  This memory goes
    to syscalls and also used to compare ukeys with memcmp which may
    cause unexpected behavior.
    
     Thread 15 revalidator13:
     Conditional jump or move depends on uninitialised value(s)
        at 0x4C377B6: bcmp (vg_replace_strmem.c:1111)
        by 0x43F844: ofpbuf_equal (ofpbuf.h:273)
        by 0x43F844: revalidate_ukey__ (ofproto-dpif-upcall.c:2227)
        by 0x43F9C9: revalidate_ukey (ofproto-dpif-upcall.c:2294)
        by 0x4425C2: revalidate.isra.33 (ofproto-dpif-upcall.c:2734)
        by 0x4434B8: udpif_revalidator (ofproto-dpif-upcall.c:943)
        by 0x4FDE2C: ovsthread_wrapper (ovs-thread.c:383)
        by 0x5E19159: start_thread (in /usr/lib64/libpthread-2.28.so)
        by 0x69ECF72: clone (in /usr/lib64/libc-2.28.so)
      Uninitialised value was created by a stack allocation
        at 0x4B1CE0: tun_key_to_attr (odp-util.c:3129)
    
    Fixes: 98514eea21f4 ("erspan: add kernel datapath support")
    Acked-by: William Tu <u9012063>
    Signed-off-by: Ilya Maximets <i.maximets>

commit 4f68040a5c5242197bd518e3d54564addf786829
Author: Yunjian Wang <wangyunjian>
Date:   Thu Apr 8 21:33:12 2021 +0800

    dpif-netlink: Fix using uninitialized info.tc_modify_flow_deleted in out label.
    
    Before info.tc_modify_flow_deleted is assigned a value, error
    processing of other statements goes to the out label. In the
    out label, the uninitialized variant is used for condition
    determination, which may cause uncertain behavior.
    
    Fixes: 65b84d4a32bd ("dpif-netlink: avoid netlink modify flow put op failed after tc modify flow put op failed.")
    Signed-off-by: Mengfan Lv <lvmengfan>
    Signed-off-by: Yunjian Wang <wangyunjian>
    Reviewed-by: Simon Horman <simon.horman>
    Signed-off-by: Ilya Maximets <i.maximets>

commit 1c2a5f170fd98b4884b39c249c5d215a80d511de
Author: Paul Blakey <paulb>
Date:   Fri Apr 9 09:35:35 2021 -0400

    netdev-offload-tc: Probe for support for any of the ct_state flags.
    
    Upstream kernel now rejects unsupported ct_state flags.
    Earlier kernels, ignored it but still echoed back the requested ct_state,
    if ct_state was supported. ct_state initial support had trk, new, est,
    and rel flags.
    
    If kernel echos back ct_state, assume support for trk, new, est, and
    rel. If kernel rejects a specific unsupported flag, continue and
    use reject mechanisim to probe for flags rep and inv.
    
    Disallow inserting rules with unnsupported ct_state flags.
    
    Signed-off-by: Paul Blakey <paulb>
    Acked-by: Roi Dayan <roid>
    Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner>
    Signed-off-by: Simon Horman <simon.horman>
    (cherry picked from commit 1e4aa061ac8b71196c0b0c8ab23d1627cd2e4a27)
    Signed-off-by: Aaron Conole <aconole>
    Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner>
    Signed-off-by: Ilya Maximets <i.maximets>

commit a4730043a43e44bd66eb658849aca1425e3a260f
Author: Paul Blakey <paulb>
Date:   Fri Apr 9 09:35:34 2021 -0400

    compat: Add ct_state flags definitions.
    
    Add TCA_FLOWER_KEY_CT_FLAGS_REPLY, and TCA_FLOWER_KEY_CT_FLAGS_INVALID.
    
    Signed-off-by: Paul Blakey <paulb>
    Acked-by: Roi Dayan <roid>
    Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner>
    Signed-off-by: Simon Horman <simon.horman>
    (cherry picked from commit 0a8bd432ae0cf750b5560312343ddaa05af8c876)
    Signed-off-by: Aaron Conole <aconole>
    Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner>
    Signed-off-by: Ilya Maximets <i.maximets>

commit bbd62d7a444e889ea7e05f7f42bedf76e8178f52
Author: Roi Dayan <roid>
Date:   Fri Apr 9 09:35:33 2021 -0400

    tc: Use skip_hw flag when probing tc features.
    
    There is no need to pass tc rules to hw when just probing
    for tc features. this will avoid redundant errors from hw drivers
    that may happen.
    
    Signed-off-by: Roi Dayan <roid>
    Acked-By: Vlad Buslov <vladbu>
    Reviewed-by: Tonghao Zhang <xiangxia.m.yue>
    Signed-off-by: Simon Horman <simon.horman>
    (cherry picked from commit d5659751f65ebc17d9aec40b60c1cff3a2d87162)
    Signed-off-by: Aaron Conole <aconole>
    Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner>
    Signed-off-by: Ilya Maximets <i.maximets>

Comment 3 Christian Trautman 2021-06-20 02:46:10 UTC

Regression executed and no major issues found.

Comment 5 errata-xmlrpc 2021-06-21 14:24:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (openvswitch2.13 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2508