Bug 1751458 - OpenShift sdn crashes when there is a gretap link
Summary: OpenShift sdn crashes when there is a gretap link
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.0
Hardware: Unspecified
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.3.0
Assignee: Ricardo Carrillo Cruz
QA Contact: zhaozhanqi
URL:
Whiteboard:
Duplicates: 1633672
Depends On:
Blocks: 1759497 1759831
Reported: 2019-09-12 04:46 UTC by Juan Luis de Sousa-Valadas
Modified: 2020-01-23 11:06 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1759497 1759831 1759833
Environment:
Last Closed: 2020-01-23 11:05:47 UTC
Target Upstream Version:


Links:
GitHub openshift/sdn pull 49 (closed): "Bug 1751458: Fix parsing of IFLA_GRE_COLLECT_METADATA" (last updated 2020-06-02 08:06:46 UTC)
Red Hat Product Errata RHBA-2020:0062 (last updated 2020-01-23 11:06:11 UTC)

Description Juan Luis de Sousa-Valadas 2019-09-12 04:46:57 UTC
Description of problem:
A customer created a GRE tunnel on br0 to mirror traffic for monitoring, and this causes the SDN pod to crash during bootstrap.

Version-Release number of selected component (if applicable):
3.11.140-1

How reproducible:
Always

Steps to Reproduce:
1. Create a gretap link, either attached to br0 or outside of it. For example, this adds a GRE port to br0 and mirrors all of br0's traffic to it:
ovs-vsctl add-port br0 gre0 \
    -- set interface gre0 type=gre options:remote_ip=10.47.235.49 \
    -- --id=@p get port gre0 \
    -- --id=@m create mirror name=pvxmirror select-all=true output-port=@p \
    -- set bridge br0 mirror=@m
echo "gre0 created"

2. Restart the SDN pod.
3. Check the SDN pod logs for the panic.

Actual results:
The SDN pod panics during bootstrap.

Expected results:
The SDN pod starts without panicking.

Additional info:
github.com/openshift/sdn is probably affected too, since it vendors the same version of the netlink library.
There is an upstream fix for this issue; however, I believe that fix is wrong.
Fix upstream: https://github.com/vishvananda/netlink/commit/12728257a952cdfbc09cd4ec9fc55e97c8b2cf02

Logs and stack trace:
# /bin/openshift start network --config=/etc/origin/node/node-config.yaml --kubeconfig=/etc/origin/node/node.kubeconfig --loglevel=2
I0910 10:05:52.781813  123512 start_network.go:193] Reading node configuration from /etc/origin/node/node-config.yaml
I0910 10:05:52.786286  123512 start_network.go:200] Starting node networking node-0.dirtyharry311.lab.pnq2.cee.redhat.com (v3.11.141-1+be05365)
W0910 10:05:52.786499  123512 server.go:195] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
I0910 10:05:52.786558  123512 feature_gate.go:230] feature gates: &{map[]}
I0910 10:05:52.799201  123512 transport.go:160] Refreshing client certificate from store
I0910 10:05:52.799285  123512 certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
I0910 10:05:52.800625  123512 node.go:147] Initializing SDN node of type "redhat/openshift-ovs-networkpolicy" with configured hostname "node-0.dirtyharry311.lab.pnq2.cee.redhat.com" (IP ""), iptables sync period "30s"
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/openshift/origin/vendor/github.com/vishvananda/netlink.parseGretapData(0x9c242e0, 0xc000b905a0, 0xc0008f4c00, 0x10, 0x10)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/vishvananda/netlink/link_linux.go:1741 +0x34b
github.com/openshift/origin/vendor/github.com/vishvananda/netlink.LinkDeserialize(0x0, 0xc00084d888, 0x580, 0x778, 0x9, 0xc000932400, 0x8, 0x10)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/vishvananda/netlink/link_linux.go:1211 +0xfa1
github.com/openshift/origin/vendor/github.com/vishvananda/netlink.(*Handle).LinkList(0xdb13b00, 0xa, 0x10, 0xc000219b01, 0xc00161c520, 0x53ca73)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/vishvananda/netlink/link_linux.go:1311 +0x1d9
github.com/openshift/origin/vendor/github.com/vishvananda/netlink.LinkList(...)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/vishvananda/netlink/link_linux.go:1291
github.com/openshift/origin/pkg/network/node.GetLinkDetails(0xc000219bb0, 0xc, 0xc000219bb0, 0xc, 0x0, 0x0, 0x30)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/network/node/node.go:239 +0x43
github.com/openshift/origin/pkg/network/node.(*OsdnNodeConfig).setNodeIP(0xc0006f4e10, 0x28, 0x9bd6ee0)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/network/node/node.go:224 +0x66
github.com/openshift/origin/pkg/network/node.New(0xc0006f4e10, 0xc0006f4e10, 0x5608cb2, 0xd)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/network/node/node.go:157 +0x3f3
github.com/openshift/origin/pkg/cmd/server/kubernetes/network.NewSDNInterfaces(0x0, 0x0, 0x0, 0x0, 0xc0003b8d20, 0x2c, 0x0, 0x0, 0xc00036ffc0, 0xd, ...)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/kubernetes/network/sdn_linux.go:55 +0x5c6
github.com/openshift/origin/pkg/cmd/server/kubernetes/network.New(0x0, 0x0, 0x0, 0x0, 0xc0003b8d20, 0x2c, 0x0, 0x0, 0xc00036ffc0, 0xd, ...)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/kubernetes/network/network_config.go:123 +0x10ac
github.com/openshift/origin/pkg/cmd/server/start.StartNetwork(0x0, 0x0, 0x0, 0x0, 0xc0003b8d20, 0x2c, 0x0, 0x0, 0xc00036ffc0, 0xd, ...)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_network.go:211 +0x4be
github.com/openshift/origin/pkg/cmd/server/start.NetworkOptions.RunNetwork(0xc001200900, 0x0, 0x7ffc62b71826, 0x21, 0x9bd6ea0, 0xc000010018, 0xc0000f2180, 0x0, 0x0)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_network.go:182 +0x6b9
github.com/openshift/origin/pkg/cmd/server/start.NetworkOptions.StartNetwork(0xc001200900, 0x0, 0x7ffc62b71826, 0x21, 0x9bd6ea0, 0xc000010018, 0xc0000f2180, 0x0, 0x0)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_network.go:128 +0x50
github.com/openshift/origin/pkg/cmd/server/start.(*NetworkOptions).Run(0xc0016fbd70, 0xc0017de500, 0x9bd6ea0, 0xc000010020, 0xc001472ba0, 0x0, 0x3, 0xc0000f2180)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_network.go:96 +0x153
github.com/openshift/origin/pkg/cmd/server/start.NewCommandStartNetwork.func1.2(0xc000000008, 0x58fded0)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_network.go:68 +0x69
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/interrupt.(*Handler).Run(0xc001472c00, 0xc00161dc18, 0x0, 0x0)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/interrupt/interrupt.go:103 +0xff
github.com/openshift/origin/pkg/cmd/server/start.NewCommandStartNetwork.func1(0xc0017de500, 0xc001472ba0, 0x0, 0x3)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_network.go:67 +0x1ae
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).execute(0xc0017de500, 0xc001472b10, 0x3, 0x3, 0xc0017de500, 0xc001472b10)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:760 +0x2ae
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc0017c3680, 0x9, 0xc0017c3680, 0x9)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:846 +0x2ec
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).Execute(...)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:794
main.main()
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/cmd/openshift/openshift.go:41 +0x2c2

(Note: the paths reference /home/jdesousa because I had to recompile the binary with custom gcflags so that Delve would work.)

Here is what happens when execution reaches this frame:

github.com/openshift/origin/vendor/github.com/vishvananda/netlink.parseGretapData(0x9c242e0, 0xc000b905a0, 0xc0008f4c00, 0x10, 0x10)
        /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/vishvananda/netlink/link_linux.go:1741 +0x34b


LinkDeserialize() passes a slice of netlink route attributes to parseGretapData(), which switches on each attribute's type:

   1709    func parseGretapData(link Link, data []syscall.NetlinkRouteAttr) {
   1710            gre := link.(*Gretap)
   1711            for _, datum := range data {
   1712                    switch datum.Attr.Type {
     :
   1740                    case nl.IFLA_GRE_COLLECT_METADATA:
   1741                            gre.FlowBased = int8(datum.Value[0]) != 0

datum.Value[0] is out of range because data[15], the IFLA_GRE_COLLECT_METADATA attribute, has an empty Value:
$15 = {
  Attr = {
    Len = 4,
    Type = 18 = IFLA_GRE_COLLECT_METADATA
  },
  Value = {
    array = 0xc00157db18 "\020",
    len = 0, <------------------ !!!
    cap = 1256
  }
}


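So indexing Value[0] panics: Attr.Len = 4 is only the size of the netlink attribute header, which means this attribute carries no payload at all.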
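To make the failure concrete, here is a minimal, self-contained Go sketch (not the code from the upstream commit or from openshift/sdn pull 49). The kernel encodes IFLA_GRE_COLLECT_METADATA as a flag attribute: its presence means collect_metadata is on, and it never has a payload. The hypothetical rtAttr type stands in for syscall.NetlinkRouteAttr, parseAttrs walks raw attribute bytes assuming little-endian host byte order (as on x86_64), and the hypothetical flowBased helper checks only for the attribute's presence, so Value is never indexed.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// iflaGreCollectMetadata is IFLA_GRE_COLLECT_METADATA (18), matching the
// Type shown in the Delve output.
const iflaGreCollectMetadata = 18

// rtAttr is a hypothetical stand-in for syscall.NetlinkRouteAttr: the
// attribute header fields plus the payload that follows the header.
type rtAttr struct {
	Len   uint16
	Type  uint16
	Value []byte
}

// parseAttrs walks a buffer of netlink attributes. Each attribute starts with
// a 4-byte header (2-byte length, 2-byte type); Len includes the header, and
// attributes are padded to 4-byte alignment. Assumes a little-endian host.
func parseAttrs(b []byte) ([]rtAttr, error) {
	var attrs []rtAttr
	for len(b) >= 4 {
		l := binary.LittleEndian.Uint16(b[0:2])
		t := binary.LittleEndian.Uint16(b[2:4])
		if l < 4 || int(l) > len(b) {
			return nil, errors.New("malformed attribute length")
		}
		// Payload is everything after the header; empty when Len == 4.
		attrs = append(attrs, rtAttr{Len: l, Type: t, Value: b[4:l]})
		aligned := (int(l) + 3) &^ 3
		if aligned > len(b) {
			break
		}
		b = b[aligned:]
	}
	return attrs, nil
}

// flowBased treats IFLA_GRE_COLLECT_METADATA as a flag: presence alone means
// true, so a zero-length payload is handled without indexing Value.
func flowBased(attrs []rtAttr) bool {
	for _, a := range attrs {
		if a.Type == iflaGreCollectMetadata {
			return true
		}
	}
	return false
}

func main() {
	// A single flag attribute as the kernel sends it: Len=4, Type=18, no payload.
	buf := []byte{0x04, 0x00, 0x12, 0x00}
	attrs, err := parseAttrs(buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(attrs), len(attrs[0].Value), flowBased(attrs))
}
```

Running it prints `1 0 true`: one attribute, an empty payload, and collect_metadata reported as enabled without ever touching Value[0].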

Comment 5 Ricardo Carrillo Cruz 2019-10-07 15:08:29 UTC
I was hoping to get https://github.com/openshift/sdn/pull/33 merged and then backport from there, but
bumping netlink causes an issue with libnetwork: basically, the signature of the Receive methods
has changed.

Therefore, I will just backport the netlink fix onto the vendored netlink copy in 4.1, then 3.11.

I will update when done; hoping to have it ready tomorrow.

Comment 7 Weibin Liang 2019-10-09 20:31:48 UTC
@Ricardo, does QE need to wait for https://github.com/openshift/sdn/pull/33 to be merged to verify this bug? Thanks!

Comment 8 zhaozhanqi 2019-10-10 02:45:57 UTC
Assigned this bug according to comment 7

Comment 9 Ricardo Carrillo Cruz 2019-10-10 07:52:32 UTC
@Weibin nope, I closed that PR because bumping netlink from upstream causes an incompatibility
with libnetwork. Instead, I created a carry patch in https://github.com/openshift/sdn/pull/49, which just merged.

To test this, follow the steps described in the bug: attach a GRE port to br0, restart the SDN pod, and confirm that it does not crash.

Thanks

Comment 12 Ricardo Carrillo Cruz 2019-11-08 09:59:14 UTC
*** Bug 1633672 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2020-01-23 11:05:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

