Description of problem:
Customer created a gre tunnel in br0 to monitor some stuff and this causes the sdn pod to crash during the bootstrap.

Version-Release number of selected component (if applicable):
3.11.140-1

How reproducible:
Always

Steps to Reproduce:
1. Create a gretap link either in the br0 or outside of it:

    ovs-vsctl add-port br0 gre0 \
      -- set interface gre0 type=gre options:remote_ip=10.47.235.49 \
      -- --id=@p get port gre0 \
      -- --id=@m create mirror name=pvxmirror select-all=true output-port=@p \
      -- set bridge br0 mirror=@m
    echo "gre0 created"

2. Restart the sdn pod
3. Verify there is a crash in the logs

Actual results:
SDN panics

Expected results:
SDN doesn't panic

Additional info:
github.com/openshift/sdn should also be affected as it ships the same library version

There is a fix upstream to the issue however I believe this fix is wrong
Fix upstream: https://github.com/vishvananda/netlink/commit/12728257a952cdfbc09cd4ec9fc55e97c8b2cf02

Logs and Stack trace:

# /bin/openshift start network --config=/etc/origin/node/node-config.yaml --kubeconfig=/etc/origin/node/node.kubeconfig --loglevel=2
I0910 10:05:52.781813 123512 start_network.go:193] Reading node configuration from /etc/origin/node/node-config.yaml
I0910 10:05:52.786286 123512 start_network.go:200] Starting node networking node-0.dirtyharry311.lab.pnq2.cee.redhat.com (v3.11.141-1+be05365)
W0910 10:05:52.786499 123512 server.go:195] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
I0910 10:05:52.786558 123512 feature_gate.go:230] feature gates: &{map[]}
I0910 10:05:52.799201 123512 transport.go:160] Refreshing client certificate from store
I0910 10:05:52.799285 123512 certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
I0910 10:05:52.800625 123512 node.go:147] Initializing SDN node of type "redhat/openshift-ovs-networkpolicy" with configured hostname "node-0.dirtyharry311.lab.pnq2.cee.redhat.com" (IP ""), iptables sync period "30s"
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/openshift/origin/vendor/github.com/vishvananda/netlink.parseGretapData(0x9c242e0, 0xc000b905a0, 0xc0008f4c00, 0x10, 0x10)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/vishvananda/netlink/link_linux.go:1741 +0x34b
github.com/openshift/origin/vendor/github.com/vishvananda/netlink.LinkDeserialize(0x0, 0xc00084d888, 0x580, 0x778, 0x9, 0xc000932400, 0x8, 0x10)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/vishvananda/netlink/link_linux.go:1211 +0xfa1
github.com/openshift/origin/vendor/github.com/vishvananda/netlink.(*Handle).LinkList(0xdb13b00, 0xa, 0x10, 0xc000219b01, 0xc00161c520, 0x53ca73)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/vishvananda/netlink/link_linux.go:1311 +0x1d9
github.com/openshift/origin/vendor/github.com/vishvananda/netlink.LinkList(...)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/vishvananda/netlink/link_linux.go:1291
github.com/openshift/origin/pkg/network/node.GetLinkDetails(0xc000219bb0, 0xc, 0xc000219bb0, 0xc, 0x0, 0x0, 0x30)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/network/node/node.go:239 +0x43
github.com/openshift/origin/pkg/network/node.(*OsdnNodeConfig).setNodeIP(0xc0006f4e10, 0x28, 0x9bd6ee0)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/network/node/node.go:224 +0x66
github.com/openshift/origin/pkg/network/node.New(0xc0006f4e10, 0xc0006f4e10, 0x5608cb2, 0xd)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/network/node/node.go:157 +0x3f3
github.com/openshift/origin/pkg/cmd/server/kubernetes/network.NewSDNInterfaces(0x0, 0x0, 0x0, 0x0, 0xc0003b8d20, 0x2c, 0x0, 0x0, 0xc00036ffc0, 0xd, ...)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/kubernetes/network/sdn_linux.go:55 +0x5c6
github.com/openshift/origin/pkg/cmd/server/kubernetes/network.New(0x0, 0x0, 0x0, 0x0, 0xc0003b8d20, 0x2c, 0x0, 0x0, 0xc00036ffc0, 0xd, ...)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/kubernetes/network/network_config.go:123 +0x10ac
github.com/openshift/origin/pkg/cmd/server/start.StartNetwork(0x0, 0x0, 0x0, 0x0, 0xc0003b8d20, 0x2c, 0x0, 0x0, 0xc00036ffc0, 0xd, ...)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_network.go:211 +0x4be
github.com/openshift/origin/pkg/cmd/server/start.NetworkOptions.RunNetwork(0xc001200900, 0x0, 0x7ffc62b71826, 0x21, 0x9bd6ea0, 0xc000010018, 0xc0000f2180, 0x0, 0x0)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_network.go:182 +0x6b9
github.com/openshift/origin/pkg/cmd/server/start.NetworkOptions.StartNetwork(0xc001200900, 0x0, 0x7ffc62b71826, 0x21, 0x9bd6ea0, 0xc000010018, 0xc0000f2180, 0x0, 0x0)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_network.go:128 +0x50
github.com/openshift/origin/pkg/cmd/server/start.(*NetworkOptions).Run(0xc0016fbd70, 0xc0017de500, 0x9bd6ea0, 0xc000010020, 0xc001472ba0, 0x0, 0x3, 0xc0000f2180)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_network.go:96 +0x153
github.com/openshift/origin/pkg/cmd/server/start.NewCommandStartNetwork.func1.2(0xc000000008, 0x58fded0)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_network.go:68 +0x69
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/interrupt.(*Handler).Run(0xc001472c00, 0xc00161dc18, 0x0, 0x0)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/interrupt/interrupt.go:103 +0xff
github.com/openshift/origin/pkg/cmd/server/start.NewCommandStartNetwork.func1(0xc0017de500, 0xc001472ba0, 0x0, 0x3)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_network.go:67 +0x1ae
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).execute(0xc0017de500, 0xc001472b10, 0x3, 0x3, 0xc0017de500, 0xc001472b10)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:760 +0x2ae
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc0017c3680, 0x9, 0xc0017c3680, 0x9)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:846 +0x2ec
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).Execute(...)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:794
main.main()
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/cmd/openshift/openshift.go:41 +0x2c2

(note: There are references to /home/jdesousa because I had to recompile it with some gcflags so that delve would work)

What is currently happening is when we get to this point:

github.com/openshift/origin/vendor/github.com/vishvananda/netlink.parseGretapData(0x9c242e0, 0xc000b905a0, 0xc0008f4c00, 0x10, 0x10)
    /home/jdesousa/go/src/github.com/openshift/ose/_output/local/go/src/github.com/openshift/origin/vendor/github.com/vishvananda/netlink/link_linux.go:1741 +0x34b

The LinkDeserialize() function passes an array of attributes to parseGretapData().

    1709 func parseGretapData(link Link, data []syscall.NetlinkRouteAttr) {
    1710     gre := link.(*Gretap)
    1711     for _, datum := range data {
    1712         switch datum.Attr.Type {
       :
    1740         case nl.IFLA_GRE_COLLECT_METADATA:
    1741             gre.FlowBased = int8(datum.Value[0]) != 0

datum.Value[0] is out of bounds because data[15] is:

    $15 = {
      Attr = {
        Len = 4,
        Type = 18 = IFLA_GRE_COLLECT_METADATA
      },
      Value = {
        array = 0xc00157db18 "\020",
        len = 0,  <------------------ !!!
        cap = 1256
      }
    }

So it's out of bounds.
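To illustrate the failure mode in isolation: in the debugger output, Len = 4 covers only the 4-byte netlink attribute header, so the attribute carries no payload and Value is empty. Below is a minimal Go sketch of a length-guarded parse. It is not the vendored netlink code and not the upstream fix: routeAttr, iflaGreCollectMetadata and parseFlowBased are hypothetical names, and treating a payload-less attribute as "flag set" is an assumption about how the kernel reports this attribute.

```go
package main

import "fmt"

// routeAttr is a hypothetical stand-in for syscall.NetlinkRouteAttr,
// reduced to the two fields that matter for this illustration.
type routeAttr struct {
	Type  uint16
	Value []byte
}

// iflaGreCollectMetadata mirrors the attribute type 18 shown in the
// debugger output; the constant name here is illustrative only.
const iflaGreCollectMetadata = 18

// parseFlowBased reports whether the collect-metadata attribute is set,
// without ever indexing into an empty payload.
func parseFlowBased(data []routeAttr) bool {
	for _, datum := range data {
		if datum.Type != iflaGreCollectMetadata {
			continue
		}
		// Assumption: a header-only attribute (Len = 4, empty Value)
		// is a flag whose presence alone means "enabled".
		if len(datum.Value) == 0 {
			return true
		}
		// If a payload is present, keep the original interpretation.
		return datum.Value[0] != 0
	}
	return false
}

func main() {
	// Same shape as data[15] in the dump: len = 0, cap = 1256.
	attrs := []routeAttr{{Type: iflaGreCollectMetadata, Value: make([]byte, 0, 1256)}}
	fmt.Println(parseFlowBased(attrs))
}
```

The point of the sketch is only that the length of Value has to be checked before Value[0] is read; whether an empty payload should map to true or false is a design decision for the actual fix.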
I was hoping to get https://github.com/openshift/sdn/pull/33 merged and then start moving back, but bumping netlink causes an issue with libnetwork: basically, the signature of the Receive methods has changed. Therefore, I will just backport the netlink fix onto the vendored netlink copy on 4.1, then 3.11. Will update when done; hoping to have it by tomorrow.
@Ricardo, does QE need to wait for https://github.com/openshift/sdn/pull/33 to be merged to verify this bug? Thanks!
Assigned this bug according to comment 7
@Weibin nope, I closed that PR because bumping netlink from upstream causes an incompatibility with libnetwork, so I created a carry patch in https://github.com/openshift/sdn/pull/49 , which has just merged. To test this, follow the steps described in the bug: attach a gre port to br0, restart sdn, and check that it doesn't crash. Thanks
*** Bug 1633672 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062