Bug 1272571

Summary: Kernel 4.2.3-200 oopses when I bring up a GRE tunnel inside IPSec [also 4.2.5-201]
Product: [Fedora] Fedora
Reporter: Chris Siebenmann <cks-rhbugzilla>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: unspecified
Version: 22
CC: gansalmon, itamar, jforbes, jhenner, jonathan, kafai, kernel-maint, madhu.chinakonda, mchehab, steved
Hardware: x86_64   
OS: Linux   
Fixed In Version: kernel-4.2.6-301.fc23, kernel-4.2.6-201.fc22
Doc Type: Bug Fix
Last Closed: 2015-11-26 20:55:36 UTC
Type: Bug
Attachments:
	Full serial console OOPS from a Fedora 22 virtual machine (flags: none)
	Fedora 23 kernel OOPS from a virtual machine (flags: none)
	Serial console OOPS from Fedora 22 with kernel 4.2.5-201 (flags: none)

Description Chris Siebenmann 2015-10-16 19:06:40 UTC
Description of problem:

I have a point-to-point GRE tunnel running inside IPSec on my machine.
When I updated to kernel-core.x86_64 4.2.3-200.fc22, my machine
started oopsing as soon as everything came up. Further investigation
shows that it oopses the moment the GRE tunnel is brought up.

My /etc/ipsec.d/hawk-gre.conf file is set up as follows:

	conn hawkgre
		left=128.100.3.58
		leftsourceip=128.100.3.58
		right=66.96.18.208
		leftprotoport=gre
		rightprotoport=gre
		auto=start
		ikev2=insist
		leftupdown="/var/local/ipsec/pluto/updown"
		rightupdown="/var/local/ipsec/pluto/updown"
		authby=rsasig
		[...]

128.100.3.58 is an IP alias on the machine. When the connection comes
up, 'updown' winds up running a script that does:

	modprobe ip_gre
	ip tun add extun mode gre local 128.100.3.58 remote 66.96.18.208 dev em0
	ip link set extun up
	# [... additional actions ...]

In 4.2.3-200.fc22, this oopses the moment that 'ip link set extun up'
runs. I have verified this by disabling the scripts and running these
commands by hand, one by one.
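The report shows only the commands that the updown script ends up running. As a rough illustration, a hook built around them might look like the sketch below. It assumes libreswan reports the event in the `PLUTO_VERB` environment variable (`up-host`/`down-host` for a host-to-host conn such as this one); the function names, the `DRY_RUN` switch, and the teardown branch are hypothetical additions, not the reporter's actual script.

```shell
#!/bin/sh
# Hypothetical sketch of an updown hook for the 'hawkgre' conn.
# Assumes libreswan passes the event in PLUTO_VERB (e.g. up-host, down-host).

# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

gre_updown() {
    case "$PLUTO_VERB" in
        up-host)
            run modprobe ip_gre
            run ip tun add extun mode gre local 128.100.3.58 remote 66.96.18.208 dev em0
            # On 4.2.3-200.fc22 this is the command that triggered the oops.
            run ip link set extun up
            ;;
        down-host)
            # Hypothetical teardown, not described in the report.
            run ip link set extun down
            run ip tun del extun
            ;;
        *)
            # Other verbs (prepare-host, route-host, ...) are ignored here;
            # a real hook would likely hand them to libreswan's stock updown.
            ;;
    esac
}

gre_updown
```

With `DRY_RUN=1` the commands are only printed, which makes it easy to run them one at a time by hand, as the reporter did when isolating the failing step.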

The oops happens in interrupt context and does not get written to disk.
Unfortunately I have been unable to capture very much of it with
netconsole and I do not have an onboard serial port on this PC. The
portion of the panic I've been able to capture is:

[   73.391042] alg: No test for echainiv(authenc(hmac(sha1),cbc(aes))) (echainiv(authenc(hmac(sha1-generic),cbc-aes-aesni)))
[   85.800701] gre: GRE over IPv4 demultiplexor driver
[   85.815811] ip_gre: GRE over IPv4 tunneling driver
[   94.977317] BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
[   94.979623] IP: [<ffffffff817788e7>] _raw_write_lock_bh+0x17/0x30
[   94.981888] PGD 0 
[   94.984094] Oops: 0002 [#1] 
[   94.986313] Modules linked in: ip_gre gre authenc echainiv cmac rmd160 ip_vti ip_tunnel af_key ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm6_tunnel tunnel6 xfrm_ipcomp camellia_generic camellia_aesni_avx_x86_64 camellia_x86_64 cast6_avx_x86_64 cast6_generic cast5_avx_x86_64 cast5_generic cast_common deflate cts gcm ccm serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common xcbc sha256_ssse3 sha512_ssse3 sha512_generic des_generic tpm_rng timeriomem_rng virtio_rng virtio_ring virtio netconsole ppdev parport_pc parport fuse vmw_vsock_vmci_transport

Transcribed by hand, the stack backtrace appears to run:
	__ip6_ins_rt+0x32/0x60
	ip6_ins_rt+0x58/0x70
	__ip6_rt_update_pmtu.part.42+0x100/0x1e0
	ip6_rt_update_pmtu+0x2e/0x40
	ip_tunnel_xmit+0x18d/0xa50 [ip_tunnel]

... and I'm afraid that's about all I can transcribe right now.
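If debug info for the exact kernel build is available (on Fedora, from the matching kernel-debuginfo package), hand-transcribed frames like these can usually be mapped to source lines with gdb's `list *(symbol+offset)` run against the debug vmlinux. Frames tagged with a module, such as `[ip_tunnel]`, need that module's debug object instead. The hypothetical helper below only rewrites `symbol+0xoff/0xlen` frames into those gdb commands; it single-quotes the symbol so that names like `__ip6_rt_update_pmtu.part.42` are read as one symbol.

```shell
# Hypothetical helper: convert "symbol+0xoff/0xlen [module]" backtrace
# frames (read on stdin) into gdb "list" commands.
frame_to_gdb() {
    # read splits on whitespace: the frame goes into $frame and any
    # trailing "[module]" tag into $_mod (ignored here).
    while read -r frame _mod; do
        case $frame in
            *+0x*/0x*) ;;          # looks like symbol+offset/length
            *) continue ;;         # skip anything else
        esac
        sym=${frame%%+*}           # text before the first '+'
        rest=${frame#*+}           # e.g. 0x32/0x60
        off=${rest%%/*}            # e.g. 0x32
        printf "list *('%s'+%s)\n" "$sym" "$off"
    done
}
```

Each output line can then be given to gdb as an `-ex` command, with the debug vmlinux (or module debug object) as the file argument.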

This machine has no IPv6 connectivity and no special IPv6 addresses
configured, but the machine at the other end of the GRE tunnel has IPv6
connectivity and an IPv6 address.

Version-Release number of selected component (if applicable):

kernel-core.x86_64 4.2.3-200.fc22
libreswan-3.13-1.fc22.x86_64

How reproducible:

This is 100% reproducible on this machine.
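The description mentions that netconsole only caught part of the oops. For anyone trying to capture a similar crash that way, the kernel's netconsole documentation gives the module option as `netconsole=[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]`. The sketch below only assembles that string; the helper name and all addresses, the port, and the MAC are placeholders, not values from this report. A UDP listener (netcat or socat, for example) has to be running on the target machine.

```shell
# Hypothetical helper: build the netconsole= option string.
# Argument order follows the documented format:
#   [src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# The source port is left empty so the module default is used.
netconsole_param() {
    # $1=src-ip  $2=dev  $3=tgt-port  $4=tgt-ip  $5=tgt-macaddr
    printf 'netconsole=@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Possible use (placeholder values, run as root):
#   dmesg -n 8    # let all kernel messages reach the consoles
#   modprobe netconsole "$(netconsole_param 192.0.2.10 em0 6666 192.0.2.20 00:11:22:33:44:55)"
```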

Comment 1 Chris Siebenmann 2015-11-02 17:19:29 UTC
I have now captured a full OOPS from a Fedora 22 virtual machine's virtual
serial console. I'll attach it here.

Comment 2 Chris Siebenmann 2015-11-02 17:22:03 UTC
Created attachment 1088662 [details]
Full serial console OOPS from a Fedora 22 virtual machine

This OOPS is slightly different to what I see on my workstation, but it
is triggered in exactly the same way as before.

Comment 3 Chris Siebenmann 2015-11-04 22:48:01 UTC
Created attachment 1089920 [details]
Fedora 23 kernel OOPS from a virtual machine

Unsurprisingly, this issue reproduces on Fedora 23 as well. I've attached
a captured kernel oops from a virtual machine. This oops looks more like
my initial one, so perhaps there are multiple places where this bug can
happen.

Comment 4 Chris Siebenmann 2015-11-05 17:15:32 UTC
Created attachment 1090296 [details]
Serial console OOPS from Fedora 22 with kernel 4.2.5-201

I tried this again with the just-released 4.2.5-201 kernel and it panicked
as before. I've attached the serial console log.

Comment 5 Steve Durbin 2015-11-06 14:26:38 UTC
I'm seeing the same issue here, with a similar dump.

Because I lack a serial console I can't check it, but in my case it is not GRE tunnelling as it is for Chris. By elimination, I found that disabling radvd allowed the machine to boot with the full stack loaded, including two VPN tunnels and an IPv6 tunnel.

Comment 6 Chris Siebenmann 2015-11-06 18:04:55 UTC
I built a stock 4.2.5 kernel from kernel.org (but reusing the Fedora 22
config) and then a stock 4.3 kernel, and this issue is present in both
of them. I'll see if I can bisect this in the upstream kernel source
but this may take some time.

Comment 7 Chris Siebenmann 2015-11-06 18:24:20 UTC
It appears that this bug may already be recognized upstream in the kernel
bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=106611

This report fingers git commit 8d0b94afdca84598912347e61defa846a0988d04.
I have not tested this myself.

Comment 8 Martin KaFai Lau 2015-11-08 09:38:01 UTC
Thanks for the report.  I manage to report now.  I will work on a patch.

This one (GRE over IPSec) should be different from upstream bug #106611, which should have been fixed in 4.3 by:
ebfa45f ipv6: Move common init code for rt6_info to a new function rt6_info_init()
0a1f596 ipv6: Initialize rt6_info properly in ip6_blackhole_route()

Comment 9 Martin KaFai Lau 2015-11-08 09:40:15 UTC
typo: I meant I managed to reproduce it now.

Comment 10 Chris Siebenmann 2015-11-20 15:50:00 UTC
For the record/people following this bug: this issue is not fixed in the
just-released Fedora 22 update kernel 4.2.6-200.fc22. It does seem to
be fixed by the upstream commit 0d3f6d297bfb7af24d0508460fdb3d1ec4903fa3 
that Martin KaFai Lau made.

Comment 11 Justin M. Forbes 2015-11-20 17:10:43 UTC
This patch has been added in git and is queued for the next F22/23 builds.

Comment 12 Fedora Update System 2015-11-25 15:18:14 UTC
kernel-4.2.6-301.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-f26dec73e9

Comment 13 Fedora Update System 2015-11-25 15:19:47 UTC
kernel-4.2.6-201.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-912d8e4998

Comment 14 Fedora Update System 2015-11-26 02:24:57 UTC
kernel-4.2.6-201.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-912d8e4998

Comment 15 Fedora Update System 2015-11-26 02:53:42 UTC
kernel-4.2.6-301.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-f26dec73e9

Comment 16 Fedora Update System 2015-11-26 20:55:09 UTC
kernel-4.2.6-301.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 17 Justin M. Forbes 2015-11-30 12:45:28 UTC
*** Bug 1286471 has been marked as a duplicate of this bug. ***

Comment 18 Fedora Update System 2015-11-30 23:21:37 UTC
kernel-4.2.6-201.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
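To check whether a Fedora kernel already includes the fix, one can compare its `uname -r` string with the Fixed In Version builds, 4.2.6-201.fc22 and 4.2.6-301.fc23 (4.2.6-200.fc22 is noted above as not fixed). The sketch below does this with GNU `sort -V`. The function name is hypothetical, it assumes that any later build for the same Fedora release also carries the patch, and it reports other releases as unknown.

```shell
# Hypothetical check: is a Fedora kernel release at or past the fixed build?
# Assumes later builds on the same Fedora release also carry the patch.
kernel_has_fix() {
    rel=${1:-$(uname -r)}              # e.g. 4.2.6-201.fc22.x86_64
    case $rel in
        *.fc22*) fixed=4.2.6-201 ;;    # first fixed F22 build
        *.fc23*) fixed=4.2.6-301 ;;    # first fixed F23 build
        *) echo unknown; return 2 ;;   # other releases: not covered here
    esac
    verrel=${rel%%.fc*}                # drop ".fc22.x86_64" -> 4.2.6-201
    # GNU sort -V orders version strings; if the fixed build sorts first
    # (or the two are equal), this kernel is at or past it.
    lowest=$(printf '%s\n%s\n' "$fixed" "$verrel" | sort -V | head -n 1)
    if [ "$lowest" = "$fixed" ]; then
        echo yes
    else
        echo no
        return 1
    fi
}
```

Called with no argument it checks the running kernel. A "no" only means the build predates the fixed one; it does not say whether that older kernel was affected in the first place.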

Comment 19 Jaroslav Henner 2015-12-01 16:37:22 UTC
It seems this is not 100% reproducible, but I was able to reproduce it with

#sudo ip tun add server mode gre remote `resolveip -s xxx.eng.bos.redhat.com` local A.B.C.D  ttl 255
#sudo firewall-cmd --direct --add-rule ipv4 filter INPUT 0 -p gre -j ACCEPT
success
#sudo ip a a 172.25.1.2/30 dev server 
#sudo ip l s server up

several times. I have just installed the newer kernel. I will watch it.