Bug 1568167

Summary: crypto aesni-intel aes(gcm) is broken for IPsec
Product: Red Hat Enterprise Linux 7
Reporter: Tuomo Soini <tis>
Component: kernel
kernel sub component: IPSec
Assignee: Bruno Meneguele <brdeoliv>
QA Contact: xmu
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
CC: dhoward, johnny, network-qe, pwouters, rhel, salmy, sbroz, sdubroca, tis
Version: 7.5
Keywords: ZStream
Target Milestone: rc
Hardware: All
OS: Linux
Fixed In Version: kernel-3.10.0-875.el7
Clones: 1570537
Last Closed: 2018-10-30 09:05:48 UTC
Type: Bug
Bug Blocks: 1570537
Attachments:
  Fix for the issue (flags: none)
  Config to test the issue. (flags: none)

Description Tuomo Soini 2018-04-16 22:01:32 UTC
Created attachment 1422750
Fix for the issue

kernel-3.10.0-862.el7 doesn't work with Libreswan IPsec when the connection uses ikev2=insist with no esp= option set and the kernel supports AES-NI.

Symptoms: a normal ping over IPsec works. If you increase the ping size, e.g. ping -s 1400, packet flow stops and does not recover.


/proc/net/xfrm_stat shows the following error:

XfrmInStateProtoError           397
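As a diagnostic aid that is not part of the original report, the Python sketch below parses the "Name value" lines of /proc/net/xfrm_stat and compares two snapshots, so you can see which counters (such as XfrmInStateProtoError) grow while a large ping runs over the tunnel. The function names are hypothetical helpers, and reading the live file assumes a kernel built with CONFIG_XFRM_STATISTICS, without which the file does not exist.

```python
# Hypothetical helpers (not from the bug report) for watching xfrm error counters.
from pathlib import Path


def parse_xfrm_stat(text: str) -> dict[str, int]:
    """Turn the 'CounterName   value' lines of xfrm_stat into a dict."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Each well-formed line is exactly a counter name followed by an integer.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def increased(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return only the counters that grew between two snapshots, with their delta."""
    return {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if value > before.get(name, 0)
    }


def read_xfrm_stat(path: str = "/proc/net/xfrm_stat") -> dict[str, int]:
    """Snapshot the live counters (assumes CONFIG_XFRM_STATISTICS is enabled)."""
    return parse_xfrm_stat(Path(path).read_text())
```

Usage: take before = read_xfrm_stat(), run ping -s 1400 to the peer across the tunnel, take after = read_xfrm_stat(), then print increased(before, after). On an affected kernel, XfrmInStateProtoError would be expected to show up in the result, matching the counter quoted above.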

After diffing kernel-3.10.0-693.21.1.el7 against kernel-3.10.0-862.el7 and checking the changes against the upstream Linux kernel, I found an incorrect change that causes this issue: the ivsize for aes(gcm) was changed from 12 to 16.

This breaks compatibility with previous kernels, the upstream kernel, and every other implementation of aes_gcm.

Note: even two machines both running kernel-3.10.0-862.el7 are affected; they cannot talk to each other using aes_gcm if the aesni-intel kernel module is loaded.

This is a really bad issue and requires an immediate fix: it breaks all IPsec tunnels using IKEv2 with the default ESP options.

As Libreswan upstream, I'm really worried about how this will affect us when CentOS 7 updates to the 7.5 kernel.

Please make sure a kernel update with the fix is available before that.

Comment 3 Tuomo Soini 2018-04-16 22:23:45 UTC
My patch doesn't fix the whole issue. Even with the patch applied I can trigger the same problem, so something less obvious is broken with aesni aes-gcm and IPsec.

Comment 4 Bruno Meneguele 2018-04-16 22:46:47 UTC
(In reply to Tuomo Soini from comment #3)
> My patch doesn't fix the whole issue. Even with the patch applied I can
> trigger the same problem, so something less obvious is broken with aesni
> aes-gcm and IPsec.

Do you mean the patch changing ivsize back to 12? If so, after applying it, did you see the same issue or a different one? And did you reproduce it the same way, with a bigger ICMP packet size?

Comment 5 Bruno Meneguele 2018-04-16 22:55:14 UTC
Btw, could you also share the config options you used to create the IPsec tunnel? That way we can test in an environment as close to yours as possible.

Comment 6 Tuomo Soini 2018-04-17 05:51:01 UTC
Created attachment 1422909
Config to test the issue.

Comment 7 Tuomo Soini 2018-04-17 06:08:41 UTC
We tested with the ivsize patch, and it only looked like it fixed the issue because I mistakenly used too small a ping when testing the patched kernel. So the ivsize change is not the only fix required.

The VM on one end of the tunnel needs to support AES-NI. When you send big packets over the tunnel, traffic in both directions stops working. In our test case, the machine called "fi" has the aes flag in /proc/cpuinfo.

There is no esp= setting because aes_gcm is the default algorithm when ikev2=insist is used.

Any traffic that uses big packets (ping -s 1400, wget over the tunnel, scp over the tunnel, etc.) causes an immediate lockup of the tunnel and a visible error in /proc/net/xfrm_stat.

A ping running in the background also stops when the problem is triggered.

Originally we had a hard time reproducing this because most of our test VMs don't have the AES instruction set available.

The problem is also specific to aes_gcm; for example, setting esp=aes128-sha2_512 works around the issue.

Triggering the problem doesn't require IPv6, but I made the config as close to the original one as possible.

Comment 8 Bruno Meneguele 2018-04-18 17:59:03 UTC
Hi Tuomo,

just for the record: Paul Wouters informed us by email that you tested the patch proposed by Sabrina Dubroca, that it solved the issue, and that the issue wasn't related to the ivsize as you first thought (comment #7). Is that right?

Comment 9 Tuomo Soini 2018-04-18 18:23:54 UTC
Correct, although I did not test with the original ivsize of 16.

We also tested that aes_gcm128 was not affected by the issue.

Comment 10 Bruno Meneguele 2018-04-18 20:37:47 UTC
(In reply to Tuomo Soini from comment #9)
> Correct, although I did not test with the original ivsize of 16.
> 
> We also tested that aes_gcm128 was not affected by the issue.

Right, well, the ivsize should not affect the results in this case.
Thanks!

Comment 13 Bruno Meneguele 2018-04-22 20:36:04 UTC
Patch(es) committed to the kernel repository and an interim kernel build is undergoing testing.

Comment 16 Bruno Meneguele 2018-04-23 13:51:44 UTC
Patch(es) available on kernel-3.10.0-875.el7

Comment 21 errata-xmlrpc 2018-10-30 09:05:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3083