Bug 1461450

Summary: Corosync hangs on secauth with FIPS enabled
Product: Red Hat Enterprise Linux 7
Reporter: Radek Steiger <rsteiger>
Component: corosync
Assignee: Jan Friesse <jfriesse>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: urgent
Priority: urgent
Version: 7.4
CC: ccaulfie, cfeist, cluster-maint, jruemker, mnovacek, nbarcet
Target Milestone: rc
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Fixed In Version: corosync-2.4.0-10.el7
Doc Type: Bug Fix
Doc Text:
Previously, when the corosync service had encryption enabled and was running in an environment with FIPS kernel mode activated, corosync terminated unexpectedly after starting. A patch has been applied to load a symmetric key that works when FIPS kernel mode is activated, and the described problem no longer occurs.
Clones: 1484264
Last Closed: 2018-04-10 16:52:19 UTC
Type: Bug
Bug Blocks: 1484264
Attachments:
  Propagate error from totemcrypto layer to upper layers (flags: none)
  totemcrypto: Refactor symmetric key importing (flags: none)
  totemcrypto: Use different method to import key (flags: none)
  Fix compiler warnings (flags: none)

Description Radek Steiger 2017-06-14 13:14:42 UTC
> Description of problem:

With FIPS kernel mode on, the corosync process hangs at 100% CPU usage when secauth encryption is enabled.

strace shows thousands of these calls per second:

[pid 10471] write(7, "\17\0\0\0", 4)    = -1 EAGAIN (Resource temporarily unavailable)
[pid 10471] write(7, "\17\0\0\0", 4)    = -1 EAGAIN (Resource temporarily unavailable)
[pid 10471] write(7, "\17\0\0\0", 4)    = -1 EAGAIN (Resource temporarily unavailable)
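A wall of identical EAGAIN results like this is typically what a retry loop on a non-blocking descriptor looks like once nothing drains the other end: each write(2) fails immediately, and nothing in the loop sleeps. The report doesn't say which corosync code path issues these writes, so the sketch below only reproduces that mechanism with a plain POSIX pipe; `demo_full_pipe()` is a hypothetical helper, and treating fd 7 as a pipe is an assumption. (strace prints non-printable bytes as octal escapes, so "\17\0\0\0" is the byte 0x0f followed by three zero bytes, i.e. the 32-bit value 15 on little-endian x86_64.)

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: fill a non-blocking pipe that nobody reads and
 * report how many 4-byte writes succeeded and which errno the first
 * failing write returned. Returns 0 on success, -1 on setup failure. */
static int demo_full_pipe(long *writes_ok, int *fail_errno)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* Make the write end non-blocking, as an event loop would. */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Same payload as in the strace output: "\17\0\0\0". */
    const unsigned char msg[4] = { 017, 0, 0, 0 };
    long n = 0;
    ssize_t rc;

    /* The read end is never drained, so the pipe buffer eventually fills.
     * From then on every write() returns -1 with EAGAIN immediately; a
     * loop that just retries would spin exactly like the strace above. */
    while ((rc = write(fds[1], msg, sizeof msg)) == (ssize_t)sizeof msg)
        n++;

    *writes_ok = n;
    *fail_errno = (rc < 0) ? errno : 0;  /* captured before close() */

    close(fds[0]);
    close(fds[1]);
    return 0;
}
```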

From corosync.log:

[32738] host-033.virt.lab.msp.redhat.com corosyncnotice  [MAIN  ] Corosync Cluster Engine ('2.4.0'): started and ready to provide service.
[32738] host-033.virt.lab.msp.redhat.com corosyncinfo    [MAIN  ] Corosync built-in features: dbus systemd xmlconf qdevices qnetd snmp pie relro bindnow
[32738] host-033.virt.lab.msp.redhat.com corosyncnotice  [TOTEM ] Initializing transport (UDP/IP Unicast).
[32738] host-033.virt.lab.msp.redhat.com corosyncnotice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
[32738] host-033.virt.lab.msp.redhat.com corosyncalert   [TOTEM ] Failure to import key into NSS (err -8190)
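NSS reports libsec errors as offsets from SEC_ERROR_BASE, which secerr.h defines as (-0x2000), i.e. -8192. So "err -8190" is SEC_ERROR_BASE + 2, which in the SECErrorCodes enum is SEC_ERROR_BAD_DATA. The small table below is a hand-copied excerpt for illustration only (`sec_error_name()` is a hypothetical helper); at runtime, NSPR's PR_ErrorToName() is the authoritative way to get the name.

```c
#include <stddef.h>

/* Same value as in NSS secerr.h: #define SEC_ERROR_BASE (-0x2000) */
#define SEC_ERROR_BASE (-0x2000)

/* Excerpt of the SECErrorCodes enum, indexed by (code - SEC_ERROR_BASE). */
static const char *const sec_error_names[] = {
    "SEC_ERROR_IO",              /* -8192 */
    "SEC_ERROR_LIBRARY_FAILURE", /* -8191 */
    "SEC_ERROR_BAD_DATA",        /* -8190 */
    "SEC_ERROR_OUTPUT_LEN",      /* -8189 */
    "SEC_ERROR_INPUT_LEN",       /* -8188 */
    "SEC_ERROR_INVALID_ARGS",    /* -8187 */
};

/* Map an NSS libsec error code to its name, or NULL if the code falls
 * outside this excerpt. */
static const char *sec_error_name(int err)
{
    long offset = (long)err - SEC_ERROR_BASE;
    size_t count = sizeof sec_error_names / sizeof sec_error_names[0];
    if (offset < 0 || (size_t)offset >= count)
        return NULL;
    return sec_error_names[offset];
}
```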


> Version-Release number of selected component (if applicable):

corosync-2.4.0-9.el7.x86_64


> How reproducible:

Always


> Steps to Reproduce:

1. Enable FIPS (see bug 1334806 comment 2)
2. Set up a cluster with pcs using encryption=1
    (or, alternatively, create an authkey and enable secauth in corosync.conf)
3. Start the cluster
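For step 1, one way to confirm that the kernel really is in FIPS mode before starting the cluster is to read /proc/sys/crypto/fips_enabled, which the kernel sets to 1 when booted with fips=1. A minimal C check, assuming that file's usual single-integer format; `parse_fips_flag()` and `fips_enabled()` are hypothetical helper names:

```c
#include <stdio.h>

/* Parse the contents of /proc/sys/crypto/fips_enabled.
 * Returns 1 if FIPS mode is on, 0 if off, -1 if the data is unreadable. */
static int parse_fips_flag(FILE *f)
{
    int value;
    if (fscanf(f, "%d", &value) != 1)
        return -1;
    return value != 0;
}

/* Returns 1 or 0 for FIPS on or off, or -1 if the file is missing or
 * cannot be parsed. */
static int fips_enabled(void)
{
    FILE *f = fopen("/proc/sys/crypto/fips_enabled", "r");
    if (f == NULL)
        return -1;
    int rc = parse_fips_flag(f);
    fclose(f);
    return rc;
}
```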


> Actual results:

Cluster not running, 100% CPU load on corosync process.


> Expected results:

Cluster running normally.

Comment 2 Jan Friesse 2017-06-14 15:05:22 UTC
Created attachment 1287687
Propagate error from totemcrypto layer to upper layers

Comment 3 Jan Friesse 2017-06-14 15:08:45 UTC
Nice catch. Fixing the 100% CPU load/coredump is easy (see the proposed patch). Making corosync work in a FIPS environment with encryption enabled seems to be much harder: it looks like FIPS really doesn't support importing symmetric keys.
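The "easy" half, propagating the error upward, can be sketched as follows. All function names are hypothetical stand-ins for corosync's crypto, totem and top-level layers, not its real API; the only point is that each layer checks the one below and returns an error instead of continuing, so a key-import failure ends startup with a non-zero status rather than leaving a half-initialized process spinning.

```c
#include <stdio.h>

/* Hypothetical crypto layer: 0 on success, -1 if the key import failed. */
static int crypto_layer_init(int key_import_ok)
{
    if (!key_import_ok) {
        fprintf(stderr, "Failure to import key into NSS\n");
        return -1;  /* report the failure upward instead of continuing */
    }
    return 0;
}

/* Hypothetical transport layer: refuses to start without working crypto. */
static int totem_layer_init(int key_import_ok)
{
    if (crypto_layer_init(key_import_ok) != 0)
        return -1;
    /* ... sockets and the event loop would be set up only after this ... */
    return 0;
}

/* Hypothetical top level: returns the process exit status. */
static int service_start(int key_import_ok)
{
    if (totem_layer_init(key_import_ok) != 0)
        return 1;  /* exit with an error instead of running half-initialized */
    return 0;
}
```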

Comment 4 John Ruemker 2017-06-19 15:17:19 UTC
I'll work on documenting that FIPS mode is broken with current/prior releases.

I guess it makes sense to treat this as a regular bug (for knowledgebase purposes at least) rather than as a support policy/limitation. Treating it as a bug would leave us room to request backports to EUS streams if any customers hit this, whereas stating something like "Red Hat supports FIPS mode for corosync starting with corosync-v-r.el7" might discourage customers or support engineers from looking for solutions for past releases.

Comment 5 Jan Friesse 2017-06-23 09:48:51 UTC
Created attachment 1290952
totemcrypto: Refactor symmetric key importing

Signed-off-by: Jan Friesse <jfriesse>
Reviewed-by: Fabio M. Di Nitto <fdinitto>
Reviewed-by: Christine Caulfield <ccaulfie>

Comment 6 Jan Friesse 2017-06-23 09:48:57 UTC
Created attachment 1290953
totemcrypto: Use different method to import key

PK11_ImportSymKey doesn't work when FIPS is enabled, because NSS
targets FIPS Level 2, where loading an unencrypted symmetric key is
prohibited.

FIPS Level 2 is hard to achieve without breaking compatibility, so this
patch implements a "workaround" that makes NSS behave like FIPS Level 1
(where loading an unencrypted symmetric key is allowed).

The workaround uses a temporary key to encrypt the corosync authkey in
memory and then unwraps it into a valid NSS key.

Signed-off-by: Jan Friesse <jfriesse>
Reviewed-by: Fabio M. Di Nitto <fdinitto>
Reviewed-by: Christine Caulfield <ccaulfie>
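The wrap-and-unwrap idea in this commit message can be modeled without NSS. In the sketch below, `struct toy_token` and all `token_*` functions are hypothetical: the token stands in for an NSS slot in FIPS mode that rejects raw key bytes (the PK11_ImportSymKey analogue) but accepts a key that arrives encrypted under a wrapping key the token generated itself. XOR stands in for the real wrap mechanism and is not cryptography. In NSS the three steps would presumably map to PK11_KeyGen, an encryption via PK11_CipherOp, and PK11_UnwrapSymKey; the attached patch is the reference for what corosync actually calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_LEN 32  /* e.g. an aes256 key, as in the log above */

/* Hypothetical model of an NSS slot in FIPS mode. */
struct toy_token {
    int fips_mode;              /* 1: refuse plaintext key import */
    int has_wrap_key;
    int has_slot_key;
    uint8_t wrap_key[KEY_LEN];  /* temporary key generated inside the token */
    uint8_t slot_key[KEY_LEN];  /* the imported, usable key */
};

/* Toy wrap mechanism: XOR with the wrap key; applying it twice restores
 * the input. NOT encryption; it only makes the data flow visible. */
static void toy_cipher(const uint8_t *key, const uint8_t *in, uint8_t *out)
{
    for (size_t i = 0; i < KEY_LEN; i++)
        out[i] = in[i] ^ key[i];
}

/* Analogue of PK11_ImportSymKey: raw key bytes, rejected in FIPS mode. */
static int token_import_plain(struct toy_token *t, const uint8_t *key)
{
    if (t->fips_mode)
        return -1;
    memcpy(t->slot_key, key, KEY_LEN);
    t->has_slot_key = 1;
    return 0;
}

/* Step 1: generate a temporary wrapping key inside the token.
 * A simple LCG fills it here; a real token would use its DRBG. */
static void token_gen_wrap_key(struct toy_token *t, uint32_t seed)
{
    for (size_t i = 0; i < KEY_LEN; i++) {
        seed = seed * 1103515245u + 12345u;
        t->wrap_key[i] = (uint8_t)(seed >> 16);
    }
    t->has_wrap_key = 1;
}

/* Step 2: have the token encrypt the plaintext authkey with the wrap key. */
static int token_wrap(const struct toy_token *t, const uint8_t *plain,
                      uint8_t *wrapped)
{
    if (!t->has_wrap_key)
        return -1;
    toy_cipher(t->wrap_key, plain, wrapped);
    return 0;
}

/* Step 3: analogue of PK11_UnwrapSymKey. Allowed in FIPS mode because the
 * key enters the token encrypted under a key the token itself created. */
static int token_unwrap(struct toy_token *t, const uint8_t *wrapped)
{
    if (!t->has_wrap_key)
        return -1;
    toy_cipher(t->wrap_key, wrapped, t->slot_key);
    t->has_slot_key = 1;
    return 0;
}

/* Import the authkey via wrap + unwrap instead of a plaintext import. */
static int import_authkey(struct toy_token *t, const uint8_t *authkey)
{
    uint8_t wrapped[KEY_LEN];
    token_gen_wrap_key(t, 0x2a2a2a2au);
    if (token_wrap(t, authkey, wrapped) != 0)
        return -1;
    return token_unwrap(t, wrapped);
}
```

In this model the policy check only looks at whether key material crosses the import boundary encrypted; the plaintext authkey still sits in process memory, which fits the commit message's description of the result as Level 1 rather than Level 2 behavior.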

Comment 7 Jan Friesse 2017-06-23 09:53:31 UTC
"Unit test" is https://github.com/corosync/corosync/pull/224

I've also tested qnetd + qdevice-net behavior with FIPS enabled, and everything was working as expected.

Comment 8 Jan Friesse 2017-08-01 12:48:07 UTC
Created attachment 1307553
Fix compiler warnings

Comment 13 Jan Friesse 2017-08-22 14:06:29 UTC
@John:
copy/pasting Tomáš Mráz's response from when we were trying to find out RHEL's support status for FIPS:

> Tomas, if we want to be FIPS 140-2 Level 2 certified, is there any way
> for us to have a shared key between all nodes in a cluster that survives
> reboots (without the user entering a passphrase)?
>
> Do you know if we (RHEL HA) need to be FIPS 140-2 Level 2 certified
> (or who would be able to answer that question within Red Hat)?

No, FIPS 140-2 Level 2 validation is unnecessary and basically you
cannot achieve it currently. We do not do FIPS validation at Level 2 at
all.


So we should be clear about this and not announce Level 2.

Comment 17 michal novacek 2017-12-13 15:06:23 UTC
I have verified that our minimal regression tests pass with 'fips=1' enabled and corosync-2.4.3-1.

Comment 20 errata-xmlrpc 2018-04-10 16:52:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0920