1165821 – pcs CLI/GUI should be capable of setting up a hardened cluster (for not entirely trusted environment)


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	pcs
Sub Component:
Version:	7.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Ivan Devat
QA Contact:	cluster-qe@redhat.com
Docs Contact:	Steven J. Levine
URL:
Whiteboard:
Duplicates (1):	1455919
Depends On:
Blocks:	1455919

Reported:	2014-11-19 18:36 UTC by Jan Pokorný [poki]
Modified:	2017-08-02 07:00 UTC
CC List:	9 users
Fixed In Version:	pcs-0.9.158-4.el7
Doc Type:	Release Note
Doc Text:	pcs now provides the ability to set up a cluster with encrypted corosync communication. The `pcs cluster setup` command now supports a new `--encryption` flag that controls whether corosync encryption is enabled in the cluster. This allows users to set up a cluster with encrypted corosync communication in an environment that is not entirely trusted.
Clone Of:
Clones:	1455919
Environment:
Last Closed:	2017-08-01 18:22:57 UTC
Target Upstream Version:
Embargoed:

Attachments
proposed fix (7.12 KB, patch) 2017-05-23 11:13 UTC, Ivan Devat	no flags
proposed fix - backup and restore keys (6.00 KB, patch) 2017-05-23 11:34 UTC, Tomas Jelinek	no flags
proposed fix (part3) (14.33 KB, patch) 2017-05-31 12:39 UTC, Ivan Devat	no flags
proposed fix 4 (11.62 KB, patch) 2017-06-01 12:56 UTC, Tomas Jelinek	no flags
proposed fix (part5) (15.56 KB, patch) 2017-06-06 10:01 UTC, Ivan Devat	no flags

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1173346	medium	CLOSED	Allow to modify arbitrary corosync.conf value	2021-06-10 10:49:13 UTC
Red Hat Bugzilla	1457314	high	CLOSED	[RFE] Add commands for enabling and disabling cluster hardening in existing clusters	2023-04-28 11:48:03 UTC
Red Hat Product Errata	RHBA-2017:1958	normal	SHIPPED_LIVE	pcs bug fix and enhancement update	2017-08-01 18:09:47 UTC

Internal Links: 1173346 1457314

Description Jan Pokorný [poki] 2014-11-19 18:36:26 UTC

Most notable sign of a hardened cluster is likely the encrypted messaging
(i.e., corosync) layer, which would probably mean /etc/corosync/authkey
setup and distribution and aligning corosync.conf with intention to use
the encryption.

Also, if there would be some persisted cluster-wide flag denoting the cluster
should be purposefully hardened, various properties could be checked during
the run-time (is the channel towards the fence devices supporting encryption
really configured to that effect? etc.).

Comment 3 Jan Pokorný [poki] 2014-11-20 18:07:11 UTC

BTW. the "persisted cluster-wide flag" I mentioned could
actually be implied by non-presence of "secauth: off" in corosync
configuration (and/or encryption not disabled by other means
like "crypto_{hash,type}: none").
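
A minimal sketch of how such an implied flag could be derived from the configuration text. This is not pcs code: the function name is hypothetical, the parsing is a plain line scan that ignores section scoping, and the option names crypto_cipher and crypto_hash are taken from comment 25 rather than from the "crypto_{hash,type}" shorthand above. Treating any one of the three settings as "not hardened" is a simplifying assumption.

```python
# Hypothetical helper: treat the cluster as "meant to be hardened" unless the
# configuration explicitly switches authentication/encryption off.
DISABLING_SETTINGS = {
    ("secauth", "off"),
    ("crypto_cipher", "none"),
    ("crypto_hash", "none"),
}


def hardening_flag_implied(conf_text: str) -> bool:
    """Return False if any line explicitly disables corosync security."""
    for line in conf_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and (key.strip(), value.strip().lower()) in DISABLING_SETTINGS:
            return False
    return True


if __name__ == "__main__":
    print(hardening_flag_implied("totem {\n    version: 2\n}\n"))
    print(hardening_flag_implied("totem {\n    secauth: off\n}\n"))
```

Commented-out directives such as "#secauth: off" are not matched, because the key is compared exactly after stripping whitespace.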

Comment 4 Jan Pokorný [poki] 2014-11-20 19:19:42 UTC

Note wrt. backup/restore logic (at least global, nonlocal):
it should be perfectly fine not to backup /etc/corosync/authkey,
but rather, on restore, check if authentication is needed per
corosync configuration (and that authkey is not part of the tarball)
and generate and distribute that file anew.

Comment 8 Tomas Jelinek 2016-06-23 07:19:48 UTC

See also the github issue https://github.com/ClusterLabs/pcs/issues/98

Comment 13 Tomas Jelinek 2017-05-15 11:52:47 UTC

We are going to implement enabling corosync encryption and distributing corosync authkey. This will be enabled by default and pcs will not offer an option to disable the encryption.

For other hardening related issues please file separate bzs.


(In reply to Jan Pokorný from comment #4)
> Note wrt. backup/restore logic (at least global, nonlocal):
> it should be perfectly fine not to backup /etc/corosync/authkey,
> but rather, on restore, check if authentication is needed per
> corosync configuration (and that authkey is not part of the tarball)
> and generate and distribute that file anew.

When one of the nodes is unreachable during the restore procedure, the user can restore the cluster configuration on that node later using the same tarball. For the node to be able to connect to the cluster, it must have the same key as the rest of the cluster. Therefore we are going to include the authkeys (both corosync and pacemaker) in the tarball.

Comment 14 Ivan Devat 2017-05-23 11:13:48 UTC

Created attachment 1281458 [details]
proposed fix

Comment 15 Tomas Jelinek 2017-05-23 11:34:35 UTC

Created attachment 1281494 [details]
proposed fix - backup and restore keys

Comment 16 Jan Pokorný [poki] 2017-05-26 12:34:50 UTC

re [comment 13]

> We are going to implement enabling corosync encryption and
> distributing corosync authkey. This will be enabled by default
> and pcs will not offer an option to disable the encryption.

This is quite a substantial change (well, a complete twist) in behavior
that needs to be spelled out prominently!

Note I really wasn't asking for that without ability to stick with
the old one.

> For other hardening related issues please file separate bzs.

Note that some of the configuration bits I had in mind will be solved
with a resolution of [bug 1173346] for which I'd like to kindly ask
for a priority elevation.

Comment 17 Tomas Jelinek 2017-05-26 12:58:07 UTC

After fix:

[root@rh73-node1:~]# rpm -q pcs
pcs-0.9.158-2.el7.x86_64

> the authkey is created and distributed automatically

[root@rh73-node1:~]# ls -l /etc/corosync/authkey
ls: cannot access /etc/corosync/authkey: No such file or directory
[root@rh73-node1:~]# pcs cluster setup --name rhel73 rh73-node1 rh73-node2
Destroying cluster on nodes: rh73-node1, rh73-node2...
rh73-node2: Stopping Cluster (pacemaker)...
rh73-node1: Stopping Cluster (pacemaker)...
rh73-node2: Successfully destroyed cluster
rh73-node1: Successfully destroyed cluster

Sending 'corosync authkey', 'pacemaker_remote authkey' to 'rh73-node1', 'rh73-node2'
rh73-node1: successful distribution of the file 'corosync authkey'
rh73-node1: successful distribution of the file 'pacemaker_remote authkey'
rh73-node2: successful distribution of the file 'corosync authkey'
rh73-node2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
rh73-node1: Succeeded
rh73-node2: Succeeded

Synchronizing pcsd certificates on nodes rh73-node1, rh73-node2...
rh73-node1: Success
rh73-node2: Success
Restarting pcsd on the nodes in order to reload the certificates...
rh73-node1: Success
rh73-node2: Success
[root@rh73-node1:~]# ls -l /etc/corosync/authkey
-r--------. 1 root root 256 May 26 14:46 /etc/corosync/authkey

> after cluster start, corosync reports in its log that traffic encryption has been enabled

[26801] rh73-node1 corosyncnotice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1

> the authkey is part of backup / restore procedure

[root@rh73-node1:~]# cat /etc/corosync/authkey
c92c34c422808663f15bfc811faf12f84984bc270fc846f31601431d1c47ae3f9658b4a9ac55194503a039f145f6293e90f414d7ff78e54956f3a140c12fe00c8e5441d9ab73d0aeaf88f5d822098f93c8f591f94e27c16aa8626efa5017461d39e4f9cfddf16648636a465110813760044e4de3ee33f33dfdbfa464ce02cc22[root@rh73-node1:~]# 
[root@rh73-node1:~]# cat /etc/pacemaker/authkey
eca4d1834c7619b535a19432e90c3107fa3e9a82d06a513ff33ae32d701d00d4[root@rh73-node1:~]# 

[root@rh73-node1:~]# pcs config backup cluster.tar.bz2
[root@rh73-node1:~]# tar -tf cluster.tar.bz2 | grep authkey
corosync_authkey
pacemaker_authkey

[root@rh73-node1:~]# pcs cluster destroy --all
rh73-node1: Stopping Cluster (pacemaker)...
rh73-node2: Stopping Cluster (pacemaker)...
rh73-node2: Successfully destroyed cluster
rh73-node1: Successfully destroyed cluster
[root@rh73-node1:~]# cat /etc/corosync/authkey 
cat: /etc/corosync/authkey: No such file or directory
[root@rh73-node1:~]# cat /etc/pacemaker/authkey 
cat: /etc/pacemaker/authkey: No such file or directory

[root@rh73-node1:~]# pcs config restore cluster.tar.bz2
rh73-node1: Succeeded
rh73-node2: Succeeded
[root@rh73-node1:~]# cat /etc/corosync/authkey
c92c34c422808663f15bfc811faf12f84984bc270fc846f31601431d1c47ae3f9658b4a9ac55194503a039f145f6293e90f414d7ff78e54956f3a140c12fe00c8e5441d9ab73d0aeaf88f5d822098f93c8f591f94e27c16aa8626efa5017461d39e4f9cfddf16648636a465110813760044e4de3ee33f33dfdbfa464ce02cc22[root@rh73-node1:~]# 
[root@rh73-node1:~]# cat /etc/pacemaker/authkey
eca4d1834c7619b535a19432e90c3107fa3e9a82d06a513ff33ae32d701d00d4[root@rh73-node1:~]#
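
The "tar -tf cluster.tar.bz2 | grep authkey" check above could also be scripted. The following is only a sketch: the helper name is hypothetical, and the member names corosync_authkey and pacemaker_authkey are assumed from the listing in this comment, not from any documented backup format.

```python
import io
import tarfile

# Member names as they appear in the tar listing in this comment (assumption).
EXPECTED_KEYS = {"corosync_authkey", "pacemaker_authkey"}


def missing_authkeys(fileobj) -> set:
    """Return the expected authkey members that are absent from a backup tarball."""
    # mode "r:*" lets tarfile detect the compression (bz2 here) transparently.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        return EXPECTED_KEYS - set(tar.getnames())


if __name__ == "__main__":
    # Build a small in-memory tarball so the sketch runs without a real backup.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        for name in ("corosync.conf", "corosync_authkey", "pacemaker_authkey"):
            data = b"placeholder"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    print(missing_authkeys(buf))
```

For a real backup the file would be opened in binary mode, e.g. open("cluster.tar.bz2", "rb"), and passed to the helper.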

Comment 19 Jan Friesse 2017-05-26 14:48:00 UTC

Is there any reason to reduce key length/complexity by using just hex instead of full bytes? Corosync reads just the first 128 bytes, so the pcs-generated authkey is then not 1024 bits strong (as expected).

Comment 20 Ivan Devat 2017-05-29 07:34:50 UTC

I did not know that corosync reads just the first 128 bytes from the key. I'm going to fix it in the next build.
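
To illustrate the point from comment 19, here is a rough sketch (not the pcs implementation; all names are hypothetical) comparing a 256-character hex key, like the file shown in comment 17, with 128 raw random bytes. It assumes, as stated in comment 19, that corosync reads only the first 128 bytes of the file.

```python
import os

COROSYNC_KEY_BYTES = 128  # per comment 19: only the first 128 bytes are read


def hex_key(num_chars: int = 256) -> bytes:
    """Hex-encoded key: every byte of the file carries only 4 random bits."""
    return os.urandom(num_chars // 2).hex().encode("ascii")


def raw_key(num_bytes: int = COROSYNC_KEY_BYTES) -> bytes:
    """Raw binary key: every byte of the file carries 8 random bits."""
    return os.urandom(num_bytes)


def effective_bits(key: bytes, bits_per_byte: int) -> int:
    """Random bits contained in the part of the key that is actually read."""
    return min(len(key), COROSYNC_KEY_BYTES) * bits_per_byte


if __name__ == "__main__":
    print(effective_bits(hex_key(), 4))  # 512
    print(effective_bits(raw_key(), 8))  # 1024
```

Under that assumption the hex file yields half the intended strength, which matches the later transcript where the regenerated key is 128 bytes of non-ASCII data.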

Comment 21 Jan Friesse 2017-05-29 08:22:19 UTC

@Ivan: Ok, sounds good, thanks.

Also what exactly is the reason to force encryption without ability to disable it? I mean, to have encryption enabled by default is for sure the good thing, but encryption uses some cpu. And because corosync is single threaded, encryption can negatively influence resulting throughput. This is not a problem for "pacemaker only" clusters, but may become a problem for gfs/clvm.

Also what if somebody wants to turn on the encryption on an "old" (pre 7.4) cluster? They will be forced to "rebuild" the cluster?

What exactly is the problem? It's really about adding/removing one variable in corosync.conf (secauth). Authkey can be distributed even when encryption is disabled.

Comment 22 Tomas Jelinek 2017-05-29 08:55:05 UTC

(In reply to Jan Friesse from comment #21)
> @Ivan: Ok, sounds good, thanks.
> 
> Also what exactly is the reason to force encryption without ability to
> disable it? I mean, to have encryption enabled by default is for sure the
> good thing, but encryption uses some cpu. And because corosync is single
> threaded, encryption can negatively influence resulting throughput. This is
> not a problem for "pacemaker only" clusters, but may become a problem for
> gfs/clvm.

That is because of me. I had an impression, apparently wrong, that we want to have corosync encryption enabled always. We will add an option to disable the encryption to the cluster setup command.

> 
> Also what if somebody wants to turn on the encryption on an "old" (pre 7.4)
> cluster? They will be forced to "rebuild" the cluster?
> 
> What exactly is the problem? It's really about adding/removing one variable
> in corosync.conf (secauth). Authkey can be distributed even when encryption
> is disabled.

We are running out of time for 7.4. So we have been planning to add commands for enabling and disabling the encryption in an existing cluster in 7.5 timeframe.

Comment 23 Tomas Jelinek 2017-05-29 08:57:26 UTC

*** Bug 1455919 has been marked as a duplicate of this bug. ***

Comment 25 Tomas Jelinek 2017-05-31 07:28:43 UTC

Summary of comment 24:

When enabling encryption in an existing cluster, we want to remove all secauth, crypto_cipher and crypto_hash directives from corosync.conf. When these are not set, encryption is enabled by default in corosync. Of course, the authkey needs to be distributed as well.
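
The directive removal described here might look roughly like the sketch below. It is a text-level filter with a hypothetical function name, not the pcs code: it drops matching lines wherever they occur and does not parse corosync.conf sections, and distributing the authkey is outside its scope.

```python
import re

# Directives named in this comment; without them corosync falls back to its
# defaults, which (per this comment) means encryption is enabled.
ENCRYPTION_DIRECTIVES = ("secauth", "crypto_cipher", "crypto_hash")

_DIRECTIVE_RE = re.compile(r"^\s*(%s)\s*:" % "|".join(ENCRYPTION_DIRECTIVES))


def strip_encryption_directives(conf_text: str) -> str:
    """Return the configuration text without secauth/crypto_* lines."""
    kept = [line for line in conf_text.splitlines() if not _DIRECTIVE_RE.match(line)]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = (
        "totem {\n"
        "    version: 2\n"
        "    secauth: off\n"
        "    crypto_cipher: none\n"
        "    crypto_hash: none\n"
        "    cluster_name: devcluster\n"
        "}\n"
    )
    print(strip_encryption_directives(sample), end="")
```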

Comment 26 Ivan Devat 2017-05-31 12:39:10 UTC

Created attachment 1283766 [details]
proposed fix (part3)

Comment 27 Ivan Devat 2017-05-31 12:40:20 UTC

After Fix:

[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.158-3.el7.x86_64

[vm-rhel72-1 ~] $ pcs cluster setup --name=devcluster vm-rhel72-1 vm-rhel72-3 --start --no-hardened
Destroying cluster on nodes: vm-rhel72-1, vm-rhel72-3...
vm-rhel72-1: Stopping Cluster (pacemaker)...
vm-rhel72-3: Stopping Cluster (pacemaker)...
vm-rhel72-3: Successfully destroyed cluster
vm-rhel72-1: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'vm-rhel72-1', 'vm-rhel72-3'
vm-rhel72-3: successful distribution of the file 'pacemaker_remote authkey'
vm-rhel72-1: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
vm-rhel72-1: Succeeded
vm-rhel72-3: Succeeded

Starting cluster on nodes: vm-rhel72-1, vm-rhel72-3...
vm-rhel72-1: Starting Cluster...
vm-rhel72-3: Starting Cluster...

Synchronizing pcsd certificates on nodes vm-rhel72-1, vm-rhel72-3...
vm-rhel72-3: Success
vm-rhel72-1: Success
Restarting pcsd on the nodes in order to reload the certificates...
vm-rhel72-3: Success
vm-rhel72-1: Success

[vm-rhel72-1 ~] $ ls -l /etc/corosync/a*
zsh: no matches found: /etc/corosync/a*

[vm-rhel72-1 ~] $ cat /var/log/cluster/corosync.log|grep security
[14010] vm-rhel72-1 corosyncnotice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none

Comment 28 Tomas Jelinek 2017-05-31 13:59:06 UTC

We want the encryption to be disabled by default in RHEL7.4.

Comment 29 Chris Feist 2017-05-31 14:25:06 UTC

The option --no-hardened sounds a little awkward.

I'd propose --with-encryption and --no-encryption (or something similar).

Thanks,
Chris

Comment 30 Jan Friesse 2017-06-01 05:38:28 UTC

One question/proposal. From the test case (line ls -l /etc/corosync/a*) it looks like the authkey is not distributed when --no-hardened is used. I would recommend always distributing the authkey, because it may make a future enable/disable operation easier (just changing corosync.conf).

Comment 31 Tomas Jelinek 2017-06-01 12:56:49 UTC

Created attachment 1284135 [details]
proposed fix 4

Comment 33 Ivan Devat 2017-06-06 09:54:22 UTC

(In reply to Jan Friesse from comment #30)
> One question/proposal. From testcase (line ls -l /etc/corosync/a*) it looks
> like when no-hardened is enabled, authkey is not distributed. I would
> recommend to always distribute authkey, because it may make future
> enable/disable operation easier (just changing corosync.conf).

Unfortunately, in the future enable/disable operation we cannot assume that the authkey exists, so we will need to distribute it anyway.

Comment 34 Ivan Devat 2017-06-06 10:01:14 UTC

Created attachment 1285324 [details]
proposed fix (part5)

Comment 35 Ivan Devat 2017-06-06 10:58:47 UTC

After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.158-4.el7.x86_64

> 1) implicitly without encryption

[vm-rhel72-1 ~] $ pcs cluster setup --name=devcluster vm-rhel72-1 vm-rhel72-3 --start
Destroying cluster on nodes: vm-rhel72-1, vm-rhel72-3...
vm-rhel72-1: Stopping Cluster (pacemaker)...
vm-rhel72-3: Stopping Cluster (pacemaker)...
vm-rhel72-1: Successfully destroyed cluster
vm-rhel72-3: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'vm-rhel72-1', 'vm-rhel72-3'
vm-rhel72-3: successful distribution of the file 'pacemaker_remote authkey'
vm-rhel72-1: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
vm-rhel72-1: Succeeded
vm-rhel72-3: Succeeded

Starting cluster on nodes: vm-rhel72-1, vm-rhel72-3...
vm-rhel72-1: Starting Cluster...
vm-rhel72-3: Starting Cluster...

Synchronizing pcsd certificates on nodes vm-rhel72-1, vm-rhel72-3...
vm-rhel72-3: Success
vm-rhel72-1: Success
Restarting pcsd on the nodes in order to reload the certificates...
vm-rhel72-1: Success
vm-rhel72-3: Success

[vm-rhel72-1 ~] $ cat /var/log/cluster/corosync.log|grep security
[8322] vm-rhel72-1 corosyncnotice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none

[vm-rhel72-1 ~] $ ls -l /etc/corosync/a*
zsh: no matches found: /etc/corosync/a*

> 2) explicitly with encryption

[vm-rhel72-1 ~] $ pcs cluster setup --name=devcluster vm-rhel72-1 vm-rhel72-3 --start --encryption=1
Destroying cluster on nodes: vm-rhel72-1, vm-rhel72-3...
vm-rhel72-1: Stopping Cluster (pacemaker)...
vm-rhel72-3: Stopping Cluster (pacemaker)...
vm-rhel72-1: Successfully destroyed cluster
vm-rhel72-3: Successfully destroyed cluster

Sending 'corosync authkey', 'pacemaker_remote authkey' to 'vm-rhel72-1', 'vm-rhel72-3'
vm-rhel72-3: successful distribution of the file 'corosync authkey'
vm-rhel72-3: successful distribution of the file 'pacemaker_remote authkey'
vm-rhel72-1: successful distribution of the file 'corosync authkey'
vm-rhel72-1: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
vm-rhel72-1: Succeeded
vm-rhel72-3: Succeeded

Starting cluster on nodes: vm-rhel72-1, vm-rhel72-3...
vm-rhel72-1: Starting Cluster...
vm-rhel72-3: Starting Cluster...

Synchronizing pcsd certificates on nodes vm-rhel72-1, vm-rhel72-3...
vm-rhel72-3: Success
vm-rhel72-1: Success
Restarting pcsd on the nodes in order to reload the certificates...
vm-rhel72-1: Success
vm-rhel72-3: Success

[vm-rhel72-1 ~] $ cat /var/log/cluster/corosync.log|grep security
[15309] vm-rhel72-1 corosyncnotice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1

[vm-rhel72-1 ~] $ ls -l /etc/corosync/a*
-r--------. 1 root root 128  6. čen 12.47 /etc/corosync/authkey

> authkey is not an ASCII text

[vm-rhel72-1 ~] $ file /etc/corosync/authkey
/etc/corosync/authkey: data

Comment 44 errata-xmlrpc 2017-08-01 18:22:57 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1958
