Bug 1855303
Summary: | [RFE] Support for reload of crypto configuration | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Jan Friesse <jfriesse> | |
Component: | corosync | Assignee: | Jan Friesse <jfriesse> | |
Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> | |
Severity: | unspecified | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 8.3 | CC: | ccaulfie, cluster-maint, phagara, slevine | |
Target Milestone: | rc | Keywords: | FutureFeature | |
Target Release: | 8.0 | Flags: | pm-rhel: mirror+ | |
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | corosync-3.1.0-1.el8 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1856397 | Environment: | ||
Last Closed: | 2021-05-18 15:26:09 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1855293, 1855301 | |||
Bug Blocks: | 1856397 |
Description
Jan Friesse
2020-07-09 13:54:50 UTC
Just to clarify: a change of crypto parameters is done by changing corosync.conf (or authkey), like any other change in corosync.conf (so no extra call is needed). Once all nodes have the updated corosync.conf (or authkey) in place, just issue `corosync-cfgtool -R` on one (and only one) of the nodes, same as before. Basically, this feature just allows changing more options on the fly.

To be tested together with bz#1855301, which adds config reload support to kronosnet.

@slevine: One super important thing for you to document. It's going to be in the release notes for upstream, but I think we should also mention it in the release notes for 8.4 HA.

When a user updates corosync (the package) so that some of the nodes have an old version of corosync and some of the nodes have a new version of corosync, and the user then changes the crypto key and/or options and triggers a configuration reload, the cluster will split into 2 partitions: one with old corosync and one with new corosync. This really happens only when some nodes run the old (3.0.3) and some the new (3.1.0) version of corosync. So if all nodes run either the new or the old corosync, it works just fine. The same happens when a reload of the configuration is triggered but no change in the crypto key or options was made.

Chrissie wrote a generic and very true warning in the corosync man pages:

Running corosync-cfgtool -R where nodes are running different versions of corosync (including minor versions) is unsupported and may result in undefined behaviour.

So you may consider adding something similar to our docs too.

@jfriesse: Would this fall into the release note category of "Known Issue"? If so, we can set that as the doc value and this will get put on the release note list. In the meantime I have added this to the Release Notes section of my 8.4 doc plan: https://docs.engineering.redhat.com/display/RHELPLAN/Documentation+Plan+for+RHEL+HA+Add-On+for+8.4

@slevine: Not really sure. I mean, it is not really an "Issue" (= nothing we would like to or could fix). It is how it works. But it is definitely a "heads-up" area.

@jfriesse: Should we mention this feature in general in the 8.4 release notes as a new HA feature? If so, we can just be sure we include the warning when we note the feature. My question is whether this is a user-level feature to note.

@slevine: You mean the ability to reload the crypto configuration? Yes, it is definitely a big feature, so mentioning it in the release notes would be very handy. The question of how much it is a user-level feature is a bit tricky. pcs will be responsible for changing corosync.conf, regenerating the authkey and syncing, but this wouldn't be possible without corosync support. So at the end of the day, this feature will allow the user to change a few more options (crypto cipher/hash and key) at runtime.

Hopefully I've answered your question (if not, then please don't hesitate to ask more ;) )

> [root@virt-248 ~]# rpm -q pacemaker libknet1
> pacemaker-2.0.5-2.el8.x86_64
> libknet1-1.18-1.el8.x86_64
> [root@virt-248 ~]# corosync-cfgtool -s
> Printing link status.
> Local node ID 1
> LINK ID 0
> addr = 2620:52:0:25a4:1800:ff:fe00:f8
> status:
> nodeid 1: localhost
> nodeid 2: connected
> nodeid 3: connected
> nodeid 4: connected
> nodeid 5: connected
> [root@virt-248 ~]# corosync-cmapctl totem.crypto_cipher totem.crypto_hash
> totem.crypto_cipher (str) = aes256
> totem.crypto_hash (str) = sha256
> [root@virt-248 ~]# OTHER_NODES="virt-249 virt-250 virt-251 virt-252"

Downgrading from aes256/sha256 to none/none:

> [root@virt-248 ~]# sed -Ei 's/(crypto_hash: ).*/\1none/' /etc/corosync/corosync.conf
> [root@virt-248 ~]# sed -Ei 's/(crypto_cipher: ).*/\1none/' /etc/corosync/corosync.conf
> [root@virt-248 ~]# for node in $OTHER_NODES; do scp /etc/corosync/corosync.conf $node:/etc/corosync/; done
> corosync.conf 100% 733 1.2MB/s 00:00
> corosync.conf 100% 733 1.1MB/s 00:00
> corosync.conf 100% 733 909.3KB/s 00:00
> corosync.conf 100% 733 1.1MB/s 00:00
> [root@virt-248 ~]# corosync-cfgtool -R
> Reloading corosync.conf...
> Done
> [root@virt-248 ~]# tail -f /var/log/cluster/corosync.log
> Nov 09 11:37:23 [50162] virt-248 corosync notice [CFG ] Config reload requested by node 1
> Nov 09 11:37:23 [50162] virt-248 corosync info [TOTEM ] Configuring link 0
> Nov 09 11:37:23 [50162] virt-248 corosync info [TOTEM ] Configured link number 0: local addr: 2620:52:0:25a4:1800:ff:fe00:f8, port=5405
> Nov 09 11:37:23 [50162] virt-248 corosync info [TOTEM ] kronosnet crypto reconfigured on index 2: nss/none/none
> Nov 09 11:37:23 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 11:37:23 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 11:37:23 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 11:37:23 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 11:37:23 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 11:37:23 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 11:37:23 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 11:37:23 [50162] virt-248 corosync info [KNET ] pmtud: PMTUD link change for host: 5 link: 0 from 1381 to 1426
> Nov 09 11:37:23 [50162] virt-248 corosync info [KNET ] pmtud: PMTUD link change for host: 4 link: 0 from 1381 to 1426
> Nov 09 11:37:23 [50162] virt-248 corosync info [KNET ] pmtud: PMTUD link change for host: 3 link: 0 from 1381 to 1426
> Nov 09 11:37:23 [50162] virt-248 corosync info [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 1381 to 1426
> Nov 09 11:37:23 [50162] virt-248 corosync info [KNET ] pmtud: Global data MTU changed to: 1426

Knet crypto successfully disabled, cluster communication resumes, no fencing, all nodes report active connection to all others (`corosync-cfgtool -s`, left out for the sake of brevity) with the desired crypto config (`corosync-cmapctl totem.crypto_cipher totem.crypto_hash`, left out for the sake
of brevity).

Upgrading from none/none to aes128/sha1:

> [root@virt-248 ~]# sed -Ei 's/(crypto_cipher: ).*/\1aes128/' /etc/corosync/corosync.conf
> [root@virt-248 ~]# sed -Ei 's/(crypto_hash: ).*/\1sha1/' /etc/corosync/corosync.conf
> [root@virt-248 ~]# for node in $OTHER_NODES; do scp /etc/corosync/corosync.conf $node:/etc/corosync/; done
> corosync.conf 100% 735 758.2KB/s 00:00
> corosync.conf 100% 735 742.1KB/s 00:00
> corosync.conf 100% 735 703.7KB/s 00:00
> corosync.conf 100% 735 799.9KB/s 00:00
> [root@virt-248 ~]# corosync-cfgtool -R
> Reloading corosync.conf...
> Done
> [root@virt-248 ~]# tail -f /var/log/cluster/corosync.log
> Nov 09 11:41:35 [50162] virt-248 corosync notice [CFG ] Config reload requested by node 1
> Nov 09 11:41:35 [50162] virt-248 corosync info [TOTEM ] Configuring link 0
> Nov 09 11:41:35 [50162] virt-248 corosync info [TOTEM ] Configured link number 0: local addr: 2620:52:0:25a4:1800:ff:fe00:f8, port=5405
> Nov 09 11:41:35 [50162] virt-248 corosync info [TOTEM ] kronosnet crypto reconfigured on index 1: nss/aes128/sha1
> Nov 09 11:41:35 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 11:41:35 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 11:41:36 [50162] virt-248 corosync info [KNET ] pmtud: PMTUD link change for host: 5 link: 0 from 1426 to 1381
> Nov 09 11:41:36 [50162] virt-248 corosync info [KNET ] pmtud: PMTUD link change for host: 4 link: 0 from 1426 to 1381
> Nov 09 11:41:36 [50162] virt-248 corosync info [KNET ] pmtud: PMTUD link change for host: 3 link: 0 from 1426 to 1381
> Nov 09 11:41:36 [50162] virt-248 corosync info [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 1426 to 1381
> Nov 09 11:41:36 [50162] virt-248 corosync info [KNET ] pmtud: Global data MTU changed to: 1381

Knet crypto successfully enabled, cluster communication resumes, no fencing, all nodes report active connection to all others (`corosync-cfgtool -s`, left out for the sake of
brevity) with the desired crypto config (`corosync-cmapctl totem.crypto_cipher totem.crypto_hash`, left out for the sake of brevity).

Upgrading from aes128/sha1 to aes256/sha512:

> [root@virt-248 ~]# sed -Ei 's/(crypto_hash: ).*/\1sha512/' /etc/corosync/corosync.conf
> [root@virt-248 ~]# sed -Ei 's/(crypto_cipher: ).*/\1aes256/' /etc/corosync/corosync.conf
> [root@virt-248 ~]# for node in $OTHER_NODES; do scp /etc/corosync/corosync.conf $node:/etc/corosync/; done
> corosync.conf 100% 737 1.2MB/s 00:00
> corosync.conf 100% 737 1.3MB/s 00:00
> corosync.conf 100% 737 1.3MB/s 00:00
> corosync.conf 100% 737 1.0MB/s 00:00
> [root@virt-248 ~]# corosync-cfgtool -R
> Reloading corosync.conf...
> Done
> [root@virt-248 ~]# tail -f /var/log/cluster/corosync.log
> Nov 09 11:53:19 [50162] virt-248 corosync notice [CFG ] Config reload requested by node 1
> Nov 09 11:53:19 [50162] virt-248 corosync info [TOTEM ] Configuring link 0
> Nov 09 11:53:19 [50162] virt-248 corosync info [TOTEM ] Configured link number 0: local addr: 2620:52:0:25a4:1800:ff:fe00:f8, port=5405
> Nov 09 11:53:19 [50162] virt-248 corosync info [TOTEM ] kronosnet crypto reconfigured on index 2: nss/aes256/sha512
> Nov 09 11:53:19 [50162] virt-248 corosync info [KNET ] pmtud: PMTUD link change for host: 5 link: 0 from 1381 to 1333
> Nov 09 11:53:19 [50162] virt-248 corosync info [KNET ] pmtud: PMTUD link change for host: 4 link: 0 from 1381 to 1333
> Nov 09 11:53:19 [50162] virt-248 corosync info [KNET ] pmtud: PMTUD link change for host: 3 link: 0 from 1381 to 1333
> Nov 09 11:53:19 [50162] virt-248 corosync info [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 1381 to 1333
> Nov 09 11:53:19 [50162] virt-248 corosync info [KNET ] pmtud: Global data MTU changed to: 1333

Knet crypto successfully upgraded, cluster communication resumes, no fencing, all nodes report active connection to all others (`corosync-cfgtool -s`, left out for the sake of brevity) with the desired crypto config (`corosync-cmapctl totem.crypto_cipher totem.crypto_hash`, left out for the sake of brevity).

Changing authkey:

> [root@virt-248 ~]# corosync-keygen
> Corosync Cluster Engine Authentication key generator.
> Gathering 2048 bits for key from /dev/urandom.
> Writing corosync key to /etc/corosync/authkey.
> [root@virt-248 ~]# for node in $OTHER_NODES; do scp /etc/corosync/authkey $node:/etc/corosync/; done
> authkey 100% 256 443.3KB/s 00:00
> authkey 100% 256 464.2KB/s 00:00
> authkey 100% 256 427.8KB/s 00:00
> authkey 100% 256 387.0KB/s 00:00
> [root@virt-248 ~]# corosync-cfgtool -R
> Reloading corosync.conf...
> Done
> [root@virt-248 ~]# tail -f /var/log/cluster/corosync.log
> Nov 09 11:57:11 [50162] virt-248 corosync notice [CFG ] Config reload requested by node 1
> Nov 09 11:57:11 [50162] virt-248 corosync info [TOTEM ] Configuring link 0
> Nov 09 11:57:11 [50162] virt-248 corosync info [TOTEM ] Configured link number 0: local addr: 2620:52:0:25a4:1800:ff:fe00:f8, port=5405
> Nov 09 11:57:11 [50162] virt-248 corosync info [TOTEM ] kronosnet crypto reconfigured on index 1: nss/aes256/sha512
> Nov 09 11:57:11 [50162] virt-248 corosync info [KNET ] pmtud: Global data MTU changed to: 1333

Cluster authkey seems to have been successfully changed, cluster communication resumes, no fencing, all nodes report active connection to all others (`corosync-cfgtool -s`, left out for the sake of brevity).

Verify that the authkey actually does get changed by using a different one on one of the nodes:

> [root@virt-249 ~]# corosync-keygen
> [root@virt-248 ~]# corosync-cfgtool -R
> Reloading corosync.conf...
> Done
> [root@virt-248 ~]# tail -f /var/log/cluster/corosync.log
> Nov 09 12:00:33 [50162] virt-248 corosync notice [CFG ] Config reload requested by node 1
> Nov 09 12:00:33 [50162] virt-248 corosync info [TOTEM ] Configuring link 0
> Nov 09 12:00:33 [50162] virt-248 corosync info [TOTEM ] Configured link number 0: local addr: 2620:52:0:25a4:1800:ff:fe00:f8, port=5405
> Nov 09 12:00:34 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 12:00:34 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 12:00:35 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 12:00:36 [50162] virt-248 corosync info [KNET ] link: host: 2 link: 0 is down
> Nov 09 12:00:36 [50162] virt-248 corosync info [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
> Nov 09 12:00:36 [50162] virt-248 corosync warning [KNET ] host: host: 2 has no active links
> Nov 09 12:00:36 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 12:00:37 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 12:00:37 [50162] virt-248 corosync notice [TOTEM ] Token has not been received in 3730 ms
> Nov 09 12:00:37 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 12:00:38 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 12:00:39 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 12:00:39 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 12:00:40 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 12:00:41 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 12:00:41 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 12:00:42 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 12:00:43 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 12:00:44 [50162] virt-248 corosync error [KNET ] nsscrypto: Digest does not match
> Nov 09 12:00:44 [50162] virt-248 corosync notice [QUORUM] Sync members[4]: 1 3 4 5
> Nov 09 12:00:44 [50162] virt-248 corosync notice [QUORUM] Sync left[1]: 2
> Nov 09 12:00:44 [50162] virt-248 corosync notice [TOTEM ] A new membership (1.11) was formed. Members left: 2
> Nov 09 12:00:44 [50162] virt-248 corosync notice [TOTEM ] Failed to receive the leave message. failed: 2
> Nov 09 12:00:44 [50162] virt-248 corosync notice [QUORUM] Members[4]: 1 3 4 5
> Nov 09 12:00:44 [50162] virt-248 corosync notice [MAIN ] Completed service synchronization, ready to provide service.

The node fails to properly communicate after config reload and gets fenced as a result, which indicates that the authkey does indeed get changed/reloaded properly.

Backwards compatibility was also tested, old and new corosync/knet versions with the same totem.crypto_{cipher,hash} config are able to start, communicate and cleanly shut down without issues:

> [root@virt-248 ~]# for node in $NODES; do echo $node; ssh $node rpm -q corosync libknet1; ssh $node corosync-cfgtool -s; ssh $node corosync-cmapctl totem.crypto_cipher totem.crypto_hash; done
> virt-248
> corosync-3.1.0-1.el8.x86_64
> libknet1-1.18-1.el8.x86_64
> Printing link status.
> Local node ID 1
> LINK ID 0
> addr = 2620:52:0:25a4:1800:ff:fe00:f8
> status:
> nodeid 1: localhost
> nodeid 2: connected
> nodeid 3: connected
> nodeid 4: connected
> nodeid 5: connected
> totem.crypto_cipher (str) = aes256
> totem.crypto_hash (str) = sha512
> virt-249
> corosync-3.0.3-4.el8.x86_64
> libknet1-1.16-1.el8.x86_64
> Printing link status.
> Local node ID 2
> LINK ID 0
> addr = 2620:52:0:25a4:1800:ff:fe00:f9
> status:
> nodeid 1: link enabled:1 link connected:1
> nodeid 2: link enabled:1 link connected:1
> nodeid 3: link enabled:1 link connected:1
> nodeid 4: link enabled:1 link connected:1
> nodeid 5: link enabled:1 link connected:1
> totem.crypto_cipher (str) = aes256
> totem.crypto_hash (str) = sha512
> virt-250
> corosync-3.1.0-1.el8.x86_64
> libknet1-1.18-1.el8.x86_64
> Printing link status.
> Local node ID 3
> LINK ID 0
> addr = 2620:52:0:25a4:1800:ff:fe00:fa
> status:
> nodeid 1: connected
> nodeid 2: connected
> nodeid 3: localhost
> nodeid 4: connected
> nodeid 5: connected
> totem.crypto_cipher (str) = aes256
> totem.crypto_hash (str) = sha512
> virt-251
> corosync-3.1.0-1.el8.x86_64
> libknet1-1.18-1.el8.x86_64
> Printing link status.
> Local node ID 4
> LINK ID 0
> addr = 2620:52:0:25a4:1800:ff:fe00:fb
> status:
> nodeid 1: connected
> nodeid 2: connected
> nodeid 3: connected
> nodeid 4: localhost
> nodeid 5: connected
> totem.crypto_cipher (str) = aes256
> totem.crypto_hash (str) = sha512
> virt-252
> corosync-3.1.0-1.el8.x86_64
> libknet1-1.18-1.el8.x86_64
> Printing link status.
> Local node ID 5
> LINK ID 0
> addr = 2620:52:0:25a4:1800:ff:fe00:fc
> status:
> nodeid 1: connected
> nodeid 2: connected
> nodeid 3: connected
> nodeid 4: connected
> nodeid 5: localhost
> totem.crypto_cipher (str) = aes256
> totem.crypto_hash (str) = sha512

Changing cluster configuration during an upgrade procedure is not supported and hence crypto reconfiguration between old and new corosync/knet versions was not tested.

Moving to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (corosync bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:1780
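
The mixed-version warning discussed in this bug (running `corosync-cfgtool -R` while nodes run different corosync versions is unsupported) suggests a pre-flight check before any reload. The following is only a rough sketch, not something shipped with corosync or pcs: the function names `all_same_version` and `reload_if_uniform` are hypothetical, passwordless root ssh to every node is assumed (as in the transcripts above), and `rpm -q corosync` reports the installed package, which may differ from the running daemon if corosync was not restarted after an update.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if at least one argument is given
# and all arguments are the same string.
all_same_version() {
    [ "$#" -gt 0 ] || return 1
    first=$1
    for v in "$@"; do
        [ "$v" = "$first" ] || return 1
    done
    return 0
}

# Hypothetical wrapper: query the installed corosync package on each node
# and trigger the reload locally only when every node reports the same one.
# Call it with the full node list, including the local node.
reload_if_uniform() {
    versions=""
    for node in "$@"; do
        # rpm -q exits non-zero if the package is missing; abort in that case
        pkg=$(ssh "$node" rpm -q corosync) || return 1
        versions="$versions $pkg"
    done
    # Word splitting is intended here: rpm -q output contains no spaces.
    if all_same_version $versions; then
        corosync-cfgtool -R
    else
        echo "corosync versions differ across nodes; not reloading" >&2
        return 1
    fi
}
```

With the node list from the verification above, this could be invoked as `reload_if_uniform virt-248 virt-249 virt-250 virt-251 virt-252`. In the backwards-compatibility setup shown earlier (virt-249 on corosync-3.0.3-4.el8, the rest on corosync-3.1.0-1.el8) the check would refuse to reload.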