Bug 1705591
| Summary: | False errors in corosync.log when adding a knet link | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Tomas Jelinek <tojeline> | ||||
| Component: | corosync | Assignee: | Jan Friesse <jfriesse> | ||||
| Status: | CLOSED ERRATA | QA Contact: | michal novacek <mnovacek> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 8.0 | CC: | ccaulfie, cluster-maint, mlisik, mnovacek | ||||
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ | ||||
| Target Release: | 8.1 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | corosync-3.0.2-1.el8 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2019-11-05 21:12:28 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | |||||||
Upstream fix: https://github.com/corosync/corosync/pull/462
Created attachment 1561739
knet: Fix a couple of errors when adding a new link
When adding a new link for the first time you will often see:
1) knet_link_set_ping_timers for nodeid 1, link 1 failed: Invalid
argument (22)
2) New config has different knet transport for link 1. Internal value
was NOT changed. To reconfigure an interface it must be deleted and
recreated. A working interface needs to be available to corosync at all
times
1) is caused by setting the ping timers twice, once in
totemknet_member_add() and once in totemknet_refresh_config().
The first time we don't know the value,
so it's zero and thus an error is displayed. For this we simply check
for the zero and skip the knet API call. It's not ideal, but
totemconfig needs a lot of reconfiguring itself before we can
make this more sane.
2) was caused by simply comparing an unconfigured link with
a configured one, so OF COURSE, they are going to be different!
Signed-off-by: Christine Caulfield <ccaulfie>
Reviewed-by: Jan Friesse <jfriesse>
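The zero check described under 1) can be modelled in a few lines. This is only a Python sketch of the control flow, not corosync's C code: `FakeLink`, `member_add` and `refresh_config` are hypothetical stand-ins, the rejection of a zero timeout is inferred from the logged "Invalid argument (22)", and 5000 is an arbitrary example value.

```python
import errno


class FakeLink:
    """Hypothetical stand-in for a knet link; rejects a zero ping timeout."""

    def __init__(self):
        self.ping_timeout = None

    def set_ping_timers(self, timeout_ms):
        # Assumption: a zero value is what produced "Invalid argument (22)".
        if timeout_ms == 0:
            raise OSError(errno.EINVAL, "Invalid argument")
        self.ping_timeout = timeout_ms


def member_add(link, configured_timeout_ms):
    """Model of the first call: skip the API call while the value is still zero."""
    if configured_timeout_ms == 0:
        return False  # value not known yet; the later refresh sets it
    link.set_ping_timers(configured_timeout_ms)
    return True


def refresh_config(link, configured_timeout_ms):
    """Model of the second call, made once the real value is known."""
    link.set_ping_timers(configured_timeout_ms)


link = FakeLink()
skipped = not member_add(link, 0)  # no error raised, the call is skipped
refresh_config(link, 5000)         # arbitrary example timeout in milliseconds
```

Without the guard in `member_add`, the first call would raise the error even though the second call later succeeds, which matches the harmless but misleading log entry reported here.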
I have verified that there are no more suspicious log entries in the corosync log with corosync-3.0.2-3.el8 and libknet1-1.10-1.el8.
---
# pcs cluster link add virt-163=192.168.3.43 virt-164=192.168.4.44 options transport=udp
Sending updated corosync.conf to nodes...
virt-163: Succeeded
virt-164: Succeeded
virt-163: Corosync configuration reloaded
# pcs cluster link delete 2
Sending updated corosync.conf to nodes...
virt-163: Succeeded
virt-164: Succeeded
virt-163: Corosync configuration reloaded
# pcs cluster link add virt-163=192.168.3.43 virt-164=192.168.4.44 options transport=udp linknumber=2
Sending updated corosync.conf to nodes...
virt-163: Succeeded
virt-164: Succeeded
virt-163: Corosync configuration reloaded
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync notice [CFG ] Config reload requested by node 1
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info [TOTEM ] Configuring link 0
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info [TOTEM ] Configured link number 0: local addr: 2620:52:0:25a4:1800:ff:fe00:a3, port=5405
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info [TOTEM ] Configuring link 1
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info [TOTEM ] Configured link number 1: local addr: 192.168.2.43, port=5406
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync notice [CFG ] Config reload requested by node 1
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info [TOTEM ] Configuring link 0
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info [TOTEM ] Configured link number 0: local addr: 2620:52:0:25a4:1800:ff:fe00:a3, port=5405
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info [TOTEM ] Configuring link 1
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info [TOTEM ] Configured link number 1: local addr: 192.168.2.43, port=5406
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info [TOTEM ] Configuring link 2
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info [TOTEM ] Configured link number 2: local addr: 192.168.3.43, port=5407
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
# cat /etc/corosync/corosync.conf
totem {
version: 2
cluster_name: STSRHTS5930
transport: knet
crypto_cipher: aes256
crypto_hash: sha256
interface {
knet_transport: udp
linknumber: 2
}
}
nodelist {
node {
ring0_addr: virt-163
name: virt-163
nodeid: 1
ring1_addr: 192.168.2.43
ring2_addr: 192.168.3.43
}
node {
ring0_addr: virt-164
name: virt-164
nodeid: 2
ring1_addr: 192.168.2.44
ring2_addr: 192.168.4.44
}
}
quorum {
provider: corosync_votequorum
two_node: 1
}
logging {
to_logfile: yes
logfile: /var/log/cluster/corosync.log
to_syslog: yes
timestamp: on
}
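To repeat this check on another cluster, a short script can scan the log for the two messages quoted in this report. The following is a minimal sketch, not part of corosync or pcs: the helper names `find_false_errors` and `scan_log` are made up for illustration, the patterns are taken from the message text in this report, and the default path is the `logfile` value from the configuration above.

```python
import re

# Patterns for the two messages quoted in this bug report.
FALSE_ERROR_PATTERNS = [
    re.compile(
        r"knet_link_set_ping_timers for nodeid \d+, link \d+ failed: "
        r"Invalid argument \(22\)"
    ),
    re.compile(r"New config has different knet transport for link \d+"),
]


def find_false_errors(lines):
    """Return the log lines that contain either of the two messages."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(p.search(line) for p in FALSE_ERROR_PATTERNS)
    ]


def scan_log(path="/var/log/cluster/corosync.log"):
    """Scan a corosync log file; the default path is the one configured above."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_false_errors(fh)


# Lines copied from the node1 log in this report.
SAMPLE = [
    "Apr 30 17:42:33 [1296] rh80-node1 corosync info [TOTEM ] Configuring link 1\n",
    "Apr 30 17:42:33 [1296] rh80-node1 corosync error [TOTEM ] New config has "
    "different knet transport for link 1. Internal value was NOT changed.\n",
    "Apr 30 17:42:33 [1296] rh80-node1 corosync error [TOTEM ] "
    "knet_link_set_ping_timers for nodeid 2, link 1 failed: Invalid argument (22)\n",
]
hits = find_false_errors(SAMPLE)
```

An empty result from `scan_log()` after adding a link would correspond to the fixed behaviour; the second half of message 2) is logged on its own line and is deliberately not matched separately.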
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3435
Description of problem:
When adding a new knet link, these two error messages appear in corosync.log:
1) knet_link_set_ping_timers for nodeid 1, link 1 failed: Invalid argument (22)
2) New config has different knet transport for link 1. Internal value was NOT changed. To reconfigure an interface it must be deleted and recreated. A working interface needs to be available to corosync at all times
The cluster is running just fine, though.
Version-Release number of selected component (if applicable):
corosync-3.0.0-2.el8.x86_64
corosynclib-3.0.0-2.el8.x86_64
corosync-qdevice-3.0.0-2.el8.x86_64
libknet1-1.4-3.el8.x86_64
libknet1-compress-bzip2-plugin-1.4-3.el8.x86_64
libknet1-compress-lz4-plugin-1.4-3.el8.x86_64
libknet1-compress-lzma-plugin-1.4-3.el8.x86_64
libknet1-compress-lzo2-plugin-1.4-3.el8.x86_64
libknet1-compress-plugins-all-1.4-3.el8.x86_64
libknet1-compress-zlib-plugin-1.4-3.el8.x86_64
libknet1-crypto-nss-plugin-1.4-3.el8.x86_64
libknet1-crypto-openssl-plugin-1.4-3.el8.x86_64
libknet1-crypto-plugins-all-1.4-3.el8.x86_64
libknet1-plugins-all-1.4-3.el8.x86_64
How reproducible:
always, easily
Steps to Reproduce:
1. Add a new knet link.
2. Check corosync.log.
3. Delete a knet link.
4. Add a link with the same linknumber as the link deleted in step 3 had.
5. Check corosync.log.
Actual results:
When adding a link for the first time, both messages 1) and 2) from above get logged. When adding a previously deleted link, only message 1) is logged.
Expected results:
The messages seem to be false since the cluster is running fine. It is therefore expected for them not to be logged.
Additional info:
corosync.conf before adding a link:
totem {
version: 2
cluster_name: rhel80-knet
transport: knet
crypto_cipher: aes256
crypto_hash: sha256
}
nodelist {
node {
name: rh80-node1
nodeid: 1
ring0_addr: 192.168.122.201
}
node {
name: rh80-node2
nodeid: 2
ring0_addr: 192.168.122.202
}
}
quorum {
provider: corosync_votequorum
two_node: 1
}
logging {
to_logfile: yes
logfile: /var/log/cluster/corosync.log
to_syslog: yes
}
corosync.conf after adding a link:
totem {
version: 2
cluster_name: rhel80-knet
transport: knet
crypto_cipher: aes256
crypto_hash: sha256
}
nodelist {
node {
name: rh80-node1
nodeid: 1
ring0_addr: 192.168.122.201
ring1_addr: 192.168.123.201
}
node {
name: rh80-node2
nodeid: 2
ring0_addr: 192.168.122.202
ring1_addr: 192.168.123.202
}
}
quorum {
provider: corosync_votequorum
two_node: 1
}
logging {
to_logfile: yes
logfile: /var/log/cluster/corosync.log
to_syslog: yes
}
logs from node1:
Apr 30 17:42:33 [1296] rh80-node1 corosync notice [CFG ] Config reload requested by node 1
Apr 30 17:42:33 [1296] rh80-node1 corosync info [TOTEM ] Configuring link 0
Apr 30 17:42:33 [1296] rh80-node1 corosync info [TOTEM ] Configured link number 0: local addr: 192.168.122.201, port=5405
Apr 30 17:42:33 [1296] rh80-node1 corosync info [TOTEM ] Configuring link 1
Apr 30 17:42:33 [1296] rh80-node1 corosync info [TOTEM ] Configured link number 1: local addr: 192.168.123.201, port=5406
Apr 30 17:42:33 [1296] rh80-node1 corosync error [TOTEM ] New config has different knet transport for link 1. Internal value was NOT changed.
Apr 30 17:42:33 [1296] rh80-node1 corosync error [TOTEM ] To reconfigure an interface it must be deleted and recreated. A working interface needs to be available to corosync at all times
Apr 30 17:42:33 [1296] rh80-node1 corosync error [TOTEM ] knet_link_set_ping_timers for nodeid 2, link 1 failed: Invalid argument (22)
Apr 30 17:42:34 [1296] rh80-node1 corosync info [KNET ] rx: host: 2 link: 1 is up
Apr 30 17:42:34 [1296] rh80-node1 corosync info [KNET ] pmtud: PMTUD link change for host: 2 link: 1 from 470 to 1366
logs from node2:
Apr 30 17:42:33 [10063] rh80-node2 corosync notice [CFG ] Config reload requested by node 1
Apr 30 17:42:33 [10063] rh80-node2 corosync info [TOTEM ] Configuring link 0
Apr 30 17:42:33 [10063] rh80-node2 corosync info [TOTEM ] Configured link number 0: local addr: 192.168.122.202, port=5405
Apr 30 17:42:33 [10063] rh80-node2 corosync info [TOTEM ] Configuring link 1
Apr 30 17:42:33 [10063] rh80-node2 corosync info [TOTEM ] Configured link number 1: local addr: 192.168.123.202, port=5406
Apr 30 17:42:33 [10063] rh80-node2 corosync error [TOTEM ] New config has different knet transport for link 1. Internal value was NOT changed.
Apr 30 17:42:33 [10063] rh80-node2 corosync error [TOTEM ] To reconfigure an interface it must be deleted and recreated. A working interface needs to be available to corosync at all times
Apr 30 17:42:33 [10063] rh80-node2 corosync error [TOTEM ] knet_link_set_ping_timers for nodeid 1, link 1 failed: Invalid argument (22)
Apr 30 17:42:34 [10063] rh80-node2 corosync info [KNET ] rx: host: 1 link: 1 is up
Apr 30 17:42:34 [10063] rh80-node2 corosync info [KNET ] pmtud: PMTUD link change for host: 1 link: 1 from 470 to 1366