Bug 1705591 - False errors in corosync.log when adding a knet link
Summary: False errors in corosync.log when adding a knet link
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: corosync
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.1
Assignee: Jan Friesse
QA Contact: michal novacek
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-05-02 14:19 UTC by Tomas Jelinek
Modified: 2020-11-14 10:41 UTC
CC List: 4 users

Fixed In Version: corosync-3.0.2-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-05 21:12:28 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments
knet: Fix a couple of errors when adding a new link (3.96 KB, patch), 2019-05-02 14:44 UTC, Jan Friesse


Links
Red Hat Product Errata RHBA-2019:3435, last updated 2019-11-05 21:12:37 UTC

Description Tomas Jelinek 2019-05-02 14:19:16 UTC
Description of problem:
When adding a new knet link, these two error messages appear in corosync.log:
1) knet_link_set_ping_timers for nodeid 1, link 1 failed: Invalid argument (22)
2) New config has different knet transport for link 1. Internal value was NOT changed. To reconfigure an interface it must be deleted and recreated. A working interface needs to be available to corosync at all times

The cluster is running just fine, though.
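
For illustration (not part of the original report): a quick way to check whether these messages were logged is to grep the logfile configured in corosync.conf. This is a minimal sketch that assumes the /var/log/cluster/corosync.log path used in the configuration shown below; adjust it to your logging setup.

# search the corosync logfile for both false error messages
grep -E 'knet_link_set_ping_timers|different knet transport' /var/log/cluster/corosync.log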


Version-Release number of selected component (if applicable):
corosync-3.0.0-2.el8.x86_64
corosynclib-3.0.0-2.el8.x86_64
corosync-qdevice-3.0.0-2.el8.x86_64
libknet1-1.4-3.el8.x86_64
libknet1-compress-bzip2-plugin-1.4-3.el8.x86_64
libknet1-compress-lz4-plugin-1.4-3.el8.x86_64
libknet1-compress-lzma-plugin-1.4-3.el8.x86_64
libknet1-compress-lzo2-plugin-1.4-3.el8.x86_64
libknet1-compress-plugins-all-1.4-3.el8.x86_64
libknet1-compress-zlib-plugin-1.4-3.el8.x86_64
libknet1-crypto-nss-plugin-1.4-3.el8.x86_64
libknet1-crypto-openssl-plugin-1.4-3.el8.x86_64
libknet1-crypto-plugins-all-1.4-3.el8.x86_64
libknet1-plugins-all-1.4-3.el8.x86_64


How reproducible:
always, easily


Steps to Reproduce:
1. Add a new knet link.
2. Check corosync.log.
3. Delete a knet link.
4. Add a link with the same linknumber as the link deleted in step 3.
5. Check corosync.log again. (A shell sketch of these steps follows below.)
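
For illustration (not from the original report): one possible way to run these steps with pcs, assuming the rh80-node1/rh80-node2 cluster and the 192.168.123.x addresses from the corosync.conf shown below; the exact node-to-address mapping and linknumber are only examples.

# step 1: add a second knet link (it becomes link 1)
pcs cluster link add rh80-node1=192.168.123.201 rh80-node2=192.168.123.202
# step 2: check the log (e.g. with the grep shown above)
grep -E 'knet_link_set_ping_timers|different knet transport' /var/log/cluster/corosync.log
# steps 3 and 4: delete the link and re-add it with the same linknumber
pcs cluster link delete 1
pcs cluster link add rh80-node1=192.168.123.201 rh80-node2=192.168.123.202 options linknumber=1
# step 5: check the log again
grep 'knet_link_set_ping_timers' /var/log/cluster/corosync.log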


Actual results:
When adding a link for the first time, both messages 1) and 2) above are logged. When re-adding a previously deleted link, only message 1) is logged.


Expected results:
The messages appear to be false errors, since the cluster keeps running fine. They should therefore not be logged.



Additional info:

corosync.conf before adding a link:
totem {
     version: 2
     cluster_name: rhel80-knet
     transport: knet
     crypto_cipher: aes256
     crypto_hash: sha256
}

nodelist {
     node {
         name: rh80-node1
         nodeid: 1
         ring0_addr: 192.168.122.201
     }

     node {
         name: rh80-node2
         nodeid: 2
         ring0_addr: 192.168.122.202
     }
}

quorum {
     provider: corosync_votequorum
     two_node: 1
}

logging {
     to_logfile: yes
     logfile: /var/log/cluster/corosync.log
     to_syslog: yes
}



corosync.conf after adding a link:
totem {
     version: 2
     cluster_name: rhel80-knet
     transport: knet
     crypto_cipher: aes256
     crypto_hash: sha256
}

nodelist {
     node {
         name: rh80-node1
         nodeid: 1
         ring0_addr: 192.168.122.201
         ring1_addr: 192.168.123.201
     }

     node {
         name: rh80-node2
         nodeid: 2
         ring0_addr: 192.168.122.202
         ring1_addr: 192.168.123.202
     }
}

quorum {
     provider: corosync_votequorum
     two_node: 1
}

logging {
     to_logfile: yes
     logfile: /var/log/cluster/corosync.log
     to_syslog: yes
}



logs from node1:
Apr 30 17:42:33 [1296] rh80-node1 corosync notice  [CFG   ] Config reload requested by node 1
Apr 30 17:42:33 [1296] rh80-node1 corosync info    [TOTEM ] Configuring link 0
Apr 30 17:42:33 [1296] rh80-node1 corosync info    [TOTEM ] Configured link number 0: local addr: 192.168.122.201, port=5405
Apr 30 17:42:33 [1296] rh80-node1 corosync info    [TOTEM ] Configuring link 1
Apr 30 17:42:33 [1296] rh80-node1 corosync info    [TOTEM ] Configured link number 1: local addr: 192.168.123.201, port=5406
Apr 30 17:42:33 [1296] rh80-node1 corosync error   [TOTEM ] New config has different knet transport for link 1. Internal value was NOT changed.
Apr 30 17:42:33 [1296] rh80-node1 corosync error   [TOTEM ] To reconfigure an interface it must be deleted and recreated. A working interface needs to be available to corosync at all times
Apr 30 17:42:33 [1296] rh80-node1 corosync error   [TOTEM ] knet_link_set_ping_timers for nodeid 2, link 1 failed: Invalid argument (22)
Apr 30 17:42:34 [1296] rh80-node1 corosync info    [KNET  ] rx: host: 2 link: 1 is up
Apr 30 17:42:34 [1296] rh80-node1 corosync info    [KNET  ] pmtud: PMTUD link change for host: 2 link: 1 from 470 to 1366



logs from node2:
Apr 30 17:42:33 [10063] rh80-node2 corosync notice  [CFG   ] Config reload requested by node 1
Apr 30 17:42:33 [10063] rh80-node2 corosync info    [TOTEM ] Configuring link 0
Apr 30 17:42:33 [10063] rh80-node2 corosync info    [TOTEM ] Configured link number 0: local addr: 192.168.122.202, port=5405
Apr 30 17:42:33 [10063] rh80-node2 corosync info    [TOTEM ] Configuring link 1
Apr 30 17:42:33 [10063] rh80-node2 corosync info    [TOTEM ] Configured link number 1: local addr: 192.168.123.202, port=5406
Apr 30 17:42:33 [10063] rh80-node2 corosync error   [TOTEM ] New config has different knet transport for link 1. Internal value was NOT changed.
Apr 30 17:42:33 [10063] rh80-node2 corosync error   [TOTEM ] To reconfigure an interface it must be deleted and recreated. A working interface needs to be available to corosync at all times
Apr 30 17:42:33 [10063] rh80-node2 corosync error   [TOTEM ] knet_link_set_ping_timers for nodeid 1, link 1 failed: Invalid argument (22)
Apr 30 17:42:34 [10063] rh80-node2 corosync info    [KNET  ] rx: host: 1 link: 1 is up
Apr 30 17:42:34 [10063] rh80-node2 corosync info    [KNET  ] pmtud: PMTUD link change for host: 1 link: 1 from 470 to 1366

Comment 1 Jan Friesse 2019-05-02 14:26:56 UTC
Upstream fix: https://github.com/corosync/corosync/pull/462

Comment 3 Jan Friesse 2019-05-02 14:44:34 UTC
Created attachment 1561739 [details]
knet: Fix a couple of errors when adding a new link

knet: Fix a couple of errors when adding a new link

When adding a new link for the first time you will often see:
1) knet_link_set_ping_timers for nodeid 1, link 1 failed: Invalid
argument (22)
2) New config has different knet transport for link 1. Internal value
was NOT changed. To reconfigure an interface it must be deleted and
recreated. A working interface needs to be available to corosync at all
times

1) is caused by setting the ping timers twice, once in
totemknet_member_add() and once in totemknet_refresh_config().
The first time we don't know the value,
so it's zero, and thus we display an error. For this we simply check
for the zero and skip the knet API call. It's not ideal, but
totemconfig needs a lot of reconfiguring itself before we can
make this more sane.

2) was caused by simply comparing an unconfigured link with
a configured one, so OF COURSE, they are going to be different!

Signed-off-by: Christine Caulfield <ccaulfie>
Reviewed-by: Jan Friesse <jfriesse>

Comment 5 michal novacek 2019-09-03 15:46:01 UTC
I have verified that there are no more suspicious log entries in the corosync log with corosync-3.0.2-3.el8 and libknet1-1.10-1.el8.

---

# pcs cluster link add virt-163=192.168.3.43 virt-164=192.168.4.44 options transport=udp
Sending updated corosync.conf to nodes...
virt-163: Succeeded
virt-164: Succeeded
virt-163: Corosync configuration reloaded

# pcs cluster link delete 2
Sending updated corosync.conf to nodes...
virt-163: Succeeded
virt-164: Succeeded
virt-163: Corosync configuration reloaded

# pcs cluster link add virt-163=192.168.3.43 virt-164=192.168.4.44 options transport=udp linknumber=2
Sending updated corosync.conf to nodes...
virt-163: Succeeded
virt-164: Succeeded
virt-163: Corosync configuration reloaded

Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync notice  [CFG   ] Config reload requested by node 1
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configuring link 0
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configured link number 0: local addr: 2620:52:0:25a4:1800:ff:fe00:a3, port=5405
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configuring link 1
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configured link number 1: local addr: 192.168.2.43, port=5406
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync notice  [CFG   ] Config reload requested by node 1
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configuring link 0
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configured link number 0: local addr: 2620:52:0:25a4:1800:ff:fe00:a3, port=5405
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configuring link 1
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configured link number 1: local addr: 192.168.2.43, port=5406
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configuring link 2
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configured link number 2: local addr: 192.168.3.43, port=5407
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)


# cat /etc/corosync/corosync.conf
totem {
    version: 2
    cluster_name: STSRHTS5930
    transport: knet
    crypto_cipher: aes256
    crypto_hash: sha256

    interface {
        knet_transport: udp
        linknumber: 2
    }
}

nodelist {
    node {
        ring0_addr: virt-163
        name: virt-163
        nodeid: 1
        ring1_addr: 192.168.2.43
        ring2_addr: 192.168.3.43
    }

    node {
        ring0_addr: virt-164
        name: virt-164
        nodeid: 2
        ring1_addr: 192.168.2.44
        ring2_addr: 192.168.4.44
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    timestamp: on
}

Comment 7 errata-xmlrpc 2019-11-05 21:12:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3435

