Bug 1705591

Summary: False errors in corosync.log when adding a knet link
Product: Red Hat Enterprise Linux 8
Component: corosync
Version: 8.0
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Tomas Jelinek <tojeline>
Assignee: Jan Friesse <jfriesse>
QA Contact: michal novacek <mnovacek>
CC: ccaulfie, cluster-maint, mlisik, mnovacek
Target Milestone: rc
Target Release: 8.1
Flags: pm-rhel: mirror+
Fixed In Version: corosync-3.0.2-1.el8
Last Closed: 2019-11-05 21:12:28 UTC
Type: Bug
Attachments:
knet: Fix a couple of errors when adding a new link (flags: none)

Description Tomas Jelinek 2019-05-02 14:19:16 UTC
Description of problem:
When adding a new knet link, these two error messages appear in corosync.log:
1) knet_link_set_ping_timers for nodeid 1, link 1 failed: Invalid argument (22)
2) New config has different knet transport for link 1. Internal value was NOT changed. To reconfigure an interface it must be deleted and recreated. A working interface needs to be available to corosync at all times

The cluster is running just fine, though.


Version-Release number of selected component (if applicable):
corosync-3.0.0-2.el8.x86_64
corosynclib-3.0.0-2.el8.x86_64
corosync-qdevice-3.0.0-2.el8.x86_64
libknet1-1.4-3.el8.x86_64
libknet1-compress-bzip2-plugin-1.4-3.el8.x86_64
libknet1-compress-lz4-plugin-1.4-3.el8.x86_64
libknet1-compress-lzma-plugin-1.4-3.el8.x86_64
libknet1-compress-lzo2-plugin-1.4-3.el8.x86_64
libknet1-compress-plugins-all-1.4-3.el8.x86_64
libknet1-compress-zlib-plugin-1.4-3.el8.x86_64
libknet1-crypto-nss-plugin-1.4-3.el8.x86_64
libknet1-crypto-openssl-plugin-1.4-3.el8.x86_64
libknet1-crypto-plugins-all-1.4-3.el8.x86_64
libknet1-plugins-all-1.4-3.el8.x86_64


How reproducible:
always, easily


Steps to Reproduce:
1. Add a new knet link.
2. Check corosync.log.
3. Delete a knet link.
4. Add a link with the same linknumber as the link deleted in step 3.
5. Check corosync.log.


Actual results:
When adding a link for the first time, both messages 1) and 2) from above get logged. When adding a previously deleted link, only message 1) is logged.


Expected results:
The messages appear to be false, since the cluster keeps running fine. They should therefore not be logged.



Additional info:

corosync.conf before adding a link:
totem {
     version: 2
     cluster_name: rhel80-knet
     transport: knet
     crypto_cipher: aes256
     crypto_hash: sha256
}

nodelist {
     node {
         name: rh80-node1
         nodeid: 1
         ring0_addr: 192.168.122.201
     }

     node {
         name: rh80-node2
         nodeid: 2
         ring0_addr: 192.168.122.202
     }
}

quorum {
     provider: corosync_votequorum
     two_node: 1
}

logging {
     to_logfile: yes
     logfile: /var/log/cluster/corosync.log
     to_syslog: yes
}



corosync.conf after adding a link:
totem {
     version: 2
     cluster_name: rhel80-knet
     transport: knet
     crypto_cipher: aes256
     crypto_hash: sha256
}

nodelist {
     node {
         name: rh80-node1
         nodeid: 1
         ring0_addr: 192.168.122.201
         ring1_addr: 192.168.123.201
     }

     node {
         name: rh80-node2
         nodeid: 2
         ring0_addr: 192.168.122.202
         ring1_addr: 192.168.123.202
     }
}

quorum {
     provider: corosync_votequorum
     two_node: 1
}

logging {
     to_logfile: yes
     logfile: /var/log/cluster/corosync.log
     to_syslog: yes
}
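
The difference between the two configurations above is the ring1_addr entry on each node. As a rough illustration only, the Python sketch below extracts the ringN_addr entries per node and reports which link numbers were added. The function names and the simplified parser are assumptions made for this example; it is not how corosync or pcs parse corosync.conf, and only the nodelist sections are reproduced here.

```python
import re

RING_KEY = re.compile(r"ring(\d+)_addr")


def ring_addrs_by_node(conf_text):
    """Map node name -> {link number: address} (simplified parser, illustration only)."""
    nodes = {}
    stack = []      # names of the currently open sections
    current = None  # key/value pairs of the node section being read
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            stack.append(line[:-1].strip())
            if stack == ["nodelist", "node"]:
                current = {}
        elif line == "}":
            if not stack:
                continue  # ignore an unbalanced closing brace
            closed = stack.pop()
            if closed == "node" and current is not None:
                links = {}
                for key, value in current.items():
                    match = RING_KEY.fullmatch(key)
                    if match:
                        links[int(match.group(1))] = value
                nodes[current.get("name")] = links
                current = None
        elif ":" in line and current is not None:
            # Split on the first colon only, so IPv6 addresses stay intact.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return nodes


def added_links(before_text, after_text):
    """Return node name -> {link number: address} for links present only in after_text."""
    before = ring_addrs_by_node(before_text)
    after = ring_addrs_by_node(after_text)
    return {
        name: {n: a for n, a in links.items() if n not in before.get(name, {})}
        for name, links in after.items()
    }


BEFORE = """
nodelist {
    node {
        name: rh80-node1
        nodeid: 1
        ring0_addr: 192.168.122.201
    }
    node {
        name: rh80-node2
        nodeid: 2
        ring0_addr: 192.168.122.202
    }
}
"""

AFTER = """
nodelist {
    node {
        name: rh80-node1
        nodeid: 1
        ring0_addr: 192.168.122.201
        ring1_addr: 192.168.123.201
    }
    node {
        name: rh80-node2
        nodeid: 2
        ring0_addr: 192.168.122.202
        ring1_addr: 192.168.123.202
    }
}
"""

print(added_links(BEFORE, AFTER))
```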



logs from node1:
Apr 30 17:42:33 [1296] rh80-node1 corosync notice  [CFG   ] Config reload requested by node 1
Apr 30 17:42:33 [1296] rh80-node1 corosync info    [TOTEM ] Configuring link 0
Apr 30 17:42:33 [1296] rh80-node1 corosync info    [TOTEM ] Configured link number 0: local addr: 192.168.122.201, port=5405
Apr 30 17:42:33 [1296] rh80-node1 corosync info    [TOTEM ] Configuring link 1
Apr 30 17:42:33 [1296] rh80-node1 corosync info    [TOTEM ] Configured link number 1: local addr: 192.168.123.201, port=5406
Apr 30 17:42:33 [1296] rh80-node1 corosync error   [TOTEM ] New config has different knet transport for link 1. Internal value was NOT changed.
Apr 30 17:42:33 [1296] rh80-node1 corosync error   [TOTEM ] To reconfigure an interface it must be deleted and recreated. A working interface needs to be available to corosync at all times
Apr 30 17:42:33 [1296] rh80-node1 corosync error   [TOTEM ] knet_link_set_ping_timers for nodeid 2, link 1 failed: Invalid argument (22)
Apr 30 17:42:34 [1296] rh80-node1 corosync info    [KNET  ] rx: host: 2 link: 1 is up
Apr 30 17:42:34 [1296] rh80-node1 corosync info    [KNET  ] pmtud: PMTUD link change for host: 2 link: 1 from 470 to 1366



logs from node2:
Apr 30 17:42:33 [10063] rh80-node2 corosync notice  [CFG   ] Config reload requested by node 1
Apr 30 17:42:33 [10063] rh80-node2 corosync info    [TOTEM ] Configuring link 0
Apr 30 17:42:33 [10063] rh80-node2 corosync info    [TOTEM ] Configured link number 0: local addr: 192.168.122.202, port=5405
Apr 30 17:42:33 [10063] rh80-node2 corosync info    [TOTEM ] Configuring link 1
Apr 30 17:42:33 [10063] rh80-node2 corosync info    [TOTEM ] Configured link number 1: local addr: 192.168.123.202, port=5406
Apr 30 17:42:33 [10063] rh80-node2 corosync error   [TOTEM ] New config has different knet transport for link 1. Internal value was NOT changed.
Apr 30 17:42:33 [10063] rh80-node2 corosync error   [TOTEM ] To reconfigure an interface it must be deleted and recreated. A working interface needs to be available to corosync at all times
Apr 30 17:42:33 [10063] rh80-node2 corosync error   [TOTEM ] knet_link_set_ping_timers for nodeid 1, link 1 failed: Invalid argument (22)
Apr 30 17:42:34 [10063] rh80-node2 corosync info    [KNET  ] rx: host: 1 link: 1 is up
Apr 30 17:42:34 [10063] rh80-node2 corosync info    [KNET  ] pmtud: PMTUD link change for host: 1 link: 1 from 470 to 1366
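
For the "Check corosync.log" steps, a small script can flag the two messages in question. The Python sketch below is only an illustration: the patterns are derived from the message texts quoted in this report, the function name is made up for the example, and the sample lines are copied from the node1 log above. To check a real log, the lines could be read from the logfile configured in corosync.conf.

```python
import re

# Patterns for the two messages described in this report.
FALSE_ERROR_PATTERNS = [
    re.compile(r"knet_link_set_ping_timers for nodeid \d+, link \d+ failed"),
    re.compile(r"New config has different knet transport for link \d+"),
]


def find_false_errors(log_lines):
    """Return the log lines that match either of the two messages."""
    return [
        line
        for line in log_lines
        if any(pattern.search(line) for pattern in FALSE_ERROR_PATTERNS)
    ]


# Sample lines taken from the node1 log in this report.
SAMPLE = [
    "Apr 30 17:42:33 [1296] rh80-node1 corosync info    [TOTEM ] Configured link number 1: local addr: 192.168.123.201, port=5406",
    "Apr 30 17:42:33 [1296] rh80-node1 corosync error   [TOTEM ] New config has different knet transport for link 1. Internal value was NOT changed.",
    "Apr 30 17:42:33 [1296] rh80-node1 corosync error   [TOTEM ] knet_link_set_ping_timers for nodeid 2, link 1 failed: Invalid argument (22)",
    "Apr 30 17:42:34 [1296] rh80-node1 corosync info    [KNET  ] rx: host: 2 link: 1 is up",
]

for hit in find_false_errors(SAMPLE):
    print(hit)
```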

Comment 1 Jan Friesse 2019-05-02 14:26:56 UTC
Upstream fix: https://github.com/corosync/corosync/pull/462

Comment 3 Jan Friesse 2019-05-02 14:44:34 UTC
Created attachment 1561739
knet: Fix a couple of errors when adding a new link

knet: Fix a couple of errors when adding a new link

When adding a new link for the first time you will often see:
1) knet_link_set_ping_timers for nodeid 1, link 1 failed: Invalid
argument (22)
2) New config has different knet transport for link 1. Internal value
was NOT changed. To reconfigure an interface it must be deleted and
recreated. A working interface needs to be available to corosync at all
times

1) is caused by setting the ping timers twice, once in
totemknet_member_add() and once in totemknet_refresh_config().
The first time, the value is not yet known, so it is zero and
the call fails with an error. The fix simply checks
for the zero and skips the knet API call. It's not ideal, but
totemconfig needs a lot of reconfiguring itself before we can
make this more sane.

2) was caused by simply comparing an unconfigured link with
a configured one, so OF COURSE, they are going to be different!

Signed-off-by: Christine Caulfield <ccaulfie>
Reviewed-by: Jan Friesse <jfriesse>
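
The two guards described in the commit message can be modelled roughly as follows. This is a Python illustration of the control flow only, not the actual C patch: the stub standing in for the knet call, the function names, the dictionary layout, and the choice to treat either timer being zero as "not yet known" are all assumptions made for this sketch.

```python
import errno


def knet_set_ping_timers_stub(interval_ms, timeout_ms):
    """Hypothetical stand-in for the knet call; assumed to reject zero timers."""
    if interval_ms == 0 or timeout_ms == 0:
        return -errno.EINVAL
    return 0


def apply_ping_timers(interval_ms, timeout_ms, log, skip_unknown=True):
    """Set the ping timers; with skip_unknown, skip the call while values are still zero."""
    if skip_unknown and (interval_ms == 0 or timeout_ms == 0):
        return  # value not known yet, so nothing to set and nothing to log
    rc = knet_set_ping_timers_stub(interval_ms, timeout_ms)
    if rc != 0:
        log.append("knet_link_set_ping_timers failed: Invalid argument (22)")


def transport_changed(old_link, new_link):
    """Compare transports only when the old link was actually configured."""
    if not old_link["configured"]:
        return False  # a brand new link has nothing to differ from
    return old_link["knet_transport"] != new_link["knet_transport"]


# Without the guard, the first call with unknown (zero) timers logs an error.
unguarded_log = []
apply_ping_timers(0, 0, unguarded_log, skip_unknown=False)

# With the guard, the same call is skipped and nothing is logged.
guarded_log = []
apply_ping_timers(0, 0, guarded_log)

print(len(unguarded_log), len(guarded_log))
```

In this model, a newly added link is represented by an old entry with "configured" set to False, so the transport comparison is skipped for it, while a real change between two configured links is still reported.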

Comment 5 michal novacek 2019-09-03 15:46:01 UTC
I have verified that there are no more suspicious log entries in the corosync log with corosync-3.0.2-3.el8 and libknet1-1.10-1.el8.

---

# pcs cluster link add virt-163=192.168.3.43 virt-164=192.168.4.44 options transport=udp
Sending updated corosync.conf to nodes...
virt-163: Succeeded
virt-164: Succeeded
virt-163: Corosync configuration reloaded

# pcs cluster link delete 2
Sending updated corosync.conf to nodes...
virt-163: Succeeded
virt-164: Succeeded
virt-163: Corosync configuration reloaded

# pcs cluster link add virt-163=192.168.3.43 virt-164=192.168.4.44 options transport=udp linknumber=2
Sending updated corosync.conf to nodes...
virt-163: Succeeded
virt-164: Succeeded
virt-163: Corosync configuration reloaded

Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync notice  [CFG   ] Config reload requested by node 1
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configuring link 0
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configured link number 0: local addr: 2620:52:0:25a4:1800:ff:fe00:a3, port=5405
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configuring link 1
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configured link number 1: local addr: 192.168.2.43, port=5406
Sep 03 17:39:54 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync notice  [CFG   ] Config reload requested by node 1
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configuring link 0
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configured link number 0: local addr: 2620:52:0:25a4:1800:ff:fe00:a3, port=5405
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configuring link 1
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configured link number 1: local addr: 192.168.2.43, port=5406
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configuring link 2
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [TOTEM ] Configured link number 2: local addr: 192.168.3.43, port=5407
Sep 03 17:40:08 [30272] virt-163.cluster-qe.lab.eng.brq.redhat.com corosync info    [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)


# cat /etc/corosync/corosync.conf
totem {
    version: 2
    cluster_name: STSRHTS5930
    transport: knet
    crypto_cipher: aes256
    crypto_hash: sha256

    interface {
        knet_transport: udp
        linknumber: 2
    }
}

nodelist {
    node {
        ring0_addr: virt-163
        name: virt-163
        nodeid: 1
        ring1_addr: 192.168.2.43
        ring2_addr: 192.168.3.43
    }

    node {
        ring0_addr: virt-164
        name: virt-164
        nodeid: 2
        ring1_addr: 192.168.2.44
        ring2_addr: 192.168.4.44
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    timestamp: on
}

Comment 7 errata-xmlrpc 2019-11-05 21:12:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3435