Bug 2031736 - RHEL9[fence-kdump]fence not work on 2-nodes-pcs-cluster
Summary: RHEL9[fence-kdump]fence not work on 2-nodes-pcs-cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kexec-tools
Version: 9.0
Hardware: All
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Pingfan Liu
QA Contact: Ruowen Qin
URL:
Whiteboard:
Depends On:
Blocks: 2027125
Reported: 2021-12-13 11:07 UTC by Jie Li
Modified: 2022-05-17 16:26 UTC (History)

Fixed In Version: kexec-tools-2.0.23-6.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-17 15:56:53 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHELPLAN-105506 | 0 | None | None | None | 2021-12-13 11:11:31 UTC
Red Hat Product Errata | RHBA-2022:3974 | 0 | None | None | None | 2022-05-17 15:57:10 UTC

Comment 3 Ken Gaillot 2022-01-06 15:41:40 UTC
Hi all,

The warning itself is harmless, and just means that fence_kdump requires "off" instead of "reboot". The message has been downgraded to notice level in more recent versions.

The question is why the "off" command doesn't work, which is a little beyond me. Reid, do you see anything here?

Comment 4 Reid Wahl 2022-01-06 18:00:30 UTC
(In reply to Ken Gaillot from comment #3)
> Hi all,
> 
> The warning itself is harmless, and just means that fence_kdump requires
> "off" instead of "reboot". The message has been downgraded to notice level
> in more recent versions.
> 
> The question is why the "off" command doesn't work, which is a little beyond
> me. Reid, do you see anything here?

I'd need to see more data, in the form of either a sosreport from each node or SSH access (with credentials) to both nodes. "fence_kdump not working" cases are tricky to diagnose regardless but I'll take a look as soon as the reporter can provide one of these.

It doesn't seem likely that the "off" command doesn't work. It appears to be doing what it's supposed to do -- namely, listening for a "fence_kdump_send" message from the fenced node.

If there's nothing wrong with the listener itself, then there are two possibilities:
  - The fenced node is not sending the message (e.g., misconfiguration of kdump.conf).
  - Something is preventing the surviving/fencing node from receiving the message (e.g., firewall, routing, bug in loading dracut network module, misconfiguration of kdump.conf).
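
To narrow down the kdump.conf possibilities, it can help to print what is actually configured on each node. The sketch below is only an illustration: kdump_conf_value is a hypothetical helper, and it assumes the fence_kdump_nodes and fence_kdump_args directives described in kdump.conf(5). In a pcs cluster the node list may be derived from the cluster configuration instead, so an empty result is not necessarily an error.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one directive from a
# kdump.conf-style file. Comment lines are skipped; if the directive
# appears more than once, this sketch reports the last occurrence.
kdump_conf_value() {
    # $1 = config file, $2 = directive name
    awk -v key="$2" '
        /^[ \t]*#/ { next }
        $1 == key  { $1 = ""; sub(/^[ \t]+/, ""); val = $0 }
        END        { print val }
    ' "$1"
}

# Assumed default location of the configuration file.
conf=/etc/kdump.conf
if [ -r "$conf" ]; then
    echo "fence_kdump_nodes: $(kdump_conf_value "$conf" fence_kdump_nodes)"
    echo "fence_kdump_args:  $(kdump_conf_value "$conf" fence_kdump_args)"
fi
```

Running it on both nodes and comparing the output against the cluster node names is a quick first check before looking at firewall or routing.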

Comment 8 Reid Wahl 2022-01-07 08:14:29 UTC
Manual send/receive works fine when both nodes are booted normally.

[root@node1 ~]# fence_kdump -n node2 -o off -v -v
[debug]: options {        
[debug]:     nodename = node2
[debug]:     ipport   = 7410
[debug]:     family   = 0
[debug]:     count    = 0
[debug]:     interval = 10
[debug]:     timeout  = 60
[debug]:     verbose  = 2
[debug]: }                
[debug]: node {       
[debug]:     name = node2
[debug]:     addr = <addr2>
[debug]:     port = 7410
[debug]:     info = 0x5570ecb98970
[debug]: }            
[debug]: waiting for message from '<addr2>'

[debug]: received valid message from '<addr2>'


[root@node2 ~]# /usr/libexec/fence_kdump_send -i 1 -v -v node1
[debug]: options {        
[debug]:     nodename = (null)
[debug]:     ipport   = 7410
[debug]:     family   = 0
[debug]:     count    = 0
[debug]:     interval = 1
[debug]:     timeout  = 60
[debug]:     verbose  = 2
[debug]: }                
[debug]: node {       
[debug]:     name = node1
[debug]:     addr = <addr1>
[debug]:     port = 7410
[debug]:     info = 0x555ba08df940
[debug]: }            
[debug]: message sent to node '<addr1>'


Looking at the console log, this seems problematic.

[    3.669269] kdump.sh[538]: /bin/kdump.sh: line 536: node1.fqdn.com: command not found

Comment 9 Reid Wahl 2022-01-07 08:41:15 UTC
In /usr/lib/dracut/modules.d/99kdumpbase/kdump.sh, fence_kdump_notify(), the variables FENCE_KDUMP_SEND and FENCE_KDUMP_ARGS are unset.

These variables are supposed to get set by kdump-lib.sh.


In RHEL 8, kdump-lib.sh gets included by kdump-lib-initramfs.sh. In RHEL 9, kdump-lib.sh does not get included by kdump-lib-initramfs.sh.

kdump-lib-initramfs.sh gets included by kdump.sh during a dump. kdump-lib.sh does not get included by kdump.sh during a dump.

Therefore, in RHEL 8, kdump.sh (transitively) pulls in kdump-lib.sh. In RHEL 9, it does not.


In other words, I suspect the issue is as follows.

RHEL 8:
[root@fastvm-rhel-8-0-23 ~]# head -n 3 /usr/lib/kdump/kdump-lib-initramfs.sh 
# These variables and functions are useful in 2nd kernel

. /lib/kdump-lib.sh

RHEL 9:
[root@rhel-9-node2 ~]# grep kdump-lib /usr/lib/kdump/kdump-lib-initramfs.sh 
[root@rhel-9-node2 ~]# 


kdump-lib.sh doesn't even get included in the initramfs anymore.

@Pingfan, can you comment on this change in the includes?

Comment 10 Reid Wahl 2022-01-07 08:43:56 UTC
(In reply to Reid Wahl from comment #9)
> In /usr/lib/dracut/modules.d/99kdumpbase/kdump.sh, fence_kdump_notify(), the
> variables FENCE_KDUMP_SEND and FENCE_KDUMP_ARGS are unset.
> 
> These variables are supposed to get set by kdump-lib.sh.

Slight correction: only FENCE_KDUMP_SEND gets set by kdump-lib.sh. FENCE_KDUMP_ARGS is correctly empty because I haven't configured any args in kdump.conf.
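
The "command not found" line in the console log is what a shell prints when the variable that should hold the command expands to nothing, so the next word lands in command position. The sketch below reproduces that effect in isolation. notify_sketch is a hypothetical stand-in, and it assumes that fence_kdump_notify() invokes the sender through an unquoted expansion of FENCE_KDUMP_SEND followed by the args and node names; the actual function body is not shown in this report.

```shell
#!/bin/sh
# Hypothetical stand-in for fence_kdump_notify(); not the real kdump.sh code.
notify_sketch() {
    # Unquoted on purpose: empty variables disappear during field
    # splitting, so the first node name becomes the command word.
    $FENCE_KDUMP_SEND $FENCE_KDUMP_ARGS $FENCE_KDUMP_NODES
}

FENCE_KDUMP_NODES="node1.fqdn.com"
FENCE_KDUMP_ARGS=""

# Case 1: the sender variable is empty, as in the RHEL 9 kdump initramfs.
FENCE_KDUMP_SEND=""
notify_sketch 2>/dev/null || echo "exit status: $?"

# Case 2: the sender variable is assigned (echo is used as a harmless stand-in).
FENCE_KDUMP_SEND="echo would-run:"
notify_sketch
```

In the first case the shell tries to execute "node1.fqdn.com" and fails with status 127, which matches the shape of the console message. In the second case the node name is passed as an argument, as intended.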

Comment 12 Reid Wahl 2022-01-07 10:04:57 UTC
By the way, since these VMs may dump their core and reboot before pacemaker initiates fencing, you might have to configure a kdump-pre script that sleeps for a period (e.g., 10 seconds).

Comment 13 ltao 2022-01-07 10:10:17 UTC
(In reply to Reid Wahl from comment #9)

Hi Reid,

The kdump-lib.sh is excluded from kdump-lib-initramfs.sh in rhel9 commit:
   
35519c3 kdump-lib-initramfs.sh: prepare to be a POSIX compatible lib

which was backported from fedora. The same patch has not been backported to rhel8.

Comment 15 Reid Wahl 2022-01-07 10:29:31 UTC
(In reply to ltao from comment #13)
> Hi Reid,
> 
> The kdump-lib.sh is excluded from kdump-lib-initramfs.sh in rhel9 commit:
>    
> 35519c3 kdump-lib-initramfs.sh: prepare to be a POSIX compatible lib
> 
> which was backported from fedora. The same patch has not been backported to
> rhel8.

Ack, thanks :) 

So the issue is that the FENCE_KDUMP_SEND variable is not getting assigned during kdump execution, and thus the notification is never sent from the crashed node to the surviving node. The fix will need to occur in kexec-tools.

It's fine if we need to exclude kdump-lib.sh from kdump-lib-initramfs.sh. However, we'll need to assign that FENCE_KDUMP_SEND variable somewhere that's accessible from within the kdump initramfs.

Comment 16 ltao 2022-01-07 11:05:54 UTC
(In reply to Reid Wahl from comment #15)
> (In reply to ltao from comment #13)
> > Hi Reid,
> > 
> > The kdump-lib.sh is excluded from kdump-lib-initramfs.sh in rhel9 commit:
> >    
> > 35519c3 kdump-lib-initramfs.sh: prepare to be a POSIX compatible lib
> > 
> > which was backported from fedora. The same patch has not been backported to
> > rhel8.
> 
> Ack, thanks :) 
> 
> So the issue is that the FENCE_KDUMP_SEND variable is not getting assigned
> during kdump execution, and thus the notification is never sent from the
> crashed node to the surviving node. The fix will need to occur in
> kexec-tools.
> 
> It's fine if we need to exclude kdump-lib.sh from kdump-lib-initramfs.sh.
> However, we'll need to assign that FENCE_KDUMP_SEND variable somewhere
> that's accessible from within the kdump initramfs.

Thanks for the info! From a quick look at the code, I guess FENCE_KDUMP_CONFIG_FILE, FENCE_KDUMP_SEND, and FADUMP_ENABLED_SYS_NODE need to be moved from kdump-lib.sh to kdump-lib-initramfs.sh.

Previously, kdump-lib.sh was sourced by kdump-lib-initramfs.sh, so the variables in kdump-lib.sh were available there. Since commit 35519c3 (kdump-lib-initramfs.sh: prepare to be a POSIX compatible lib), the sourcing direction is reversed.

Comment 17 Pingfan Liu 2022-01-10 01:50:36 UTC
Hi Reid,

(In reply to Reid Wahl from comment #8)

Thanks for the info; now I know how to debug it next time.

As for the root cause, which originates in kexec-tools, I think Tao has explained it clearly. Thanks, Tao.

Comment 18 Pingfan Liu 2022-01-10 01:58:09 UTC
(In reply to ltao from comment #16)
> (In reply to Reid Wahl from comment #15)
> > (In reply to ltao from comment #13)
> > > Hi Reid,
> > > 
> > > The kdump-lib.sh is excluded from kdump-lib-initramfs.sh in rhel9 commit:
> > >    
> > > 35519c3 kdump-lib-initramfs.sh: prepare to be a POSIX compatible lib
> > > 
> > > which was backported from fedora. The same patch has not been backported to
> > > rhel8.
> > 
> > Ack, thanks :) 
> > 
> > So the issue is that the FENCE_KDUMP_SEND variable is not getting assigned
> > during kdump execution, and thus the notification is never sent from the
> > crashed node to the surviving node. The fix will need to occur in
> > kexec-tools.
> > 
> > It's fine if we need to exclude kdump-lib.sh from kdump-lib-initramfs.sh.
> > However, we'll need to assign that FENCE_KDUMP_SEND variable somewhere
> > that's accessible from within the kdump initramfs.
> 
> Thanks for the info! From a quick look at the code, I guess
> FENCE_KDUMP_CONFIG_FILE, FENCE_KDUMP_SEND, and FADUMP_ENABLED_SYS_NODE need
> to be moved from kdump-lib.sh to kdump-lib-initramfs.sh.
> 
Unfortunately, these variables are shared by the 1st and 2nd kernel, which means both kdump-lib-initramfs.sh and kdump-lib.sh should use them.

We need to find a way out while sticking to POSIX compatibility.

Thanks

Pingfan
 
> Previously, kdump-lib.sh was sourced by kdump-lib-initramfs.sh, so the
> variables in kdump-lib.sh were available there. Since commit 35519c3
> (kdump-lib-initramfs.sh: prepare to be a POSIX compatible lib), the
> sourcing direction is reversed.

Comment 19 Reid Wahl 2022-01-10 02:15:34 UTC
(In reply to Pingfan Liu from comment #18)
> Unfortunately, these variables are shared by the 1st and 2nd kernel, which
> means both kdump-lib-initramfs.sh and kdump-lib.sh should use them.
> 
> We need to find a way out while sticking to POSIX compatibility.

kdump-lib.sh includes kdump-lib-initramfs.sh at the top. Could we just move the variable assignments to kdump-lib-initramfs.sh as Tao suggested in comment 16?
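
A minimal sketch of why that placement would serve both kernels, assuming only what is stated in this thread (kdump-lib.sh sources kdump-lib-initramfs.sh, and the second kernel gets only the latter). The two files created here are temporary stand-ins with hypothetical names, not the real kexec-tools scripts, and the assignment is reduced to a single variable.

```shell
#!/bin/sh
# Sketch only: build two throwaway libs in a temp directory.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

# Minimal shared lib (plays the role of kdump-lib-initramfs.sh).
cat > "$demo_dir/lib-initramfs.sh" <<'EOF'
FENCE_KDUMP_SEND="/usr/libexec/fence_kdump_send"
EOF

# Full lib (plays the role of kdump-lib.sh); it sources the shared lib first.
cat > "$demo_dir/lib.sh" <<EOF
. "$demo_dir/lib-initramfs.sh"
EOF

# Second kernel: only the shared lib is present in the initramfs.
second_kernel_view() {
    ( unset FENCE_KDUMP_SEND
      . "$demo_dir/lib-initramfs.sh"
      echo "$FENCE_KDUMP_SEND" )
}

# First kernel: the full lib is sourced, which pulls in the shared lib.
first_kernel_view() {
    ( unset FENCE_KDUMP_SEND
      . "$demo_dir/lib.sh"
      echo "$FENCE_KDUMP_SEND" )
}

second_kernel_view
first_kernel_view
```

Both functions print the same path, because the assignment lives in the file that each environment ends up sourcing. Keeping the assignment a plain variable also avoids any bash-only syntax in the shared lib.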

Comment 20 Pingfan Liu 2022-01-10 07:08:22 UTC
(In reply to Reid Wahl from comment #19)
> (In reply to Pingfan Liu from comment #18)
> > Unfortunately, these variables are shared by the 1st and 2nd kernel, which
> > means both kdump-lib-initramfs.sh and kdump-lib.sh should use them.
> > 
> > We need to find a way out while sticking to POSIX compatibility.
> 
> kdump-lib.sh includes kdump-lib-initramfs.sh at the top. Could we just move
> the variable assignments to kdump-lib-initramfs.sh as Tao suggested in
> comment 16?

According to the original commit
    commit a5faa052d4969cb66719d0b795d746449d3c71b7
    Author: Kairui Song <kasong>
    Date:   Tue Sep 14 03:25:46 2021 +0800
    
        kdump-lib-initramfs.sh: prepare to be a POSIX compatible lib
    
        Move all functions needed in the second kernel from kdump-lib.sh
        to kdump-lib-initramfs.sh, and update shebang headers.
    
        Now, kdump-lib-initramfs.sh is an independent lib script, no longer
        depend on kdump-lib.sh, and kdump-lib.sh is no longer needed for
        the second kernel.
    
        In later commits, functions in kdump-lib-initramfs.sh will be reworked
        to be POSIX compatible, kdump-lib.sh will contain bash only functions.
    
        POSIX shell have very limited features, eg. `local` keyword doesn't
        exist in POSIX but we rely on that heavily. So kdump-lib.sh will
        use bash syntax and contain the most complex helper and codes.
    
        kdump-lib-initramfs.sh will contain the minimum set of helpers,
        and be shared by both the first and second kernel.
               ^^^^^

So I think Tao's suggestion is the right way to go.

Thank you both for the help.

Regards,

Pingfan

Comment 28 errata-xmlrpc 2022-05-17 15:56:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: kexec-tools), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:3974

