Bug 1860545 - fence_lpar: Long username, HMC hostname, or managed system name causes failures [RHEL 7] [rhel-7.9.z]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: fence-agents
Version: 7.8
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Oyvind Albrigtsen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1860544
Blocks: 1862461 1862462
Reported: 2020-07-25 00:29 UTC by Reid Wahl
Modified: 2023-12-15 18:35 UTC
CC List: 7 users

Fixed In Version: fence-agents-4.2.1-41.el7_9.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1860544
Clones: 1862461 1862462
Environment:
Last Closed: 2020-11-10 12:56:12 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | ClusterLabs fence-agents pull 351 | 0 | None | closed | fence_lpar: Fix parse error from long command line | 2020-11-10 14:48:07 UTC
Red Hat Knowledge Base (Solution) | 5242121 | 0 | None | None | None | 2020-07-25 01:54:39 UTC
Red Hat Product Errata | RHSA-2020:5003 | 0 | None | None | None | 2020-11-10 12:56:30 UTC

Description Reid Wahl 2020-07-25 00:29:01 UTC
+++ This bug was initially created as a clone of Bug #1860544 +++

Description of problem:

fence_lpar fails when the complete lssyscfg command line (including prompt) is greater than 80 characters long on the HMC. This happens with a long user name and/or a long managed system name.

**Note: This happens ONLY when fence_lpar is executed by pacemaker. It does NOT happen when fence_lpar is executed from the command line.**


A long command line gets a carriage return ('\r') added at the 80 character mark and wraps back to the beginning of the line with no line feed ('\n'), overwriting the displayed characters.

**Note: Adding `repr()` to fspawn.log_expect() makes debugging much easier, since literal carriage return characters overwrite displayed characters.**
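To illustrate the note above, here is a minimal sketch of why `repr()` helps. The `format_received()` helper is hypothetical and is not the actual fspawn.log_expect() code; the sample string is the one from the "Received:" debug line in this report.

~~~python
def format_received(data):
    """Build a debug line that shows control characters as visible escapes.

    Hypothetical helper for illustration; not the real log_expect() code.
    """
    return "Received: %s" % repr(data)


# Sample taken from the "Received:" debug output quoted in this report.
raw = " lssyscfg -r lpar -m ibm-p9z-20 -F na\r<h@p9-vhmc1"

# Printed raw, a terminal lets the carriage return move the cursor to column 0,
# so the text after '\r' overwrites the start of the line.
print("Received: " + raw)

# Printed through repr(), the '\r' stays visible as a two-character escape.
print(format_received(raw))
~~~

With the second form, the position of the carriage return in the stream can be read directly from the log.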


fence_lpar's regex matches handle this fine when it's run from the command line. The regexes look for particular patterns (e.g., "\n" or ",state=(.*?),"), and the '\r' characters don't get in the way of this.

The problem is that when Pacemaker spawns fence_lpar, **for some unknown reason** backspace characters appear in the output at the point where the '\r' character occurs. These seem to overwrite part of the conn.before string.

~~~
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:10,775 INFO: Running command: /usr/bin/ssh  sv-hanahmc_prd-aunxhmc2a-z2h.lab.eng.bos.redhat.com -p 22 -o PubkeyAuthentication=no ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:12,045 DEBUG: Received: '\rPa''ssword:' ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:12,045 DEBUG: Sent: password ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [  ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:13,074 DEBUG: Received: ' \r\nLast login: Sat Jul 25 00:20:00 2020 from 10.3.112.217\r\r\nsv-hanahmc_prd-aunxhmc2a-z2h@p9-vhmc1'':~>' ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:13,074 DEBUG: Sent: lssyscfg -r lpar -m ibm-p9z-20 -F name:state ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [  ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:13,238 DEBUG: Received: ' lssyscfg -r lpar -m ibm-p9z-20 -F na\r<h@p9-vhmc1'':~>' ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:13,240 ERROR: Unable to parse output of list command ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [  ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:13,240 ERROR: Please use '-h' for usage ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [  ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: notice: Operation 'monitor' [4419] for device 'lpar' returned: -201 (Generic Pacemaker error)
~~~


I added this `with` block, which writes the pexpect.spawn object to a file, to get additional debug information:

                with open("/tmp/nwahl_conn.out", "w") as f:
                        f.write(str(conn))
                if res == None:
                        fail_usage("Unable to parse output of list command")

When fence_lpar is run from the command line:
~~~
# cat /tmp/nwahl_conn.out 
<fencing.fspawn object at 0x7fe21f082048>
command: /usr/bin/ssh
args: [b'/usr/bin/ssh', b'sv-hanahmc_prd-aunxhmc2a-z2h.lab.eng.bos.redhat.com', b'-p', b'22', b'-o', b'PubkeyAuthentication=no']
buffer (last 100 chars): ' '
before (last 100 chars): '\np9-node4:Running\r\np9-node2:Running\r\nibm-p9z-20-vios1:Running\r\nsv-hanahmc_prd-aunxhmc2a-z2h@p9-vhmc1'
after: ':~>'
match: <_sre.SRE_Match object; span=(829, 832), match=':~>'>
match_index: 0
exitstatus: None
flag_eof: False
pid: 4389
child_fd: 6
closed: False
timeout: 30
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
~~~

When fence_lpar is run by Pacemaker:
~~~
# cat /tmp/nwahl_conn.out 
<fencing.fspawn object at 0x7f0c4b11e550>
command: /usr/bin/ssh
args: [b'/usr/bin/ssh', b'sv-hanahmc_prd-aunxhmc2a-z2h.lab.eng.bos.redhat.com', b'-p', b'22', b'-o', b'PubkeyAuthentication=no']
buffer (last 100 chars): ' lssyscfg -r lpar -m ibm-p9z-20 -F nam                         \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08e:state\r\n'
before (last 100 chars): ' lssyscfg -r lpar -m ibm-p9z-20 -F na\r<h@p9-vhmc1'
after: ':~>'
match: <_sre.SRE_Match object; span=(49, 52), match=':~>'>
match_index: 0
exitstatus: None
flag_eof: False
pid: 4421
child_fd: 8
closed: False
timeout: 30
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
~~~
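The difference between the two `before` strings can be shown with a small parsing sketch. `parse_name_state()` is a hypothetical stand-in written for this comparison and is not fence_lpar's actual parsing code; it only assumes the `name:state` line format requested by `-F name:state`.

~~~python
def parse_name_state(before):
    """Collect name:state pairs from text captured before the prompt.

    Hypothetical illustration only; fence_lpar uses its own regexes.
    """
    statuses = {}
    for line in before.split("\n"):
        # Stray carriage returns at the line edges do not matter here.
        line = line.strip("\r ")
        name, sep, state = line.partition(":")
        if sep and name and state:
            statuses[name] = state
    return statuses


# Tail of conn.before when fence_lpar is run from the command line.
cli_before = ("\np9-node4:Running\r\np9-node2:Running\r\n"
              "ibm-p9z-20-vios1:Running\r\nsv-hanahmc_prd-aunxhmc2a-z2h@p9-vhmc1")

# conn.before when fence_lpar is run by Pacemaker.
pcmk_before = " lssyscfg -r lpar -m ibm-p9z-20 -F na\r<h@p9-vhmc1"

print(parse_name_state(cli_before))
print(parse_name_state(pcmk_before))
~~~

The command-line capture yields the partition states, while the Pacemaker capture contains no `name:state` lines at all, only the collapsed command echo.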

Note the '\x08' (backspace) characters in the buffer, and note that the `conn.before` string gets collapsed. It's not truncated. It's actually missing characters **in the middle**. "h@p9-vhmc1" is the correct end of the conn.before string, and it's still present. A section in the middle gets replaced by "\r<".
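As a rough sketch of what the backspaces in the buffer do, the helper below treats each '\x08' as deleting the previous character. That is a simplification (a real terminal only moves the cursor left), and `apply_backspaces()` is a hypothetical name. The sample mirrors the shape of the buffer above; the run lengths of spaces and backspaces are illustrative, not counted from the real capture.

~~~python
def apply_backspaces(data):
    """Resolve backspaces by dropping the character before each '\\x08'.

    Simplified model for illustration; not part of fence_lpar.
    """
    out = []
    for ch in data:
        if ch == "\x08":
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)


# Shaped like the Pacemaker-run buffer: padding spaces, then an equal run of
# backspaces, then the rest of the command. Run length 25 is an assumption.
sample = (" lssyscfg -r lpar -m ibm-p9z-20 -F nam"
          + " " * 25 + "\x08" * 25 + "e:state\r\n")

print(repr(apply_backspaces(sample)))
~~~

Under this model the padding cancels out and the intended command line reappears, which suggests the extra characters are line-editing redraw output rather than lost data.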

Pacemaker plays no role in initiating the SSH session or parsing its output; it simply launches fence_lpar and lets it run. So the difference in behavior depending on how fence_lpar is executed is baffling. All of the same options are passed. Additionally, I took extra steps to ensure that fence_lpar explicitly set all the user environment variables before running the SSH command, since Pacemaker doesn't set them. This made no difference and the same errors occurred. I also made sure that the same SSH client options are used when Pacemaker launches fence_lpar.

-----

Version-Release number of selected component (if applicable):

fence-agents-lpar-4.2.1-30.el8_1.1.noarch

-----

How reproducible:

Always

-----

Steps to Reproduce:
1. Configure a fence_lpar stonith device so that the HMC prompt plus the lssyscfg command line is greater than 80 characters long.
2a. Allow Pacemaker to run a start/monitor operation.
2b. Try to fence a node.
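To check whether a given configuration meets the condition in step 1, the line length can be estimated as below. This is a sketch under assumptions: the prompt format `login@hmc:~> ` is inferred from the debug output in this report, the 80-column limit is the one observed above, and `echoed_line_length()` is a hypothetical helper.

~~~python
TERMINAL_COLUMNS = 80  # width at which this report observes the wrap


def echoed_line_length(login, hmc_short_hostname, managed):
    """Estimate the length of the HMC prompt plus the lssyscfg command line.

    The prompt format is an assumption based on the logs in this report.
    """
    prompt = "%s@%s:~> " % (login, hmc_short_hostname)
    command = "lssyscfg -r lpar -m %s -F name:state" % managed
    return len(prompt) + len(command)


# Login names and managed system taken from the example resources in this report.
long_login = echoed_line_length("sv-hanahmc_prd-aunxhmc2a-z2h", "p9-vhmc1", "ibm-p9z-20")
short_login = echoed_line_length("bkr_operator", "p9-vhmc1", "ibm-p9z-20")

print(long_login, long_login > TERMINAL_COLUMNS)
print(short_login, short_login > TERMINAL_COLUMNS)
~~~

Under these assumptions the long login name pushes the line past 80 characters and the short one does not, which matches the reproducing and non-reproducing examples in this report.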

-----

Actual results:

a. The monitor operation fails with "Unable to parse output of list command."
b. Fencing fails with EC_STATUS_HMC ("Either unable to obtain correct plug status, partition is not available or incorrect HMC version used").

-----

Expected results:

Monitoring and fencing succeed.

-----

Additional information:

This bug renders fence_lpar unusable unless a user is able to reduce the length of the HMC username and/or the managed system name. An accelerated fix may be in order.

--- Additional comment from Reid Wahl on 2020-07-25 00:28:02 UTC ---

Example credentials that reproduce the issue:

 Resource: lpar (class=stonith type=fence_lpar)
  Attributes: ipaddr=p9-vhmc1.fs.lab.eng.bos.redhat.com login=sv-hanahmc_prd-aunxhmc2a-z2h managed=ibm-p9z-20 passwd=password pcmk_host_map=node2:node2 verbose=1


Example credentials that do NOT reproduce the issue because the username is shorter:

 Resource: lpar (class=stonith type=fence_lpar)
  Attributes: ipaddr=p9-vhmc1.fs.lab.eng.bos.redhat.com login=bkr_operator managed=ibm-p9z-20 passwd=beaker01 pcmk_host_map=node2:node2 verbose=1

Comment 16 errata-xmlrpc 2020-11-10 12:56:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: fence-agents security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5003

