
Bug 1860544

Summary: fence_lpar: Long username, HMC hostname, or managed system name causes failures [RHEL 8]
Product: Red Hat Enterprise Linux 8
Reporter: Reid Wahl <nwahl>
Component: fence-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 8.2
CC: bfrank, cfeist, cluster-maint, cmackows, hannsj_uhl, sbradley
Target Milestone: rc
Keywords: ZStream
Target Release: 8.0
Flags: pm-rhel: mirror+
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: fence-agents-4.2.1-52.el8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1860545 1861138 1861139
Environment:
Last Closed: 2020-11-04 02:29:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1860545, 1861138, 1861139, 1862461, 1862462

Description Reid Wahl 2020-07-25 00:26:32 UTC
Description of problem:

fence_lpar fails when the complete lssyscfg command line (including the prompt) is longer than 80 characters on the HMC. This happens with a long user name, HMC hostname, and/or managed system name.

**Note: This happens ONLY when fence_lpar is executed by pacemaker. It does NOT happen when fence_lpar is executed from the command line.**


A long command line gets a carriage return ('\r') inserted at the 80-character mark and wraps back to the beginning of the same line with no line feed ('\n'), so the characters that follow overwrite the ones already displayed.

**Note: Adding `repr()` to fspawn.log_expect() makes debugging much easier, since literal carriage return characters overwrite displayed characters.**
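As a small illustration of why `repr()` helps here, the sketch below formats a received string both raw and escaped. The helper name `format_debug` is hypothetical and is not part of fence-agents; the sample string is the `conn.before` value captured in this report.

```python
def format_debug(label, data):
    """Build a log line in which control characters appear as visible escapes.

    Hypothetical helper for illustration only; not the fencing library's code.
    """
    return "{}: {!r}".format(label, data)


# Value of conn.before as captured when Pacemaker runs fence_lpar.
received = " lssyscfg -r lpar -m ibm-p9z-20 -F na\r<h@p9-vhmc1"

# Raw output: on a terminal, the text after '\r' is drawn over the line start.
print("Received: " + received)

# Escaped output: the carriage return is shown as the two characters \ and r.
print(format_debug("Received", received))
```

With the escaped form, the position of every '\r' is visible in the log instead of being hidden by the overwrite.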


fence_lpar's regex matches handle this fine when it's run from the command line. The regexes look for particular patterns (e.g., "\n" or ",state=(.*?),"), and the '\r' characters don't get in the way of this.
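The following is a minimal sketch of line-oriented parsing that tolerates stray '\r' characters. The pattern and the function `parse_name_state` are assumptions written for this illustration and are not the regexes fence_lpar actually uses; the sample text is modeled on `lssyscfg ... -F name:state` output.

```python
import re

# One "name:state" pair per line; any trailing carriage returns are ignored.
LINE_RE = re.compile(r"^([^:\r\n]+):([^\r\n]*)\r*$", re.MULTILINE)


def parse_name_state(output):
    """Return a {partition_name: state} dict from name:state lines."""
    return {name: state for name, state in LINE_RE.findall(output)}


sample = (
    "\np9-node4:Running\r\n"
    "p9-node2:Running\r\n"
    "ibm-p9z-20-vios1:Running\r\n"
)

print(parse_name_state(sample))
```

Because the pattern anchors on '\n' and treats '\r' as optional trailing noise, extra carriage returns alone do not break the match.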

The problem is that when Pacemaker spawns fence_lpar, **for some unknown reason** backspace ('\x08') characters appear in the output at the point where the line wraps with '\r'. This seems to overwrite part of the conn.before string.

~~~
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:10,775 INFO: Running command: /usr/bin/ssh  sv-hanahmc_prd-aunxhmc2a-z2h.lab.eng.bos.redhat.com -p 22 -o PubkeyAuthentication=no ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:12,045 DEBUG: Received: '\rPa''ssword:' ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:12,045 DEBUG: Sent: password ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [  ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:13,074 DEBUG: Received: ' \r\nLast login: Sat Jul 25 00:20:00 2020 from 10.3.112.217\r\r\nsv-hanahmc_prd-aunxhmc2a-z2h@p9-vhmc1'':~>' ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:13,074 DEBUG: Sent: lssyscfg -r lpar -m ibm-p9z-20 -F name:state ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [  ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:13,238 DEBUG: Received: ' lssyscfg -r lpar -m ibm-p9z-20 -F na\r<h@p9-vhmc1'':~>' ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:13,240 ERROR: Unable to parse output of list command ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [  ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [ 2020-07-24 17:20:13,240 ERROR: Please use '-h' for usage ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: warning: fence_lpar[4419] stderr: [  ]
Jul 24 17:20:13 fastvm-rhel-8-0-24 pacemaker-fenced[1607]: notice: Operation 'monitor' [4419] for device 'lpar' returned: -201 (Generic Pacemaker error)
~~~


I added this `with`/`write` block, just before the existing `res == None` check, to get additional debug information about the pexpect.spawn object:

                with open("/tmp/nwahl_conn.out", "w") as f:
                        f.write(str(conn))
                if res == None:
                        fail_usage("Unable to parse output of list command")

When fence_lpar is run from the command line:
~~~
# cat /tmp/nwahl_conn.out 
<fencing.fspawn object at 0x7fe21f082048>
command: /usr/bin/ssh
args: [b'/usr/bin/ssh', b'sv-hanahmc_prd-aunxhmc2a-z2h.lab.eng.bos.redhat.com', b'-p', b'22', b'-o', b'PubkeyAuthentication=no']
buffer (last 100 chars): ' '
before (last 100 chars): '\np9-node4:Running\r\np9-node2:Running\r\nibm-p9z-20-vios1:Running\r\nsv-hanahmc_prd-aunxhmc2a-z2h@p9-vhmc1'
after: ':~>'
match: <_sre.SRE_Match object; span=(829, 832), match=':~>'>
match_index: 0
exitstatus: None
flag_eof: False
pid: 4389
child_fd: 6
closed: False
timeout: 30
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
~~~

When fence_lpar is run by Pacemaker:
~~~
# cat /tmp/nwahl_conn.out 
<fencing.fspawn object at 0x7f0c4b11e550>
command: /usr/bin/ssh
args: [b'/usr/bin/ssh', b'sv-hanahmc_prd-aunxhmc2a-z2h.lab.eng.bos.redhat.com', b'-p', b'22', b'-o', b'PubkeyAuthentication=no']
buffer (last 100 chars): ' lssyscfg -r lpar -m ibm-p9z-20 -F nam                         \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08e:state\r\n'
before (last 100 chars): ' lssyscfg -r lpar -m ibm-p9z-20 -F na\r<h@p9-vhmc1'
after: ':~>'
match: <_sre.SRE_Match object; span=(49, 52), match=':~>'>
match_index: 0
exitstatus: None
flag_eof: False
pid: 4421
child_fd: 8
closed: False
timeout: 30
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
~~~

Note the '\x08' (backspace) characters in the buffer, and note that the `conn.before` string gets collapsed. It's not truncated. It's actually missing characters **in the middle**. "h@p9-vhmc1" is the correct end of the conn.before string, and it's still present. A section in the middle gets replaced by "\r<".
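To see how the raw buffer would appear on screen, the control characters can be replayed by hand. The sketch below is a simplified single-line terminal model; `render_terminal_line` is a hypothetical helper, not part of pexpect or fence-agents, and the run of 25 spaces and 25 backspaces is an approximation of the buffer dump above.

```python
def render_terminal_line(raw):
    """Replay '\\b' and '\\r' on a single line and return the visible text.

    Simplified model: backspace moves the cursor left, carriage return moves
    it to column 0, and printable characters overwrite the cell under the
    cursor. Processing stops at the first line feed.
    """
    cells = []
    cursor = 0
    for ch in raw:
        if ch == "\b":
            cursor = max(cursor - 1, 0)
        elif ch == "\r":
            cursor = 0
        elif ch == "\n":
            break
        else:
            if cursor < len(cells):
                cells[cursor] = ch
            else:
                cells.append(ch)
            cursor += 1
    return "".join(cells)


# Approximation of the buffer seen when Pacemaker runs fence_lpar.
buffer_dump = (
    " lssyscfg -r lpar -m ibm-p9z-20 -F nam"
    + " " * 25
    + "\x08" * 25
    + "e:state\r\n"
)

print(repr(render_terminal_line(buffer_dump).rstrip()))
# -> ' lssyscfg -r lpar -m ibm-p9z-20 -F name:state'
```

On screen the command looks intact, which is why the damage is only apparent when the raw bytes are inspected.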

Pacemaker plays no role in initiating the SSH session or parsing its output; it simply launches fence_lpar and lets it run. So the difference in behavior depending on how fence_lpar is executed is baffling. All of the same options are passed. Additionally, I took extra steps to ensure that fence_lpar explicitly set all the user environment variables before running the SSH command, since Pacemaker doesn't set them. This made no difference and the same errors occurred. I also made sure that the same SSH client options are used when Pacemaker launches fence_lpar.

-----

Version-Release number of selected component (if applicable):

fence-agents-lpar-4.2.1-30.el8_1.1.noarch

-----

How reproducible:

Always

-----

Steps to Reproduce:
1. Configure a fence_lpar stonith device so that the HMC prompt plus the lssyscfg command line is greater than 80 characters long.
2a. Allow Pacemaker to run a start/monitor operation.
2b. Try to fence a node.
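
For step 1, a quick way to check whether a given configuration falls in the affected range is to add up the prompt and command lengths. This is a rough sketch: the function name `line_exceeds_width` is hypothetical, the prompt format (user@host:~> followed by a space) is assumed from the logs above, and the 80-column limit is the width described in this report.

```python
TERMINAL_WIDTH = 80  # column at which the HMC shell wraps, per this report


def line_exceeds_width(prompt, command, width=TERMINAL_WIDTH):
    """Return True if prompt plus command is longer than the terminal width."""
    return len(prompt) + len(command) > width


command = "lssyscfg -r lpar -m ibm-p9z-20 -F name:state"

# Long user name from this report: the combined line is longer than 80 columns.
long_prompt = "sv-hanahmc_prd-aunxhmc2a-z2h@p9-vhmc1:~> "
print(line_exceeds_width(long_prompt, command))

# A shorter, made-up user name for comparison: the line fits.
short_prompt = "hscroot@p9-vhmc1:~> "
print(line_exceeds_width(short_prompt, command))
```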

-----

Actual results:

a. The monitor operation fails with "Unable to parse output of list command."
b. Fencing fails with EC_STATUS_HMC ("Either unable to obtain correct plug status, partition is not available or incorrect HMC version used").

-----

Expected results:

Monitoring and fencing succeed.

-----

Additional information:

This bug renders fence_lpar unusable unless a user is able to reduce the length of the HMC username and/or the managed system name. An accelerated fix may be in order.

Comment 13 errata-xmlrpc 2020-11-04 02:29:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (fence-agents bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4622