1628947 – crm_resource's terminal handling is fishy

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1628947 - crm_resource's terminal handling is fishy

Summary: crm_resource's terminal handling is fishy

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	pacemaker
Sub Component:
Version:	7.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	7.6
Assignee:	Ken Gaillot
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1574898
TreeView+	depends on / blocked

Reported:	2018-09-14 13:23 UTC by Radek Steiger
Modified:	2018-10-30 07:59 UTC (History)
CC List:	9 users (show)
Fixed In Version:	pacemaker-1.1.19-8.el7
Doc Type:	No Doc Update
Doc Text:	Not of interest to most end users
Clone Of:
Environment:
Last Closed:	2018-10-30 07:58:22 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2018:3055	0	None	None	None	2018-10-30 07:59:03 UTC

Description Radek Steiger 2018-09-14 13:23:39 UTC

> Summary:

This issue arised in pcs after some changes have been to command execution. It turned out the issue seems to be in how crm_resource works with terminal and verbose output.


> Version:

Tested with: pacemaker-cli-1.1.19-7.el7


> Description:

When crm_resource is run locally from normal shell session, everything is okay:

[root@virt-126 ~]# /usr/sbin/crm_resource -r dummy0 --force-start --verbose --verbose | head -n 5
Operation start for dummy0 (ocf:heartbeat:Dummy) returned: 'ok' (0)
 >  stderr: +++ 20:59:31: ocf_start_trace:1000: echo
 >  stderr: +++ 20:59:31: ocf_start_trace:1000: printenv
 >  stderr: +++ 20:59:31: ocf_start_trace:1000: sort
 >  stderr: ++ 20:59:31: ocf_start_trace:1000: env='


Running through ssh command execution however most of the output is discarded and never returned back:

[root@virt-126 ~]# ssh root@localhost /usr/sbin/crm_resource -r dummy0 --force-start --verbose --verbose | head -n 5
Operation start for dummy0 (ocf:heartbeat:Dummy) returned: 'ok' (0)

Same happens with qarsh (an important QA tool for remote execution):

[root@virt-126 ~]# qarsh localhost /usr/sbin/crm_resource -r dummy0 --force-start --verbose --verbose | head -n 5
Operation start for dummy0 (ocf:heartbeat:Dummy) returned: 'ok' (0)


Forcing terminal allocation in ssh seems to workaround the behavior:

[root@virt-126 pcs]# ssh -t root@localhost /usr/sbin/crm_resource -r dummy0 --force-start --verbose --verbose | head -n 5
Connection to localhost closed.
Operation start for dummy0 (ocf:heartbeat:Dummy) returned: 'ok' (0)
 >  stderr: +++ 15:15:28: ocf_start_trace:1000: echo
 >  stderr: +++ 15:15:28: ocf_start_trace:1000: printenv
 >  stderr: +++ 15:15:28: ocf_start_trace:1000: sort
 >  stderr: ++ 15:15:28: ocf_start_trace:1000: env='

Just like `unbuffer` wrapper from expect package does:

[root@virt-126 ~]# ssh root@localhost unbuffer /usr/sbin/crm_resource -r dummy0 --force-start --verbose --verbose | head -n 5
Operation start for dummy0 (ocf:heartbeat:Dummy) returned: 'ok' (0)
 >  stderr: +++ 21:00:18: ocf_start_trace:1000: echo
 >  stderr: +++ 21:00:18: ocf_start_trace:1000: printenv
 >  stderr: +++ 21:00:18: ocf_start_trace:1000: sort
 >  stderr: ++ 21:00:18: ocf_start_trace:1000: env='

[root@virt-126 ~]# qarsh localhost unbuffer /usr/sbin/crm_resource -r dummy0 --force-start --verbose --verbose | head -n 5
Operation start for dummy0 (ocf:heartbeat:Dummy) returned: 'ok' (0)
 >  stderr: +++ 21:01:04: ocf_start_trace:1000: echo
 >  stderr: +++ 21:01:04: ocf_start_trace:1000: printenv
 >  stderr: +++ 21:01:04: ocf_start_trace:1000: sort
 >  stderr: ++ 21:01:04: ocf_start_trace:1000: env='


PCS has got a change in execution implementation due to an unrelated problem with SBD and now suffers from the same problem as discovered in reproduction of bug 1574898, which is now blocked by this behavior.

Comment 1 Ken Gaillot 2018-09-14 22:55:08 UTC

This is a bizarre problem, but thankfully, it's someone else's problem ;-)

It has something to do with the resource-agents package's ocf-shellfuncs library. The minimal reproducer is:
---
cd /root

cat >called.sh <<"EOF"
#!/bin/sh
. /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs
echo hello world
exit 0
EOF

chmod a+x called.sh
---

(Being root lets the library create a trace file under /var/lib. You could probably set OCF_TRACE_FILE and run it as a regular user, but I wanted to show the minimal reproducer.)

If you run the script interactively, with or without an action name, you get the expected output:
---
# OCF_TRACE_RA=0 ./called.sh
hello world
# OCF_TRACE_RA=1 ./called.sh
+ 17:49:55: 3: echo hello world
hello world
+ 17:49:55: 4: exit 0
# OCF_TRACE_RA=0 ./called.sh start
hello world
# OCF_TRACE_RA=1 ./called.sh start
+ 17:45:24: 3: echo hello world
hello world
+ 17:45:24: 4: exit 0
---

If you call the script via ssh *without* an action name, you get the expected output, but if you call the script via ssh *with* an action name, you don't get trace output:
---
% ssh root@rhel7-1 OCF_TRACE_RA=0 ./called.sh 
hello world
% ssh root@rhel7-1 OCF_TRACE_RA=1 ./called.sh 
hello world
+ 17:48:21: 3: echo hello world
+ 17:48:21: 4: exit 0
% ssh root@rhel7-1 OCF_TRACE_RA=0 ./called.sh start
hello world
% ssh root@rhel7-1 OCF_TRACE_RA=1 ./called.sh start
hello world
---

That's as far as I investigated.

Comment 4 Ken Gaillot 2018-09-20 14:30:14 UTC

The best solution will probably be to have crm_resource call the resource agent with a pseudo-terminal, so that ocf-shellfuncs sends log messages to stderr. Even if crm_resource is called from a script, it makes sense to log to stderr.

Comment 6 Ken Gaillot 2018-09-20 18:34:59 UTC

Ping-ponging back to pacemaker. The latest plan is for crm_resource to set OCF_TRACE_FILE to /dev/stderr in the environment, which should make the ocf-shellfuncs library log to stderr without having to do pseudo-terminal gymnastics.

Comment 8 Ken Gaillot 2018-09-25 02:57:53 UTC

Fixed in upstream 1.1 branch by commit d351a32

Comment 12 errata-xmlrpc 2018-10-30 07:58:22 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3055

Note You need to log in before you can comment on or make changes to this bug.