Bug 2093479

Summary: Some pcs commands fail when /proc/cmdline includes debug
Product: Red Hat Enterprise Linux 8 Reporter: Reid Wahl <nwahl>
Component: systemdAssignee: David Tardon <dtardon>
Status: CLOSED ERRATA QA Contact: Frantisek Sumsal <fsumsal>
Severity: medium Docs Contact:
Priority: medium    
Version: 8.5CC: cluster-maint, dtardon, idevat, jamacku, kmalyjur, mlisik, mpospisi, omular, sbradley, systemd-maint-list, tojeline
Target Milestone: rcKeywords: Bugfix, Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: systemd-239-65.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2118297 (view as bug list) Environment:
Last Closed: 2022-11-08 10:49:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2118297    

Description Reid Wahl 2022-06-03 20:09:43 UTC
Description of problem:

If debugging is enabled from the kernel command line, some pcs commands fail. This applies at least to `pcs cluster destroy` and `pcs cluster setup`, and possibly to others.

The exact failure behavior can vary depending on pcs version and possibly other factors. In the support case where this issue was discovered, pcs-0.10.8-1.el8 is installed, and `pcs cluster setup` fails as follows:
```
root@node1:/root # pcs cluster setup newcluster --start node1  addr=<addr1> node2 addr=<addr2>
Destroying cluster on hosts: 'node1', 'node2'...
Error: node1: Error destroying cluster:
["Shutting down pacemaker/corosync services...\n"]
["Traceback (most recent call last):\n", "  File \"/usr/sbin/pcs\", line 11, in <module>\n", "    load_entry_point('pcs==0.10.8', 'console_scripts', 'pcs')()\n", "  File \"/usr/lib/python3.6/site-packages/pcs/app.py\", line 277, in main\n", "    utils.get_library_wrapper(), argv, utils.get_input_modifiers()\n", "  File \"/usr/lib/python3.6/site-packages/pcs/cli/common/routing.py\", line 33, in _router\n", "    return cmd_map[sub_cmd](lib, argv_next, modifiers)\n", "  File \"/usr/lib/python3.6/site-packages/pcs/cli/common/routing.py\", line 33, in _router\n", "    return cmd_map[sub_cmd](lib, argv_next, modifiers)\n", "  File \"/usr/lib/python3.6/site-packages/pcs/cluster.py\", line 1348, in cluster_destroy\n", "    utils.stop_service(service)\n", "  File \"/usr/lib/python3.6/site-packages/pcs/utils.py\", line 2221, in stop_service\n", "    [settings.systemctl_binary, \"stop\", service]\n", "  File \"/usr/lib/python3.6/site-packages/pcs/lib/external.py\", line 132, in run\n", "    out_std, out_err = process.communicate(stdin_string)\n", "  File \"/usr/lib64/python3.6/subprocess.py\", line 863, in communicate\n", "    stdout, stderr = self._communicate(input, endtime, timeout)\n", "  File \"/usr/lib64/python3.6/subprocess.py\", line 1578, in _communicate\n", "    self.stderr.errors)\n", "  File \"/usr/lib64/python3.6/subprocess.py\", line 760, in _translate_newlines\n", "    data = data.decode(encoding, errors)\n", "UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 30: ordinal not in range(128)\n"]
1
...
```


I tested on pcs-0.10.10-4.el8_5.1, and my results are below. They differ from the customer's results but are failures nonetheless.

Without debug:
```
[root@fastvm-rhel-8-0-23 ~]# pcs cluster setup testcluster node1 node2 --start --force 
...
Cluster has been successfully set up.
Starting cluster on hosts: 'node1', 'node2'...
```

With debug:
```
[root@fastvm-rhel-8-0-23 ~]# pcs cluster destroy --all
Warning: Unable to load CIB to get guest and remote nodes from it, those nodes will not be deconfigured.
node2: Stopping Cluster (pacemaker)...
node1: Error connecting to node1 - (HTTP error: 400)
node1: Error connecting to node1 - (HTTP error: 400)
node2: Successfully destroyed cluster
Error: unable to destroy cluster
node1: Error connecting to node1 - (HTTP error: 400)

[root@fastvm-rhel-8-0-23 ~]# pcs cluster setup testcluster node1 node2 --start --force 
No addresses specified for host 'node1', using 'node1'
No addresses specified for host 'node2', using 'node2'
Error: node1: Required cluster services not installed: 'corosync', 'pacemaker'
Warning: node1: The host seems to be in a cluster already as cluster configuration files have been found on the host. If the host is not part of a cluster, run 'pcs cluster destroy' on host 'node1' to remove those configuration files
Error: Errors have occurred, therefore pcs is unable to continue
```

-----

Version-Release number of selected component (if applicable):

pcs-0.10.8-1.el8
pcs-0.10.10-4.el8_5.1

-----

How reproducible:

Always

-----

Steps to Reproduce:
1. Boot with `debug` in the kernel command line.

[root@fastvm-rhel-8-0-23 ~]# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-305.3.1.el8_4.x86_64 root=/dev/mapper/r8vg-root_lv ro crashkernel=128M resume=/dev/mapper/r8vg-swap_lv rd.lvm.lv=r8vg/root_lv rd.lvm.lv=r8vg/swap_lv console=ttyS0,115200n8 fips=0 boot=UUID=17b0762b-92c6-474a-9d97-e8ceaa6825a6 debug

2. Run `pcs cluster destroy --all` and/or `pcs cluster setup ...`

-----

Actual results:

One or both commands fail, possibly with an error like one of the ones demonstrated in the Description. As noted there, the exact details of the failure seem variable depending on package versions, etc.

-----

Expected results:

Commands succeed.

-----

Additional info:

I feel like I've filed this as a bug within the past couple of years, but I can't find any such BZ or other documentation...

Comment 1 Tomas Jelinek 2022-06-06 14:53:20 UTC
The root issue is that systemctl is printing non-ASCII characters even if pcs sets LC_ALL=C.

# LC_ALL=C SYSTEMD_LOG_LEVEL=debug systemctl status pacemaker
Bus n/a: changing state UNSET → OPENING
Bus n/a: changing state OPENING → AUTHENTICATING
Successfully forked off '(pager)' as PID 1349.
Showing one /org/freedesktop/systemd1/unit/pacemaker_2eservice
Bus n/a: changing state AUTHENTICATING <E2><86><92> RUNNING
{...omitted...}
Bus n/a: changing state RUNNING <E2><86><92> CLOSED
* pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2022-06-06 16:33:48 CEST; 8min ago
     Docs: man:pacemakerd
           https://clusterlabs.org/pacemaker/doc/
 Main PID: 1262 (pacemakerd)
    Tasks: 7
   Memory: 62.8M
   CGroup: /system.slice/pacemaker.service
           |-1262 /usr/sbin/pacemakerd
           |-1263 /usr/libexec/pacemaker/pacemaker-based
           |-1264 /usr/libexec/pacemaker/pacemaker-fenced
           |-1265 /usr/libexec/pacemaker/pacemaker-execd
           |-1266 /usr/libexec/pacemaker/pacemaker-attrd
           |-1267 /usr/libexec/pacemaker/pacemaker-schedulerd
           `-1268 /usr/libexec/pacemaker/pacemaker-controld

The output consists of mixed stdout and stderr. If you want to see just the debug log, redirect stdout to /dev/null.
Those '→' characters are non-ASCII.
Note that LC_ALL=C makes systemctl to print the process tree using ASCII characters.

Comment 2 Tomas Jelinek 2022-06-17 08:21:47 UTC
Moving to systemd for further investigation.

Comment 3 David Tardon 2022-06-24 08:30:15 UTC
I've submitted https://github.com/systemd/systemd/pull/23821 (only the first commit is needed to fix this issue).

Comment 4 Plumber Bot 2022-08-18 11:55:36 UTC
fix merged to github master branch -> https://github.com/redhat-plumbers/systemd-rhel8/pull/308

Comment 5 Plumber Bot 2022-08-18 11:55:55 UTC
fix merged to github master branch -> https://github.com/redhat-plumbers/systemd-rhel8/pull/307

Comment 9 errata-xmlrpc 2022-11-08 10:49:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (systemd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7727