Bug 2093479
| Summary: | Some pcs commands fail when /proc/cmdline includes debug | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Reid Wahl <nwahl> | |
| Component: | systemd | Assignee: | David Tardon <dtardon> | |
| Status: | CLOSED ERRATA | QA Contact: | Frantisek Sumsal <fsumsal> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 8.5 | CC: | cluster-maint, dtardon, idevat, jamacku, kmalyjur, mlisik, mpospisi, omular, sbradley, systemd-maint-list, tojeline | |
| Target Milestone: | rc | Keywords: | Bugfix, Triaged | |
| Target Release: | --- | Flags: | pm-rhel:
mirror+
|
|
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | systemd-239-65.el8 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2118297 (view as bug list) | Environment: | ||
| Last Closed: | 2022-11-08 10:49:56 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2118297 | |||
The root issue is that systemctl is printing non-ASCII characters even if pcs sets LC_ALL=C.
# LC_ALL=C SYSTEMD_LOG_LEVEL=debug systemctl status pacemaker
Bus n/a: changing state UNSET → OPENING
Bus n/a: changing state OPENING → AUTHENTICATING
Successfully forked off '(pager)' as PID 1349.
Showing one /org/freedesktop/systemd1/unit/pacemaker_2eservice
Bus n/a: changing state AUTHENTICATING <E2><86><92> RUNNING
{...omitted...}
Bus n/a: changing state RUNNING <E2><86><92> CLOSED
* pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2022-06-06 16:33:48 CEST; 8min ago
Docs: man:pacemakerd
https://clusterlabs.org/pacemaker/doc/
Main PID: 1262 (pacemakerd)
Tasks: 7
Memory: 62.8M
CGroup: /system.slice/pacemaker.service
|-1262 /usr/sbin/pacemakerd
|-1263 /usr/libexec/pacemaker/pacemaker-based
|-1264 /usr/libexec/pacemaker/pacemaker-fenced
|-1265 /usr/libexec/pacemaker/pacemaker-execd
|-1266 /usr/libexec/pacemaker/pacemaker-attrd
|-1267 /usr/libexec/pacemaker/pacemaker-schedulerd
`-1268 /usr/libexec/pacemaker/pacemaker-controld
The output consists of mixed stdout and stderr. If you want to see just the debug log, redirect stdout to /dev/null.
Those '→' characters are non-ASCII.
Note that LC_ALL=C makes systemctl to print the process tree using ASCII characters.
Moving to systemd for further investigation. I've submitted https://github.com/systemd/systemd/pull/23821 (only the first commit is needed to fix this issue). fix merged to github master branch -> https://github.com/redhat-plumbers/systemd-rhel8/pull/308 fix merged to github master branch -> https://github.com/redhat-plumbers/systemd-rhel8/pull/307 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (systemd bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:7727 |
Description of problem: If debugging is enabled from the kernel command line, some pcs commands fail. This applies at least to `pcs cluster destroy` and `pcs cluster setup`, and possibly to others. The exact failure behavior can vary depending on pcs version and possibly other factors. In the support case where this issue was discovered, pcs-0.10.8-1.el8 is installed, and `pcs cluster setup` fails as follows: ``` root@node1:/root # pcs cluster setup newcluster --start node1 addr=<addr1> node2 addr=<addr2> Destroying cluster on hosts: 'node1', 'node2'... Error: node1: Error destroying cluster: ["Shutting down pacemaker/corosync services...\n"] ["Traceback (most recent call last):\n", " File \"/usr/sbin/pcs\", line 11, in <module>\n", " load_entry_point('pcs==0.10.8', 'console_scripts', 'pcs')()\n", " File \"/usr/lib/python3.6/site-packages/pcs/app.py\", line 277, in main\n", " utils.get_library_wrapper(), argv, utils.get_input_modifiers()\n", " File \"/usr/lib/python3.6/site-packages/pcs/cli/common/routing.py\", line 33, in _router\n", " return cmd_map[sub_cmd](lib, argv_next, modifiers)\n", " File \"/usr/lib/python3.6/site-packages/pcs/cli/common/routing.py\", line 33, in _router\n", " return cmd_map[sub_cmd](lib, argv_next, modifiers)\n", " File \"/usr/lib/python3.6/site-packages/pcs/cluster.py\", line 1348, in cluster_destroy\n", " utils.stop_service(service)\n", " File \"/usr/lib/python3.6/site-packages/pcs/utils.py\", line 2221, in stop_service\n", " [settings.systemctl_binary, \"stop\", service]\n", " File \"/usr/lib/python3.6/site-packages/pcs/lib/external.py\", line 132, in run\n", " out_std, out_err = process.communicate(stdin_string)\n", " File \"/usr/lib64/python3.6/subprocess.py\", line 863, in communicate\n", " stdout, stderr = self._communicate(input, endtime, timeout)\n", " File \"/usr/lib64/python3.6/subprocess.py\", line 1578, in _communicate\n", " self.stderr.errors)\n", " File \"/usr/lib64/python3.6/subprocess.py\", line 760, in _translate_newlines\n", " data = data.decode(encoding, errors)\n", "UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 30: ordinal not in range(128)\n"] 1 ... ``` I tested on pcs-0.10.10-4.el8_5.1, and my results are below. They differ from the customer's results but are failures nonetheless. Without debug: ``` [root@fastvm-rhel-8-0-23 ~]# pcs cluster setup testcluster node1 node2 --start --force ... Cluster has been successfully set up. Starting cluster on hosts: 'node1', 'node2'... ``` With debug: ``` [root@fastvm-rhel-8-0-23 ~]# pcs cluster destroy --all Warning: Unable to load CIB to get guest and remote nodes from it, those nodes will not be deconfigured. node2: Stopping Cluster (pacemaker)... node1: Error connecting to node1 - (HTTP error: 400) node1: Error connecting to node1 - (HTTP error: 400) node2: Successfully destroyed cluster Error: unable to destroy cluster node1: Error connecting to node1 - (HTTP error: 400) [root@fastvm-rhel-8-0-23 ~]# pcs cluster setup testcluster node1 node2 --start --force No addresses specified for host 'node1', using 'node1' No addresses specified for host 'node2', using 'node2' Error: node1: Required cluster services not installed: 'corosync', 'pacemaker' Warning: node1: The host seems to be in a cluster already as cluster configuration files have been found on the host. If the host is not part of a cluster, run 'pcs cluster destroy' on host 'node1' to remove those configuration files Error: Errors have occurred, therefore pcs is unable to continue ``` ----- Version-Release number of selected component (if applicable): pcs-0.10.8-1.el8 pcs-0.10.10-4.el8_5.1 ----- How reproducible: Always ----- Steps to Reproduce: 1. Boot with `debug` in the kernel command line. [root@fastvm-rhel-8-0-23 ~]# cat /proc/cmdline BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-305.3.1.el8_4.x86_64 root=/dev/mapper/r8vg-root_lv ro crashkernel=128M resume=/dev/mapper/r8vg-swap_lv rd.lvm.lv=r8vg/root_lv rd.lvm.lv=r8vg/swap_lv console=ttyS0,115200n8 fips=0 boot=UUID=17b0762b-92c6-474a-9d97-e8ceaa6825a6 debug 2. Run `pcs cluster destroy --all` and/or `pcs cluster setup ...` ----- Actual results: One or both commands fail, possibly with an error like one of the ones demonstrated in the Description. As noted there, the exact details of the failure seem variable depending on package versions, etc. ----- Expected results: Commands succeed. ----- Additional info: I feel like I've filed this as a bug within the past couple of years, but I can't find any such BZ or other documentation...