Bug 1783846 - rhel worker can not get ready and crio service is reporting "Unknown option --persist-dir"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.3.0
Assignee: Jindrich Novy
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-12-16 04:56 UTC by Johnny Liu
Modified: 2020-01-23 11:19 UTC (History)

Fixed In Version: conmon-2.0.8-2.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-23 11:19:25 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2020:0062  0        None      None    None     2020-01-23 11:19:49 UTC

Description Johnny Liu 2019-12-16 04:56:52 UTC
Description of problem:


Version-Release number of selected component (if applicable):
openshift-ansible-4.3.0-201912130552.git.177.65373bf.el7.noarch
cri-o-1.16.2-3.dev.rhaos4.3.gitd575ff8.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Scale up a RHEL worker to join an existing cluster.

Actual results:
The openshift-ansible playbook failed:
TASK [openshift_node : Wait for nodes to report ready] *************************
Monday 16 December 2019  11:51:56 +0800 (0:00:12.695)       0:09:50.536 ******* 
FAILED - RETRYING: Wait for nodes to report ready (36 retries left).
FAILED - RETRYING: Wait for nodes to report ready (35 retries left).
FAILED - RETRYING: Wait for nodes to report ready (34 retries left).
FAILED - RETRYING: Wait for nodes to report ready (33 retries left).
FAILED - RETRYING: Wait for nodes to report ready (32 retries left).
FAILED - RETRYING: Wait for nodes to report ready (31 retries left).
FAILED - RETRYING: Wait for nodes to report ready (30 retries left).
FAILED - RETRYING: Wait for nodes to report ready (29 retries left).
FAILED - RETRYING: Wait for nodes to report ready (28 retries left).
FAILED - RETRYING: Wait for nodes to report ready (27 retries left).
FAILED - RETRYING: Wait for nodes to report ready (26 retries left).
FAILED - RETRYING: Wait for nodes to report ready (25 retries left).
FAILED - RETRYING: Wait for nodes to report ready (24 retries left).
FAILED - RETRYING: Wait for nodes to report ready (23 retries left).
FAILED - RETRYING: Wait for nodes to report ready (22 retries left).
FAILED - RETRYING: Wait for nodes to report ready (21 retries left).
FAILED - RETRYING: Wait for nodes to report ready (20 retries left).
FAILED - RETRYING: Wait for nodes to report ready (19 retries left).
FAILED - RETRYING: Wait for nodes to report ready (18 retries left).
FAILED - RETRYING: Wait for nodes to report ready (17 retries left).
FAILED - RETRYING: Wait for nodes to report ready (16 retries left).
FAILED - RETRYING: Wait for nodes to report ready (15 retries left).
FAILED - RETRYING: Wait for nodes to report ready (14 retries left).
FAILED - RETRYING: Wait for nodes to report ready (13 retries left).
FAILED - RETRYING: Wait for nodes to report ready (12 retries left).
FAILED - RETRYING: Wait for nodes to report ready (11 retries left).
FAILED - RETRYING: Wait for nodes to report ready (10 retries left).
FAILED - RETRYING: Wait for nodes to report ready (9 retries left).
FAILED - RETRYING: Wait for nodes to report ready (8 retries left).
FAILED - RETRYING: Wait for nodes to report ready (7 retries left).
FAILED - RETRYING: Wait for nodes to report ready (6 retries left).
FAILED - RETRYING: Wait for nodes to report ready (5 retries left).
FAILED - RETRYING: Wait for nodes to report ready (4 retries left).
FAILED - RETRYING: Wait for nodes to report ready (3 retries left).
FAILED - RETRYING: Wait for nodes to report ready (2 retries left).
FAILED - RETRYING: Wait for nodes to report ready (1 retries left).
failed: [ip-10-0-57-254.us-east-2.compute.internal -> localhost] (item=ip-10-0-57-254.us-east-2.compute.internal) => {"ansible_loop_var": "item", "attempts": 36, "changed": true, "cmd": ["oc", "get", "node", "ip-10-0-57-254.us-east-2.compute.internal", "--config=/tmp/installer-Dz1JfD/auth/kubeconfig", "--output=jsonpath={.status.conditions[?(@.type==\"Ready\")].status}"], "delta": "0:00:00.377350", "end": "2019-12-16 11:55:14.715043", "item": "ip-10-0-57-254.us-east-2.compute.internal", "rc": 0, "start": "2019-12-16 11:55:14.337693", "stderr": "", "stderr_lines": [], "stdout": "False", "stdout_lines": ["False"]}
FAILED - RETRYING: Wait for nodes to report ready (36 retries left).
FAILED - RETRYING: Wait for nodes to report ready (35 retries left).
FAILED - RETRYING: Wait for nodes to report ready (34 retries left).
FAILED - RETRYING: Wait for nodes to report ready (33 retries left).
FAILED - RETRYING: Wait for nodes to report ready (32 retries left).
FAILED - RETRYING: Wait for nodes to report ready (31 retries left).
FAILED - RETRYING: Wait for nodes to report ready (30 retries left).
FAILED - RETRYING: Wait for nodes to report ready (29 retries left).
FAILED - RETRYING: Wait for nodes to report ready (28 retries left).
FAILED - RETRYING: Wait for nodes to report ready (27 retries left).
FAILED - RETRYING: Wait for nodes to report ready (26 retries left).
FAILED - RETRYING: Wait for nodes to report ready (25 retries left).
FAILED - RETRYING: Wait for nodes to report ready (24 retries left).
FAILED - RETRYING: Wait for nodes to report ready (23 retries left).
FAILED - RETRYING: Wait for nodes to report ready (22 retries left).
FAILED - RETRYING: Wait for nodes to report ready (21 retries left).
FAILED - RETRYING: Wait for nodes to report ready (20 retries left).
FAILED - RETRYING: Wait for nodes to report ready (19 retries left).
FAILED - RETRYING: Wait for nodes to report ready (18 retries left).
FAILED - RETRYING: Wait for nodes to report ready (17 retries left).
FAILED - RETRYING: Wait for nodes to report ready (16 retries left).
FAILED - RETRYING: Wait for nodes to report ready (15 retries left).
FAILED - RETRYING: Wait for nodes to report ready (14 retries left).
FAILED - RETRYING: Wait for nodes to report ready (13 retries left).
FAILED - RETRYING: Wait for nodes to report ready (12 retries left).
FAILED - RETRYING: Wait for nodes to report ready (11 retries left).
FAILED - RETRYING: Wait for nodes to report ready (10 retries left).
FAILED - RETRYING: Wait for nodes to report ready (9 retries left).
FAILED - RETRYING: Wait for nodes to report ready (8 retries left).
FAILED - RETRYING: Wait for nodes to report ready (7 retries left).
FAILED - RETRYING: Wait for nodes to report ready (6 retries left).
FAILED - RETRYING: Wait for nodes to report ready (5 retries left).
FAILED - RETRYING: Wait for nodes to report ready (4 retries left).
FAILED - RETRYING: Wait for nodes to report ready (3 retries left).
FAILED - RETRYING: Wait for nodes to report ready (2 retries left).
FAILED - RETRYING: Wait for nodes to report ready (1 retries left).
failed: [ip-10-0-57-254.us-east-2.compute.internal -> localhost] (item=ip-10-0-48-70.us-east-2.compute.internal) => {"ansible_loop_var": "item", "attempts": 36, "changed": true, "cmd": ["oc", "get", "node", "ip-10-0-48-70.us-east-2.compute.internal", "--config=/tmp/installer-Dz1JfD/auth/kubeconfig", "--output=jsonpath={.status.conditions[?(@.type==\"Ready\")].status}"], "delta": "0:00:00.292136", "end": "2019-12-16 11:58:33.183704", "item": "ip-10-0-48-70.us-east-2.compute.internal", "rc": 0, "start": "2019-12-16 11:58:32.891568", "stderr": "", "stderr_lines": [], "stdout": "False", "stdout_lines": ["False"]}
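For anyone reproducing this by hand, the retry loop in the failed task can be approximated with a short shell sketch. The `oc get node` query is copied from the task output above. The function names, the `NODE_STATUS_CMD` hook, and the 5-second delay are assumptions made for illustration and are not taken from openshift-ansible.

```shell
#!/usr/bin/env bash
# Sketch of the "Wait for nodes to report ready" poll.
# NODE_STATUS_CMD is a hypothetical hook so the loop can be tried without a cluster;
# by default it runs the same `oc get node` query shown in the task output.

node_ready_status() {
    # $1 = node name, $2 = kubeconfig path
    oc get node "$1" --config="$2" \
        --output='jsonpath={.status.conditions[?(@.type=="Ready")].status}'
}

wait_for_node_ready() {
    # $1 = node name, $2 = kubeconfig path,
    # $3 = retries (36 in the log above), $4 = delay in seconds (assumed value)
    local node=$1 kubeconfig=$2 retries=${3:-36} delay=${4:-5}
    local attempt=1 status=""
    while [ "$attempt" -le "$retries" ]; do
        status=$("${NODE_STATUS_CMD:-node_ready_status}" "$node" "$kubeconfig" 2>/dev/null)
        if [ "$status" = "True" ]; then
            echo "node $node is Ready after $attempt attempt(s)"
            return 0
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    echo "node $node is still not Ready after $retries attempts (last status: ${status:-unknown})" >&2
    return 1
}
```

Calling `wait_for_node_ready ip-10-0-57-254.us-east-2.compute.internal /path/to/kubeconfig` against the affected cluster would be expected to exhaust its retries, because the node keeps reporting `False`.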

Expected results:
The RHEL worker joins the cluster successfully.

Additional info:
All CSRs for the RHEL workers are already approved.
# oc get csr
NAME        AGE   REQUESTOR                                                                   CONDITION
csr-9h6sd   63m   system:node:ip-10-0-57-254.us-east-2.compute.internal                       Approved,Issued
csr-fzb7x   63m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-kfj8n   63m   system:node:ip-10-0-48-70.us-east-2.compute.internal                        Approved,Issued
csr-trqxf   63m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued

On the RHEL worker, the crio service log shows:
[root@ip-10-0-57-254 ~]# systemctl status crio
● crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/crio.service.d
           └─10-default-env.conf
   Active: active (running) since Mon 2019-12-16 04:41:21 UTC; 3min 19s ago
     Docs: https://github.com/cri-o/cri-o
 Main PID: 23513 (crio)
    Tasks: 17
   Memory: 21.6M
   CGroup: /system.slice/crio.service
           └─23513 /usr/bin/crio --enable-metrics=true --metrics-port=9537

Dec 16 04:44:17 ip-10-0-57-254.us-east-2.compute.internal crio[23513]: conmon: option parsing failed: Unknown option --persist-dir
Dec 16 04:44:19 ip-10-0-57-254.us-east-2.compute.internal crio[23513]: conmon: option parsing failed: Unknown option --persist-dir
Dec 16 04:44:20 ip-10-0-57-254.us-east-2.compute.internal crio[23513]: conmon: option parsing failed: Unknown option --persist-dir
Dec 16 04:44:21 ip-10-0-57-254.us-east-2.compute.internal crio[23513]: conmon: option parsing failed: Unknown option --persist-dir
Dec 16 04:44:26 ip-10-0-57-254.us-east-2.compute.internal crio[23513]: conmon: option parsing failed: Unknown option --persist-dir
Dec 16 04:44:31 ip-10-0-57-254.us-east-2.compute.internal crio[23513]: conmon: option parsing failed: Unknown option --persist-dir
Dec 16 04:44:32 ip-10-0-57-254.us-east-2.compute.internal crio[23513]: conmon: option parsing failed: Unknown option --persist-dir
Dec 16 04:44:33 ip-10-0-57-254.us-east-2.compute.internal crio[23513]: conmon: option parsing failed: Unknown option --persist-dir
Dec 16 04:44:34 ip-10-0-57-254.us-east-2.compute.internal crio[23513]: conmon: option parsing failed: Unknown option --persist-dir
Dec 16 04:44:38 ip-10-0-57-254.us-east-2.compute.internal crio[23513]: conmon: option parsing failed: Unknown option --persist-dir
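To confirm that a node is hitting the same failure, the journal lines above can be summarised with a small filter. This is a minimal sketch: both function names are made up for illustration, and the pattern is taken from the exact message text in the log above.

```shell
#!/usr/bin/env bash
# Both helpers read crio journal text on stdin.

# Count conmon option-parsing failures (prints 0 when there are none).
count_conmon_option_errors() {
    grep -c 'conmon: option parsing failed: Unknown option' || true
}

# Print each distinct option that conmon rejected, one per line.
rejected_conmon_options() {
    sed -n 's/.*conmon: option parsing failed: Unknown option \(--[A-Za-z0-9-]*\).*/\1/p' | sort -u
}
```

On an affected worker, `journalctl -u crio --no-pager | rejected_conmon_options` should list `--persist-dir`, and `rpm -q conmon cri-o` shows which package builds are installed.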

I did not hit this problem when verifying https://bugzilla.redhat.com/show_bug.cgi?id=1781019, so it seems the new cri-o RPM package broke the installation again.

Comment 4 Johnny Liu 2019-12-17 06:12:09 UTC
conmon-2.0.8-2.el7 is now included in the puddle.

Verified this bug with conmon-2.0.8-2.el7 + cri-o-1.16.2-4.dev.rhaos4.3.git7ebd1fe.el7 + openshift-ansible-4.3.0-201912130552.git.177.65373bf.el7.noarch; the test passed.
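
A quick way to check whether a worker already carries the fixed conmon build is to compare the installed version-release against the Fixed In Version from this bug. The sketch below assumes that any build sorting at or after 2.0.8-2.el7 contains the fix, and it uses GNU `sort -V`, which only approximates RPM's own comparison rules. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Taken from this bug's "Fixed In Version" field (conmon-2.0.8-2.el7).
FIXED_CONMON="2.0.8-2.el7"

# Succeeds when version-release $1 sorts at or after $2 under GNU `sort -V`.
# Note: this approximates, but is not identical to, RPM version comparison.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# $1 = installed conmon version-release, for example the output of:
#   rpm -q --qf '%{VERSION}-%{RELEASE}\n' conmon
conmon_has_fix() {
    version_at_least "$1" "$FIXED_CONMON"
}
```

For example, `conmon_has_fix "$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' conmon)" || echo "conmon is older than $FIXED_CONMON"` flags a worker that still needs the update.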

Comment 6 errata-xmlrpc 2020-01-23 11:19:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

