Bug 1720466

Summary: Install metrics failed at TASK [Mark node unschedulable] when cri-o enabled
Product: OpenShift Container Platform
Reporter: Junqi Zhao <juzhao>
Component: Hawkular
Assignee: Joseph Callen <jcallen>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: high
Docs Contact:
Priority: high
Version: 3.11.0
CC: aos-bugs, jmartisk, vlaad, wmeng, wsun
Target Milestone: ---
Keywords: Regression
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The metrics playbook was updated to include a node drain and kubelet restart, but the update used the incorrect fact variable.
Consequence: The playbook fails because l_kubelet_node_name is undefined.
Fix: Replace l_init_fact_hosts with "oo_nodes_to_config".
Result: l_kubelet_node_name exists for all nodes to be configured.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-26 09:08:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: installation log - no error
Flags: none

Description Junqi Zhao 2019-06-14 05:51:28 UTC
Description of problem:
This bug is split from https://bugzilla.redhat.com/show_bug.cgi?id=1646886#c10
It was introduced by the fix for bz 1646886.
The metrics installation failed at TASK [Mark node unschedulable]:
********************************************************************************
TASK [Mark node unschedulable] 
task path: /usr/share/ansible/openshift-ansible/playbooks/openshift-node/private/restart.yml:11
Monday 10 June 2019  22:56:55 -0400 (0:00:01.261)       0:02:48.041 *********** 
fatal: [vm-10-0-77-163.hosted.upshift.rdu2.redhat.com]: FAILED! => {
    "msg": "The task includes an option with an undefined variable. The error was: 'l_kubelet_node_name' is undefined\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/playbooks/openshift-node/private/restart.yml': line 11, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n  tasks:\n  - name: Mark node unschedulable\n    ^ here\n"
}
********************************************************************************

The installation is not actually blocked, but the network diagram is empty in the console UI in a CRI-O environment, so the following workaround is still needed:

# systemctl restart atomic-openshift-node.service

# oc get pod -n openshift-infra
NAME                            READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-9b48n      1/1       Running     0          10m
hawkular-metrics-schema-vqmjr   0/1       Completed   0          12m
hawkular-metrics-whpfz          1/1       Running     1          13m
heapster-sqlts                  1/1       Running     0          12m


Version-Release number of selected component (if applicable):
# rpm -qa | grep openshift-ansible
openshift-ansible-docs-3.11.119-1.git.0.c9a8ebf.el7.noarch
openshift-ansible-playbooks-3.11.119-1.git.0.c9a8ebf.el7.noarch
openshift-ansible-3.11.119-1.git.0.c9a8ebf.el7.noarch
openshift-ansible-roles-3.11.119-1.git.0.c9a8ebf.el7.noarch


How reproducible:
Always

Steps to Reproduce:
1. Install metrics on 3.11; for the parameters, see the [Additional info] section.

Actual results:
The metrics installation failed at TASK [Mark node unschedulable]

Expected results:
The installation should complete without errors

Additional info:
openshift_metrics_install_metrics=true
openshift_metrics_cassandra_storage_type=dynamic
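
Illustration of the cause recorded in Doc Text. This is a toy Python model, not Ansible or openshift-ansible code, and it rests on an assumption: that the failure came from l_kubelet_node_name being initialised on a host set that did not cover the nodes being restarted. The host names and the functions init_facts and mark_unschedulable are hypothetical.

```python
# Toy model only: this is not Ansible or openshift-ansible code.
# Host names and function names are hypothetical.

def init_facts(inventory, fact_hosts):
    """Set l_kubelet_node_name only on the hosts selected for fact init."""
    facts = {host: {} for host in inventory}
    for host in fact_hosts:
        facts[host]["l_kubelet_node_name"] = host
    return facts


def mark_unschedulable(facts, nodes):
    """Mimic a task that needs l_kubelet_node_name on every targeted node."""
    results = {}
    for node in nodes:
        if "l_kubelet_node_name" not in facts[node]:
            results[node] = "FAILED: 'l_kubelet_node_name' is undefined"
        else:
            results[node] = "ok: " + facts[node]["l_kubelet_node_name"]
    return results


inventory = ["master-1", "node-1", "node-2"]
nodes_to_config = ["node-1", "node-2"]

# Modelled "before": facts initialised on a host set that misses the nodes.
before = mark_unschedulable(init_facts(inventory, ["master-1"]), nodes_to_config)

# Modelled "after": facts initialised on every node to be configured.
after = mark_unschedulable(init_facts(inventory, nodes_to_config), nodes_to_config)

print(before)
print(after)
```

In this model the task fails on every node in the "before" case and succeeds in the "after" case, which matches the Result line in Doc Text.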

Comment 5 Junqi Zhao 2019-06-19 05:47:24 UTC
Issue is fixed with 
# rpm -qa | grep openshift-ansible
openshift-ansible-3.11.121-1.git.0.a7a0c1e.el7.noarch
openshift-ansible-docs-3.11.121-1.git.0.a7a0c1e.el7.noarch
openshift-ansible-playbooks-3.11.121-1.git.0.a7a0c1e.el7.noarch
openshift-ansible-roles-3.11.121-1.git.0.a7a0c1e.el7.noarch

No installation error now
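
A minimal Python sketch for checking an installed package against the build verified above. It assumes 3.11.121 as the threshold only because that is the build tested in this comment; the bug does not record a Fixed In Version, so an earlier build may already contain the fix. The function names are hypothetical, and the comparison is only meaningful within 3.11.z.

```python
import re

# Assumption: 3.11.121 is the build verified in this comment, not
# necessarily the first build that contains the fix.
VERIFIED_VERSION = (3, 11, 121)


def package_version(rpm_name):
    """Extract (major, minor, patch) from an openshift-ansible RPM name."""
    match = re.match(
        r"openshift-ansible(?:-[a-z]+)?-(\d+)\.(\d+)\.(\d+)-", rpm_name
    )
    if match is None:
        raise ValueError("unrecognised package name: " + rpm_name)
    return tuple(int(part) for part in match.groups())


def has_verified_fix(rpm_name):
    """True if the package is at or above the verified build."""
    return package_version(rpm_name) >= VERIFIED_VERSION


print(has_verified_fix("openshift-ansible-3.11.119-1.git.0.c9a8ebf.el7.noarch"))
print(has_verified_fix("openshift-ansible-3.11.121-1.git.0.a7a0c1e.el7.noarch"))
```

The package names passed in are the ones reported by rpm -qa in this bug: the first is the failing build from the Description, the second the verified build.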

Comment 6 Junqi Zhao 2019-06-19 05:47:56 UTC
Created attachment 1582081 [details]
installation log - no error

Comment 8 errata-xmlrpc 2019-06-26 09:08:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1605