Bug 1445568

Summary: hawkular-metrics pod is CrashLoopBackOff after metrics 3.6.0 was deployed
Product: OpenShift Container Platform
Reporter: Junqi Zhao <juzhao>
Component: Hawkular
Assignee: John Sanda <jsanda>
Status: CLOSED CURRENTRELEASE
QA Contact: Junqi Zhao <juzhao>
Severity: high
Priority: high
Version: 3.6.0
CC: aos-bugs, jcantril, jcosta, jeder, jhenner, jupierce, mifiedle, mwringe, smunilla, vlaad, xtian, xxia
Target Milestone: ---   
Target Release: 3.6.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: aos-scalability-36
Doc Type: If docs needed, set a value
Last Closed: 2018-10-03 20:48:18 UTC
Type: Bug
Attachments:
Description                      Flags
metrics ansible deploy log       none
metrics ansible inventory file   none

Description Junqi Zhao 2017-04-26 01:30:32 UTC
Created attachment 1274060 [details]
metrics ansible deploy log

Description of problem:
The hawkular-metrics pod goes into CrashLoopBackOff after metrics 3.6.0 is deployed. The pod's log shows the error "openssl: command not found".

# oc get po
NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-kf3jx   1/1       Running            0          16m
hawkular-metrics-ct9xw       0/1       CrashLoopBackOff   7          16m
heapster-xwwlw               0/1       Running            1          16m

# oc logs hawkular-metrics-ct9xw
2017-04-26 00:53:26 Starting Hawkular Metrics
/opt/hawkular/scripts/hawkular-metrics-wrapper.sh: line 49: openssl: command not found
/opt/hawkular/scripts/hawkular-metrics-wrapper.sh: line 53: openssl: command not found
The service account has read permissions for its project. Proceeding
/opt/hawkular/scripts/hawkular-metrics-wrapper.sh: line 104: openssl: command not found
Creating the Hawkular Metrics keystore from the Secret's cert data
Failed to create a PKCS12 certificate file with the service-specific certificate. Aborting.

# openssl version
OpenSSL 1.0.1e-fips 11 Feb 2013
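
The log above shows the wrapper script carrying on after the first "command not found" and only aborting later, at keystore creation. A minimal sketch of a fail-fast guard for a startup script, assuming a POSIX shell; `require_cmd` is a hypothetical helper and is not part of hawkular-metrics-wrapper.sh:

```shell
# Hypothetical helper: return non-zero as soon as a required command is
# missing, so the script can stop before any later step depends on it.
require_cmd() {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "required command not found: $cmd" >&2
      return 1
    fi
  done
}

# Usage (near the top of a startup script):
#   require_cmd openssl || exit 1
```

With a guard like this the pod would still fail, but the log would name the missing dependency on its first line.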

Version-Release number of selected component (if applicable):
# oc version
oc v3.6.49
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

# rpm -qa | grep openshift-ansible
openshift-ansible-docs-3.6.37-1.git.0.e19f6d8.el7.noarch
openshift-ansible-lookup-plugins-3.6.37-1.git.0.e19f6d8.el7.noarch
openshift-ansible-3.6.37-1.git.0.e19f6d8.el7.noarch
openshift-ansible-callback-plugins-3.6.37-1.git.0.e19f6d8.el7.noarch
openshift-ansible-roles-3.6.37-1.git.0.e19f6d8.el7.noarch
openshift-ansible-filter-plugins-3.6.37-1.git.0.e19f6d8.el7.noarch
openshift-ansible-playbooks-3.6.37-1.git.0.e19f6d8.el7.noarch

# docker images | grep metrics
registry.ops.openshift.com/openshift3/metrics-hawkular-metrics   3.6.0               12f3f49d713a        6 days ago          1.293 GB
registry.ops.openshift.com/openshift3/metrics-cassandra          3.6.0               fe1b71caa3bf        7 days ago          545.2 MB
registry.ops.openshift.com/openshift3/metrics-heapster           3.6.0               0fa183f8e8ff        2 weeks ago         273.8 MB

How reproducible:
Always

Steps to Reproduce:
1. Deploy the metrics 3.6.0 stack on OCP 3.6.0 by running the ansible playbook.

Actual results:
The hawkular-metrics pod is in CrashLoopBackOff.

Expected results:
All metrics pods are running without errors.

Additional info:
The ansible inventory file and the deploy log are attached.

Comment 1 Junqi Zhao 2017-04-26 01:33:01 UTC
Created attachment 1274061 [details]
metrics ansible inventory file

Comment 2 Junqi Zhao 2017-04-26 01:34:52 UTC
Metrics is deployed by:

ansible-playbook -vvv -i ${INVENTORY_FILE} playbooks/byo/openshift-cluster/openshift-metrics.yml

Comment 3 Juraci Paixão Kröhling 2017-05-02 10:00:07 UTC
Looks like the Alpha 0 Dockerfile for Hawkular Metrics didn't include OpenSSL:

https://github.com/openshift/origin-metrics/blob/v3.6.0-alpha.0/hawkular-metrics/Dockerfile#L80

This is present for Alpha 1 though:

https://github.com/openshift/origin-metrics/blob/v3.6.0-alpha.1/hawkular-metrics/Dockerfile#L80
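
One way to confirm this for a given image, without starting the wrapper script, is to override the entrypoint and look for the binary. This is a sketch only: `image_has_cmd` is a hypothetical helper, and it assumes the image provides /bin/sh and that docker is available on the host:

```shell
# Hypothetical helper: succeed if the given command exists inside the image.
# The entrypoint is replaced with /bin/sh so the normal startup script
# does not run.
image_has_cmd() {
  image=$1
  cmd=$2
  docker run --rm --entrypoint /bin/sh "$image" -c "command -v $cmd" >/dev/null 2>&1
}

# Usage:
#   image_has_cmd registry.ops.openshift.com/openshift3/metrics-hawkular-metrics:3.6.0 openssl \
#     || echo "openssl is missing from the image"
```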

Comment 4 Matt Wringe 2017-05-02 14:52:26 UTC
Sorry, I had this fixed in our brew build a few days ago but never got around to verifying it. I have since confirmed that it works for me now. Can you please retry?

Comment 8 Xia Zhao 2017-05-04 05:44:12 UTC
@mwringe,

Retested with the images tagged v3.6 on the brew registry; the issue has been fixed. The metrics pods are running fine and the metrics statistics are visible on the web console. Please feel free to change back to ON_QA for closure.

Images tested with:
openshift3/metrics-cassandra    58aedf976616
openshift3/metrics-hawkular-metrics    a2d906e06f22
openshift3/metrics-heapster    99ceffab1a79

# oc get po
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-l1xwt   1/1       Running   0          7m
hawkular-metrics-h2q7j       1/1       Running   0          7m
heapster-d9nbv               1/1       Running   0          7m
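
The same check can be scripted by filtering the `oc get po` table for pods that are not fully ready. A rough sketch, assuming the default column layout shown above (NAME, READY, STATUS, ...); `not_ready_pods` is a hypothetical helper name:

```shell
# Hypothetical helper: read `oc get po` output on stdin and print the name
# of every pod whose READY count is incomplete or whose STATUS is not Running.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")            # READY column, e.g. "0/1"
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}

# Usage:
#   oc get po | not_ready_pods
```

An empty result means every listed pod is Running with all of its containers ready.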

Comment 9 Xia Zhao 2017-05-04 05:44:40 UTC
# openshift version
openshift v3.6.63
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

Comment 10 Xia Zhao 2017-05-09 06:08:41 UTC
@mwringe,
The issue is resolved and this scenario passed testing per comment #8. Please feel free to change back to ON_QA for closure.

Comment 11 Xia Zhao 2017-05-10 02:07:08 UTC
Set to verified according to comment #8.

Comment 12 Mike Fiedler 2017-05-17 20:19:17 UTC
*** Bug 1451909 has been marked as a duplicate of this bug. ***

Comment 25 Jaroslav Henner 2017-08-02 18:08:37 UTC
Hi. I still see this problem with the image pulled from brew.

Comment 26 Junqi Zhao 2017-08-03 01:44:55 UTC
(In reply to Jaroslav Henner from comment #25)
> Hi. I still see this problem with the image pulled from brew.


Please set openshift_metrics_image_version=v3.6 in the inventory file rather than 3.6.0. The images tagged 3.6.0 are not the latest; the images tagged v3.6 are.

We no longer see this issue in our functional testing. If you still hit it in your performance testing, please open a new defect.
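
To see which tag an inventory file currently requests before running the playbook, something like the following could be used. It is a sketch that assumes the variable is set as a plain `key=value` line in an INI-style inventory; `metrics_image_version` is a hypothetical helper:

```shell
# Hypothetical helper: print the value assigned to
# openshift_metrics_image_version in an INI-style inventory file.
# If the variable appears more than once, the last assignment found is printed.
metrics_image_version() {
  sed -n 's/^[[:space:]]*openshift_metrics_image_version[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Usage:
#   v=$(metrics_image_version "$INVENTORY_FILE")
#   [ "$v" = "v3.6" ] || echo "metrics image tag is '$v', expected v3.6" >&2
```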

Comment 27 Jaroslav Henner 2017-08-08 05:53:55 UTC
(In reply to Junqi Zhao from comment #26)
> (In reply to Jaroslav Henner from comment #25)
> > Hi. I still see this problem with the image pulled from brew.
> 
> 
> Please set openshift_metrics_image_version=v3.6 in inventory file, do not
> set 3.6.0, since images with 3.6.0 tag are not the latest, images with v3.6
> tag are the latest.
> 
> We don't have this issue in our functional testing now, if you still find
> this issue in your performance testing, please open one defect.

Thanks. It seems it worked. Should the same image version be used with OSES 3.5?

Comment 28 Matt Wringe 2017-08-08 14:21:16 UTC
(In reply to Jaroslav Henner from comment #27)
> (In reply to Junqi Zhao from comment #26)
> Thanks. It seems it worked. Should the same image version be used with OSES
> 3.5?

For OCP 3.5, the 'v3.5' tag will be for the latest 3.5 image.

I believe the '3.5.0' tag also aliases to the latest 3.5 image (at least for now, but if there is ever a 3.5.1 release then the '3.5.1' tag will instead point to the latest version).

There are also build-specific tags for 3.5, so anyone who doesn't want to pull in the latest images automatically can specify an exact image tag (e.g. 3.5.0-28).

The container catalog will show the tag options which can be used: https://access.redhat.com/containers/#/search/openshift3%252Fmetrics
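
The three kinds of tag described in this comment can be told apart by their shape. A small sketch under that assumption; `tag_kind` and its labels are hypothetical, and the patterns only reflect the examples given here (v3.5, 3.5.0, 3.5.0-28), not an official naming specification:

```shell
# Hypothetical helper: classify a metrics image tag by its shape.
tag_kind() {
  case $1 in
    v[0-9]*.[0-9]*)                echo floating ;;  # e.g. v3.5: latest image of that minor release
    [0-9]*.[0-9]*.[0-9]*-[0-9]*)   echo pinned ;;    # e.g. 3.5.0-28: one exact build
    [0-9]*.[0-9]*.[0-9]*)          echo alias ;;     # e.g. 3.5.0: may be repointed to another build
    *)                             echo unknown ;;
  esac
}

# Usage:
#   tag_kind 3.5.0-28
```

Only a pinned tag is meant to stay on the same image across deployments; the other two can change as new builds are published.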