Bug 1608216 - Should change timeoutSeconds for hawkular-cassandra readiness check to a bigger value
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.0
Assignee: John Sanda
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-07-25 06:40 UTC by Junqi Zhao
Modified: 2018-10-11 07:22 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-11 07:22:06 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2018:2652  0        None      None    None     2018-10-11 07:22:29 UTC

Description Junqi Zhao 2018-07-25 06:40:11 UTC
Description of problem:
This bug comes from Bug 1607984. The default timeoutSeconds for the hawkular-cassandra readiness check is 1 second, so if the readiness check takes more than 1 second to respond,
the metrics pods cannot start up.

# oc get pod -n openshift-infra
NAME                            READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-njcbq      0/1       Running   0          1h
hawkular-metrics-642hp          0/1       Running   8          1h
hawkular-metrics-schema-4k4hj   1/1       Running   0          1h
heapster-lmc8m                  0/1       Running   9          1h


# oc rsh hawkular-cassandra-1-njcbq
sh-4.2$ time nodetool status
Picked up JAVA_TOOL_OPTIONS: -Duser.home=/home/jboss -Duser.name=jboss
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.131.0.64  103.11 KB  256          100.0%            df669d60-a338-4057-a4c2-00cf92b6291b  rack1


real    0m1.499s
user    0m2.417s
sys    0m0.187s



sh-4.2$ time nodetool help 
Picked up JAVA_TOOL_OPTIONS: -Duser.home=/home/jboss -Duser.name=jboss
<--snip >

See 'nodetool help <command>' for more information on a specific command.


real    0m1.626s
user    0m1.807s
sys    0m0.133s



After changing it to a bigger value in roles/openshift_metrics/templates/hawkular_cassandra_rc.j2 (added timeoutSeconds: 10), metrics works well.

        readinessProbe:
          exec:
            command:
            - "/opt/apache-cassandra/bin/cassandra-docker-ready.sh"
          timeoutSeconds: 10
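
The effect of the timeout can be illustrated with a small simulation. This is only a simplified model of the behavior described in this report, not the kubelet's implementation: the function name exec_probe_succeeds and the sleeping stand-in command are hypothetical, and the 1.5 s delay is taken from the nodetool timings above.

```python
import subprocess
import sys


def exec_probe_succeeds(command, timeout_seconds):
    """Return True only if `command` exits 0 within `timeout_seconds`.

    Simplified model of an exec-style readiness probe (an assumption,
    not the real kubelet logic): a timeout or non-zero exit is a failure.
    """
    try:
        result = subprocess.run(
            command,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # The check did not answer in time, so the pod is treated as not ready.
        return False
    return result.returncode == 0


# Hypothetical stand-in for cassandra-docker-ready.sh: it takes about 1.5 s,
# similar to the `nodetool status` timing shown in this report.
slow_check = [sys.executable, "-c", "import time; time.sleep(1.5)"]

if __name__ == "__main__":
    print("timeout 1s :", exec_probe_succeeds(slow_check, timeout_seconds=1))
    print("timeout 10s:", exec_probe_succeeds(slow_check, timeout_seconds=10))
```

With a 1 second limit the stand-in check never finishes in time, which matches the 0/1 READY state seen above; with 10 seconds it has enough headroom to complete.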


Version-Release number of selected component (if applicable):
# rpm -qa | grep openshift-ansible
openshift-ansible-roles-3.10.14-1.git.273.a64b86b.el7.noarch
openshift-ansible-playbooks-3.10.14-1.git.273.a64b86b.el7.noarch
openshift-ansible-3.10.14-1.git.273.a64b86b.el7.noarch
openshift-ansible-docs-3.10.14-1.git.273.a64b86b.el7.noarch

openshift3-metrics-cassandra-v3.10.14-7
metrics-hawkular-metrics-v3.10.14-7
metrics-schema-installer-v3.10.14-7
metrics-heapster-v3.10.14-8


How reproducible:
Always

Steps to Reproduce:
1. Deploy metrics

Actual results:
metrics pods could not start up

Expected results:
metrics pods can start up

Additional info:

Comment 1 Ruben Vargas Palma 2018-08-03 16:53:56 UTC
I've sent a PR:

https://github.com/openshift/openshift-ansible/pull/9417

It is already merged, so I'll move this to MODIFIED.

Comment 3 Junqi Zhao 2018-08-23 07:26:22 UTC
The issue is fixed; timeoutSeconds for the hawkular-cassandra readiness check is now 10s.

# rpm -qa | grep ansible
ansible-2.6.3-1.el7ae.noarch
openshift-ansible-playbooks-3.11.0-0.20.0.git.0.ec6d8caNone.noarch
openshift-ansible-roles-3.11.0-0.20.0.git.0.ec6d8caNone.noarch
openshift-ansible-3.11.0-0.20.0.git.0.ec6d8caNone.noarch
openshift-ansible-docs-3.11.0-0.20.0.git.0.ec6d8caNone.noarch

Comment 5 errata-xmlrpc 2018-10-11 07:22:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

