Bug 1459345
| Summary: | OpenShift cannot create Hawkular Metrics pods because of missing keyspaces | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Guilherme Baufaker Rêgo <gbaufake> |
| Component: | Hawkular | Assignee: | Matt Wringe <mwringe> |
| Status: | CLOSED DEFERRED | QA Contact: | Liming Zhou <lizhou> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.5.1 | CC: | aos-bugs, gbaufake, jsanda, snegrea |
| Target Milestone: | --- | | |
| Target Release: | 3.7.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-09-29 19:15:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Can you please provide the logs for Cassandra, as well as the output of:
- 'oc get pods -n openshift-infra'
- 'oc get pods -o yaml -n openshift-infra'
- 'oc describe pods -n openshift-infra'

Is this a fresh install on OCP 3.5, or is this an update from an older 3.4 release?

It is a fresh install of OCP 3.5.
- 'oc get pods -n openshift-infra'
hawkular-cassandra-1-x94p0 1/1 Running 0 47d
hawkular-cassandra-1-z18wk 1/1 Running 0 47d
hawkular-metrics-zj777 1/1 Running 0 21h
heapster-tpc9q 1/1 Running 2 60d
- 'oc get pods -o yaml -n openshift-infra'
apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      kubernetes.io/created-by: |
        {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"openshift-infra","name":"hawkular-cassandra-1","uid":"654c53d3-1713-11e7-b64b-001a4a10173a","apiVersion":"v1","resourceVersion":"569955"}}
      openshift.io/scc: restricted
    creationTimestamp: 2017-04-20T22:53:17Z
    generateName: hawkular-cassandra-1-
    labels:
      metrics-infra: hawkular-cassandra
      name: hawkular-cassandra-1
      type: hawkular-cassandra
    name: hawkular-cassandra-1-x94p0
    namespace: openshift-infra
    resourceVersion: "570065"
    selfLink: /api/v1/namespaces/openshift-infra/pods/hawkular-cassandra-1-x94p0
    uid: 249ef5f4-261c-11e7-9dc7-001a4a10173a
  spec:
    containers:
    - command:
      - /opt/apache-cassandra/bin/cassandra-docker.sh
      - --cluster_name=hawkular-metrics
      - --data_volume=/cassandra_data
      - --internode_encryption=all
      - --require_node_auth=true
      - --enable_client_encryption=true
      - --require_client_auth=true
      - --keystore_file=/secret/cassandra.keystore
      - --keystore_password_file=/secret/cassandra.keystore.password
      - --truststore_file=/secret/cassandra.truststore
      - --truststore_password_file=/secret/cassandra.truststore.password
      - --cassandra_pem_file=/secret/cassandra.pem
      env:
      - name: CASSANDRA_MASTER
        value: "true"
      - name: CASSANDRA_DATA_VOLUME
        value: /cassandra_data
      - name: JVM_OPTS
        value: -Dcassandra.commitlog.ignorereplayerrors=true
      - name: POD_NAMESPACE
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.namespace
      - name: MEMORY_LIMIT
        valueFrom:
          resourceFieldRef:
            divisor: "0"
            resource: limits.memory
      - name: CPU_LIMIT
        valueFrom:
          resourceFieldRef:
            divisor: 1m
            resource: limits.cpu
      image: openshift3/metrics-cassandra:3.5.0
      imagePullPolicy: IfNotPresent
      lifecycle:
        postStart:
          exec:
            command:
            - /opt/apache-cassandra/bin/cassandra-poststart.sh
        preStop:
          exec:
            command:
            - /opt/apache-cassandra/bin/cassandra-prestop.sh
      name: hawkular-cassandra-1
      ports:
      - containerPort: 9042
        name: cql-port
        protocol: TCP
      - containerPort: 9160
        name: thift-port
        protocol: TCP
      - containerPort: 7000
        name: tcp-port
        protocol: TCP
      - containerPort: 7001
        name: ssl-port
        protocol: TCP
      readinessProbe:
        exec:
          command:
          - /opt/apache-cassandra/bin/cassandra-docker-ready.sh
        failureThreshold: 3
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      resources:
        limits:
          memory: 2G
        requests:
          memory: 1G
      securityContext:
        capabilities:
          drop:
          - KILL
          - MKNOD
          - SETGID
          - SETUID
          - SYS_CHROOT
        privileged: false
        runAsUser: 1000000000
        seLinuxOptions:
          level: s0:c1,c0
      terminationMessagePath: /dev/termination-log
      volumeMounts:
      - mountPath: /cassandra_data
        name: cassandra-data
      - mountPath: /secret
        name: hawkular-cassandra-secrets
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: cassandra-token-5l9kw
        readOnly: true
    dnsPolicy: ClusterFirst
    imagePullSecrets:
    - name: cassandra-dockercfg-nnhxg
    nodeName: ose1.bc.jonqe.lab.eng.bos.redhat.com
    restartPolicy: Always
    securityContext:
      fsGroup: 1000000000
      seLinuxOptions:
        level: s0:c1,c0
      supplementalGroups:
      - 65534
    serviceAccount: cassandra
    serviceAccountName: cassandra
    terminationGracePeriodSeconds: 30
    volumes:
    - emptyDir: {}
      name: cassandra-data
    - name: hawkular-cassandra-secrets
      secret:
        defaultMode: 420
        secretName: hawkular-cassandra-secrets
    - name: cassandra-token-5l9kw
      secret:
        defaultMode: 420
        secretName: cassandra-token-5l9kw
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: 2017-04-20T22:53:17Z
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: 2017-04-20T22:57:10Z
      status: "True"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: 2017-04-20T22:53:17Z
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: docker://aa0ee98fd46cd1727bbd423053f4725877175f8558c71e656768a1b8c2e0b82e
      image: openshift3/metrics-cassandra:3.5.0
      imageID: docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/metrics-cassandra@sha256:f195339f5bbcaf5de4a844fa1738f83ddb36c372c6cb03859199bb53bcf5e093
      lastState: {}
      name: hawkular-cassandra-1
      ready: true
      restartCount: 0
      state:
        running:
          startedAt: 2017-04-20T22:53:56Z
    hostIP: 10.16.23.148
    phase: Running
    podIP: 10.128.0.73
    startTime: 2017-04-20T22:53:17Z
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      kubernetes.io/created-by: |
        {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"openshift-infra","name":"hawkular-cassandra-1","uid":"654c53d3-1713-11e7-b64b-001a4a10173a","apiVersion":"v1","resourceVersion":"569963"}}
      openshift.io/scc: restricted
    creationTimestamp: 2017-04-20T22:53:18Z
    generateName: hawkular-cassandra-1-
    labels:
      metrics-infra: hawkular-cassandra
      name: hawkular-cassandra-1
      type: hawkular-cassandra
    name: hawkular-cassandra-1-z18wk
    namespace: openshift-infra
    resourceVersion: "570069"
    selfLink: /api/v1/namespaces/openshift-infra/pods/hawkular-cassandra-1-z18wk
    uid: 2508a753-261c-11e7-9dc7-001a4a10173a
  spec:
    containers:
    - command:
      - /opt/apache-cassandra/bin/cassandra-docker.sh
      - --cluster_name=hawkular-metrics
      - --data_volume=/cassandra_data
      - --internode_encryption=all
      - --require_node_auth=true
      - --enable_client_encryption=true
      - --require_client_auth=true
      - --keystore_file=/secret/cassandra.keystore
      - --keystore_password_file=/secret/cassandra.keystore.password
      - --truststore_file=/secret/cassandra.truststore
      - --truststore_password_file=/secret/cassandra.truststore.password
      - --cassandra_pem_file=/secret/cassandra.pem
      env:
      - name: CASSANDRA_MASTER
        value: "true"
      - name: CASSANDRA_DATA_VOLUME
        value: /cassandra_data
      - name: JVM_OPTS
        value: -Dcassandra.commitlog.ignorereplayerrors=true
      - name: POD_NAMESPACE
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.namespace
      - name: MEMORY_LIMIT
        valueFrom:
          resourceFieldRef:
            divisor: "0"
            resource: limits.memory
      - name: CPU_LIMIT
        valueFrom:
          resourceFieldRef:
            divisor: 1m
            resource: limits.cpu
      image: openshift3/metrics-cassandra:3.5.0
      imagePullPolicy: IfNotPresent
      lifecycle:
        postStart:
          exec:
            command:
            - /opt/apache-cassandra/bin/cassandra-poststart.sh
        preStop:
          exec:
            command:
            - /opt/apache-cassandra/bin/cassandra-prestop.sh
      name: hawkular-cassandra-1
      ports:
      - containerPort: 9042
        name: cql-port
        protocol: TCP
      - containerPort: 9160
        name: thift-port
        protocol: TCP
      - containerPort: 7000
        name: tcp-port
        protocol: TCP
      - containerPort: 7001
        name: ssl-port
        protocol: TCP
      readinessProbe:
        exec:
          command:
          - /opt/apache-cassandra/bin/cassandra-docker-ready.sh
        failureThreshold: 3
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      resources:
        limits:
          memory: 2G
        requests:
          memory: 1G
      securityContext:
        capabilities:
          drop:
          - KILL
          - MKNOD
          - SETGID
          - SETUID
          - SYS_CHROOT
        privileged: false
        runAsUser: 1000000000
        seLinuxOptions:
          level: s0:c1,c0
      terminationMessagePath: /dev/termination-log
      volumeMounts:
      - mountPath: /cassandra_data
        name: cassandra-data
      - mountPath: /secret
        name: hawkular-cassandra-secrets
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: cassandra-token-5l9kw
        readOnly: true
    dnsPolicy: ClusterFirst
    imagePullSecrets:
    - name: cassandra-dockercfg-nnhxg
    nodeName: ose1.bc.jonqe.lab.eng.bos.redhat.com
    restartPolicy: Always
    securityContext:
      fsGroup: 1000000000
      seLinuxOptions:
        level: s0:c1,c0
      supplementalGroups:
      - 65534
    serviceAccount: cassandra
    serviceAccountName: cassandra
    terminationGracePeriodSeconds: 30
    volumes:
    - emptyDir: {}
      name: cassandra-data
    - name: hawkular-cassandra-secrets
      secret:
        defaultMode: 420
        secretName: hawkular-cassandra-secrets
    - name: cassandra-token-5l9kw
      secret:
        defaultMode: 420
        secretName: cassandra-token-5l9kw
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: 2017-04-20T22:53:18Z
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: 2017-04-20T22:57:12Z
      status: "True"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: 2017-04-20T22:53:18Z
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: docker://5bdbbc04a1744d72de5debdb63e5502f42f2e7b53167ed3fdc5dde1fd764704c
      image: openshift3/metrics-cassandra:3.5.0
      imageID: docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/metrics-cassandra@sha256:f195339f5bbcaf5de4a844fa1738f83ddb36c372c6cb03859199bb53bcf5e093
      lastState: {}
      name: hawkular-cassandra-1
      ready: true
      restartCount: 0
      state:
        running:
          startedAt: 2017-04-20T22:53:56Z
    hostIP: 10.16.23.148
    phase: Running
    podIP: 10.128.0.74
    startTime: 2017-04-20T22:53:18Z
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      kubernetes.io/created-by: |
        {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"openshift-infra","name":"hawkular-metrics","uid":"599fcafa-1713-11e7-b64b-001a4a10173a","apiVersion":"v1","resourceVersion":"288799"}}
      openshift.io/scc: restricted
    creationTimestamp: 2017-06-06T17:16:27Z
    generateName: hawkular-metrics-
    labels:
      metrics-infra: hawkular-metrics
      name: hawkular-metrics
    name: hawkular-metrics-zj777
    namespace: openshift-infra
    resourceVersion: "1776988"
    selfLink: /api/v1/namespaces/openshift-infra/pods/hawkular-metrics-zj777
    uid: dfebe954-4adb-11e7-982a-001a4a10173a
  spec:
    containers:
    - command:
      - /opt/hawkular/scripts/hawkular-metrics-wrapper.sh
      - -b
      - 0.0.0.0
      - -Dhawkular.metrics.cassandra.nodes=hawkular-cassandra
      - -Dhawkular.metrics.cassandra.use-ssl
      - -Dhawkular.metrics.openshift.auth-methods=openshift-oauth,htpasswd
      - -Dhawkular.metrics.openshift.htpasswd-file=/secrets/hawkular-metrics.htpasswd.file
      - -Dhawkular.metrics.allowed-cors-access-control-allow-headers=authorization
      - -Dhawkular.metrics.default-ttl=7
      - -Dhawkular.metrics.admin-tenant=_hawkular_admin
      - -Dhawkular-alerts.cassandra-nodes=hawkular-cassandra
      - -Dhawkular-alerts.cassandra-use-ssl
      - -Dhawkular.alerts.openshift.auth-methods=openshift-oauth,htpasswd
      - -Dhawkular.alerts.openshift.htpasswd-file=/secrets/hawkular-metrics.htpasswd.file
      - -Dhawkular.alerts.allowed-cors-access-control-allow-headers=authorization
      - -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true
      - -Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true
      - -Dcom.datastax.driver.FORCE_NIO=true
      - -DKUBERNETES_MASTER_URL=https://kubernetes.default.svc.cluster.local
      - -DUSER_WRITE_ACCESS=False
      - --hmw.keystore=/secrets/hawkular-metrics.keystore
      - --hmw.truststore=/secrets/hawkular-metrics.truststore
      - --hmw.keystore_password_file=/secrets/hawkular-metrics.keystore.password
      - --hmw.truststore_password_file=/secrets/hawkular-metrics.truststore.password
      - --hmw.jgroups_keystore=/secrets/hawkular-metrics.jgroups.keystore
      - --hmw.jgroups_keystore_password_file=/secrets/hawkular-metrics.jgroups.keystore.password
      - --hmw.jgroups_alias_file=/secrets/hawkular-metrics.jgroups.alias
      env:
      - name: POD_NAMESPACE
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.namespace
      - name: MASTER_URL
        value: https://kubernetes.default.svc.cluster.local
      - name: OPENSHIFT_KUBE_PING_NAMESPACE
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.namespace
      - name: OPENSHIFT_KUBE_PING_LABELS
        value: metrics-infra=hawkular-metrics,name=hawkular-metrics
      - name: STARTUP_TIMEOUT
        value: "500"
      image: openshift3/metrics-hawkular-metrics:3.5.0
      imagePullPolicy: IfNotPresent
      livenessProbe:
        exec:
          command:
          - /opt/hawkular/scripts/hawkular-metrics-liveness.py
        failureThreshold: 3
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      name: hawkular-metrics
      ports:
      - containerPort: 8080
        name: http-endpoint
        protocol: TCP
      - containerPort: 8443
        name: https-endpoint
        protocol: TCP
      - containerPort: 8888
        name: ping
        protocol: TCP
      readinessProbe:
        exec:
          command:
          - /opt/hawkular/scripts/hawkular-metrics-readiness.py
        failureThreshold: 3
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      resources:
        limits:
          memory: 2500M
        requests:
          memory: 1500M
      securityContext:
        capabilities:
          drop:
          - KILL
          - MKNOD
          - SETGID
          - SETUID
          - SYS_CHROOT
        privileged: false
        runAsUser: 1000000000
        seLinuxOptions:
          level: s0:c1,c0
      terminationMessagePath: /dev/termination-log
      volumeMounts:
      - mountPath: /secrets
        name: hawkular-metrics-secrets
      - mountPath: /client-secrets
        name: hawkular-metrics-client-secrets
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: hawkular-token-xnc2c
        readOnly: true
    dnsPolicy: ClusterFirst
    imagePullSecrets:
    - name: hawkular-dockercfg-5cqrk
    nodeName: ose1.bc.jonqe.lab.eng.bos.redhat.com
    restartPolicy: Always
    securityContext:
      fsGroup: 1000000000
      seLinuxOptions:
        level: s0:c1,c0
    serviceAccount: hawkular
    serviceAccountName: hawkular
    terminationGracePeriodSeconds: 30
    volumes:
    - name: hawkular-metrics-secrets
      secret:
        defaultMode: 420
        secretName: hawkular-metrics-secrets
    - name: hawkular-metrics-client-secrets
      secret:
        defaultMode: 420
        secretName: hawkular-metrics-account
    - name: hawkular-token-xnc2c
      secret:
        defaultMode: 420
        secretName: hawkular-token-xnc2c
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: 2017-06-06T17:16:27Z
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: 2017-06-06T17:19:57Z
      status: "True"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: 2017-06-06T17:16:27Z
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: docker://6f8ffde05b78be196a885479dafdca04b33a4254fb53ef5075eacf4d88ce8d7c
      image: openshift3/metrics-hawkular-metrics:3.5.0
      imageID: docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/metrics-hawkular-metrics@sha256:f84a9f9abd9d4407b1a0b8542392ce2e57674821946c2edae410e9c7f4e3e764
      lastState: {}
      name: hawkular-metrics
      ready: true
      restartCount: 0
      state:
        running:
          startedAt: 2017-06-06T17:16:38Z
    hostIP: 10.16.23.148
    phase: Running
    podIP: 10.128.0.91
    startTime: 2017-06-06T17:16:27Z
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      kubernetes.io/created-by: |
        {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"openshift-infra","name":"heapster","uid":"5d811534-1713-11e7-b64b-001a4a10173a","apiVersion":"v1","resourceVersion":"188998"}}
      openshift.io/scc: restricted
    creationTimestamp: 2017-04-07T20:06:30Z
    generateName: heapster-
    labels:
      metrics-infra: heapster
      name: heapster
    name: heapster-tpc9q
    namespace: openshift-infra
    resourceVersion: "1074853"
    selfLink: /api/v1/namespaces/openshift-infra/pods/heapster-tpc9q
    uid: b0d237b4-1bcd-11e7-9dc7-001a4a10173a
  spec:
    containers:
    - command:
      - heapster-wrapper.sh
      - --wrapper.allowed_users_file=/secrets/heapster.allowed-users
      - --source=kubernetes.summary_api:${MASTER_URL}?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250
      - --tls_cert=/secrets/heapster.cert
      - --tls_key=/secrets/heapster.key
      - --tls_client_ca=/secrets/heapster.client-ca
      - --allowed_users=%allowed_users%
      - --metric_resolution=30s
      - --wrapper.username_file=/hawkular-account/hawkular-metrics.username
      - --wrapper.password_file=/hawkular-account/hawkular-metrics.password
      - --wrapper.endpoint_check=https://hawkular-metrics:443/hawkular/metrics/status
      - --sink=hawkular:https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&labelNodeId=nodename&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=%username%&pass=%password%&filter=label(container_name:^system.slice.*|^user.slice)
      env:
      - name: STARTUP_TIMEOUT
        value: "500"
      image: openshift3/metrics-heapster:3.5.0
      imagePullPolicy: IfNotPresent
      name: heapster
      ports:
      - containerPort: 8082
        name: http-endpoint
        protocol: TCP
      readinessProbe:
        exec:
          command:
          - /opt/heapster-readiness.sh
        failureThreshold: 3
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      resources:
        limits:
          memory: 3750M
        requests:
          memory: 937500k
      securityContext:
        capabilities:
          drop:
          - KILL
          - MKNOD
          - SETGID
          - SETUID
          - SYS_CHROOT
        privileged: false
        runAsUser: 1000000000
        seLinuxOptions:
          level: s0:c1,c0
      terminationMessagePath: /dev/termination-log
      volumeMounts:
      - mountPath: /secrets
        name: heapster-secrets
      - mountPath: /hawkular-cert
        name: hawkular-metrics-certificate
      - mountPath: /hawkular-account
        name: hawkular-metrics-account
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: heapster-token-1ksp6
        readOnly: true
    dnsPolicy: ClusterFirst
    imagePullSecrets:
    - name: heapster-dockercfg-k22zx
    nodeName: ose2.bc.jonqe.lab.eng.bos.redhat.com
    restartPolicy: Always
    securityContext:
      fsGroup: 1000000000
      seLinuxOptions:
        level: s0:c1,c0
    serviceAccount: heapster
    serviceAccountName: heapster
    terminationGracePeriodSeconds: 30
    volumes:
    - name: heapster-secrets
      secret:
        defaultMode: 420
        secretName: heapster-secrets
    - name: hawkular-metrics-certificate
      secret:
        defaultMode: 420
        secretName: hawkular-metrics-certificate
    - name: hawkular-metrics-account
      secret:
        defaultMode: 420
        secretName: hawkular-metrics-account
    - name: heapster-token-1ksp6
      secret:
        defaultMode: 420
        secretName: heapster-token-1ksp6
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: 2017-04-07T20:06:30Z
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: 2017-05-03T03:55:47Z
      status: "True"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: 2017-04-07T20:06:30Z
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: docker://46f893d2183bd541e7648791c3f5cc6da31c70b5b19f7c7b2033494c097e7aad
      image: openshift3/metrics-heapster:3.5.0
      imageID: docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/metrics-heapster@sha256:52ec40ac8235095951cf321d77c10bdc59194e620fc24d5eb49383d671eb6937
      lastState: {}
      name: heapster
      ready: true
      restartCount: 2
      state:
        running:
          startedAt: 2017-04-20T22:45:22Z
    hostIP: 10.16.23.195
    phase: Running
    podIP: 10.129.0.113
    startTime: 2017-04-07T20:06:30Z
kind: List
metadata: {}
resourceVersion: ""
selfLink: ""
- 'oc describe pods -n openshift-infra'
oc describe pods -n openshift-infra
Name:             hawkular-cassandra-1-x94p0
Namespace:        openshift-infra
Security Policy:  restricted
Node:             ose1.bc.jonqe.lab.eng.bos.redhat.com/10.16.23.148
Start Time:       Thu, 20 Apr 2017 19:53:17 -0300
Labels:           metrics-infra=hawkular-cassandra
                  name=hawkular-cassandra-1
                  type=hawkular-cassandra
Status:           Running
IP:               10.128.0.73
Controllers:      ReplicationController/hawkular-cassandra-1
Containers:
  hawkular-cassandra-1:
    Container ID:   docker://aa0ee98fd46cd1727bbd423053f4725877175f8558c71e656768a1b8c2e0b82e
    Image:          openshift3/metrics-cassandra:3.5.0
    Image ID:       docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/metrics-cassandra@sha256:f195339f5bbcaf5de4a844fa1738f83ddb36c372c6cb03859199bb53bcf5e093
    Ports:          9042/TCP, 9160/TCP, 7000/TCP, 7001/TCP
    Command:
      /opt/apache-cassandra/bin/cassandra-docker.sh
      --cluster_name=hawkular-metrics
      --data_volume=/cassandra_data
      --internode_encryption=all
      --require_node_auth=true
      --enable_client_encryption=true
      --require_client_auth=true
      --keystore_file=/secret/cassandra.keystore
      --keystore_password_file=/secret/cassandra.keystore.password
      --truststore_file=/secret/cassandra.truststore
      --truststore_password_file=/secret/cassandra.truststore.password
      --cassandra_pem_file=/secret/cassandra.pem
    Limits:
      memory:  2G
    Requests:
      memory:  1G
    State:          Running
      Started:      Thu, 20 Apr 2017 19:53:56 -0300
    Ready:          True
    Restart Count:  0
    Readiness:      exec [/opt/apache-cassandra/bin/cassandra-docker-ready.sh] delay=0s timeout=1s period=10s #success=1 #failure=3
    Volume Mounts:
      /cassandra_data from cassandra-data (rw)
      /secret from hawkular-cassandra-secrets (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from cassandra-token-5l9kw (ro)
    Environment Variables:
      CASSANDRA_MASTER:       true
      CASSANDRA_DATA_VOLUME:  /cassandra_data
      JVM_OPTS:               -Dcassandra.commitlog.ignorereplayerrors=true
      POD_NAMESPACE:          openshift-infra (v1:metadata.namespace)
      MEMORY_LIMIT:           2000000000 (limits.memory)
      CPU_LIMIT:              node allocatable (limits.cpu)
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  cassandra-data:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  hawkular-cassandra-secrets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hawkular-cassandra-secrets
  cassandra-token-5l9kw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cassandra-token-5l9kw
QoS Class:        Burstable
Tolerations:      <none>
No events.

Name:             hawkular-cassandra-1-z18wk
Namespace:        openshift-infra
Security Policy:  restricted
Node:             ose1.bc.jonqe.lab.eng.bos.redhat.com/10.16.23.148
Start Time:       Thu, 20 Apr 2017 19:53:18 -0300
Labels:           metrics-infra=hawkular-cassandra
                  name=hawkular-cassandra-1
                  type=hawkular-cassandra
Status:           Running
IP:               10.128.0.74
Controllers:      ReplicationController/hawkular-cassandra-1
Containers:
  hawkular-cassandra-1:
    Container ID:   docker://5bdbbc04a1744d72de5debdb63e5502f42f2e7b53167ed3fdc5dde1fd764704c
    Image:          openshift3/metrics-cassandra:3.5.0
    Image ID:       docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/metrics-cassandra@sha256:f195339f5bbcaf5de4a844fa1738f83ddb36c372c6cb03859199bb53bcf5e093
    Ports:          9042/TCP, 9160/TCP, 7000/TCP, 7001/TCP
    Command:
      /opt/apache-cassandra/bin/cassandra-docker.sh
      --cluster_name=hawkular-metrics
      --data_volume=/cassandra_data
      --internode_encryption=all
      --require_node_auth=true
      --enable_client_encryption=true
      --require_client_auth=true
      --keystore_file=/secret/cassandra.keystore
      --keystore_password_file=/secret/cassandra.keystore.password
      --truststore_file=/secret/cassandra.truststore
      --truststore_password_file=/secret/cassandra.truststore.password
      --cassandra_pem_file=/secret/cassandra.pem
    Limits:
      memory:  2G
    Requests:
      memory:  1G
    State:          Running
      Started:      Thu, 20 Apr 2017 19:53:56 -0300
    Ready:          True
    Restart Count:  0
    Readiness:      exec [/opt/apache-cassandra/bin/cassandra-docker-ready.sh] delay=0s timeout=1s period=10s #success=1 #failure=3
    Volume Mounts:
      /cassandra_data from cassandra-data (rw)
      /secret from hawkular-cassandra-secrets (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from cassandra-token-5l9kw (ro)
    Environment Variables:
      CASSANDRA_MASTER:       true
      CASSANDRA_DATA_VOLUME:  /cassandra_data
      JVM_OPTS:               -Dcassandra.commitlog.ignorereplayerrors=true
      POD_NAMESPACE:          openshift-infra (v1:metadata.namespace)
      MEMORY_LIMIT:           2000000000 (limits.memory)
      CPU_LIMIT:              node allocatable (limits.cpu)
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  cassandra-data:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  hawkular-cassandra-secrets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hawkular-cassandra-secrets
  cassandra-token-5l9kw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cassandra-token-5l9kw
QoS Class:        Burstable
Tolerations:      <none>
No events.

Name:             hawkular-metrics-zj777
Namespace:        openshift-infra
Security Policy:  restricted
Node:             ose1.bc.jonqe.lab.eng.bos.redhat.com/10.16.23.148
Start Time:       Tue, 06 Jun 2017 14:16:27 -0300
Labels:           metrics-infra=hawkular-metrics
                  name=hawkular-metrics
Status:           Running
IP:               10.128.0.91
Controllers:      ReplicationController/hawkular-metrics
Containers:
  hawkular-metrics:
    Container ID:   docker://6f8ffde05b78be196a885479dafdca04b33a4254fb53ef5075eacf4d88ce8d7c
    Image:          openshift3/metrics-hawkular-metrics:3.5.0
    Image ID:       docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/metrics-hawkular-metrics@sha256:f84a9f9abd9d4407b1a0b8542392ce2e57674821946c2edae410e9c7f4e3e764
    Ports:          8080/TCP, 8443/TCP, 8888/TCP
    Command:
      /opt/hawkular/scripts/hawkular-metrics-wrapper.sh
      -b
      0.0.0.0
      -Dhawkular.metrics.cassandra.nodes=hawkular-cassandra
      -Dhawkular.metrics.cassandra.use-ssl
      -Dhawkular.metrics.openshift.auth-methods=openshift-oauth,htpasswd
      -Dhawkular.metrics.openshift.htpasswd-file=/secrets/hawkular-metrics.htpasswd.file
      -Dhawkular.metrics.allowed-cors-access-control-allow-headers=authorization
      -Dhawkular.metrics.default-ttl=7
      -Dhawkular.metrics.admin-tenant=_hawkular_admin
      -Dhawkular-alerts.cassandra-nodes=hawkular-cassandra
      -Dhawkular-alerts.cassandra-use-ssl
      -Dhawkular.alerts.openshift.auth-methods=openshift-oauth,htpasswd
      -Dhawkular.alerts.openshift.htpasswd-file=/secrets/hawkular-metrics.htpasswd.file
      -Dhawkular.alerts.allowed-cors-access-control-allow-headers=authorization
      -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true
      -Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true
      -Dcom.datastax.driver.FORCE_NIO=true
      -DKUBERNETES_MASTER_URL=https://kubernetes.default.svc.cluster.local
      -DUSER_WRITE_ACCESS=False
      --hmw.keystore=/secrets/hawkular-metrics.keystore
      --hmw.truststore=/secrets/hawkular-metrics.truststore
      --hmw.keystore_password_file=/secrets/hawkular-metrics.keystore.password
      --hmw.truststore_password_file=/secrets/hawkular-metrics.truststore.password
      --hmw.jgroups_keystore=/secrets/hawkular-metrics.jgroups.keystore
      --hmw.jgroups_keystore_password_file=/secrets/hawkular-metrics.jgroups.keystore.password
      --hmw.jgroups_alias_file=/secrets/hawkular-metrics.jgroups.alias
    Limits:
      memory:  2500M
    Requests:
      memory:  1500M
    State:          Running
      Started:      Tue, 06 Jun 2017 14:16:38 -0300
    Ready:          True
    Restart Count:  0
    Liveness:       exec [/opt/hawkular/scripts/hawkular-metrics-liveness.py] delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      exec [/opt/hawkular/scripts/hawkular-metrics-readiness.py] delay=0s timeout=1s period=10s #success=1 #failure=3
    Volume Mounts:
      /client-secrets from hawkular-metrics-client-secrets (rw)
      /secrets from hawkular-metrics-secrets (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hawkular-token-xnc2c (ro)
    Environment Variables:
      POD_NAMESPACE:                  openshift-infra (v1:metadata.namespace)
      MASTER_URL:                     https://kubernetes.default.svc.cluster.local
      OPENSHIFT_KUBE_PING_NAMESPACE:  openshift-infra (v1:metadata.namespace)
      OPENSHIFT_KUBE_PING_LABELS:     metrics-infra=hawkular-metrics,name=hawkular-metrics
      STARTUP_TIMEOUT:                500
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  hawkular-metrics-secrets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hawkular-metrics-secrets
  hawkular-metrics-client-secrets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hawkular-metrics-account
  hawkular-token-xnc2c:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hawkular-token-xnc2c
QoS Class:        Burstable
Tolerations:      <none>
No events.

Name:             heapster-tpc9q
Namespace:        openshift-infra
Security Policy:  restricted
Node:             ose2.bc.jonqe.lab.eng.bos.redhat.com/10.16.23.195
Start Time:       Fri, 07 Apr 2017 17:06:30 -0300
Labels:           metrics-infra=heapster
                  name=heapster
Status:           Running
IP:               10.129.0.113
Controllers:      ReplicationController/heapster
Containers:
  heapster:
    Container ID:   docker://46f893d2183bd541e7648791c3f5cc6da31c70b5b19f7c7b2033494c097e7aad
    Image:          openshift3/metrics-heapster:3.5.0
    Image ID:       docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/metrics-heapster@sha256:52ec40ac8235095951cf321d77c10bdc59194e620fc24d5eb49383d671eb6937
    Port:           8082/TCP
    Command:
      heapster-wrapper.sh
      --wrapper.allowed_users_file=/secrets/heapster.allowed-users
      --source=kubernetes.summary_api:${MASTER_URL}?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250
      --tls_cert=/secrets/heapster.cert
      --tls_key=/secrets/heapster.key
      --tls_client_ca=/secrets/heapster.client-ca
      --allowed_users=%allowed_users%
      --metric_resolution=30s
      --wrapper.username_file=/hawkular-account/hawkular-metrics.username
      --wrapper.password_file=/hawkular-account/hawkular-metrics.password
      --wrapper.endpoint_check=https://hawkular-metrics:443/hawkular/metrics/status
      --sink=hawkular:https://hawkular-metrics:443?tenant=_system&labelToTenant=pod_namespace&labelNodeId=nodename&caCert=/hawkular-cert/hawkular-metrics-ca.certificate&user=%username%&pass=%password%&filter=label(container_name:^system.slice.*|^user.slice)
    Limits:
      memory:  3750M
    Requests:
      memory:  937500k
    State:          Running
      Started:      Thu, 20 Apr 2017 19:45:22 -0300
    Ready:          True
    Restart Count:  2
    Readiness:      exec [/opt/heapster-readiness.sh] delay=0s timeout=1s period=10s #success=1 #failure=3
    Volume Mounts:
      /hawkular-account from hawkular-metrics-account (rw)
      /hawkular-cert from hawkular-metrics-certificate (rw)
      /secrets from heapster-secrets (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from heapster-token-1ksp6 (ro)
    Environment Variables:
      STARTUP_TIMEOUT:  500
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  heapster-secrets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heapster-secrets
  hawkular-metrics-certificate:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hawkular-metrics-certificate
  hawkular-metrics-account:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hawkular-metrics-account
  heapster-token-1ksp6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heapster-token-1ksp6
QoS Class:        Burstable
Tolerations:      <none>
No events.
Created attachment 1285830
Cassandra Pod 1
Created attachment 1285831
Cassandra Pod 2
First, please attach output as files rather than pasting it directly into the bug.

The main issue here is that your installation is invalid. It appears that you have increased the Cassandra RC's replica count above 1, which is not how you scale the Cassandra components.

You will need to specify the number of Cassandra instances to deploy in your Ansible inventory file. Otherwise you will have multiple Cassandra pods using the same filesystem, which will cause problems.

In this case, it doesn't appear that you are using persistent volumes; you are using emptyDir volumes instead. So this is not what is causing your issue.

@gbaufake: it appears you are using the brew images, which are not supported. Why are you not using the supported images?

@jsanda: I don't see anything weird in the Cassandra logs. Any idea what is happening here?

(In reply to Matt Wringe from comment #5)
> @jsanda: I don't see anything weird in the Cassandra logs. Any idea what is happening here?

I can see from the logs that both the hawkular_alerts and hawkular_metrics keyspaces have been created. Cassandra must be under fairly heavy load, because both logs are filled with GC activity.
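Matt's diagnosis can be checked from the pod list itself: in the YAML output above, both `hawkular-cassandra-1-x94p0` and `hawkular-cassandra-1-z18wk` carry a `kubernetes.io/created-by` annotation that points at the same ReplicationController, `hawkular-cassandra-1`. The following is a minimal Python sketch that groups Cassandra pods by that annotation and flags any controller owning more than one pod. It assumes the input is the JSON printed by `oc get pods -o json -n openshift-infra`; the function names are hypothetical, and newer Kubernetes releases dropped this annotation in favour of `metadata.ownerReferences`, so the lookup would need adjusting there.

```python
import json
from collections import defaultdict


def pods_per_controller(pod_list_json, label="metrics-infra", value="hawkular-cassandra"):
    """Group pod names by the ReplicationController that created them.

    `pod_list_json` is the text printed by `oc get pods -o json`.
    Only pods whose `label` equals `value` are considered. The owner is read
    from the `kubernetes.io/created-by` annotation, which holds a
    JSON-serialized reference (hypothetical helper, not part of any tool).
    """
    owners = defaultdict(list)
    for pod in json.loads(pod_list_json)["items"]:
        meta = pod["metadata"]
        if meta.get("labels", {}).get(label) != value:
            continue
        created_by = meta.get("annotations", {}).get("kubernetes.io/created-by")
        if created_by is None:
            continue  # pod was not created by a controller
        reference = json.loads(created_by)["reference"]
        owners[reference["name"]].append(meta["name"])
    return dict(owners)


def overscaled(owners):
    """Return only the controllers that own more than one Cassandra pod."""
    return {rc: sorted(names) for rc, names in owners.items() if len(names) > 1}
```

Given this cluster's pod list, it should flag `hawkular-cassandra-1` with both pod names. Presumably each Cassandra instance requested through the inventory gets its own controller with a single pod, so any hit from `overscaled` points at a manually scaled RC.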
@jsanda: The Hawkular Metrics pod is not running in this case, so there shouldn't be anything connecting to Cassandra to cause it to be under heavy load.

@gbaufake: do you know how large a cluster you are trying to manage here? Were metrics running at some point and are now failing? Or is this a fresh installation?

(In reply to Matt Wringe from comment #7)
> @jsanda: The Hawkular Metrics pod is not running in this case, so there
> shouldn't be anything connecting to Cassandra to cause it to be under
> heavy load.

Maybe the logs in comment 3 and comment 4 are not current, because they show that both keyspaces exist.

> @gbaufake: do you know how large a cluster you are trying to manage here?
> Were metrics running at some point and are now failing? Or is this a
> fresh installation?

I asked gbaufake whether restarting the pods made any difference, and he said he got the same results.

@jsanda: the logs from cassandra and hawkular-metrics are current. Oddly, sometimes the keyspace is created and sometimes it is not.

@mwringe: my cluster has two nodes:
- 2 VMs with 8 GB of RAM each
- 40 GB of disk
- RHEL 7.3

I restarted the pods and got the same results.

Are you still using the brew images? Can you please use the supported ones from access.redhat.com and then attach fresh logs once they start up?

Can you log into both of the Cassandra pods, run `nodetool status`, and share the output?

Sorry, we have a problem with the blade center and some machines were affected. I'll let you know when I find something.

I think I was able to reproduce the situation in a Docker container: https://issues.jboss.org/projects/HWKMETRICS/issues/HWKMETRICS-685.

The issue described in https://issues.jboss.org/projects/HWKMETRICS/issues/HWKMETRICS-685 is invalid. It should be expected that going into Cassandra while Hawkular Metrics is running and deleting keyspaces will cause problems.
I believe this can happen in OpenShift if you are running Cassandra without persistent storage and you delete the Cassandra pod. When the Cassandra pod comes back up, Hawkular Metrics may be able to connect to it, but the keyspace will not be available (the keyspace is only created when Hawkular Metrics starts).

We have a few options here:

1) Detect this situation and have Hawkular Metrics restart itself (e.g. via a liveness probe). When Hawkular Metrics restarts, it will notice that the keyspace does not exist and will create it.

2) Have Hawkular Metrics detect this situation better and automatically recreate the keyspaces if it detects that they do not exist.

@jsanda: any opinions here?

If we put this at the Hawkular level, it will work in all environments, not just OpenShift.

If we add a check and restart our Hawkular pod in this case, we will need to provide a mechanism to detect this scenario. But if we detect this scenario in the Hawkular code, it might be just as easy to recreate the keyspace directly.

Thoughts?

(In reply to Matt Wringe from comment #16)
> If we add a check and restart our Hawkular pod in this case, we will need to
> provide a mechanism to detect this scenario. But if we detect this scenario
> in the Hawkular code, it might be just as easy to recreate the keyspace
> directly.
>
> Thoughts?

Detecting it might be tricky. We should get notifications from the driver on things like schema changes or a schema being added or dropped. Things get tricky, though, in the single C* (Cassandra) node scenario. If we can rely on the driver notifications, then this will be easy. If we cannot, then we need to explore some other options.

When we do detect that the schema has been dropped, I think the best way of handling it is a restart.
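For comparison, Matt's option 2 (recreating missing keyspaces from inside Hawkular Metrics) could look roughly like the minimal Python sketch below. It is not Hawkular's code (Hawkular Metrics is Java) and it swaps the driver schema-change notifications John mentions for plain polling. Assumptions: Cassandra 3.x, where keyspace metadata lives in `system_schema.keyspaces`; a hypothetical `Session` whose `execute(cql)` returns dict-like rows; and `SimpleStrategy` with replication factor 1 as a placeholder. `InMemorySession` is only a stand-in so the sketch runs without a cluster.

```python
from typing import Any, Iterable, List, Mapping, Protocol

# Keyspaces Hawkular Metrics needs (names as they appear in the Cassandra logs here).
REQUIRED_KEYSPACES = ("hawkular_metrics", "hawkular_alerts")


class Session(Protocol):
    """Hypothetical minimal session: execute() runs one CQL statement, returns rows."""

    def execute(self, cql: str) -> Iterable[Mapping[str, Any]]: ...


def missing_keyspaces(session: Session) -> List[str]:
    """Return the required keyspaces that Cassandra does not currently know about."""
    # Cassandra 3.x keeps keyspace metadata in system_schema.keyspaces.
    rows = session.execute("SELECT keyspace_name FROM system_schema.keyspaces")
    present = {row["keyspace_name"] for row in rows}
    return [ks for ks in REQUIRED_KEYSPACES if ks not in present]


def recreate_missing(session: Session, replication_factor: int = 1) -> List[str]:
    """Create each missing keyspace and return the names that were created.

    Only the keyspace itself is created; its tables would still have to be
    installed by the normal schema setup that runs at startup.
    """
    created = []
    for ks in missing_keyspaces(session):
        session.execute(
            f"CREATE KEYSPACE IF NOT EXISTS {ks} WITH replication = "
            f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}}"
        )
        created.append(ks)
    return created


class InMemorySession:
    """Test double standing in for a real driver session (illustration only)."""

    def __init__(self, keyspaces: Iterable[str]) -> None:
        self.keyspaces = set(keyspaces)

    def execute(self, cql: str) -> List[Mapping[str, Any]]:
        words = cql.split()
        if words[:2] == ["SELECT", "keyspace_name"]:
            return [{"keyspace_name": ks} for ks in sorted(self.keyspaces)]
        if words[:5] == ["CREATE", "KEYSPACE", "IF", "NOT", "EXISTS"]:
            self.keyspaces.add(words[5])
        return []


# Simulate a Cassandra pod that came back up on an empty emptyDir volume.
session = InMemorySession(["system", "system_schema"])
print(recreate_missing(session))   # both Hawkular keyspaces get created
print(missing_keyspaces(session))  # nothing is missing afterwards
```

Note that recreating only the keyspace leaves the tables missing, which is one reason a full restart (Matt's option 1, John's preference) may be simpler in practice.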
(In reply to John Sanda from comment #17)
> When we do detect that the schema has been dropped, I think the best way of
> handling it is a restart.

I am assuming it would be Hawkular Metrics that detects that the schema has been dropped. This means we can either:

1) have Hawkular Metrics terminate. This will cause the container within the pod to restart.

2) expose this error somewhere (perhaps the status endpoint). We can then use the liveness probe to check for this condition and restart the container based on it.

@jsanda: any thoughts on which is preferred?

(In reply to Matt Wringe from comment #18)
> @jsanda: any thoughts on which is preferred?

My preference would be the status endpoint plus the liveness probe.
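The status-endpoint approach could work roughly like the minimal Python sketch below, run as an exec liveness probe (exit code 0 means live, non-zero means restart the container). The URL and JSON fields are assumptions for illustration: it assumes a status endpoint at `http://localhost:8080/hawkular/metrics/status`, a `MetricsService` field that reports `FAILED` when the service is broken, and a hypothetical new `SchemaStatus` field that Hawkular Metrics would set to `MISSING` once it notices a dropped keyspace.

```python
import json
import urllib.request

# Assumed URL; host, port and scheme depend on how the pod is actually exposed.
STATUS_URL = "http://localhost:8080/hawkular/metrics/status"


def is_live(status: dict) -> bool:
    """Decide liveness from a parsed status document.

    Assumptions: "MetricsService" reports "FAILED" when the service is broken, and
    "SchemaStatus" is a hypothetical field set to "MISSING" once Hawkular Metrics
    notices that a required keyspace has been dropped. Any other MetricsService
    value (e.g. a slow startup) is treated as alive so the probe does not kill a
    container that is still initialising.
    """
    if status.get("MetricsService") == "FAILED":
        return False
    return status.get("SchemaStatus") != "MISSING"


def probe(url: str = STATUS_URL, timeout: float = 5.0) -> int:
    """Return an exit code for an exec liveness probe: 0 = live, 1 = restart."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = json.load(resp)
    except (OSError, ValueError):
        # Unreachable endpoint, HTTP error, bad URL or malformed JSON: not live.
        return 1
    return 0 if isinstance(status, dict) and is_live(status) else 1
```

As a probe command, the script would end with `sys.exit(probe())`. Because an unreachable endpoint counts as "not live", the probe's `initialDelaySeconds` would need to cover the time Hawkular Metrics takes to start answering HTTP.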
Created attachment 1285618
Hawkular-Metrics-Logs

Description of problem:
When I try to recreate the Hawkular Metrics pod in openshift-infra, Cassandra does not find alerts_keyspace or metrics_keyspace.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install OpenShift Container Platform (OCP) 3.5
2. Install Metrics
3. Delete the Hawkular Metrics pod in openshift-infra without deleting the Cassandra pods
4. Recreate the Hawkular Metrics pods

Actual results:
- OpenShift can't create the Hawkular Metrics pods.

Expected results:
- The Hawkular Metrics pods are recreated without errors.

Additional info:
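Steps 3 and 4 can be scripted roughly as in the minimal Python sketch below. Assumptions: `oc` is already logged in with access to `openshift-infra`; the Hawkular Metrics pod names start with `hawkular-metrics-` (as in the `oc get pods` output earlier in this bug); and the pod's replication controller recreates it after deletion, so step 4 is just watching it come back.

```python
import subprocess
from typing import List

NAMESPACE = "openshift-infra"
# Assumed: pods created by the hawkular-metrics RC are named "hawkular-metrics-<suffix>".
POD_PREFIX = "hawkular-metrics-"


def metrics_pods(get_pods_output: str) -> List[str]:
    """Pick Hawkular Metrics pod names out of `oc get pods` table output.

    The NAME header (if present) and the Cassandra/Heapster pods are skipped
    because their names do not start with POD_PREFIX.
    """
    names = []
    for line in get_pods_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith(POD_PREFIX):
            names.append(fields[0])
    return names


def delete_cmd(pod: str) -> List[str]:
    """Step 3: delete one Hawkular Metrics pod; the Cassandra pods are left alone."""
    return ["oc", "delete", "pod", pod, "-n", NAMESPACE]


def reproduce() -> None:
    """Run steps 3 and 4 against a cluster where `oc` is already logged in."""
    listing = subprocess.run(
        ["oc", "get", "pods", "-n", NAMESPACE],
        capture_output=True, text=True, check=True,
    ).stdout
    for pod in metrics_pods(listing):
        subprocess.run(delete_cmd(pod), check=True)
    # Step 4: the replication controller recreates the pod; watch until interrupted.
    subprocess.run(["oc", "get", "pods", "-n", NAMESPACE, "-w"], check=False)
```

Calling `reproduce()` deletes every matching pod, so it should only be pointed at a test cluster.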