Bug 1365787
Summary: | Failed to start hawkular-metrics pod when using registry.ops
---|---
Product: | OpenShift Container Platform
Component: | Hawkular
Reporter: | chunchen <chunchen>
Assignee: | Troy Dawson <tdawson>
QA Contact: | chunchen <chunchen>
Status: | CLOSED ERRATA
Severity: | medium
Priority: | medium
Version: | 3.3.0
CC: | aos-bugs, chunchen, penli, pweil, wsun, xtian
Target Milestone: | ---
Target Release: | ---
Hardware: | Unspecified
OS: | Unspecified
Doc Type: | If docs needed, set a value
Story Points: | ---
Last Closed: | 2016-09-27 09:43:29 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
Category: | ---
oVirt Team: | ---
Cloudforms Team: | ---
Description
chunchen
2016-08-10 08:34:21 UTC
Created attachment 1189496: hawkular metrics pod log
I can't reproduce this. I just deployed the 3.3.0 images from registry.ops and it all works. Is this something you can reproduce? Does this happen consistently?

It happened consistently when I tested. Could you try to deploy on an OSE containerized installation? I also tried a metrics deployment against an OSE rpm installation, and it works well in that installation env.

What exactly do you mean by an 'OSE containerized installation'? Looking more closely at the logs, it appears the problem is that something is killing the pod while it is starting. Can you check and see what is listed under events for the Hawkular Pod?

"OSE containerized installation" means installing the OSE env via the containerized method. The hawkular pod events are as below:

Events:
  FirstSeen LastSeen Count From SubobjectPath Type Reason Message
  --------- -------- ----- ---- ------------- -------- ------ -------
  12m 12m 1 {default-scheduler } Normal Scheduled Successfully assigned hawkular-metrics-xhnnb to ip-172-18-4-201.ec2.internal
  12m 12m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Pulling pulling image "registry.ops.openshift.com/openshift3/metrics-hawkular-metrics:3.3.0"
  7m 7m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Pulled Successfully pulled image "registry.ops.openshift.com/openshift3/metrics-hawkular-metrics:3.3.0"
  7m 7m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Created Created container with docker id 13f981cc5473
  7m 7m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Started Started container with docker id 13f981cc5473
  4m 4m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Killing Killing container with docker id 13f981cc5473: pod "hawkular-metrics-xhnnb_openshift-infra(30ed96c9-603d-11e6-9d1a-0eeb7993154f)" container "hawkular-metrics" is unhealthy, it will be killed and re-created.
  4m 4m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Created Created container with docker id 891bea9986ca
  4m 4m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Started Started container with docker id 891bea9986ca
  4m 4m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Killing Killing container with docker id 891bea9986ca: pod "hawkular-metrics-xhnnb_openshift-infra(30ed96c9-603d-11e6-9d1a-0eeb7993154f)" container "hawkular-metrics" is unhealthy, it will be killed and re-created.
  4m 4m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Created Created container with docker id d336a478d216
  4m 4m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Started Started container with docker id d336a478d216
  3m 3m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Killing Killing container with docker id d336a478d216: pod "hawkular-metrics-xhnnb_openshift-infra(30ed96c9-603d-11e6-9d1a-0eeb7993154f)" container "hawkular-metrics" is unhealthy, it will be killed and re-created.
  3m 3m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Created Created container with docker id c491cbb597ee
  3m 3m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Started Started container with docker id c491cbb597ee
  3m 3m 1 {kubelet ip-172-18-4-201.ec2.internal} Warning FailedSync Error syncing pod, skipping: Error response from daemon: devmapper: Unknown device 88ada34788941c910b494d4587d21c4dd2315ce2d47e2a238f7d7ae903ceecf0
  2m 2m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Killing Killing container with docker id c491cbb597ee: pod "hawkular-metrics-xhnnb_openshift-infra(30ed96c9-603d-11e6-9d1a-0eeb7993154f)" container "hawkular-metrics" is unhealthy, it will be killed and re-created.
  2m 2m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Created Created container with docker id f8188adde671
  2m 2m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Started Started container with docker id f8188adde671
  5m 14s 9 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Warning Unhealthy Liveness probe failed:
  4m 13s 5 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Pulled Container image "registry.ops.openshift.com/openshift3/metrics-hawkular-metrics:3.3.0" already present on machine
  13s 13s 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Killing Killing container with docker id f8188adde671: pod "hawkular-metrics-xhnnb_openshift-infra(30ed96c9-603d-11e6-9d1a-0eeb7993154f)" container "hawkular-metrics" is unhealthy, it will be killed and re-created.
  11s 11s 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Created Created container with docker id 70a9683be499
  10s 10s 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Started Started container with docker id 70a9683be499
  7m <invalid> 43 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Warning Unhealthy Readiness probe failed:

'"OSE containerized installation" means installing OSE env via containerized method.'

Can you please be specific about how you are installing OSE? Do you just mean you are running OpenShift itself in a docker container, or running it by some other means? What exact steps are you following?

The issue here is that the Hawkular Metrics pod is being killed. The message in the logs about the "activemq-rar.rar" can be completely ignored; that error message appears because the rar is being killed while it is starting (see "*** JBossAS process (160) received TERM signal ***" in the logs right above it). Why it's being killed is the problem.
It looks like the liveness probe has failed: "container "hawkular-metrics" is unhealthy, it will be killed and re-created". But the liveness probe should only fail in two situations:

1) The Hawkular Metrics service status is 'FAILED' (https://github.com/openshift/origin-metrics/blob/master/hawkular-metrics/hawkular-metrics-liveness.py#L41), but this isn't the case because the Hawkular Metrics war hasn't even started yet.
2) More than 3 minutes have passed since the start of the metrics startup and the Hawkular Metrics service status is not 'STARTED' (https://github.com/openshift/origin-metrics/blob/master/hawkular-metrics/hawkular-metrics-liveness.py#L50). But from the logs this shouldn't be the case either.

If you run the non-ops images in this containerised environment, does it still fail? Is there any more information in the OpenShift logs (not the container logs) about why this is failing?

I need an environment where this issue can be reproduced; otherwise there is not much more I can do with this issue based on the logs alone.

I have also opened https://bugzilla.redhat.com/show_bug.cgi?id=1367204 because the event logs really should be showing the reason for the failure.

Changing the status to MODIFIED since the latest images have not yet been synced to the OPS registry.

The issue is also reproduced on an OSE RPM installation env. Could you help check whether the metrics images from the OPS registry were synced or built correctly?

mwringe has built new images.

New images have been synced to registry.ops.
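For reference, the two liveness-probe failure conditions described earlier in this thread (status 'FAILED', or more than 3 minutes since startup without reaching 'STARTED') can be sketched roughly as below. This is only an illustration of the decision logic as stated in the comment, not the actual hawkular-metrics-liveness.py script; the function name, its parameters, and the 'STARTING' status string used in the example calls are hypothetical.

```python
# Hypothetical sketch of the liveness decision described in this thread.
# It is NOT the real hawkular-metrics-liveness.py; all names are illustrative.

STARTUP_TIMEOUT_SECONDS = 3 * 60  # "more than 3 minutes" per the comment above


def liveness_probe_fails(status, seconds_since_start,
                         timeout=STARTUP_TIMEOUT_SECONDS):
    """Return True when the probe should report the container as unhealthy."""
    # Condition 1: the service reports that it has failed outright.
    if status == "FAILED":
        return True
    # Condition 2: startup has exceeded the timeout without reaching STARTED.
    if status != "STARTED" and seconds_since_start > timeout:
        return True
    # Otherwise the container is either started or still within its grace period.
    return False


if __name__ == "__main__":
    print(liveness_probe_fails("STARTING", 60))   # False: still within 3 minutes
    print(liveness_probe_fails("STARTING", 200))  # True: timed out before STARTED
    print(liveness_probe_fails("FAILED", 10))     # True: service reported FAILED
```

Under this reading, a pod that is killed well inside the 3-minute window while its status is neither 'FAILED' nor 'STARTED' would not be explained by either condition, which is why the thread asks for the OpenShift logs.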
It's fixed. Checked with the latest metrics images; the test result is as below:

[root@ip-172-18-12-152 ~]# oc get pod
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-sjjyo   1/1       Running   0          9m
hawkular-metrics-i7tti       1/1       Running   2          9m
heapster-erb9k               1/1       Running   2          9m
[root@ip-172-18-12-152 ~]# oc describe pod hawkular-metrics-i7tti
Name: hawkular-metrics-i7tti
Namespace: openshift-infra
Security Policy: restricted
Node: ip-172-18-0-250.ec2.internal/172.18.0.250
Start Time: Mon, 22 Aug 2016 01:34:53 -0400
Labels: metrics-infra=hawkular-metrics name=hawkular-metrics
Status: Running
IP: 10.1.0.5
Controllers: ReplicationController/hawkular-metrics
Containers:
  hawkular-metrics:
    Container ID: docker://b7b957858e2cc95d3cd78aa3d84d991fa40b05d158557d52bcdf9b143ce1573f
    Image: registry.ops.openshift.com/openshift3/metrics-hawkular-metrics:3.3.0
    Image ID: docker://sha256:cd137686f61ef443d9319d9b7568b7609dda198e401d4e7324585d1a26fe5496
<----------snip------------>

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933