Bug 1365787
| Summary: | Failed to start hawkular-metrics pod when using registry.ops | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | chunchen <chunchen> |
| Component: | Hawkular | Assignee: | Troy Dawson <tdawson> |
| Status: | CLOSED ERRATA | QA Contact: | chunchen <chunchen> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.3.0 | CC: | aos-bugs, chunchen, penli, pweil, wsun, xtian |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-09-27 09:43:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
chunchen
2016-08-10 08:34:21 UTC
Created attachment 1189496: hawkular metrics pod log
I can't reproduce this. I just deployed the 3.3.0 images from registry.ops and it all works. Is this something you can reproduce? Does this happen consistently?
It happens consistently in my testing. Could you try deploying on an OSE containerized installation? I also tried the metrics deployment against an OSE RPM installation, and it works well in that environment.
What exactly do you mean by an 'OSE containerized installation'? Looking more closely at the logs, it appears the problem is that something is killing the pod while it is starting. Can you check and see what is listed under events for the Hawkular pod?
"OSE containerized installation" means installing the OSE environment via the containerized method.
The Hawkular pod events are as below:
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
12m 12m 1 {default-scheduler } Normal Scheduled Successfully assigned hawkular-metrics-xhnnb to ip-172-18-4-201.ec2.internal
12m 12m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Pulling pulling image "registry.ops.openshift.com/openshift3/metrics-hawkular-metrics:3.3.0"
7m 7m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Pulled Successfully pulled image "registry.ops.openshift.com/openshift3/metrics-hawkular-metrics:3.3.0"
7m 7m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Created Created container with docker id 13f981cc5473
7m 7m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Started Started container with docker id 13f981cc5473
4m 4m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Killing Killing container with docker id 13f981cc5473: pod "hawkular-metrics-xhnnb_openshift-infra(30ed96c9-603d-11e6-9d1a-0eeb7993154f)" container "hawkular-metrics" is unhealthy, it will be killed and re-created.
4m 4m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Created Created container with docker id 891bea9986ca
4m 4m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Started Started container with docker id 891bea9986ca
4m 4m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Killing Killing container with docker id 891bea9986ca: pod "hawkular-metrics-xhnnb_openshift-infra(30ed96c9-603d-11e6-9d1a-0eeb7993154f)" container "hawkular-metrics" is unhealthy, it will be killed and re-created.
4m 4m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Created Created container with docker id d336a478d216
4m 4m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Started Started container with docker id d336a478d216
3m 3m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Killing Killing container with docker id d336a478d216: pod "hawkular-metrics-xhnnb_openshift-infra(30ed96c9-603d-11e6-9d1a-0eeb7993154f)" container "hawkular-metrics" is unhealthy, it will be killed and re-created.
3m 3m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Created Created container with docker id c491cbb597ee
3m 3m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Started Started container with docker id c491cbb597ee
3m 3m 1 {kubelet ip-172-18-4-201.ec2.internal} Warning FailedSync Error syncing pod, skipping: Error response from daemon: devmapper: Unknown device 88ada34788941c910b494d4587d21c4dd2315ce2d47e2a238f7d7ae903ceecf0
2m 2m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Killing Killing container with docker id c491cbb597ee: pod "hawkular-metrics-xhnnb_openshift-infra(30ed96c9-603d-11e6-9d1a-0eeb7993154f)" container "hawkular-metrics" is unhealthy, it will be killed and re-created.
2m 2m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Created Created container with docker id f8188adde671
2m 2m 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Started Started container with docker id f8188adde671
5m 14s 9 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Warning Unhealthy Liveness probe failed:
4m 13s 5 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Pulled Container image "registry.ops.openshift.com/openshift3/metrics-hawkular-metrics:3.3.0" already present on machine
13s 13s 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Killing Killing container with docker id f8188adde671: pod "hawkular-metrics-xhnnb_openshift-infra(30ed96c9-603d-11e6-9d1a-0eeb7993154f)" container "hawkular-metrics" is unhealthy, it will be killed and re-created.
11s 11s 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Created Created container with docker id 70a9683be499
10s 10s 1 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Normal Started Started container with docker id 70a9683be499
7m <invalid> 43 {kubelet ip-172-18-4-201.ec2.internal} spec.containers{hawkular-metrics} Warning Unhealthy Readiness probe failed:
"'OSE containerized installation' means installing OSE env via containerized method." Can you please be specific about how you are installing OSE? Do you just mean you are running OpenShift itself in a docker container? Or running it by some other means? What exact steps are you following?
The issue here is that the Hawkular Metrics pod is being killed. The message in the logs about "activemq-rar.rar" can be completely ignored; that error appears because the rar is being killed while it is starting (see "*** JBossAS process (160) received TERM signal ***" in the logs right above it). Why it is being killed is the problem.
It looks like the liveness probe has failed: "container "hawkular-metrics" is unhealthy, it will be killed and re-created". But the liveness probe should only fail in two situations:
1) the Hawkular Metrics service status is 'FAILED' (https://github.com/openshift/origin-metrics/blob/master/hawkular-metrics/hawkular-metrics-liveness.py#L41), but this isn't the case because the Hawkular Metrics war hasn't even started yet.
2) more than 3 minutes have passed since the start of the metrics startup and the Hawkular Metrics service status is not 'STARTED' (https://github.com/openshift/origin-metrics/blob/master/hawkular-metrics/hawkular-metrics-liveness.py#L50). But from the logs this shouldn't be the case either.
If you run the non-ops images in this containerized environment, does it still fail? Is there any more information in the OpenShift logs (not the container logs) about why this is failing?
I need an environment where this issue can be reproduced; otherwise there is not much more I can do with this issue based on the logs alone.
I have also opened https://bugzilla.redhat.com/show_bug.cgi?id=1367204 because the event logs really should be showing the reason for the failure.
Changing the status to MODIFIED, since the latest images have not yet been synced to the OPS registry.
The issue is also reproduced on an OSE RPM installation. Could you help check whether the metrics images in the OPS registry were synced and built correctly?
mwringe has built new images.
New images have been synced to registry.ops.
It's fixed. Checked with the latest metrics images; the test result is as below:
[root@ip-172-18-12-152 ~]# oc get pod
NAME READY STATUS RESTARTS AGE
hawkular-cassandra-1-sjjyo 1/1 Running 0 9m
hawkular-metrics-i7tti 1/1 Running 2 9m
heapster-erb9k 1/1 Running 2 9m
[root@ip-172-18-12-152 ~]# oc describe pod hawkular-metrics-i7tti
Name: hawkular-metrics-i7tti
Namespace: openshift-infra
Security Policy: restricted
Node: ip-172-18-0-250.ec2.internal/172.18.0.250
Start Time: Mon, 22 Aug 2016 01:34:53 -0400
Labels: metrics-infra=hawkular-metrics
name=hawkular-metrics
Status: Running
IP: 10.1.0.5
Controllers: ReplicationController/hawkular-metrics
Containers:
hawkular-metrics:
Container ID: docker://b7b957858e2cc95d3cd78aa3d84d991fa40b05d158557d52bcdf9b143ce1573f
Image: registry.ops.openshift.com/openshift3/metrics-hawkular-metrics:3.3.0
Image ID: docker://sha256:cd137686f61ef443d9319d9b7568b7609dda198e401d4e7324585d1a26fe5496
<----------snip------------>
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1933
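For reference, the two liveness-probe failure conditions described earlier in this report can be sketched in Python as below. This is only an illustrative sketch, not the actual hawkular-metrics-liveness.py script: the function name `liveness_ok`, its parameters, and the "STARTING" status string used in the usage lines are assumptions made for the example. Only the 'FAILED' and 'STARTED' statuses and the 3-minute window come from the comments above.

```python
# Hypothetical sketch of the liveness decision described in this report.
# It does not query a real Hawkular Metrics endpoint; the caller supplies
# the service status and the elapsed time since startup began.

STARTUP_TIMEOUT_SECONDS = 3 * 60  # the "3 minutes" mentioned in the report


def liveness_ok(status: str, seconds_since_start: float,
                timeout: float = STARTUP_TIMEOUT_SECONDS) -> bool:
    """Return True if the container should be considered alive."""
    # Condition 1: the service explicitly reports a failed state.
    if status == "FAILED":
        return False
    # Condition 2: the startup window has elapsed and the service
    # still has not reached the 'STARTED' state.
    if seconds_since_start > timeout and status != "STARTED":
        return False
    return True


if __name__ == "__main__":
    # "STARTING" is an assumed intermediate status, used for illustration only.
    print(liveness_ok("STARTING", 60))    # still inside the startup window
    print(liveness_ok("STARTING", 200))   # window elapsed, not started
    print(liveness_ok("STARTED", 1000))   # started, regardless of elapsed time
```

Under this logic, a pod that is still starting is killed only once the startup window has elapsed, which is why the report treats kills during an early, non-failed startup as unexpected.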