Bug 1285468
| Summary: | OSE 3.1.0.4 metrics do not show for non Resource Limited projects | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Boris Kurktchiev <kurktchiev> |
| Component: | Hawkular | Assignee: | Matt Wringe <mwringe> |
| Status: | CLOSED NOTABUG | QA Contact: | chunchen <chunchen> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.1.0 | CC: | aos-bugs, mwringe, nicholas_schuetz, spadgett, wsun |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-01-11 18:33:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Boris Kurktchiev
2015-11-25 16:32:41 UTC
Matt Wringe:
Can you post a screenshot of what you are seeing for the limited containers? E.g. the ones where you are seeing a graph?

Boris Kurktchiev:
(In reply to Matt Wringe from comment #1)
> Can you post a screenshot of what you are seeing for the limited containers?
> E.g. the ones where you are seeing a graph?

https://www.dropbox.com/s/7c9fk0xz8p9tsh9/NoMetrics.png?dl=0 (basically empty graphs)

In a quota project I get this: https://www.dropbox.com/s/m77nwxb3l92hg5f/Metrics.png?dl=0

You can see that even in that I get no CPU information, only memory.

Matt Wringe:
I can't reproduce this issue.

Are you seeing anything in the Heapster logs? Or the Hawkular Metrics logs?

Matt Wringe:
Boris, can you attach the output of the following command?

oc get -o yaml pod mwmattermost-18-gofky -n mwmattermost

Boris Kurktchiev:
(In reply to Matt Wringe from comment #3)
> I can't reproduce this issue.
>
> Are you seeing anything in the Heapster logs? Or the Hawkular Metrics logs?

As soon as you tell me what to look for I am happy to look. On a quick glance I am not seeing much of anything, but I am not familiar with either product to where I can be 100% sure :/

root@osmaster0s:~:
----> oc get pods mwmattermost-18-gofky
NAME READY STATUS RESTARTS AGE
mwmattermost-18-gofky 1/1 Running 0 6d
root@osmaster0s:~:
----> oc get pods mwmattermost-18-gofky -o yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/created-by: |
{"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"mwservices","name":"mwmattermost-18","uid":"505d3aeb-91fb-11e5-866b-005056a6874f","apiVersion":"v1","resourceVersion":"3224694"}}
openshift.io/deployment-config.latest-version: "18"
openshift.io/deployment-config.name: mwmattermost
openshift.io/deployment.name: mwmattermost-18
openshift.io/scc: restricted
creationTimestamp: 2015-11-25T16:47:26Z
generateName: mwmattermost-18-
labels:
app: mwmattermost
deployment: mwmattermost-18
deploymentconfig: mwmattermost
name: mwmattermost-18-gofky
namespace: mwservices
resourceVersion: "3224709"
selfLink: /api/v1/namespaces/mwservices/pods/mwmattermost-18-gofky
uid: 3584cb46-9394-11e5-ac32-005056a6874f
spec:
containers:
- env:
- name: DB_HOST
value: 172.30.134.98
- name: DB_NAME
value: mattermost
- name: DB_PASS
value: matterm0st
- name: DB_TYPE
value: mysql
- name: DB_USER
value: mattermost
image: 172.30.16.236:5000/mwservices/mwmattermost@sha256:5d3a1cc959ce23609ab316420f0533ac22685d0711f2fc997e6ed3ceae25043a
imagePullPolicy: Always
name: mwmattermost
ports:
- containerPort: 8080
protocol: TCP
resources: {}
securityContext:
privileged: false
terminationMessagePath: /dev/termination-log
volumeMounts:
- mountPath: /opt/local/mattermost/data
name: mwmattermost-volume-1
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-2xueq
readOnly: true
dnsPolicy: ClusterFirst
host: osnode1s.devapps.unc.edu
imagePullSecrets:
- name: default-dockercfg-igm5g
nodeName: osnode1s.devapps.unc.edu
nodeSelector:
region: primary
zone: vipapps
restartPolicy: Always
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
volumes:
- name: mwmattermost-volume-1
persistentVolumeClaim:
claimName: mwmattermost
- name: default-token-2xueq
secret:
secretName: default-token-2xueq
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2015-11-25T16:47:28Z
status: "True"
type: Ready
containerStatuses:
- containerID: docker://f1ac5e8eecc0aa5ba13fb56899760bf48c4fffd0cdf644ef97c15997933b75f3
image: 172.30.16.236:5000/mwservices/mwmattermost@sha256:5d3a1cc959ce23609ab316420f0533ac22685d0711f2fc997e6ed3ceae25043a
imageID: docker://876b3723ee777b25a5bc22289c27075cf6587239ec3f6761a62d8e8469875db8
lastState: {}
name: mwmattermost
ready: true
restartCount: 0
state:
running:
startedAt: 2015-11-25T16:47:28Z
hostIP: 152.19.229.208
phase: Running
podIP: 10.1.1.17
startTime: 2015-11-25T16:47:26Z
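
Note that the empty resources: {} stanza in the spec above means this pod carries no CPU or memory requests or limits, which is exactly the "non Resource Limited" case from the summary. A quick way to check this for any pod is a sketch like the following (the grep context width is arbitrary):

# Print the resources stanza for the pod from this report; an empty
# map ({}) means no requests or limits are set.
oc get pod mwmattermost-18-gofky -n mwservices -o yaml | grep -A 4 'resources:'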
Boris Kurktchiev:
(In reply to Matt Wringe from comment #3)
> I can't reproduce this issue.
>
> Are you seeing anything in the Heapster logs? Or the Hawkular Metrics logs?

The biggest thing I am seeing in the heapster logs is:

2015/12/01 15:40:49 http: TLS handshake error from 10.1.0.1:57300: tls: no cipher suite supported by both client and server
2015/12/01 15:40:49 http: TLS handshake error from 10.1.0.1:57301: tls: no cipher suite supported by both client and server
2015/12/01 15:40:49 http: TLS handshake error from 10.1.0.1:57302: tls: no cipher suite supported by both client and server
2015/12/01 15:40:49 http: TLS handshake error from 10.1.0.1:57303: tls: no cipher suite supported by both client and server
2015/12/01 15:44:00 http: TLS handshake error from 10.1.0.1:35213: tls: unsupported SSLv2 handshake received

As far as I can tell, Hawkular and Cassandra are not throwing any errors; only the heapster log is tossing the above.

Boris Kurktchiev:
(In reply to Boris Kurktchiev from comment #7)
> biggest thing I am seeing in the heapster logs is:
> 2015/12/01 15:40:49 http: TLS handshake error from 10.1.0.1:57300: tls: no cipher suite supported by both client and server
> [...]

And recently these have started to pop up in the log:

W1202 14:18:49.380194 1 reflector.go:224] /tmp/gopath/src/k8s.io/heapster/sources/pods.go:173: watch of *api.Pod ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [3468623/3468145]) [3469622]
W1202 15:18:50.433236 1 reflector.go:224] /tmp/gopath/src/k8s.io/heapster/sources/pods.go:173: watch of *api.Pod ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [3470101/3469623]) [3471100]

Matt Wringe:
Hmm, this may be part of the root cause:

"http: TLS handshake error from 10.1.0.1:57300: tls: no cipher suite supported by both client and server"

The problem is that it's really strange that it can only receive a very specific type of metric. I would have expected all metrics or nothing.

It's also really strange that I have not heard anything similar to this from anyone else, which is why I am suspecting something slightly different with the setup or install. But I don't see anything special in what you are doing.
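
One way to narrow down a "no cipher suite supported by both client and server" error is to probe the server with fixed protocol versions and see which handshakes succeed. A minimal sketch, assuming the heapster service IP and port are filled in for the placeholders (neither appears in this report):

# Attempt handshakes at specific protocol versions; a failure here
# mirrors the mismatch heapster is logging.
openssl s_client -connect <heapster-service-ip>:<port> -tls1_2 < /dev/null
openssl s_client -connect <heapster-service-ip>:<port> -tls1   < /dev/null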
Boris Kurktchiev:
(In reply to Matt Wringe from comment #9)
> Hmm, this may be part of the root cause:
>
> "http: TLS handshake error from 10.1.0.1:57300: tls: no cipher suite
> supported by both client and server"
>
> The problem is that it's really strange that it can only receive a very
> specific type of metric. I would have expected all metrics or nothing.
>
> It's also really strange that I have not heard anything similar to this from
> anyone else, which is why I am suspecting something slightly different with
> the setup or install. But I don't see anything special in what you are doing.

Not sure. The original install was 3.0.2 from whatever state the ansible-playbooks github repo was in. Then the upgrade to 3.1 was done with the included playbooks.

Also, only RAM shows up for the resource limited projects, no CPU, and obviously still nothing on all other projects.

Matt Wringe:
Can you please post the full logs from the heapster container somewhere?
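
For reference, a sketch of how such logs can be pulled, assuming the metrics components run in the default openshift-infra project (the pod name placeholder comes from the first command's output):

# List the metrics pods, then dump the full heapster log to a file.
oc get pods -n openshift-infra
oc logs <heapster-pod-name> -n openshift-infra > heapster.log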
Boris Kurktchiev:
(In reply to Matt Wringe from comment #11)
> Can you please post the full logs from the heapster container somewhere?

https://gist.github.com/ebalsumgo/83f826ef2d9bfff00e8f

Matt Wringe:
From the log:

"2015/11/30 08:08:08 http: TLS handshake error from 10.1.2.1:39503: tls: first record does not look like a TLS handshake
2015/11/30 08:08:08 http: TLS handshake error from 10.1.2.1:42048: tls: unsupported SSLv2 handshake received"

Can you verify that 10.1.2.1 is the IP address for one of your nodes?

Did you do anything special to set up your node certificates? Or are you using the default generated ones?

Boris Kurktchiev:
(In reply to Matt Wringe from comment #13)
> Can you verify that 10.1.2.1 is the IP address for one of your nodes?

root@osmaster0s:~:
----> oc get hostsubnets
NAME                         HOST                         HOST IP          SUBNET
osmaster0s.devapps.unc.edu   osmaster0s.devapps.unc.edu   152.19.229.206   10.1.0.0/24
osnode0s.devapps.unc.edu     osnode0s.devapps.unc.edu     152.19.229.207   10.1.2.0/24
osnode1s.devapps.unc.edu     osnode1s.devapps.unc.edu     152.19.229.208   10.1.1.0/24
osnode2s.devapps.unc.edu     osnode2s.devapps.unc.edu     152.19.229.209   10.1.3.0/24

Looks like that IP should live on one of my nodes, yes.

Boris Kurktchiev:
For the record, here is the debug output from heapster: https://gist.github.com/ebalsumgo/d7e199abfdf2947b148d

Boris Kurktchiev:
(In reply to Boris Kurktchiev from comment #15)
> For the record, here is the debug output from heapster:
> https://gist.github.com/ebalsumgo/d7e199abfdf2947b148d

Got it working by following these instructions to replace my 3.0.2 certs:

On each node with cert IP errors
================================

1. Determine what subject alt names are already in place for the node's serving certificate:

   openssl x509 -in /etc/origin/node/server.crt -text -noout | grep -A1 "Subject Alternative Name"

   If the output shows:

   X509v3 Subject Alternative Name:
       DNS:mynode, DNS:mynode.mydomain.com, IP: 1.2.3.4

   then your subject alt names are: mynode, mynode.mydomain.com, 1.2.3.4

2. Determine the IP address the node will register, listed in /etc/origin/node/node-config.yaml as the "nodeIP" key. For example:

   nodeIP: "10.10.10.1"

   This should match the IP in the log error about the node certificate. If that IP address is not listed as a subject alt name in the node certificate, it needs to be added.

On the master
=============

1. Make a tmp dir and run this:

   signing_opts="--signer-cert=/etc/origin/master/ca.crt --signer-key=/etc/origin/master/ca.key --signer-serial=/etc/origin/master/ca.serial.txt"

2. For each node, run:

   oadm ca create-server-cert --cert=$nodename/server.crt --key=$nodename/server.key --hostnames=<existing subject alt names>,<new node IP> $signing_opts

   For example:

   oadm ca create-server-cert --cert=mynode/server.crt --key=mynode/server.key --hostnames=mynode,mynode.mydomain.com,1.2.3.4,10.10.10.1 $signing_opts

Replace node serving certs
==========================

1. Back up the existing /etc/origin/node/server.{crt,key} files on each node.
2. Copy the generated $nodename/server.{crt,key} files to each node under /etc/origin/node/.
3. Restart the node service.

Matt Wringe:
Since it appears that this was just an issue with certificates, I will be closing this issue. If you run into a similar issue again, please let us know.
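
Pulling the master-side steps above into one place, a minimal sketch of the regeneration loop; the node names, existing subject alt names, and new node IPs are placeholders to substitute per node:

# Run on the master: re-issue each node's serving cert with the nodeIP
# added as a subject alt name, signed by the cluster CA.
signing_opts="--signer-cert=/etc/origin/master/ca.crt --signer-key=/etc/origin/master/ca.key --signer-serial=/etc/origin/master/ca.serial.txt"

for nodename in <node1> <node2>; do
  mkdir -p $nodename
  oadm ca create-server-cert --cert=$nodename/server.crt --key=$nodename/server.key \
      --hostnames=<existing subject alt names>,<new node IP> $signing_opts
done

# Afterwards: back up /etc/origin/node/server.{crt,key} on each node, copy
# the newly generated pair into /etc/origin/node/, and restart the node service.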