Description of problem:
I am unable to get metrics to show up for non Resource/Quota projects, and even in those I only seem to be getting memory metrics and no CPU metrics. In unrestricted projects I just have empty graphs.

Version-Release number of selected component (if applicable):

----> oc get all
CONTROLLER             CONTAINER(S)           IMAGE(S)                                           SELECTOR                    REPLICAS   AGE
hawkular-cassandra-1   hawkular-cassandra-1   openshift/origin-metrics-cassandra:latest          name=hawkular-cassandra-1   1          1m
hawkular-metrics       hawkular-metrics       openshift/origin-metrics-hawkular-metrics:latest   name=hawkular-metrics       1          1m
heapster               heapster               openshift/origin-metrics-heapster:latest           name=heapster               1          1m
NAME               HOST/PORT                         PATH      SERVICE            LABELS                  INSECURE POLICY   TLS TERMINATION
hawkular-metrics   ose-metrics.ose.devapps.unc.edu             hawkular-metrics   metrics-infra=support                     passthrough
NAME                       CLUSTER_IP       EXTERNAL_IP   PORT(S)                               SELECTOR                  AGE
hawkular-cassandra         172.30.99.158    <none>        9042/TCP,9160/TCP,7000/TCP,7001/TCP   type=hawkular-cassandra   1m
hawkular-cassandra-nodes   None             <none>        9042/TCP,9160/TCP,7000/TCP,7001/TCP   type=hawkular-cassandra   1m
hawkular-metrics           172.30.28.56     <none>        443/TCP                               name=hawkular-metrics     1m
heapster                   172.30.200.193   <none>        80/TCP                                name=heapster             1m
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-hs7b5   1/1       Running   0          1m
hawkular-metrics-ba6id       1/1       Running   0          1m
heapster-qqs1i               1/1       Running   2          1m

----> rpm -qa | grep atomic
atomic-openshift-utils-3.0.13-1.git.0.5e8c5c7.el7aos.noarch
tuned-profiles-atomic-openshift-node-3.1.0.4-1.git.4.b6c7cd2.el7aos.x86_64
atomic-openshift-master-3.1.0.4-1.git.4.b6c7cd2.el7aos.x86_64
atomic-openshift-clients-3.1.0.4-1.git.4.b6c7cd2.el7aos.x86_64
atomic-openshift-3.1.0.4-1.git.4.b6c7cd2.el7aos.x86_64
atomic-openshift-node-3.1.0.4-1.git.4.b6c7cd2.el7aos.x86_64
atomic-openshift-sdn-ovs-3.1.0.4-1.git.4.b6c7cd2.el7aos.x86_64

How reproducible:
Deploy with

oc process -f /usr/share/ansible/openshift-ansible/roles/openshift_examples/files/examples/infrastructure-templates/enterprise/metrics-deployer.yaml -v CASSANDRA_PV_SIZE=2Gi,CASSANDRA_NODES=2,HAWKULAR_METRICS_HOSTNAME=ose-metrics.ose.devapps.unc.edu,USE_PERSISTENT_STORAGE=false | oc create -f -

Steps to Reproduce:
1.
2.
3.

Actual results:
No metrics, except in some resource-limited projects.

Expected results:
Metrics everywhere.

Additional info:
I have tried redeploying using REDEPLOY=true.
I have tried deploying with persistent storage.
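A couple of quick sanity checks for a deployment like this (a sketch only, assuming the route hostname from the template parameters above, the standard Hawkular Metrics status endpoint, and the default 3.1 master-config.yaml location):

# Hawkular Metrics should answer on its status endpoint through the route
curl -k https://ose-metrics.ose.devapps.unc.edu/hawkular/metrics/status

# The web console only draws graphs if metricsPublicURL points at that same route
grep metricsPublicURL /etc/origin/master/master-config.yaml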
Can you post a screenshot of what you are seeing for the limited containers? Eg the ones where you are seeing a graph?
(In reply to Matt Wringe from comment #1)
> Can you post a screenshot of what you are seeing for the limited containers?
> Eg the ones where you are seeing a graph?

https://www.dropbox.com/s/7c9fk0xz8p9tsh9/NoMetrics.png?dl=0

Basically empty graphs.

In a quota project I get this:
https://www.dropbox.com/s/m77nwxb3l92hg5f/Metrics.png?dl=0

You can see that even in that project I get no CPU information, only memory.
I can't reproduce this issue. Are you seeing anything in the Heapster logs? Or the Hawkular Metrics logs?
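A minimal way to pull those logs (a sketch, assuming the metrics components were deployed into the openshift-infra project and using the pod names from the oc get all output above):

# Substitute whatever project the deployer template was actually created in
oc project openshift-infra

# Tail the Heapster log
oc logs -f heapster-qqs1i

# Tail the Hawkular Metrics log
oc logs -f hawkular-metrics-ba6id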
Boris, can you attach the output of the following command?

oc get -o yaml pod mwmattermost-18-gofky -n mwmattermost
(In reply to Matt Wringe from comment #3)
> I can't reproduce this issue.
> 
> Are you seeing anything in the Heapster logs? Or the Hawkular Metrics logs?

As soon as you tell me what to look for, I am happy to look. On a quick glance I am not seeing much of anything, but I am not familiar enough with either product to be 100% sure :/
root@osmaster0s:~: ----> oc get pods mwmattermost-18-gofky
NAME                    READY     STATUS    RESTARTS   AGE
mwmattermost-18-gofky   1/1       Running   0          6d

root@osmaster0s:~: ----> oc get pods mwmattermost-18-gofky -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"mwservices","name":"mwmattermost-18","uid":"505d3aeb-91fb-11e5-866b-005056a6874f","apiVersion":"v1","resourceVersion":"3224694"}}
    openshift.io/deployment-config.latest-version: "18"
    openshift.io/deployment-config.name: mwmattermost
    openshift.io/deployment.name: mwmattermost-18
    openshift.io/scc: restricted
  creationTimestamp: 2015-11-25T16:47:26Z
  generateName: mwmattermost-18-
  labels:
    app: mwmattermost
    deployment: mwmattermost-18
    deploymentconfig: mwmattermost
  name: mwmattermost-18-gofky
  namespace: mwservices
  resourceVersion: "3224709"
  selfLink: /api/v1/namespaces/mwservices/pods/mwmattermost-18-gofky
  uid: 3584cb46-9394-11e5-ac32-005056a6874f
spec:
  containers:
  - env:
    - name: DB_HOST
      value: 172.30.134.98
    - name: DB_NAME
      value: mattermost
    - name: DB_PASS
      value: matterm0st
    - name: DB_TYPE
      value: mysql
    - name: DB_USER
      value: mattermost
    image: 172.30.16.236:5000/mwservices/mwmattermost@sha256:5d3a1cc959ce23609ab316420f0533ac22685d0711f2fc997e6ed3ceae25043a
    imagePullPolicy: Always
    name: mwmattermost
    ports:
    - containerPort: 8080
      protocol: TCP
    resources: {}
    securityContext:
      privileged: false
    terminationMessagePath: /dev/termination-log
    volumeMounts:
    - mountPath: /opt/local/mattermost/data
      name: mwmattermost-volume-1
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-2xueq
      readOnly: true
  dnsPolicy: ClusterFirst
  host: osnode1s.devapps.unc.edu
  imagePullSecrets:
  - name: default-dockercfg-igm5g
  nodeName: osnode1s.devapps.unc.edu
  nodeSelector:
    region: primary
    zone: vipapps
  restartPolicy: Always
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - name: mwmattermost-volume-1
    persistentVolumeClaim:
      claimName: mwmattermost
  - name: default-token-2xueq
    secret:
      secretName: default-token-2xueq
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2015-11-25T16:47:28Z
    status: "True"
    type: Ready
  containerStatuses:
  - containerID: docker://f1ac5e8eecc0aa5ba13fb56899760bf48c4fffd0cdf644ef97c15997933b75f3
    image: 172.30.16.236:5000/mwservices/mwmattermost@sha256:5d3a1cc959ce23609ab316420f0533ac22685d0711f2fc997e6ed3ceae25043a
    imageID: docker://876b3723ee777b25a5bc22289c27075cf6587239ec3f6761a62d8e8469875db8
    lastState: {}
    name: mwmattermost
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2015-11-25T16:47:28Z
  hostIP: 152.19.229.208
  phase: Running
  podIP: 10.1.1.17
  startTime: 2015-11-25T16:47:26Z
(In reply to Matt Wringe from comment #3)
> I can't reproduce this issue.
> 
> Are you seeing anything in the Heapster logs? Or the Hawkular Metrics logs?

The biggest thing I am seeing in the heapster logs is:

2015/12/01 15:40:49 http: TLS handshake error from 10.1.0.1:57300: tls: no cipher suite supported by both client and server
2015/12/01 15:40:49 http: TLS handshake error from 10.1.0.1:57301: tls: no cipher suite supported by both client and server
2015/12/01 15:40:49 http: TLS handshake error from 10.1.0.1:57302: tls: no cipher suite supported by both client and server
2015/12/01 15:40:49 http: TLS handshake error from 10.1.0.1:57303: tls: no cipher suite supported by both client and server
2015/12/01 15:44:00 http: TLS handshake error from 10.1.0.1:35213: tls: unsupported SSLv2 handshake received

As far as I can tell, Hawkular and Cassandra are not throwing any errors; only the heapster log is tossing the above.
(In reply to Boris Kurktchiev from comment #7)
> (In reply to Matt Wringe from comment #3)
> > I can't reproduce this issue.
> > 
> > Are you seeing anything in the Heapster logs? Or the Hawkular Metrics logs?
> 
> The biggest thing I am seeing in the heapster logs is:
> 2015/12/01 15:40:49 http: TLS handshake error from 10.1.0.1:57300: tls: no cipher suite supported by both client and server
> 2015/12/01 15:40:49 http: TLS handshake error from 10.1.0.1:57301: tls: no cipher suite supported by both client and server
> 2015/12/01 15:40:49 http: TLS handshake error from 10.1.0.1:57302: tls: no cipher suite supported by both client and server
> 2015/12/01 15:40:49 http: TLS handshake error from 10.1.0.1:57303: tls: no cipher suite supported by both client and server
> 2015/12/01 15:44:00 http: TLS handshake error from 10.1.0.1:35213: tls: unsupported SSLv2 handshake received
> 
> As far as I can tell, Hawkular and Cassandra are not throwing any errors; only the heapster log is tossing the above.

And recently these have started to pop up in the log:

W1202 14:18:49.380194       1 reflector.go:224] /tmp/gopath/src/k8s.io/heapster/sources/pods.go:173: watch of *api.Pod ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [3468623/3468145]) [3469622]
W1202 15:18:50.433236       1 reflector.go:224] /tmp/gopath/src/k8s.io/heapster/sources/pods.go:173: watch of *api.Pod ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [3470101/3469623]) [3471100]
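Since Heapster also scrapes each node's kubelet over TLS, one thing worth checking is the certificate each kubelet presents and whether the node's IPs appear as subject alt names (a sketch, assuming the default kubelet port 10250 and that the nodes are reachable from wherever this is run):

# Inspect the serving certificate presented by a node's kubelet
openssl s_client -connect osnode0s.devapps.unc.edu:10250 </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A 1 "Subject Alternative Name"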
Hmm, this may be part of the root cause:

"http: TLS handshake error from 10.1.0.1:57300: tls: no cipher suite supported by both client and server"

The problem is that it's really strange that it can only receive one very specific type of metric. I would have expected all metrics or nothing.

It's also really strange that I have not heard anything similar to this from anyone else, which is why I am suspecting something slightly different with the setup or install. But I don't see anything special in what you are doing.
(In reply to Matt Wringe from comment #9)
> Hmm, this may be part of the root cause:
> 
> "http: TLS handshake error from 10.1.0.1:57300: tls: no cipher suite
> supported by both client and server"
> 
> The problem is that it's really strange that it can only receive one very
> specific type of metric. I would have expected all metrics or nothing.
> 
> It's also really strange that I have not heard anything similar to this from
> anyone else, which is why I am suspecting something slightly different with
> the setup or install. But I don't see anything special in what you are doing.

Not sure. The original install was 3.0.2 from whatever state the ansible-playbooks GitHub repo was in at the time, and the upgrade to 3.1 was done with the included playbooks.

Also, only RAM shows up for the resource-limited projects, no CPU, and obviously still nothing in all other projects.
Can you please post the full logs from the heapster container somewhere?
(In reply to Matt Wringe from comment #11)
> Can you please post the full logs from the heapster container somewhere?

https://gist.github.com/ebalsumgo/83f826ef2d9bfff00e8f
From the log:

"2015/11/30 08:08:08 http: TLS handshake error from 10.1.2.1:39503: tls: first record does not look like a TLS handshake
2015/11/30 08:08:08 http: TLS handshake error from 10.1.2.1:42048: tls: unsupported SSLv2 handshake received"

Can you verify if 10.1.2.1 is the IP address for one of your nodes?

Did you do anything special to set up your node certificates? Or are you using the default generated ones?
(In reply to Matt Wringe from comment #13)
> From the log:
> 
> "2015/11/30 08:08:08 http: TLS handshake error from 10.1.2.1:39503: tls:
> first record does not look like a TLS handshake
> 2015/11/30 08:08:08 http: TLS handshake error from 10.1.2.1:42048: tls:
> unsupported SSLv2 handshake received"
> 
> Can you verify if 10.1.2.1 is the IP address for one of your nodes?
> 
> Did you do anything special to set up your node certificates? Or are you
> using the default generated ones?

root@osmaster0s:~: ----> oc get hostsubnets
NAME                         HOST                         HOST IP          SUBNET
osmaster0s.devapps.unc.edu   osmaster0s.devapps.unc.edu   152.19.229.206   10.1.0.0/24
osnode0s.devapps.unc.edu     osnode0s.devapps.unc.edu     152.19.229.207   10.1.2.0/24
osnode1s.devapps.unc.edu     osnode1s.devapps.unc.edu     152.19.229.208   10.1.1.0/24
osnode2s.devapps.unc.edu     osnode2s.devapps.unc.edu     152.19.229.209   10.1.3.0/24

Looks like that IP should live on one of my nodes, yes.
For the record, here is the debug output from heapster: https://gist.github.com/ebalsumgo/d7e199abfdf2947b148d
(In reply to Boris Kurktchiev from comment #15)
> For the record, here is the debug output from heapster:
> https://gist.github.com/ebalsumgo/d7e199abfdf2947b148d

Got it working by following these instructions to replace my 3.0.2 certs:

On each node with cert IP errors
================================

1. Determine what subject alt names are already in place for the node's serving certificate:

   openssl x509 -in /etc/origin/node/server.crt -text -noout | grep -A 1 "Subject Alternative Name"

   If the output shows:

   X509v3 Subject Alternative Name:
       DNS:mynode, DNS:mynode.mydomain.com, IP: 1.2.3.4

   then your subject alt names are:

   mynode
   mynode.mydomain.com
   1.2.3.4

2. Determine the IP address the node will register, listed in /etc/origin/node/node-config.yaml as the "nodeIP" key. For example:

   nodeIP: "10.10.10.1"

   This should match the IP in the log error about the node certificate. If the IP address is not listed as a subject alt name in the node certificate, it needs to be added.

On the master
=============

1. Make a tmp dir and run this:

   signing_opts="--signer-cert=/etc/origin/master/ca.crt --signer-key=/etc/origin/master/ca.key --signer-serial=/etc/origin/master/ca.serial.txt"

2. For each node, run:

   oadm ca create-server-cert --cert=$nodename/server.crt --key=$nodename/server.key --hostnames=<existing subject alt names>,<new node IP> $signing_opts

   For example:

   oadm ca create-server-cert --cert=mynode/server.crt --key=mynode/server.key --hostnames=mynode,mynode.mydomain.com,1.2.3.4,10.10.10.1 $signing_opts

Replace node serving certs
==========================

1. Back up the existing /etc/origin/node/server.{crt,key} files on each node.
2. Copy the generated $nodename/server.{crt,key} files to each node under /etc/origin/node/.
3. Restart the node service.
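For anyone else hitting this, the master-side part of the procedure above can be scripted roughly like this (a sketch only, using the same example hostnames/IPs as above; the NODES values must be replaced with each node's real existing subject alt names plus its nodeIP, and the generated files still need to be copied out to each node and the node service restarted by hand):

#!/bin/bash
# Sketch: regenerate node serving certs with the missing node IP added as a subject alt name.
# Run on the master; paths are the defaults used in the steps above.
set -euo pipefail

signing_opts="--signer-cert=/etc/origin/master/ca.crt --signer-key=/etc/origin/master/ca.key --signer-serial=/etc/origin/master/ca.serial.txt"

# node name -> comma-separated existing subject alt names plus the new node IP (example values)
declare -A NODES=(
  [mynode]="mynode,mynode.mydomain.com,1.2.3.4,10.10.10.1"
)

workdir=$(mktemp -d)
for nodename in "${!NODES[@]}"; do
  mkdir -p "$workdir/$nodename"
  oadm ca create-server-cert \
    --cert="$workdir/$nodename/server.crt" \
    --key="$workdir/$nodename/server.key" \
    --hostnames="${NODES[$nodename]}" \
    $signing_opts
  echo "New cert for $nodename written to $workdir/$nodename; copy to /etc/origin/node/ on that node and restart the node service"
done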
Since it appears that this was just an issue with certificates, I will be closing this issue. If you run into a similar issue again, please let us know.