Description of problem:

Hi when I try to set up the metrics collection with my own certificate the hawkular-metrics pod stays with a state of 'Pending' with no discernable error in the logs.

How reproducible:

Each time I create it.

Steps to Reproduce:

1. Set up metrics collection with custom certificate: -

oc secrets new metrics-deployer hawkular-metrics.pem=/etc/pki/hawkular-metrics.paas.dev.iggroup.local.pem hawkular-metrics-ca.cert=/etc/pki/ig-private-ca.crt

oc process -f metrics-deployer.yaml -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.paas.dev.iggroup.local,USE_PERSISTENT_STORAGE=false,REDEPLOY=true | oc create -f -

Actual results:

NAME                         READY   STATUS             RESTARTS   AGE
hawkular-cassandra-1-jthxm   1/1     Running            0          1h
hawkular-metrics-e25n6       0/1     Pending            0          10m
heapster-xp7o3               0/1     CrashLoopBackOff   17         1h

oc describe pod hawkular-metrics-e25n6
Name:                     hawkular-metrics-e25n6
Namespace:                openshift-infra
Image(s):                 registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0
Node:                     vrdevosnode003.iggroup.local/
Start Time:               Thu, 24 Dec 2015 11:10:17 +0000
Labels:                   metrics-infra=hawkular-metrics,name=hawkular-metrics
Status:                   Pending
Reason:
Message:
IP:
Replication Controllers:  hawkular-metrics (1/1 replicas created)
Containers:
  hawkular-metrics:
    Container ID:
    Image:          registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0
    Image ID:
    QoS Tier:
      cpu:     BestEffort
      memory:  BestEffort
    State:          Waiting
    Ready:          False
    Restart Count:  0
    Environment Variables:
      POD_NAMESPACE:  openshift-infra (v1:metadata.namespace)
Volumes:
  hawkular-metrics-secrets:
    Type:        Secret (a secret that should populate this volume)
    SecretName:  hawkular-metrics-secrets
  hawkular-token-18xqi:
    Type:        Secret (a secret that should populate this volume)
    SecretName:  hawkular-token-18xqi
Events:
  FirstSeen  LastSeen  Count  From  SubobjectPath  Reason  Message
  ─────────  ────────  ─────  ────  ─────────────  ──────  ───────
  28m  28m  1  {kubelet vrdevosnode003.iggroup.local}  implicitly required container POD  Pulled  Container image
"openshift3/ose-pod:v3.1.0.4" already present on machine 28m 28m 1 {kubelet vrdevosnode003.iggroup.local} implicitly required container POD Created Created with docker id 16ecd4d1bc6e 28m 28m 1 {kubelet vrdevosnode003.iggroup.local} implicitly required container POD Started Started with docker id 16ecd4d1bc6e 28m 28m 1 {kubelet vrdevosnode003.iggroup.local} spec.containers{hawkular-metrics} Pulling pulling image "registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0" 28m 28m 1 {kubelet vrdevosnode003.iggroup.local} spec.containers{hawkular-metrics} Pulled Successfully pulled image "registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0" 28m 28m 1 {kubelet vrdevosnode003.iggroup.local} spec.containers{hawkular-metrics} Created Created with docker id 0c34f0335e00 28m 28m 1 {kubelet vrdevosnode003.iggroup.local} spec.containers{hawkular-metrics} Started Started with docker id 0c34f0335e00 28m 28m 1 {scheduler } Scheduled Successfully assigned hawkular-metrics-e25n6 to vrdevosnode003.iggroup.local Looking on the node (vrdevosnode003) CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 0c34f0335e00 registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0 "/opt/hawkular/script" 30 minutes ago Up 30 minutes k8s_hawkular-metrics.70adfb93_hawkular-metrics-e25n6_openshift-infra_f8cafc5d-aa2e-11e5-9488-0050568f9ceb_88f8596d 16ecd4d1bc6e openshift3/ose-pod:v3.1.0.4 "/pod" 30 minutes ago Up 30 minutes k8s_POD.e73d2a82_hawkular-metrics-e25n6_openshift-infra_f8cafc5d-aa2e-11e5-9488-0050568f9ceb_d44bd9c9 Inspecting the container: - docker inspect 0c34f0335e00 [ { "Id": "0c34f0335e0063de6588ffba85d222c991ff9a4acd746ac3621445ae496b4b5a", "Created": "2015-12-24T11:10:22.560303118Z", "Path": "/opt/hawkular/scripts/hawkular-metrics-wrapper.sh", "Args": [ "-b", "0.0.0.0", "-Dhawkular-metrics.cassandra-nodes=hawkular-cassandra", "-Dhawkular-metrics.cassandra-use-ssl", "-Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true", 
"-Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true", "-Dhawkular-metrics.openshift.auth-methods=openshift-oauth,htpasswd", "-Dhawkular-metrics.openshift.htpasswd-file=/secrets/hawkular-metrics.htpasswd.file", "-Dhawkular.metrics.allowed-cors-access-control-allow-headers=authorization", "-Dhawkular.metrics.default-ttl=7", "-DKUBERNETES_MASTER_URL=https://kubernetes.default.svc:443", "--hmw.keystore=/secrets/hawkular-metrics.keystore", "--hmw.truststore=/secrets/hawkular-metrics.truststore", "--hmw.keystore_password_file=/secrets/hawkular-metrics.keystore.password", "--hmw.truststore_password_file=/secrets/hawkular-metrics.truststore.password" ], "State": { "Running": true, "Paused": false, "Restarting": false, "OOMKilled": false, "Dead": false, "Pid": 14783, "ExitCode": 0, "Error": "", "StartedAt": "2015-12-24T11:10:23.479536006Z", "FinishedAt": "0001-01-01T00:00:00Z" }, "Image": "b44dc66d64f234ff1c857c6c0f621cde5b005312266ec972e1f55ec46eccca4c", "NetworkSettings": { "Bridge": "", "EndpointID": "", "Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "HairpinMode": false, "IPAddress": "", "IPPrefixLen": 0, "IPv6Gateway": "", "LinkLocalIPv6Address": "", "LinkLocalIPv6PrefixLen": 0, "MacAddress": "", "NetworkID": "", "PortMapping": null, "Ports": null, "SandboxKey": "", "SecondaryIPAddresses": null, "SecondaryIPv6Addresses": null }, "ResolvConfPath": "/var/lib/docker/containers/16ecd4d1bc6e6a23ec23da1d9c678974c622fea9663b7365ed315944f5c9e62f/resolv.conf", "HostnamePath": "/var/lib/docker/containers/16ecd4d1bc6e6a23ec23da1d9c678974c622fea9663b7365ed315944f5c9e62f/hostname", "HostsPath": "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/etc-hosts", "LogPath": "/var/lib/docker/containers/0c34f0335e0063de6588ffba85d222c991ff9a4acd746ac3621445ae496b4b5a/0c34f0335e0063de6588ffba85d222c991ff9a4acd746ac3621445ae496b4b5a-json.log", "Name": 
"/k8s_hawkular-metrics.70adfb93_hawkular-metrics-e25n6_openshift-infra_f8cafc5d-aa2e-11e5-9488-0050568f9ceb_88f8596d", "RestartCount": 0, "Driver": "devicemapper", "ExecDriver": "native-0.2", "MountLabel": "system_u:object_r:svirt_sandbox_file_t:s0:c3,c2", "ProcessLabel": "system_u:system_r:svirt_lxc_net_t:s0:c3,c2", "AppArmorProfile": "", "ExecIDs": [ "dd2e8ffd9e4202d7b36f4f1415eb4a661db5f95f34b47013c6701cd8132a9cc3" ], "HostConfig": { "Binds": [ "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/volumes/kubernetes.io~secret/hawkular-metrics-secrets:/secrets:Z", "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/volumes/kubernetes.io~secret/hawkular-token-18xqi:/var/run/secrets/kubernetes.io/serviceaccount:ro,Z", "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/etc-hosts:/etc/hosts", "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/containers/hawkular-metrics/0c34f0335e0063de6588ffba85d222c991ff9a4acd746ac3621445ae496b4b5a:/dev/termination-log" ], "ContainerIDFile": "", "LxcConf": null, "Memory": 0, "MemorySwap": -1, "CpuShares": 2, "CpuPeriod": 0, "CpusetCpus": "", "CpusetMems": "", "CpuQuota": 0, "BlkioWeight": 0, "OomKillDisable": false, "MemorySwappiness": null, "Privileged": false, "PortBindings": null, "Links": null, "PublishAllPorts": false, "Dns": [ "172.30.0.1", "172.27.25.210", "172.24.25.210" ], "DnsSearch": [ "openshift-infra.svc.cluster.local", "svc.cluster.local", "cluster.local", "test.iggroup.local", "iggroup.local" ], "ExtraHosts": null, "VolumesFrom": null, "Devices": null, "NetworkMode": "container:16ecd4d1bc6e6a23ec23da1d9c678974c622fea9663b7365ed315944f5c9e62f", "IpcMode": "container:16ecd4d1bc6e6a23ec23da1d9c678974c622fea9663b7365ed315944f5c9e62f", "PidMode": "", "UTSMode": "", "CapAdd": null, "CapDrop": null, "GroupAdd": null, "RestartPolicy": { "Name": "", "MaximumRetryCount": 0 }, "SecurityOpt": [ 
"label:level:s0:c3,c2" ], "ReadonlyRootfs": false, "Ulimits": null, "LogConfig": { "Type": "json-file", "Config": {} }, "CgroupParent": "", "ConsoleSize": [ 0, 0 ] }, "GraphDriver": { "Name": "devicemapper", "Data": { "DeviceId": "223", "DeviceName": "docker-253:1-351846-0c34f0335e0063de6588ffba85d222c991ff9a4acd746ac3621445ae496b4b5a", "DeviceSize": "107374182400" } }, "Mounts": [ { "Source": "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/volumes/kubernetes.io~secret/hawkular-metrics-secrets", "Destination": "/secrets", "Mode": "Z", "RW": true }, { "Source": "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/volumes/kubernetes.io~secret/hawkular-token-18xqi", "Destination": "/var/run/secrets/kubernetes.io/serviceaccount", "Mode": "ro,Z", "RW": false }, { "Source": "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/etc-hosts", "Destination": "/etc/hosts", "Mode": "", "RW": true }, { "Source": "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/containers/hawkular-metrics/0c34f0335e0063de6588ffba85d222c991ff9a4acd746ac3621445ae496b4b5a", "Destination": "/dev/termination-log", "Mode": "", "RW": true } ], "Config": { "Hostname": "hawkular-metrics-e25n6", "Domainname": "", "User": "1000010000", "AttachStdin": false, "AttachStdout": false, "AttachStderr": false, "ExposedPorts": { "8080/tcp": {}, "8443/tcp": {}, "8444/tcp": {} }, "PublishService": "", "Tty": false, "OpenStdin": false, "StdinOnce": false, "Env": [ "POD_NAMESPACE=openshift-infra", "HAWKULAR_CASSANDRA_PORT_9042_TCP_PROTO=tcp", "HAWKULAR_CASSANDRA_PORT_9160_TCP_ADDR=172.30.221.15", "KUBERNETES_SERVICE_PORT=443", "KUBERNETES_PORT_53_UDP_PORT=53", "HAWKULAR_CASSANDRA_SERVICE_PORT_CQL_PORT=9042", "HAWKULAR_CASSANDRA_PORT_7001_TCP_PORT=7001", "HAWKULAR_CASSANDRA_SERVICE_PORT_SSL_PORT=7001", "HAWKULAR_CASSANDRA_PORT_7001_TCP=tcp://172.30.221.15:7001", 
"HAWKULAR_METRICS_PORT_443_TCP=tcp://172.30.144.231:443", "HAWKULAR_METRICS_PORT_443_TCP_PROTO=tcp", "KUBERNETES_PORT_443_TCP_PROTO=tcp", "KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1", "KUBERNETES_PORT_53_TCP_PORT=53", "HEAPSTER_PORT_80_TCP_ADDR=172.30.175.249", "KUBERNETES_PORT_53_UDP=udp://172.30.0.1:53", "KUBERNETES_PORT_53_TCP_ADDR=172.30.0.1", "KUBERNETES_SERVICE_PORT_DNS=53", "HAWKULAR_CASSANDRA_PORT_7000_TCP_PORT=7000", "HAWKULAR_METRICS_PORT_443_TCP_PORT=443", "KUBERNETES_SERVICE_PORT_DNS_TCP=53", "KUBERNETES_PORT_53_UDP_ADDR=172.30.0.1", "KUBERNETES_PORT_53_TCP=tcp://172.30.0.1:53", "HEAPSTER_PORT_80_TCP_PORT=80", "KUBERNETES_PORT=tcp://172.30.0.1:443", "HAWKULAR_CASSANDRA_PORT_9160_TCP_PORT=9160", "HAWKULAR_CASSANDRA_PORT_7000_TCP_PROTO=tcp", "HAWKULAR_CASSANDRA_PORT_7001_TCP_PROTO=tcp", "HEAPSTER_PORT_80_TCP=tcp://172.30.175.249:80", "HAWKULAR_CASSANDRA_SERVICE_PORT_THIFT_PORT=9160", "HAWKULAR_CASSANDRA_PORT_9160_TCP=tcp://172.30.221.15:9160", "HAWKULAR_METRICS_PORT_443_TCP_ADDR=172.30.144.231", "HAWKULAR_CASSANDRA_PORT_9042_TCP=tcp://172.30.221.15:9042", "KUBERNETES_SERVICE_HOST=172.30.0.1", "HAWKULAR_CASSANDRA_SERVICE_PORT=9042", "HAWKULAR_CASSANDRA_PORT_9042_TCP_ADDR=172.30.221.15", "HAWKULAR_CASSANDRA_PORT_7000_TCP=tcp://172.30.221.15:7000", "KUBERNETES_PORT_53_TCP_PROTO=tcp", "HEAPSTER_SERVICE_PORT=80", "HAWKULAR_METRICS_SERVICE_PORT_HTTPS_ENDPOINT=443", "HEAPSTER_PORT=tcp://172.30.175.249:80", "HEAPSTER_PORT_80_TCP_PROTO=tcp", "KUBERNETES_SERVICE_PORT_HTTPS=443", "KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443", "HAWKULAR_CASSANDRA_PORT_9042_TCP_PORT=9042", "KUBERNETES_PORT_443_TCP_PORT=443", "KUBERNETES_PORT_53_UDP_PROTO=udp", "HAWKULAR_CASSANDRA_SERVICE_PORT_TCP_PORT=7000", "HAWKULAR_CASSANDRA_PORT_9160_TCP_PROTO=tcp", "HAWKULAR_CASSANDRA_PORT_7000_TCP_ADDR=172.30.221.15", "HAWKULAR_CASSANDRA_PORT_7001_TCP_ADDR=172.30.221.15", "HAWKULAR_METRICS_SERVICE_HOST=172.30.144.231", "HAWKULAR_METRICS_PORT=tcp://172.30.144.231:443", 
"HEAPSTER_SERVICE_HOST=172.30.175.249", "HAWKULAR_CASSANDRA_SERVICE_HOST=172.30.221.15", "HAWKULAR_CASSANDRA_PORT=tcp://172.30.221.15:9042", "HAWKULAR_METRICS_SERVICE_PORT=443", "container=docker", "PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin", "HOME=/home/jboss", "JAVA_HOME=/usr/lib/jvm/java-1.8.0", "JAVA_VENDOR=openjdk", "JAVA_VERSION=1.8.0", "LAUNCH_JBOSS_IN_BACKGROUND=true", "JBOSS_PRODUCT=eap", "JBOSS_EAP_VERSION=6.4.3.GA", "JBOSS_HOME=/opt/eap", "JBOSS_MODULES_SYSTEM_PKGS=org.jboss.logmanager", "JBOSS_IMAGE_NAME=jboss-eap-6/eap-openshift", "JBOSS_IMAGE_VERSION=6.4", "JBOSS_IMAGE_RELEASE=315", "STI_BUILDER=jee", "HAWKULAR_METRICS_ENDPOINT_PORT=8080", "HAWKULAR_METRICS_VERSION=0.8.0.Final", "HAWKULAR_METRICS_DIRECTORY=/opt/hawkular", "HAWKULAR_METRICS_SCRIPT_DIRECTORY=/opt/hawkular/scripts/" ], "Cmd": null, "Image": "registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0", "Volumes": null, "VolumeDriver": "", "WorkingDir": "/home/jboss", "Entrypoint": [ "/opt/hawkular/scripts/hawkular-metrics-wrapper.sh", "-b", "0.0.0.0", "-Dhawkular-metrics.cassandra-nodes=hawkular-cassandra", "-Dhawkular-metrics.cassandra-use-ssl", "-Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true", "-Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true", "-Dhawkular-metrics.openshift.auth-methods=openshift-oauth,htpasswd", "-Dhawkular-metrics.openshift.htpasswd-file=/secrets/hawkular-metrics.htpasswd.file", "-Dhawkular.metrics.allowed-cors-access-control-allow-headers=authorization", "-Dhawkular.metrics.default-ttl=7", "-DKUBERNETES_MASTER_URL=https://kubernetes.default.svc:443", "--hmw.keystore=/secrets/hawkular-metrics.keystore", "--hmw.truststore=/secrets/hawkular-metrics.truststore", "--hmw.keystore_password_file=/secrets/hawkular-metrics.keystore.password", "--hmw.truststore_password_file=/secrets/hawkular-metrics.truststore.password" ], "NetworkDisabled": false, "MacAddress": "", "OnBuild": null, "Labels": { 
"io.kubernetes.pod.name": "openshift-infra/hawkular-metrics-e25n6", "io.kubernetes.pod.terminationGracePeriod": "30" } } } ] Looking at container logs: - docker logs 0c34f0335e00 /opt/hawkular/auth ~ Certificate was added to keystore [Storing hawkular-metrics.truststore] ~ ========================================================================= JBoss Bootstrap Environment JBOSS_HOME: /opt/eap JAVA: /usr/lib/jvm/java-1.8.0/bin/java JAVA_OPTS: -server -XX:+UseCompressedOops -verbose:gc -Xloggc:"/opt/eap/standalone/log/gc.log" -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading -Xms1303m -Xmx1303m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.logmanager -Djava.awt.headless=true -Djboss.modules.policy-permissions=true -Xbootclasspath/p:/opt/eap/jboss-modules.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-1.5.4.Final-redhat-1.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/javax.json-1.0.4.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/jboss-logmanager-ext-1.0.0.Alpha2-redhat-1.jar -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/opt/eap/jolokia.jar=port=8778,host=127.0.0.1,discoveryEnabled=false ========================================================================= OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0 I> No access restrictor found, access to all MBean is allowed Jolokia: Agent started with URL http://127.0.0.1:8778/jolokia/ 06:10:25,201 INFO [org.jboss.modules] (main) JBoss Modules version 1.3.7.Final-redhat-1 06:10:25,597 INFO [org.jboss.msc] (main) JBoss MSC version 1.1.5.Final-redhat-1 06:10:25,687 INFO [org.jboss.as] (MSC service thread 1-2) JBAS015899: JBoss EAP 6.4.3.GA (AS 7.5.3.Final-redhat-2) starting 06:10:25,693 DEBUG [org.jboss.as.config] (MSC service thread 1-2) 
Configured system properties: KUBERNETES_MASTER_URL = https://kubernetes.default.svc:443 [Standalone] = awt.toolkit = sun.awt.X11.XToolkit file.encoding = ANSI_X3.4-1968 file.encoding.pkg = sun.io file.separator = / hawkular-metrics.cassandra-nodes = hawkular-cassandra hawkular-metrics.cassandra-use-ssl = true hawkular-metrics.openshift.auth-methods = openshift-oauth,htpasswd hawkular-metrics.openshift.htpasswd-file = /secrets/hawkular-metrics.htpasswd.file hawkular.metrics.allowed-cors-access-control-allow-headers = authorization hawkular.metrics.default-ttl = 7 java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment java.awt.headless = true java.awt.printerjob = sun.print.PSPrinterJob java.class.path = /opt/eap/jboss-modules.jar:/opt/eap/jolokia.jar java.class.version = 52.0 java.endorsed.dirs = /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/endorsed java.ext.dirs = /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/ext:/usr/java/packages/lib/ext java.home = /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre java.io.tmpdir = /tmp java.library.path = /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib java.net.preferIPv4Stack = true java.runtime.name = OpenJDK Runtime Environment java.runtime.version = 1.8.0_51-b16 java.specification.name = Java Platform API Specification java.specification.vendor = Oracle Corporation java.specification.version = 1.8 java.util.logging.manager = org.jboss.logmanager.LogManager java.vendor = Oracle Corporation java.vendor.url = http://java.oracle.com/ java.vendor.url.bug = http://bugreport.sun.com/bugreport/ java.version = 1.8.0_51 java.vm.info = mixed mode java.vm.name = OpenJDK 64-Bit Server VM java.vm.specification.name = Java Virtual Machine Specification java.vm.specification.vendor = Oracle Corporation java.vm.specification.version = 1.8 java.vm.vendor = Oracle Corporation java.vm.version = 25.51-b03 javax.management.builder.initial = 
org.jboss.as.jmx.PluggableMBeanServerBuilder javax.net.ssl.keyStore = /opt/hawkular/auth/hawkular-metrics.keystore javax.net.ssl.keyStorePassword = <redacted> javax.net.ssl.trustStore = /opt/hawkular/auth/hawkular-metrics.truststore javax.net.ssl.trustStorePassword = <redacted> javax.xml.datatype.DatatypeFactory = __redirected.__DatatypeFactory javax.xml.parsers.DocumentBuilderFactory = __redirected.__DocumentBuilderFactory javax.xml.parsers.SAXParserFactory = __redirected.__SAXParserFactory javax.xml.stream.XMLEventFactory = __redirected.__XMLEventFactory javax.xml.stream.XMLInputFactory = __redirected.__XMLInputFactory javax.xml.stream.XMLOutputFactory = __redirected.__XMLOutputFactory javax.xml.transform.TransformerFactory = __redirected.__TransformerFactory javax.xml.validation.SchemaFactory:http://www.w3.org/2001/XMLSchema = __redirected.__SchemaFactory javax.xml.xpath.XPathFactory:http://java.sun.com/jaxp/xpath/dom = __redirected.__XPathFactory jboss.bind.address = 0.0.0.0 jboss.home.dir = /opt/eap jboss.host.name = hawkular-metrics-e25n6 jboss.modules.dir = /opt/eap/modules jboss.modules.policy-permissions = true jboss.modules.system.pkgs = org.jboss.logmanager jboss.node.name = hawkular-metrics-e25n6 jboss.qualified.host.name = hawkular-metrics-e25n6 jboss.server.base.dir = /opt/eap/standalone jboss.server.config.dir = /opt/eap/standalone/configuration jboss.server.data.dir = /opt/eap/standalone/data jboss.server.deploy.dir = /opt/eap/standalone/data/content jboss.server.log.dir = /opt/eap/standalone/log jboss.server.name = hawkular-metrics-e25n6 jboss.server.persist.config = true jboss.server.temp.dir = /opt/eap/standalone/tmp jolokia.agent = http://127.0.0.1:8778/jolokia/ line.separator = logging.configuration = file:/opt/eap/standalone/configuration/logging.properties module.path = /opt/eap/modules org.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH = true org.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH = true org.jboss.boot.log.file = 
/opt/eap/standalone/log/server.log org.jboss.resolver.warning = true org.xml.sax.driver = __redirected.__XMLReaderFactory os.arch = amd64 os.name = Linux os.version = 3.10.0-327.el7.x86_64 path.separator = : sun.arch.data.model = 64 sun.boot.class.path = /opt/eap/jboss-modules.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-1.5.4.Final-redhat-1.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/javax.json-1.0.4.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/jboss-logmanager-ext-1.0.0.Alpha2-redhat-1.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/resources.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/rt.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/jsse.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/jce.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/charsets.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/jfr.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/classes sun.boot.library.path = /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/amd64 sun.cpu.endian = little sun.cpu.isalist = sun.io.unicode.encoding = UnicodeLittle sun.java.command = /opt/eap/jboss-modules.jar -mp /opt/eap/modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.standalone -Djboss.home.dir=/opt/eap -Djboss.server.base.dir=/opt/eap/standalone -Djavax.net.ssl.keyStore=/opt/hawkular/auth/hawkular-metrics.keystore -Djavax.net.ssl.keyStorePassword=etOUcDIVRDJTKlB -Djavax.net.ssl.trustStore=/opt/hawkular/auth/hawkular-metrics.truststore -Djavax.net.ssl.trustStorePassword=ecyzAhganunw8ue -b 0.0.0.0 -Dhawkular-metrics.cassandra-nodes=hawkular-cassandra -Dhawkular-metrics.cassandra-use-ssl -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true 
-Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true -Dhawkular-metrics.openshift.auth-methods=openshift-oauth,htpasswd -Dhawkular-metrics.openshift.htpasswd-file=/secrets/hawkular-metrics.htpasswd.file -Dhawkular.metrics.allowed-cors-access-control-allow-headers=authorization -Dhawkular.metrics.default-ttl=7 -DKUBERNETES_MASTER_URL=https://kubernetes.default.svc:443 sun.java.launcher = SUN_STANDARD sun.jnu.encoding = ANSI_X3.4-1968 sun.management.compiler = HotSpot 64-Bit Tiered Compilers sun.os.patch.level = unknown user.country = US user.dir = /home/jboss user.home = ? user.language = en user.name = ? user.timezone = America/New_York 06:10:25,693 DEBUG [org.jboss.as.config] (MSC service thread 1-2) VM Arguments: -D[Standalone] -XX:+UseCompressedOops -verbose:gc -Xloggc:/opt/eap/standalone/log/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading -Xms1303m -Xmx1303m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.logmanager -Djava.awt.headless=true -Djboss.modules.policy-permissions=true -Xbootclasspath/p:/opt/eap/jboss-modules.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-1.5.4.Final-redhat-1.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/javax.json-1.0.4.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/jboss-logmanager-ext-1.0.0.Alpha2-redhat-1.jar -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/opt/eap/jolokia.jar=port=8778,host=127.0.0.1,discoveryEnabled=false -Dorg.jboss.boot.log.file=/opt/eap/standalone/log/server.log -Dlogging.configuration=file:/opt/eap/standalone/configuration/logging.properties 06:10:27,294 INFO [org.xnio] (MSC service thread 1-4) XNIO Version 3.0.14.GA-redhat-1 06:10:27,310 INFO [org.jboss.as.server] (Controller Boot Thread) JBAS015888: Creating http management service using 
socket-binding (management-http)
06:10:27,317 INFO [org.xnio.nio] (MSC service thread 1-4) XNIO NIO Implementation Version 3.0.14.GA-redhat-1
06:10:27,342 INFO [org.jboss.remoting] (MSC service thread 1-4) JBoss Remoting version 3.3.5.Final-redhat-1

Expected results:

The pod should enter a running or failed state.

Additional info:

I have tried restarting all nodes and masters but this pod is still stuck in pending. Other pods on the system appear to be working as expected (e.g. hello-openshift).

Can you help me debug why this pod never makes it out of pending?

regards

Dave
I notice that the describe output is missing both the container ID and the pod IP.
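The two missing fields noted above can also be checked for across a whole namespace. The following is a minimal Python 3 sketch, not an existing tool: the function name pending_without_container is made up for illustration, and it assumes the input is the parsed JSON of a pod list (for example the output of "oc get pods -o json"), reading the status.phase, status.podIP and status.containerStatuses[].containerID fields of each pod.

```python
def pending_without_container(pod_list):
    """Return names of Pending pods that report neither a pod IP nor a container ID.

    pod_list is assumed to be a parsed v1 pod List (a dict with an "items" array).
    """
    names = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        has_ip = bool(status.get("podIP"))
        has_container_id = any(
            cs.get("containerID") for cs in status.get("containerStatuses", [])
        )
        if not has_ip and not has_container_id:
            names.append(pod.get("metadata", {}).get("name"))
    return names
```

A pod that shows up in this list while docker reports its container as running on the node is in the same contradictory state described in this report.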
I ran into the same issue, but using the default certs. I was able to resolve it with a workaround. I am not sure of the cause of this error, but I have been able to reproduce it 100% of the time when following our documentation, in both an HA environment and a non-HA environment.

https://docs.openshift.com/enterprise/3.1/install_config/cluster_metrics.html

I was able to get the container to kick off and run by creating the hawkular-metrics pod myself using the template that was created by the deployer.

Steps (run the following):

# oc delete rc hawkular-metrics
# oc delete service hawkular-metrics
# oc get templates
# oc process hawkular-metrics -v "IMAGE_PREFIX=registry.access.redhat.com/openshift3/,IMAGE_VERSION=3.1.0,METRIC_DURATION=7,MASTER_URL=https://<#MASTER API HOSTNAME#>:8443" | oc create -f -

After running this the hawkular-metrics pod came up successfully.

I was also having issues with the heapster pod afterwards, and was able to get everything running by following the same steps with the heapster template that was created.

# oc delete rc heapster
# oc delete service heapster
# oc get template
# oc process hawkular-heapster -v "IMAGE_PREFIX=registry.access.redhat.com/openshift3/,IMAGE_VERSION=3.1.0,MASTER_URL=https://<#MASTER API HOSTNAME#>:8443" | oc create -f -
Hi

Thanks for the updates - apologies that my response has been slow due to the holiday break.

I'm still concerned by the lack of appropriate response from the system to the failure of this pod - it really should come up or report failure - interestingly the pod also sits forever in the 'Terminating' state when I try to delete it. The pod launching process needs to be extremely robust - this doesn't feel that way and I worry about what will happen when we have many users launching their own pods.

Could it be because of the liveness probe?

Trying the workaround...

oc process hawkular-metrics -v "IMAGE_PREFIX=registry.access.redhat.com/openshift3/,IMAGE_VERSION=3.1.0,METRIC_DURATION=7,MASTER_URL=https://osemaster.dev.iggroup.local:443" | oc create -f -

{
  "kind": "List",
  "apiVersion": "v1",
  "metadata": {},
  "items": [
    {
      "apiVersion": "v1",
      "kind": "Service",
      "metadata": {
        "labels": {
          "metrics-infra": "hawkular-metrics",
          "name": "hawkular-metrics"
        },
        "name": "hawkular-metrics"
      },
      "spec": {
        "ports": [
          {
            "name": "https-endpoint",
            "port": 443,
            "targetPort": "https-endpoint"
          }
        ],
        "selector": {
          "name": "hawkular-metrics"
        }
      }
    },
    {
      "apiVersion": "v1",
      "kind": "ReplicationController",
      "metadata": {
        "labels": {
          "metrics-infra": "hawkular-metrics",
          "name": "hawkular-metrics"
        },
        "name": "hawkular-metrics"
      },
      "spec": {
        "replicas": 1,
        "selector": {
          "name": "hawkular-metrics"
        },
        "template": {
          "metadata": {
            "labels": {
              "metrics-infra": "hawkular-metrics",
              "name": "hawkular-metrics"
            }
          },
          "spec": {
            "containers": [
              {
                "command": [
                  "/opt/hawkular/scripts/hawkular-metrics-wrapper.sh",
                  "-b",
                  "0.0.0.0",
                  "-Dhawkular-metrics.cassandra-nodes=hawkular-cassandra",
                  "-Dhawkular-metrics.cassandra-use-ssl",
                  "-Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true",
                  "-Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true",
                  "-Dhawkular-metrics.openshift.auth-methods=openshift-oauth,htpasswd",
                  "-Dhawkular-metrics.openshift.htpasswd-file=/secrets/hawkular-metrics.htpasswd.file",
                  "-Dhawkular.metrics.allowed-cors-access-control-allow-headers=authorization",
                  "-Dhawkular.metrics.default-ttl=7",
                  "-DKUBERNETES_MASTER_URL=https://osemaster.dev.iggroup.local:443",
                  "--hmw.keystore=/secrets/hawkular-metrics.keystore",
                  "--hmw.truststore=/secrets/hawkular-metrics.truststore",
                  "--hmw.keystore_password_file=/secrets/hawkular-metrics.keystore.password",
                  "--hmw.truststore_password_file=/secrets/hawkular-metrics.truststore.password"
                ],
                "env": [
                  {
                    "name": "POD_NAMESPACE",
                    "valueFrom": {
                      "fieldRef": {
                        "fieldPath": "metadata.namespace"
                      }
                    }
                  }
                ],
                "image": "registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0",
                "lifecycle": {
                  "postStart": {
                    "exec": {
                      "command": [
                        "/opt/hawkular/scripts/hawkular-metrics-poststart.py"
                      ]
                    }
                  }
                },
                "livenessProbe": {
                  "exec": {
                    "command": [
                      "/opt/hawkular/scripts/hawkular-metrics-liveness.py"
                    ]
                  }
                },
                "name": "hawkular-metrics",
                "ports": [
                  {
                    "containerPort": 8080,
                    "name": "http-endpoint"
                  },
                  {
                    "containerPort": 8444,
                    "name": "https-endpoint"
                  }
                ],
                "volumeMounts": [
                  {
                    "mountPath": "/secrets",
                    "name": "hawkular-metrics-secrets"
                  }
                ]
              }
            ],
            "serviceAccount": "hawkular",
            "volumes": [
              {
                "name": "hawkular-metrics-secrets",
                "secret": {
                  "secretName": "hawkular-metrics-secrets"
                }
              }
            ]
          },
          "version": "v1"
        }
      }
    }
  ]
}

oc describe pod hawkular-metrics-k21v1
Name:                     hawkular-metrics-k21v1
Namespace:                openshift-infra
Image(s):                 registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0
Node:                     vrdevosnode002.iggroup.local/
Start Time:               Mon, 04 Jan 2016 17:53:06 +0000
Labels:                   metrics-infra=hawkular-metrics,name=hawkular-metrics
Status:                   Pending
Reason:
Message:
IP:
Replication Controllers:  hawkular-metrics (1/1 replicas created)
Containers:
  hawkular-metrics:
    Container ID:
    Image:          registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0
    Image ID:
    QoS Tier:
      cpu:     BestEffort
      memory:  BestEffort
    State:          Waiting
    Ready:          False
    Restart Count:  0
    Environment Variables:
      POD_NAMESPACE:  openshift-infra (v1:metadata.namespace)
Volumes:
  hawkular-metrics-secrets:
    Type:        Secret (a secret that should populate this volume)
    SecretName:  hawkular-metrics-secrets
  hawkular-token-18xqi:
    Type:        Secret (a secret that should populate this volume)
    SecretName:  hawkular-token-18xqi
Events:
  FirstSeen  LastSeen  Count  From  SubobjectPath  Reason  Message
  ─────────  ────────  ─────  ────  ─────────────  ──────  ───────
  39s  39s  1  {scheduler }  Scheduled  Successfully assigned hawkular-metrics-k21v1 to vrdevosnode002.iggroup.local
  19s  19s  1  {kubelet vrdevosnode002.iggroup.local}  implicitly required container POD  Pulled  Container image "openshift3/ose-pod:v3.1.0.4" already present on machine
  18s  18s  1  {kubelet vrdevosnode002.iggroup.local}  implicitly required container POD  Created  Created with docker id 3221a79e6e43
  18s  18s  1  {kubelet vrdevosnode002.iggroup.local}  implicitly required container POD  Started  Started with docker id 3221a79e6e43
  17s  17s  1  {kubelet vrdevosnode002.iggroup.local}  spec.containers{hawkular-metrics}  Pulled  Container image "registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0" already present on machine
  17s  17s  1  {kubelet vrdevosnode002.iggroup.local}  spec.containers{hawkular-metrics}  Created  Created with docker id c20167e0a238
  16s  16s  1  {kubelet vrdevosnode002.iggroup.local}  spec.containers{hawkular-metrics}  Started  Started with docker id c20167e0a238

Still looks to be stuck in 'Pending' (note the old instances which have still not been terminated and cleaned up - which just feels wrong/broken)...

oc get pods
NAME                         READY   STATUS             RESTARTS   AGE
hawkular-cassandra-1-jthxm   1/1     Running            0          11d
hawkular-metrics-e25n6       0/1     Terminating        0          11d
hawkular-metrics-j0ub6       0/1     Terminating        0          11d
hawkular-metrics-k21v1       0/1     Pending            0          4m
heapster-xp7o3               0/1     CrashLoopBackOff   3159       11d

Is there something fundamental going wrong here?

regards

Dave
The problem would seem to be with the lifecycle post start command...

"lifecycle": {
  "postStart": {
    "exec": {
      "command": [
        "/opt/hawkular/scripts/hawkular-metrics-poststart.py"
      ]
    }
  }
},

When this is removed the container reaches the running state although it is restarting like crazy...

oc get pods
NAME                         READY   STATUS             RESTARTS   AGE
hawkular-cassandra-1-jthxm   1/1     Running            0          12d
hawkular-metrics-40q9n       0/1     Terminating        0          10m
hawkular-metrics-e25n6       0/1     Terminating        0          11d
hawkular-metrics-j0ub6       0/1     Terminating        0          12d
hawkular-metrics-k21v1       0/1     Terminating        0          16h
hawkular-metrics-sp646       1/1     Running            20         3m
heapster-xp7o3               0/1     CrashLoopBackOff   3350       12d

NOTE: All the old terminated pods are STILL listed.

The liveness check is still there...

oc get pod hawkular-metrics-sp646 -o yaml
apiVersion: v1
kind: Pod
...
    livenessProbe:
      exec:
        command:
        - /opt/hawkular/scripts/hawkular-metrics-liveness.py
      timeoutSeconds: 1

The logs seem to indicate that the process is being killed each time it starts up: -

05:28:30,620 INFO [org.jboss.as.server] (Controller Boot Thread) JBAS015888: Creating http management service using socket-binding (management-http)
05:28:30,627 INFO [org.xnio.nio] (MSC service thread 1-2) XNIO NIO Implementation Version 3.0.14.GA-redhat-1
05:28:30,653 INFO [org.jboss.remoting] (MSC service thread 1-2) JBoss Remoting version 3.3.5.Final-redhat-1
*** JBossAS process (170) received TERM signal ***
*** JBossAS process (170) received TERM signal ***
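For anyone wanting to repeat the experiment above, removing the postStart hook from the processed template before creating it can be scripted. This is only a sketch in Python 3 under some assumptions: the function names strip_post_start and strip_post_start_json are hypothetical, and the input is assumed to have the same layout as the processed template shown earlier in this bug (a List whose ReplicationController items carry the containers under spec.template.spec.containers). The liveness probe is left untouched.

```python
import copy
import json


def strip_post_start(template_list):
    """Return a copy of a v1 List with each container's postStart hook removed."""
    result = copy.deepcopy(template_list)
    for item in result.get("items", []):
        if item.get("kind") != "ReplicationController":
            continue
        pod_spec = item.get("spec", {}).get("template", {}).get("spec", {})
        for container in pod_spec.get("containers", []):
            lifecycle = container.get("lifecycle")
            if lifecycle is None:
                continue
            lifecycle.pop("postStart", None)
            if not lifecycle:
                # Nothing else was configured, so drop the empty lifecycle block.
                del container["lifecycle"]
    return result


def strip_post_start_json(text):
    """Same as strip_post_start, but from a JSON string to a JSON string."""
    return json.dumps(strip_post_start(json.loads(text)), indent=2)
```

The JSON string returned by strip_post_start_json could then be saved to a file and passed to oc create -f. This only works around the symptom: the application itself still needs to get past STARTING.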
Hi

From the Kubernetes Container Environment documentation http://kubernetes.io/v1.1/docs/user-guide/container-environment.html#container-hooks: -

PostStart
This hook is sent immediately after a container is created. It notifies the container that it has been created. No parameters are passed to the handler.

The postStart script for hawkular-metrics looks designed to run in a loop, ignoring 404 and 503 status codes, until http://localhost:8080/hawkular/metrics/status returns JSON containing MetricsService == "STARTED" (or to fail if it returns anything other than "STARTING"). The code: -

    import os
    import json
    import urllib2
    import time

    hawkularEndpointPort = os.environ.get("HAWKULAR_METRICS_ENDPOINT_PORT")
    statusURL = "http://localhost:" + hawkularEndpointPort + "/hawkular/metrics/status"

    while True:
      try:
        response = urllib2.urlopen(statusURL)
        statusCode = response.getcode();
        if (statusCode == 200 or statusCode == 404 or statusCode == 503):
          if (statusCode == 200):
            jsonResponse = json.loads(response.read())
            if (jsonResponse["MetricsService"] == "STARTED"):
              exit(0)
            # If the status is not STARTED or STARTING then exit, something went wrong
            elif (jsonResponse["MetricsService"] != "STARTING"):
              exit(1)
        else:
          exit(1)
      except Exception:
        print "An Exception occured trying to connect to the endpoint."
      #sleep for 1 second and let the loop try over again
      time.sleep(1)

In the case of the metrics in my issue, the status page is stuck in the "STARTING" state...

curl -v -v -v http://localhost:8080/hawkular/metrics/status
* About to connect() to localhost port 8080 (#0)
* Trying ::1...
* Connection refused
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /hawkular/metrics/status HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< Content-Type: application/json
< Transfer-Encoding: chunked
< Date: Tue, 05 Jan 2016 11:10:03 GMT
<
* Connection #0 to host localhost left intact
{"MetricsService":"STARTING","Implementation-Version":"0.8.0.Final-redhat-1","Built-From-Git-SHA1":"826f08dd34912ad455a4cb2b34f2e79cd79ace9a"}

This means that the postStart lifecycle hook is stuck in a loop waiting for a STARTED state which never arrives. My guess is that this is what keeps the pod in Pending (and stops it from terminating): Kubernetes is still waiting for the postStart hook to return.

The Kubernetes documentation does say "Typically we expect that users will make their hook handlers as light as possible, but there are cases where long running commands make sense." Going into an endless loop doesn't seem that light to me, and it leads to a really confusing set of issues to debug (i.e. containers running in docker but stuck in Pending in Kubernetes).

I'm a little confused by the use of the postStart hook AND the liveness check - the postStart hook seems to be acting as a sort of liveness check that runs only once.

I guess the root cause is that the application gets stuck in STARTING and so the postStart hook never finishes. Perhaps it would be pertinent to put a 5 minute timeout around the postStart loop and then fail? This would make the hook more robust. That is, of course, dependent on whether the postStart hook is the best/right place to do an application liveness check.

regards
Dave
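To make the timeout suggestion above concrete, here is a minimal sketch of a bounded wait loop. It is not the script shipped in the image: it is written for Python 3 (the original uses Python 2's urllib2), and the names fetch_status and wait_for_started, the 300 second default and the injectable clock/sleep parameters are all my own assumptions for illustration.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the MetricsService state reported by the status URL, or None if not reachable yet."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            body = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, an HTTP error such as 404/503, or a non-JSON body:
        # treat all of these as "not up yet" and let the caller retry.
        return None
    return body.get("MetricsService") if isinstance(body, dict) else None


def wait_for_started(fetch, timeout_seconds=300, poll_seconds=1,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll fetch() until it reports STARTED; return a process exit code.

    Returns 0 on STARTED, 1 on any unexpected state or once the deadline passes.
    clock and sleep are parameters only so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        state = fetch()
        if state == "STARTED":
            return 0
        if state not in (None, "STARTING"):
            # Anything other than "not reachable" or STARTING means something went wrong.
            return 1
        sleep(poll_seconds)
    # Still STARTING (or unreachable) after the deadline: give up instead of hanging the hook.
    return 1
```

A hook script built on this would call something like sys.exit(wait_for_started(lambda: fetch_status("http://localhost:8080/hawkular/metrics/status"))), with the port taken from HAWKULAR_METRICS_ENDPOINT_PORT as in the original script. With a deadline in place, a service stuck in STARTING would make the hook fail after five minutes rather than block indefinitely.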
Hi

I am unsure why Red Hat chose to use a postStart hook script rather than a liveness or readiness probe. The liveness probe looks like it could do with an initial delay value, though, to give the application time to start up...

    livenessProbe:
      exec:
        command:
        - /opt/hawkular/scripts/hawkular-metrics-liveness.py
      timeoutSeconds: 1
      initialDelaySeconds: 120

Now that the strange Pending behaviour is understood, what about the actual issue? It now seems to be that, when hawkular-metrics is deployed with custom certs, the JBoss application is unable to start up.

Can you help me investigate this issue?

regards
Dave
using the workaround (and finding the hawkular metrics logs) it looks as though the metrics application is failing to connect to cassandra... 07:54:28,337 WARN [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200003: Could not connect to Cassandra cluster - assuming its not up yet: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.123.91:9042 (com.datastax.driver.core.TransportException: [hawkular-cassandra/172.30.123.91:9042] Channel has been closed)) 07:54:28,338 WARN [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200004: [10] Retrying connecting to Cassandra cluster in [2]s... 07:54:30,338 INFO [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200002: Initializing metrics service 07:54:30,369 WARN [io.netty.util.concurrent.DefaultPromise] (cluster11-nio-worker-0) An exception was thrown by com.datastax.driver.core.Connection$9.operationComplete(): java.util.concurrent.RejectedExecutionException: Task com.datastax.driver.core.Connection$9$1@3b6df991 rejected from java.util.concurrent.ThreadPoolExecutor@772ca0e7[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0] at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) [rt.jar:1.8.0_51] at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) [rt.jar:1.8.0_51] at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) [rt.jar:1.8.0_51] at com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:484) [guava-16.0.1.redhat-3.jar:16.0.1.redhat-3] at com.datastax.driver.core.Connection$9.operationComplete(Connection.java:566) [cassandra-driver-core-2.2.0.rc2-redhat-2.jar:2.2.0.rc2-redhat-2] at com.datastax.driver.core.Connection$9.operationComplete(Connection.java:542) 
[cassandra-driver-core-2.2.0.rc2-redhat-2.jar:2.2.0.rc2-redhat-2] at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.channel.PendingWriteQueue.safeFail(PendingWriteQueue.java:252) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.channel.PendingWriteQueue.removeAndFailAll(PendingWriteQueue.java:112) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1256) [netty-handler-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:676) [netty-handler-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:829) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:610) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) 
[netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_51] 07:54:30,373 WARN [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200003: Could not connect to Cassandra cluster - assuming its not up yet: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.123.91:9042 (com.datastax.driver.core.TransportException: [hawkular-cassandra/172.30.123.91:9042] Channel has been closed)) 07:54:30,373 WARN [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200004: [11] Retrying connecting to Cassandra cluster in [3]s... 07:54:33,373 INFO [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200002: Initializing metrics service 07:54:33,408 WARN [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200003: Could not connect to Cassandra cluster - assuming its not up yet: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.123.91:9042 (com.datastax.driver.core.TransportException: [hawkular-cassandra/172.30.123.91:9042] Channel has been closed)) The cassandra service is available... oc get svc NAME CLUSTER_IP EXTERNAL_IP PORT(S) SELECTOR AGE hawkular-cassandra 172.30.123.91 <none> 9042/TCP,9160/TCP,7000/TCP,7001/TCP type=hawkular-cassandra 7m hawkular-cassandra-nodes None <none> 9042/TCP,9160/TCP,7000/TCP,7001/TCP type=hawkular-cassandra 7m hawkular-metrics 172.30.102.231 <none> 443/TCP name=hawkular-metrics 5m It looks as though a certificate has been provisioned on the cassandra service... 
curl https://172.30.123.91:9042 -v -v -v * About to connect() to 172.30.123.91 port 9042 (#0) * Trying 172.30.123.91... * Connected to 172.30.123.91 (172.30.123.91) port 9042 (#0) * Initializing NSS with certpath: sql:/etc/pki/nssdb * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none * Server certificate: * subject: CN=hawkular-cassandra * start date: Dec 23 18:04:20 2015 GMT * expire date: Dec 22 18:04:21 2017 GMT * common name: hawkular-cassandra * issuer: CN=metrics-signer@1450893859 * NSS error -8172 (SEC_ERROR_UNTRUSTED_ISSUER) * Peer's certificate issuer has been marked as not trusted by the user. * Closing connection 0 Could the issue now be that hawkular-metrics service needs to trust the metrics signer cert and I have replaced the cert with my own (which hasn't signed cassandra)? regards Dave
The Hawkular Metrics container uses the postStart hook to wait until it can connect to the Cassandra instance. Unfortunately, OpenShift cannot stop a pending container. This will be updated shortly so that the postStart script fails after a timeout.

The way it currently works is a bit confusing. The postStart hook checks whether something is up and running yet, while the livenessProbe checks whether something is still running. These are two different situations, which is why we have two scripts to handle them.

If Hawkular Metrics cannot communicate with the Cassandra instance, then it will not function, regardless of whether the postStart hook or the livenessProbe exists.

Looking at your logs, it looks like a certificate problem. How exactly did you add your custom certificate for Hawkular Metrics?

Per https://github.com/openshift/origin-metrics/blob/master/docs/deployer_configuration.adoc#deployer-secrets you will need to specify the hawkular-metrics.pem and (optionally, if using self-signed certificates) the hawkular-metrics-ca.cert.
Hi Thanks for the reply - maybe it should be a readiness check rather than a postStart hook? They seem better suited to keeping a node out of service until it has started up correctly. These are the actions I performed in setting up the metrics: - oc project openshift-infra oc create -f - <<API apiVersion: v1 kind: ServiceAccount metadata: name: metrics-deployer secrets: - name: metrics-deployer API oadm policy add-role-to-user edit system:serviceaccount:openshift-infra:metrics-deployer oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:heapster Create the file /etc/pki/hawkular-metrics.paas.dev.iggroup.local.pem: - -----BEGIN PRIVATE KEY----- MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQCdMWtGH2vyJgBF ... Vwipu6DJ+Mc0czjfE0UGaAPCrDXfBhKec5/lfvmdlq9qdmrXqYSD3pdMUNtHn5q6 Xl/YSEkWx4NCCekMnXxPOFo= -----END PRIVATE KEY----- -----BEGIN CERTIFICATE----- MIIFpjCCBI6gAwIBAgIRAJ1ixZFA4cGV3Lh7xpHMW5MwDQYJKoZIhvcNAQELBQAw ... 22inbqFAm8f1pTFbrN2NjXfFMmFcSa8VQCOCHE6/J760yzB1yQ4gV9Ajtu17cxoE 0jmVsBGZJDFUQA== -----END CERTIFICATE----- Create the ca.crt /etc/pki/ig-private-ca.crt: - -----BEGIN CERTIFICATE----- MIIDrDCCApSgAwIBAgIQKJ0LDUmkrM1sKAeJDIn+dzANBgkqhkiG9w0BAQsFADBw ... CLaB9Z9PC77jivCYwKL9ubEMeBKsWr0fsMFHj76aWBTCRhIXEEv5t84N/z1IELv3 8bHY0kArqvvfmIWu9Y/PO+iwUJm5ouCIQD7uHPAZpLY= -----END CERTIFICATE----- -----BEGIN CERTIFICATE----- MIIEqTCCA5GgAwIBAgIQFRhu0C2GQlB4OgRCKNGwnDANBgkqhkiG9w0BAQsFADBw ... 
A5gy7PiXtic81oAT1NFHfOdeortsPtBN+sEQfGEoA8bnlk1VazPj6jScwJyE
-----END CERTIFICATE-----

Create the cert secrets

oc secrets new metrics-deployer hawkular-metrics.pem=/etc/pki/hawkular-metrics.paas.dev.iggroup.local.pem hawkular-metrics-ca.cert=/etc/pki/ig-private-ca.crt

/root/openshift-ansible-openshift-ansible-3.0.20-1/roles/openshift_examples/files/examples/v1.1/infrastructure-templates/enterprise/metrics-deployer.yaml

oc process -f metrics-deployer.yaml -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.paas.dev.iggroup.local,USE_PERSISTENT_STORAGE=false,REDEPLOY=true | oc create -f -

Change the masters to access it on each master (vrdevosmaster001, 002 and 003)

In master-config.yaml: -

assetConfig:
  ...
  metricsPublicURL: "https://hawkular-metrics.paas.dev.iggroup.local/hawkular/metrics"

-------------------------- DONE

As you can see, I have created a cert and CA certificate, and the cert is correctly deployed onto the metrics application - i.e. when I browse to hawkular-metrics.paas.dev.iggroup.local I get my certificate.

Setting the cert and CA secrets doesn't seem to have affected the generation of internal certs for Cassandra.

Could it be that, by adding my certs to the metrics application, it is no longer getting the metrics signer cert added to its truststore? Could my CA be replacing the CA which has been used to generate the Cassandra cert?

I'm not an expert in JBoss - could I look in the truststore somehow in order to check which CA certs are trusted?

Have other people deployed successfully with custom certs?

regards
Dave
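Since the files handed to the deployer above are plain PEM, a quick sanity check is to count what each one contains before creating the secret. The sketch below is only an illustration: count_pem_blocks is a hypothetical helper name, and it looks at nothing more than the "-----BEGIN ...-----" markers, so it says nothing about whether the certificates are valid or chain together.

```python
import re
from collections import Counter

# Matches PEM block headers such as "-----BEGIN CERTIFICATE-----"
# or "-----BEGIN PRIVATE KEY-----" and captures the block type.
PEM_HEADER = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def count_pem_blocks(text):
    """Return a Counter mapping PEM block type to how many times it appears in text."""
    return Counter(PEM_HEADER.findall(text))


def describe_pem_file(path):
    """Read a PEM file from disk and return a one-line summary of its blocks."""
    with open(path, "r") as handle:
        counts = count_pem_blocks(handle.read())
    summary = ", ".join("%d x %s" % (n, kind) for kind, n in sorted(counts.items()))
    return "%s: %s" % (path, summary or "no PEM blocks found")
```

Run against the two files listed in these steps, describe_pem_file would report one private key plus one certificate for the .pem file and two certificates for ig-private-ca.crt, going by the BEGIN/END markers pasted above.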
Additional Info. The contents of truststore /secrets/hawkular-metrics.truststore: - Keystore type: JKS Keystore provider: SUN Your keystore contains 4 entries Alias name: ca Creation date: Dec 23, 2015 Entry type: trustedCertEntry Owner: CN=metrics-signer@1450893859 Issuer: CN=metrics-signer@1450893859 Serial number: 1 Valid from: Wed Dec 23 13:04:20 EST 2015 until: Mon Dec 21 13:04:21 EST 2020 Certificate fingerprints: MD5: C0:9B:F5:42:4E:D7:E4:C6:39:A1:09:25:A8:A1:A1:11 SHA1: DB:0D:86:3A:5E:A6:93:36:D0:87:54:25:7A:B1:D2:CD:32:0F:52:84 SHA256: 51:FF:8F:CF:2E:82:1D:40:1E:C4:7F:9D:51:76:1B:95:A2:D4:C2:2D:65:7A:84:47:FE:07:AB:CA:D9:FB:2B:86 Signature algorithm name: SHA256withRSA Version: 3 Extensions: #1: ObjectId: 2.5.29.19 Criticality=true BasicConstraints:[ CA:true PathLen:2147483647 ] #2: ObjectId: 2.5.29.15 Criticality=true KeyUsage [ DigitalSignature Key_Encipherment Key_CertSign ] ******************************************* ******************************************* Alias name: hawkular-cassandra Creation date: Dec 23, 2015 Entry type: trustedCertEntry Owner: CN=hawkular-cassandra Issuer: CN=metrics-signer@1450893859 Serial number: 2 Valid from: Wed Dec 23 13:04:20 EST 2015 until: Fri Dec 22 13:04:21 EST 2017 Certificate fingerprints: MD5: 10:36:98:3F:C4:30:5E:B0:FE:36:D9:B2:74:0B:61:21 SHA1: 4C:04:DE:0A:F5:F0:8C:4B:5A:9B:54:DA:E6:8F:19:F5:C5:9A:CE:6A SHA256: E7:67:13:EA:62:5E:58:75:E7:7D:F4:26:83:65:35:69:5B:0A:2E:53:F6:43:40:BF:0A:04:D4:EA:40:A2:0D:A5 Signature algorithm name: SHA256withRSA Version: 3 Extensions: #1: ObjectId: 2.5.29.19 Criticality=true BasicConstraints:[ CA:false PathLen: undefined ] #2: ObjectId: 2.5.29.37 Criticality=false ExtendedKeyUsages [ serverAuth ] #3: ObjectId: 2.5.29.15 Criticality=true KeyUsage [ DigitalSignature Key_Encipherment ] #4: ObjectId: 2.5.29.17 Criticality=false SubjectAlternativeName [ DNSName: hawkular-cassandra ] ******************************************* ******************************************* Alias name: 
cassandraca Creation date: Dec 23, 2015 Entry type: trustedCertEntry Owner: CN=metrics-signer@1450893859 Issuer: CN=metrics-signer@1450893859 Serial number: 1 Valid from: Wed Dec 23 13:04:20 EST 2015 until: Mon Dec 21 13:04:21 EST 2020 Certificate fingerprints: MD5: C0:9B:F5:42:4E:D7:E4:C6:39:A1:09:25:A8:A1:A1:11 SHA1: DB:0D:86:3A:5E:A6:93:36:D0:87:54:25:7A:B1:D2:CD:32:0F:52:84 SHA256: 51:FF:8F:CF:2E:82:1D:40:1E:C4:7F:9D:51:76:1B:95:A2:D4:C2:2D:65:7A:84:47:FE:07:AB:CA:D9:FB:2B:86 Signature algorithm name: SHA256withRSA Version: 3 Extensions: #1: ObjectId: 2.5.29.19 Criticality=true BasicConstraints:[ CA:true PathLen:2147483647 ] #2: ObjectId: 2.5.29.15 Criticality=true KeyUsage [ DigitalSignature Key_Encipherment Key_CertSign ] ******************************************* ******************************************* Alias name: metricsca Creation date: Dec 23, 2015 Entry type: trustedCertEntry Owner: CN=IG Index Private CA, O=IG Index Limited, L=London, ST=Greater London, C=GB Issuer: CN=IG Index Private CA, O=IG Index Limited, L=London, ST=Greater London, C=GB Serial number: 289d0b0d49a4accd6c2807890c89fe77 Valid from: Mon Aug 10 20:00:00 EDT 2015 until: Tue Dec 31 18:59:59 EST 2030 Certificate fingerprints: MD5: C6:2E:52:72:E3:B5:A6:76:E2:FE:6C:45:99:B2:F3:84 SHA1: AB:D5:E2:0F:81:90:6F:5B:8A:55:7F:74:67:D8:F7:7E:79:F3:69:16 SHA256: AC:6E:7F:64:DB:8F:4D:FF:27:E0:64:37:8D:5A:3B:64:2B:6E:30:4A:15:FB:71:FD:36:C4:54:9F:82:9E:5E:38 Signature algorithm name: SHA256withRSA Version: 3 Extensions: #1: ObjectId: 2.5.29.19 Criticality=true BasicConstraints:[ CA:true PathLen:2147483647 ] #2: ObjectId: 2.5.29.15 Criticality=true KeyUsage [ DigitalSignature Key_CertSign Crl_Sign ] #3: ObjectId: 2.5.29.14 Criticality=false SubjectKeyIdentifier [ KeyIdentifier [ 0000: B8 4D CE EC DA 6C 20 92 A6 1A 59 A2 8D 17 56 DC .M...l ...Y...V. 0010: 75 BF FB D9 u... 
] ] ******************************************* ******************************************* The contents of /secrets/hawkular-metrics.keystore :- Keystore type: JKS Keystore provider: SUN Your keystore contains 1 entry Alias name: hawkular-metrics Creation date: Dec 23, 2015 Entry type: PrivateKeyEntry Certificate chain length: 1 Certificate[1]: Owner: CN=hawkular-metrics.paas.dev.iggroup.local, OU=Private Unified Communications, OU=Hosted by IG Index Limited, OU=IT, O=IG Group Limited, STREET=Cannon Bridge House, STREET=25 Dowgate Hill, L=London, ST=UK, OID.2.5.4.17=EC4R 2YA, C=GB Issuer: CN=IG Index Private Server CA, O=IG Index Limited, L=London, ST=Greater London, C=GB Serial number: 9d62c59140e1c195dcb87bc691cc5b93 Valid from: Tue Dec 22 19:00:00 EST 2015 until: Sat Dec 22 18:59:59 EST 2018 Certificate fingerprints: MD5: 3E:09:78:38:46:DF:EE:4B:62:A6:E2:32:43:CE:11:4B SHA1: 33:DF:3A:0D:BC:1C:4A:76:61:AB:78:05:F1:02:C7:B9:FF:96:18:64 SHA256: E9:7E:02:D2:2B:BA:4A:0B:18:73:D9:5C:59:FB:EA:21:F6:21:60:13:6A:9B:77:B8:A6:88:E1:D3:CC:3E:44:B4 Signature algorithm name: SHA256withRSA Version: 3 Extensions: #1: ObjectId: 1.3.6.1.5.5.7.1.1 Criticality=false AuthorityInfoAccess [ [ accessMethod: caIssuers accessLocation: URIName: http://crt.comodoca.com/IGIndexPrivateServerCA.crt , accessMethod: ocsp accessLocation: URIName: http://ocsp.comodoca.com ] ] #2: ObjectId: 2.5.29.35 Criticality=false AuthorityKeyIdentifier [ KeyIdentifier [ 0000: A5 52 C0 3A C7 00 F7 9A 3E 7F 10 34 D0 B8 63 68 .R.:....>..4..ch 0010: B0 36 32 92 .62. 
] ] #3: ObjectId: 2.5.29.19 Criticality=true BasicConstraints:[ CA:false PathLen: undefined ] #4: ObjectId: 2.5.29.31 Criticality=false CRLDistributionPoints [ [DistributionPoint: [URIName: http://crl.comodoca.com/IGIndexPrivateServerCA.crl] ]] #5: ObjectId: 2.5.29.37 Criticality=false ExtendedKeyUsages [ serverAuth clientAuth ] #6: ObjectId: 2.5.29.15 Criticality=true KeyUsage [ DigitalSignature Key_Encipherment ] #7: ObjectId: 2.5.29.17 Criticality=false SubjectAlternativeName [ DNSName: hawkular-metrics.paas.dev.iggroup.local DNSName: hawkular-metrics ] #8: ObjectId: 2.5.29.14 Criticality=false SubjectKeyIdentifier [ KeyIdentifier [ 0000: A7 0B C6 3C EA 5B C7 69 29 31 9F 73 0D 91 81 3D ...<.[.i)1.s...= 0010: 03 D2 FE F9 .... ] ] ******************************************* *******************************************
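Reading these dumps by eye is error-prone, so here is a rough sketch that pulls the alias, owner and issuer out of keytool's verbose listing text and reports issuers that do not appear as an owner in a truststore. The function names are hypothetical, it assumes the listing has one field per line as keytool normally prints it, and comparing distinguished names as strings is only a heuristic, not real certificate path validation.

```python
import re

# Only these three fields are needed; everything else in the listing is skipped.
FIELD = re.compile(r"^(Alias name|Owner|Issuer):\s*(.+?)\s*$")


def parse_keytool_listing(text):
    """Return a list of {'alias', 'owner', 'issuer'} dicts from keytool listing text."""
    entries = []
    current = None
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if not match:
            continue
        key, value = match.groups()
        if key == "Alias name":
            current = {"alias": value}
            entries.append(current)
        elif current is not None:
            # Keep the first Owner/Issuer per alias (the leaf of a PrivateKeyEntry chain).
            current.setdefault(key.lower(), value)
    return entries


def issuers_missing_from(entries, trusted_entries):
    """Return sorted issuer names from entries that match no owner in trusted_entries."""
    trusted_owners = {entry.get("owner") for entry in trusted_entries}
    return sorted({
        entry["issuer"] for entry in entries
        if "issuer" in entry and entry["issuer"] not in trusted_owners
    })
```

Applied to the two listings above, this kind of comparison would flag that the keystore certificate names "IG Index Private Server CA" as its issuer, while the truststore owners are the metrics signer, hawkular-cassandra and "IG Index Private CA". Whether that difference matters for the connection failure is not something the listing alone can confirm.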
See comment 10 Hmm so the metrics signer cert I see on the cassandra node.. curl -v -v -v https://172.30.123.91:9042 * About to connect() to 172.30.123.91 port 9042 (#0) * Trying 172.30.123.91... * Connected to 172.30.123.91 (172.30.123.91) port 9042 (#0) * Initializing NSS with certpath: sql:/etc/pki/nssdb * CAfile: /etc/pki/tls/certs/ca-bundle.crt CApath: none * Server certificate: * subject: CN=hawkular-cassandra * start date: Dec 23 18:04:20 2015 GMT * expire date: Dec 22 18:04:21 2017 GMT * common name: hawkular-cassandra * issuer: CN=metrics-signer@1450893859 * NSS error -8172 (SEC_ERROR_UNTRUSTED_ISSUER) * Peer's certificate issuer has been marked as not trusted by the user. * Closing connection 0 Does look to be present in the truststore ... so perhaps it is not a straight connection issue from the client connecting to the node. Does cassandra expect the application to connect with a client certificate? regards Dave
If you specify your custom Hawkular Metrics certificates and CA certificate to the deployer, the deployer should take care of managing all the keystores and truststores based on that information. This includes such things as configuring Cassandra to trust the client certificate coming from Hawkular Metrics.
Hi

Yes, I'm having issues using the deployer as instructed, unless I am doing it wrong - can you see anything in the steps I posted?

I started to look at the cassandra pod side of things. I can see that cassandra IS expecting client cert authentication - from the cassandra.yaml: -

# enable or disable client/server encryption.
client_encryption_options:
    enabled: true
    keystore: /secret/cassandra.keystore
    keystore_password: 8UR-bJU3_-4kVmm
    require_client_auth: true
    truststore: /secret/cassandra.truststore
    truststore_password: jd24gbI75gPyMgG

So the client is expected to authenticate with a cert to cassandra (my question in comment 11). The client connects to cassandra, so looking at the truststore: -

keytool -keystore cassandra.truststore -storepass jd24gbI75gPyMgG -l>

Keystore type: JKS Keystore provider: SUN Your keystore contains 5 entries Alias name: ca Creation date: Dec 23, 2015 Entry type: trustedCertEntry Owner: CN=metrics-signer@1450893859 Issuer: CN=metrics-signer@1450893859 Serial number: 1 Valid from: Wed Dec 23 13:04:20 EST 2015 until: Mon Dec 21 13:04:21 EST 2020 Certificate fingerprints: MD5: C0:9B:F5:42:4E:D7:E4:C6:39:A1:09:25:A8:A1:A1:11 SHA1: DB:0D:86:3A:5E:A6:93:36:D0:87:54:25:7A:B1:D2:CD:32:0F:52:84 SHA256: 51:FF:8F:CF:2E:82:1D:40:1E:C4:7F:9D:51:76:1B:95:A2:D4:C2:2D:65:7A:84:47:FE:07:AB:CA:D9:FB:2B:86 Signature algorithm name: SHA256withRSA Version: 3 Extensions: #1: ObjectId: 2.5.29.19 Criticality=true BasicConstraints:[ CA:true PathLen:2147483647 ] #2: ObjectId: 2.5.29.15 Criticality=true KeyUsage [ DigitalSignature Key_Encipherment Key_CertSign ] ******************************************* ******************************************* Alias name: hawkular-metrics Creation date: Dec 23, 2015 Entry type: trustedCertEntry Owner: CN=hawkular-metrics.paas.dev.iggroup.local, OU=Private Unified Communications, OU=Hosted by IG Index Limited, OU=IT, O=IG Group Limited, STREET=Cannon Bridge House, STREET=25 Dowgate Hill, L=London, ST=UK, OID.2.5.4.17=EC4R
2YA, C=GB Issuer: CN=IG Index Private Server CA, O=IG Index Limited, L=London, ST=Greater London, C=GB Serial number: 9d62c59140e1c195dcb87bc691cc5b93 Valid from: Tue Dec 22 19:00:00 EST 2015 until: Sat Dec 22 18:59:59 EST 2018 Certificate fingerprints: MD5: 3E:09:78:38:46:DF:EE:4B:62:A6:E2:32:43:CE:11:4B SHA1: 33:DF:3A:0D:BC:1C:4A:76:61:AB:78:05:F1:02:C7:B9:FF:96:18:64 SHA256: E9:7E:02:D2:2B:BA:4A:0B:18:73:D9:5C:59:FB:EA:21:F6:21:60:13:6A:9B:77:B8:A6:88:E1:D3:CC:3E:44:B4 Signature algorithm name: SHA256withRSA Version: 3 Extensions: #1: ObjectId: 1.3.6.1.5.5.7.1.1 Criticality=false AuthorityInfoAccess [ [ accessMethod: caIssuers accessLocation: URIName: http://crt.comodoca.com/IGIndexPrivateServerCA.crt , accessMethod: ocsp accessLocation: URIName: http://ocsp.comodoca.com ] ] #2: ObjectId: 2.5.29.35 Criticality=false AuthorityKeyIdentifier [ KeyIdentifier [ 0000: A5 52 C0 3A C7 00 F7 9A 3E 7F 10 34 D0 B8 63 68 .R.:....>..4..ch 0010: B0 36 32 92 .62. ] ] #3: ObjectId: 2.5.29.19 Criticality=true BasicConstraints:[ CA:false PathLen: undefined ] #4: ObjectId: 2.5.29.31 Criticality=false CRLDistributionPoints [ [DistributionPoint: [URIName: http://crl.comodoca.com/IGIndexPrivateServerCA.crl] ]] #5: ObjectId: 2.5.29.37 Criticality=false ExtendedKeyUsages [ serverAuth clientAuth ] #6: ObjectId: 2.5.29.15 Criticality=true KeyUsage [ DigitalSignature Key_Encipherment ] #7: ObjectId: 2.5.29.17 Criticality=false SubjectAlternativeName [ DNSName: hawkular-metrics.paas.dev.iggroup.local DNSName: hawkular-metrics ] #8: ObjectId: 2.5.29.14 Criticality=false SubjectKeyIdentifier [ KeyIdentifier [ 0000: A7 0B C6 3C EA 5B C7 69 29 31 9F 73 0D 91 81 3D ...<.[.i)1.s...= 0010: 03 D2 FE F9 .... 
] ] ******************************************* ******************************************* Alias name: hawkular-cassandra Creation date: Dec 23, 2015 Entry type: trustedCertEntry Owner: CN=hawkular-cassandra Issuer: CN=metrics-signer@1450893859 Serial number: 2 Valid from: Wed Dec 23 13:04:20 EST 2015 until: Fri Dec 22 13:04:21 EST 2017 Certificate fingerprints: MD5: 10:36:98:3F:C4:30:5E:B0:FE:36:D9:B2:74:0B:61:21 SHA1: 4C:04:DE:0A:F5:F0:8C:4B:5A:9B:54:DA:E6:8F:19:F5:C5:9A:CE:6A SHA256: E7:67:13:EA:62:5E:58:75:E7:7D:F4:26:83:65:35:69:5B:0A:2E:53:F6:43:40:BF:0A:04:D4:EA:40:A2:0D:A5 Signature algorithm name: SHA256withRSA Version: 3 Extensions: #1: ObjectId: 2.5.29.19 Criticality=true BasicConstraints:[ CA:false PathLen: undefined ] #2: ObjectId: 2.5.29.37 Criticality=false ExtendedKeyUsages [ serverAuth ] #3: ObjectId: 2.5.29.15 Criticality=true KeyUsage [ DigitalSignature Key_Encipherment ] #4: ObjectId: 2.5.29.17 Criticality=false SubjectAlternativeName [ DNSName: hawkular-cassandra ] ******************************************* ******************************************* Alias name: metricca Creation date: Dec 23, 2015 Entry type: trustedCertEntry Owner: CN=IG Index Private CA, O=IG Index Limited, L=London, ST=Greater London, C=GB Issuer: CN=IG Index Private CA, O=IG Index Limited, L=London, ST=Greater London, C=GB Serial number: 289d0b0d49a4accd6c2807890c89fe77 Valid from: Mon Aug 10 20:00:00 EDT 2015 until: Tue Dec 31 18:59:59 EST 2030 Certificate fingerprints: MD5: C6:2E:52:72:E3:B5:A6:76:E2:FE:6C:45:99:B2:F3:84 SHA1: AB:D5:E2:0F:81:90:6F:5B:8A:55:7F:74:67:D8:F7:7E:79:F3:69:16 SHA256: AC:6E:7F:64:DB:8F:4D:FF:27:E0:64:37:8D:5A:3B:64:2B:6E:30:4A:15:FB:71:FD:36:C4:54:9F:82:9E:5E:38 Signature algorithm name: SHA256withRSA Version: 3 Extensions: #1: ObjectId: 2.5.29.19 Criticality=true BasicConstraints:[ CA:true PathLen:2147483647 ] #2: ObjectId: 2.5.29.15 Criticality=true KeyUsage [ DigitalSignature Key_CertSign Crl_Sign ] #3: ObjectId: 2.5.29.14 Criticality=false 
SubjectKeyIdentifier [ KeyIdentifier [ 0000: B8 4D CE EC DA 6C 20 92 A6 1A 59 A2 8D 17 56 DC .M...l ...Y...V. 0010: 75 BF FB D9 u... ] ] ******************************************* ******************************************* Alias name: cassandraca Creation date: Dec 23, 2015 Entry type: trustedCertEntry Owner: CN=metrics-signer@1450893859 Issuer: CN=metrics-signer@1450893859 Serial number: 1 Valid from: Wed Dec 23 13:04:20 EST 2015 until: Mon Dec 21 13:04:21 EST 2020 Certificate fingerprints: MD5: C0:9B:F5:42:4E:D7:E4:C6:39:A1:09:25:A8:A1:A1:11 SHA1: DB:0D:86:3A:5E:A6:93:36:D0:87:54:25:7A:B1:D2:CD:32:0F:52:84 SHA256: 51:FF:8F:CF:2E:82:1D:40:1E:C4:7F:9D:51:76:1B:95:A2:D4:C2:2D:65:7A:84:47:FE:07:AB:CA:D9:FB:2B:86 Signature algorithm name: SHA256withRSA Version: 3 Extensions: #1: ObjectId: 2.5.29.19 Criticality=true BasicConstraints:[ CA:true PathLen:2147483647 ] #2: ObjectId: 2.5.29.15 Criticality=true KeyUsage [ DigitalSignature Key_Encipherment Key_CertSign ] ******************************************* ******************************************* So it looks like the cert is in the truststore ok.. there doesn't appear to be an ssl error in the stack trace... maybe there isn't an ssl issue? (Suspicious though). 04:55:28,805 INFO [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200002: Initializing metrics service 04:55:28,865 WARN [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200003: Could not connect to Cassandra cluster - assuming its not up yet: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.123.91:9042 (com.datastax.driver.core.TransportException: [hawkular-cassandra/172.30.123.91:9042] Channel has been closed)) 04:55:28,865 WARN [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200004: [446] Retrying connecting to Cassandra cluster in [2]s... 
04:55:28,865 WARN [io.netty.util.concurrent.DefaultPromise] (cluster446-nio-worker-0) An exception was thrown by com.datastax.driver.core.Connection$9.operationComplete(): java.util.concurrent.RejectedExecutionException: Task com.datastax.driver.core.Connection$9$1@2dd6581 rejected from java.util.concurrent.ThreadPoolExecutor@7348795c[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0] at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) [rt.jar:1.8.0_51] at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) [rt.jar:1.8.0_51] at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) [rt.jar:1.8.0_51] at com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:484) [guava-16.0.1.redhat-3.jar:16.0.1.redhat-3] at com.datastax.driver.core.Connection$9.operationComplete(Connection.java:566) [cassandra-driver-core-2.2.0.rc2-redhat-2.jar:2.2.0.rc2-redhat-2] at com.datastax.driver.core.Connection$9.operationComplete(Connection.java:542) [cassandra-driver-core-2.2.0.rc2-redhat-2.jar:2.2.0.rc2-redhat-2] at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.channel.PendingWriteQueue.safeFail(PendingWriteQueue.java:252) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.channel.PendingWriteQueue.removeAndFailAll(PendingWriteQueue.java:112) 
[netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1256) [netty-handler-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:676) [netty-handler-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:829) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:610) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_51] I can't see anything in the logs on the cassandra pod when the metrics pod is connecting. Can you reproduce my issue? regards Dave
Damn! Looking at the trace, wouldn't this suggest that the SSL handshake is actually failing, seeing as it seems to be calling setHandshakeFailure...
[netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2] at io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1256)
So this takes me back to my perceived SSL trust/keystore problem, although I can't immediately see a problem with the truststore.
Can you please attach your logs and other verbose information in a file or in a pastebin? Pasting content like this into a bugzilla makes it very difficult to read.

The steps I have used for custom certificates are as follows:

oadm ca create-server-cert --cert=hawkular.crt --key=hawkular.key --hostnames=hawkular-metrics
cat hawkular.crt hawkular.key > hawkular.pem
oc secrets new metrics-deployer hawkular-metrics.pem=hawkular.pem hawkular-metrics-ca.cert=${SIGNER_CA.CERT:-openshift.local.config/master/ca.crt}
oc process -f $SOURCE_ROOT/metrics.yaml -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com,USE_PERSISTENT_STORAGE=false | oc create -f -

That works for me when running Hawkular Metrics with a custom certificate.
Ok, I think I'm closer to the issue... Definitely an SSL issue - I have 3 certs in play not 2! My metrics certificate is signed by an intermediate CA which is in turn signed by a root CA - so when I generate my metrics ca secret I'm enclosing two certificates in the ca cert file (comment 9)! hawkular-metrics-ca.cert=/etc/pki/ig-private-ca.crt ig-private-ca.crt is -----BEGIN CERTIFICATE----- MIIDrDCCApSgAwIBAgIQKJ0LDUmkrM1sKAeJDIn+dzANBgkqhkiG9w0BAQsFADBw MQswCQYDVQQGEwJHQjEXMBUGA1UECBMOR3JlYXRlciBMb25kb24xDzANBgNVBAcT BkxvbmRvbjEZMBcGA1UEChMQSUcgSW5kZXggTGltaXRlZDEcMBoGA1UEAxMTSUcg SW5kZXggUHJpdmF0ZSBDQTAeFw0xNTA4MTEwMDAwMDBaFw0zMDEyMzEyMzU5NTla MHAxCzAJBgNVBAYTAkdCMRcwFQYDVQQIEw5HcmVhdGVyIExvbmRvbjEPMA0GA1UE BxMGTG9uZG9uMRkwFwYDVQQKExBJRyBJbmRleCBMaW1pdGVkMRwwGgYDVQQDExNJ RyBJbmRleCBQcml2YXRlIENBMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKC AQEAvderk0XPdKR9eBSuLTdzE1k7hHPDOiJZtlPgPRmPxyOC8rMkR/S+xpxs43iQ qKSfZSWFGwAc73yDpFVr1uM9Sh69daF0ig+uO7xGj2oZtXymZYoL8WvS1Xy5V1qW hVZY83g3KRD/hEj0QXcdj4WooQ8Heimc4kq+dJNgHXAR05c6DHG8MuhQBckDuM9c wggIzpL+1xrrZvbhKsdIYThrrsVPxu5PA1m3iDA+ukxLOmuN6WPXiSsSrAqNclvO Aj1Dg+7Z31lJREHJcnvjycYklLu5qaCw8B3mdwhuGcq68Vdzhe/hdxe8PeuxOje9 bYZrO3Tl/J12U7uirdUyC/CFiQIDAQABo0IwQDAdBgNVHQ4EFgQUuE3O7NpsIJKm GlmijRdW3HW/+9kwDgYDVR0PAQH/BAQDAgGGMA8GA1UdEwEB/wQFMAMBAf8wDQYJ KoZIhvcNAQELBQADggEBAChkpH79l7yUfVK2vVLVUwfml8sTBFefMsFZoBSczgU8 6O1IivgFBDdN5+FuOe9ZaXrPRoulCzaUGtUNA7oZTZtsd17M5sUG7QKL+sa56o93 CjymhHMZUQoV/24fzraOmPC8R274bwWut5O9NA5pZomcKcSJlMeNyDk5NLPWZpEW wlLIxwzVNDnteIVdMT35q+PWSDmgIRDE0Xf2uHAYFoT4zuI//NTOR0pua/NpiaYo CLaB9Z9PC77jivCYwKL9ubEMeBKsWr0fsMFHj76aWBTCRhIXEEv5t84N/z1IELv3 8bHY0kArqvvfmIWu9Y/PO+iwUJm5ouCIQD7uHPAZpLY= -----END CERTIFICATE----- -----BEGIN CERTIFICATE----- MIIEqTCCA5GgAwIBAgIQFRhu0C2GQlB4OgRCKNGwnDANBgkqhkiG9w0BAQsFADBw MQswCQYDVQQGEwJHQjEXMBUGA1UECBMOR3JlYXRlciBMb25kb24xDzANBgNVBAcT BkxvbmRvbjEZMBcGA1UEChMQSUcgSW5kZXggTGltaXRlZDEcMBoGA1UEAxMTSUcg SW5kZXggUHJpdmF0ZSBDQTAeFw0xNTA5MDQwMDAwMDBaFw0zMDEyMzEyMzU5NTla 
MHcxCzAJBgNVBAYTAkdCMRcwFQYDVQQIEw5HcmVhdGVyIExvbmRvbjEPMA0GA1UE BxMGTG9uZG9uMRkwFwYDVQQKExBJRyBJbmRleCBMaW1pdGVkMSMwIQYDVQQDExpJ RyBJbmRleCBQcml2YXRlIFNlcnZlciBDQTCCASIwDQYJKoZIhvcNAQEBBQADggEP ADCCAQoCggEBAKloODC6Xv7Jgcw+E4aQRCGeddLd1P7/8W3lHxwo+EO8mvXVYHJ/ 6YH20PSeejFbr9BNWuAGiQyHUoP2L8RZFDcNZYXg0gGmaJJtpR02YyWo/jpxVCE8 4WgulUMrervgts9kekYeTSVlnti46DjoRrlpv0WfmDY+IitoY4LppIUkCQOFcAKc OLPwhepWhjv6/HLZ7Be+xK5GGToKysFgr1is1SH53WqpN0r17gh7x+TDuL+BGneJ 1obZO+T1+oBWyE1j1tV2IqGweuXkSmYJ8lrzRVpbdbLuaJ22KuRwzSU5+SXXaHi+ r8vyDoQYbpj06N1iiainHPO0cnqasfXdlZUCAwEAAaOCATYwggEyMB8GA1UdIwQY MBaAFLhNzuzabCCSphpZoo0XVtx1v/vZMB0GA1UdDgQWBBSlUsA6xwD3mj5/EDTQ uGNosDYykjAOBgNVHQ8BAf8EBAMCAYYwEgYDVR0TAQH/BAgwBgEB/wIBADAdBgNV HSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwPQYDVR0fBDYwNDAyoDCgLoYsaHR0 cDovL2NybC5jb21vZG9jYS5jb20vSUdJbmRleFByaXZhdGVDQS5jcmwwbgYIKwYB BQUHAQEEYjBgMDgGCCsGAQUFBzAChixodHRwOi8vY3J0LmNvbW9kb2NhLmNvbS9J R0luZGV4UHJpdmF0ZUNBLmNydDAkBggrBgEFBQcwAYYYaHR0cDovL29jc3AuY29t b2RvY2EuY29tMA0GCSqGSIb3DQEBCwUAA4IBAQCmsUQZ5hc0elX1SGYDZc04Z0o9 p0FU5xqIdLkqN8IxNGSBah+PXJfEacXdniXJqM+FTkaeBNoJV4WMZVCykfO6mg+X YM7dk9zDQ6FkK3paRKLbDau+SLZAlAAoONLAka+vnyxciEXoUrCXy7k3y85yQ4iX bTElZrOEFMmTL5U0oKIhHToLY8N+nlUoYemuI5aDr8W+2YOs6881Hh96MUqPoUVq 9rtv+r2oodjBk4k4aDzOsan9uDrhD11qsB4rN7RkQI9BttQ7qEciOiPth2rhUtyE A5gy7PiXtic81oAT1NFHfOdeortsPtBN+sEQfGEoA8bnlk1VazPj6jScwJyE -----END CERTIFICATE----- The metrics deployer does not seem to know how to cope with multiple CA certs in a chain: - CA: Issuer: C=GB, ST=Greater London, L=London, O=IG Index Limited, CN=IG Index Private CA Subject: C=GB, ST=Greater London, L=London, O=IG Index Limited, CN=IG Index Private CA Inter: Issuer: C=GB, ST=Greater London, L=London, O=IG Index Limited, CN=IG Index Private CA Subject: C=GB, ST=Greater London, L=London, O=IG Index Limited, CN=IG Index Private Server CA Cert: Issuer: C=GB, ST=Greater London, L=London, O=IG Index Limited, CN=IG Index Private Server CA Subject: C=GB/postalCode=EC4R 2YA, ST=UK, 
L=London/street=25 Dowgate Hill/street=Cannon Bridge House, O=IG Group Limited, OU=IT, OU=Hosted by IG Index Limited, OU=Private Unified Communications, CN=hawkular-metrics.paas.dev.iggroup.local On the metrics application it is creating a hawkular-metrics.keystore with a single certificate: - Your keystore contains 1 entry Alias name: hawkular-metrics Creation date: Dec 23, 2015 Entry type: PrivateKeyEntry Certificate chain length: 1 Certificate[1]: Owner: CN=hawkular-metrics.paas.dev.iggroup.local, OU=Private Unified Communications, OU=Hosted by IG Index Limited, OU=IT, O=IG Group Limited, STREET=Cannon Bridge House, STREET=25 Dowgate Hill, L=London, ST=UK, OID.2.5.4.17=EC4R 2YA, C=GB Issuer: CN=IG Index Private Server CA, O=IG Index Limited, L=London, ST=Greater London, C=GB This certificate is not a chain, it's a single certificate which has been signed by the "IG Index Private Server CA". It might be that it failed to create the chain because it only tried to add the "IG Index Private CA" cert which is not directly linked to this cert. On the cassandra truststore there are 5 certificates: - Alias name: ca - Owner: CN=metrics-signer@1450893859 Alias name: hawkular-metrics - Owner: CN=hawkular-metrics.paas.dev.iggroup.local, OU=Private Unified Communications, OU=Hosted by IG Index Limited, OU=IT, O=IG Group Limited, STREET=Cannon Bridge House, STREET=25 Dowgate Hill, L=London, ST=UK, OID.2.5.4.17=EC4R 2YA, C=GB Issuer: CN=IG Index Private Server CA, O=IG Index Limited, L=London, ST=Greater London, C=GB Alias name: hawkular-cassandra - Owner: CN=hawkular-cassandra Alias name: metricca - Owner: CN=IG Index Private CA, O=IG Index Limited, L=London, ST=Greater London, C=GB So the "IG Index Private CA" is trusted but the "IG Index Private Server CA" is not. I can't modify the deployed keystores to test because they are owned by root. 
I can't change the certificate because all of our certs have multiple intermediate CAs and I can't issue working certs with a single CA cert. I think the deployer process only picks up the FIRST ca certificate in the secret. Could we make it iterate over it creating aliases and import certs for EACH certificate found? regards Dave
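The request above (import EACH certificate in the CA file under its own alias) could be sketched roughly as follows. This is only an illustration of the idea, not the deployer's actual code: the function names, the `ca-N.pem` file naming, the numbered `metricca-N` aliases and the `TRUSTSTORE_PASSWORD` variable are all assumptions made for the example. The second function only prints the `keytool` commands (a dry run) so the result can be reviewed before anything touches a truststore.

```shell
#!/bin/sh
# Split a PEM bundle into one file per certificate.
# Usage: split_pem_bundle BUNDLE OUTDIR   (writes OUTDIR/ca-1.pem, ca-2.pem, ...)
split_pem_bundle() {
    bundle=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        /-----BEGIN CERTIFICATE-----/ { n++; out = dir "/ca-" n ".pem"; writing = 1 }
        writing                       { print > out }
        /-----END CERTIFICATE-----/   { if (writing) close(out); writing = 0 }
    ' "$bundle"
}

# Dry run: print one keytool import command per certificate file found.
# The alias scheme (metricca-1, metricca-2, ...) is hypothetical, and the
# password is read by keytool from the TRUSTSTORE_PASSWORD environment variable.
print_import_commands() {
    outdir=$1
    truststore=$2
    i=1
    for cert in "$outdir"/ca-*.pem; do
        [ -f "$cert" ] || continue
        printf 'keytool -importcert -noprompt -alias metricca-%s -file %s -keystore %s -storepass:env TRUSTSTORE_PASSWORD\n' \
            "$i" "$cert" "$truststore"
        i=$((i + 1))
    done
}
```

For example, `split_pem_bundle /etc/pki/ig-private-ca.crt /tmp/ca-split` followed by `print_import_commands /tmp/ca-split cassandra.truststore` would list one import per CA certificate in the file, so both the root and the intermediate end up trusted.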
Hi

I have managed to work around the issue of supporting my certificate with a chain by manually creating my own hawkular-metrics.keystore and updating the hawkular-metrics-secrets secret.

My steps: -

Create a p12 archive on the deploy server from the keys and pems etc..

openssl pkcs12 -export -in hawkular-metrics.paas.dev.iggroup.local.crt -inkey hawkular-metrics.paas.dev.iggroup.local.key \
  -out hawkular-metrics.paas.dev.iggroup.local.p12 -name hawkular-metrics \
  -CAfile hawkular-metrics.paas.dev.iggroup.local.crt.intermediate -chain

Now convert this to a new keystore (using same password as before)...

keytool -importkeystore \
  -deststorepass XXX -destkeypass XXX -destkeystore hawkular-metrics.keystore \
  -srckeystore hawkular-metrics.paas.dev.iggroup.local.p12 -srcstoretype PKCS12 -srcstorepass YYY \
  -alias hawkular-metrics

Check with

keytool -keystore hawkular-metrics.keystore -storepass XXX -list -v

Now have a keystore with alias hawkular-metrics and a chain of 3 certs.

Now convert to base64 (without wrapping)

base64 -w 0 hawkular-metrics.keystore

oc edit secret hawkular-metrics-secrets
(and paste in the base64 contents in hawkular-metrics.keystore)

Please can you update this ticket as a request for supporting certificate chains in the metrics deployer (and other deployers which Red Hat creates)?

regards

Dave
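Since the last step above means pasting a long base64 string into the secret by hand, a quick round-trip check before pasting can catch a wrapped or truncated value. A minimal sketch, assuming GNU coreutils `base64` (the `-w 0` form used above); the helper names are made up for this example.

```shell
#!/bin/sh
# Encode a file as a single unwrapped base64 line, as used in a Secret's data field.
encode_for_secret() {
    base64 -w 0 "$1"
}

# Confirm that the encoded text decodes back to exactly the original bytes.
# Usage: verify_encoding FILE ENCODED_TEXT   (exit status 0 means identical)
verify_encoding() {
    file=$1
    encoded=$2
    printf '%s' "$encoded" | base64 -d | cmp -s - "$file"
}
```

Typical use would be `enc=$(encode_for_secret hawkular-metrics.keystore)` followed by `verify_encoding hawkular-metrics.keystore "$enc" && echo OK`, and only then pasting `$enc` into the secret.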
The above workaround has resolved the issues with it connecting to cassandra and starting up but there is a new issue that it can't connect to itself which lead to a "Forbidden" message if you click on the metrics tab and the following stack trace in the hawkular-metrics logs: - 08:36:56,391 ERROR [org.hawkular.openshift.auth.OpenShiftTokenAuthentication] (http-/0.0.0.0:8444-12) Error trying to authenticate against the OpenShift server: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) [rt.jar:1.8.0_51] at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345) [rt.jar:1.8.0_51] at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) [rt.jar:1.8.0_51] at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) [rt.jar:1.8.0_51] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) [rt.jar:1.8.0_51] at java.net.Socket.connect(Socket.java:589) [rt.jar:1.8.0_51] at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:668) [jsse.jar:1.8.0_51] at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173) [jsse.jar:1.8.0_51] at sun.net.NetworkClient.doConnect(NetworkClient.java:180) [rt.jar:1.8.0_51] at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) [rt.jar:1.8.0_51] at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) [rt.jar:1.8.0_51] at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:275) [rt.jar:1.8.0_51] at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:371) [rt.jar:1.8.0_51] at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191) [rt.jar:1.8.0_51] at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104) [rt.jar:1.8.0_51] at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998) [rt.jar:1.8.0_51] at 
sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177) [rt.jar:1.8.0_51] at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1282) [rt.jar:1.8.0_51] at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1257) [rt.jar:1.8.0_51] at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:250) [rt.jar:1.8.0_51] at org.hawkular.openshift.auth.OpenShiftTokenAuthentication.isAuthorized(OpenShiftTokenAuthentication.java:93) [hawkular-metrics-openshift-integration-0.8.0.Final-redhat-1.jar:0.8.0.Final-redhat-1] at org.hawkular.openshift.auth.OpenShiftTokenAuthentication.doFilter(OpenShiftTokenAuthentication.java:67) [hawkular-metrics-openshift-integration-0.8.0.Final-redhat-1.jar:0.8.0.Final-redhat-1] at org.hawkular.openshift.auth.OpenShiftAuthenticationFilter.doFilter(OpenShiftAuthenticationFilter.java:89) [hawkular-metrics-openshift-integration-0.8.0.Final-redhat-1.jar:0.8.0.Final-redhat-1] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1] at org.hawkular.metrics.api.jaxrs.filter.CorsFilter.doFilter(CorsFilter.java:88) [classes:] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1] at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:231) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1] at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:149) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1] at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:169) [jboss-as-web-7.5.3.Final-redhat-2.jar:7.5.3.Final-redhat-2] at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:150) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1] at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:97) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1] at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:102) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1] at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:344) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1] at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:854) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1] at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:653) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1] at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:926) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1] at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_51]

I've looked at the trust store and confirmed that the top-level CA was added. I also manipulated it (and saved it back to the secret) so that it contains the intermediate cert too, but it looks as though it is now having issues connecting to itself. :( Could this be more cert issues? The certs are pretty new and we had issues with them internally with any Java 8 JDK/JRE older than 1.8.0_60. The Java 8 version in hawkular-metrics is pretty old, so this could be an issue. I'm wondering if there could be something else contributing to the "connection refused" message?
I can confirm that hawkular-metrics is starting up on port 8444 and is presenting my certificate with the correct chain of 3 certs. I can also browse to https://hawkular-metrics.paas.dev.iggroup.local in a browser and get the right certs. This has turned out to be a whole world of pain, far more difficult than I imagined it would be!

regards

Dave
"08:36:56,391 ERROR [org.hawkular.openshift.auth.OpenShiftTokenAuthentication] (http-/0.0.0.0:8444-12) Error trying to authenticate against the OpenShift server: java.net.ConnectException: Connection refused"

This means that the Hawkular-Metrics instance is not able to connect to the OpenShift master. Are you doing anything with your master configuration which would mean it is not accessible over the default https://kubernetes.default.svc:443 ? Are you setting the MASTER_URL property to something during the deploy?

Could you please not post logs and verbose information directly into the comments? Please attach them as a file or place them in a pastebin or similar system. Doing this will make it easier for us to help figure out the problem.

From what I can tell so far, it looks like there might be an issue with CA chains that we need to look into.
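One way to separate "cannot reach the master at all" from certificate trouble is to probe the master URL with curl from inside a pod and look at curl's exit code. The sketch below is only an illustration: the function names are made up, `-k` deliberately skips certificate verification so that only connectivity is tested, and the `/healthz` path is an assumption about what the master serves. Any path would do, since a refused connection fails before a request is sent.

```shell
#!/bin/sh
# Translate a curl exit code into a short diagnosis.
describe_curl_rc() {
    rc=$1
    url=$2
    case $rc in
        0)  echo "reachable: $url" ;;
        6)  echo "could not resolve host: $url" ;;
        7)  echo "connection refused or host unreachable: $url" ;;
        28) echo "timed out: $url" ;;
        *)  echo "curl exit code $rc: $url" ;;
    esac
}

# Probe the master URL (defaults to the in-cluster service address).
# Usage: probe_master [MASTER_URL]
probe_master() {
    url=${1:-https://kubernetes.default.svc:443}
    curl -sk --max-time 5 -o /dev/null "$url/healthz"
    rc=$?
    describe_curl_rc "$rc" "$url"
    return "$rc"
}
```

Running `probe_master` with no argument and then `probe_master https://osemaster.dev.iggroup.local:8443` would show whether only the default service address is refusing connections, which matches the "Connection refused" in the log rather than an SSL error.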
I have gone through and verified that intermediary CA certificates seem to be working. A PR has also been created which includes a test specifically for this: https://github.com/openshift/origin-metrics/pull/63

In the test the "hawkular-metrics.pem" is set to https://github.com/mwringe/origin-metrics/blob/intermediary_ca/hack/keys/intermediary_ca/hawkular-metrics.pem (it's the concatenation of the public and private keys).

And the "hawkular-metrics-ca.cert" is set to https://github.com/mwringe/origin-metrics/blob/intermediary_ca/hack/keys/intermediary_ca/hawkular-metrics.pem (which includes both the intermediary CA and its root CA).
So, I had this exact same problem. It turned out that, for me, the box had two interfaces and the installer picked that up and configured it.

My /root/.config/openshift/installer.cfg.yml looked like:

ansible_config: /usr/share/atomic-openshift-utils/ansible.cfg
ansible_log_path: /tmp/ansible.log
ansible_ssh_user: root
hosts:
- connect_to: aep-all.dc2.crunchtools.com
  hostname: aep-all.dc2.crunchtools.com
  ip: 192.168.122.55
  master: true
  node: true
  public_hostname: aep-all-public.dc2.crunchtools.com
  public_ip: 192.168.100.136
variant: atomic-enterprise
variant_version: '3.1'
version: v1

I am guessing this is what messed things up. I had installed, uninstalled, and re-installed like five times to figure this out. If I set all of the variables to one of the ip/hostname combinations Hawkular works fine, but if they are split, it breaks.

This works (even though DNS is wrong):

ansible_config: /usr/share/atomic-openshift-utils/ansible.cfg
ansible_log_path: /tmp/ansible.log
ansible_ssh_user: root
hosts:
- connect_to: aep-all.dc2.crunchtools.com
  hostname: aep-all.dc2.crunchtools.com
  ip: 192.168.100.136
  master: true
  node: true
  public_hostname: aep-all.dc2.crunchtools.com
  public_ip: 192.168.100.136
variant: atomic-enterprise
variant_version: '3.1'
version: v1

And this works:

hosts:
- connect_to: aep-all.dc2.crunchtools.com
  hostname: aep-all.dc2.crunchtools.com
  ip: 192.168.122.55
  master: true
  node: true
  public_hostname: aep-all.dcr24.crunchtools.com
  public_ip: 192.168.122.55
variant: atomic-enterprise
variant_version: '3.1'
version: v1

Hope that helps!
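For anyone wanting to check their own installer config for the split described above, something along these lines might do. It is a rough sketch, not part of the installer: the function name is made up, and it assumes each host entry lists `ip:` before `public_ip:`, as in the files shown here. Whether a split is really the cause in a given environment is still a guess.

```shell
#!/bin/sh
# Report host entries in an installer.cfg.yml whose ip and public_ip differ.
# Exit status is 1 if any mismatch was found, 0 otherwise.
check_installer_ips() {
    awk '
        { sub(/^[ \t]*-[ \t]*/, "") }          # drop a leading YAML list dash
        $1 == "ip:"        { ip = $2 }
        $1 == "public_ip:" {
            if ($2 != ip) {
                printf "mismatch: ip=%s public_ip=%s\n", ip, $2
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Usage would be `check_installer_ips /root/.config/openshift/installer.cfg.yml`. It prints one line per host whose two addresses differ and prints nothing when they agree.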
A pod stuck in the Pending state is a fairly common symptom which indicates that something went wrong during startup.

@Scott is this ansible config file a customized one? Or is it something the installer configured? I am wondering whether we need to open a new BZ about the installer having an issue.
The "atomic-openshift-installer install" command created the first (didn't work) and second config above (worked). I changed the IP Address on the final one (worked, and the correct ip address). So, yeah, I am not sure how the logic of the installer determines what to plug into the above templates...
Can you please open a separate BZ about your installer issues?
Hi

Metrics are now FINALLY WORKING. I managed to fix them with the details from this bugzilla and a ticket I raised.

There were two issues preventing it from working: -

1 - metrics deployer not creating the certificate chain properly when containing 3 certs, i.e. ca -> intermediate -> cert
2 - metrics unable to start due to ha-master unless I set MASTER_URL= to the correct URL when processing the template.

I had to follow this procedure in order to make it work (warning: not pretty!): -

oc project openshift-infra

oc create -f - <<API
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-deployer
secrets:
- name: metrics-deployer
API

oadm policy add-role-to-user edit system:serviceaccount:openshift-infra:metrics-deployer
oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:heapster

Add my key and cert to /etc/pki/hawkular-metrics.paas.dev.iggroup.local.pem
Add my intermediate AND ca certs to /etc/pki/ig-private-ca.crt

oc secrets new metrics-deployer hawkular-metrics.pem=/etc/pki/hawkular-metrics.paas.dev.iggroup.local.pem hawkular-metrics-ca.cert=/etc/pki/ig-private-ca.crt

oc process -f /usr/share/ansible/openshift-ansible/roles/openshift_examples/files/examples/v1.1/infrastructure-templates/enterprise/metrics-deployer.yaml -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.paas.dev.iggroup.local,IMAGE_PREFIX=registry.access.redhat.com/openshift3/,IMAGE_VERSION=latest,USE_PERSISTENT_STORAGE=false,MASTER_URL=https://osemaster.dev.iggroup.local:8443 | oc create -f -

Now I need to manually create my own keystore from my already issued certs from COMODO CA: -

Create a p12 archive on the deploy server from the keys and pems etc..

openssl pkcs12 -export -in hawkular-metrics.paas.dev.iggroup.local.crt -inkey hawkular-metrics.paas.dev.iggroup.local.key \
  -out hawkular-metrics.paas.dev.iggroup.local.p12 -name hawkular-metrics \
  -CAfile hawkular-metrics.paas.dev.iggroup.local.crt.intermediate -chain

Create a new password (YYY - for these instructions only)

Now convert this to a new keystore...

keytool -importkeystore \
  -deststorepass YYYY -destkeypass YYYY -destkeystore hawkular-metrics.keystore \
  -srckeystore hawkular-metrics.paas.dev.iggroup.local.p12 -srcstoretype PKCS12 -srcstorepass YYY \
  -alias hawkular-metrics

Check with

keytool -keystore hawkular-metrics.keystore -storepass YYYY -list -v

Now have a keystore with alias hawkular-metrics and a chain of 3 certs.

Now convert to base64 (without wrapping)

base64 -w 0 hawkular-metrics.keystore
echo "YYYY" | base64

oc edit secret hawkular-metrics-secrets

Replace hawkular-metrics.keystore and hawkular-metrics.keystore.password with your new base64 encoded versions and save

oc delete pod hawkular-metrics-XXXX

Update the master-config (if not already done)

/etc/origin/master/master-config.yaml

assetConfig:
  ...
  metricsPublicURL: https://hawkular-metrics.paas.dev.iggroup.local/hawkular/metrics

restart everything!

DONE

Getting these working has taken weeks of effort and left me with an impression of fragility with the platform.

I think the following needs to happen: -

1. The documentation needs to include instructions to set MASTER_URL when processing the template if you are running HA masters.
2. The metrics deployer needs to be able to provision certs with multiple CA certs, and I'd suggest that running via the openssl pkcs12 method and converting to a keystore is a good way of doing this.
3. Stop using lifecycle hooks as readiness checks! It's absolutely horrible that pods get stuck in PENDING because they have a lifecycle hook waiting for successful start up. You can't look in the logs or connect to the pod or anything useful whilst the PostStart hook has the pod in PENDING. This is just an awful thing to do. Why can't they use a readiness probe instead?

I've re-tested both the default installation method and this process on 3.1.1 and can confirm that the default method still doesn't work and that this workaround gets them working.

regards

Dave
Congratulations on getting it to work. A couple of comments:

"1 - metrics deployer not creating the certificate chain properly when containing 3 certs, i.e. ca -> intermediate -> cert"

As already mentioned [https://bugzilla.redhat.com/show_bug.cgi?id=1294067#c20] we have this exact setup as part of our tests. I suspect some of the secrets you are setting do not contain the correct values, which is why you are seeing this problem. You shouldn't need to do any more steps than just passing the right values to the deployer.

"2 - metrics unable to start due to ha-master unless I set MASTER_URL= to the correct URL when processing the template."

The default MASTER_URL should work unless you are either removing that service from the project or don't have your certificates properly configured. Having to modify the MASTER_URL usually indicates that something is wrong with your OpenShift installation. We do have the MASTER_URL as an option though to help people get around this. Do you know the exact error you were seeing when using the default value for the MASTER_URL?

"3. Stop using lifecycle hooks as readiness checks! It's absolutely horrible that pods get stuck in PENDING because they have a lifecycle hook waiting for successful start up. You can't look in the logs or connect to the pod or anything useful whilst the PostStart hook has the pod in PENDING. This is just an awful thing to do. Why can't they use a readiness probe instead?"

You can get the logs from docker; it's currently an issue with OpenShift that it can't get the logs from a pending container. Readiness probes and PostStart hooks do completely different things and they are not at all interchangeable. I do agree that the PostStart scripts in the enterprise containers are a bit awkward at the moment. Some improvements have been made for the next version, and the logs should be handled better in a future release as well.
> You can get the logs from docker, its currently an issue with OpenShift where it can't get the logs from a pending container.

This is fixed in 3.2.
If they are having issues getting their custom certs working by overwriting the Hawkular Metrics cert, is it possible for them to use a re-encrypting route instead? This bypasses a lot of the headaches of providing your own certificates.
I am going to assume that this issue was resolved when the customer switched to using re-encrypting endpoints. In more recent versions, re-encrypting endpoints are the only option.