Bug 1294067 - hawkular-metrics pod stuck in 'Pending'
Summary: hawkular-metrics pod stuck in 'Pending'
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.1.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Matt Wringe
QA Contact: Peng Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-12-24 11:45 UTC by Dave McCormick
Modified: 2019-10-10 10:47 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-31 16:03:45 UTC
Target Upstream Version:
Embargoed:



Description Dave McCormick 2015-12-24 11:45:07 UTC
Description of problem:

Hi, when I try to set up metrics collection with my own certificate, the hawkular-metrics pod stays in the 'Pending' state with no discernible error in the logs.

How reproducible:
Each time I create it.

Steps to Reproduce:
1. Set up metrics collection with a custom certificate:

oc secrets new metrics-deployer hawkular-metrics.pem=/etc/pki/hawkular-metrics.paas.dev.iggroup.local.pem hawkular-metrics-ca.cert=/etc/pki/ig-private-ca.crt

oc process -f metrics-deployer.yaml -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.paas.dev.iggroup.local,USE_PERSISTENT_STORAGE=false,REDEPLOY=true | oc create -f -

Actual results:

NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-jthxm   1/1       Running            0          1h
hawkular-metrics-e25n6       0/1       Pending            0          10m
heapster-xp7o3               0/1       CrashLoopBackOff   17         1h
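
A minimal sketch for picking the unhealthy pods out of a listing like the one above. The function name `pods_not_running` is hypothetical, and it assumes the five-column layout shown here (NAME, READY, STATUS, RESTARTS, AGE) with one header line.

```shell
# Print the name and status of every pod whose STATUS column is not "Running".
# Reads `oc get pods` style output on stdin. Assumes one header line and the
# column order NAME READY STATUS RESTARTS AGE, as in the listing above.
pods_not_running() {
    awk 'NR > 1 && NF >= 3 && $3 != "Running" { print $1, $3 }'
}

# Possible usage against a live cluster (namespace as reported in this bug):
#   oc get pods -n openshift-infra | pods_not_running
```

For the listing above this would report hawkular-metrics-e25n6 as Pending and heapster-xp7o3 as CrashLoopBackOff.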

oc describe pod hawkular-metrics-e25n6
Name:                           hawkular-metrics-e25n6
Namespace:                      openshift-infra
Image(s):                       registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0
Node:                           vrdevosnode003.iggroup.local/
Start Time:                     Thu, 24 Dec 2015 11:10:17 +0000
Labels:                         metrics-infra=hawkular-metrics,name=hawkular-metrics
Status:                         Pending
Reason:
Message:
IP:
Replication Controllers:        hawkular-metrics (1/1 replicas created)
Containers:
  hawkular-metrics:
    Container ID:
    Image:              registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0
    Image ID:
    QoS Tier:
      cpu:              BestEffort
      memory:           BestEffort
    State:              Waiting
    Ready:              False
    Restart Count:      0
    Environment Variables:
      POD_NAMESPACE:    openshift-infra (v1:metadata.namespace)
Volumes:
  hawkular-metrics-secrets:
    Type:       Secret (a secret that should populate this volume)
    SecretName: hawkular-metrics-secrets
  hawkular-token-18xqi:
    Type:       Secret (a secret that should populate this volume)
    SecretName: hawkular-token-18xqi
Events:
  FirstSeen     LastSeen        Count   From                                    SubobjectPath                           Reason          Message
  ─────────     ────────        ─────   ────                                    ─────────────                           ──────          ───────
  28m           28m             1       {kubelet vrdevosnode003.iggroup.local}  implicitly required container POD       Pulled          Container image "openshift3/ose-pod:v3.1.0.4" already present on machine
  28m           28m             1       {kubelet vrdevosnode003.iggroup.local}  implicitly required container POD       Created         Created with docker id 16ecd4d1bc6e
  28m           28m             1       {kubelet vrdevosnode003.iggroup.local}  implicitly required container POD       Started         Started with docker id 16ecd4d1bc6e
  28m           28m             1       {kubelet vrdevosnode003.iggroup.local}  spec.containers{hawkular-metrics}       Pulling         pulling image "registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0"
  28m           28m             1       {kubelet vrdevosnode003.iggroup.local}  spec.containers{hawkular-metrics}       Pulled          Successfully pulled image "registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0"
  28m           28m             1       {kubelet vrdevosnode003.iggroup.local}  spec.containers{hawkular-metrics}       Created         Created with docker id 0c34f0335e00
  28m           28m             1       {kubelet vrdevosnode003.iggroup.local}  spec.containers{hawkular-metrics}       Started         Started with docker id 0c34f0335e00
  28m           28m             1       {scheduler }                                                                    Scheduled       Successfully assigned hawkular-metrics-e25n6 to vrdevosnode003.iggroup.local


On the node (vrdevosnode003), both containers are up:

CONTAINER ID        IMAGE                                                                  COMMAND                  CREATED             STATUS              PORTS               NAMES
0c34f0335e00        registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0   "/opt/hawkular/script"   30 minutes ago      Up 30 minutes                           k8s_hawkular-metrics.70adfb93_hawkular-metrics-e25n6_openshift-infra_f8cafc5d-aa2e-11e5-9488-0050568f9ceb_88f8596d
16ecd4d1bc6e        openshift3/ose-pod:v3.1.0.4                                            "/pod"                   30 minutes ago      Up 30 minutes                           k8s_POD.e73d2a82_hawkular-metrics-e25n6_openshift-infra_f8cafc5d-aa2e-11e5-9488-0050568f9ceb_d44bd9c9


Inspecting the container:

docker inspect 0c34f0335e00
[
{
    "Id": "0c34f0335e0063de6588ffba85d222c991ff9a4acd746ac3621445ae496b4b5a",
    "Created": "2015-12-24T11:10:22.560303118Z",
    "Path": "/opt/hawkular/scripts/hawkular-metrics-wrapper.sh",
    "Args": [
        "-b",
        "0.0.0.0",
        "-Dhawkular-metrics.cassandra-nodes=hawkular-cassandra",
        "-Dhawkular-metrics.cassandra-use-ssl",
        "-Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true",
        "-Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true",
        "-Dhawkular-metrics.openshift.auth-methods=openshift-oauth,htpasswd",
        "-Dhawkular-metrics.openshift.htpasswd-file=/secrets/hawkular-metrics.htpasswd.file",
        "-Dhawkular.metrics.allowed-cors-access-control-allow-headers=authorization",
        "-Dhawkular.metrics.default-ttl=7",
        "-DKUBERNETES_MASTER_URL=https://kubernetes.default.svc:443",
        "--hmw.keystore=/secrets/hawkular-metrics.keystore",
        "--hmw.truststore=/secrets/hawkular-metrics.truststore",
        "--hmw.keystore_password_file=/secrets/hawkular-metrics.keystore.password",
        "--hmw.truststore_password_file=/secrets/hawkular-metrics.truststore.password"
    ],
    "State": {
        "Running": true,
        "Paused": false,
        "Restarting": false,
        "OOMKilled": false,
        "Dead": false,
        "Pid": 14783,
        "ExitCode": 0,
        "Error": "",
        "StartedAt": "2015-12-24T11:10:23.479536006Z",
        "FinishedAt": "0001-01-01T00:00:00Z"
    },
    "Image": "b44dc66d64f234ff1c857c6c0f621cde5b005312266ec972e1f55ec46eccca4c",
    "NetworkSettings": {
        "Bridge": "",
        "EndpointID": "",
        "Gateway": "",
        "GlobalIPv6Address": "",
        "GlobalIPv6PrefixLen": 0,
        "HairpinMode": false,
        "IPAddress": "",
        "IPPrefixLen": 0,
        "IPv6Gateway": "",
        "LinkLocalIPv6Address": "",
        "LinkLocalIPv6PrefixLen": 0,
        "MacAddress": "",
        "NetworkID": "",
        "PortMapping": null,
        "Ports": null,
        "SandboxKey": "",
        "SecondaryIPAddresses": null,
        "SecondaryIPv6Addresses": null
    },
    "ResolvConfPath": "/var/lib/docker/containers/16ecd4d1bc6e6a23ec23da1d9c678974c622fea9663b7365ed315944f5c9e62f/resolv.conf",
    "HostnamePath": "/var/lib/docker/containers/16ecd4d1bc6e6a23ec23da1d9c678974c622fea9663b7365ed315944f5c9e62f/hostname",
    "HostsPath": "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/etc-hosts",
    "LogPath": "/var/lib/docker/containers/0c34f0335e0063de6588ffba85d222c991ff9a4acd746ac3621445ae496b4b5a/0c34f0335e0063de6588ffba85d222c991ff9a4acd746ac3621445ae496b4b5a-json.log",
    "Name": "/k8s_hawkular-metrics.70adfb93_hawkular-metrics-e25n6_openshift-infra_f8cafc5d-aa2e-11e5-9488-0050568f9ceb_88f8596d",
    "RestartCount": 0,
    "Driver": "devicemapper",
    "ExecDriver": "native-0.2",
    "MountLabel": "system_u:object_r:svirt_sandbox_file_t:s0:c3,c2",
    "ProcessLabel": "system_u:system_r:svirt_lxc_net_t:s0:c3,c2",
    "AppArmorProfile": "",
    "ExecIDs": [
        "dd2e8ffd9e4202d7b36f4f1415eb4a661db5f95f34b47013c6701cd8132a9cc3"
    ],
    "HostConfig": {
        "Binds": [
            "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/volumes/kubernetes.io~secret/hawkular-metrics-secrets:/secrets:Z",
            "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/volumes/kubernetes.io~secret/hawkular-token-18xqi:/var/run/secrets/kubernetes.io/serviceaccount:ro,Z",
            "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/etc-hosts:/etc/hosts",
            "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/containers/hawkular-metrics/0c34f0335e0063de6588ffba85d222c991ff9a4acd746ac3621445ae496b4b5a:/dev/termination-log"
        ],
        "ContainerIDFile": "",
        "LxcConf": null,
        "Memory": 0,
        "MemorySwap": -1,
        "CpuShares": 2,
        "CpuPeriod": 0,
        "CpusetCpus": "",
        "CpusetMems": "",
        "CpuQuota": 0,
        "BlkioWeight": 0,
        "OomKillDisable": false,
        "MemorySwappiness": null,
        "Privileged": false,
        "PortBindings": null,
        "Links": null,
        "PublishAllPorts": false,
        "Dns": [
            "172.30.0.1",
            "172.27.25.210",
            "172.24.25.210"
        ],
        "DnsSearch": [
            "openshift-infra.svc.cluster.local",
            "svc.cluster.local",
            "cluster.local",
            "test.iggroup.local",
            "iggroup.local"
        ],
        "ExtraHosts": null,
        "VolumesFrom": null,
        "Devices": null,
        "NetworkMode": "container:16ecd4d1bc6e6a23ec23da1d9c678974c622fea9663b7365ed315944f5c9e62f",
        "IpcMode": "container:16ecd4d1bc6e6a23ec23da1d9c678974c622fea9663b7365ed315944f5c9e62f",
        "PidMode": "",
        "UTSMode": "",
        "CapAdd": null,
        "CapDrop": null,
        "GroupAdd": null,
        "RestartPolicy": {
            "Name": "",
            "MaximumRetryCount": 0
        },
        "SecurityOpt": [
            "label:level:s0:c3,c2"
        ],
        "ReadonlyRootfs": false,
        "Ulimits": null,
        "LogConfig": {
            "Type": "json-file",
            "Config": {}
        },
        "CgroupParent": "",
        "ConsoleSize": [
            0,
            0
        ]
    },
    "GraphDriver": {
        "Name": "devicemapper",
        "Data": {
            "DeviceId": "223",
            "DeviceName": "docker-253:1-351846-0c34f0335e0063de6588ffba85d222c991ff9a4acd746ac3621445ae496b4b5a",
            "DeviceSize": "107374182400"
        }
    },
    "Mounts": [
        {
            "Source": "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/volumes/kubernetes.io~secret/hawkular-metrics-secrets",
            "Destination": "/secrets",
            "Mode": "Z",
            "RW": true
        },
        {
            "Source": "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/volumes/kubernetes.io~secret/hawkular-token-18xqi",
            "Destination": "/var/run/secrets/kubernetes.io/serviceaccount",
            "Mode": "ro,Z",
            "RW": false
        },
        {
            "Source": "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/etc-hosts",
            "Destination": "/etc/hosts",
            "Mode": "",
            "RW": true
        },
        {
            "Source": "/var/lib/origin/openshift.local.volumes/pods/f8cafc5d-aa2e-11e5-9488-0050568f9ceb/containers/hawkular-metrics/0c34f0335e0063de6588ffba85d222c991ff9a4acd746ac3621445ae496b4b5a",
            "Destination": "/dev/termination-log",
            "Mode": "",
            "RW": true
        }
    ],
    "Config": {
        "Hostname": "hawkular-metrics-e25n6",
        "Domainname": "",
        "User": "1000010000",
        "AttachStdin": false,
        "AttachStdout": false,
        "AttachStderr": false,
        "ExposedPorts": {
            "8080/tcp": {},
            "8443/tcp": {},
            "8444/tcp": {}
        },
        "PublishService": "",
        "Tty": false,
        "OpenStdin": false,
        "StdinOnce": false,
        "Env": [
            "POD_NAMESPACE=openshift-infra",
            "HAWKULAR_CASSANDRA_PORT_9042_TCP_PROTO=tcp",
            "HAWKULAR_CASSANDRA_PORT_9160_TCP_ADDR=172.30.221.15",
            "KUBERNETES_SERVICE_PORT=443",
            "KUBERNETES_PORT_53_UDP_PORT=53",
            "HAWKULAR_CASSANDRA_SERVICE_PORT_CQL_PORT=9042",
            "HAWKULAR_CASSANDRA_PORT_7001_TCP_PORT=7001",
            "HAWKULAR_CASSANDRA_SERVICE_PORT_SSL_PORT=7001",
            "HAWKULAR_CASSANDRA_PORT_7001_TCP=tcp://172.30.221.15:7001",
            "HAWKULAR_METRICS_PORT_443_TCP=tcp://172.30.144.231:443",
            "HAWKULAR_METRICS_PORT_443_TCP_PROTO=tcp",
            "KUBERNETES_PORT_443_TCP_PROTO=tcp",
            "KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1",
            "KUBERNETES_PORT_53_TCP_PORT=53",
            "HEAPSTER_PORT_80_TCP_ADDR=172.30.175.249",
            "KUBERNETES_PORT_53_UDP=udp://172.30.0.1:53",
            "KUBERNETES_PORT_53_TCP_ADDR=172.30.0.1",
            "KUBERNETES_SERVICE_PORT_DNS=53",
            "HAWKULAR_CASSANDRA_PORT_7000_TCP_PORT=7000",
            "HAWKULAR_METRICS_PORT_443_TCP_PORT=443",
            "KUBERNETES_SERVICE_PORT_DNS_TCP=53",
            "KUBERNETES_PORT_53_UDP_ADDR=172.30.0.1",
            "KUBERNETES_PORT_53_TCP=tcp://172.30.0.1:53",
            "HEAPSTER_PORT_80_TCP_PORT=80",
            "KUBERNETES_PORT=tcp://172.30.0.1:443",
            "HAWKULAR_CASSANDRA_PORT_9160_TCP_PORT=9160",
            "HAWKULAR_CASSANDRA_PORT_7000_TCP_PROTO=tcp",
            "HAWKULAR_CASSANDRA_PORT_7001_TCP_PROTO=tcp",
            "HEAPSTER_PORT_80_TCP=tcp://172.30.175.249:80",
            "HAWKULAR_CASSANDRA_SERVICE_PORT_THIFT_PORT=9160",
            "HAWKULAR_CASSANDRA_PORT_9160_TCP=tcp://172.30.221.15:9160",
            "HAWKULAR_METRICS_PORT_443_TCP_ADDR=172.30.144.231",
            "HAWKULAR_CASSANDRA_PORT_9042_TCP=tcp://172.30.221.15:9042",
            "KUBERNETES_SERVICE_HOST=172.30.0.1",
            "HAWKULAR_CASSANDRA_SERVICE_PORT=9042",
            "HAWKULAR_CASSANDRA_PORT_9042_TCP_ADDR=172.30.221.15",
            "HAWKULAR_CASSANDRA_PORT_7000_TCP=tcp://172.30.221.15:7000",
            "KUBERNETES_PORT_53_TCP_PROTO=tcp",
            "HEAPSTER_SERVICE_PORT=80",
            "HAWKULAR_METRICS_SERVICE_PORT_HTTPS_ENDPOINT=443",
            "HEAPSTER_PORT=tcp://172.30.175.249:80",
            "HEAPSTER_PORT_80_TCP_PROTO=tcp",
            "KUBERNETES_SERVICE_PORT_HTTPS=443",
            "KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443",
            "HAWKULAR_CASSANDRA_PORT_9042_TCP_PORT=9042",
            "KUBERNETES_PORT_443_TCP_PORT=443",
            "KUBERNETES_PORT_53_UDP_PROTO=udp",
            "HAWKULAR_CASSANDRA_SERVICE_PORT_TCP_PORT=7000",
            "HAWKULAR_CASSANDRA_PORT_9160_TCP_PROTO=tcp",
            "HAWKULAR_CASSANDRA_PORT_7000_TCP_ADDR=172.30.221.15",
            "HAWKULAR_CASSANDRA_PORT_7001_TCP_ADDR=172.30.221.15",
            "HAWKULAR_METRICS_SERVICE_HOST=172.30.144.231",
            "HAWKULAR_METRICS_PORT=tcp://172.30.144.231:443",
            "HEAPSTER_SERVICE_HOST=172.30.175.249",
            "HAWKULAR_CASSANDRA_SERVICE_HOST=172.30.221.15",
            "HAWKULAR_CASSANDRA_PORT=tcp://172.30.221.15:9042",
            "HAWKULAR_METRICS_SERVICE_PORT=443",
            "container=docker",
            "PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin",
            "HOME=/home/jboss",
            "JAVA_HOME=/usr/lib/jvm/java-1.8.0",
            "JAVA_VENDOR=openjdk",
            "JAVA_VERSION=1.8.0",
            "LAUNCH_JBOSS_IN_BACKGROUND=true",
            "JBOSS_PRODUCT=eap",
            "JBOSS_EAP_VERSION=6.4.3.GA",
            "JBOSS_HOME=/opt/eap",
            "JBOSS_MODULES_SYSTEM_PKGS=org.jboss.logmanager",
            "JBOSS_IMAGE_NAME=jboss-eap-6/eap-openshift",
            "JBOSS_IMAGE_VERSION=6.4",
            "JBOSS_IMAGE_RELEASE=315",
            "STI_BUILDER=jee",
            "HAWKULAR_METRICS_ENDPOINT_PORT=8080",
            "HAWKULAR_METRICS_VERSION=0.8.0.Final",
            "HAWKULAR_METRICS_DIRECTORY=/opt/hawkular",
            "HAWKULAR_METRICS_SCRIPT_DIRECTORY=/opt/hawkular/scripts/"
        ],
        "Cmd": null,
        "Image": "registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0",
        "Volumes": null,
        "VolumeDriver": "",
        "WorkingDir": "/home/jboss",
        "Entrypoint": [
            "/opt/hawkular/scripts/hawkular-metrics-wrapper.sh",
            "-b",
            "0.0.0.0",
            "-Dhawkular-metrics.cassandra-nodes=hawkular-cassandra",
            "-Dhawkular-metrics.cassandra-use-ssl",
            "-Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true",
            "-Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true",
            "-Dhawkular-metrics.openshift.auth-methods=openshift-oauth,htpasswd",
            "-Dhawkular-metrics.openshift.htpasswd-file=/secrets/hawkular-metrics.htpasswd.file",
            "-Dhawkular.metrics.allowed-cors-access-control-allow-headers=authorization",
            "-Dhawkular.metrics.default-ttl=7",
            "-DKUBERNETES_MASTER_URL=https://kubernetes.default.svc:443",
            "--hmw.keystore=/secrets/hawkular-metrics.keystore",
            "--hmw.truststore=/secrets/hawkular-metrics.truststore",
            "--hmw.keystore_password_file=/secrets/hawkular-metrics.keystore.password",
            "--hmw.truststore_password_file=/secrets/hawkular-metrics.truststore.password"
        ],
        "NetworkDisabled": false,
        "MacAddress": "",
        "OnBuild": null,
        "Labels": {
            "io.kubernetes.pod.name": "openshift-infra/hawkular-metrics-e25n6",
            "io.kubernetes.pod.terminationGracePeriod": "30"
        }
    }
}
]


Looking at the container logs:

docker logs 0c34f0335e00
/opt/hawkular/auth ~
Certificate was added to keystore
[Storing hawkular-metrics.truststore]
~
=========================================================================

  JBoss Bootstrap Environment

  JBOSS_HOME: /opt/eap

  JAVA: /usr/lib/jvm/java-1.8.0/bin/java

  JAVA_OPTS:  -server -XX:+UseCompressedOops -verbose:gc -Xloggc:"/opt/eap/standalone/log/gc.log" -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading -Xms1303m -Xmx1303m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.logmanager -Djava.awt.headless=true -Djboss.modules.policy-permissions=true -Xbootclasspath/p:/opt/eap/jboss-modules.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-1.5.4.Final-redhat-1.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/javax.json-1.0.4.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/jboss-logmanager-ext-1.0.0.Alpha2-redhat-1.jar -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/opt/eap/jolokia.jar=port=8778,host=127.0.0.1,discoveryEnabled=false

=========================================================================

OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
I> No access restrictor found, access to all MBean is allowed
Jolokia: Agent started with URL http://127.0.0.1:8778/jolokia/
06:10:25,201 INFO  [org.jboss.modules] (main) JBoss Modules version 1.3.7.Final-redhat-1
06:10:25,597 INFO  [org.jboss.msc] (main) JBoss MSC version 1.1.5.Final-redhat-1
06:10:25,687 INFO  [org.jboss.as] (MSC service thread 1-2) JBAS015899: JBoss EAP 6.4.3.GA (AS 7.5.3.Final-redhat-2) starting
06:10:25,693 DEBUG [org.jboss.as.config] (MSC service thread 1-2) Configured system properties:
        KUBERNETES_MASTER_URL = https://kubernetes.default.svc:443
        [Standalone] =
        awt.toolkit = sun.awt.X11.XToolkit
        file.encoding = ANSI_X3.4-1968
        file.encoding.pkg = sun.io
        file.separator = /
        hawkular-metrics.cassandra-nodes = hawkular-cassandra
        hawkular-metrics.cassandra-use-ssl = true
        hawkular-metrics.openshift.auth-methods = openshift-oauth,htpasswd
        hawkular-metrics.openshift.htpasswd-file = /secrets/hawkular-metrics.htpasswd.file
        hawkular.metrics.allowed-cors-access-control-allow-headers = authorization
        hawkular.metrics.default-ttl = 7
        java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
        java.awt.headless = true
        java.awt.printerjob = sun.print.PSPrinterJob
        java.class.path = /opt/eap/jboss-modules.jar:/opt/eap/jolokia.jar
        java.class.version = 52.0
        java.endorsed.dirs = /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/endorsed
        java.ext.dirs = /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/ext:/usr/java/packages/lib/ext
        java.home = /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre
        java.io.tmpdir = /tmp
        java.library.path = /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
        java.net.preferIPv4Stack = true
        java.runtime.name = OpenJDK Runtime Environment
        java.runtime.version = 1.8.0_51-b16
        java.specification.name = Java Platform API Specification
        java.specification.vendor = Oracle Corporation
        java.specification.version = 1.8
        java.util.logging.manager = org.jboss.logmanager.LogManager
        java.vendor = Oracle Corporation
        java.vendor.url = http://java.oracle.com/
        java.vendor.url.bug = http://bugreport.sun.com/bugreport/
        java.version = 1.8.0_51
        java.vm.info = mixed mode
        java.vm.name = OpenJDK 64-Bit Server VM
        java.vm.specification.name = Java Virtual Machine Specification
        java.vm.specification.vendor = Oracle Corporation
        java.vm.specification.version = 1.8
        java.vm.vendor = Oracle Corporation
        java.vm.version = 25.51-b03
        javax.management.builder.initial = org.jboss.as.jmx.PluggableMBeanServerBuilder
        javax.net.ssl.keyStore = /opt/hawkular/auth/hawkular-metrics.keystore
        javax.net.ssl.keyStorePassword = <redacted>
        javax.net.ssl.trustStore = /opt/hawkular/auth/hawkular-metrics.truststore
        javax.net.ssl.trustStorePassword = <redacted>
        javax.xml.datatype.DatatypeFactory = __redirected.__DatatypeFactory
        javax.xml.parsers.DocumentBuilderFactory = __redirected.__DocumentBuilderFactory
        javax.xml.parsers.SAXParserFactory = __redirected.__SAXParserFactory
        javax.xml.stream.XMLEventFactory = __redirected.__XMLEventFactory
        javax.xml.stream.XMLInputFactory = __redirected.__XMLInputFactory
        javax.xml.stream.XMLOutputFactory = __redirected.__XMLOutputFactory
        javax.xml.transform.TransformerFactory = __redirected.__TransformerFactory
        javax.xml.validation.SchemaFactory:http://www.w3.org/2001/XMLSchema = __redirected.__SchemaFactory
        javax.xml.xpath.XPathFactory:http://java.sun.com/jaxp/xpath/dom = __redirected.__XPathFactory
        jboss.bind.address = 0.0.0.0
        jboss.home.dir = /opt/eap
        jboss.host.name = hawkular-metrics-e25n6
        jboss.modules.dir = /opt/eap/modules
        jboss.modules.policy-permissions = true
        jboss.modules.system.pkgs = org.jboss.logmanager
        jboss.node.name = hawkular-metrics-e25n6
        jboss.qualified.host.name = hawkular-metrics-e25n6
        jboss.server.base.dir = /opt/eap/standalone
        jboss.server.config.dir = /opt/eap/standalone/configuration
        jboss.server.data.dir = /opt/eap/standalone/data
        jboss.server.deploy.dir = /opt/eap/standalone/data/content
        jboss.server.log.dir = /opt/eap/standalone/log
        jboss.server.name = hawkular-metrics-e25n6
        jboss.server.persist.config = true
        jboss.server.temp.dir = /opt/eap/standalone/tmp
        jolokia.agent = http://127.0.0.1:8778/jolokia/
        line.separator =

        logging.configuration = file:/opt/eap/standalone/configuration/logging.properties
        module.path = /opt/eap/modules
        org.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH = true
        org.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH = true
        org.jboss.boot.log.file = /opt/eap/standalone/log/server.log
        org.jboss.resolver.warning = true
        org.xml.sax.driver = __redirected.__XMLReaderFactory
        os.arch = amd64
        os.name = Linux
        os.version = 3.10.0-327.el7.x86_64
        path.separator = :
        sun.arch.data.model = 64
        sun.boot.class.path = /opt/eap/jboss-modules.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-1.5.4.Final-redhat-1.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/javax.json-1.0.4.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/jboss-logmanager-ext-1.0.0.Alpha2-redhat-1.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/resources.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/rt.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/jsse.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/jce.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/charsets.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/jfr.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/classes
        sun.boot.library.path = /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.51-1.b16.el7_1.x86_64/jre/lib/amd64
        sun.cpu.endian = little
        sun.cpu.isalist =
        sun.io.unicode.encoding = UnicodeLittle
        sun.java.command = /opt/eap/jboss-modules.jar -mp /opt/eap/modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.standalone -Djboss.home.dir=/opt/eap -Djboss.server.base.dir=/opt/eap/standalone -Djavax.net.ssl.keyStore=/opt/hawkular/auth/hawkular-metrics.keystore -Djavax.net.ssl.keyStorePassword=etOUcDIVRDJTKlB -Djavax.net.ssl.trustStore=/opt/hawkular/auth/hawkular-metrics.truststore -Djavax.net.ssl.trustStorePassword=ecyzAhganunw8ue -b 0.0.0.0 -Dhawkular-metrics.cassandra-nodes=hawkular-cassandra -Dhawkular-metrics.cassandra-use-ssl -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true -Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true -Dhawkular-metrics.openshift.auth-methods=openshift-oauth,htpasswd -Dhawkular-metrics.openshift.htpasswd-file=/secrets/hawkular-metrics.htpasswd.file -Dhawkular.metrics.allowed-cors-access-control-allow-headers=authorization -Dhawkular.metrics.default-ttl=7 -DKUBERNETES_MASTER_URL=https://kubernetes.default.svc:443
        sun.java.launcher = SUN_STANDARD
        sun.jnu.encoding = ANSI_X3.4-1968
        sun.management.compiler = HotSpot 64-Bit Tiered Compilers
        sun.os.patch.level = unknown
        user.country = US
        user.dir = /home/jboss
        user.home = ?
        user.language = en
        user.name = ?
        user.timezone = America/New_York
06:10:25,693 DEBUG [org.jboss.as.config] (MSC service thread 1-2) VM Arguments: -D[Standalone] -XX:+UseCompressedOops -verbose:gc -Xloggc:/opt/eap/standalone/log/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading -Xms1303m -Xmx1303m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.logmanager -Djava.awt.headless=true -Djboss.modules.policy-permissions=true -Xbootclasspath/p:/opt/eap/jboss-modules.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-1.5.4.Final-redhat-1.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/javax.json-1.0.4.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/jboss-logmanager-ext-1.0.0.Alpha2-redhat-1.jar -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/opt/eap/jolokia.jar=port=8778,host=127.0.0.1,discoveryEnabled=false -Dorg.jboss.boot.log.file=/opt/eap/standalone/log/server.log -Dlogging.configuration=file:/opt/eap/standalone/configuration/logging.properties
06:10:27,294 INFO  [org.xnio] (MSC service thread 1-4) XNIO Version 3.0.14.GA-redhat-1
06:10:27,310 INFO  [org.jboss.as.server] (Controller Boot Thread) JBAS015888: Creating http management service using socket-binding (management-http)
06:10:27,317 INFO  [org.xnio.nio] (MSC service thread 1-4) XNIO NIO Implementation Version 3.0.14.GA-redhat-1
06:10:27,342 INFO  [org.jboss.remoting] (MSC service thread 1-4) JBoss Remoting version 3.3.5.Final-redhat-1


Expected results:

The pod should enter a running or failed state.

Additional info:
I have tried restarting all nodes and masters but this pod is still stuck in pending.  Other pods on the system appear to be working as expected (e.g. hello-openshift).

Can you help me debug why this pod never makes it out of pending?

Regards,
Dave

Comment 1 Dave McCormick 2015-12-24 11:50:42 UTC
I notice that the describe output is missing both the container ID and the pod IP.
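
A rough sketch of how that check could be automated. The function name `empty_pod_fields` is hypothetical, and it assumes the "Key: value" layout of the `oc describe pod` output in the description. It only looks at the IP, Container ID, and Image ID keys.

```shell
# Report which of the IP, Container ID and Image ID fields are empty in
# `oc describe pod` output read from stdin. A field counts as empty when its
# key is followed by nothing but spaces or tabs.
empty_pod_fields() {
    awk '
        /^[ \t]*(IP|Container ID|Image ID):[ \t]*$/ {
            sub(/^[ \t]+/, "")    # drop leading indentation
            sub(/:[ \t]*$/, "")   # drop the trailing colon and whitespace
            print
        }'
}

# Possible usage:
#   oc describe pod hawkular-metrics-e25n6 | empty_pod_fields
```

Against the describe output in the description, this would list all three fields, even though docker reports the container as running on the node.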

Comment 2 Ryan Howe 2015-12-30 23:35:36 UTC
I ran into the same issue, but with the default certificates. I was able to resolve it with a workaround. I am not sure of the cause of this error, but I have reproduced it 100% of the time when following our documentation, in both HA and non-HA environments.

https://docs.openshift.com/enterprise/3.1/install_config/cluster_metrics.html



I was able to get the container to kick off and run by creating the hawkular-metrics pod myself using the template that was created by the deployer. 

Steps (run the following): 


# oc delete rc hawkular-metrics
# oc delete service hawkular-metrics

# oc get templates
# oc process hawkular-metrics -v "IMAGE_PREFIX=registry.access.redhat.com/openshift3/,IMAGE_VERSION=3.1.0,METRIC_DURATION=7,MASTER_URL=https://<#MASTER API HOSTNAME#>:8443" | oc create -f -

After running this the hawkular-metrics pod came up successfully. 


I was also having issues with the heapster pod afterwards, and got everything running by following the same steps with the heapster template that the deployer created.

# oc delete rc heapster
# oc delete service heapster

# oc get template 
# oc process hawkular-heapster -v "IMAGE_PREFIX=registry.access.redhat.com/openshift3/,IMAGE_VERSION=3.1.0,MASTER_URL=https://<#MASTER API HOSTNAME#>:8443" | oc create -f -
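
The two sets of steps above follow one pattern, sketched below as a helper that only prints the commands so they can be reviewed before running. The function name `print_recreate_commands` and the `MASTER_HOST` value are placeholders, not part of the deployer. The template names and parameters are the ones used in this comment.

```shell
# Hypothetical helper: print (do not run) the delete-and-recreate commands for
# one metrics component.
#   $1 = rc/service name, $2 = template name, $3 = comma-separated parameters
print_recreate_commands() {
    printf 'oc delete rc %s\n' "$1"
    printf 'oc delete service %s\n' "$1"
    printf 'oc process %s -v "%s" | oc create -f -\n' "$2" "$3"
}

# MASTER_HOST is a placeholder; substitute the master API hostname.
MASTER_HOST="master.example.com"
PREFIX="registry.access.redhat.com/openshift3/"

print_recreate_commands hawkular-metrics hawkular-metrics \
    "IMAGE_PREFIX=${PREFIX},IMAGE_VERSION=3.1.0,METRIC_DURATION=7,MASTER_URL=https://${MASTER_HOST}:8443"
print_recreate_commands heapster hawkular-heapster \
    "IMAGE_PREFIX=${PREFIX},IMAGE_VERSION=3.1.0,MASTER_URL=https://${MASTER_HOST}:8443"
```

Printing rather than executing keeps the destructive `oc delete` calls visible. Once checked, the output could be pasted into a shell logged in to the metrics project.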

Comment 3 Dave McCormick 2016-01-04 18:00:19 UTC
Hi

Thanks for the updates - apologies that my response has been slow due to the holiday break.
I'm still concerned that the system gives no appropriate response when this pod fails: it should either come up or report a failure. Interestingly, the pod also sits in the 'Terminating' state indefinitely when I try to delete it. The pod launch process needs to be extremely robust, and this does not feel robust; I worry about what will happen when many users are launching their own pods. Could the liveness probe be the cause?

Trying the workaround...

oc process hawkular-metrics -v "IMAGE_PREFIX=registry.access.redhat.com/openshift3/,IMAGE_VERSION=3.1.0,METRIC_DURATION=7,MASTER_URL=https://osemaster.dev.iggroup.local:443" | oc create -f
{
    "kind": "List",
    "apiVersion": "v1",
    "metadata": {},
    "items": [
        {
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {
                "labels": {
                    "metrics-infra": "hawkular-metrics",
                    "name": "hawkular-metrics"
                },
                "name": "hawkular-metrics"
            },
            "spec": {
                "ports": [
                    {
                        "name": "https-endpoint",
                        "port": 443,
                        "targetPort": "https-endpoint"
                    }
                ],
                "selector": {
                    "name": "hawkular-metrics"
                }
            }
        },
        {
            "apiVersion": "v1",
            "kind": "ReplicationController",
            "metadata": {
                "labels": {
                    "metrics-infra": "hawkular-metrics",
                    "name": "hawkular-metrics"
                },
                "name": "hawkular-metrics"
            },
            "spec": {
                "replicas": 1,
                "selector": {
                    "name": "hawkular-metrics"
                },
                "template": {
                    "metadata": {
                        "labels": {
                            "metrics-infra": "hawkular-metrics",
                            "name": "hawkular-metrics"
                        }
                    },
                    "spec": {
                        "containers": [
                            {
                                "command": [
                                    "/opt/hawkular/scripts/hawkular-metrics-wrapper.sh",
                                    "-b",
                                    "0.0.0.0",
                                    "-Dhawkular-metrics.cassandra-nodes=hawkular-cassandra",
                                    "-Dhawkular-metrics.cassandra-use-ssl",
                                    "-Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true",
                                    "-Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true",
                                    "-Dhawkular-metrics.openshift.auth-methods=openshift-oauth,htpasswd",
                                    "-Dhawkular-metrics.openshift.htpasswd-file=/secrets/hawkular-metrics.htpasswd.file",
                                    "-Dhawkular.metrics.allowed-cors-access-control-allow-headers=authorization",
                                    "-Dhawkular.metrics.default-ttl=7",
                                    "-DKUBERNETES_MASTER_URL=https://osemaster.dev.iggroup.local:443",
                                    "--hmw.keystore=/secrets/hawkular-metrics.keystore",
                                    "--hmw.truststore=/secrets/hawkular-metrics.truststore",
                                    "--hmw.keystore_password_file=/secrets/hawkular-metrics.keystore.password",
                                    "--hmw.truststore_password_file=/secrets/hawkular-metrics.truststore.password"
                                ],
                                "env": [
                                    {
                                        "name": "POD_NAMESPACE",
                                        "valueFrom": {
                                            "fieldRef": {
                                                "fieldPath": "metadata.namespace"
                                            }
                                        }
                                    }
                                ],
                                "image": "registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0",
                                "lifecycle": {
                                    "postStart": {
                                        "exec": {
                                            "command": [
                                                "/opt/hawkular/scripts/hawkular-metrics-poststart.py"
                                            ]
                                        }
                                    }
                                },
                                "livenessProbe": {
                                    "exec": {
                                        "command": [
                                            "/opt/hawkular/scripts/hawkular-metrics-liveness.py"
                                        ]
                                    }
                                },
                                "name": "hawkular-metrics",
                                "ports": [
                                    {
                                        "containerPort": 8080,
                                        "name": "http-endpoint"
                                    },
                                    {
                                        "containerPort": 8444,
                                        "name": "https-endpoint"
                                    }
                                ],
                                "volumeMounts": [
                                    {
                                        "mountPath": "/secrets",
                                        "name": "hawkular-metrics-secrets"
                                    }
                                ]
                            }
                        ],
                        "serviceAccount": "hawkular",
                        "volumes": [
                            {
                                "name": "hawkular-metrics-secrets",
                                "secret": {
                                    "secretName": "hawkular-metrics-secrets"
                                }
                            }
                        ]
                    },
                    "version": "v1"
                }
            }
        }
    ]
}


oc describe pod hawkular-metrics-k21v1
Name:                           hawkular-metrics-k21v1
Namespace:                      openshift-infra
Image(s):                       registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0
Node:                           vrdevosnode002.iggroup.local/
Start Time:                     Mon, 04 Jan 2016 17:53:06 +0000
Labels:                         metrics-infra=hawkular-metrics,name=hawkular-metrics
Status:                         Pending
Reason:
Message:
IP:
Replication Controllers:        hawkular-metrics (1/1 replicas created)
Containers:
  hawkular-metrics:
    Container ID:
    Image:              registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0
    Image ID:
    QoS Tier:
      cpu:              BestEffort
      memory:           BestEffort
    State:              Waiting
    Ready:              False
    Restart Count:      0
    Environment Variables:
      POD_NAMESPACE:    openshift-infra (v1:metadata.namespace)
Volumes:
  hawkular-metrics-secrets:
    Type:       Secret (a secret that should populate this volume)
    SecretName: hawkular-metrics-secrets
  hawkular-token-18xqi:
    Type:       Secret (a secret that should populate this volume)
    SecretName: hawkular-token-18xqi
Events:
  FirstSeen     LastSeen        Count   From                                    SubobjectPath                           Reason          Message
  ─────────     ────────        ─────   ────                                    ─────────────                           ──────          ───────
  39s           39s             1       {scheduler }                                                                    Scheduled       Successfully assigned hawkular-metrics-k21v1 to vrdevosnode002.iggroup.local
  19s           19s             1       {kubelet vrdevosnode002.iggroup.local}  implicitly required container POD       Pulled          Container image "openshift3/ose-pod:v3.1.0.4" already present on machine
  18s           18s             1       {kubelet vrdevosnode002.iggroup.local}  implicitly required container POD       Created         Created with docker id 3221a79e6e43
  18s           18s             1       {kubelet vrdevosnode002.iggroup.local}  implicitly required container POD       Started         Started with docker id 3221a79e6e43
  17s           17s             1       {kubelet vrdevosnode002.iggroup.local}  spec.containers{hawkular-metrics}       Pulled          Container image "registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.1.0" already present on machine
  17s           17s             1       {kubelet vrdevosnode002.iggroup.local}  spec.containers{hawkular-metrics}       Created         Created with docker id c20167e0a238
  16s           16s             1       {kubelet vrdevosnode002.iggroup.local}  spec.containers{hawkular-metrics}       Started         Started with docker id c20167e0a238

Still looks to be stuck in 'Pending' (note the old instances which have still not been terminated and cleaned up - which just feels wrong/broken)...

oc get pods
NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-jthxm   1/1       Running            0          11d
hawkular-metrics-e25n6       0/1       Terminating        0          11d
hawkular-metrics-j0ub6       0/1       Terminating        0          11d
hawkular-metrics-k21v1       0/1       Pending            0          4m
heapster-xp7o3               0/1       CrashLoopBackOff   3159       11d

Is there something fundamental going wrong here?

regards


Dave

Comment 4 Dave McCormick 2016-01-05 10:30:02 UTC
The problem would seem to be with the lifecycle post start command...

"lifecycle": {
  "postStart": {
    "exec": {
      "command": [
        "/opt/hawkular/scripts/hawkular-metrics-poststart.py"
      ]
    }
  }
},

When this is removed the container reaches the running state although it is restarting like crazy...

oc get pods
NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-jthxm   1/1       Running            0          12d
hawkular-metrics-40q9n       0/1       Terminating        0          10m
hawkular-metrics-e25n6       0/1       Terminating        0          11d
hawkular-metrics-j0ub6       0/1       Terminating        0          12d
hawkular-metrics-k21v1       0/1       Terminating        0          16h
hawkular-metrics-sp646       1/1       Running            20         3m
heapster-xp7o3               0/1       CrashLoopBackOff   3350       12d

NOTE: All the old terminated pods are STILL listed.  The liveness check is still there...

oc get pod hawkular-metrics-sp646 -o yaml
apiVersion: v1
kind: Pod
...
    livenessProbe:
      exec:
        command:
        - /opt/hawkular/scripts/hawkular-metrics-liveness.py
      timeoutSeconds: 1

The logs seem to indicate that the process is being killed each time it starts up: -

05:28:30,620 INFO  [org.jboss.as.server] (Controller Boot Thread) JBAS015888: Creating http management service using socket-binding (management-http)
05:28:30,627 INFO  [org.xnio.nio] (MSC service thread 1-2) XNIO NIO Implementation Version 3.0.14.GA-redhat-1
05:28:30,653 INFO  [org.jboss.remoting] (MSC service thread 1-2) JBoss Remoting version 3.3.5.Final-redhat-1
*** JBossAS process (170) received TERM signal ***
*** JBossAS process (170) received TERM signal ***

Comment 5 Dave McCormick 2016-01-05 11:25:09 UTC
Hi

From the Kubernetes Container Environment documentation http://kubernetes.io/v1.1/docs/user-guide/container-environment.html#container-hooks: -

PostStart

This hook is sent immediately after a container is created.  It notifies the container that it has been created.  No parameters are passed to the handler.

The postStart script for hawkular-metrics looks designed to run in a tight loop, ignoring 404 and 503 status codes, until http://localhost:8080/hawkular/metrics/status returns JSON containing MetricsService == "STARTED" (or to fail if it returns anything other than "STARTING").

The code: - 

import os
import json
import urllib2
import time

hawkularEndpointPort = os.environ.get("HAWKULAR_METRICS_ENDPOINT_PORT")

statusURL = "http://localhost:" + hawkularEndpointPort  + "/hawkular/metrics/status"

while True:
  try:
    response = urllib2.urlopen(statusURL)
    statusCode = response.getcode();
    if (statusCode == 200 or statusCode == 404 or statusCode == 503):
      if (statusCode == 200):
        jsonResponse = json.loads(response.read())
        if (jsonResponse["MetricsService"] == "STARTED"):
          exit(0)
        # If the status is not STARTED or STARTING then exit, something went wrong
        elif (jsonResponse["MetricsService"] != "STARTING"):
          exit(1)

    else:
      exit(1)

  except Exception:
    print "An Exception occured trying to connect to the endpoint."

  #sleep for 1 second and let the loop try over again
  time.sleep(1)

In the case of the metrics in my issue, the status page is stuck in the "STARTING" state...

curl -v -v -v http://localhost:8080/hawkular/metrics/status
* About to connect() to localhost port 8080 (#0)
*   Trying ::1...
* Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /hawkular/metrics/status HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< Content-Type: application/json
< Transfer-Encoding: chunked
< Date: Tue, 05 Jan 2016 11:10:03 GMT
<
* Connection #0 to host localhost left intact
{"MetricsService":"STARTING","Implementation-Version":"0.8.0.Final-redhat-1","Built-From-Git-SHA1":"826f08dd34912ad455a4cb2b34f2e79cd79ace9a"}

This means that the postStart lifecycle hook is stuck in a loop waiting for a STARTED state which never arrives. I'm going to guess that this is what leaves the pod stuck in Pending, and not terminating, because it is still waiting for the postStart hook to finish.  The Kubernetes documentation does say "Typically we expect that users will make their hook handlers as light as possible, but there are cases where long running commands make sense." - going into an endless loop doesn't seem that light to me, and it leads to a really confusing set of issues to debug (i.e. containers running in docker but stuck in Pending in Kubernetes).

I'm a little confused by the use of the postStart hook AND liveness check - the postStart seems to be acting as some sort of liveness check that runs only once. I guess the root cause is that the application gets stuck in STARTING and so the postStart hook never finishes - perhaps it would be pertinent to put a 5 minute timeout around the postStart loop and then fail?  This would make the hook more robust.  That is, of course, dependent on whether the postStart hook is the best/right place to do application liveness checks?
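
A minimal sketch of the 5 minute timeout suggested above, written in Python 3 rather than the Python 2 of the shipped script. The URL is taken from the curl call above; the function names, the injectable fetch/clock/sleep parameters and the 300 second default are illustrative assumptions, not the actual fix:

```python
import json
import time
import urllib.error
import urllib.request

# Taken from the curl call above; the real script builds it from an env variable.
STATUS_URL = "http://localhost:8080/hawkular/metrics/status"


def fetch_status(url=STATUS_URL):
    """Return the MetricsService state, or None if the endpoint is not answering yet."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            body = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, 404/503 (HTTPError is a URLError) or unparsable JSON.
        return None
    return body.get("MetricsService") if isinstance(body, dict) else None


def wait_for_started(fetch=fetch_status, timeout_s=300, interval_s=1,
                     clock=time.monotonic, sleep=time.sleep):
    """Return 0 once STARTED; return 1 on an unexpected state or when the deadline passes."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        state = fetch()
        if state == "STARTED":
            return 0
        if state not in (None, "STARTING"):
            return 1  # neither STARTING nor STARTED: something went wrong
        sleep(interval_s)
    return 1  # still STARTING (or unreachable) after timeout_s seconds
```

A hook script built on this would end with something like sys.exit(wait_for_started()), so that a service stuck in STARTING makes the hook fail after five minutes instead of looping forever.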

regards




Dave

Comment 6 Dave McCormick 2016-01-05 12:17:51 UTC
Hi

I am unsure why Red Hat chose to use a postStart hook script rather than a liveness or readiness probe.  The liveness probe looks like it could do with an initial delay value, though, to give the application time to start up:

...
    livenessProbe:
      exec:
        command:
        - /opt/hawkular/scripts/hawkular-metrics-liveness.py
      timeoutSeconds: 1
      initialDelaySeconds: 120

Now that the strange pending behaviour is understood - what about the actual issue?  The issue would now seem to be that, when deploying hawkular-metrics with custom certs, the JBoss application is unable to start up.  Can you help me investigate this issue?

regards


Dave

Comment 7 Dave McCormick 2016-01-06 13:02:25 UTC
Using the workaround (and finding the hawkular metrics logs), it looks as though the metrics application is failing to connect to Cassandra...


07:54:28,337 WARN  [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200003: Could not connect to Cassandra cluster - assuming its not up yet: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.123.91:9042 (com.datastax.driver.core.TransportException: [hawkular-cassandra/172.30.123.91:9042] Channel has been closed))
07:54:28,338 WARN  [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200004: [10] Retrying connecting to Cassandra cluster in [2]s...
07:54:30,338 INFO  [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200002: Initializing metrics service
07:54:30,369 WARN  [io.netty.util.concurrent.DefaultPromise] (cluster11-nio-worker-0) An exception was thrown by com.datastax.driver.core.Connection$9.operationComplete(): java.util.concurrent.RejectedExecutionException: Task com.datastax.driver.core.Connection$9$1@3b6df991 rejected from java.util.concurrent.ThreadPoolExecutor@772ca0e7[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) [rt.jar:1.8.0_51]
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) [rt.jar:1.8.0_51]
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) [rt.jar:1.8.0_51]
        at com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:484) [guava-16.0.1.redhat-3.jar:16.0.1.redhat-3]
        at com.datastax.driver.core.Connection$9.operationComplete(Connection.java:566) [cassandra-driver-core-2.2.0.rc2-redhat-2.jar:2.2.0.rc2-redhat-2]
        at com.datastax.driver.core.Connection$9.operationComplete(Connection.java:542) [cassandra-driver-core-2.2.0.rc2-redhat-2.jar:2.2.0.rc2-redhat-2]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.channel.PendingWriteQueue.safeFail(PendingWriteQueue.java:252) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.channel.PendingWriteQueue.removeAndFailAll(PendingWriteQueue.java:112) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1256) [netty-handler-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:676) [netty-handler-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:829) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:610) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_51]

07:54:30,373 WARN  [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200003: Could not connect to Cassandra cluster - assuming its not up yet: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.123.91:9042 (com.datastax.driver.core.TransportException: [hawkular-cassandra/172.30.123.91:9042] Channel has been closed))
07:54:30,373 WARN  [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200004: [11] Retrying connecting to Cassandra cluster in [3]s...
07:54:33,373 INFO  [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200002: Initializing metrics service
07:54:33,408 WARN  [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200003: Could not connect to Cassandra cluster - assuming its not up yet: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.123.91:9042 (com.datastax.driver.core.TransportException: [hawkular-cassandra/172.30.123.91:9042] Channel has been closed))

The cassandra service is available...

oc get svc
NAME                       CLUSTER_IP       EXTERNAL_IP   PORT(S)                               SELECTOR                  AGE
hawkular-cassandra         172.30.123.91    <none>        9042/TCP,9160/TCP,7000/TCP,7001/TCP   type=hawkular-cassandra   7m
hawkular-cassandra-nodes   None             <none>        9042/TCP,9160/TCP,7000/TCP,7001/TCP   type=hawkular-cassandra   7m
hawkular-metrics           172.30.102.231   <none>        443/TCP                               name=hawkular-metrics     5m

It looks as though a certificate has been provisioned on the cassandra service...

curl https://172.30.123.91:9042 -v -v -v
* About to connect() to 172.30.123.91 port 9042 (#0)
*   Trying 172.30.123.91...
* Connected to 172.30.123.91 (172.30.123.91) port 9042 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* Server certificate:
*       subject: CN=hawkular-cassandra
*       start date: Dec 23 18:04:20 2015 GMT
*       expire date: Dec 22 18:04:21 2017 GMT
*       common name: hawkular-cassandra
*       issuer: CN=metrics-signer@1450893859
* NSS error -8172 (SEC_ERROR_UNTRUSTED_ISSUER)
* Peer's certificate issuer has been marked as not trusted by the user.
* Closing connection 0

Could the issue now be that hawkular-metrics service needs to trust the metrics signer cert and I have replaced the cert with my own (which hasn't signed cassandra)?

regards



Dave

Comment 8 Matt Wringe 2016-01-06 14:38:26 UTC
Hawkular Metrics container uses the postStart hook to wait until it can connect to the Cassandra instance. Unfortunately, OpenShift cannot stop a pending container.

This will be updated shortly so that the postStart script will fail after a timeout. The way it's currently working is a bit confusing.

postStart hooks check if something is up and running yet, while the livenessProbe checks if something is still running. These are two different situations, which is why we have two scripts to handle them.

If Hawkular Metrics cannot communicate with the Cassandra instance, then it will not function, regardless of whether the postStart hook or livenessProbe exists.

So looking at your logs, it looks like a certificate problem. How exactly did you add your custom certificate for Hawkular Metrics?

From https://github.com/openshift/origin-metrics/blob/master/docs/deployer_configuration.adoc#deployer-secrets you will need to specify the hawkular-metrics.pem and (optionally, if using self-signed certificates) the hawkular-metrics-ca.cert.

Comment 9 Dave McCormick 2016-01-06 15:18:02 UTC
Hi

Thanks for the reply - maybe it should be a readiness check rather than a postStart hook?  Readiness checks seem better suited to keeping a pod out of service until it has started up correctly.
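
To illustrate what a readiness check could look like, here is a rough Python 3 sketch of a single-shot check against the same status endpoint. The function names and the 1 second timeout are assumptions for illustration only; this is not a script that ships in the image:

```python
import json
import urllib.error
import urllib.request


def is_ready(body):
    """True only when the status document reports MetricsService == STARTED."""
    try:
        doc = json.loads(body)
    except ValueError:
        return False
    return isinstance(doc, dict) and doc.get("MetricsService") == "STARTED"


def probe(url="http://localhost:8080/hawkular/metrics/status", timeout_s=1):
    """Single check with no loop: 0 means ready, 1 means not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return 0 if is_ready(response.read().decode("utf-8")) else 1
    except (urllib.error.URLError, OSError, ValueError):
        # Endpoint not reachable or not answering 200 yet.
        return 1
```

Because the check returns straight away, the platform decides how often to retry, and a service stuck in STARTING is simply reported as not ready rather than blocking the container start.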

These are the actions I performed in setting up the metrics: -

oc project openshift-infra
oc create -f - <<API
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-deployer
secrets:
- name: metrics-deployer
API
 
oadm policy add-role-to-user edit system:serviceaccount:openshift-infra:metrics-deployer
oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:heapster

Create the file /etc/pki/hawkular-metrics.paas.dev.iggroup.local.pem: -
-----BEGIN PRIVATE KEY-----
MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQCdMWtGH2vyJgBF
...
Vwipu6DJ+Mc0czjfE0UGaAPCrDXfBhKec5/lfvmdlq9qdmrXqYSD3pdMUNtHn5q6
Xl/YSEkWx4NCCekMnXxPOFo=
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
MIIFpjCCBI6gAwIBAgIRAJ1ixZFA4cGV3Lh7xpHMW5MwDQYJKoZIhvcNAQELBQAw
...
22inbqFAm8f1pTFbrN2NjXfFMmFcSa8VQCOCHE6/J760yzB1yQ4gV9Ajtu17cxoE
0jmVsBGZJDFUQA==
-----END CERTIFICATE-----

Create the ca.crt /etc/pki/ig-private-ca.crt: -
-----BEGIN CERTIFICATE-----
MIIDrDCCApSgAwIBAgIQKJ0LDUmkrM1sKAeJDIn+dzANBgkqhkiG9w0BAQsFADBw
...
CLaB9Z9PC77jivCYwKL9ubEMeBKsWr0fsMFHj76aWBTCRhIXEEv5t84N/z1IELv3
8bHY0kArqvvfmIWu9Y/PO+iwUJm5ouCIQD7uHPAZpLY=
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIEqTCCA5GgAwIBAgIQFRhu0C2GQlB4OgRCKNGwnDANBgkqhkiG9w0BAQsFADBw
...
A5gy7PiXtic81oAT1NFHfOdeortsPtBN+sEQfGEoA8bnlk1VazPj6jScwJyE
-----END CERTIFICATE-----

Create the cert secrets
oc secrets new metrics-deployer hawkular-metrics.pem=/etc/pki/hawkular-metrics.paas.dev.iggroup.local.pem hawkular-metrics-ca.cert=/etc/pki/ig-private-ca.crt

/root/openshift-ansible-openshift-ansible-3.0.20-1/roles/openshift_examples/files/examples/v1.1/infrastructure-templates/enterprise/metrics-deployer.yaml

oc process -f metrics-deployer.yaml -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.paas.dev.iggroup.local,USE_PERSISTENT_STORAGE=false,REDEPLOY=true | oc create -f -

Configure each master (vrdevosmaster001, 002 and 003) to access it
In master-config.yaml: -
assetConfig:
  ...
  metricsPublicURL: "https://hawkular-metrics.paas.dev.iggroup.local/hawkular/metrics"

-------------------------- DONE

As you can see, I have created a cert and CA certificate, and the cert is correctly deployed onto the metrics application - i.e. when I browse to hawkular-metrics.paas.dev.iggroup.local I get my certificate correctly.  Setting the cert and CA secrets doesn't seem to have affected the generation of internal certs for Cassandra.  Could it be that, in adding my certs to the metrics application, it is no longer getting the metrics signer cert added to its truststore?  Could my CA be replacing the CA which was used to generate the Cassandra cert?

I'm not an expert in JBoss - could I look in the truststore somehow in order to check what CA certs are trusted?

Have other people deployed successfully with custom certs?

regards



Dave

Comment 10 Dave McCormick 2016-01-06 18:04:30 UTC
Additional Info.

The contents of truststore /secrets/hawkular-metrics.truststore: -

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 4 entries

Alias name: ca
Creation date: Dec 23, 2015
Entry type: trustedCertEntry

Owner: CN=metrics-signer@1450893859
Issuer: CN=metrics-signer@1450893859
Serial number: 1
Valid from: Wed Dec 23 13:04:20 EST 2015 until: Mon Dec 21 13:04:21 EST 2020
Certificate fingerprints:
         MD5:  C0:9B:F5:42:4E:D7:E4:C6:39:A1:09:25:A8:A1:A1:11
         SHA1: DB:0D:86:3A:5E:A6:93:36:D0:87:54:25:7A:B1:D2:CD:32:0F:52:84
         SHA256: 51:FF:8F:CF:2E:82:1D:40:1E:C4:7F:9D:51:76:1B:95:A2:D4:C2:2D:65:7A:84:47:FE:07:AB:CA:D9:FB:2B:86
         Signature algorithm name: SHA256withRSA
         Version: 3

Extensions:

#1: ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
  CA:true
  PathLen:2147483647
]

#2: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
  DigitalSignature
  Key_Encipherment
  Key_CertSign
]



*******************************************
*******************************************


Alias name: hawkular-cassandra
Creation date: Dec 23, 2015
Entry type: trustedCertEntry

Owner: CN=hawkular-cassandra
Issuer: CN=metrics-signer@1450893859
Serial number: 2
Valid from: Wed Dec 23 13:04:20 EST 2015 until: Fri Dec 22 13:04:21 EST 2017
Certificate fingerprints:
         MD5:  10:36:98:3F:C4:30:5E:B0:FE:36:D9:B2:74:0B:61:21
         SHA1: 4C:04:DE:0A:F5:F0:8C:4B:5A:9B:54:DA:E6:8F:19:F5:C5:9A:CE:6A
         SHA256: E7:67:13:EA:62:5E:58:75:E7:7D:F4:26:83:65:35:69:5B:0A:2E:53:F6:43:40:BF:0A:04:D4:EA:40:A2:0D:A5
         Signature algorithm name: SHA256withRSA
         Version: 3

Extensions:

#1: ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
  CA:false
  PathLen: undefined
]

#2: ObjectId: 2.5.29.37 Criticality=false
ExtendedKeyUsages [
  serverAuth
]

#3: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
  DigitalSignature
  Key_Encipherment
]

#4: ObjectId: 2.5.29.17 Criticality=false
SubjectAlternativeName [
  DNSName: hawkular-cassandra
]



*******************************************
*******************************************


Alias name: cassandraca
Creation date: Dec 23, 2015
Entry type: trustedCertEntry

Owner: CN=metrics-signer@1450893859
Issuer: CN=metrics-signer@1450893859
Serial number: 1
Valid from: Wed Dec 23 13:04:20 EST 2015 until: Mon Dec 21 13:04:21 EST 2020
Certificate fingerprints:
         MD5:  C0:9B:F5:42:4E:D7:E4:C6:39:A1:09:25:A8:A1:A1:11
         SHA1: DB:0D:86:3A:5E:A6:93:36:D0:87:54:25:7A:B1:D2:CD:32:0F:52:84
         SHA256: 51:FF:8F:CF:2E:82:1D:40:1E:C4:7F:9D:51:76:1B:95:A2:D4:C2:2D:65:7A:84:47:FE:07:AB:CA:D9:FB:2B:86
         Signature algorithm name: SHA256withRSA
         Version: 3

Extensions:

#1: ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
  CA:true
  PathLen:2147483647
]

#2: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
  DigitalSignature
  Key_Encipherment
  Key_CertSign
]



*******************************************
*******************************************


Alias name: metricsca
Creation date: Dec 23, 2015
Entry type: trustedCertEntry

Owner: CN=IG Index Private CA, O=IG Index Limited, L=London, ST=Greater London, C=GB
Issuer: CN=IG Index Private CA, O=IG Index Limited, L=London, ST=Greater London, C=GB
Serial number: 289d0b0d49a4accd6c2807890c89fe77
Valid from: Mon Aug 10 20:00:00 EDT 2015 until: Tue Dec 31 18:59:59 EST 2030
Certificate fingerprints:
         MD5:  C6:2E:52:72:E3:B5:A6:76:E2:FE:6C:45:99:B2:F3:84
         SHA1: AB:D5:E2:0F:81:90:6F:5B:8A:55:7F:74:67:D8:F7:7E:79:F3:69:16
         SHA256: AC:6E:7F:64:DB:8F:4D:FF:27:E0:64:37:8D:5A:3B:64:2B:6E:30:4A:15:FB:71:FD:36:C4:54:9F:82:9E:5E:38
         Signature algorithm name: SHA256withRSA
         Version: 3

Extensions:

#1: ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
  CA:true
  PathLen:2147483647
]

#2: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
  DigitalSignature
  Key_CertSign
  Crl_Sign
]

#3: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: B8 4D CE EC DA 6C 20 92   A6 1A 59 A2 8D 17 56 DC  .M...l ...Y...V.
0010: 75 BF FB D9                                        u...
]
]



*******************************************
*******************************************

The contents of /secrets/hawkular-metrics.keystore :-

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 1 entry

Alias name: hawkular-metrics
Creation date: Dec 23, 2015
Entry type: PrivateKeyEntry
Certificate chain length: 1
Certificate[1]:
Owner: CN=hawkular-metrics.paas.dev.iggroup.local, OU=Private Unified Communications, OU=Hosted by IG Index Limited, OU=IT, O=IG Group Limited, STREET=Cannon Bridge House, STREET=25 Dowgate Hill, L=London, ST=UK, OID.2.5.4.17=EC4R 2YA, C=GB
Issuer: CN=IG Index Private Server CA, O=IG Index Limited, L=London, ST=Greater London, C=GB
Serial number: 9d62c59140e1c195dcb87bc691cc5b93
Valid from: Tue Dec 22 19:00:00 EST 2015 until: Sat Dec 22 18:59:59 EST 2018
Certificate fingerprints:
         MD5:  3E:09:78:38:46:DF:EE:4B:62:A6:E2:32:43:CE:11:4B
         SHA1: 33:DF:3A:0D:BC:1C:4A:76:61:AB:78:05:F1:02:C7:B9:FF:96:18:64
         SHA256: E9:7E:02:D2:2B:BA:4A:0B:18:73:D9:5C:59:FB:EA:21:F6:21:60:13:6A:9B:77:B8:A6:88:E1:D3:CC:3E:44:B4
         Signature algorithm name: SHA256withRSA
         Version: 3

Extensions:

#1: ObjectId: 1.3.6.1.5.5.7.1.1 Criticality=false
AuthorityInfoAccess [
  [
   accessMethod: caIssuers
   accessLocation: URIName: http://crt.comodoca.com/IGIndexPrivateServerCA.crt
,
   accessMethod: ocsp
   accessLocation: URIName: http://ocsp.comodoca.com
]
]

#2: ObjectId: 2.5.29.35 Criticality=false
AuthorityKeyIdentifier [
KeyIdentifier [
0000: A5 52 C0 3A C7 00 F7 9A   3E 7F 10 34 D0 B8 63 68  .R.:....>..4..ch
0010: B0 36 32 92                                        .62.
]
]

#3: ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
  CA:false
  PathLen: undefined
]

#4: ObjectId: 2.5.29.31 Criticality=false
CRLDistributionPoints [
  [DistributionPoint:
     [URIName: http://crl.comodoca.com/IGIndexPrivateServerCA.crl]
]]

#5: ObjectId: 2.5.29.37 Criticality=false
ExtendedKeyUsages [
  serverAuth
  clientAuth
]

#6: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
  DigitalSignature
  Key_Encipherment
]

#7: ObjectId: 2.5.29.17 Criticality=false
SubjectAlternativeName [
  DNSName: hawkular-metrics.paas.dev.iggroup.local
  DNSName: hawkular-metrics
]

#8: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: A7 0B C6 3C EA 5B C7 69   29 31 9F 73 0D 91 81 3D  ...<.[.i)1.s...=
0010: 03 D2 FE F9                                        ....
]
]



*******************************************
*******************************************

Comment 11 Dave McCormick 2016-01-06 18:09:10 UTC
See comment 10

Hmm so the metrics signer cert I see on the cassandra node..

curl -v -v -v https://172.30.123.91:9042
* About to connect() to 172.30.123.91 port 9042 (#0)
*   Trying 172.30.123.91...
* Connected to 172.30.123.91 (172.30.123.91) port 9042 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* Server certificate:
*       subject: CN=hawkular-cassandra
*       start date: Dec 23 18:04:20 2015 GMT
*       expire date: Dec 22 18:04:21 2017 GMT
*       common name: hawkular-cassandra
*       issuer: CN=metrics-signer@1450893859
* NSS error -8172 (SEC_ERROR_UNTRUSTED_ISSUER)
* Peer's certificate issuer has been marked as not trusted by the user.
* Closing connection 0

It does look to be present in the truststore ... so perhaps it is not a straightforward connection issue from the client connecting to the node.  Does cassandra expect the application to connect with a client certificate?

regards



Dave

Comment 12 Matt Wringe 2016-01-06 18:40:33 UTC
If you specify your custom Hawkular Metrics certificates and CA certificate to the deployer, the deployer should take care of managing all the keystores and truststores based on that information. This includes such things as configuring Cassandra to trust the client certificate coming from Hawkular Metrics.

Comment 13 Dave McCormick 2016-01-07 12:36:01 UTC
Hi

Yes, I'm having issues using the deployer as instructed, unless I am doing it wrong - can you see anything in the steps I posted?

Started to look on the cassandra pod side of things.  I can see that cassandra IS expecting client cert authentication - from the cassandra.yaml: -

# enable or disable client/server encryption.
client_encryption_options:
    enabled: true
    keystore: /secret/cassandra.keystore
    keystore_password: 8UR-bJU3_-4kVmm
    require_client_auth: true
    truststore: /secret/cassandra.truststore
    truststore_password: jd24gbI75gPyMgG

So the client is expected to authenticate to cassandra with a cert (my question in comment 11).
The client connects to cassandra so looking at the truststore: -

keytool -keystore cassandra.truststore -storepass jd24gbI75gPyMgG -l>

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 5 entries

Alias name: ca
Creation date: Dec 23, 2015
Entry type: trustedCertEntry

Owner: CN=metrics-signer@1450893859
Issuer: CN=metrics-signer@1450893859
Serial number: 1
Valid from: Wed Dec 23 13:04:20 EST 2015 until: Mon Dec 21 13:04:21 EST 2020
Certificate fingerprints:
         MD5:  C0:9B:F5:42:4E:D7:E4:C6:39:A1:09:25:A8:A1:A1:11
         SHA1: DB:0D:86:3A:5E:A6:93:36:D0:87:54:25:7A:B1:D2:CD:32:0F:52:84
         SHA256: 51:FF:8F:CF:2E:82:1D:40:1E:C4:7F:9D:51:76:1B:95:A2:D4:C2:2D:65:7A:84:47:FE:07:AB:CA:D9:FB:2B:86
         Signature algorithm name: SHA256withRSA
         Version: 3

Extensions:

#1: ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
  CA:true
  PathLen:2147483647
]

#2: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
  DigitalSignature
  Key_Encipherment
  Key_CertSign
]



*******************************************
*******************************************


Alias name: hawkular-metrics
Creation date: Dec 23, 2015
Entry type: trustedCertEntry

Owner: CN=hawkular-metrics.paas.dev.iggroup.local, OU=Private Unified Communications, OU=Hosted by IG Index Limited, OU=IT, O=IG Group Limited, STREET=Cannon Bridge House, STREET=25 Dowgate Hill, L=London, ST=UK, OID.2.5.4.17=EC4R 2YA, C=GB
Issuer: CN=IG Index Private Server CA, O=IG Index Limited, L=London, ST=Greater London, C=GB
Serial number: 9d62c59140e1c195dcb87bc691cc5b93
Valid from: Tue Dec 22 19:00:00 EST 2015 until: Sat Dec 22 18:59:59 EST 2018
Certificate fingerprints:
         MD5:  3E:09:78:38:46:DF:EE:4B:62:A6:E2:32:43:CE:11:4B
         SHA1: 33:DF:3A:0D:BC:1C:4A:76:61:AB:78:05:F1:02:C7:B9:FF:96:18:64
         SHA256: E9:7E:02:D2:2B:BA:4A:0B:18:73:D9:5C:59:FB:EA:21:F6:21:60:13:6A:9B:77:B8:A6:88:E1:D3:CC:3E:44:B4
         Signature algorithm name: SHA256withRSA
         Version: 3

Extensions:

#1: ObjectId: 1.3.6.1.5.5.7.1.1 Criticality=false
AuthorityInfoAccess [
  [
   accessMethod: caIssuers
   accessLocation: URIName: http://crt.comodoca.com/IGIndexPrivateServerCA.crt
,
   accessMethod: ocsp
   accessLocation: URIName: http://ocsp.comodoca.com
]
]

#2: ObjectId: 2.5.29.35 Criticality=false
AuthorityKeyIdentifier [
KeyIdentifier [
0000: A5 52 C0 3A C7 00 F7 9A   3E 7F 10 34 D0 B8 63 68  .R.:....>..4..ch
0010: B0 36 32 92                                        .62.
]
]

#3: ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
  CA:false
  PathLen: undefined
]

#4: ObjectId: 2.5.29.31 Criticality=false
CRLDistributionPoints [
  [DistributionPoint:
     [URIName: http://crl.comodoca.com/IGIndexPrivateServerCA.crl]
]]

#5: ObjectId: 2.5.29.37 Criticality=false
ExtendedKeyUsages [
  serverAuth
  clientAuth
]

#6: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
  DigitalSignature
  Key_Encipherment
]

#7: ObjectId: 2.5.29.17 Criticality=false
SubjectAlternativeName [
  DNSName: hawkular-metrics.paas.dev.iggroup.local
  DNSName: hawkular-metrics
]

#8: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: A7 0B C6 3C EA 5B C7 69   29 31 9F 73 0D 91 81 3D  ...<.[.i)1.s...=
0010: 03 D2 FE F9                                        ....
]
]



*******************************************
*******************************************


Alias name: hawkular-cassandra
Creation date: Dec 23, 2015
Entry type: trustedCertEntry

Owner: CN=hawkular-cassandra
Issuer: CN=metrics-signer@1450893859
Serial number: 2
Valid from: Wed Dec 23 13:04:20 EST 2015 until: Fri Dec 22 13:04:21 EST 2017
Certificate fingerprints:
         MD5:  10:36:98:3F:C4:30:5E:B0:FE:36:D9:B2:74:0B:61:21
         SHA1: 4C:04:DE:0A:F5:F0:8C:4B:5A:9B:54:DA:E6:8F:19:F5:C5:9A:CE:6A
         SHA256: E7:67:13:EA:62:5E:58:75:E7:7D:F4:26:83:65:35:69:5B:0A:2E:53:F6:43:40:BF:0A:04:D4:EA:40:A2:0D:A5
         Signature algorithm name: SHA256withRSA
         Version: 3

Extensions:

#1: ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
  CA:false
  PathLen: undefined
]

#2: ObjectId: 2.5.29.37 Criticality=false
ExtendedKeyUsages [
  serverAuth
]

#3: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
  DigitalSignature
  Key_Encipherment
]

#4: ObjectId: 2.5.29.17 Criticality=false
SubjectAlternativeName [
  DNSName: hawkular-cassandra
]



*******************************************
*******************************************


Alias name: metricca
Creation date: Dec 23, 2015
Entry type: trustedCertEntry

Owner: CN=IG Index Private CA, O=IG Index Limited, L=London, ST=Greater London, C=GB
Issuer: CN=IG Index Private CA, O=IG Index Limited, L=London, ST=Greater London, C=GB
Serial number: 289d0b0d49a4accd6c2807890c89fe77
Valid from: Mon Aug 10 20:00:00 EDT 2015 until: Tue Dec 31 18:59:59 EST 2030
Certificate fingerprints:
         MD5:  C6:2E:52:72:E3:B5:A6:76:E2:FE:6C:45:99:B2:F3:84
         SHA1: AB:D5:E2:0F:81:90:6F:5B:8A:55:7F:74:67:D8:F7:7E:79:F3:69:16
         SHA256: AC:6E:7F:64:DB:8F:4D:FF:27:E0:64:37:8D:5A:3B:64:2B:6E:30:4A:15:FB:71:FD:36:C4:54:9F:82:9E:5E:38
         Signature algorithm name: SHA256withRSA
         Version: 3

Extensions:

#1: ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
  CA:true
  PathLen:2147483647
]

#2: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
  DigitalSignature
  Key_CertSign
  Crl_Sign
]

#3: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: B8 4D CE EC DA 6C 20 92   A6 1A 59 A2 8D 17 56 DC  .M...l ...Y...V.
0010: 75 BF FB D9                                        u...
]
]



*******************************************
*******************************************


Alias name: cassandraca
Creation date: Dec 23, 2015
Entry type: trustedCertEntry

Owner: CN=metrics-signer@1450893859
Issuer: CN=metrics-signer@1450893859
Serial number: 1
Valid from: Wed Dec 23 13:04:20 EST 2015 until: Mon Dec 21 13:04:21 EST 2020
Certificate fingerprints:
         MD5:  C0:9B:F5:42:4E:D7:E4:C6:39:A1:09:25:A8:A1:A1:11
         SHA1: DB:0D:86:3A:5E:A6:93:36:D0:87:54:25:7A:B1:D2:CD:32:0F:52:84
         SHA256: 51:FF:8F:CF:2E:82:1D:40:1E:C4:7F:9D:51:76:1B:95:A2:D4:C2:2D:65:7A:84:47:FE:07:AB:CA:D9:FB:2B:86
         Signature algorithm name: SHA256withRSA
         Version: 3

Extensions:

#1: ObjectId: 2.5.29.19 Criticality=true
BasicConstraints:[
  CA:true
  PathLen:2147483647
]

#2: ObjectId: 2.5.29.15 Criticality=true
KeyUsage [
  DigitalSignature
  Key_Encipherment
  Key_CertSign
]



*******************************************
*******************************************


So it looks like the cert is in the truststore ok.. there doesn't appear to be an ssl error in the stack trace...  maybe there isn't an ssl issue? (Suspicious though).


04:55:28,805 INFO  [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200002: Initializing metrics service
04:55:28,865 WARN  [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200003: Could not connect to Cassandra cluster - assuming its not up yet: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.123.91:9042 (com.datastax.driver.core.TransportException: [hawkular-cassandra/172.30.123.91:9042] Channel has been closed))
04:55:28,865 WARN  [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200004: [446] Retrying connecting to Cassandra cluster in [2]s...
04:55:28,865 WARN  [io.netty.util.concurrent.DefaultPromise] (cluster446-nio-worker-0) An exception was thrown by com.datastax.driver.core.Connection$9.operationComplete(): java.util.concurrent.RejectedExecutionException: Task com.datastax.driver.core.Connection$9$1@2dd6581 rejected from java.util.concurrent.ThreadPoolExecutor@7348795c[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) [rt.jar:1.8.0_51]
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) [rt.jar:1.8.0_51]
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) [rt.jar:1.8.0_51]
        at com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:484) [guava-16.0.1.redhat-3.jar:16.0.1.redhat-3]
        at com.datastax.driver.core.Connection$9.operationComplete(Connection.java:566) [cassandra-driver-core-2.2.0.rc2-redhat-2.jar:2.2.0.rc2-redhat-2]
        at com.datastax.driver.core.Connection$9.operationComplete(Connection.java:542) [cassandra-driver-core-2.2.0.rc2-redhat-2.jar:2.2.0.rc2-redhat-2]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.channel.PendingWriteQueue.safeFail(PendingWriteQueue.java:252) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.channel.PendingWriteQueue.removeAndFailAll(PendingWriteQueue.java:112) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1256) [netty-handler-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:676) [netty-handler-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:829) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:610) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_51]

I can't see anything in the logs on the cassandra pod when the metrics pod is connecting.

Can you reproduce my issue?

regards



Dave

Comment 14 Dave McCormick 2016-01-07 15:49:24 UTC
Damn! Looking at the trace - wouldn't this suggest that the SSL handshake is actually failing - seeing as it seems to be calling setHandshakeFailure...

[netty-transport-4.0.27.Final-redhat-2.jar:4.0.27.Final-redhat-2]
        at io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1256)

So this takes me back to my perceived SSL trust/keystore problem - although I can't immediately see a problem with the truststore.

Comment 15 Matt Wringe 2016-01-07 15:59:40 UTC
Can you please attach your logs and other verbose information in a file or in a pastebin? Pasting content like this into a bugzilla makes it very difficult to read.

The steps I have used for custom certificates are as follows:

oadm ca create-server-cert --cert=hawkular.crt --key=hawkular.key --hostnames=hawkular-metrics

cat hawkular.crt hawkular.key > hawkular.pem

oc secrets new metrics-deployer hawkular-metrics.pem=hawkular.pem hawkular-metrics-ca.cert=${SIGNER_CA.CERT:-openshift.local.config/master/ca.crt}

oc process -f $SOURCE_ROOT/metrics.yaml -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com,USE_PERSISTENT_STORAGE=false | oc create -f -

That seems to work for me with running Hawkular Metrics with a custom certificate.

Comment 16 Dave McCormick 2016-01-07 17:12:51 UTC
Ok, I think I'm closer to the issue... Definitely an SSL issue - I have 3 certs in play not 2!

My metrics certificate is signed by an intermediate CA which is in turn signed by a root CA - so when I generate my metrics ca secret I'm enclosing two certificates in the ca cert file (comment 9)!

hawkular-metrics-ca.cert=/etc/pki/ig-private-ca.crt
ig-private-ca.crt is 
-----BEGIN CERTIFICATE-----
MIIDrDCCApSgAwIBAgIQKJ0LDUmkrM1sKAeJDIn+dzANBgkqhkiG9w0BAQsFADBw
MQswCQYDVQQGEwJHQjEXMBUGA1UECBMOR3JlYXRlciBMb25kb24xDzANBgNVBAcT
BkxvbmRvbjEZMBcGA1UEChMQSUcgSW5kZXggTGltaXRlZDEcMBoGA1UEAxMTSUcg
SW5kZXggUHJpdmF0ZSBDQTAeFw0xNTA4MTEwMDAwMDBaFw0zMDEyMzEyMzU5NTla
MHAxCzAJBgNVBAYTAkdCMRcwFQYDVQQIEw5HcmVhdGVyIExvbmRvbjEPMA0GA1UE
BxMGTG9uZG9uMRkwFwYDVQQKExBJRyBJbmRleCBMaW1pdGVkMRwwGgYDVQQDExNJ
RyBJbmRleCBQcml2YXRlIENBMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKC
AQEAvderk0XPdKR9eBSuLTdzE1k7hHPDOiJZtlPgPRmPxyOC8rMkR/S+xpxs43iQ
qKSfZSWFGwAc73yDpFVr1uM9Sh69daF0ig+uO7xGj2oZtXymZYoL8WvS1Xy5V1qW
hVZY83g3KRD/hEj0QXcdj4WooQ8Heimc4kq+dJNgHXAR05c6DHG8MuhQBckDuM9c
wggIzpL+1xrrZvbhKsdIYThrrsVPxu5PA1m3iDA+ukxLOmuN6WPXiSsSrAqNclvO
Aj1Dg+7Z31lJREHJcnvjycYklLu5qaCw8B3mdwhuGcq68Vdzhe/hdxe8PeuxOje9
bYZrO3Tl/J12U7uirdUyC/CFiQIDAQABo0IwQDAdBgNVHQ4EFgQUuE3O7NpsIJKm
GlmijRdW3HW/+9kwDgYDVR0PAQH/BAQDAgGGMA8GA1UdEwEB/wQFMAMBAf8wDQYJ
KoZIhvcNAQELBQADggEBAChkpH79l7yUfVK2vVLVUwfml8sTBFefMsFZoBSczgU8
6O1IivgFBDdN5+FuOe9ZaXrPRoulCzaUGtUNA7oZTZtsd17M5sUG7QKL+sa56o93
CjymhHMZUQoV/24fzraOmPC8R274bwWut5O9NA5pZomcKcSJlMeNyDk5NLPWZpEW
wlLIxwzVNDnteIVdMT35q+PWSDmgIRDE0Xf2uHAYFoT4zuI//NTOR0pua/NpiaYo
CLaB9Z9PC77jivCYwKL9ubEMeBKsWr0fsMFHj76aWBTCRhIXEEv5t84N/z1IELv3
8bHY0kArqvvfmIWu9Y/PO+iwUJm5ouCIQD7uHPAZpLY=
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIEqTCCA5GgAwIBAgIQFRhu0C2GQlB4OgRCKNGwnDANBgkqhkiG9w0BAQsFADBw
MQswCQYDVQQGEwJHQjEXMBUGA1UECBMOR3JlYXRlciBMb25kb24xDzANBgNVBAcT
BkxvbmRvbjEZMBcGA1UEChMQSUcgSW5kZXggTGltaXRlZDEcMBoGA1UEAxMTSUcg
SW5kZXggUHJpdmF0ZSBDQTAeFw0xNTA5MDQwMDAwMDBaFw0zMDEyMzEyMzU5NTla
MHcxCzAJBgNVBAYTAkdCMRcwFQYDVQQIEw5HcmVhdGVyIExvbmRvbjEPMA0GA1UE
BxMGTG9uZG9uMRkwFwYDVQQKExBJRyBJbmRleCBMaW1pdGVkMSMwIQYDVQQDExpJ
RyBJbmRleCBQcml2YXRlIFNlcnZlciBDQTCCASIwDQYJKoZIhvcNAQEBBQADggEP
ADCCAQoCggEBAKloODC6Xv7Jgcw+E4aQRCGeddLd1P7/8W3lHxwo+EO8mvXVYHJ/
6YH20PSeejFbr9BNWuAGiQyHUoP2L8RZFDcNZYXg0gGmaJJtpR02YyWo/jpxVCE8
4WgulUMrervgts9kekYeTSVlnti46DjoRrlpv0WfmDY+IitoY4LppIUkCQOFcAKc
OLPwhepWhjv6/HLZ7Be+xK5GGToKysFgr1is1SH53WqpN0r17gh7x+TDuL+BGneJ
1obZO+T1+oBWyE1j1tV2IqGweuXkSmYJ8lrzRVpbdbLuaJ22KuRwzSU5+SXXaHi+
r8vyDoQYbpj06N1iiainHPO0cnqasfXdlZUCAwEAAaOCATYwggEyMB8GA1UdIwQY
MBaAFLhNzuzabCCSphpZoo0XVtx1v/vZMB0GA1UdDgQWBBSlUsA6xwD3mj5/EDTQ
uGNosDYykjAOBgNVHQ8BAf8EBAMCAYYwEgYDVR0TAQH/BAgwBgEB/wIBADAdBgNV
HSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwPQYDVR0fBDYwNDAyoDCgLoYsaHR0
cDovL2NybC5jb21vZG9jYS5jb20vSUdJbmRleFByaXZhdGVDQS5jcmwwbgYIKwYB
BQUHAQEEYjBgMDgGCCsGAQUFBzAChixodHRwOi8vY3J0LmNvbW9kb2NhLmNvbS9J
R0luZGV4UHJpdmF0ZUNBLmNydDAkBggrBgEFBQcwAYYYaHR0cDovL29jc3AuY29t
b2RvY2EuY29tMA0GCSqGSIb3DQEBCwUAA4IBAQCmsUQZ5hc0elX1SGYDZc04Z0o9
p0FU5xqIdLkqN8IxNGSBah+PXJfEacXdniXJqM+FTkaeBNoJV4WMZVCykfO6mg+X
YM7dk9zDQ6FkK3paRKLbDau+SLZAlAAoONLAka+vnyxciEXoUrCXy7k3y85yQ4iX
bTElZrOEFMmTL5U0oKIhHToLY8N+nlUoYemuI5aDr8W+2YOs6881Hh96MUqPoUVq
9rtv+r2oodjBk4k4aDzOsan9uDrhD11qsB4rN7RkQI9BttQ7qEciOiPth2rhUtyE
A5gy7PiXtic81oAT1NFHfOdeortsPtBN+sEQfGEoA8bnlk1VazPj6jScwJyE
-----END CERTIFICATE-----

The metrics deployer does not seem to know how to cope with multiple CA certs in a chain: -

CA:     Issuer: C=GB, ST=Greater London, L=London, O=IG Index Limited, CN=IG Index Private CA
        Subject: C=GB, ST=Greater London, L=London, O=IG Index Limited, CN=IG Index Private CA
Inter:  Issuer: C=GB, ST=Greater London, L=London, O=IG Index Limited, CN=IG Index Private CA
        Subject: C=GB, ST=Greater London, L=London, O=IG Index Limited, CN=IG Index Private Server CA
Cert:   Issuer: C=GB, ST=Greater London, L=London, O=IG Index Limited, CN=IG Index Private Server CA
        Subject: C=GB/postalCode=EC4R 2YA, ST=UK, L=London/street=25 Dowgate Hill/street=Cannon Bridge House, O=IG Group Limited, OU=IT, OU=Hosted by IG Index Limited, OU=Private Unified Communications, CN=hawkular-metrics.paas.dev.iggroup.local
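
A quick way to confirm how many certificates a CA file carries, and who issued each one, is sketched below. The function names are made up for illustration; `openssl crl2pkcs7` and `openssl pkcs7 -print_certs` are standard OpenSSL subcommands, though the exact output format differs between OpenSSL versions.

```shell
# Hypothetical helpers for inspecting a PEM bundle such as ig-private-ca.crt.

# Print the number of certificates in the bundle given as $1.
count_pem_certs() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Print subject and issuer for every certificate in the bundle given as $1.
# "openssl x509" on its own only reads the first certificate, so the
# bundle is wrapped in a PKCS#7 structure first.
list_pem_subjects() {
    openssl crl2pkcs7 -nocrl -certfile "$1" |
        openssl pkcs7 -print_certs -noout
}

# Usage (path taken from the report above):
#   count_pem_certs /etc/pki/ig-private-ca.crt
#   list_pem_subjects /etc/pki/ig-private-ca.crt
```

For the file pasted above the count would be expected to be 2, one block for the root CA and one for the intermediate.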

On the metrics application it is creating a hawkular-metrics.keystore with a single certificate: -

Your keystore contains 1 entry

Alias name: hawkular-metrics
Creation date: Dec 23, 2015
Entry type: PrivateKeyEntry
Certificate chain length: 1
Certificate[1]:
Owner: CN=hawkular-metrics.paas.dev.iggroup.local, OU=Private Unified Communications, OU=Hosted by IG Index Limited, OU=IT, O=IG Group Limited, STREET=Cannon Bridge House, STREET=25 Dowgate Hill, L=London, ST=UK, OID.2.5.4.17=EC4R 2YA, C=GB
Issuer: CN=IG Index Private Server CA, O=IG Index Limited, L=London, ST=Greater London, C=GB

This certificate is not a chain, it's a single certificate which has been signed by the "IG Index Private Server CA".  It might be that it failed to create the chain because it only tried to add the "IG Index Private CA" cert which is not directly linked to this cert.

On the cassandra truststore there are 5 certificates: -

Alias name: ca - Owner: CN=metrics-signer@1450893859
Alias name: hawkular-metrics - Owner: CN=hawkular-metrics.paas.dev.iggroup.local, OU=Private Unified Communications, OU=Hosted by IG Index Limited, OU=IT, O=IG Group Limited, STREET=Cannon Bridge House, STREET=25 Dowgate Hill, L=London, ST=UK, OID.2.5.4.17=EC4R 2YA, C=GB
Issuer: CN=IG Index Private Server CA, O=IG Index Limited, L=London, ST=Greater London, C=GB
Alias name: hawkular-cassandra - Owner: CN=hawkular-cassandra
Alias name: metricca - Owner: CN=IG Index Private CA, O=IG Index Limited, L=London, ST=Greater London, C=GB

So the "IG Index Private CA" is trusted but the "IG Index Private Server CA" is not.
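
The check behind that conclusion can be repeated against saved `keytool -list -v` output. This is only a sketch: `owner_in_listing` is a hypothetical helper, and it assumes the `Owner: CN=...` line format shown in the listings earlier in this report.

```shell
# Hypothetical check: does the common name $2 appear as an Owner in the
# saved "keytool -list -v" output $1? Exits 0 if found, 1 otherwise.
owner_in_listing() {
    awk -v cn="$2" '
        # Match "Owner: CN=<cn>," at the start of a line, or a line that
        # consists of "Owner: CN=<cn>" only (no further DN components).
        index($0, "Owner: CN=" cn ",") == 1 || $0 == ("Owner: CN=" cn) { found = 1 }
        END { exit !found }
    ' "$1"
}

# Usage (the password variable is a placeholder):
#   keytool -list -v -keystore cassandra.truststore -storepass "$PASS" > listing.txt
#   owner_in_listing listing.txt "IG Index Private Server CA" || echo "issuer not in truststore"
```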

I can't modify the deployed keystores to test because they are owned by root.
I can't change the certificate because all of our certs have multiple intermediate CAs and I can't issue working certs with a single CA cert.
I think the deployer process only picks up the FIRST ca certificate in the secret.  Could we make it iterate over it creating aliases and import certs for EACH certificate found?
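
To illustrate the request, here is a rough sketch of importing every certificate from a bundle rather than only the first. It is not the deployer's actual code; the function names, the `ca-N` alias scheme and the output file naming are assumptions. The `keytool -importcert` flags used are standard.

```shell
# Hypothetical sketch: split a CA bundle and import each certificate.

# Write each certificate found in bundle $1 to $2-1.pem, $2-2.pem, ...
split_pem_bundle() {
    awk -v prefix="$2" '
        /-----BEGIN CERTIFICATE-----/ { n++; out = prefix "-" n ".pem" }
        out != "" { print > out }
        /-----END CERTIFICATE-----/ { close(out); out = "" }
    ' "$1"
}

# Import $1-1.pem, $1-2.pem, ... into truststore $2 (password $3)
# under the aliases ca-1, ca-2, ... (alias scheme is an assumption).
import_ca_files() {
    prefix="$1"; truststore="$2"; storepass="$3"
    i=1
    while [ -f "$prefix-$i.pem" ]; do
        keytool -importcert -noprompt -trustcacerts \
            -alias "ca-$i" -file "$prefix-$i.pem" \
            -keystore "$truststore" -storepass "$storepass" || return 1
        i=$((i + 1))
    done
}

# Usage:
#   split_pem_bundle /etc/pki/ig-private-ca.crt /tmp/ca
#   import_ca_files /tmp/ca hawkular-metrics.truststore "$PASS"
```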

regards



Dave

Comment 17 Dave McCormick 2016-01-09 16:51:43 UTC
Hi 

I have managed to work around the issue of supporting my certificate with a chain by manually creating my own hawkular-metrics.keystore and updating the hawkular-metrics-secrets secret.

My steps: -

Create a p12 archive on the deploy server from the keys and pems etc..

openssl pkcs12 -export -in hawkular-metrics.paas.dev.iggroup.local.crt -inkey hawkular-metrics.paas.dev.iggroup.local.key \
               -out hawkular-metrics.paas.dev.iggroup.local.p12 -name hawkular-metrics \
               -CAfile hawkular-metrics.paas.dev.iggroup.local.crt.intermediate -chain
Now convert this to a new keystore (using same password as before)...
keytool -importkeystore \
        -deststorepass XXX -destkeypass XXX -destkeystore hawkular-metrics.keystore \
        -srckeystore hawkular-metrics.paas.dev.iggroup.local.p12 -srcstoretype PKCS12 -srcstorepass YYY \
        -alias hawkular-metrics

Check with keytool -keystore hawkular-metrics.keystore -storepass XXX -list -v
Now have a keystore with alias hawkular-metrics and a chain of 3 certs.
Now convert to base64 (without wrapping)
base64 -w 0 hawkular-metrics.keystore
oc edit secret hawkular-metrics-secrets (and paste in the base64 contents in hawkular-metrics.keystore)
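
The encoding step above can be wrapped in a small helper, and the manual paste could probably be replaced with `oc patch`. This is a sketch only: `encode_for_secret` is a made-up name, and the `oc patch` invocation in the usage comment has not been verified against 3.1.

```shell
# Hypothetical helper: print file $1 as single-line base64, as needed for
# the "data" field of a secret. "-w 0" is the GNU coreutils flag that
# disables line wrapping.
encode_for_secret() {
    base64 -w 0 "$1"
}

# Usage (requires a logged-in oc client; unverified on 3.1):
#   B64=$(encode_for_secret hawkular-metrics.keystore)
#   oc patch secret hawkular-metrics-secrets \
#       -p "{\"data\":{\"hawkular-metrics.keystore\":\"$B64\"}}"
```

Decoding the stored value with `base64 -d` and comparing it to the original keystore is a cheap way to confirm nothing was mangled on the way in.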

Please can you update this ticket as a request for supporting certificate chains in the metrics deployer (and other deployers which redhat creates)?

regards



Dave

Comment 18 Dave McCormick 2016-01-10 14:03:22 UTC
The above workaround has resolved the issue with it connecting to cassandra and starting up, but there is a new issue: it can't connect to itself, which leads to a "Forbidden" message if you click on the metrics tab and the following stack trace in the hawkular-metrics logs: -

08:36:56,391 ERROR [org.hawkular.openshift.auth.OpenShiftTokenAuthentication] (http-/0.0.0.0:8444-12) Error trying to authenticate against the OpenShift server: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method) [rt.jar:1.8.0_51]
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345) [rt.jar:1.8.0_51]
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) [rt.jar:1.8.0_51]
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) [rt.jar:1.8.0_51]
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) [rt.jar:1.8.0_51]
        at java.net.Socket.connect(Socket.java:589) [rt.jar:1.8.0_51]
        at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:668) [jsse.jar:1.8.0_51]
        at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173) [jsse.jar:1.8.0_51]
        at sun.net.NetworkClient.doConnect(NetworkClient.java:180) [rt.jar:1.8.0_51]
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) [rt.jar:1.8.0_51]
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) [rt.jar:1.8.0_51]
        at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:275) [rt.jar:1.8.0_51]
        at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:371) [rt.jar:1.8.0_51]
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191) [rt.jar:1.8.0_51]
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104) [rt.jar:1.8.0_51]
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998) [rt.jar:1.8.0_51]
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177) [rt.jar:1.8.0_51]
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1282) [rt.jar:1.8.0_51]
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1257) [rt.jar:1.8.0_51]
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:250) [rt.jar:1.8.0_51]
        at org.hawkular.openshift.auth.OpenShiftTokenAuthentication.isAuthorized(OpenShiftTokenAuthentication.java:93) [hawkular-metrics-openshift-integration-0.8.0.Final-redhat-1.jar:0.8.0.Final-redhat-1]
        at org.hawkular.openshift.auth.OpenShiftTokenAuthentication.doFilter(OpenShiftTokenAuthentication.java:67) [hawkular-metrics-openshift-integration-0.8.0.Final-redhat-1.jar:0.8.0.Final-redhat-1]
        at org.hawkular.openshift.auth.OpenShiftAuthenticationFilter.doFilter(OpenShiftAuthenticationFilter.java:89) [hawkular-metrics-openshift-integration-0.8.0.Final-redhat-1.jar:0.8.0.Final-redhat-1]
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1]
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1]
        at org.hawkular.metrics.api.jaxrs.filter.CorsFilter.doFilter(CorsFilter.java:88) [classes:]
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1]
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1]
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:231) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1]
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:149) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1]
        at org.jboss.as.web.security.SecurityContextAssociationValve.invoke(SecurityContextAssociationValve.java:169) [jboss-as-web-7.5.3.Final-redhat-2.jar:7.5.3.Final-redhat-2]
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:150) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1]
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:97) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1]
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:102) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1]
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:344) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1]
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:854) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1]
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:653) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1]
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:926) [jbossweb-7.5.10.Final-redhat-1.jar:7.5.10.Final-redhat-1]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_51]

I've looked at the truststore and confirmed that the top-level CA was added. I also manipulated it (and saved it back to the secret) so that it contains the intermediate cert too, but it looks as though it is now having issues connecting to itself. :(  Could this be more cert issues?  The certs are pretty new and we had issues with them internally with any Java 8 JDK/JRE < 1.8.0_60 - the Java 8 version in hawkular metrics is pretty old - this could be an issue.  I'm wondering if there could be something else contributing to the "connection refused" message?

I can confirm that hawkular-metrics is starting up on port 8444 and is presenting my certificate with the correct chain of 3 certs - I can also browse to https://hawkular-metrics.paas.dev.iggroup.local in a browser and get the right certs.

This has turned out to be a whole world of pain, and more difficult than I imagined it would be!

regards



Dave

Comment 19 Matt Wringe 2016-01-14 14:38:45 UTC
"08:36:56,391 ERROR [org.hawkular.openshift.auth.OpenShiftTokenAuthentication] (http-/0.0.0.0:8444-12) Error trying to authenticate against the OpenShift server: java.net.ConnectException: Connection refused"

This means that the Hawkular-Metrics instance is not able to connect to the OpenShift master. Are you doing anything with your master configuration which would mean it is not accessible over the default https://kubernetes.default.svc:443 ? Are you configuring the MASTER_URL property to something during the deploy?
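
A rough way to test that reachability from inside the pod is sketched below. `can_connect` is a hypothetical helper; it assumes the image ships bash (for the `/dev/tcp` pseudo-device) and the coreutils `timeout` command, which may not hold.

```shell
# Hypothetical helper: succeed if a TCP connection to host $1, port $2
# can be opened within 5 seconds; fail on refusal, DNS failure or timeout.
can_connect() {
    # Inside the inner bash, $0 is the host and $1 is the port.
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage, run from a shell inside the hawkular-metrics pod:
#   can_connect kubernetes.default.svc 443 && echo reachable || echo unreachable
```

This only checks the TCP connection, which is the level at which "Connection refused" is raised; it says nothing about whether the TLS handshake or the token check would succeed afterwards.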

Could you please not post logs and verbose information directly into the comment section? Please attach them as a file or place them in a pastebin or similar system somewhere. Doing this will make it easier for us to help figure out the problem.

From what I can tell so far, it looks like there might be an issue with CA chains that we need to look into.

Comment 20 Matt Wringe 2016-01-18 22:01:17 UTC
I have gone through and verified that intermediary CA certificates seem to be working. A PR has also been created which includes a test specifically for this: https://github.com/openshift/origin-metrics/pull/63

In the test the "hawkular-metrics.pem" is set to https://github.com/mwringe/origin-metrics/blob/intermediary_ca/hack/keys/intermediary_ca/hawkular-metrics.pem (it's the concatenation of the public and private keys).

And the "hawkular-metrics-ca.cert" is set to https://github.com/mwringe/origin-metrics/blob/intermediary_ca/hack/keys/intermediary_ca/hawkular-metrics.pem (which includes both the intermediary CA and its root CA).
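
A quick way to sanity-check the two files before creating the secret is to count the PEM blocks in each. This is only a sketch: count_pem_blocks is a hypothetical helper, and it counts blocks without verifying that the certificates actually chain to each other.

```shell
# Print how many PEM blocks with the given label a file contains.
#   $1 = path to the PEM file
#   $2 = block label, e.g. "CERTIFICATE" or "PRIVATE KEY"
# The ".*" also matches prefixed labels such as "RSA PRIVATE KEY".
count_pem_blocks() {
  grep -c -- "-----BEGIN .*$2-----" "$1"
}
```

For the setup described above, `count_pem_blocks hawkular-metrics-ca.cert CERTIFICATE` would be expected to print 2 (intermediary CA plus root CA), and hawkular-metrics.pem should contain at least one CERTIFICATE block and one PRIVATE KEY block.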

Comment 21 Scott McCarty 2016-02-04 14:30:42 UTC
So, I had this exact same problem. For me it turned out that the box had two interfaces, and the installer picked that up and configured it:

My /root/.config/openshift/installer.cfg.yml looked like:

ansible_config: /usr/share/atomic-openshift-utils/ansible.cfg
ansible_log_path: /tmp/ansible.log
ansible_ssh_user: root
hosts:
- connect_to: aep-all.dc2.crunchtools.com
  hostname: aep-all.dc2.crunchtools.com
  ip: 192.168.122.55
  master: true
  node: true
  public_hostname: aep-all-public.dc2.crunchtools.com
  public_ip: 192.168.100.136
variant: atomic-enterprise
variant_version: '3.1'
version: v1

I am guessing this is what messed things up. I had installed, uninstalled, and re-installed like five times to figure this out. If I set all of the variables to one of the ip/hostname combinations Hawkular works fine, but if they are split, it breaks:

This works (even though DNS is wrong):

ansible_config: /usr/share/atomic-openshift-utils/ansible.cfg
ansible_log_path: /tmp/ansible.log
ansible_ssh_user: root
hosts:
- connect_to: aep-all.dc2.crunchtools.com
  hostname: aep-all.dc2.crunchtools.com
  ip: 192.168.100.136
  master: true
  node: true
  public_hostname: aep-all.dc2.crunchtools.com
  public_ip: 192.168.100.136
variant: atomic-enterprise
variant_version: '3.1'
version: v1

And this works:

hosts:
- connect_to: aep-all.dc2.crunchtools.com
  hostname: aep-all.dc2.crunchtools.com
  ip: 192.168.122.55
  master: true
  node: true
  public_hostname: aep-all.dcr24.crunchtools.com
  public_ip: 192.168.122.55
variant: atomic-enterprise
variant_version: '3.1'
version: v1

Hope that helps!
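
Of the three configs above, the one that failed is the only one where ip and public_ip differ. A rough sketch for spotting that in an installer.cfg.yml follows. find_split_hosts is a hypothetical helper, it relies on the flat layout shown above rather than on real YAML parsing, and whether split addresses are actually the root cause is not established in this thread.

```shell
# Print every host entry whose ip and public_ip differ.
#   $1 = path to installer.cfg.yml
# A host with no public_ip at all is also reported.
find_split_hosts() {
  awk '
    function report() {
      if (host != "" && (ip "") != (pub ""))
        print host ": ip=" ip " public_ip=" pub
      host = ""; ip = ""; pub = ""
    }
    /^- / { report() }                      # a new host entry starts
    { line = $0; sub(/^[- ]+/, "", line); split(line, kv, ": ") }
    kv[1] == "hostname"  { host = kv[2] }
    kv[1] == "ip"        { ip = kv[2] }
    kv[1] == "public_ip" { pub = kv[2] }
    END { report() }
  ' "$1"
}
```

Usage: `find_split_hosts /root/.config/openshift/installer.cfg.yml`. No output means every host uses the same address for both fields.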

Comment 22 Matt Wringe 2016-02-04 15:12:40 UTC
A pod stuck in the Pending state is a fairly common symptom which indicates that something went wrong.

@Scott is this ansible config file a customized one, or one that the installer generated? I am wondering whether we need to open a new BZ about the installer having an issue.

Comment 23 Scott McCarty 2016-02-04 23:56:35 UTC
The "atomic-openshift-installer install" command created the first (didn't work) and second config above (worked). I changed the IP Address on the final one (worked, and the correct ip address).

So, yeah, I am not sure how the logic of the installer determines what to plug into the above templates...

Comment 24 Scott McCarty 2016-02-04 23:59:54 UTC
The "atomic-openshift-installer install" command created the first (didn't work) and second config above (worked). I changed the IP Address on the final one (worked, and the correct ip address).

So, yeah, I am not sure how the logic of the installer determines what to plug into the above templates...

Comment 25 Matt Wringe 2016-02-05 13:59:57 UTC
Can you please open a separate BZ about your installer issues?

Comment 26 Dave McCormick 2016-02-12 10:29:45 UTC
Hi

Metrics are now FINALLY WORKING - I managed to fix them with the details from this bugzilla and a ticket I raised.

There were two issues preventing it from working: -

1  - metrics deployer not creating the certificate chain properly when containing 3 certs, i.e. ca -> intermediate -> cert
2 - metrics unable to start due to ha-master unless I set MASTER_URL= to the correct URL when processing the template.

I had to follow this procedure in order to make it work (warning: not pretty!): -

oc project openshift-infra
oc create -f - <<API
apiVersion: v1
kind: ServiceAccount
metadata:
 name: metrics-deployer
secrets:
- name: metrics-deployer
API
 
oadm policy add-role-to-user edit system:serviceaccount:openshift-infra:metrics-deployer
oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:heapster

Add my key and cert to /etc/pki/hawkular-metrics.paas.dev.iggroup.local.pem
Add my intermediate AND ca certs to /etc/pki/ig-private-ca.crt

oc secrets new metrics-deployer hawkular-metrics.pem=/etc/pki/hawkular-metrics.paas.dev.iggroup.local.pem hawkular-metrics-ca.cert=/etc/pki/ig-private-ca.crt
oc process -f /usr/share/ansible/openshift-ansible/roles/openshift_examples/files/examples/v1.1/infrastructure-templates/enterprise/metrics-deployer.yaml -v HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.paas.dev.iggroup.local,IMAGE_PREFIX=registry.access.redhat.com/openshift3/,IMAGE_VERSION=latest,USE_PERSISTENT_STORAGE=false,MASTER_URL=https://osemaster.dev.iggroup.local:8443 | oc create -f -

Now I need to manually create my own keystore from my already issued certs from COMODO CA: -

Create a p12 archive on the deploy server from the keys and pems etc..

openssl pkcs12 -export -in hawkular-metrics.paas.dev.iggroup.local.crt -inkey hawkular-metrics.paas.dev.iggroup.local.key \
               -out hawkular-metrics.paas.dev.iggroup.local.p12 -name hawkular-metrics \
               -CAfile hawkular-metrics.paas.dev.iggroup.local.crt.intermediate -chain
Create a new password (YYY - for these instructions only)
Now convert this to a new keystore...
keytool -importkeystore \
        -deststorepass YYYY -destkeypass YYYY -destkeystore hawkular-metrics.keystore \
        -srckeystore hawkular-metrics.paas.dev.iggroup.local.p12 -srcstoretype PKCS12 -srcstorepass YYY \
        -alias hawkular-metrics

Check with keytool -keystore hawkular-metrics.keystore -storepass YYYY -list -v
Now have a keystore with alias hawkular-metrics and a chain of 3 certs.
Now convert to base64 (without wrapping)
base64 -w 0 hawkular-metrics.keystore
echo "YYYY" | base64
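
One detail worth knowing about the password step: plain echo appends a newline, and that newline is encoded along with the password. Whether this matters depends on how the container reads the password file, which this thread does not show, and the procedure worked as written. If the exact bytes matter, a sketch like the following avoids the newline. encode_secret is a hypothetical helper.

```shell
# Base64-encode a string exactly as given, with no trailing newline
# in the input and no line wrapping in the output.
encode_secret() {
  printf '%s' "$1" | base64 | tr -d '\n'
}
```

For comparison, `encode_secret YYYY` prints WVlZWQ==, while `echo "YYYY" | base64` prints WVlZWQo= because of the encoded newline.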

oc edit rc/hawkular-metrics-secrets
Replace hawkular-metrics.keystore and hawkular-metrics.keystore.password with your new base64 encoded versions and save
oc delete pod hawkular-metrics-XXXX

Update the master-config (if not already done)
/etc/origin/master/master-config.yaml
assetConfig:
...
metricsPublicURL: https://hawkular-metrics.paas.dev.iggroup.local/hawkular/metrics

restart everything!
DONE
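
The keystore check in the procedure above can be scripted so that a short chain is caught before the secret is edited. This is a sketch only: chain_length is a hypothetical helper that reads the text printed by keytool -list -v and extracts the "Certificate chain length" value.

```shell
# Read `keytool -list -v` output on stdin and print the chain length
# reported for each private key entry.
chain_length() {
  sed -n 's/^Certificate chain length: //p'
}
```

Usage: `keytool -keystore hawkular-metrics.keystore -storepass YYYY -list -v -alias hawkular-metrics | chain_length`, which should print 3 for a cert -> intermediate -> ca chain.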

Getting these working has taken weeks of effort and left me with an impression of fragility with the platform.

I think the following needs to happen: -

1.  The documentation needs to include instructions to include MASTER_URL when processing the template if you are running HA Masters
2. The metrics deployer needs to be able to provision certs with multiple CA certs, and I'd suggest that running via the openssl pkcs12 method and converting to a keystore is a good way of doing this.
3. Stop using lifecycle hooks as readiness checks!  It's absolutely horrible that pods get stuck in PENDING because they have a lifecycle hook waiting for successful start up.  You can't look in the logs or connect to the pod or anything useful whilst the PostStart hook has the pod in PENDING.  This is just an awful thing to do.  Why can't they use a readiness probe instead?

I've re-tested both the default installation method and this process on 3.1.1 and can confirm that the default method still doesn't work and that this workaround gets them working.

regards



Dave

Comment 27 Matt Wringe 2016-02-12 14:26:49 UTC
Congratulations on getting it to work.

A couple of comments:

"1  - metrics deployer not creating the certificate chain properly when containing 3 certs, i.e. ca -> intermediate -> cert"

As already mentioned [https://bugzilla.redhat.com/show_bug.cgi?id=1294067#c20] we have this exact setup as part of our tests. I suspect some of the secrets you are setting do not contain the correct values, which is why you are seeing this problem.

You shouldn't need to do any more steps than just passing the right values to the deployer.


"2 - metrics unable to start due to ha-master unless I set MASTER_URL= to the correct URL when processing the template."

The default MASTER_URL should work unless you are either removing that service from the project or don't have your certificates properly configured. Having to modify the MASTER_URL usually would indicate that something is wrong with your OpenShift installation. We do have the MASTER_URL as an option though to help people get around this.

Do you know the exact error you were seeing when using the default value for the MASTER_URL?


"3. Stop using lifecycle hooks as readiness checks!  It's absolutely horrible that pods get stuck in PENDING because they have a lifecycle hook waiting for successful start up.  You can't look in the logs or connect to the pod or anything useful whilst the PostStart hook has the pod in PENDING.  This is just an awful thing to do.  Why can't they use a readiness probe instead?"

You can get the logs from docker; it's currently an issue with OpenShift that it can't get the logs from a pending container.

Readiness probes and PostStart hooks do completely different things and they are not at all interchangeable. I do agree that the PostStart scripts in the enterprise containers are a bit awkward at the moment. Some improvements have been made for the next version, and the logs should be handled better in a future release as well.

Comment 28 Clayton Coleman 2016-02-21 20:51:24 UTC
> You can get the logs from docker, its currently an issue with OpenShift where it  can't get the logs from a pending container.

Is fixed in 3.2.

Comment 34 Matt Wringe 2016-04-13 23:20:07 UTC
If they are having issues getting their custom certs working by overwriting the Hawkular Metrics cert, is it possible for them to use a re-encrypting route instead?

This bypasses a lot of the headaches when providing your own certificates.

Comment 38 Matt Wringe 2016-10-31 16:03:45 UTC
I am going to assume that this issue was resolved when the customer switched to re-encrypting endpoints. In more recent versions, re-encrypting endpoints are the only option.

