Bug 1592303 - After running the redeploy-certificates.yml playbook in OCP 3.9, the web console stops working.
Summary: After running the redeploy-certificates.yml playbook in OCP 3.9, the web console stops working.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.9.z
Assignee: Vadim Rutkovsky
QA Contact: Yadan Pei
URL:
Whiteboard:
Depends On: 1596233
Blocks: 1596557 1623987 1667981
 
Reported: 2018-06-18 12:14 UTC by Joel Rosental R.
Modified: 2019-01-30 10:30 UTC
CC List: 22 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1596557 1667981
Environment:
Last Closed: 2018-09-22 04:53:09 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
oc get ev -n openshift-web-console (11.91 KB, text/plain)
2018-06-28 11:25 UTC, Yanping Zhang
no flags Details
`ansible-playbook -vvv` log (4.32 MB, text/plain)
2018-06-28 11:46 UTC, Yanping Zhang
no flags Details
Controllers (169.13 KB, application/x-xz)
2018-06-28 12:08 UTC, Vadim Rutkovsky
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3520161 0 None None None 2018-07-05 13:31:21 UTC
Red Hat Product Errata RHBA-2018:2658 0 None None None 2018-09-22 04:53:57 UTC

Description Joel Rosental R. 2018-06-18 12:14:32 UTC
Description of problem:
After running /usr/share/ansible/openshift-ansible/playbooks/redeploy-certificates.yml to redeploy certificates, the web console stops working and returns a 502 HTTP error code,

and in the web console pod logs I can see:

I0618 11:14:17.438336       1 start.go:201] OpenShift Web Console Version: v3.9.14
I0618 11:14:17.438552       1 serve.go:89] Serving securely on 0.0.0.0:8443
I0618 11:20:03.679898       1 logs.go:41] http: TLS handshake error from 10.128.0.1:51254: remote error: tls: bad certificate
I0618 11:20:07.225821       1 logs.go:41] http: TLS handshake error from 10.128.0.1:51260: remote error: tls: bad certificate

so most likely the web console secrets are not being updated.
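
A quick way to confirm the suspicion (a sketch only; it assumes cluster-admin access and that the console serving cert lives in the "webconsole-serving-cert" secret under the standard tls.crt key):

# oc get secret webconsole-serving-cert -n openshift-web-console \
    -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates -issuer

If the issuer/dates still match the old CA after the redeploy, the secret was not refreshed.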

The current workaround is to re-create the "openshift-web-console" project and then run the /usr/share/ansible/openshift-ansible/playbooks/openshift-web-console/config.yml playbook, roughly as sketched below.
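
A minimal sketch of that workaround (assuming cluster-admin access and a standard inventory path; wait until the project is fully removed before re-running the playbook):

# oc delete project openshift-web-console
# ansible-playbook -i /path/to/inventory /usr/share/ansible/openshift-ansible/playbooks/openshift-web-console/config.yml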

Version-Release number of selected component (if applicable):
Tested in v3.9.14

How reproducible:
Always

Steps to Reproduce:
1. ansible-playbook -i /path/to/inventory /usr/share/ansible/openshift-ansible/playbooks/redeploy-certificates.yml


Actual results:
Accessing the webconsole returns a 502 HTTP error code.

Expected results:
The playbook should refresh the web console secrets so that the console remains accessible.

Additional info:

Comment 1 Borja Aranda 2018-06-18 14:49:01 UTC
The playbook should also take into account whether a new named certificate is provided for the public URL, so that the secret is created from it instead of the self-signed one.
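
For reference, a hypothetical inventory snippet for supplying such a named certificate (openshift_master_named_certificates is the standard openshift-ansible variable; the file paths and hostname below are placeholders):

[OSEv3:vars]
openshift_master_named_certificates=[{"certfile": "/path/to/console.crt", "keyfile": "/path/to/console.key", "cafile": "/path/to/ca.crt", "names": ["console.example.com"]}]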

Comment 3 Vadim Rutkovsky 2018-06-21 12:08:50 UTC
PR for master: https://github.com/openshift/openshift-ansible/pull/8891

Comment 5 Aaron Ship 2018-06-26 09:59:14 UTC
Hi,
I am the Critical Situation Manager for EMEA and would like to ask about the ETA for this fix. If it will be scheduled in the next openshift-ansible errata, when will that date be?

Comment 6 Vadim Rutkovsky 2018-06-28 08:35:22 UTC
PR for 3.9 merged https://github.com/openshift/openshift-ansible/pull/9005

Comment 8 Yanping Zhang 2018-06-28 10:58:22 UTC
Tried with the fix in PR 9005. After running "ansible-playbook -i /path/to/inventory /usr/share/ansible/openshift-ansible/playbooks/redeploy-certificates.yml", the web console pod could not run because the secret "webconsole-serving-cert" was missing:
# oc get pod
NAME                          READY     STATUS              RESTARTS   AGE
webconsole-6b74f5c578-n8mdg   0/1       ContainerCreating   0          6m

# oc describe pod webconsole-6b74f5c578-n8mdg
Name:           webconsole-6b74f5c578-n8mdg
Namespace:      openshift-web-console
Node:           qe-juzhao-39-qeos-1-master-etcd-1/172.16.120.82
Start Time:     Thu, 28 Jun 2018 06:49:50 -0400
Labels:         app=openshift-web-console
                pod-template-hash=2630917134
                webconsole=true
Annotations:    openshift.io/scc=restricted
Status:         Pending
IP:             
Controlled By:  ReplicaSet/webconsole-6b74f5c578
Containers:
  webconsole:
    Container ID:  
    Image:         registry.reg-aws.openshift.com:443/openshift3/ose-web-console:v3.9.31
    Image ID:      
    Port:          8443/TCP
    Command:
      /usr/bin/origin-web-console
      --audit-log-path=-
      -v=0
      --config=/var/webconsole-config/webconsole-config.yaml
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     100m
      memory:  100Mi
    Liveness:  exec [/bin/sh -c if [[ ! -f /tmp/webconsole-config.hash ]]; then \
  md5sum /var/webconsole-config/webconsole-config.yaml > /tmp/webconsole-config.hash; \
elif [[ $(md5sum /var/webconsole-config/webconsole-config.yaml) != $(cat /tmp/webconsole-config.hash) ]]; then \
  echo 'webconsole-config.yaml has changed.'; \
  exit 1; \
fi && curl -k -f https://0.0.0.0:8443/console/] delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get https://:8443/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from webconsole-token-26l96 (ro)
      /var/serving-cert from serving-cert (rw)
      /var/webconsole-config from webconsole-config (rw)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  serving-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  webconsole-serving-cert
    Optional:    false
  webconsole-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      webconsole-config
    Optional:  false
  webconsole-token-26l96:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  webconsole-token-26l96
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/master=true
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason                 Age               From                                        Message
  ----     ------                 ----              ----                                        -------
  Normal   Scheduled              1m                default-scheduler                           Successfully assigned webconsole-6b74f5c578-n8mdg to qe-juzhao-39-qeos-1-master-etcd-1
  Normal   SuccessfulMountVolume  1m                kubelet, qe-juzhao-39-qeos-1-master-etcd-1  MountVolume.SetUp succeeded for volume "webconsole-config"
  Normal   SuccessfulMountVolume  1m                kubelet, qe-juzhao-39-qeos-1-master-etcd-1  MountVolume.SetUp succeeded for volume "webconsole-token-26l96"
  Warning  FailedMount            27s (x8 over 1m)  kubelet, qe-juzhao-39-qeos-1-master-etcd-1  MountVolume.SetUp failed for volume "serving-cert" : secrets "webconsole-serving-cert" not found
[root@qe-juzhao-39-qeos-1-master-etcd-1 ~]# oc get secret
NAME                         TYPE                                  DATA      AGE
builder-dockercfg-4xx56      kubernetes.io/dockercfg               1         9h
builder-token-nx57t          kubernetes.io/service-account-token   4         9h
builder-token-zrr24          kubernetes.io/service-account-token   4         9h
default-dockercfg-fjvpm      kubernetes.io/dockercfg               1         9h
default-token-4gsx6          kubernetes.io/service-account-token   4         9h
default-token-ssrnq          kubernetes.io/service-account-token   4         9h
deployer-dockercfg-m2c92     kubernetes.io/dockercfg               1         9h
deployer-token-4pd56         kubernetes.io/service-account-token   4         9h
deployer-token-z2c68         kubernetes.io/service-account-token   4         9h
webconsole-dockercfg-bg5hc   kubernetes.io/dockercfg               1         9h
webconsole-token-26l96       kubernetes.io/service-account-token   4         9h
webconsole-token-c8m6j       kubernetes.io/service-account-token   4         9h

Comment 9 Vadim Rutkovsky 2018-06-28 11:03:25 UTC
Please attach the `ansible-playbook -vvv` log and the output of `oc get ev -n openshift-web-console`.

Comment 10 Yanping Zhang 2018-06-28 11:25:05 UTC
Created attachment 1455244 [details]
oc get ev -n openshift-web-console

Comment 11 Yanping Zhang 2018-06-28 11:46:32 UTC
Created attachment 1455251 [details]
`ansible-playbook -vvv` log

Comment 12 Vadim Rutkovsky 2018-06-28 12:08:21 UTC
Created attachment 1455280 [details]
Controllers

The controllers are not recreating this secret for some reason; see the attached log.
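
For context, a sketch of how the secret normally gets regenerated (assuming the standard 3.x serving-cert annotation and that the console service is named "webconsole"):

# oc get svc webconsole -n openshift-web-console -o yaml | grep serving-cert-secret-name
# expected annotation: service.alpha.openshift.io/serving-cert-secret-name: webconsole-serving-cert

Deleting the secret should then prompt the service-serving-cert-signer controller to recreate it, which is what is not happening here.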

Comment 13 Vadim Rutkovsky 2018-06-28 13:25:37 UTC
Created PR https://github.com/openshift/openshift-ansible/pull/9012 to work around the issue with the controllers (bug #1596233).

Comment 16 Mike Fiedler 2018-06-28 16:53:19 UTC
Successfully tested with the hotfix build in comment 15. Prior to running the playbook, the web console pod was stuck in ContainerCreating as described in comment 8.

After running the playbook, the web console pod was Running and the web console was accessible via browser.

Not marking this Verified as of now - it needs to be tested in an official puddle.

Comment 17 Vadim Rutkovsky 2018-07-20 13:19:52 UTC
Fix is available in openshift-ansible-3.9.37-1

Comment 18 Yanping Zhang 2018-07-23 08:10:49 UTC
openshift v3.9.37
kubernetes v1.9.1+a0ce1bc657
openshift-ansible-3.9.37-1.git.0.51fbd81.el7.noarch.rpm 
Using command: "ansible-playbook -i /path/to/inventory /usr/share/ansible/openshift-ansible/playbooks/redeploy-certificates.yml"

Redeployed certificates on an OCP 3.9 env with the above package versions; the web console pod and cert were redeployed and the web console could be accessed successfully.
The bug has been fixed, so moving it to Verified.
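
For completeness, a minimal post-playbook check along those lines (a sketch; the console hostname is a placeholder):

# oc get pods -n openshift-web-console
# curl -k -I https://console.example.com:8443/console/

The webconsole pod should be Running and the curl should return HTTP 200 instead of the earlier 502.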

Comment 19 Andrea Spagnolo 2018-08-22 12:57:05 UTC
The PR merged for 3.9 (https://github.com/openshift/openshift-ansible/pull/9005) does not fix the case where you want to redeploy only the master certificates.

I created the PR https://github.com/openshift/openshift-ansible/pull/9713

regards

Comment 22 Yanping Zhang 2018-09-07 02:19:22 UTC
https://github.com/openshift/openshift-ansible/pull/9713
The PR is not merged yet.

Comment 23 Vadim Rutkovsky 2018-09-07 06:44:18 UTC
(In reply to Yanping Zhang from comment #22)
> https://github.com/openshift/openshift-ansible/pull/9713
> The pr is not merged yet.

This PR is not required; https://github.com/openshift/openshift-ansible/pull/9012 is sufficient to get the web console working after the certs are redeployed.

Comment 25 errata-xmlrpc 2018-09-22 04:53:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2658

