Bug 1665597

Summary: The router is broken: 4.0.0-0.nightly-2019-01-11-205323: /var/lib/haproxy/conf/haproxy-config.template missing
Product: OpenShift Container Platform
Component: Networking
Sub component: router
Version: 4.1.0
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: unspecified
Keywords: TestBlocker
Status: CLOSED ERRATA
Reporter: Hongkai Liu <hongkliu>
Assignee: Dan Mace <dmace>
QA Contact: Hongan Li <hongli>
CC: aos-bugs, bperkins, ccoleman, hongkliu, mifiedle, vlaad, xtian
Target Milestone: ---
Last Closed: 2019-06-04 10:41:49 UTC
Type: Bug

Description Hongkai Liu 2019-01-11 21:53:24 UTC
Description of problem:
The default router pod crash-loops after installation because /var/lib/haproxy/conf/haproxy-config.template is missing from the router image.

Version-Release number of selected component (if applicable):
$ oc adm release info --pullspecs registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-01-11-205323 | grep installer
  installer                                     registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-11-205323@sha256:58b5bc0f10caa359d520b7ee2cf695b60c1971d3c141abe99d33b8e024ef114f
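
To identify the exact router image a release payload ships, the release tooling can resolve a single component; a sketch (the component tag name, haproxy-router, is my assumption for this payload):

$ oc adm release info --image-for=haproxy-router registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-01-11-205323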


How reproducible:
1/1

Steps to Reproduce:
1. Install a cluster from 4.0.0-0.nightly-2019-01-11-205323.
2. Check the router pod in the openshift-ingress namespace (a command sketch follows).
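
For step 2, listing the pods in the ingress namespace surfaces the crash-looping router (a minimal check; the pod name suffix differs per cluster):

$ oc get pod -n openshift-ingress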

Actual results:
The router pod is in CrashLoopBackOff; its log ends with "error: open /var/lib/haproxy/conf/haproxy-config.template: no such file or directory" (see Additional info).

Expected results:
The router pod starts and becomes Ready.

Additional info:

$ oc logs -n openshift-ingress router-default-86f48b66c4-5fsrn
I0111 21:31:04.467607       1 template.go:299] Starting template router (v4.0.0-0.136.0)
error: open /var/lib/haproxy/conf/haproxy-config.template: no such file or directory
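
To confirm the template is genuinely absent from the image (rather than a mount issue), the image's config directory can be pulled locally; a sketch using oc image extract, assuming registry access and an arbitrary destination path:

$ mkdir -p /tmp/haproxy-conf
$ oc image extract registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-11-205323@sha256:6ede9cb0b73dc9df975b35822667662c21eacb725fba37dc20ab5f2327b12818 --path /var/lib/haproxy/conf/:/tmp/haproxy-conf
$ ls /tmp/haproxy-conf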

$ oc describe pod -n openshift-ingress router-default-86f48b66c4-5fsrn
Name:               router-default-86f48b66c4-5fsrn
Namespace:          openshift-ingress
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               ip-10-0-162-93.us-east-2.compute.internal/10.0.162.93
Start Time:         Fri, 11 Jan 2019 21:25:00 +0000
Labels:             app=router
                    pod-template-hash=4290462270
                    router=router-default
Annotations:        <none>
Status:             Running
IP:                 10.128.2.4
Controlled By:      ReplicaSet/router-default-86f48b66c4
Containers:
  router:
    Container ID:   cri-o://c7813f3f1c56b48809547c884908b55c7f589287502559046e2849e537ce1e9c
    Image:          registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-11-205323@sha256:6ede9cb0b73dc9df975b35822667662c21eacb725fba37dc20ab5f2327b12818
    Image ID:       registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-11-205323@sha256:6ede9cb0b73dc9df975b35822667662c21eacb725fba37dc20ab5f2327b12818
    Ports:          80/TCP, 443/TCP, 1936/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 11 Jan 2019 21:31:04 +0000
      Finished:     Fri, 11 Jan 2019 21:31:04 +0000
    Ready:          False
    Restart Count:  6
    Liveness:       http-get http://:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      STATS_PORT:                 1936
      ROUTER_SERVICE_NAMESPACE:   openshift-ingress
      DEFAULT_CERTIFICATE_DIR:    /etc/pki/tls/private
      ROUTER_SERVICE_NAME:        default
      ROUTER_CANONICAL_HOSTNAME:  apps.hongkliu.qe.devcluster.openshift.com
    Mounts:
      /etc/pki/tls/private from default-certificate (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from router-token-ktp67 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-certificate:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  router-certs-default
    Optional:    false
  router-token-ktp67:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  router-token-ktp67
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/worker=
Tolerations:     <none>
Events:
  Type     Reason       Age               From                                                Message
  ----     ------       ----              ----                                                -------
  Normal   Scheduled    7m                default-scheduler                                   Successfully assigned openshift-ingress/router-default-86f48b66c4-5fsrn to ip-10-0-162-93.us-east-2.compute.internal
  Warning  FailedMount  7m                kubelet, ip-10-0-162-93.us-east-2.compute.internal  MountVolume.SetUp failed for volume "default-certificate" : secrets "router-certs-default" not found
  Normal   Pulling      7m                kubelet, ip-10-0-162-93.us-east-2.compute.internal  pulling image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-11-205323@sha256:6ede9cb0b73dc9df975b35822667662c21eacb725fba37dc20ab5f2327b12818"
  Normal   Pulled       7m                kubelet, ip-10-0-162-93.us-east-2.compute.internal  Successfully pulled image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-11-205323@sha256:6ede9cb0b73dc9df975b35822667662c21eacb725fba37dc20ab5f2327b12818"
  Normal   Created      6m (x4 over 7m)   kubelet, ip-10-0-162-93.us-east-2.compute.internal  Created container
  Normal   Started      6m (x4 over 7m)   kubelet, ip-10-0-162-93.us-east-2.compute.internal  Started container
  Normal   Pulled       5m (x4 over 7m)   kubelet, ip-10-0-162-93.us-east-2.compute.internal  Container image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-11-205323@sha256:6ede9cb0b73dc9df975b35822667662c21eacb725fba37dc20ab5f2327b12818" already present on machine
  Warning  BackOff      2m (x27 over 7m)  kubelet, ip-10-0-162-93.us-east-2.compute.internal  Back-off restarting failed container
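
The FailedMount event above was transient (the pod later started, so the secret must have appeared); to verify that the default certificate secret exists, a quick sanity check not taken from the original report:

$ oc get secret router-certs-default -n openshift-ingress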

Comment 1 Dan Mace 2019-01-11 22:15:55 UTC
This report is filed against a broken build:

https://openshift-release.svc.ci.openshift.org

4.0.0-0.nightly-2019-01-11-205323	Rejected (VerificationFailed)	1 hour ago	e2e-aws e2e-aws-serial

I'll leave it open for now. Clayton and ART are working on fixing the build. But it doesn't seem like we should be testing broken builds.

Comment 4 Hongan Li 2019-01-14 09:02:45 UTC
Tested with 4.0.0-0.nightly-2019-01-12-000105 and the issue has been fixed.

# oc get clusterversions.config.openshift.io 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.nightly-2019-01-12-000105   True        False         4h        Cluster version is 4.0.0-0.nightly-2019-01-12-000105

# oc get pod -n openshift-ingress
NAME                             READY     STATUS    RESTARTS   AGE
router-default-77994b7b7-2n8g8   1/1       Running   0          2h

# oc -n openshift-ingress logs router-default-77994b7b7-2n8g8
I0114 06:21:24.693186       1 template.go:299] Starting template router (v4.0.0-0.136.0)
I0114 06:21:24.735370       1 router.go:482] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I0114 06:21:24.735407       1 router.go:255] Router is including routes in all namespaces
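
Beyond pod status, a route-level probe would confirm the router actually serves traffic; a sketch assuming the standard console route exists and deriving its host from the canonical domain reported above (both assumptions on my part):

# oc get route -n openshift-console
# curl -sk -o /dev/null -w '%{http_code}\n' https://console-openshift-console.apps.hongkliu.qe.devcluster.openshift.com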

Comment 5 Hongkai Liu 2019-01-14 15:06:32 UTC
Thanks, Hongli.
It works for me too.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.nightly-2019-01-12-000105   True        False         1m        Cluster version is 4.0.0-0.nightly-2019-01-12-000105
$ oc get pod -n openshift-ingress
NAME                              READY     STATUS    RESTARTS   AGE
router-default-6f5b8695d7-g2k54   1/1       Running   0          5m

Comment 8 errata-xmlrpc 2019-06-04 10:41:49 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758