Bug 1665597 - The router is broken: 4.0.0-0.nightly-2019-01-11-205323: /var/lib/haproxy/conf/haproxy-config.template missing
Summary: The router is broken: 4.0.0-0.nightly-2019-01-11-205323: /var/lib/haproxy/conf/haproxy-config.template missing
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Dan Mace
QA Contact: Hongan Li
Depends On:
Reported: 2019-01-11 21:53 UTC by Hongkai Liu
Modified: 2019-06-04 10:41 UTC (History)
7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2019-06-04 10:41:49 UTC
Target Upstream Version:


System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:41:55 UTC

Description Hongkai Liu 2019-01-11 21:53:24 UTC
Description of problem:
The router is broken after installation: the router pod crash-loops because /var/lib/haproxy/conf/haproxy-config.template is missing.

Version-Release number of selected component (if applicable):
$ oc adm release info --pullspecs registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-01-11-205323 | grep installer
  installer                                     registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-11-205323@sha256:58b5bc0f10caa359d520b7ee2cf695b60c1971d3c141abe99d33b8e024ef114f

How reproducible:

Steps to Reproduce:
1. Check the router pod in the openshift-ingress namespace after installation.

Actual results:
The router pod is in CrashLoopBackOff; its log ends with "error: open /var/lib/haproxy/conf/haproxy-config.template: no such file or directory".

Expected results:
The router pod is Running.
Additional info:

$ oc logs -n openshift-ingress router-default-86f48b66c4-5fsrn
I0111 21:31:04.467607       1 template.go:299] Starting template router (v4.0.0-0.136.0)
error: open /var/lib/haproxy/conf/haproxy-config.template: no such file or directory
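As a minimal local sketch (not the actual router code), the startup failure above can be modeled as a file-existence check that fails with the same error shape; the path is the one from the log, and `check_template` is a hypothetical helper name:

```shell
#!/bin/sh
# Simplified stand-in for the template router's startup behavior:
# it reports an "open ... no such file or directory" error and returns
# non-zero when the template file is absent, mirroring the log above.
check_template() {
    template="$1"
    if [ ! -f "$template" ]; then
        # Matches the error format seen in the router log.
        echo "error: open $template: no such file or directory" >&2
        return 1
    fi
    echo "template found: $template"
}
```

Running an equivalent check inside the failing container (e.g. via `oc debug`) would confirm whether the template file made it into the image.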

$ oc describe pod -n openshift-ingress router-default-86f48b66c4-5fsrn
Name:               router-default-86f48b66c4-5fsrn
Namespace:          openshift-ingress
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               ip-10-0-162-93.us-east-2.compute.internal/
Start Time:         Fri, 11 Jan 2019 21:25:00 +0000
Labels:             app=router
Annotations:        <none>
Status:             Running
Controlled By:      ReplicaSet/router-default-86f48b66c4
    Container ID:   cri-o://c7813f3f1c56b48809547c884908b55c7f589287502559046e2849e537ce1e9c
    Image:          registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-11-205323@sha256:6ede9cb0b73dc9df975b35822667662c21eacb725fba37dc20ab5f2327b12818
    Image ID:       registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-11-205323@sha256:6ede9cb0b73dc9df975b35822667662c21eacb725fba37dc20ab5f2327b12818
    Ports:          80/TCP, 443/TCP, 1936/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 11 Jan 2019 21:31:04 +0000
      Finished:     Fri, 11 Jan 2019 21:31:04 +0000
    Ready:          False
    Restart Count:  6
    Liveness:       http-get http://:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
      STATS_PORT:                 1936
      ROUTER_SERVICE_NAMESPACE:   openshift-ingress
      DEFAULT_CERTIFICATE_DIR:    /etc/pki/tls/private
      ROUTER_SERVICE_NAME:        default
      ROUTER_CANONICAL_HOSTNAME:  apps.hongkliu.qe.devcluster.openshift.com
      /etc/pki/tls/private from default-certificate (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from router-token-ktp67 (ro)
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
    Type:        Secret (a volume populated by a Secret)
    SecretName:  router-certs-default
    Optional:    false
    Type:        Secret (a volume populated by a Secret)
    SecretName:  router-token-ktp67
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/worker=
Tolerations:     <none>
  Type     Reason       Age               From                                                Message
  ----     ------       ----              ----                                                -------
  Normal   Scheduled    7m                default-scheduler                                   Successfully assigned openshift-ingress/router-default-86f48b66c4-5fsrn to ip-10-0-162-93.us-east-2.compute.internal
  Warning  FailedMount  7m                kubelet, ip-10-0-162-93.us-east-2.compute.internal  MountVolume.SetUp failed for volume "default-certificate" : secrets "router-certs-default" not found
  Normal   Pulling      7m                kubelet, ip-10-0-162-93.us-east-2.compute.internal  pulling image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-11-205323@sha256:6ede9cb0b73dc9df975b35822667662c21eacb725fba37dc20ab5f2327b12818"
  Normal   Pulled       7m                kubelet, ip-10-0-162-93.us-east-2.compute.internal  Successfully pulled image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-11-205323@sha256:6ede9cb0b73dc9df975b35822667662c21eacb725fba37dc20ab5f2327b12818"
  Normal   Created      6m (x4 over 7m)   kubelet, ip-10-0-162-93.us-east-2.compute.internal  Created container
  Normal   Started      6m (x4 over 7m)   kubelet, ip-10-0-162-93.us-east-2.compute.internal  Started container
  Normal   Pulled       5m (x4 over 7m)   kubelet, ip-10-0-162-93.us-east-2.compute.internal  Container image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-11-205323@sha256:6ede9cb0b73dc9df975b35822667662c21eacb725fba37dc20ab5f2327b12818" already present on machine
  Warning  BackOff      2m (x27 over 7m)  kubelet, ip-10-0-162-93.us-east-2.compute.internal  Back-off restarting failed container

Comment 1 Dan Mace 2019-01-11 22:15:55 UTC
This report is filed against a broken build:

4.0.0-0.nightly-2019-01-11-205323	Rejected (VerificationFailed)	1 hour ago	e2e-aws e2e-aws-serial

I'll leave it open for now. Clayton and ART are working on fixing the build. But it doesn't seem like we should be testing broken builds.

Comment 4 Hongan Li 2019-01-14 09:02:45 UTC
Tested with 4.0.0-0.nightly-2019-01-12-000105 and the issue has been fixed.

# oc get clusterversions.config.openshift.io 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.nightly-2019-01-12-000105   True        False         4h        Cluster version is 4.0.0-0.nightly-2019-01-12-000105

# oc get pod -n openshift-ingress
NAME                             READY     STATUS    RESTARTS   AGE
router-default-77994b7b7-2n8g8   1/1       Running   0          2h

# oc -n openshift-ingress logs router-default-77994b7b7-2n8g8
I0114 06:21:24.693186       1 template.go:299] Starting template router (v4.0.0-0.136.0)
I0114 06:21:24.735370       1 router.go:482] Router reloaded:
 - Checking http://localhost:80 ...
 - Health check ok : 0 retry attempt(s).
I0114 06:21:24.735407       1 router.go:255] Router is including routes in all namespaces

Comment 5 Hongkai Liu 2019-01-14 15:06:32 UTC
Thanks, Hongli.
It works for me too.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.nightly-2019-01-12-000105   True        False         1m        Cluster version is 4.0.0-0.nightly-2019-01-12-000105
[fedora@ip-172-31-32-37 20190114]$ oc get pod -n openshift-ingress
NAME                              READY     STATUS    RESTARTS   AGE
router-default-6f5b8695d7-g2k54   1/1       Running   0          5m

Comment 8 errata-xmlrpc 2019-06-04 10:41:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

