Bug 1703489

Summary: "DaemonSet \"openshift-sdn/sdn\" is not available (awaiting 3 nodes)" on nightly build
Product: OpenShift Container Platform
Reporter: Alberto <agarcial>
Component: Release
Assignee: Luke Meyer <lmeyer>
Status: CLOSED ERRATA
QA Contact: zhaozhanqi <zzhao>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.1.0
CC: aos-bugs, jokerman, mmccomas, smunilla
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-04 10:48:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Alberto 2019-04-26 15:05:36 UTC
Description of problem:


Version-Release number of selected component (if applicable):
4.1

How reproducible:
Consistently on CI: https://openshift-gce-devel.appspot.com/builds/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.1/

{
    "apiVersion": "v1",
    "items": [
        {
            "apiVersion": "config.openshift.io/v1",
            "kind": "ClusterOperator",
            "metadata": {
                "creationTimestamp": "2019-04-26T03:19:09Z",
                "generation": 1,
                "name": "network",
                "resourceVersion": "5196",
                "selfLink": "/apis/config.openshift.io/v1/clusteroperators/network",
                "uid": "0e96ef71-67d2-11e9-9881-12ab88a974f6"
            },
            "spec": {},
            "status": {
                "conditions": [
                    {
                        "lastTransitionTime": "2019-04-26T03:19:10Z",
                        "status": "False",
                        "type": "Degraded"
                    },
                    {
                        "lastTransitionTime": "2019-04-26T03:19:10Z",
                        "message": "DaemonSet \"openshift-sdn/sdn\" is not available (awaiting 3 nodes)",
                        "reason": "Deploying",
                        "status": "True",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2019-04-26T03:19:10Z",
                        "message": "The network is starting up",
                        "reason": "Startup",
                        "status": "False",
                        "type": "Available"
                    }
                ],
                "extension": null
            }
        }
    ],
    "kind": "List",
    "metadata": {
        "resourceVersion": "",
        "selfLink": ""
    }
}
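
For triage, the ClusterOperator list above can be reduced to the operators that are not yet Available, along with the reason they report. The following is a minimal sketch, not part of the original report: the function name unavailable_operators is hypothetical, and it assumes the input has the same shape as the JSON above (a List of config.openshift.io/v1 ClusterOperator items). The sample is trimmed from that output.

```python
def unavailable_operators(cluster_operator_list):
    """Return (name, message) for each operator whose Available condition is not "True".

    The message is taken from the Progressing condition when it has one,
    otherwise from the Available condition.
    """
    results = []
    for item in cluster_operator_list.get("items", []):
        name = item["metadata"]["name"]
        # Index the conditions by type for direct lookup.
        conditions = {
            c["type"]: c for c in item.get("status", {}).get("conditions", [])
        }
        available = conditions.get("Available", {})
        if available.get("status") != "True":
            progressing = conditions.get("Progressing", {})
            message = progressing.get("message") or available.get("message", "")
            results.append((name, message))
    return results


# Sample trimmed from the ClusterOperator output in this report.
SAMPLE = {
    "kind": "List",
    "items": [
        {
            "kind": "ClusterOperator",
            "metadata": {"name": "network"},
            "status": {
                "conditions": [
                    {"type": "Degraded", "status": "False"},
                    {
                        "type": "Progressing",
                        "status": "True",
                        "reason": "Deploying",
                        "message": 'DaemonSet "openshift-sdn/sdn" is not available (awaiting 3 nodes)',
                    },
                    {
                        "type": "Available",
                        "status": "False",
                        "reason": "Startup",
                        "message": "The network is starting up",
                    },
                ]
            },
        }
    ],
}

for name, message in unavailable_operators(SAMPLE):
    print(f"{name}: {message}")
```

When reading from a saved file instead, load it with json.load and pass the resulting dict to the same function.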

Comment 1 Casey Callendrello 2019-04-26 15:47:13 UTC
It seems the ART / Release image is missing the CNI binaries:

                "containerStatuses": [
                    {
                        "containerID": "cri-o://d1a3fd93eaad050621ebeefb4d47a493c0500c4af1651e25027c7fee7c86543a",
                        "image": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a47d33353c544b861eecbb81a09a85355197add71cd7a03a5f0b59bfc838bdd9",
                        "imageID": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a47d33353c544b861eecbb81a09a85355197add71cd7a03a5f0b59bfc838bdd9",
                        "lastState": {
                            "terminated": {
                                "containerID": "cri-o://d1a3fd93eaad050621ebeefb4d47a493c0500c4af1651e25027c7fee7c86543a",
                                "exitCode": 1,
                                "finishedAt": "2019-04-26T12:30:09Z",
                                "message": "cp: cannot stat '/opt/cni/bin/*': No such file or directory\n",
                                "reason": "Error",
                                "startedAt": "2019-04-26T12:30:09Z"
                            }
                        },
                        "name": "sdn",
                        "ready": false,
                        "restartCount": 10,
                        "state": {
                            "waiting": {
                                "message": "Back-off 5m0s restarting failed container=sdn pod=sdn-lqlhg_openshift-sdn(4d931c67-681b-11e9-ba28-12bb28af3cd6)",
                                "reason": "CrashLoopBackOff"
                            }
                        }
                    }
                ],
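
To find this kind of failure across many pods, the containerStatuses array can be filtered for containers in CrashLoopBackOff, reporting the message from their last termination. This is a rough sketch under stated assumptions: the helper name crash_looping_containers is hypothetical, and the input is assumed to be the status.containerStatuses list of a pod, shaped like the excerpt above. The sample values are copied from that excerpt.

```python
def crash_looping_containers(container_statuses):
    """Summarize containers that are waiting with reason CrashLoopBackOff."""
    failures = []
    for status in container_statuses:
        waiting = status.get("state", {}).get("waiting", {})
        if waiting.get("reason") != "CrashLoopBackOff":
            continue
        # lastState.terminated holds the outcome of the most recent run.
        terminated = status.get("lastState", {}).get("terminated", {})
        failures.append(
            {
                "name": status["name"],
                "restarts": status.get("restartCount", 0),
                "exit_code": terminated.get("exitCode"),
                "last_message": terminated.get("message", "").strip(),
            }
        )
    return failures


# Sample trimmed from the containerStatuses excerpt in this comment.
SAMPLE_STATUSES = [
    {
        "name": "sdn",
        "ready": False,
        "restartCount": 10,
        "lastState": {
            "terminated": {
                "exitCode": 1,
                "message": "cp: cannot stat '/opt/cni/bin/*': No such file or directory\n",
                "reason": "Error",
            }
        },
        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
    }
]

for failure in crash_looping_containers(SAMPLE_STATUSES):
    print(failure)
```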

Comment 2 Casey Callendrello 2019-04-26 15:57:16 UTC
Further analysis shows that this is caused by an OSBS hotfix that broke something related to multi-line Dockerfile RUN instructions.

Comment 3 Luke Meyer 2019-04-26 20:52:14 UTC
openshift-enterprise-node-container-v4.1.0-201904261432 has been built successfully and should be up to date, so once a release includes that build, this should be ready to review.

Comment 5 zhaozhanqi 2019-04-29 06:59:26 UTC
Verified this bug on 4.1.0-0.nightly-2019-04-28-064010.

Comment 7 errata-xmlrpc 2019-06-04 10:48:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758