Bug 1694793

Summary: [UPI] [METAL] pivot.service fails to start
Product: OpenShift Container Platform
Reporter: David Sanz <dsanzmor>
Component: RHCOS
Assignee: Steve Milner <smilner>
Status: CLOSED ERRATA
QA Contact: Micah Abbott <miabbott>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.1.0
CC: bbreard, dustymabe, imcleod, jligon, nstielau, tbielawa, wsun
Target Milestone: ---
Keywords: BetaBlocker
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-04 10:46:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description David Sanz 2019-04-01 17:07:53 UTC
Description of problem:
pivot.service fails to start on RHCOS (Ootpa).

# systemctl restart pivot
Apr 01 16:00:24 dell-r730-063.dsal.lab.eng.rdu2.redhat.com systemd[1]: Starting Pivot Tool...
Apr 01 16:00:24 dell-r730-063.dsal.lab.eng.rdu2.redhat.com pivot[5582]: pivot version 0.0.3 (7ee2318613fac74b32f0dd75bb6c32b292342214)
Apr 01 16:00:24 dell-r730-063.dsal.lab.eng.rdu2.redhat.com pivot[5582]: I0401 16:00:24.714484    5582 root.go:218] Using image pullspec from /etc/pivot/image-pullspec
Apr 01 16:00:24 dell-r730-063.dsal.lab.eng.rdu2.redhat.com pivot[5582]: I0401 16:00:24.714598    5582 run.go:16] Running: rpm-ostree status --json
Apr 01 16:00:24 dell-r730-063.dsal.lab.eng.rdu2.redhat.com pivot[5582]: I0401 16:00:24.770175    5582 run.go:16] Running: skopeo inspect --authfile /var/lib/kubelet/config.json docker://registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-04-01-154356@sha256:83e25d1681c42be4e9257e288d79d3d3591301e1d5897caff96cecc38d7878f0
Job for pivot.service failed because the control process exited with error code.
See "systemctl status pivot.service" and "journalctl -xe" for details.
# Apr 01 16:00:27 dell-r730-063.dsal.lab.eng.rdu2.redhat.com pivot[5582]: I0401 16:00:27.092836    5582 root.go:143] Resolved to: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-04-01-154356@sha256:83e25d1681c42be4e9257e288d79d3d3591301e1d5897caff96cecc38d7878f0
Apr 01 16:00:27 dell-r730-063.dsal.lab.eng.rdu2.redhat.com pivot[5582]: F0401 16:00:27.092930    5582 root.go:147] parsing current osImageURL: parsing reference: "": invalid reference format
Apr 01 16:00:27 dell-r730-063.dsal.lab.eng.rdu2.redhat.com systemd[1]: pivot.service: Main process exited, code=exited, status=255/n/a
Apr 01 16:00:27 dell-r730-063.dsal.lab.eng.rdu2.redhat.com systemd[1]: pivot.service: Failed with result 'exit-code'.
Apr 01 16:00:27 dell-r730-063.dsal.lab.eng.rdu2.redhat.com systemd[1]: Failed to start Pivot Tool.
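
The pivot lines in the journal output above carry a glog-style prefix, where the first letter is the severity (I, W, E or F). The following is a minimal sketch, not part of pivot, for picking the fatal entry out of such a log. The function name `parse_glog` and the shortened hostname `host` in the sample line are illustrative assumptions.

```python
import re

# glog-style header: severity letter, MMDD, time, PID, file:line], message.
GLOG_RE = re.compile(
    r"(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>\d+) (?P<src>[^ \]]+:\d+)\] (?P<msg>.*)"
)

SEVERITY = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}


def parse_glog(line):
    """Return (severity, source, message) for a glog-style line, else None."""
    m = GLOG_RE.search(line)
    if m is None:
        return None
    return SEVERITY[m.group("sev")], m.group("src"), m.group("msg")


# Sample taken from the journal above; the hostname is shortened to "host".
line = (
    "Apr 01 16:00:27 host pivot[5582]: F0401 16:00:27.092930    5582 "
    'root.go:147] parsing current osImageURL: parsing reference: "": '
    "invalid reference format"
)

if __name__ == "__main__":
    print(parse_glog(line))
```

Applied to the journal above, only the root.go:147 entry is reported as FATAL; the earlier pivot entries are informational.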

The skopeo command works when run manually, and the node can also pull the image:

# skopeo inspect --authfile /var/lib/kubelet/config.json docker://registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-04-01-154356@sha256:83e25d1681c42be4e9257e288d79d3d3591301e1d5897caff96cecc38d7878f0
{
    "Name": "registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-04-01-154356",
    "Digest": "sha256:83e25d1681c42be4e9257e288d79d3d3591301e1d5897caff96cecc38d7878f0",
    "RepoTags": [
        "must-gather",
        "cluster-storage-operator",
        "console",
        "control-plane",
        "csi-operator",
        "jenkins-agent-nodejs",
        "machine-api-operator",
        "cluster-autoscaler-operator",
        "docker-registry",
        "etcd",
        "k8s-prometheus-adapter",
        "logging-kibana5",
        "metering-operator",
        "artifacts",
        "egress-http-proxy",
        "kubemark-machine-controllers",
        "logging-eventrouter",
        "jenkins-agent-base",
        "libvirt-machine-controllers",
        "cli",
        "cloud-credential-operator",
        "f5-router",
        "node-problem-detector-operator",
        "prometheus",
        "sriov-network-device-plugin",
        "elasticsearch-operator",
        "hyperkube",
        "node",
        "cluster-dns-operator",
        "cluster-samples-operator",
        "csi-external-provisioner",
        "deployer",
        "prometheus-config-reloader",
        "_test_",
        "ansible-service-broker-operator",
        "cluster-authentication-operator",
        "cluster-ingress-operator",
        "cluster-svcat-apiserver-operator",
        "metering-helm",
        "logging-elasticsearch5",
        "operator-registry",
        "egress-router",
        "federation-controller",
        "machine-config-daemon",
        "machine-os-content",
        "ansible-service-broker",
        "descheduler-operator",
        "kube-client-agent",
        "metering-presto",
        "prometheus-alertmanager",
        "setup-etcd-environment",
        "cluster-image-registry-operator",
        "cluster-kube-apiserver-operator",
        "cluster-machine-approver",
        "installer",
        "jenkins",
        "kube-state-metrics",
        "cluster-capacity",
        "cluster-version-operator",
        "configmap-reloader",
        "grafana",
        "jenkins-agent-maven",
        "logging-fluentd",
        "manila-provisioner",
        "multus-cni",
        "telemeter",
        "base",
        "kube-etcd-signer-server",
        "metering-reporting-operator",
        "prometheus-operator",
        "recycler",
        "console-operator",
        "csi-external-attacher",
        "metering-hive",
        "azure-machine-controllers",
        "cluster-node-tuned",
        "snapshot-controller",
        "cluster-bootstrap",
        "keepalived-ipfailover",
        "reporting-operator",
        "sriov-cni",
        "template-service-broker",
        "baremetal-machine-controllers",
        "cluster-kube-scheduler-operator",
        "coredns",
        "csi-livenessprobe",
        "local-storage-diskmaker",
        "service-catalog",
        "branding",
        "cluster-autoscaler",
        "docker-builder",
        "hypershift",
        "operator-lifecycle-manager",
        "prom-label-proxy",
        "ansible-operator",
        "container-networking-plugins-unsupported",
        "logging-rsyslog",
        "metering-helm-operator",
        "cluster-kube-controller-manager-operator",
        "cluster-logging-operator",
        "cluster-network-operator",
        "cluster-svcat-controller-manager-operator",
        "logging-curator5",
        "metering-hadoop",
        "node-problem-detector",
        "pod",
        "cluster-api",
        "cluster-monitoring-operator",
        "cluster-node-tuning-operator",
        "local-storage-operator",
        "machine-config-operator",
        "machine-config-server",
        "sriov-dp-admission-controller",
        "tests",
        "cluster-config-operator",
        "cluster-openshift-controller-manager-operator",
        "descheduler",
        "haproxy-router",
        "oauth-proxy",
        "openshift-tuned",
        "cluster-etcd-operator",
        "multus-admission-controller",
        "prometheus-node-exporter",
        "service-ca-operator",
        "snapshot-provisioner",
        "openstack-machine-controllers",
        "aws-machine-controllers",
        "egress-dns-proxy",
        "hadoop",
        "helm",
        "machine-config-controller",
        "nginx-router",
        "presto",
        "cluster-openshift-apiserver-operator",
        "csi-driver-registrar",
        "container-networking-plugins-supported",
        "kube-rbac-proxy",
        "operator-marketplace",
        "ansible",
        "ovn-kubernetes"
    ],
    "Created": "2019-03-29T17:00:19.220915457Z",
    "DockerVersion": "",
    "Labels": {
        "com.coreos.ostree-commit": "53350d9bae2f2917562a70d0d4c107b80611e4f45a80a5a5378d131d4537b4b3",
        "version": "410.8.20190329.0"
    },
    "Architecture": "amd64",
    "Os": "linux",
    "Layers": [
        "sha256:754a74d2cc4a9baa55f14d7494b496db3e75e6e281756528da03752c97b2ab32"
    ]
}

# podman pull registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-04-01-154356@sha256:83e25d1681c42be4e9257e288d79d3d3591301e1d5897caff96cecc38d7878f0
Trying to pull registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-04-01-154356@sha256:83e25d1681c42be4e9257e288d79d3d3591301e1d5897caff96cecc38d7878f0...
Getting image source signatures
Copying blob 754a74d2cc4a: 599.23 MiB / 599.23 MiB [========================] 8s
Copying config 78f2388a49e8: 422 B / 422 B [================================] 0s
Writing manifest to image destination
Storing signatures
78f2388a49e8628cb444e0ab169967000300598b78cd5c14c8db696c35e7e94c


# cat /etc/os-release 
NAME="Red Hat Enterprise Linux CoreOS"
VERSION="410.8.20190326.0"
VERSION_ID="4.1"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 410.8.20190326.0 (Ootpa)"
ID="rhcos"
ID_LIKE="rhel fedora"
ANSI_COLOR="0;31"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.1"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.1"
OSTREE_VERSION=410.8.20190326.0

# rpm -qa | grep pivot
error: Macro % has illegal name (%define)
error: Macro % has illegal name (%define)
error: Macro % has illegal name (%define)
pivot-0.0.3-4.el7.x86_64

# /root/bin/openshift-install-4.0.0-0.nightly-2019-03-29-040459 --dir /root/installation/baremetal-0104 version
/root/bin/openshift-install-4.0.0-0.nightly-2019-03-29-040459 v4.0.22-201903272149-dirty
built from commit 99775e8fd42bd09ca596a98778837c6fbe764437


How reproducible:

Steps to Reproduce:
1. Install OCP4 on bare metal according to the guide.
2. Access a master node and run systemctl status pivot.
3. pivot.service is down.

Comment 2 Steve Milner 2019-04-01 17:29:15 UTC
On initial review, the problem occurs because `imgref.ParseNamed` in `pivot` is not being given a valid reference. The bad input may be coming down from a higher level. Digging in a bit more.
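
To illustrate the failure mode described in this comment, here is a rough sketch of why an empty reference string is rejected and how a caller could guard against it. This is not pivot's code and not the real `imgref.ParseNamed`: the function names, the simplified pattern, and the guard are hypothetical, and the report does not state what the actual fix was.

```python
import re

# Hypothetical, simplified stand-in for an image reference grammar.
# The real grammar is stricter; this is only enough to show the idea.
_REF_RE = re.compile(
    r"^[a-z0-9]+(?:[._:/-][a-z0-9]+)*(?:@sha256:[0-9a-f]{64})?$"
)


def parse_named(ref):
    """Return ref if it looks like an image reference, else raise ValueError."""
    if not _REF_RE.match(ref):
        raise ValueError(
            f'parsing reference: "{ref}": invalid reference format'
        )
    return ref


def current_os_image_url(raw):
    """Hypothetical guard: treat a missing current reference as 'none yet'
    instead of passing an empty string to the parser."""
    if not raw:
        return None
    return parse_named(raw)


if __name__ == "__main__":
    target = (
        "registry.svc.ci.openshift.org/openshift/"
        "origin-v4.0-2019-04-01-154356@sha256:"
        "83e25d1681c42be4e9257e288d79d3d3591301e1d5897caff96cecc38d7878f0"
    )
    print(parse_named(target) == target)
    print(current_os_image_url(""))
    try:
        parse_named("")
    except ValueError as err:
        print(err)
```

The last call reproduces the shape of the fatal message in the journal: the target pullspec parses, but the empty current value does not.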

Comment 9 David Sanz 2019-04-16 15:44:48 UTC
Verified on:

RHCOS: ootpa/410.8.20190415.2
OPENSHIFT-INSTALL: unreleased-master-811-g5a3c57cb37b0f175c2ae33e64cd9a6947bd1d567-dirty, commit: 5a3c57cb37b0f175c2ae33e64cd9a6947bd1d567, image: registry.svc.ci.openshift.org/origin/release:v4.1

Comment 11 errata-xmlrpc 2019-06-04 10:46:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758