Created attachment 1539670 [details]
undercloud_install.log

Description of problem:

During undercloud installation podman pull fails with Digest did not match errors:

    "unable to pull 192.168.24.1:8787/rhosp15/openstack-nova-placement-api:20190226.1: unable to pull image: Error reading blob sha256:34e50c868cb3516f770bda83749a1de6bca1815bf398170e7ed1eac4b818b6c7: Digest did not match, expected sha256:34e50c868cb3516f770bda83749a1de6bca1815bf398170e7ed1eac4b818b6c7, got sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "2019-02-28 17:15:22,523 WARNING: 18814 -- Retrying running container: nova_placement",
    "2019-02-28 17:15:25,651 WARNING: 18814 -- ['/usr/bin/podman', 'start', '-a', 'docker-puppet-nova_placement'] run failed after unable to find container docker-puppet-nova_placement: no container with name or ID docker-puppet-nova_placement found: no such container",
    "2019-02-28 17:15:25,652 WARNING: 18814 -- Retrying running container: nova_placement",
    "2019-02-28 17:15:28,772 WARNING: 18814 -- ['/usr/bin/podman', 'start', '-a', 'docker-puppet-nova_placement'] run failed after unable to find container docker-puppet-nova_placement: no container with name or ID docker-puppet-nova_placement found: no such container",
    "2019-02-28 17:15:28,773 WARNING: 18814 -- Retrying running container: nova_placement",
    "2019-02-28 17:15:28,773 ERROR: 18814 -- Failed running container for nova_placement",
    "2019-02-28 17:15:28,773 INFO: 18814 -- Finished processing puppet configs for nova_placement",
    "2019-02-28 17:15:28,773 ERROR: 18813 -- ERROR configuring crond",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring glance_api",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring haproxy",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring heat_api",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring heat_api_cfn",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring heat",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring ironic_api",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring ironic",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring ironic_inspector",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring neutron",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring iscsid",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring keepalived",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring keystone",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring memcached",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring mistral",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring mysql",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring nova",
    "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring nova_metadata",
    "2019-02-28 17:15:28,775 ERROR: 18813 -- ERROR configuring nova_placement",
    "2019-02-28 17:15:28,775 ERROR: 18813 -- ERROR configuring rabbitmq",
    "2019-02-28 17:15:28,775 ERROR: 18813 -- ERROR configuring swift",
    "2019-02-28 17:15:28,775 ERROR: 18813 -- ERROR configuring swift_ringbuilder",
    "2019-02-28 17:15:28,775 ERROR: 18813 -- ERROR configuring tripleo-ui",
    "2019-02-28 17:15:28,775 ERROR: 18813 -- ERROR configuring zaqar"
    ]
}

PLAY RECAP *********************************************************************
undercloud-0 : ok=234 changed=140 unreachable=0 failed=1

Version-Release number of selected component (if applicable):
RHOS_TRUNK-15.0-RHEL-8-20190226.n.2

How reproducible:
100%

Steps to Reproduce:
1. Install undercloud with container images from brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15 tag 20190226.1

Actual results:
TASK [Run docker-puppet tasks (generate config) during step 1] fails

Expected results:
No failure

Additional info:
Attaching log output
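For what it's worth, the "got" digest in each of these failures is the SHA-256 of zero bytes of input, which suggests the local registry is serving empty blobs rather than merely corrupted ones. A quick way to confirm that, assuming sha256sum is available on the undercloud:

$ printf '' | sha256sum
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  -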
This is affecting all containers uploaded to the undercloud (see attached log), so the issue is not isolated to the openstack-nova-placement-api container image.
There was a digest corruption fix that landed upstream on 14-02-2019; can you confirm you have this change in the puddle you're testing? I will also attempt to replicate the issue with this exact registry.
Can you please supply the ContainerImagePrepare parameter value you're deploying with?
(In reply to Steve Baker from comment #3)
> Can you please supply the ContainerImagePrepare parameter value you're
> deploying with?

Hey Steve!

I am using openstack-tripleo-common-10.4.1-0.20190305155644.7897a6e.el8ost.noarch and I see it includes https://review.openstack.org/#/c/636460/

This is the ContainerImagePrepare I'm using:

[stack@undercloud-0 ~]$ cat containers-prepare-parameter.yaml
# Generated with the following on 2019-03-06T10:53:16.974495
#
#   openstack tripleo container image prepare default --output-env-file /home/stack/containers-prepare-parameter.yaml --local-push-destination
#
parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_image: daemon
      ceph_namespace: docker.io/ceph
      ceph_tag: v3.2.1-stable-3.2-luminous-centos-7-x86_64
      name_prefix: openstack-
      name_suffix: ''
      namespace: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15
      neutron_driver: null
      openshift_asb_namespace: docker.io/ansibleplaybookbundle
      openshift_asb_tag: latest
      openshift_cluster_monitoring_image: cluster-monitoring-operator
      openshift_cluster_monitoring_namespace: quay.io/coreos
      openshift_cluster_monitoring_tag: v0.1.1
      openshift_cockpit_image: kubernetes
      openshift_cockpit_namespace: docker.io/cockpit
      openshift_cockpit_tag: latest
      openshift_configmap_reload_image: configmap-reload
      openshift_configmap_reload_namespace: quay.io/coreos
      openshift_configmap_reload_tag: v0.0.1
      openshift_etcd_image: etcd
      openshift_etcd_namespace: registry.fedoraproject.org/latest
      openshift_etcd_tag: latest
      openshift_gluster_block_image: glusterblock-provisioner
      openshift_gluster_image: gluster-centos
      openshift_gluster_namespace: docker.io/gluster
      openshift_gluster_tag: latest
      openshift_grafana_namespace: docker.io/grafana
      openshift_grafana_tag: 5.2.1
      openshift_heketi_image: heketi
      openshift_heketi_namespace: docker.io/heketi
      openshift_heketi_tag: latest
      openshift_kube_rbac_proxy_image: kube-rbac-proxy
      openshift_kube_rbac_proxy_namespace: quay.io/coreos
      openshift_kube_rbac_proxy_tag: v0.3.1
      openshift_kube_state_metrics_image: kube-state-metrics
      openshift_kube_state_metrics_namespace: quay.io/coreos
      openshift_kube_state_metrics_tag: v1.3.1
      openshift_namespace: docker.io/openshift
      openshift_oauth_proxy_tag: v1.1.0
      openshift_prefix: origin
      openshift_prometheus_alertmanager_tag: v0.15.2
      openshift_prometheus_config_reload_image: prometheus-config-reloader
      openshift_prometheus_config_reload_namespace: quay.io/coreos
      openshift_prometheus_config_reload_tag: v0.23.2
      openshift_prometheus_node_exporter_tag: v0.16.0
      openshift_prometheus_operator_image: prometheus-operator
      openshift_prometheus_operator_namespace: quay.io/coreos
      openshift_prometheus_operator_tag: v0.23.2
      openshift_prometheus_tag: v2.3.2
      openshift_tag: v3.11.0
      tag: 20190226.1
It seems that some of the blobs have 0 size:

[root@undercloud-0 stack]# podman pull 192.168.24.1:8787/rhosp15/openstack-zaqar-wsgi:20190226.1
Trying to pull 192.168.24.1:8787/rhosp15/openstack-zaqar-wsgi:20190226.1...Getting image source signatures
Copying blob 657767ef9dcf: 0 B / 67.69 MiB [--------------------------------] 1s
Copying blob 34e50c868cb3: 0 B / 1.11 KiB [---------------------------------] 1s
Copying blob 455b726c0a1a: 38.80 MiB / 38.80 MiB [==========================] 1s
Copying blob 0b1cb91fdaf0: 0 B / 75.19 MiB [--------------------------------] 1s
Copying blob f8deaa2b9a83: 18.44 MiB / 18.44 MiB [==========================] 1s
Copying blob 5c19bb850bed: 0 B / 1.12 KiB [---------------------------------] 1s
Failed
error pulling image "192.168.24.1:8787/rhosp15/openstack-zaqar-wsgi:20190226.1": unable to pull 192.168.24.1:8787/rhosp15/openstack-zaqar-wsgi:20190226.1: unable to pull image: Error reading blob sha256:657767ef9dcfef1a4d3bb5fa82669625ce57ce9d4f7b1a9c65d8166bb27a0c4b: Digest did not match, expected sha256:657767ef9dcfef1a4d3bb5fa82669625ce57ce9d4f7b1a9c65d8166bb27a0c4b, got sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

[root@undercloud-0 stack]# ls -l /var/lib/image-serve/v2/rhosp15/openstack-zaqar-wsgi/blobs/
total 58624
-rw-r--r--. 30 root root        0 Mar  6 11:08 sha256:0b1cb91fdaf06cca4bb029243adbbe690d0538f205fc3ac8ad7d25e3f145be4e.gz
-rw-r--r--. 38 root root        0 Mar  6 11:08 sha256:34e50c868cb3516f770bda83749a1de6bca1815bf398170e7ed1eac4b818b6c7.gz
-rw-r--r--. 38 root root 40687279 Mar  6 11:08 sha256:455b726c0a1afaf4778a7c12f5de43ef2b0ab1748d0ec5e71c817713b841a1df.gz
-rw-r--r--.  1 root root        0 Mar  6 11:14 sha256:5c19bb850bedc7806b7e0153aefe1fee17ec56ecfe146631df22f32bccb16f77.gz
-rw-r--r--. 38 root root        0 Mar  6 11:08 sha256:657767ef9dcfef1a4d3bb5fa82669625ce57ce9d4f7b1a9c65d8166bb27a0c4b.gz
-rw-r--r--.  1 root root     4292 Mar  6 11:14 sha256:9973e6e8af15639cbbb5c51921fa5c360cdc3f24bb868e2c6492dedc2f8b8623
-rw-r--r--.  1 root root 19333041 Mar  6 11:14 sha256:f8deaa2b9a8343dbe3ebbb056e1eb74849af1f8f6f3959944aa5d8f47f82c509.gz
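To see how widespread this is, one way to enumerate every zero-length blob the local image-serve registry is holding (path layout taken from the listing above) would be something like:

[root@undercloud-0 stack]# find /var/lib/image-serve/v2 -type f -name 'sha256:*' -size 0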
Hmm, zero-byte layers are pretty suspicious. If I can reproduce with this registry then I can likely make registry->registry transfers more robust. This other change[1] was an attempt to do that, but maybe more is needed.

[1] https://review.openstack.org/#/c/638266/
I've done many full prepares followed by pulling every image and have yet to see a digest mismatch or a zero-length layer.

As a workaround I'd suggest deleting the image-serve content and doing the deploy again:

rm -rf /var/lib/image-serve/v2/rhosp15 /var/lib/image-serve/v2/ceph

I will propose a change upstream to retry a registry->registry layer transfer if the layer size is zero or the digest doesn't match.
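A more surgical variant of that workaround (an untested sketch; the full rm -rf above remains the safer option) would be to delete only the zero-length blob files and then re-run the deploy so just those layers get transferred again:

find /var/lib/image-serve/v2 -type f -name 'sha256:*' -size 0 -delete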
Removing the blocker flag. /var/lib/image-serve got populated with zero length layers because transfers were failing due to TLS cert validation failures. The workaround is to install the root CA cert. I'll still use this bug to add more validation checks to layer transfers.
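For anyone else hitting this: on RHEL 8, installing a root CA into the system trust store is typically done along these lines (the certificate filename below is just a placeholder for whatever CA signed the source registry's certificate):

sudo cp source-registry-ca.crt /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust extract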
The linked gerrit change will prevent zero-length layer files from being written when an exception is raised while writing out the file.
Verified on RHOS_TRUNK-15.0-RHEL-8-20190509.n.1. The undercloud, with container images from brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15, installed successfully.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:2811
Reported the same issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1779517. Seems like we're hitting this issue again.