Bug 1684301 - During undercloud installation podman pull fails with Digest did not match errors when uploading images to the undercloud registry
Summary: During undercloud installation podman pull fails with Digest did not match errors when uploading images to the undercloud registry
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: Upstream M3
Target Release: 15.0 (Stein)
Assignee: Steve Baker
QA Contact: Victor Voronkov
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-02-28 22:34 UTC by Marius Cornea
Modified: 2019-12-05 12:18 UTC
CC List: 13 users

Fixed In Version: openstack-tripleo-common-10.6.1-0.20190404000356.3398bec.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-21 11:20:27 UTC
Target Upstream Version:
Embargoed:


Attachments
undercloud_install.log (453.29 KB, text/plain)
2019-02-28 22:34 UTC, Marius Cornea


Links
OpenStack gerrit 636460 (MERGED): Handle uncompressed layers on image export - last updated 2021-02-18 12:44:31 UTC
OpenStack gerrit 641856 (MERGED): Improve handling of layer transfer errors - last updated 2021-02-18 12:44:31 UTC
Red Hat Product Errata RHEA-2019:2811 - last updated 2019-09-21 11:20:55 UTC

Description Marius Cornea 2019-02-28 22:34:52 UTC
Created attachment 1539670
undercloud_install.log

Description of problem:

During undercloud installation podman pull fails with Digest did not match errors:

        "unable to pull 192.168.24.1:8787/rhosp15/openstack-nova-placement-api:20190226.1: unable to pull image: Error reading blob sha256:34e50c868cb3516f770bda83749a1de6bca1815bf398170e7ed1eac4b818b6c7: Digest did not match, expected sha256:34e50c868cb3516f770bda83749a1de6bca1815bf398170e7ed1eac4b818b6c7, got sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        "2019-02-28 17:15:22,523 WARNING: 18814 -- Retrying running container: nova_placement",
        "2019-02-28 17:15:25,651 WARNING: 18814 -- ['/usr/bin/podman', 'start', '-a', 'docker-puppet-nova_placement'] run failed after unable to find container docker-puppet-nova_placement: no container with name or ID docker-puppet-nova_placement found: no such container",
        "2019-02-28 17:15:25,652 WARNING: 18814 -- Retrying running container: nova_placement",
        "2019-02-28 17:15:28,772 WARNING: 18814 -- ['/usr/bin/podman', 'start', '-a', 'docker-puppet-nova_placement'] run failed after unable to find container docker-puppet-nova_placement: no container with name or ID docker-puppet-nova_placement found: no such container",
        "2019-02-28 17:15:28,773 WARNING: 18814 -- Retrying running container: nova_placement",
        "2019-02-28 17:15:28,773 ERROR: 18814 -- Failed running container for nova_placement",
        "2019-02-28 17:15:28,773 INFO: 18814 -- Finished processing puppet configs for nova_placement",
        "2019-02-28 17:15:28,773 ERROR: 18813 -- ERROR configuring crond",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring glance_api",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring haproxy",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring heat_api",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring heat_api_cfn",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring heat",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring ironic_api",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring ironic",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring ironic_inspector",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring neutron",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring iscsid",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring keepalived",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring keystone",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring memcached",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring mistral",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring mysql",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring nova",
        "2019-02-28 17:15:28,774 ERROR: 18813 -- ERROR configuring nova_metadata",
        "2019-02-28 17:15:28,775 ERROR: 18813 -- ERROR configuring nova_placement",
        "2019-02-28 17:15:28,775 ERROR: 18813 -- ERROR configuring rabbitmq",
        "2019-02-28 17:15:28,775 ERROR: 18813 -- ERROR configuring swift",
        "2019-02-28 17:15:28,775 ERROR: 18813 -- ERROR configuring swift_ringbuilder",
        "2019-02-28 17:15:28,775 ERROR: 18813 -- ERROR configuring tripleo-ui",
        "2019-02-28 17:15:28,775 ERROR: 18813 -- ERROR configuring zaqar"
    ]
}

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
undercloud-0               : ok=234  changed=140  unreachable=0    failed=1


Version-Release number of selected component (if applicable):
RHOS_TRUNK-15.0-RHEL-8-20190226.n.2

How reproducible:
100%

Steps to Reproduce:
1. Install the undercloud with container images from brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15, tag 20190226.1

Actual results:
TASK [Run docker-puppet tasks (generate config) during step 1] fails

Expected results:
No failure

Additional info:
Attaching log output

Comment 1 Marius Cornea 2019-03-04 22:43:59 UTC
This affects all containers uploaded to the undercloud (see the attached log), so the issue is not isolated to the openstack-nova-placement-api container image.

Comment 2 Steve Baker 2019-03-06 02:46:36 UTC
A digest corruption fix landed upstream on 14-02-2019; can you confirm that this change is in the puddle you're testing?

I will also attempt to replicate the issue with this exact registry.

Comment 3 Steve Baker 2019-03-06 03:07:23 UTC
Can you please supply the ContainerImagePrepare parameter value you're deploying with?

Comment 4 Marius Cornea 2019-03-06 16:22:44 UTC
(In reply to Steve Baker from comment #3)
> Can you please supply the ContainerImagePrepare parameter value you're
> deploying with?

Hey Steve! I am using openstack-tripleo-common-10.4.1-0.20190305155644.7897a6e.el8ost.noarch and I see it includes https://review.openstack.org/#/c/636460/

This is the ContainerImagePrepare I'm using:


[stack@undercloud-0 ~]$ cat containers-prepare-parameter.yaml 
# Generated with the following on 2019-03-06T10:53:16.974495
#
#   openstack tripleo container image prepare default --output-env-file /home/stack/containers-prepare-parameter.yaml --local-push-destination
#

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_image: daemon
      ceph_namespace: docker.io/ceph
      ceph_tag: v3.2.1-stable-3.2-luminous-centos-7-x86_64
      name_prefix: openstack-
      name_suffix: ''
      namespace: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15
      neutron_driver: null
      openshift_asb_namespace: docker.io/ansibleplaybookbundle
      openshift_asb_tag: latest
      openshift_cluster_monitoring_image: cluster-monitoring-operator
      openshift_cluster_monitoring_namespace: quay.io/coreos
      openshift_cluster_monitoring_tag: v0.1.1
      openshift_cockpit_image: kubernetes
      openshift_cockpit_namespace: docker.io/cockpit
      openshift_cockpit_tag: latest
      openshift_configmap_reload_image: configmap-reload
      openshift_configmap_reload_namespace: quay.io/coreos
      openshift_configmap_reload_tag: v0.0.1
      openshift_etcd_image: etcd
      openshift_etcd_namespace: registry.fedoraproject.org/latest
      openshift_etcd_tag: latest
      openshift_gluster_block_image: glusterblock-provisioner
      openshift_gluster_image: gluster-centos
      openshift_gluster_namespace: docker.io/gluster
      openshift_gluster_tag: latest
      openshift_grafana_namespace: docker.io/grafana
      openshift_grafana_tag: 5.2.1
      openshift_heketi_image: heketi
      openshift_heketi_namespace: docker.io/heketi
      openshift_heketi_tag: latest
      openshift_kube_rbac_proxy_image: kube-rbac-proxy
      openshift_kube_rbac_proxy_namespace: quay.io/coreos
      openshift_kube_rbac_proxy_tag: v0.3.1
      openshift_kube_state_metrics_image: kube-state-metrics
      openshift_kube_state_metrics_namespace: quay.io/coreos
      openshift_kube_state_metrics_tag: v1.3.1
      openshift_namespace: docker.io/openshift
      openshift_oauth_proxy_tag: v1.1.0
      openshift_prefix: origin
      openshift_prometheus_alertmanager_tag: v0.15.2
      openshift_prometheus_config_reload_image: prometheus-config-reloader
      openshift_prometheus_config_reload_namespace: quay.io/coreos
      openshift_prometheus_config_reload_tag: v0.23.2
      openshift_prometheus_node_exporter_tag: v0.16.0
      openshift_prometheus_operator_image: prometheus-operator
      openshift_prometheus_operator_namespace: quay.io/coreos
      openshift_prometheus_operator_tag: v0.23.2
      openshift_prometheus_tag: v2.3.2
      openshift_tag: v3.11.0
      tag: 20190226.1

Comment 5 Marius Cornea 2019-03-06 16:27:18 UTC
It seems that some of the blobs have 0 size:

[root@undercloud-0 stack]# podman pull 192.168.24.1:8787/rhosp15/openstack-zaqar-wsgi:20190226.1
Trying to pull 192.168.24.1:8787/rhosp15/openstack-zaqar-wsgi:20190226.1...Getting image source signatures
Copying blob 657767ef9dcf: 0 B / 67.69 MiB [--------------------------------] 1s
Copying blob 34e50c868cb3: 0 B / 1.11 KiB [---------------------------------] 1s
Copying blob 455b726c0a1a: 38.80 MiB / 38.80 MiB [==========================] 1s
Copying blob 0b1cb91fdaf0: 0 B / 75.19 MiB [--------------------------------] 1s
Copying blob f8deaa2b9a83: 18.44 MiB / 18.44 MiB [==========================] 1s
Copying blob 5c19bb850bed: 0 B / 1.12 KiB [---------------------------------] 1s
Failed
error pulling image "192.168.24.1:8787/rhosp15/openstack-zaqar-wsgi:20190226.1": unable to pull 192.168.24.1:8787/rhosp15/openstack-zaqar-wsgi:20190226.1: unable to pull image: Error reading blob sha256:657767ef9dcfef1a4d3bb5fa82669625ce57ce9d4f7b1a9c65d8166bb27a0c4b: Digest did not match, expected sha256:657767ef9dcfef1a4d3bb5fa82669625ce57ce9d4f7b1a9c65d8166bb27a0c4b, got sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

[root@undercloud-0 stack]# ls -l /var/lib/image-serve/v2/rhosp15/openstack-zaqar-wsgi/blobs/
total 58624
-rw-r--r--. 30 root root        0 Mar  6 11:08 sha256:0b1cb91fdaf06cca4bb029243adbbe690d0538f205fc3ac8ad7d25e3f145be4e.gz
-rw-r--r--. 38 root root        0 Mar  6 11:08 sha256:34e50c868cb3516f770bda83749a1de6bca1815bf398170e7ed1eac4b818b6c7.gz
-rw-r--r--. 38 root root 40687279 Mar  6 11:08 sha256:455b726c0a1afaf4778a7c12f5de43ef2b0ab1748d0ec5e71c817713b841a1df.gz
-rw-r--r--.  1 root root        0 Mar  6 11:14 sha256:5c19bb850bedc7806b7e0153aefe1fee17ec56ecfe146631df22f32bccb16f77.gz
-rw-r--r--. 38 root root        0 Mar  6 11:08 sha256:657767ef9dcfef1a4d3bb5fa82669625ce57ce9d4f7b1a9c65d8166bb27a0c4b.gz
-rw-r--r--.  1 root root     4292 Mar  6 11:14 sha256:9973e6e8af15639cbbb5c51921fa5c360cdc3f24bb868e2c6492dedc2f8b8623
-rw-r--r--.  1 root root 19333041 Mar  6 11:14 sha256:f8deaa2b9a8343dbe3ebbb056e1eb74849af1f8f6f3959944aa5d8f47f82c509.gz
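
A quick way to check how widespread the empty blobs are (assuming the same /var/lib/image-serve layout as above) would be something like:

[root@undercloud-0 stack]# find /var/lib/image-serve/v2 -type f -name 'sha256:*' -size 0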

Comment 6 Steve Baker 2019-03-06 20:07:33 UTC
Hmm, zero-byte layers are pretty suspicious. If I can reproduce with this registry then I can likely make registry->registry transfers more robust. This other change[1] was an attempt to do that, but maybe more is needed.

[1] https://review.openstack.org/#/c/638266/

Comment 7 Steve Baker 2019-03-06 22:39:03 UTC
I've done many full prepares followed by pulling every image and have yet to see a digest mismatch or a zero length layer.

As a workaround I'd suggest deleting the image-serve content and doing the deploy again:

  rm -rf /var/lib/image-serve/v2/rhosp15 /var/lib/image-serve/v2/ceph

I will propose a change upstream to retry a registry->registry layer transfer when the layer size is zero or the digest does not match.
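
For completeness, the full recovery sequence would be something like the following, assuming the undercloud is then redeployed with the standard install command:

  [stack@undercloud-0 ~]$ sudo rm -rf /var/lib/image-serve/v2/rhosp15 /var/lib/image-serve/v2/ceph
  [stack@undercloud-0 ~]$ openstack undercloud install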

Comment 9 Steve Baker 2019-03-07 20:30:59 UTC
Removing the blocker flag. /var/lib/image-serve got populated with zero length layers because transfers were failing due to TLS cert validation failures. The workaround is to install the root CA cert.

I'll still use this bug to add more validation checks to layer transfers.
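
For reference, on RHEL 8 installing the root CA cert into the system trust store would be something like the following (the certificate file name here is only a placeholder for whatever CA signed the registry's certificate):

  [root@undercloud-0 ~]# cp undercloud-registry-ca.crt /etc/pki/ca-trust/source/anchors/
  [root@undercloud-0 ~]# update-ca-trust extract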

Comment 10 Steve Baker 2019-03-07 23:09:13 UTC
The linked gerrit change prevents zero-length layer files from being written when an exception is raised while writing out the file.

Comment 14 Victor Voronkov 2019-05-15 14:18:49 UTC
Verified on RHOS_TRUNK-15.0-RHEL-8-20190509.n.1.
The undercloud installed successfully with container images from brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15.

Comment 17 errata-xmlrpc 2019-09-21 11:20:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811

Comment 18 Salman Khan 2019-12-05 12:18:45 UTC
I reported the same issue here:

https://bugzilla.redhat.com/show_bug.cgi?id=1779517

It seems like we are hitting this issue again.

