Description of problem:

When performing an overcloud deployment, the compute nodes fail to download container images, reporting "Filesystem layer verification failed for digest...". The specific digest is not consistent: a manual "docker pull" on the node repeats the same error but reports a different digest, and the same "docker pull" from the other compute node also fails with the same error and yet another digest.

Version-Release number of selected component (if applicable): 13

How reproducible: Very, thus far

Steps to Reproduce:
1. Install director on the undercloud node
2. Download images from registry.access.redhat.com to the director node using the openstack commands (a sketch of this workflow is at the end of this comment)
3. Attempt to deploy a basic overcloud using only the node-info.yaml and overcloud_images.yaml environment files

Actual results: The deployment fails with an ansible error when the overcloud nodes attempt to download container images from the undercloud node.

Expected results: Successful deployment

Additional info:

Output from 'openstack stack failures list overcloud':

overcloud.AllNodesDeploySteps.ComputeDeployment_Step1.1:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 55899e41-a652-4d4b-9330-d4e391a85648
  status: CREATE_FAILED
  status_reason: |
    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    "2018-11-26 19:50:22,717 ERROR: 18830 -- ERROR configuring crond",
    "2018-11-26 19:50:22,718 ERROR: 18830 -- ERROR configuring neutron",
    "2018-11-26 19:50:22,718 ERROR: 18830 -- ERROR configuring iscsid"
    ]
    }
    to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/71a99174-f23e-4b41-a9ed-aa34174e33e8_playbook.retry

    PLAY RECAP *********************************************************************
    localhost                  : ok=24   changed=12   unreachable=0    failed=1

    (truncated, view all with --long)
  deploy_stderr: |

overcloud.AllNodesDeploySteps.ComputeDeployment_Step1.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: d6ac863b-4f95-45fd-a67e-e51741330299
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    "2018-11-26 19:50:10,675 ERROR: 18780 -- ERROR configuring crond",
    "2018-11-26 19:50:10,675 ERROR: 18780 -- ERROR configuring neutron",
    "2018-11-26 19:50:10,676 ERROR: 18780 -- ERROR configuring iscsid"
    ]
    }
    to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/dc0c2f10-5309-4139-9fea-edf0815d9e4f_playbook.retry

    PLAY RECAP *********************************************************************
    localhost                  : ok=24   changed=12   unreachable=0    failed=1

    (truncated, view all with --long)
  deploy_stderr: |

overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: b0a43f9c-89e7-4c84-aced-e50587a75277
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
"2018-11-26 20:18:38,280 ERROR: 24672 -- ERROR configuring heat", "2018-11-26 20:18:38,299 ERROR: 24672 -- ERROR configuring cinder", "2018-11-26 20:18:38,300 ERROR: 24672 -- ERROR configuring heat_api_cfn" ] } to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/f3aab16d-d227-4fef-b462-adbc3215f79c_playbook.retry PLAY RECAP ********************************************************************* localhost : ok=25 changed=13 unreachable=0 failed=1 (truncated, view all with --long) deploy_stderr: | <<<<<<<<<<< End >>>>>>>>>>>>>>>>>>> output from "docker pull" on one of the compute nodes [root@overcloud-compute-1 ~]# docker pull 172.16.0.1:8787/rhosp13/openstack-nova-compute:13.0-70 Trying to pull repository 172.16.0.1:8787/rhosp13/openstack-nova-compute ... 13.0-70: Pulling from 172.16.0.1:8787/rhosp13/openstack-nova-compute 9a1bea865f79: Verifying Checksum 602125c154e3: Download complete b7519b93f3ea: Verifying Checksum 5e9937699472: Download complete 4bf83a9d745a: Download complete e72d3835b70d: Download complete filesystem layer verification failed for digest sha256:b7519b93f3eaa062e38e1b5d2f2da0f1331e3d4619d27a75faa65bf2fff2374a <<<<<<<<<<<<<<<<<< End >>>>>>>>>>>>>>>>> output from a docker pull on the other compute node Trying to pull repository 172.16.0.1:8787/rhosp13/openstack-nova-compute ... 13.0-70: Pulling from 172.16.0.1:8787/rhosp13/openstack-nova-compute 9a1bea865f79: Verifying Checksum 602125c154e3: Download complete b7519b93f3ea: Verifying Checksum 5e9937699472: Download complete 4bf83a9d745a: Verifying Checksum e72d3835b70d: Verifying Checksum filesystem layer verification failed for digest sha256:e72d3835b70d29047e06f48f5d4e63835da374999fa3f13176bd861ee0817499 <<<<<<<<<<<<<<<< End >>>>>>>>>>>>> I will attach the output from an 'openstack stack failures list overcloud --long' as well as the overcloud_images.yaml and other relevant files.
Created attachment 1508677: overcloud failures --long
Created attachment 1508678: overcloud images template file
Doing a docker pull using the public Red Hat registry results in the same error:

[root@overcloud-compute-1 ~]# docker pull registry.access.redhat.com/rhosp13/openstack-nova-compute:13.0-70
Trying to pull repository registry.access.redhat.com/rhosp13/openstack-nova-compute ...
13.0-70: Pulling from registry.access.redhat.com/rhosp13/openstack-nova-compute
9a1bea865f79: Extracting [==========================>                        ] 40.11 MB/75.72 MB
602125c154e3: Download complete
b7519b93f3ea: Verifying Checksum
5e9937699472: Download complete
4bf83a9d745a: Verifying Checksum
e72d3835b70d: Verifying Checksum
filesystem layer verification failed for digest sha256:e72d3835b70d29047e06f48f5d4e63835da374999fa3f13176bd861ee0817499
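Since the pull fails against both the undercloud registry and the public registry, and the failing digest is unstable, the corruption looks node-side. One way to confirm that (a sketch, not something run as part of this report) is to fetch a failing layer blob directly over the registry v2 API and hash it; the sum should match the digest from the error, and a value that differs or changes between runs points at the node rather than at the registry content:

# Fetch the layer that failed verification straight from the undercloud
# registry and checksum it. Repeat a few times; an unstable sha256sum means
# the bytes are being corrupted on (or on the way to) this node.
curl -s http://172.16.0.1:8787/v2/rhosp13/openstack-nova-compute/blobs/sha256:e72d3835b70d29047e06f48f5d4e63835da374999fa3f13176bd861ee0817499 | sha256sum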
Doing a docker pull from docker.io, however, seems to work:

[root@overcloud-compute-1 log]# docker pull centos:7
Trying to pull repository registry.access.redhat.com/centos ...
Trying to pull repository docker.io/library/centos ...
7: Pulling from docker.io/library/centos
aeb7866da422: Pull complete
Digest: sha256:67dad89757a55bfdfabec8abd0e22f8c7c12a1856514726470228063ed86593b
Status: Downloaded newer image for docker.io/centos:7

[root@overcloud-compute-1 log]# docker image ls
REPOSITORY          TAG          IMAGE ID        CREATED        SIZE
docker.io/centos    7            75835a67d134    6 weeks ago    200 MB
Thanks to another architect on the team I work on, we tracked this down. The bug is actually a Ravello bug: when a host is assigned 16GB of RAM, the memory is not enumerated correctly to the operating system, and in this case at least that led to corruption of the filesystem layers during download. Once we changed the compute nodes to 12GB of RAM, the docker pull worked as expected. I will close this bugzilla out as NOTABUG since there does not seem to be any problem outside of Ravello.
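For anyone chasing similar symptoms, a quick way to check for this kind of memory problem on an affected node (a sketch, assuming the memtester package is available, e.g. from EPEL) is:

# Compare the memory the guest OS actually sees against what was assigned:
free -g
dmidecode --type memory | grep -i size

# Lock and exercise 2GB of RAM for 3 passes; any reported failures point at
# bad or mis-enumerated memory, consistent with the layer corruption above.
yum install -y memtester
memtester 2G 3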