Bug 1653445 - docker pull during overcloud deployment fails with "filesystem layer verification failed"
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: RHOS Maint
QA Contact: Gurenko Alex
URL:
Whiteboard: Triaged, ZStream
Depends On:
Blocks:
 
Reported: 2018-11-26 21:36 UTC by Thomas Crowe
Modified: 2018-11-27 23:07 UTC (History)
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-27 23:07:06 UTC
Target Upstream Version:


Attachments
overcloud failures --long (525.46 KB, text/plain)
2018-11-26 21:43 UTC, Thomas Crowe
overcloud images template file (6.07 KB, text/plain)
2018-11-26 21:45 UTC, Thomas Crowe

Description Thomas Crowe 2018-11-26 21:36:53 UTC
Description of problem:
During an overcloud deployment, the compute nodes fail to download container images and report "filesystem layer verification failed for digest ...". The digest named in the error is not consistent between attempts. A manual "docker pull" on the node fails with the same error but reports a different digest, and a "docker pull" from the other compute node also fails with the same error and a different digest.

Version-Release number of selected component (if applicable): 13


How reproducible: Very, thus far


Steps to Reproduce:
1. Install director on the undercloud node
2. Download images from registry.access.redhat.com to the director node using openstack commands
3. Attempt to deploy a basic overcloud using only node-info.yaml and overcloud_images.yaml environment files.

Actual results:

Deployment fails with an Ansible error when attempting to download container images from the undercloud node.

Expected results:

Successful deployment

Additional info:

Output from 'openstack stack failures list overcloud':

overcloud.AllNodesDeploySteps.ComputeDeployment_Step1.1:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 55899e41-a652-4d4b-9330-d4e391a85648
  status: CREATE_FAILED
  status_reason: |
    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
            "2018-11-26 19:50:22,717 ERROR: 18830 -- ERROR configuring crond", 
            "2018-11-26 19:50:22,718 ERROR: 18830 -- ERROR configuring neutron", 
            "2018-11-26 19:50:22,718 ERROR: 18830 -- ERROR configuring iscsid"
        ]
    }
        to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/71a99174-f23e-4b41-a9ed-aa34174e33e8_playbook.retry
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=24   changed=12   unreachable=0    failed=1   
    
    (truncated, view all with --long)
  deploy_stderr: |

overcloud.AllNodesDeploySteps.ComputeDeployment_Step1.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: d6ac863b-4f95-45fd-a67e-e51741330299
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
            "2018-11-26 19:50:10,675 ERROR: 18780 -- ERROR configuring crond", 
            "2018-11-26 19:50:10,675 ERROR: 18780 -- ERROR configuring neutron", 
            "2018-11-26 19:50:10,676 ERROR: 18780 -- ERROR configuring iscsid"
        ]
    }
        to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/dc0c2f10-5309-4139-9fea-edf0815d9e4f_playbook.retry
    

    PLAY RECAP *********************************************************************
    localhost                  : ok=24   changed=12   unreachable=0    failed=1   
    
    (truncated, view all with --long)
  deploy_stderr: |

overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: b0a43f9c-89e7-4c84-aced-e50587a75277
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
            "2018-11-26 20:18:38,280 ERROR: 24672 -- ERROR configuring heat", 
            "2018-11-26 20:18:38,299 ERROR: 24672 -- ERROR configuring cinder", 
            "2018-11-26 20:18:38,300 ERROR: 24672 -- ERROR configuring heat_api_cfn"
        ]
    }
        to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/f3aab16d-d227-4fef-b462-adbc3215f79c_playbook.retry
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=25   changed=13   unreachable=0    failed=1   
    
    (truncated, view all with --long)
  deploy_stderr: |
<<<<<<<<<<< End >>>>>>>>>>>>>>>>>>>

output from "docker pull" on one of the compute nodes
[root@overcloud-compute-1 ~]# docker pull 172.16.0.1:8787/rhosp13/openstack-nova-compute:13.0-70
Trying to pull repository 172.16.0.1:8787/rhosp13/openstack-nova-compute ... 
13.0-70: Pulling from 172.16.0.1:8787/rhosp13/openstack-nova-compute
9a1bea865f79: Verifying Checksum 
602125c154e3: Download complete 
b7519b93f3ea: Verifying Checksum 
5e9937699472: Download complete 
4bf83a9d745a: Download complete 
e72d3835b70d: Download complete 
filesystem layer verification failed for digest sha256:b7519b93f3eaa062e38e1b5d2f2da0f1331e3d4619d27a75faa65bf2fff2374a
<<<<<<<<<<<<<<<<<< End >>>>>>>>>>>>>>>>>

output from a docker pull on the other compute node
Trying to pull repository 172.16.0.1:8787/rhosp13/openstack-nova-compute ... 
13.0-70: Pulling from 172.16.0.1:8787/rhosp13/openstack-nova-compute
9a1bea865f79: Verifying Checksum 
602125c154e3: Download complete 
b7519b93f3ea: Verifying Checksum 
5e9937699472: Download complete 
4bf83a9d745a: Verifying Checksum 
e72d3835b70d: Verifying Checksum 
filesystem layer verification failed for digest sha256:e72d3835b70d29047e06f48f5d4e63835da374999fa3f13176bd861ee0817499
<<<<<<<<<<<<<<<< End >>>>>>>>>>>>>

I will attach the output from an 'openstack stack failures list overcloud --long' as well as the overcloud_images.yaml and other relevant files.
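
For anyone triaging a similar failure, the digest check that "docker pull" reports as failing can be reproduced by hand. The following is a minimal sketch, not Docker's own implementation. It assumes a layer blob has already been saved to a local file (how it was fetched is outside this sketch), and the function names are hypothetical. A layer digest is the SHA-256 of the compressed blob as served by the registry, so hashing the saved file and comparing it to the digest in the error message shows whether the bytes on the node match what the registry advertised. Hashing the same file several times is an extra, assumed heuristic for spotting a host that returns inconsistent data.

```python
import hashlib


def sha256_digest(path, chunk_size=1024 * 1024):
    """Return the 'sha256:<hex>' digest of the file at path, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def verify_layer(path, expected_digest):
    """True if the file's digest equals the digest the registry advertised."""
    return sha256_digest(path) == expected_digest


def digests_are_stable(path, rounds=3):
    """Hash the same file several times; every round should give the same value."""
    return len({sha256_digest(path) for _ in range(rounds)}) == 1
```

If verify_layer fails on the node but succeeds for the same blob on another machine, or digests_are_stable returns False, the corruption is more likely on the node than in the registry.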

Comment 1 Thomas Crowe 2018-11-26 21:43:33 UTC
Created attachment 1508677
overcloud failures --long

Comment 2 Thomas Crowe 2018-11-26 21:45:05 UTC
Created attachment 1508678
overcloud images template file

Comment 3 Thomas Crowe 2018-11-26 22:19:06 UTC
Doing a docker pull using the public Red Hat registry results in the same error:

[root@overcloud-compute-1 ~]# docker pull registry.access.redhat.com/rhosp13/openstack-nova-compute:13.0-70
Trying to pull repository registry.access.redhat.com/rhosp13/openstack-nova-compute ... 
13.0-70: Pulling from registry.access.redhat.com/rhosp13/openstack-nova-compute
9a1bea865f79: Extracting [==========================>                        ] 40.11 MB/75.72 MB
602125c154e3: Download complete 
b7519b93f3ea: Verifying Checksum 
5e9937699472: Download complete 
4bf83a9d745a: Verifying Checksum 
e72d3835b70d: Verifying Checksum 
filesystem layer verification failed for digest sha256:e72d3835b70d29047e06f48f5d4e63835da374999fa3f13176bd861ee0817499

Comment 4 Thomas Crowe 2018-11-26 22:41:14 UTC
A docker pull from docker.io seems to work:

[root@overcloud-compute-1 log]# docker pull centos:7
Trying to pull repository registry.access.redhat.com/centos ... 
Trying to pull repository docker.io/library/centos ... 
7: Pulling from docker.io/library/centos
aeb7866da422: Pull complete 
Digest: sha256:67dad89757a55bfdfabec8abd0e22f8c7c12a1856514726470228063ed86593b
Status: Downloaded newer image for docker.io/centos:7
[root@overcloud-compute-1 log]# 
[root@overcloud-compute-1 log]# docker image ls
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
docker.io/centos    7                   75835a67d134        6 weeks ago         200 MB
[root@overcloud-compute-1 log]#

Comment 9 Thomas Crowe 2018-11-27 23:07:06 UTC
Thanks to another architect on my team, we tracked this down.  This is actually a Ravello bug.  When a host in Ravello is assigned 16GB of RAM, the memory does not seem to be enumerated correctly to the operating system, which, in this case at least, led to corruption in the filesystem layers during download.

Once we changed the compute nodes to have 12GB of RAM, the docker pull worked as expected.

I will close this bugzilla out as "NOTABUG" since there does not seem to be any problem outside of Ravello.
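
A rough way to compare the RAM assigned to a host with what the operating system reports is sketched below. This is only an illustration under stated assumptions: the bug does not say how the mis-enumeration showed up, so checking MemTotal in /proc/meminfo may not reveal it. The function names and the 10% tolerance are hypothetical. The kernel reserves some memory, so MemTotal is normally somewhat below the assigned size.

```python
def parse_mem_total_kib(meminfo_text):
    """Extract the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def memory_matches(meminfo_text, assigned_gib, tolerance=0.10):
    """True if the reported memory is within tolerance of the assigned size.

    tolerance is an assumed fraction of assigned_gib, not a documented threshold.
    """
    seen_gib = parse_mem_total_kib(meminfo_text) / (1024 * 1024)
    return abs(seen_gib - assigned_gib) <= tolerance * assigned_gib
```

On a node, the text would come from reading /proc/meminfo, for example memory_matches(open("/proc/meminfo").read(), 16) for a host assigned 16GB.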

