Bug 1845736

Summary: overcloud deploy fails at step 2 when file driver + NFS share is used in Gnocchi
Product: Red Hat OpenStack
Reporter: Takashi Kajinami <tkajinam>
Component: openstack-tripleo-heat-templates
Assignee: Matthias Runge <mrunge>
Status: CLOSED ERRATA
QA Contact: David Rosenfeld <drosenfe>
Severity: high
Docs Contact:
Priority: medium
Version: 16.0 (Train)
CC: cjeanner, jbeaudoi, lmadsen, lnatapov, mburns
Target Milestone: z2
Keywords: Triaged, ZStream
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20200818063410.8f2a74e.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-28 15:37:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Takashi Kajinami 2020-06-09 22:47:58 UTC
Description of problem:

When the file storage driver and an NFS share are used in Gnocchi, the overcloud deployment fails at step 2.

2020-06-01 12:55:41,020 p=512345 u=mistral |  TASK [Wait for containers to start for step 2 using paunch] ********************
2020-06-01 12:55:41,020 p=512345 u=mistral |  task path: /var/lib/mistral/overcloud/common_deploy_steps_tasks.yaml:174
2020-06-01 12:55:41,021 p=512345 u=mistral |  Friday 01 June 2020  11:55:41 +0900 (0:00:01.244)       0:22:28.047 *********** 
...
2020-06-01 12:59:26,028 p=512345 u=mistral |  fatal: [controller-1]: FAILED! => {"ansible_job_id": "26087583737.58046", "attempts": 71, "changed": false, "finished": 1, "msg": "Paunch failed with config_id tripleo_step2", "rc": 126, ...
2020-06-01 12:59:26,204 p=512345 u=mistral |  fatal: [controller-2]: FAILED! => {"ansible_job_id": "957944797099.57990", "attempts": 71, "changed": false, "finished": 1, "msg": "Paunch failed with config_id tripleo_step2", "rc": 126, 
2020-06-01 13:03:24,083 p=512345 u=mistral |  fatal: [controller-0]: FAILED! => {"ansible_job_id": "857975676694.58634", "attempts": 145, "changed": false, "finished": 1, "msg": "Paunch failed with config_id tripleo_step2", "rc": 126 ...
2020-06-01 13:03:24,095 p=512345 u=mistral |  NO MORE HOSTS LEFT *************************************************************
2020-06-01 13:03:24,096 p=512345 u=mistral |  PLAY RECAP *********************************************************************
2020-06-01 13:03:24,096 p=512345 u=mistral |  undercloud                 : ok=19   changed=8    unreachable=0    failed=0    skipped=19   rescued=0    ignored=0   
2020-06-01 13:03:24,096 p=512345 u=mistral |  compute-0                  : ok=215  changed=124  unreachable=0    failed=0    skipped=89   rescued=0    ignored=0   
2020-06-01 13:03:24,097 p=512345 u=mistral |  compute-1                  : ok=211  changed=124  unreachable=0    failed=0    skipped=89   rescued=0    ignored=0   
2020-06-01 13:03:24,097 p=512345 u=mistral |  controller-0               : ok=277  changed=167  unreachable=0    failed=1    skipped=116  rescued=0    ignored=0   
2020-06-01 13:03:24,097 p=512345 u=mistral |  controller-1               : ok=269  changed=167  unreachable=0    failed=1    skipped=116  rescued=0    ignored=0   
2020-06-01 13:03:24,097 p=512345 u=mistral |  controller-2               : ok=269  changed=167  unreachable=0    failed=1    skipped=116  rescued=0    ignored=0   
2020-06-01 13:03:24,097 p=512345 u=mistral |  Friday 01 June 2020  16:03:24 +0900 (0:07:43.076)       0:30:11.123 *********** 
2020-06-01 13:03:24,098 p=512345 u=mistral |  =============================================================================== 

According to paunch.log on the controller nodes, the failure comes from the gnocchi_init_lib container:
it tried to relabel /var/lib/gnocchi, but the relabel failed because the NFS share does not support that operation.

paunch.log
~~~
2020-06-01 11:55:44.268 58639 ERROR paunch [  ] Error running ['podman', 'run', '--name', 'gnocchi_init_lib', '--label', 'config_id=tripleo_step2', '--label', 'container_name=gnocchi_init_lib', '--label', 'managed_by=tripleo-Controller', '--label', 'config_data={"command": ["/bin/bash", "-c", "chown -R gnocchi:gnocchi /var/lib/gnocchi"], "image": "192.168.24.1:8787/rhosp-rhel8/openstack-gnocchi-api:16.0-96", "net": "none", "user": "root", "volumes": ["/var/lib/gnocchi:/var/lib/gnocchi:shared,z"]}', '--conmon-pidfile=/var/run/gnocchi_init_lib.pid', '--detach=true', '--log-driver', 'k8s-file', '--log-opt', 'path=/var/log/containers/stdouts/gnocchi_init_lib.log', '--net=none', '--user=root', '--volume=/var/lib/gnocchi:/var/lib/gnocchi:shared,z', '--cpuset-cpus=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15', '192.168.24.1:8787/rhosp-rhel8/openstack-gnocchi-api:16.0-96', '/bin/bash', '-c', 'chown -R gnocchi:gnocchi /var/lib/gnocchi']. [126]

2020-06-01 11:55:44.269 58639 ERROR paunch [  ] stdout: 
2020-06-01 11:55:44.269 58639 ERROR paunch [  ] stderr: Error: relabel failed "/var/lib/gnocchi": operation not supported
~~~
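
To make the cause easier to spot in a long podman command line, here is a minimal Python sketch that lists the volume mounts requesting an SELinux relabel (the `z` or `Z` option). The function name `relabel_mounts` is hypothetical and is not part of paunch or tripleo; the argument list is shortened from the log above.

```python
def relabel_mounts(podman_args):
    """Return the --volume specs that ask podman for an SELinux relabel."""
    flagged = []
    for arg in podman_args:
        if not arg.startswith("--volume="):
            continue
        spec = arg[len("--volume="):]
        # A volume spec is SRC:DEST[:OPTIONS]; options are comma-separated.
        parts = spec.split(":")
        options = parts[2].split(",") if len(parts) > 2 else []
        if "z" in options or "Z" in options:
            flagged.append(spec)
    return flagged


# Shortened from the paunch.log entry for gnocchi_init_lib.
args = [
    "--net=none",
    "--user=root",
    "--volume=/var/lib/gnocchi:/var/lib/gnocchi:shared,z",
]
print(relabel_mounts(args))
```

Any mount reported here will fail in the same way if its source directory sits on a filesystem that does not support relabeling, such as the NFS share in this report.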

We already have fixes for a similar issue in nova[1] and glance[2], and we need the same fix for gnocchi
to resolve the error.
 [1] https://github.com/openstack/tripleo-heat-templates/commit/b56c521e01d0a4b42f44f2d9d03f524a4dc60475
 [2] https://github.com/openstack/tripleo-heat-templates/commit/aa1f4bf62156fa5e72b8171702acf3db755a67d8
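
The general idea of such a fix can be sketched in Python as follows: request the relabel only when the directory is not backed by NFS. This is an illustration under assumptions, not the content of the commits above; the function `gnocchi_volume` and its `nfs_enabled` flag are hypothetical names, and the real change would live in the heat templates.

```python
def gnocchi_volume(nfs_enabled, path="/var/lib/gnocchi"):
    """Build the volume spec for the Gnocchi data directory.

    The 'z' option asks podman to relabel the directory for SELinux,
    which fails on NFS, so it is added only for local storage.
    """
    options = ["shared"]
    if not nfs_enabled:
        options.append("z")
    return "{0}:{0}:{1}".format(path, ",".join(options))


print(gnocchi_volume(nfs_enabled=False))
print(gnocchi_volume(nfs_enabled=True))
```

With the flag off, the spec matches the one in the failing command; with the flag on, the `z` option is dropped and podman does not attempt the relabel.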

Note that TripleO does not currently support the file driver + NFS share combination in Gnocchi.
To achieve this deployment, the NFS share has to be configured separately in ExtraConfigPre.


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Deploy the overcloud with the file driver + an NFS share for Gnocchi

Actual results:
overcloud deploy fails at step 2

Expected results:
overcloud deploy completes without any failures 


Additional info:

Comment 12 errata-xmlrpc 2020-10-28 15:37:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4284