Bug 1845736 - overcloud deploy fails at step 2 when file driver + NFS share is used in Gnocchi
Summary: overcloud deploy fails at step 2 when file driver + NFS share is used in Gnocchi
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: z2
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Matthias Runge
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-09 22:47 UTC by Takashi Kajinami
Modified: 2023-12-15 18:07 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20200818063410.8f2a74e.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-28 15:37:36 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 734009 0 None MERGED Add support for Gnocchi NFS Backend 2020-11-23 11:26:47 UTC
Red Hat Issue Tracker OSP-29465 0 None None None 2023-10-06 20:32:12 UTC
Red Hat Product Errata RHEA-2020:4284 0 None None None 2020-10-28 15:37:55 UTC

Description Takashi Kajinami 2020-06-09 22:47:58 UTC
Description of problem:

When the file storage driver and an NFS share are used in Gnocchi, the overcloud deploy fails at step 2.

2020-06-01 12:55:41,020 p=512345 u=mistral |  TASK [Wait for containers to start for step 2 using paunch] ********************
2020-06-01 12:55:41,020 p=512345 u=mistral |  task path: /var/lib/mistral/overcloud/common_deploy_steps_tasks.yaml:174
2020-06-01 12:55:41,021 p=512345 u=mistral |  Friday 01 June 2020  11:55:41 +0900 (0:00:01.244)       0:22:28.047 *********** 
...
2020-06-01 12:59:26,028 p=512345 u=mistral |  fatal: [controller-1]: FAILED! => {"ansible_job_id": "26087583737.58046", "attempts": 71, "changed": false, "finished": 1, "msg": "Paunch failed with config_id tripleo_step2", "rc": 126, ...
2020-06-01 12:59:26,204 p=512345 u=mistral |  fatal: [controller-2]: FAILED! => {"ansible_job_id": "957944797099.57990", "attempts": 71, "changed": false, "finished": 1, "msg": "Paunch failed with config_id tripleo_step2", "rc": 126, 
2020-06-01 13:03:24,083 p=512345 u=mistral |  fatal: [controller-0]: FAILED! => {"ansible_job_id": "857975676694.58634", "attempts": 145, "changed": false, "finished": 1, "msg": "Paunch failed with config_id tripleo_step2", "rc": 126 ...
2020-06-01 13:03:24,095 p=512345 u=mistral |  NO MORE HOSTS LEFT *************************************************************
2020-06-01 13:03:24,096 p=512345 u=mistral |  PLAY RECAP *********************************************************************
2020-06-01 13:03:24,096 p=512345 u=mistral |  undercloud                 : ok=19   changed=8    unreachable=0    failed=0    skipped=19   rescued=0    ignored=0   
2020-06-01 13:03:24,096 p=512345 u=mistral |  compute-0                  : ok=215  changed=124  unreachable=0    failed=0    skipped=89   rescued=0    ignored=0   
2020-06-01 13:03:24,097 p=512345 u=mistral |  compute-1                  : ok=211  changed=124  unreachable=0    failed=0    skipped=89   rescued=0    ignored=0   
2020-06-01 13:03:24,097 p=512345 u=mistral |  controller-0               : ok=277  changed=167  unreachable=0    failed=1    skipped=116  rescued=0    ignored=0   
2020-06-01 13:03:24,097 p=512345 u=mistral |  controller-1               : ok=269  changed=167  unreachable=0    failed=1    skipped=116  rescued=0    ignored=0   
2020-06-01 13:03:24,097 p=512345 u=mistral |  controller-2               : ok=269  changed=167  unreachable=0    failed=1    skipped=116  rescued=0    ignored=0   
2020-06-01 13:03:24,097 p=512345 u=mistral |  Friday 01 June 2020  16:03:24 +0900 (0:07:43.076)       0:30:11.123 *********** 
2020-06-01 13:03:24,098 p=512345 u=mistral |  =============================================================================== 
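
For triage, the failing nodes can be pulled out of the PLAY RECAP programmatically. The following is a minimal sketch, assuming the recap lines keep the exact layout shown in the log above; the names RECAP_RE and failed_hosts are hypothetical helpers and not part of any TripleO tooling.

```python
import re

# Matches a PLAY RECAP line as it appears in the ansible log above, e.g.
# "... u=mistral |  controller-0   : ok=277  changed=167  unreachable=0  failed=1 ..."
RECAP_RE = re.compile(
    r"\|\s+(?P<host>\S+)\s+:\s+ok=\d+\s+changed=\d+"
    r"\s+unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)"
)


def failed_hosts(log_text):
    """Return the hosts whose recap line reports failed or unreachable tasks."""
    hosts = []
    for line in log_text.splitlines():
        match = RECAP_RE.search(line)
        if match is None:
            continue  # not a recap line (task output, separators, ...)
        if int(match.group("failed")) > 0 or int(match.group("unreachable")) > 0:
            hosts.append(match.group("host"))
    return hosts
```

Applied to the recap above, this would list the three controller nodes and skip the undercloud and compute nodes.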

According to paunch.log on the controller nodes, the failure comes from the gnocchi_init_lib container:
it tried to relabel /var/lib/gnocchi, but failed because the NFS share doesn't support that operation.

paunch.log
~~~
2020-06-01 11:55:44.268 58639 ERROR paunch [  ] Error running ['podman', 'run', '--name', 'gnocchi_init_lib', '--label', 'config_id=tripleo_step2', '--label', 'container_name=gnocchi_init_lib', '--label', 'managed_by=tripleo-Controller', '--label', 'config_data={"command": ["/bin/bash", "-c", "chown -R gnocchi:gnocchi /var/lib/gnocchi"], "image": "192.168.24.1:8787/rhosp-rhel8/openstack-gnocchi-api:16.0-96", "net": "none", "user": "root", "volumes": ["/var/lib/gnocchi:/var/lib/gnocchi:shared,z"]}', '--conmon-pidfile=/var/run/gnocchi_init_lib.pid', '--detach=true', '--log-driver', 'k8s-file', '--log-opt', 'path=/var/log/containers/stdouts/gnocchi_init_lib.log', '--net=none', '--user=root', '--volume=/var/lib/gnocchi:/var/lib/gnocchi:shared,z', '--cpuset-cpus=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15', '192.168.24.1:8787/rhosp-rhel8/openstack-gnocchi-api:16.0-96', '/bin/bash', '-c', 'chown -R gnocchi:gnocchi /var/lib/gnocchi']. [126]

2020-06-01 11:55:44.269 58639 ERROR paunch [  ] stdout: 
2020-06-01 11:55:44.269 58639 ERROR paunch [  ] stderr: Error: relabel failed "/var/lib/gnocchi": operation not supported
~~~
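
The relabel is requested by the `z` option at the end of the volume spec `/var/lib/gnocchi:/var/lib/gnocchi:shared,z`, which asks podman to apply an SELinux label to the host path. A minimal sketch of the idea behind a fix, assuming the deployment knows whether the path is backed by NFS: drop the relabel option in that case and keep it otherwise. The function names and the nfs_enabled flag are hypothetical and only illustrate the logic; the real change lives in the heat templates, not in Python.

```python
def strip_relabel(volume):
    """Return a podman volume spec without the SELinux relabel options (z / Z)."""
    parts = volume.split(":")
    if len(parts) < 3:
        return volume  # no options field, nothing to strip
    host, container, options = parts[0], parts[1], parts[2]
    kept = [opt for opt in options.split(",") if opt not in ("z", "Z")]
    if not kept:
        return f"{host}:{container}"
    return f"{host}:{container}:{','.join(kept)}"


def gnocchi_volume(nfs_enabled):
    """Pick the volume spec for /var/lib/gnocchi (nfs_enabled is an assumed flag)."""
    spec = "/var/lib/gnocchi:/var/lib/gnocchi:shared,z"
    # NFS rejects the relabel ("operation not supported"), so skip it there.
    return strip_relabel(spec) if nfs_enabled else spec
```

With nfs_enabled set, the container would mount the share with `shared` only, so podman no longer attempts the relabel that fails in the log above.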

We already have fixes for similar issues in nova[1] and glance[2], and we need the same for gnocchi
to resolve the error.
 [1] https://github.com/openstack/tripleo-heat-templates/commit/b56c521e01d0a4b42f44f2d9d03f524a4dc60475
 [2] https://github.com/openstack/tripleo-heat-templates/commit/aa1f4bf62156fa5e72b8171702acf3db755a67d8

Note that TripleO currently doesn't support file driver + NFS share in Gnocchi.
To achieve this deployment, the NFS share has to be configured additionally in ExtraConfigPre.


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Deploy overcloud with file driver + NFS share for gnocchi

Actual results:
overcloud deploy fails at step 2

Expected results:
overcloud deploy completes without any failures 


Additional info:

Comment 12 errata-xmlrpc 2020-10-28 15:37:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4284

