Description of problem:

When deploying a new environment we are hitting a time-out issue; looking at the Mistral workflows, tripleo.storage.v1.ceph-install is the one that is stuck.

~~~
(undercloud) [stack@director ~]$ openstack workflow execution list --fit-width
+-------------------+-------------------+-------------------+-------------------+-------------------+---------+------------+
| ID                | Workflow ID       | Workflow name     | Description       | Task Execution ID | State   | State info |
+-------------------+-------------------+-------------------+-------------------+-------------------+---------+------------+
...
| 30443f5b-a7e7-46a | 84f1067d-c6d0-4dc | tripleo.storage.v | sub-workflow      | a8273ec5-a77d-442 | RUNNING | None       |
| 1-9370-7fdba789e1 | 0-9ea5-a961d566b1 | 1.ceph-install    | execution         | 6-87e5-effc136f77 |         |            |
...
~~~

However, looking at ceph-install-workflow.log we can see the Ceph deployment completes:

~~~
2020-04-29 12:37:38,627 p=6519 u=mistral | PLAY RECAP *********************************************************************
...
2020-04-29 12:37:38,631 p=6519 u=mistral | INSTALLER STATUS ***************************************************************
2020-04-29 12:37:38,635 p=6519 u=mistral | Install Ceph Monitor : Complete (0:06:24)
2020-04-29 12:37:38,635 p=6519 u=mistral | Install Ceph Manager : Complete (0:01:42)
2020-04-29 12:37:38,635 p=6519 u=mistral | Install Ceph OSD : Complete (0:26:29)
2020-04-29 12:37:38,635 p=6519 u=mistral | Install Ceph RGW : Complete (0:01:40)
2020-04-29 12:37:38,635 p=6519 u=mistral | Install Ceph Client : Complete (0:12:05)
2020-04-29 12:37:38,635 p=6519 u=mistral | Wednesday 29 April 2020 12:37:38 -0400 (0:00:00.357) 0:54:55.575 *******
2020-04-29 12:37:38,636 p=6519 u=mistral | ===============================================================================
~~~

Looking at KCS https://access.redhat.com/solutions/4091811 we can see a similar issue where the Ceph install completes but Mistral is not updated to reflect that, and the execution needs to be updated manually using the following steps:

~~~
source ~/stackrc
WORKFLOW='tripleo.storage.v1.ceph-install'
UUID=$(mistral execution-list --limit=-1 | grep $WORKFLOW | awk {'print $2'} | tail -1)
for TASK_ID in $(mistral task-list $UUID | awk {'print $2'} | egrep -v 'ID|^$'); do
  echo $TASK_ID
  mistral task-get $TASK_ID
done
~~~

We don't see the same `InternalError: (1153, u"Got a packet bigger than 'max_allowed_packet' bytes")` errors in the Mistral engine log, so I am not sure whether this is the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1703618.

Note the environment is not using the latest z-stream (openstack-tripleo-common-8.6.8-5.el7ost.noarch), so this might be the issue.

How reproducible:

Every time the deployment is run the workaround needs to be followed, as the deployment halts on the Ceph install.
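For completeness, below is a rough sketch of the recovery applied after confirming in ceph-install-workflow.log that the ceph-ansible run finished. It extends the KCS snippet above; the `mistral execution-update` call and its `-s/--state` option are assumptions about the mistralclient version in use, and the KCS article remains the authoritative procedure.

~~~
# Hedged sketch only -- follow the KCS article for the exact supported steps.
source ~/stackrc
WORKFLOW='tripleo.storage.v1.ceph-install'
UUID=$(mistral execution-list --limit=-1 | grep $WORKFLOW | awk {'print $2'} | tail -1)

# Inspect the task states first; only tasks still stuck in RUNNING are of interest.
mistral task-list $UUID

# If the ceph-ansible play completed, mark the hung execution as finished so the
# overcloud deployment can continue (assumes execution-update supports -s).
mistral execution-update $UUID -s SUCCESS
~~~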
We have seen this happen on older z-streams when CephAnsiblePlaybookVerbosity was set to a non-zero value, because that greatly increases the amount of log output produced by ceph-ansible, all of which has to be stored in the Mistral database table. Can you please try setting CephAnsiblePlaybookVerbosity to 0, if that isn't the case already?
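For reference, a minimal sketch of one way to pin the parameter through a custom environment file; the file name and path below are illustrative, not taken from this environment, and the value may also already be set in one of the environment files currently passed to the deploy command.

~~~
# Hedged sketch: create a small environment file that forces the verbosity to 0.
cat > /home/stack/ceph-verbosity.yaml <<'EOF'
parameter_defaults:
  CephAnsiblePlaybookVerbosity: 0
EOF

# Then add "-e /home/stack/ceph-verbosity.yaml" to the existing
# "openstack overcloud deploy" command, after the other -e environment files,
# and re-run the deployment.
~~~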