Description of problem:
Overcloud deployment fails when deploying nodes with more than 2 disks:

(undercloud) [stack@undercloud-0 ~]$ nova list
/usr/lib/python3.6/site-packages/urllib3/connection.py:374: SubjectAltNameWarning: Certificate for 192.168.24.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/usr/lib/python3.6/site-packages/urllib3/connection.py:374: SubjectAltNameWarning: Certificate for 192.168.24.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 9148dc1b-64ed-4721-b827-9ceb374c7b1a | ceph-0       | ERROR  | -          | NOSTATE     |                        |
| 2f519f9f-79be-4dad-acab-2901307074e4 | ceph-1       | BUILD  | scheduling | NOSTATE     |                        |
| 5871d3e1-9f95-46e7-91be-0bedf055d149 | ceph-2       | BUILD  | scheduling | NOSTATE     |                        |
| 304be7f2-9709-4687-8263-58f5549653ec | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 01273806-5f86-48f7-9888-a7fed31453ac | compute-1    | ACTIVE | -          | Running     | ctlplane=192.168.24.15 |
| 50e02de4-4d55-45c2-aadb-ede3f7c8fb43 | compute-2    | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| f2936327-1e0f-42ba-8cd7-42ee0ea85b2e | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.9  |
| 386013ee-94d6-4d2e-88e1-90a9d0cc9af1 | controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| bdd88f96-f8c2-4c65-8a1e-f2575622c26f | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.21 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+

The ceph nodes have 6 disks with the following configuration:
https://github.com/redhat-openstack/infrared/blob/master/plugins/virsh/defaults/topology/nodes/ceph.yml#L11-L53

Version-Release number of selected component (if applicable):
15 -p RHOS_TRUNK-15.0-RHEL-8-20190418.n.0

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud with nodes that have more than 2 disks

Actual results:
Overcloud deployment fails because the nodes with multiple disks fail to get deployed.

Expected results:
Overcloud deployment passes without issues.

Additional info:
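For anyone reproducing this, the per-node failure reasons can be pulled from nova and ironic on the undercloud with something like the following (a rough sketch; it assumes the stackrc credentials are sourced and that ceph-0 is the failed instance):

(undercloud) [stack@undercloud-0 ~]$ openstack server show ceph-0 -f value -c fault
(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list --fields uuid name provision_state last_error

The second command shows the ironic provision state and last_error for every node, which is usually where the deploy failure reason ends up.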
This error doesn't seem to be related to disks:

2019-04-23 15:14:07.330 8 ERROR ironic.drivers.modules.agent_client [req-a8564393-0c08-4adf-8d0c-af1a2e26dff4 - - - - -] Failed to connect to the agent running on node 81724853-5131-4d60-b568-134931b8b60e for invoking command image.install_bootloader. Error: HTTPConnectionPool(host='192.168.24.16', port=9999): Read timed out. (read timeout=60): requests.exceptions.ReadTimeout: HTTPConnectionPool(host='192.168.24.16', port=9999): Read timed out. (read timeout=60)

It also seems transient, since some commands succeed. Can you confirm that 192.168.24.16 is the correct address and is reachable from the undercloud?
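For example (just a sketch, and it assumes the deploy ramdisk is still running on that node), something like this run from the undercloud would show whether the agent on 192.168.24.16 answers at all:

ping -c 3 192.168.24.16
curl -sS http://192.168.24.16:9999/v1/status

Port 9999 is where ironic-python-agent listens, so a timeout on the curl would match the "Read timed out" error above.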
(In reply to Dmitry Tantsur from comment #2)
> This error doesn't seem to be related to disks:
>
> 2019-04-23 15:14:07.330 8 ERROR ironic.drivers.modules.agent_client
> [req-a8564393-0c08-4adf-8d0c-af1a2e26dff4 - - - - -] Failed to connect to
> the agent running on node 81724853-5131-4d60-b568-134931b8b60e for invoking
> command image.install_bootloader. Error:
> HTTPConnectionPool(host='192.168.24.16', port=9999): Read timed out. (read
> timeout=60): requests.exceptions.ReadTimeout:
> HTTPConnectionPool(host='192.168.24.16', port=9999): Read timed out. (read
> timeout=60)
>
> It also seems transient, since some commands succeed. Can you confirm that
> 192.168.24.16 is the correct address and is reachable from the undercloud?

I don't have this environment anymore, but I'll confirm on the next one. Based on my observations, though, the deployment passes when I leave only 2 disks on the ceph nodes.
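Not confirmed as the fix for this bug, but if the extra disks are indeed what trips the deploy, a common mitigation on multi-disk nodes is to set a root device hint so ironic always writes the image to the same disk, e.g. (the node UUID and /dev/vda are placeholders for a virt setup):

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node set <node-uuid> --property root_device='{"name": "/dev/vda"}'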
See https://bugzilla.redhat.com/show_bug.cgi?id=1691551#c10
Nice find, Derek! As this has the same symptoms as https://bugzilla.redhat.com/show_bug.cgi?id=1691551, I'm marking this as a duplicate so we have one place to track this issue.

*** This bug has been marked as a duplicate of bug 1691551 ***