1973793 – [Backup and Restore] After restoring the control plane from backup, new images can't be uploaded to glance

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	ceph
Sub Component:
Version:	16.2 (Train)
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Juan Larriba
QA Contact:	myadla
Docs Contact:
URL:
Whiteboard:
Depends On:	1908656 1984430
Blocks:

Reported:	2021-06-18 17:29 UTC by Eliad Cohen
Modified:	2022-01-04 15:43 UTC
CC List:	15 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-01-04 15:42:19 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	OSP-5316	0	None	None	None	2022-01-04 15:43:17 UTC
Red Hat Issue Tracker	UPG-3291	0	None	None	None	2021-09-07 10:09:08 UTC

Description Eliad Cohen 2021-06-18 17:29:21 UTC

Description of problem:
After a complete restore of the control plane and undercloud from backup, no new images can be uploaded to Glance.


Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20210614.n.1

How reproducible:
Every attempt failed; no image could be uploaded (tried CentOS and CirrOS images).

Steps to Reproduce:
1. Deploy overcloud
2. Run backup and restore
3. Try to upload a new image

Actual results:
See attached:
Error finding address for http://10.0.0.144:9292/v2/images/09ef72f2-3584-4102-8c32-a1cf729a1696/file: [Errno 32] Broken pipe
clean_up CreateImage: Error finding address for http://10.0.0.144:9292/v2/images/09ef72f2-3584-4102-8c32-a1cf729a1696/file: [Errno 32] Broken pipe
END return value: 1

Expected results:
The new image should be uploaded successfully.

Additional info:

Comment 4 Cyril Roelandt 2021-06-21 18:30:26 UTC

I'm not familiar with the "backup and restore" procedure, could you link some documentation?


A few questions:


1) Did image upload work before?

2) Can you send us the Glance logs from the glance-api container?

3) It seems like you're using Swift. Does Swift work when you use it directly?

4) Does "glance image-list" work?

Comment 5 Eliad Cohen 2021-06-21 20:33:18 UTC

Hi Cyril,
Please see more here:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/undercloud_and_control_plane_back_up_and_restore/index
And here:
https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/backup_and_restore

1. Yes, it worked before this. Tobiko ran and created several resources, including instances, images and subnets/networks.
2. See the attached logs from controller-1; the environment is also available.
3. I only tried using the client (openstack image create), and it failed.
4. Yes, openstack image list works. It still shows the problematic image in "saving" status days later.

Best,
Eliad
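The symptom in item 4 (an image left in "saving" for days) can be spotted mechanically. Below is a minimal Python sketch, assuming the image list is available as dicts shaped like entries of a Glance v2 GET /v2/images response, with "updated_at" in the "YYYY-MM-DDTHH:MM:SSZ" UTC form. The function names, the one-hour threshold and the sample data are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed timestamp format of Glance v2 "updated_at", e.g. "2021-06-18T17:29:21Z".
GLANCE_TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_glance_timestamp(value):
    """Parse a Glance v2 UTC timestamp string into a timezone-aware datetime."""
    return datetime.strptime(value, GLANCE_TS_FORMAT).replace(tzinfo=timezone.utc)


def find_stuck_images(images, now, max_age=timedelta(hours=1)):
    """Return the IDs of images that have sat in 'saving' longer than max_age.

    Only the 'id', 'status' and 'updated_at' keys of each dict are read.
    """
    stuck = []
    for image in images:
        if image.get("status") != "saving":
            continue
        age = now - parse_glance_timestamp(image["updated_at"])
        if age > max_age:
            stuck.append(image["id"])
    return stuck


# Hypothetical sample data: one image stuck in "saving", one healthy image.
images = [
    {"id": "image-stuck", "status": "saving", "updated_at": "2021-06-18T17:29:21Z"},
    {"id": "image-ok", "status": "active", "updated_at": "2021-06-18T17:00:00Z"},
]
now = datetime(2021, 6, 21, 20, 0, tzinfo=timezone.utc)
print(find_stuck_images(images, now))  # prints ['image-stuck']
```

An image in "saving" means Glance accepted the image record and began receiving data but never finished writing it to the backend store, which matches an upload that dies partway through.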

Comment 7 Cyril Roelandt 2021-06-29 01:53:41 UTC

OK, so this is interesting. According to the OpenStack CLI verbose logs:

1) The image is created (POST http://10.0.0.144:9292/v2/images)

2) Then we try to upload data (PUT http://10.0.0.144:9292/v2/images/$IMAGE_ID/file), but we never hear about this request again, and I cannot find it in the Glance logs

3) We finally end up deleting the image (DELETE /v2/images/$IMAGE_ID), and we see an error because it cannot be found.
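The three calls listed above follow the Glance v2 image upload flow: POST /v2/images creates the image record (status "queued"), PUT /v2/images/{id}/file streams the bytes (status "saving" while in progress, then "active"), and the CLI issues DELETE when its clean-up step runs. The sketch below only builds the three requests with Python's standard urllib.request; the helper names, the token value and the default disk/container formats are illustrative assumptions, not what the OpenStack client does internally.

```python
import json
import urllib.request


def build_create_request(endpoint, token, name, disk_format="qcow2", container_format="bare"):
    """Step 1: POST /v2/images registers the image record (status 'queued')."""
    body = json.dumps({
        "name": name,
        "disk_format": disk_format,
        "container_format": container_format,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{endpoint}/v2/images",
        data=body,
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


def build_upload_request(endpoint, token, image_id, data):
    """Step 2: PUT /v2/images/{id}/file sends the raw image bytes."""
    return urllib.request.Request(
        f"{endpoint}/v2/images/{image_id}/file",
        data=data,
        headers={"X-Auth-Token": token, "Content-Type": "application/octet-stream"},
        method="PUT",
    )


def build_delete_request(endpoint, token, image_id):
    """Step 3 (clean-up): DELETE /v2/images/{id} removes the image record."""
    return urllib.request.Request(
        f"{endpoint}/v2/images/{image_id}",
        headers={"X-Auth-Token": token},
        method="DELETE",
    )
```

Passing a request to urllib.request.urlopen() would send it. In the failing environment, the PUT to .../file is the call that ended in "[Errno 32] Broken pipe", which fits the observation that Glance never logged it.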


So, there was a similar issue at https://bugs.launchpad.net/glance/+bug/1772651 . One of the comments states "I use multiple vlans, so mine problem was if I remember correctly that one of ceph components was not in VLAN that it needed to be. It is network related, at least for me". Is it possible that there is a network issue after doing the Backup & Restore? It does seem like Glance and Ceph have trouble talking to one another.
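One quick way to test the network hypothesis is a plain TCP reachability probe run from a controller, first against the Glance API and then against the Ceph monitors. The Python sketch below assumes the endpoints to test are supplied by hand; 3300 (messenger v2) and 6789 (messenger v1) are the default Ceph monitor ports, but the monitor addresses in this deployment are not given in this bug, so any values used are placeholders.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {(host, port): can_connect(host, port, timeout) for host, port in endpoints}


# Example call; "<ceph-mon-ip>" is a placeholder for a real monitor address.
# check_endpoints([("10.0.0.144", 9292), ("<ceph-mon-ip>", 3300), ("<ceph-mon-ip>", 6789)])
```

A successful connect only proves reachability on that port. It does not show that Ceph authentication or RBD writes from the glance-api container work, so a positive result narrows the problem down rather than ruling out Ceph.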


@Pranali: do you know much about what tripleo does in the backup & restore procedure? Could that impact Glance, Ceph and the network configuration?

Comment 11 Cyril Roelandt 2021-07-21 00:04:36 UTC

I think this should be moved to Ceph as this truly looks like a Ceph issue rather than a Glance one. Giulio is currently on PTO, but Francesco could probably help.


@Francesco: do you agree with this?

Comment 24 Juan Larriba 2022-01-04 15:42:19 UTC

This has been fixed by the serialized backup introduced in 16.2.1. We have not been able to reproduce it since, so I am closing it.
