Bug 1973793 - [Backup and Restore] After restoring the control plane from backup, new images can't be uploaded to glance
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: ceph
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Juan Larriba
QA Contact: myadla
URL:
Whiteboard:
Depends On: 1908656 1984430
Blocks:
Reported: 2021-06-18 17:29 UTC by Eliad Cohen
Modified: 2022-01-04 15:43 UTC
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-04 15:42:19 UTC
Target Upstream Version:
Embargoed:


Links
System                 ID        Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  OSP-5316  0        None      None    None     2022-01-04 15:43:17 UTC
Red Hat Issue Tracker  UPG-3291  0        None      None    None     2021-09-07 10:09:08 UTC

Description Eliad Cohen 2021-06-18 17:29:21 UTC
Description of problem:
After a complete restore of the control plane and undercloud from backup, no new images can be uploaded to Glance.


Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20210614.n.1

How reproducible:
Could not upload any image; tried both CentOS and CirrOS images.

Steps to Reproduce:
1. Deploy overcloud
2. Run backup and restore
3. Try to upload new image

Actual results:
See attached - 
Error finding address for http://10.0.0.144:9292/v2/images/09ef72f2-3584-4102-8c32-a1cf729a1696/file: [Errno 32] Broken pipe
clean_up CreateImage: Error finding address for http://10.0.0.144:9292/v2/images/09ef72f2-3584-4102-8c32-a1cf729a1696/file: [Errno 32] Broken pipe
END return value: 1

Expected results:
New image should be uploaded

Additional info:

Comment 4 Cyril Roelandt 2021-06-21 18:30:26 UTC
I'm not familiar with the "backup and restore" procedure. Could you link some documentation?


A few questions:


1) Did image upload work before?

2) Can you send us the Glance logs from the glance-api container?

3) It seems like you're using Swift. Does Swift work when you use it directly?

4) Does "glance image-list" work?

Comment 5 Eliad Cohen 2021-06-21 20:33:18 UTC
Hi Cyril,
Please see more here:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/undercloud_and_control_plane_back_up_and_restore/index
And here:
https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/backup_and_restore

1. Yes, it worked prior to this. Tobiko ran and created several resources, including instances, images and subnets/networks.
2. See the attached logs from controller-1. The environment is also available.
3. I tried using the client only (openstack image create), and it failed.
4. Yes, "openstack image list" works. It shows the problematic image as still "saving" days later.

Best,
Eliad
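
For reference, images stuck in "saving" (as in answer 4 above) could be picked out of the client output with a small script. This is only a sketch: it assumes the JSON produced by "openstack image list -f json" is a list of objects with "ID", "Name" and "Status" keys, and the sample data and the function name images_stuck_saving are made up for illustration.

```python
import json
from typing import List


def images_stuck_saving(image_list_json: str) -> List[str]:
    """Return the IDs of images whose status is 'saving'.

    Assumes the input is the JSON text printed by
    `openstack image list -f json` (a list of dicts with
    "ID", "Name" and "Status" keys).
    """
    images = json.loads(image_list_json)
    return [
        img["ID"]
        for img in images
        if str(img.get("Status", "")).lower() == "saving"
    ]


if __name__ == "__main__":
    # Illustrative sample only; the second entry and both names are invented.
    sample = json.dumps([
        {"ID": "09ef72f2-3584-4102-8c32-a1cf729a1696",
         "Name": "example-new-image", "Status": "saving"},
        {"ID": "00000000-0000-0000-0000-000000000001",
         "Name": "example-old-image", "Status": "active"},
    ])
    for image_id in images_stuck_saving(sample):
        print(image_id)
```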

Comment 7 Cyril Roelandt 2021-06-29 01:53:41 UTC
OK, so this is interesting. According to the OpenStack CLI verbose logs:

1) The image is created (POST http://10.0.0.144:9292/v2/images)

2) Then we try to upload data (PUT http://10.0.0.144:9292/v2/images/$IMAGE_ID/file) but we never hear about this request again and I cannot find it in the Glance logs

3) We finally end up deleting the image (DELETE /v2/images/$IMAGE_ID), and we see an error because it cannot be found.
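
To make the sequence above concrete, here is a rough sketch of the three requests as the client issues them. It only builds the method/URL pairs and sends nothing; the function name upload_request_sequence is hypothetical, and the endpoint and image ID are the ones from the error message in the description.

```python
from typing import List, Tuple


def upload_request_sequence(endpoint: str, image_id: str) -> List[Tuple[str, str]]:
    """Build the (method, URL) pairs seen in the CLI verbose log.

    Nothing is sent; this only spells out the order of the calls.
    """
    base = endpoint.rstrip("/")
    return [
        # 1) Create the image record (metadata only).
        ("POST", f"{base}/v2/images"),
        # 2) Upload the image data; this is the request that never
        #    shows up in the Glance logs.
        ("PUT", f"{base}/v2/images/{image_id}/file"),
        # 3) Clean-up after the failed upload.
        ("DELETE", f"{base}/v2/images/{image_id}"),
    ]


if __name__ == "__main__":
    steps = upload_request_sequence(
        "http://10.0.0.144:9292",
        "09ef72f2-3584-4102-8c32-a1cf729a1696",
    )
    for method, url in steps:
        print(method, url)
```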


A similar issue was reported at https://bugs.launchpad.net/glance/+bug/1772651. One of the comments states "I use multiple vlans, so mine problem was if I remember correctly that one of ceph components was not in VLAN that it needed to be. It is network related, at least for me". Is it possible that there is a network issue after doing the Backup & Restore? It does seem like Glance and Ceph have trouble talking to one another.
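
One quick way to test the network hypothesis would be a plain TCP reachability check from the node running glance-api to the Ceph monitors. The sketch below is an assumption about how one might do that, not part of any documented procedure: tcp_reachable is a hypothetical helper, and the monitor address in the comment is a placeholder. Ports 3300 and 6789 are the default Ceph monitor ports (msgr2 and msgr1).

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


# Example usage (placeholder address, replace with a real monitor IP):
#
#   for port in (3300, 6789):
#       print(port, tcp_reachable("192.0.2.10", port))
```

A successful connection only shows that the port is reachable; it does not prove that Glance can authenticate to or write into the cluster.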


@Pranali: do you know much about what tripleo does in the backup & restore procedure? Could that impact Glance, Ceph and the network configuration?

Comment 11 Cyril Roelandt 2021-07-21 00:04:36 UTC
I think this should be moved to Ceph as this truly looks like a Ceph issue rather than a Glance one. Giulio is currently on PTO, but Francesco could probably help.


@Francesco: do you agree with this?

Comment 24 Juan Larriba 2022-01-04 15:42:19 UTC
This has been fixed by the serialized backup introduced in 16.2.1. We have not been able to reproduce the issue since then, so I am closing it.

