Description of problem:
After a complete restore of the control plane and undercloud from backup, no new images can be uploaded to Glance.

Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20210614.n.1

How reproducible:
Could not upload any image; tried both CentOS and CirrOS.

Steps to Reproduce:
1. Deploy overcloud
2. Run backup and restore
3. Try to upload a new image

Actual results:
See attached.

Error finding address for http://10.0.0.144:9292/v2/images/09ef72f2-3584-4102-8c32-a1cf729a1696/file: [Errno 32] Broken pipe
clean_up CreateImage: Error finding address for http://10.0.0.144:9292/v2/images/09ef72f2-3584-4102-8c32-a1cf729a1696/file: [Errno 32] Broken pipe
END return value: 1

Expected results:
The new image should be uploaded.

Additional info:
I'm not familiar with the "backup and restore" procedure. Could you link some documentation?

A few questions:
1) Did image upload work before?
2) Can you send us the Glance logs from the glance-api container?
3) It seems like you're using Swift. Does Swift work when you use it directly?
4) Does "glance image-list" work?
Hi Cyril,

Please see more here:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/undercloud_and_control_plane_back_up_and_restore/index
And here:
https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/backup_and_restore

1. Yes, it worked prior to this. Tobiko ran and created several resources, including instances, images, and subnets/networks.
2. See the attached logs from controller-1. The environment is also available.
3. I tried using the client only (openstack image create), and it failed.
4. Yes, "openstack image list" works. It shows the problematic image as still "saving" days later.

Best,
Eliad
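To make point 4 easier to check on other environments, here is a minimal sketch that lists images still in the "saving" state. It assumes the JSON output of "openstack image list -f json" with its ID, Name and Status columns; the helper name images_stuck_in is hypothetical and not part of any OpenStack tooling.

```python
import json


def images_stuck_in(status, image_list_json):
    """Return (ID, Name) pairs for images whose Status equals `status`.

    `image_list_json` is expected to be the text printed by
    `openstack image list -f json` (a JSON list of objects).
    """
    images = json.loads(image_list_json)
    return [
        (img["ID"], img["Name"])
        for img in images
        if img.get("Status", "").lower() == status.lower()
    ]


if __name__ == "__main__":
    import subprocess

    # Requires a sourced overcloudrc so that the CLI can authenticate.
    output = subprocess.run(
        ["openstack", "image", "list", "-f", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for image_id, name in images_stuck_in("saving", output):
        print(image_id, name)
```

An image that stays in this list long after the upload command has exited was created but never received its data.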
OK, so this is interesting. According to the OpenStack CLI verbose logs:
1) The image is created (POST http://10.0.0.144:9292/v2/images).
2) Then we try to upload data (PUT http://10.0.0.144:9292/v2/images/$IMAGE_ID/file), but we never hear about this request again and I cannot find it in the Glance logs.
3) We finally end up deleting the image (DELETE /v2/images/$IMAGE_ID), and we see an error because it cannot be found.

There was a similar issue at https://bugs.launchpad.net/glance/+bug/1772651. One of the comments states: "I use multiple vlans, so mine problem was if I remember correctly that one of ceph components was not in VLAN that it needed to be. It is network related, at least for me".

Is it possible that there is a network issue after doing the Backup & Restore? It does seem like Glance and Ceph have trouble talking to one another.

@Pranali: do you know much about what tripleo does in the backup & restore procedure? Could that impact Glance, Ceph and the network configuration?
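For anyone who wants to replay the first two requests without the CLI, the sequence above can be sketched with the Python standard library. This is only an illustration under assumptions: the environment variables OS_TOKEN, GLANCE_ENDPOINT and IMAGE_FILE are hypothetical names (a token can be obtained with "openstack token issue"), the image name "upload-test" is made up, and qcow2/bare are assumed formats for the test image.

```python
import json
import os
import urllib.request


def build_create_request(endpoint, token, name):
    """Step 1: POST /v2/images creates the image record (no data yet)."""
    body = json.dumps({
        "name": name,
        "disk_format": "qcow2",      # assumption: test image is qcow2
        "container_format": "bare",  # assumption
    }).encode()
    return urllib.request.Request(
        f"{endpoint}/v2/images", data=body, method="POST",
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
    )


def build_upload_request(endpoint, token, image_id, data):
    """Step 2: PUT /v2/images/{id}/file uploads the image bytes."""
    return urllib.request.Request(
        f"{endpoint}/v2/images/{image_id}/file", data=data, method="PUT",
        headers={"X-Auth-Token": token,
                 "Content-Type": "application/octet-stream"},
    )


if __name__ == "__main__":
    endpoint = os.environ.get("GLANCE_ENDPOINT", "http://10.0.0.144:9292")
    token = os.environ["OS_TOKEN"]
    with open(os.environ["IMAGE_FILE"], "rb") as f:
        data = f.read()

    create = build_create_request(endpoint, token, "upload-test")
    with urllib.request.urlopen(create, timeout=30) as resp:
        image_id = json.load(resp)["id"]
    print("created", image_id)

    # In this bug, this is the request that ends in a broken pipe.
    upload = build_upload_request(endpoint, token, image_id, data)
    with urllib.request.urlopen(upload, timeout=300) as resp:
        print("upload status", resp.status)
```

If the PUT fails here in the same way, the OpenStack client itself can be ruled out and the focus stays on Glance and its storage backend.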
I think this should be moved to Ceph as this truly looks like a Ceph issue rather than a Glance one. Giulio is currently on PTO, but Francesco could probably help. @Francesco: do you agree with this?
This has been fixed by the serialized backup introduced in 16.2.1. Since then we have not been able to reproduce it, so I am closing it.