Bug 1341350
| Field | Value |
|---|---|
| Summary | rhel-osp-director: registering the overcloud images fails on first attempt with "500 Internal Server Error: Failed to upload image 51672726-cc40-40ea-9ca0-1f8b2267313c (HTTP 500)" |
| Product | Red Hat OpenStack |
| Component | instack-undercloud |
| Status | CLOSED ERRATA |
| Reporter | Alexander Chuzhoy <sasha> |
| Assignee | Jiri Stransky <jstransk> |
| QA Contact | Alexander Chuzhoy <sasha> |
| Severity | unspecified |
| Priority | unspecified |
| Version | 9.0 (Mitaka) |
| Target Milestone | ga |
| Target Release | 9.0 (Mitaka) |
| Keywords | Triaged |
| Hardware | Unspecified |
| OS | Unspecified |
| CC | bnemec, dbecker, dmacpher, jason.dobies, jcoufal, jjoyce, mburns, mcornea, morazi, rhel-osp-director-maint, tvignaud |
| Fixed In Version | instack-undercloud-4.0.0-8 |
| Doc Type | Bug Fix |
| Doc Text | Slow environments experienced timeouts when Glance tried to communicate with Swift as a backend. This caused some Glance operations, such as image uploads, to fail. This fix increases the Swift proxy server's default node_timeout value to 60 seconds. This increases the reliability of Glance image uploads on slow environments using Swift as an image storage backend. |
| Last Closed | 2016-08-11 11:31:46 UTC |
| Type | Bug |

Description
Alexander Chuzhoy
2016-05-31 22:31:54 UTC
Created attachment 1163383: glance logs.

From what I can tell the image upload to Swift is timing out. We can see the following in the openstack-swift-proxy.service journal:

    ERROR with Object server 192.0.2.1:6000/1 re: Trying to get final status of PUT to /v1/AUTH_4b8a69b9d00b41babd2041819f4bec39/glance/a9f16a3f-5ca2-4060-a303-476d696bcec7: Timeout (10.0s)
    Object PUT returning 503 for [503] (txn: tx2c5dab211085411b857ad-0057587a8a) (client_ip: 192.0.2.1)

Note that I've only seen it on virt environments.

I did some testing and tried switching the cache mode of the undercloud VM disk from unsafe to default, and I couldn't reproduce this issue anymore.

Is this still an issue? I deployed a virt environment on Monday 6th June and didn't hit this. Please feel free to e-mail/IRC me the environment login details when we have it reproduced.

I haven't reproduced it (yet) with the last build, despite:

    [root@instack ~]# grep default_store /etc/glance/glance-api.conf
    #default_store = file
    default_store = swift

and cache='unsafe'.

Reproduced with:

    [root@instack ~]# grep default_store /etc/glance/glance-api.conf
    #default_store = file
    default_store = swift

and cache='unsafe' in the VM's XML.

I still didn't hit this, with default_store = swift, unsafe cache in the VM, and a virtual environment. Sasha, can you please ping me with an environment where the issue appeared?

I think both the error that I had the opportunity to look at in the environment and what Marius pasted above are timeouts controlled by the node_timeout setting of proxy-server. It could be that the virtual environment is just too slow for the default timeout values. Sasha, you mentioned you can reproduce this fairly reliably. Could you please check whether it's still reproducible after running the following commands?

    sudo crudini --set /etc/swift/proxy-server.conf app:proxy-server node_timeout 30
    sudo systemctl restart openstack-swift-proxy

I've run into this a number of times over the years in different environments, both virt and baremetal, but it's not necessarily reproducible even on the same hardware and software versions. The other setting that _seems_ to help with this in my experience is this one from Glance:

    # The size, in MB, that Glance will start chunking image files and do
    # a large object manifest in Swift. (integer value)
    #swift_store_large_object_size=5120

I set this to 500 or 1000 so Glance will upload the image in smaller chunks that don't seem to time out. Although again, this is a fairly intermittent problem, so it's hard to say if changing that fixed the problem or if I just got lucky. :-) Changing the proxy timeout seems like a reasonable solution too, so +1 to making that change. I would suggest doing it on the overcloud as well, since I and a few others have run into this there too.

Jiri, I reproduced the issue after setting:

    sudo crudini --set /etc/swift/proxy-server.conf app:proxy-server node_timeout 30

But after setting:

    sudo crudini --set /etc/swift/proxy-server.conf app:proxy-server node_timeout 60

the issue didn't reproduce.

Environment: instack-undercloud-4.0.0-8.el7ost.noarch

Wasn't able to reproduce the issue in the interim. Will try some more.

Verified: Per comment #12.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1599.html
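
For anyone hitting this on an older build, it may help to confirm that the node_timeout value verified in this report (60) is actually present in proxy-server.conf before retrying the image upload. The following is only a minimal sketch, not part of instack-undercloud or Swift: the helper names `ini_get` and `node_timeout_ok` are hypothetical, and the parser is a simple awk reader that assumes plain `key = value` lines inside `[section]` headers.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Swift or instack-undercloud):
# print the value of KEY inside [SECTION] of an INI-style FILE.
# Usage: ini_get FILE SECTION KEY
ini_get() {
    awk -v section="$2" -v key="$3" '
        # Remember the current section header, e.g. [app:proxy-server].
        /^[ \t]*\[/ {
            gsub(/^[ \t]*\[|\][ \t]*$/, "")
            current = $0
            next
        }
        # Ignore comment lines and blank lines.
        /^[ \t]*([#;]|$)/ { next }
        current == section {
            pos = index($0, "=")
            if (pos == 0) next
            name = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            # Keep the last assignment seen in the section.
            if (name == key) found = value
        }
        END { if (found != "") print found }
    ' "$1"
}

# Hypothetical check: succeed only if node_timeout in [app:proxy-server]
# is set and is at least MIN seconds (60 is the value verified in this bug).
# Usage: node_timeout_ok FILE [MIN]
node_timeout_ok() {
    value=$(ini_get "$1" app:proxy-server node_timeout)
    min=${2:-60}
    # Strip any fractional part ("60.0" -> "60") before the integer compare.
    [ -n "$value" ] && [ "${value%.*}" -ge "$min" ]
}

# Example, using the path from the commands in this report:
#   node_timeout_ok /etc/swift/proxy-server.conf 60 || echo "node_timeout is unset or below 60"
```

If the check fails, the crudini command and the openstack-swift-proxy restart quoted in this report are the way the value was changed during verification.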