Bug 1282984
| Summary: | 500 Internal Server Error from running 'glance image-create' on the overcloud | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Ronelle Landy <rlandy> |
| Component: | rhosp-director | Assignee: | Flavio Percoco <fpercoco> |
| Status: | CLOSED DUPLICATE | QA Contact: | yeylon <yeylon> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.0 (Kilo) | CC: | fpercoco, gfidente, hbrock, jcoufal, mburns, ohochman, rhel-osp-director-maint, slinaber, srevivo, whayutin |
| Target Milestone: | y2 | Keywords: | Automation |
| Target Release: | 7.0 (Kilo) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-11-24 16:09:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Ronelle Landy
2015-11-18 00:00:34 UTC
We're also seeing the confusing output from tempest in ci.centos jobs for rdo-manager, e.g. https://ci.centos.org/view/rdo/job/rdo_manager-periodic-7-rdo-liberty-delorean_mgt-centos-7.0-templates-virthost-minimal_ha-neutron-ml2-vxlan-smoke/173/console

I opened a bug upstream against tempest: https://bugs.launchpad.net/tempest/+bug/1517536

Can I have the config files for glance and ceph? Looking at the traceback, it seems something is wrong in the configs. Is the ceph config file configured? Is it readable?

I looked into one of the environments and it doesn't have ceph enabled. The logs on that environment don't show the error reported in this BZ. For this error to happen, ceph must be enabled and used on uploads.

I was able to get another 500 while running config_tempest.py:

1. Log in to the undercloud, su stack
2. source ~/overcloudrc
3. glance image-delete <for all images>
4. Re-run config_tempest.py using the command from the khaleesi output*

Having added some 'print' debugging in tempest/services/image/v2/json/image_client.py, I can see the response from the glance API:

    store_image: v2/images/4b5cb428-6be5-4248-99d1-e53cebdacb59/file 500

I can't find anything in journalctl or /var/log reflecting this, though. Maybe I'm looking in the wrong way or place.

*to wit:

    source /home/stack/overcloudrc; cd /home/stack/tempest && \
    tools/config_tempest.py --out etc/tempest.conf \
        --network-id 5996b189-3bac-4506-beaf-1f6fe584571d \
        --deployer-input ~/tempest-deployer-input.conf --debug --create \
        identity.uri $OS_AUTH_URL identity.admin_password $OS_PASSWORD \
        network.tenant_network_cidr 192.168.0.0/24 \
        object-storage.operator_role swiftoperator \
        orchestration.stack_owner_role heat_stack_owner

(In reply to Steve Linabery from comment #5)
> I was able to get another 500 while running config_tempest.py.

I should note this was not on the env where the original bug was produced, but on a virthost-based installation that I ran subsequently.

I believe I know the answers, but I want to clarify:

* Is this happening on all deployment configurations or just some specific ones?
* Is this bug affecting basic overcloud functionality (the ability to launch a VM, since you cannot create an image)?

Thanks, Jarda

(In reply to Jaromir Coufal from comment #7)
> * Is this happening on all deployment configurations or just some specific ones?

I can't say for sure because it is intermittent in the places where we've seen it. In other words, it could be more widespread than we have observed (not to be alarmist, but I don't know).

> * Is this bug affecting basic overcloud functionality (ability to launch VM since you cannot create an image)?

If you can create an image (see 'intermittent' above), tempest passes.

> Thanks, Jarda
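One way to take tempest and config_tempest.py out of the loop entirely is to drive the same glance v2 calls that store_image() makes with curl and watch the status code of the upload. The following is an untested sketch; the glance endpoint, the image file name, and the use of `openstack token issue` are assumptions to adjust for the environment:

```bash
#!/bin/bash
# Hypothetical probe (sketch only): exercise the glance v2 create + upload
# path directly, without tempest. Assumes ~/overcloudrc has been sourced and
# that curl plus python-openstackclient are available on the undercloud.
GLANCE_URL="http://<glance-api-vip>:9292"     # placeholder endpoint
IMAGE_FILE="cirros-0.3.4-x86_64-disk.img"     # placeholder image file

TOKEN=$(openstack token issue -f value -c id)

# POST /v2/images creates the image record (queued state).
IMAGE_ID=$(curl -s -X POST "$GLANCE_URL/v2/images" \
    -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
    -d '{"name": "upload-probe", "disk_format": "qcow2", "container_format": "bare"}' \
    | python -c 'import json,sys; print(json.load(sys.stdin)["id"])')

# PUT /v2/images/<id>/file is the call that intermittently returns 500;
# a healthy backend answers 204.
curl -s -o /dev/null -w "upload status: %{http_code}\n" -X PUT \
    "$GLANCE_URL/v2/images/$IMAGE_ID/file" \
    -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/octet-stream" \
    --data-binary @"$IMAGE_FILE"

# Clean up so the probe can be re-run in a loop.
glance image-delete "$IMAGE_ID"
```

Running the PUT repeatedly should show whether the intermittent 500 comes from the glance API or its backend store rather than from anything on the tempest side.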
Here's what it looks like in glance/api.log from a ci.centos run where we saw the error output from config_tempest.py: https://ci.centos.org/artifacts/rdo/jenkins-rdo_manager-periodic-7-rdo-liberty-production-centos-7.0-templates-virthost-minimal_ha-neutron-ml2-vxlan-smoke-37/overcloud-controller-1/var/log/glance/api.log.gz

fpercoco just found this: https://bugs.launchpad.net/glance-store/+bug/1213179

I am not sure it is related, since that bug was filed more than two years ago and we only caught this issue in the CI just now...

We think this is isolated to HA deployments. We cannot find an example of it failing with the tempest error output on a 'minimal' deployment.

Following up on comment #3 from Flavio, can you paste the entire command line used to deploy?

(In reply to Giulio Fidente from comment #13)
> Following up on comment #3 from Flavio, can you paste the entire cmdline used to deploy?

    openstack overcloud deploy --debug --log-file overcloud_deployment_71.log \
        --templates --libvirt-type=qemu \
        --neutron-network-type vxlan --neutron-tunnel-types vxlan \
        --ntp-server 10.5.26.10 \
        --control-scale 3 --compute-scale 1 --ceph-storage-scale 0 \
        --block-storage-scale 0 --swift-storage-scale 0 \
        --control-flavor baremetal --compute-flavor baremetal \
        --ceph-storage-flavor baremetal --block-storage-flavor baremetal \
        --swift-storage-flavor baremetal \
        -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
        -e ~/network-environment.yaml

We have a baremetal multinode poodle-based install from this afternoon on which I am unable to reproduce the error.

(In reply to Steve Linabery from comment #15)
> We have a baremetal multinode poodle-based install from this afternoon on which I am unable to reproduce the error.

Steve, can you describe your reproduce steps? Can you also make sure you try a script that uploads an image to glance 10-20 times?

(In reply to wes hayutin from comment #17)
> This is also interesting...
> https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/osp_director-rhos-7_director-poodle-rhel-7.2-templates-baremetal-dell_pe_r630-minimal_ha-bond_with_vlans-neutron-gre/14/testReport/tempest.api.volume.test_volumes_get/VolumesV2GetTest/test_volume_create_get_update_delete_from_image_id_54a01030_c7fc_447c_86ee_c1182beae638_image_smoke_/

To reproduce on the virthost env, I deleted all glance images and ran config_tempest.py. It fails on approximately every other image upload.

Here's what I used to exercise the baremetal environment where I cannot reproduce a failure:

    #!/bin/bash
    counter=0
    while true; do
        source ~/overcloudrc
        for n in $(glance image-list | grep cirros | awk '{print $2}'); do
            glance image-delete $n
        done
        source /home/stack/overcloudrc
        cd /home/stack/tempest && tools/config_tempest.py --out etc/tempest.conf \
            --network-id 213cda4c-0af5-4895-8179-10e643222de3 \
            --deployer-input ~/tempest-deployer-input.conf --debug --create \
            identity.uri $OS_AUTH_URL identity.admin_password $OS_PASSWORD \
            network.tenant_network_cidr 192.168.0.0/24 \
            object-storage.operator_role swiftoperator \
            orchestration.stack_owner_role heat_stack_owner
        if [ $? != 0 ]; then
            echo "failed"
            break
        fi
        ((counter+=1))
        echo "passed $counter times"
    done

https://bugzilla.redhat.com/show_bug.cgi?id=1284845 looks like the root cause.
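The "isolated to HA deployments" observation also suggests why the traceback was hard to find earlier: with three controllers behind the VIP (--control-scale 3 above), the failing upload may be served by a different node than the one being inspected, and in the CI artifact linked above it was overcloud-controller-1 that had the error. Below is a minimal, hypothetical sketch for checking /var/log/glance/api.log on every controller from the undercloud; the IPs are placeholders and heat-admin is assumed to be the overcloud SSH user:

```bash
#!/bin/bash
# Sketch: grep the glance-api log on each overcloud controller for the
# upload traceback. Replace the placeholder IPs with the real ctlplane
# addresses of the controllers.
CONTROLLERS="192.0.2.8 192.0.2.9 192.0.2.10"
for ip in $CONTROLLERS; do
    echo "== controller $ip =="
    ssh heat-admin@"$ip" \
        'sudo grep -B 2 -A 25 "Traceback" /var/log/glance/api.log | tail -n 60'
done
```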
config_tempest.py is using the v2 API: https://github.com/redhat-openstack/tempest/blob/kilo/tempest/services/image/v2/json/image_client.py#L94

*** This bug has been marked as a duplicate of bug 1284845 ***