Hide Forgot
Description of problem: Snapshot disappears while saving if metadata are updated via computer/servers REST API. Version-Release number of selected component (if applicable): 6 and 7 at least. How reproducible: Following sequence of actions causes the problem: - start snapshot creation - get metadata - set metadata (original + custom) - wait for snapshot to be active (this fails) Note that the problem occurred even when metadata was updated without any modification. However, reordering the steps fixes the problem: - start snapshot creation - wait for snapshot to be active - get metadata - set metadata (original + custom) There is an automated reproducer (good=764713cb5, bad=85d92b9ea): ``` $ git clone git.lab.eng.bos.redhat.com:jbossqe-jenkins/mwqa-cloud-slaves.git -b rt398929 $ . ~/openrc.sh $ git checkout 764713cb5 $ ./bin/runscript.groovy bin/update-snapshot.groovy --name test --server df74fe66-7798-4a3e-84e7-e52819b0e423 --metadata key=value Uploading new snapshot test This will take a while... Status: SAVING (25%) ... Uploaded ec7dad6c-1d5f-4d8c-ba4c-23bb87192b0f $ git checkout 85d92b9ea $ ./bin/runscript.groovy bin/update-snapshot.groovy --name test --server ID_OF_RUNNING_SERVER --metadata key=value Uploading new snapshot test This will take a while... Status: SAVING (25%) ... Status: SAVING (50%) Status: ERROR (0%) Caught: java.lang.NullPointerException: Cannot get property 'status' on null object java.lang.NullPointerException: Cannot get property 'status' on null object at update-snapshot.run(update-snapshot.groovy:39) at runscript.runScript(runscript.groovy:45) at runscript$runScript.callCurrent(Unknown Source) at runscript.run(runscript.groovy:74) ``` Actual results: Client site polling might observe snapshot to enter ERROR state with 0% progress while waiting for saving to complete. Then the snapshot disappears entirely leaving no trace of problem visible for the user that I know of. Expected results: Metadata will be attached to saving snapshot, then snapshot will be correctly saved, the metadata are still attached. Additional info: I observe this on QEOS(v6) and OSP7(v7). Also reported as https://engineering.redhat.com/rt/Ticket/Display.html?id=398929
Can you reproduce this with the nova command line utilities please? I don't know what your client is doing so it's hard to comment on whether it's a nova problem or not. Also, please include a full set of nova and cinder logs that capture the time interval that you reproduce it. Thanks!
I asked SysOps to extract those logs for you in the ticket. I have retried reproducing with command line utilities and it seems to work: ``` $ nova image-create f9edd50c-bc8c-44db-be33-c1b7cfa9fb48 test-snapshot --show +-------------------------------------+---------------------------------------------------------------+ | Property | Value | +-------------------------------------+---------------------------------------------------------------+ | OS-EXT-IMG-SIZE:size | 0 | | created | 2016-05-06T09:26:24Z | | id | 10b69091-a802-4ef0-9176-d2466a5474a0 | | metadata base_image_ref | 43c818b3-0826-49a8-8eaf-b9a6af61c15b | | metadata commit | dfbbc077f6aae3ca69efa5a4f184601e3c146f48 | | metadata image_type | snapshot | | metadata instance_type_ephemeral_gb | 0 | | metadata instance_type_flavorid | 4 | | metadata instance_type_id | 3 | | metadata instance_type_memory_mb | 8192 | | metadata instance_type_name | m1.large | | metadata instance_type_root_gb | 80 | | metadata instance_type_rxtx_factor | 1 | | metadata instance_type_swap | 0 | | metadata instance_type_vcpus | 4 | | metadata instance_uuid | f9edd50c-bc8c-44db-be33-c1b7cfa9fb48 | | metadata labels | rhel7 rhel x86_64 mwqa_drives_tools novaclient docker openjdk | | metadata user_id | 381fdc6cb43a47c6a06bd50bd5ae8e26 | | minDisk | 80 | | minRam | 0 | | name | test-snapshot | | progress | 25 | | server | f9edd50c-bc8c-44db-be33-c1b7cfa9fb48 | | status | SAVING | | updated | 2016-05-06T09:26:24Z | +-------------------------------------+---------------------------------------------------------------+ [ogondza@9e20a229adbc mwqa-cloud-slaves]$ glance image-update 10b69091-a802-4ef0-9176-d2466a5474a0 --property a=b +----------------------------+---------------------------------------------------------------+ | Property | Value | +----------------------------+---------------------------------------------------------------+ | a | b | | base_image_ref | 43c818b3-0826-49a8-8eaf-b9a6af61c15b | | commit | dfbbc077f6aae3ca69efa5a4f184601e3c146f48 | | container_format | bare | | created_at | 2016-05-06T09:26:24Z | | disk_format | qcow2 | | id | 10b69091-a802-4ef0-9176-d2466a5474a0 | | image_type | snapshot | | instance_type_ephemeral_gb | 0 | | instance_type_flavorid | 4 | | instance_type_id | 3 | | instance_type_memory_mb | 8192 | | instance_type_name | m1.large | | instance_type_root_gb | 80 | | instance_type_rxtx_factor | 1 | | instance_type_swap | 0 | | instance_type_vcpus | 4 | | instance_uuid | f9edd50c-bc8c-44db-be33-c1b7cfa9fb48 | | labels | rhel7 rhel x86_64 mwqa_drives_tools novaclient docker openjdk | | min_disk | 80 | | min_ram | 0 | | name | test-snapshot | | protected | False | | size | 0 | | status | queued | | tags | [] | | updated_at | 2016-05-06T09:27:03Z | | user_id | 381fdc6cb43a47c6a06bd50bd5ae8e26 | | visibility | private | +----------------------------+---------------------------------------------------------------+ [ogondza@9e20a229adbc mwqa-cloud-slaves]$ glance image-show 10b69091-a802-4ef0-9176-d2466a5474a0 +----------------------------+---------------------------------------------------------------+ | Property | Value | +----------------------------+---------------------------------------------------------------+ | a | b | | base_image_ref | 43c818b3-0826-49a8-8eaf-b9a6af61c15b | | checksum | 0f23ec33f5dad06aea118bc9c0f337a9 | | commit | dfbbc077f6aae3ca69efa5a4f184601e3c146f48 | | container_format | bare | | created_at | 2016-05-06T09:26:24Z | | disk_format | qcow2 | | id | 10b69091-a802-4ef0-9176-d2466a5474a0 | | image_location | snapshot | | image_state | available | | image_type | snapshot | | instance_type_ephemeral_gb | 0 | | instance_type_flavorid | 4 | | instance_type_id | 3 | | instance_type_memory_mb | 8192 | | instance_type_name | m1.large | | instance_type_root_gb | 80 | | instance_type_rxtx_factor | 1 | | instance_type_swap | 0 | | instance_type_vcpus | 4 | | instance_uuid | f9edd50c-bc8c-44db-be33-c1b7cfa9fb48 | | labels | rhel7 rhel x86_64 mwqa_drives_tools novaclient docker openjdk | | min_disk | 80 | | min_ram | 0 | | name | test-snapshot | | os_type | None | | owner_id | 9ef4d9c5b208431abc45bd2535389725 | | protected | False | | size | 4397924352 | | status | active | | tags | [] | | updated_at | 2016-05-06T09:31:34Z | | user_id | 381fdc6cb43a47c6a06bd50bd5ae8e26 | | visibility | private | +----------------------------+---------------------------------------------------------------+ ```
Okay, so everything I see above looks right to me, unless I'm missing something. Sounds like this is a bug in whatever tool you're using to do this?
Please re-open if this is really reproducible with the main command line tools.
I am reporting this as an issue in openstack REST API (I happen to be invoking through java wrapper that might or might not be involved in the problem) so I do not think it is necessary to reproduce with the tools. I am attaching network logs from both broken[1] and working[2] approach while urging admins to provide you with server-side logs. [1] http://pastebin.test.redhat.com/372372 [2] http://pastebin.test.redhat.com/372373
Here are the server side logs (https://engineering.redhat.com/rt/Ticket/Display.html?id=398929#txn-8190434): ``` User: 381fdc6cb43a47c6a06bd50bd5ae8e26 Snapshot: 63691ebf-cf2c-48b2-aa44-6d5c19649873 glance/api.log 2016-04-12 11:58:20.471 19282 ERROR glance.api.v1.upload_utils [5be525e6-98f5-42d6-8a42-c387733a1aa9 381fdc6cb43a47c6a06bd50bd5ae8e26 9ef4d9c5b208431abc45bd2535389725] Supplied checksum (None) and checksum generated from uploaded image (0ac99921b73dc0c38d03539936eaa1ff) did not match. Setting image status to 'killed'. 2016-04-12 11:58:21.513 19282 ERROR glance.api.v1.upload_utils [5be525e6-98f5-42d6-8a42-c387733a1aa9 381fdc6cb43a47c6a06bd50bd5ae8e26 9ef4d9c5b208431abc45bd2535389725] Received HTTP error while uploading image 63691ebf-cf2c-48b2-aa44-6d5c19649873 2016-04-12 11:58:21.513 19282 TRACE glance.api.v1.upload_utils Traceback (most recent call last): 2016-04-12 11:58:21.513 19282 TRACE glance.api.v1.upload_utils File "/usr/lib/python2.6/site-packages/glance/api/v1/upload_utils.py", line 132, in upload_data_to_store 2016-04-12 11:58:21.513 19282 TRACE glance.api.v1.upload_utils _kill_mismatched(image_meta, 'checksum', checksum) 2016-04-12 11:58:21.513 19282 TRACE glance.api.v1.upload_utils File "/usr/lib/python2.6/site-packages/glance/api/v1/upload_utils.py", line 127, in _kill_mismatched 2016-04-12 11:58:21.513 19282 TRACE glance.api.v1.upload_utils request=req) 2016-04-12 11:58:21.513 19282 TRACE glance.api.v1.upload_utils HTTPBadRequest: Supplied checksum (None) and checksum generated from uploaded image (0ac99921b73dc0c38d03539936eaa1ff) did not match. Setting image status to 'killed'. 2016-04-12 11:58:21.513 19282 TRACE glance.api.v1.upload_utils Snapshot: c5e6013c-56ef-4d3f-8a19-29938322f068 horizon_access.log-20160501 10.8.56.22 - - [26/Apr/2016:10:08:17 +0000] "GET /dashboard/project/images_and_snapshots/?action=row_update&table=images&obj_id=c5e6013c-56ef-4d3f-8a19-29938322f068 HTTP/1.1" 200 433 "http://qeos.lab.eng.rdu2.redhat.com/dashboard/project/images_and_snapshots/" [Open URL] "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0" 10.8.56.22 - - [26/Apr/2016:10:08:19 +0000] "GET /dashboard/project/images_and_snapshots/c5e6013c-56ef-4d3f-8a19-29938322f068/ HTTP/1.1" 200 2320 "http://qeos.lab.eng.rdu2.redhat.com/dashboard/project/images_and_snapshots/" [Open URL] "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0" 10.8.56.22 - - [26/Apr/2016:10:08:19 +0000] "GET /dashboard/i18n/js/horizon/ HTTP/1.1" 200 718 "http://qeos.lab.eng.rdu2.redhat.com/dashboard/project/images_and_snapshots/c5e6013c-56ef-4d3f-8a19-29938322f068/" [Open URL] "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0" glance/api.log 2016-04-26 10:08:44.662 19300 ERROR glance.api.middleware.cache [12830def-3716-4d43-9a51-3026248ea67e 8aca16fd455d48ca90a359a6a9893c92 f0269e92050742a3b68fbde6777a3110] Checksum header is missing. ``` Hope that helps.
We would really need to see the api requests made to Nova which result in this failure in glance. The fact that this worked with the CLI suggests an error in the tool is not unlikely.