+++ This bug was initially created as a clone of Bug #1659264 +++

Cinder ignores some glanceclient errors (such as failed checksum validation) when creating volumes from images.

https://bugs.launchpad.net/cinder/+bug/1799221
What the posted patch has to do with this BZ requires some explanation.

1. This BZ is for upstream Bug #1799221, fixed by upstream Change-Id Ic011fe30b4840e5098db1a594ea276ec98768bff, Rocky commit bf89f76.
2. The fix for (1) depends on an exception introduced to fix upstream Bug #1798147 by upstream Change-Id If7c22ac4516f8c2a6ccd8bf6b6ed98409312b138, Rocky commit 805368e.
3. The fix for (2) introduced upstream Bug #1808443, fixed by upstream Change-Id I2aa56da73660794c6dedcbb8a66e84bcec511a9c, Rocky commit 9c696ce.
4. The fix for (2) also introduced upstream Bug #1811184, fixed by upstream Change-Id I6d8dedfd056add3414f8f4bf7f7279eae4763286, Rocky commit 844b627.

These have all been fixed upstream in stable/rocky. Everything except (4) was included in the import for 14z1, so (4) is what the patch posted to fix this bug contains.
Brian, I'm guessing that in general this is a simple create-volume-from-image operation, which should work. Which errors / how do I induce them on Glance's side so as to show that this fix addresses them and we're clear to verify? Any ideas or tips would be welcome. Thanks
(In reply to Tzach Shefi from comment #7)
> Which errors / how do I induce them on Glance's side so as to show that this
> fix addresses them and we're clear to verify?

Probably the easiest thing would be to change the os_hash_value in the database for an image. (Pre-Rocky, there won't be an os_hash_value; you can change the checksum instead.)
The Fixed-in version is newer than my installed openstack-cinder-13.0.3-0.20190118014304.44c5314.el7ost.noarch. Waiting for a newer puddle.
Verified on openstack-cinder-13.0.3-0.20190118014305.44c5314.el7ost.noarch

I uploaded a cirros image to Glance:

(overcloud) [stack@undercloud-0 ~]$ glance image-create --disk-format qcow2 --container-format bare --file cirros-0.3.5-i386-disk.img --name cirros.bad
+------------------+----------------------------------------------------------------------------------+
| Property         | Value                                                                            |
+------------------+----------------------------------------------------------------------------------+
| checksum         | 7316af7358dd32ca1956d72ac2c9e147                                                 |
| container_format | bare                                                                             |
| created_at       | 2019-02-26T11:16:35Z                                                             |
| direct_url       | swift+config://ref1/glance/e300fa2c-95ec-438f-b48c-10e81a40339b                  |
| disk_format      | qcow2                                                                            |
| id               | e300fa2c-95ec-438f-b48c-10e81a40339b                                             |
| min_disk         | 0                                                                                |
| min_ram          | 0                                                                                |
| name             | cirros.bad                                                                       |
| os_hash_algo     | sha512                                                                           |
| os_hash_value    | 734c9281adf72b9947eb9ab85f5e9db0fe388b742ebb68469d4e87d17065e7a39b501c184e60913c |
|                  | 79ec9e79042992024a2063b906c66a5c37735626f8f14bae                                 |
| os_hidden        | False                                                                            |
| owner            | 335bafdaf624413e80accc5428a22eb1                                                 |
| protected        | False                                                                            |
| size             | 12528640                                                                         |
| status           | active                                                                           |
| tags             | []                                                                               |
| updated_at       | 2019-02-26T11:16:36Z                                                             |
| virtual_size     | Not available                                                                    |
| visibility       | shared                                                                           |
+------------------+----------------------------------------------------------------------------------+

I tried playing (hacking) nice, but os_hash_value is read-only :(

#openstack image set e300fa2c-95ec-438f-b48c-10e81a40339b --property os_hash_value=734c9281adf72b9947eb9ab85f5e9db0fe388b742ebb68469d4e87d17065e7a39b501c184e60913c79ec9e79042992024a2063b906c66a5c37735626f8f14zzz
403 Forbidden: Attribute 'os_hash_value' is read-only. (HTTP 403)

Oh well, I'll just backdoor it via direct DB abuse:

[root@controller-0 ~]# docker exec -it galera-bundle-docker-0 /bin/bash mysql
MariaDB [(none)]> use glance;
MariaDB [glance]> show tables;
MariaDB [glance]> select * from images where name="cirros.bad";
+--------------------------------------+------------+----------+--------+---------------------+---------------------+------------+---------+-------------+------------------+----------------------------------+----------------------------------+----------+---------+-----------+--------------+------------+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+
| id | name | size | status | created_at | updated_at | deleted_at | deleted | disk_format | container_format | checksum | owner | min_disk | min_ram | protected | virtual_size | visibility | os_hidden | os_hash_algo | os_hash_value |
+--------------------------------------+------------+----------+--------+---------------------+---------------------+------------+---------+-------------+------------------+----------------------------------+----------------------------------+----------+---------+-----------+--------------+------------+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+
| e300fa2c-95ec-438f-b48c-10e81a40339b | cirros.bad | 12528640 | active | 2019-02-26 11:16:35 | 2019-02-26 11:16:36 | NULL | 0 | qcow2 | bare | 7316af7358dd32ca1956d72ac2c9e147 | 335bafdaf624413e80accc5428a22eb1 | 0 | 0 | 0 | NULL | shared | 0 | sha512 | 734c9281adf72b9947eb9ab85f5e9db0fe388b742ebb68469d4e87d17065e7a39b501c184e60913c79ec9e79042992024a2063b906c66a5c37735626f8f14bae |
+--------------------------------------+------------+----------+--------+---------------------+---------------------+------------+---------+-------------+------------------+----------------------------------+----------------------------------+----------+---------+-----------+--------------+------------+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Now change the checksum value to introduce/simulate "glance client errors":

MariaDB [glance]> update images set checksum="734c9281adf72b9947eb9ab85f5e9db0fe388b742ebb68469d4e87d17065e7a39b501c184e60913c79ec9e79042992024a2063b906c66a5c37735626f8f14zzz" where name="cirros.bad";

I replaced the last three chars with zzz. Glance image show still reports the original hash ending with ..bae

Now let's try to create a volume from this "malformed" image:

(overcloud) [stack@undercloud-0 ~]$ cinder create 1 --image e300fa2c-95ec-438f-b48c-10e81a40339b
+--------------------------------+---------------------------------------+
| Property                       | Value                                 |
+--------------------------------+---------------------------------------+
| attachments                    | []                                    |
| availability_zone              | nova                                  |
| bootable                       | false                                 |
| consistencygroup_id            | None                                  |
| created_at                     | 2019-02-26T11:36:00.000000            |
| description                    | None                                  |
| encrypted                      | False                                 |
| id                             | 03fa99b8-cb07-4864-8e8d-2bf58b3de4fe  |
| metadata                       | {}                                    |
| migration_status               | None                                  |
| multiattach                    | False                                 |
| name                           | None                                  |
| os-vol-host-attr:host          | hostgroup@tripleo_iscsi#tripleo_iscsi |
| os-vol-mig-status-attr:migstat | None                                  |
| os-vol-mig-status-attr:name_id | None                                  |
| os-vol-tenant-attr:tenant_id   | 335bafdaf624413e80accc5428a22eb1      |
| replication_status             | None                                  |
| size                           | 1                                     |
| snapshot_id                    | None                                  |
| source_volid                   | None                                  |
| status                         | creating                              |
| updated_at                     | 2019-02-26T11:36:00.000000            |
| user_id                        | f79f18216ec9403f9d003ab8ae1fa5b2      |
| volume_type                    | tripleo                               |
+--------------------------------+---------------------------------------+

The volume create fails, as it should; the volume is in error state:

(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name         | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| 03fa99b8-cb07-4864-8e8d-2bf58b3de4fe | error  | -            | 1    | tripleo     | false    |                                      |

The Cinder scheduler log reports the expected reason why we failed to create said volume:

/var/log/containers/cinder/cinder-scheduler.log:2019-02-26 11:36:04.697 1 ERROR cinder.scheduler.filter_scheduler [req-168a48db-fa39-4f81-b613-51c92120abc1 f79f18216ec9403f9d003ab8ae1fa5b2 335bafdaf624413e80accc5428a22eb1 - default default] Error scheduling 03fa99b8-cb07-4864-8e8d-2bf58b3de4fe from last vol-service: hostgroup@tripleo_iscsi#tripleo_iscsi : [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task\n result = task.execute(**arguments)\n', u' File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 1041, in execute\n **volume_spec)\n', u' File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 951, in _create_from_image\n image_service)\n', u' File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 814, in _create_from_image_cache_or_download\n backend_name) as tmp_image:\n', u' File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__\n return self.gen.next()\n', u' File "/usr/lib/python2.7/site-packages/cinder/image/image_utils.py", line 832, in fetch\n fetch_verify_image(context, image_service, image_id, tmp)\n', u' File "/usr/lib/python2.7/site-packages/cinder/image/image_utils.py", line 451, in fetch_verify_image\n None, None)\n', u' File "/usr/lib/python2.7/site-packages/cinder/image/image_utils.py", line 393, in fetch\n reason=reason)\n', u'ImageDownloadFailed: Failed to download image e300fa2c-95ec-438f-b48c-10e81a40339b, reason: IOError: 32 Corrupt image download. Checksum was 7316af7358dd32ca1956d72ac2c9e147 expected 734c9281adf72b9947eb9ab85f5e9db0\n']

Looking good to verify.
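For readers who want to see the kind of check that produces the "Corrupt image download" error in the log above, here is a minimal Python sketch. It is not glanceclient's or Cinder's actual code; the function name iter_with_md5_check is hypothetical, and the message text and errno are only modelled on the log line.

```python
import errno
import hashlib


def iter_with_md5_check(chunks, expected_md5):
    """Yield chunks unchanged, then fail if their MD5 differs from expected_md5.

    Hypothetical helper for illustration only; modelled on the log message,
    not copied from any library.
    """
    md5 = hashlib.md5()
    for chunk in chunks:
        md5.update(chunk)
        yield chunk
    actual = md5.hexdigest()
    if actual != expected_md5:
        # The mismatch is only detectable after the last chunk has been read.
        raise IOError(
            errno.EPIPE,
            "Corrupt image download. Checksum was %s expected %s"
            % (actual, expected_md5),
        )
```

The point of the bug is that the caller has to let this exception propagate (or convert it into a failure) rather than swallow it, so the volume ends up in the error state instead of being created from unverified data.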
A few comments about Tzach's verification.

1. As Tzach mentioned, os_hash_value and checksum are both read-only for security reasons, so the only way to modify them is to go directly to the DB.

2. Tzach mentioned that after he changed the value by substituting the last 3 chars with 'zzz', the os_hash_value on the image-show response stayed the same. The change should have been visible (glance has no local memory of image properties, it must always fetch them from the DB). What happened is that the DB change actually modified the 'checksum' (the md5 hash) of the image. Since that field in the DB is limited to 32 chars, the DB only stored the first 32 chars of the string Tzach set 'checksum' to. That's why in the last line of the error message at the end of comment #10, you don't see any 'zzz' chars.

3. As Tzach documented, an IOError was thrown, and the volume failed to create, which verifies the fix for this bug.

4. The reason I'm writing this is that when the Glance 'multihash' was introduced in Rocky, the glanceclient was modified so that it would prefer to use the new verification (os_hash_algo + os_hash_value) over the old md5 checksum. So in this case, what I'd have expected to see is that the download would succeed, because Tzach didn't modify the os_hash_algo and os_hash_value in the DB, so they should have been fine (when the 'multihash' is available, the checksum is just ignored because it's an insecure hash anyway). So there's something strange going on here that I will look into. (Could be that full client-side multihash didn't show up in glanceclient until early in Stein, or our Cinder distro is using an outdated glanceclient.)

5. On the plus side, though, this bugzilla has been verified.
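The truncation described in point 2 can be reproduced with a few lines of Python. This is only an illustration of a fixed-width column keeping the first 32 characters; store_checksum is a hypothetical helper, not Glance or MariaDB code, and the 32-character width is taken from the comment above.

```python
CHECKSUM_COLUMN_WIDTH = 32  # width stated in the comment; an MD5 hex digest is 32 chars


def store_checksum(value, width=CHECKSUM_COLUMN_WIDTH):
    """Mimic a fixed-width column that keeps only the first `width` characters."""
    return value[:width]


# The 128-character string from the UPDATE statement (sha512 value ending in 'zzz').
tampered = (
    "734c9281adf72b9947eb9ab85f5e9db0fe388b742ebb68469d4e87d17065e7a3"
    "9b501c184e60913c79ec9e79042992024a2063b906c66a5c37735626f8f14zzz"
)

stored = store_checksum(tampered)
print(stored)  # 734c9281adf72b9947eb9ab85f5e9db0
```

The stored value matches the "expected" checksum in the scheduler log, and the 'zzz' suffix is gone because it sat at the far end of the string.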
Answering my point 4 in comment #11. The glanceclient released with Rocky was 2.12.1. Multihash verification was introduced into master (Stein) and then backported to stable/rocky and released as python-glanceclient 2.13.0 (tag date: 2018-10-31 21:57:36 +0000).
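The "prefer multihash, fall back to md5" behaviour discussed in comment #11 can be sketched roughly as follows. This is an assumption-laden illustration, not the glanceclient implementation: pick_verifier and verify are hypothetical names, and the image is represented as a plain dict using the property names shown in the image-create output.

```python
import hashlib


def pick_verifier(image):
    """Return (algorithm, expected_hex) for an image-metadata dict, or None.

    Prefers os_hash_algo/os_hash_value when both are present and the
    algorithm is available locally; otherwise falls back to the md5 checksum.
    """
    algo = image.get("os_hash_algo")
    value = image.get("os_hash_value")
    if algo and value and algo in hashlib.algorithms_available:
        return algo, value
    if image.get("checksum"):
        return "md5", image["checksum"]
    return None


def verify(data, image):
    """Return True if data matches the preferred hash (or nothing can be checked)."""
    chosen = pick_verifier(image)
    if chosen is None:
        return True
    algo, expected = chosen
    return hashlib.new(algo, data).hexdigest() == expected
```

With a client that behaves like this sketch, a tampered 'checksum' alongside an intact os_hash_value would not fail the download. A client without multihash support only has the md5 path, which is consistent with the md5 mismatch seen in the verification.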
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0586