Bug 1664687 - [RHOS14] cinder ignores errors from glanceclient when creating volumes
Summary: [RHOS14] cinder ignores errors from glanceclient when creating volumes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 14.0 (Rocky)
Assignee: Brian Rosmaita
QA Contact: Tzach Shefi
Docs Contact: RHOS Documentation Team
URL:
Whiteboard:
Depends On:
Blocks: 1659264
 
Reported: 2019-01-09 13:21 UTC by Brian Rosmaita
Modified: 2019-03-18 12:56 UTC (History)

Fixed In Version: openstack-cinder-13.0.3-0.20190118014305.44c5314.el7ost
Doc Type: Bug Fix
Doc Text:
A code change in OpenStack Platform 12 introduced a regression that caused the Block Storage service (cinder) to ignore some IOError exceptions raised by the glanceclient during image download. Ignored IOError exceptions can result in a volume with truncated or corrupt data. This release modifies the code so that such exceptions are not ignored. When an IOError occurs during image download, the Block Storage service logs the exception and handles it correctly.
Clone Of: 1659264
Environment:
Last Closed: 2019-03-18 12:56:24 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Launchpad 1798147 0 None None None 2019-02-04 18:56:14 UTC
Launchpad 1799221 0 None None None 2019-01-09 13:21:34 UTC
Launchpad 1808443 0 None None None 2019-02-04 19:00:28 UTC
Launchpad 1811184 0 None None None 2019-02-04 19:00:28 UTC
OpenStack gerrit 618978 0 'None' 'MERGED' 'Handling unexpected python error "NoneType object is not iterable"' 2019-11-21 11:04:07 UTC
OpenStack gerrit 625148 0 'None' 'MERGED' 'Set message property in ImageDownloadFailed' 2019-11-21 11:04:07 UTC
OpenStack gerrit 629091 0 'None' 'MERGED' 'cinder-volume: Stop masking IOError different than ENOSPC' 2019-11-21 11:04:07 UTC
OpenStack gerrit 631997 0 'None' 'MERGED' 'Pass image_id to ImageDownloadFailed' 2019-11-21 11:04:07 UTC
Red Hat Product Errata RHBA-2019:0586 0 None None None 2019-03-18 12:56:29 UTC

Description Brian Rosmaita 2019-01-09 13:21:35 UTC
+++ This bug was initially created as a clone of Bug #1659264 +++

Cinder ignores some glanceclient errors (such as failed checksum validation) when creating volumes from images.

https://bugs.launchpad.net/cinder/+bug/1799221
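
The failure mode can be illustrated with a minimal Python sketch. This is not the actual cinder code: the function names `fetch_masking` and `fetch_fixed`, the simplified exception classes, and the `download` callable are hypothetical stand-ins, chosen only to contrast swallowing an IOError other than ENOSPC with surfacing it as a download failure.

```python
import errno


class ImageTooBig(Exception):
    """Hypothetical stand-in for an out-of-space error."""


class ImageDownloadFailed(Exception):
    """Simplified stand-in for the exception the fix relies on."""

    def __init__(self, image_id, reason):
        super().__init__(
            "Failed to download image %s, reason: %s" % (image_id, reason))
        self.image_id = image_id
        self.reason = reason


def fetch_masking(download, image_id):
    """Regressed behaviour: any IOError other than ENOSPC is dropped."""
    try:
        download()
    except IOError as e:
        if e.errno == errno.ENOSPC:
            raise ImageTooBig(image_id)
        # Bug: every other IOError is silently ignored here, so the
        # caller carries on with truncated or corrupt data.


def fetch_fixed(download, image_id):
    """Fixed behaviour: other IOErrors become ImageDownloadFailed."""
    try:
        download()
    except IOError as e:
        if e.errno == errno.ENOSPC:
            raise ImageTooBig(image_id)
        reason = "IOError: %s %s" % (e.errno, e.strerror)
        raise ImageDownloadFailed(image_id, reason)


def corrupt_download():
    """Simulates a client that detects a checksum mismatch."""
    raise IOError(errno.EPIPE, "Corrupt image download")
```

With `corrupt_download` as the callable, `fetch_masking` returns normally while `fetch_fixed` raises `ImageDownloadFailed`, which is the difference a verifier should observe as a volume ending up in the error state.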

Comment 3 Brian Rosmaita 2019-02-04 18:54:57 UTC
What the posted patch has to do with this BZ requires some explanation.

1. This BZ is for upstream Bug #1799221, fixed by upstream Change-Id Ic011fe30b4840e5098db1a594ea276ec98768bff, Rocky commit bf89f76

2. The fix for (1) depends on an exception introduced to fix upstream Bug #1798147 by upstream Change-Id If7c22ac4516f8c2a6ccd8bf6b6ed98409312b138, Rocky commit 805368e

3. The fix for (2) introduced upstream Bug #1808443, fixed by upstream Change-Id I2aa56da73660794c6dedcbb8a66e84bcec511a9c, Rocky commit 9c696ce

4. The fix for (2) introduced upstream Bug #1811184, fixed by upstream Change-Id I6d8dedfd056add3414f8f4bf7f7279eae4763286, Rocky commit 844b627

These have all been fixed upstream in stable/rocky.  Everything except (4) was included in the import for 14z1.  So that's what the patch posted to fix this bug contains.

Comment 7 Tzach Shefi 2019-02-07 22:36:27 UTC
Brian, 

I'm guessing that in general this is a simple create-volume-from-image operation, which should work.
Which errors / how do I induce them on Glance's side so as to show that this fix addresses them and we're clear to verify? 

Any ideas/tips would be welcomed. 
Thanks

Comment 8 Brian Rosmaita 2019-02-08 15:56:42 UTC
(In reply to Tzach Shefi from comment #7)
> Which errors / how do I induce them on Glance's side so as to show that this
> fix addresses them and we're clear to verify? 

Probably the easiest thing would be to change the os_hash_value in the database for an image.  (In pre-Rocky, there won't be an os_hash_value; you can change the checksum instead.)
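
Altering the stored hash induces the error because the client compares a hash computed over the downloaded bytes with the value recorded for the image. The following is a minimal sketch of that idea, assuming an MD5 comparison performed after the last chunk; `verified_chunks` is a hypothetical name and this is not glanceclient's actual code. The message format follows the one reported in the log in comment #10.

```python
import errno
import hashlib


def verified_chunks(chunks, expected_md5):
    """Yield chunks unchanged, then fail if the MD5 of the data differs."""
    md5 = hashlib.md5()
    for chunk in chunks:
        md5.update(chunk)
        yield chunk
    actual = md5.hexdigest()
    if actual != expected_md5:
        # The mismatch is only known once all data has been consumed.
        raise IOError(errno.EPIPE,
                      "Corrupt image download. Checksum was %s expected %s"
                      % (actual, expected_md5))
```

Because the exception is raised only after the final chunk, a caller that ignores it has already written the full (possibly corrupt) data.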

Comment 9 Tzach Shefi 2019-02-13 05:32:24 UTC
The Fixed In Version is newer than my installed openstack-cinder-13.0.3-0.20190118014304.44c5314.el7ost.noarch.
Waiting for a newer puddle.

Comment 10 Tzach Shefi 2019-02-26 11:44:26 UTC
Verified on
openstack-cinder-13.0.3-0.20190118014305.44c5314.el7ost.noarch

I'd uploaded a cirros image to Glance

(overcloud) [stack@undercloud-0 ~]$ glance image-create --disk-format qcow2 --container-format bare --file cirros-0.3.5-i386-disk.img  --name cirros.bad
+------------------+----------------------------------------------------------------------------------+
| Property         | Value                                                                            |
+------------------+----------------------------------------------------------------------------------+
| checksum         | 7316af7358dd32ca1956d72ac2c9e147                                                 |
| container_format | bare                                                                             |
| created_at       | 2019-02-26T11:16:35Z                                                             |
| direct_url       | swift+config://ref1/glance/e300fa2c-95ec-438f-b48c-10e81a40339b                  |
| disk_format      | qcow2                                                                            |
| id               | e300fa2c-95ec-438f-b48c-10e81a40339b                                             |
| min_disk         | 0                                                                                |
| min_ram          | 0                                                                                |
| name             | cirros.bad                                                                       |
| os_hash_algo     | sha512                                                                           |
| os_hash_value    | 734c9281adf72b9947eb9ab85f5e9db0fe388b742ebb68469d4e87d17065e7a39b501c184e60913c |
|                  | 79ec9e79042992024a2063b906c66a5c37735626f8f14bae                                 |
| os_hidden        | False                                                                            |
| owner            | 335bafdaf624413e80accc5428a22eb1                                                 |
| protected        | False                                                                            |
| size             | 12528640                                                                         |
| status           | active                                                                           |
| tags             | []                                                                               |
| updated_at       | 2019-02-26T11:16:36Z                                                             |
| virtual_size     | Not available                                                                    |
| visibility       | shared                                                                           |
+------------------+----------------------------------------------------------------------------------+

Tried playing (hacking) nice, but os_hash_value is read-only :(
#openstack image set e300fa2c-95ec-438f-b48c-10e81a40339b --property os_hash_value=734c9281adf72b9947eb9ab85f5e9db0fe388b742ebb68469d4e87d17065e7a39b501c184e60913c79ec9e79042992024a2063b906c66a5c37735626f8f14zzz
403 Forbidden: Attribute 'os_hash_value' is read-only. (HTTP 403)

Oh well, I'll just backdoor it via direct DB abuse:

[root@controller-0 ~]# docker exec -it galera-bundle-docker-0 /bin/bash
mysql
MariaDB [(none)]> use glance;
MariaDB [glance]> show tables;
MariaDB [glance]> select * from images where name="cirros.bad";
+--------------------------------------+------------+----------+--------+---------------------+---------------------+------------+---------+-------------+------------------+----------------------------------+----------------------------------+----------+---------+-----------+--------------+------------+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+
| id                                   | name       | size     | status | created_at          | updated_at          | deleted_at | deleted | disk_format | container_format | checksum                         | owner                            | min_disk | min_ram | protected | virtual_size | visibility | os_hidden | os_hash_algo | os_hash_value                                                                                                                    |
+--------------------------------------+------------+----------+--------+---------------------+---------------------+------------+---------+-------------+------------------+----------------------------------+----------------------------------+----------+---------+-----------+--------------+------------+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+
| e300fa2c-95ec-438f-b48c-10e81a40339b | cirros.bad | 12528640 | active | 2019-02-26 11:16:35 | 2019-02-26 11:16:36 | NULL       |       0 | qcow2       | bare             | 7316af7358dd32ca1956d72ac2c9e147 | 335bafdaf624413e80accc5428a22eb1 |        0 |       0 |         0 |         NULL | shared     |         0 | sha512       | 734c9281adf72b9947eb9ab85f5e9db0fe388b742ebb68469d4e87d17065e7a39b501c184e60913c79ec9e79042992024a2063b906c66a5c37735626f8f14bae |
+--------------------------------------+------------+----------+--------+---------------------+---------------------+------------+---------+-------------+------------------+----------------------------------+----------------------------------+----------+---------+-----------+--------------+------------+-----------+--------------+----------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Now change the checksum value to simulate "glance client errors":
MariaDB [glance]> update images set checksum="734c9281adf72b9947eb9ab85f5e9db0fe388b742ebb68469d4e87d17065e7a39b501c184e60913c79ec9e79042992024a2063b906c66a5c37735626f8f14zzz"  where name="cirros.bad";

I'd replaced the last three chars with zzz. 

Glance image show still reports original hash ending with ..bae

Now let's try to create a volume from this "malformed" image:
(overcloud) [stack@undercloud-0 ~]$ cinder create 1 --image e300fa2c-95ec-438f-b48c-10e81a40339b
+--------------------------------+---------------------------------------+
| Property                       | Value                                 |
+--------------------------------+---------------------------------------+
| attachments                    | []                                    |
| availability_zone              | nova                                  |
| bootable                       | false                                 |
| consistencygroup_id            | None                                  |
| created_at                     | 2019-02-26T11:36:00.000000            |
| description                    | None                                  |
| encrypted                      | False                                 |
| id                             | 03fa99b8-cb07-4864-8e8d-2bf58b3de4fe  |
| metadata                       | {}                                    |
| migration_status               | None                                  |
| multiattach                    | False                                 |
| name                           | None                                  |
| os-vol-host-attr:host          | hostgroup@tripleo_iscsi#tripleo_iscsi |
| os-vol-mig-status-attr:migstat | None                                  |
| os-vol-mig-status-attr:name_id | None                                  |
| os-vol-tenant-attr:tenant_id   | 335bafdaf624413e80accc5428a22eb1      |
| replication_status             | None                                  |
| size                           | 1                                     |
| snapshot_id                    | None                                  |
| source_volid                   | None                                  |
| status                         | creating                              |
| updated_at                     | 2019-02-26T11:36:00.000000            |
| user_id                        | f79f18216ec9403f9d003ab8ae1fa5b2      |
| volume_type                    | tripleo                               |
+--------------------------------+---------------------------------------+

Volume creation fails, as it should, and the volume ends up in error state:
(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name         | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
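| 03fa99b8-cb07-4864-8e8d-2bf58b3de4fe | error  | -            | 1    | tripleo     | false    |                                      |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+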


Cinder scheduler log reports the expected reason why we failed to create said volume:

/var/log/containers/cinder/cinder-scheduler.log:2019-02-26 11:36:04.697 1 ERROR cinder.scheduler.filter_scheduler [req-168a48db-fa39-4f81-b613-51c92120abc1 f79f18216ec9403f9d003ab8ae1fa5b2 335bafdaf624413e80accc5428a22eb1 - default default] Error scheduling 03fa99b8-cb07-4864-8e8d-2bf58b3de4fe from last vol-service: hostgroup@tripleo_iscsi#tripleo_iscsi : [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task\n    result = task.execute(**arguments)\n', u'  File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 1041, in execute\n    **volume_spec)\n', u'  File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 951, in _create_from_image\n    image_service)\n', u'  File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 814, in _create_from_image_cache_or_download\n    backend_name) as tmp_image:\n', u'  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__\n    return self.gen.next()\n', u'  File "/usr/lib/python2.7/site-packages/cinder/image/image_utils.py", line 832, in fetch\n    fetch_verify_image(context, image_service, image_id, tmp)\n', u'  File "/usr/lib/python2.7/site-packages/cinder/image/image_utils.py", line 451, in fetch_verify_image\n    None, None)\n', u'  File "/usr/lib/python2.7/site-packages/cinder/image/image_utils.py", line 393, in fetch\n    reason=reason)\n', u'ImageDownloadFailed: Failed to download image e300fa2c-95ec-438f-b48c-10e81a40339b, reason: IOError: 32 Corrupt image download. Checksum was 7316af7358dd32ca1956d72ac2c9e147 expected 734c9281adf72b9947eb9ab85f5e9db0\n']


Looking good to verify.

Comment 11 Brian Rosmaita 2019-02-26 13:42:02 UTC
A few comments about Tzach's verification.

1. As Tzach mentioned, os_hash_value and checksum are both read-only for security reasons, so the only way to modify them is to go directly to the DB.

2. Tzach mentioned that after he changed the value by substituting the last 3 chars with 'zzz', the os_hash_value on the image-show response stayed the same. The change should have been visible (glance has no local memory of image properties, it must always fetch them from the DB).  What happened is that the DB change actually modified the 'checksum' (the md5 hash) of the image.  Since that field in the DB is limited to 32 chars, the DB only stored the first 32 chars of the string Tzach set 'checksum' to.  That's why in the last line of the error message at the end of comment #10, you don't see any 'zzz' chars.

3. As Tzach documented, an IOError was thrown, and the volume failed to create, which verifies the fix for this bug.

4. The reason I'm writing this is that when the Glance 'multihash' was introduced in Rocky, the glanceclient was modified so that it would prefer to use the new verification (os_hash_algo + os_hash_value) over the old md5 checksum.  So in this case, what I'd have expected to see is that the download would succeed, because Tzach didn't modify the os_hash_algo and os_hash_value in the DB, so they should have been fine (when the 'multihash' is available, the checksum is just ignored because it's an insecure hash anyway).  So there's something strange going on here that I will look into.  (Could be that full client-side multihash didn't show up in glanceclient until early in Stein, or our Cinder distro is using an outdated glanceclient.)

5. On the plus side, though, this bugzilla has been verified.
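
The truncation described in point 2 can be checked with a short Python sketch. It assumes only what is stated there, namely that the 'checksum' column keeps the first 32 characters; the variable names are illustrative.

```python
MD5_HEX_LENGTH = 32  # column width described in point 2

# The value written to the 'checksum' column in comment #10.
submitted = ("734c9281adf72b9947eb9ab85f5e9db0fe388b742ebb68469d4e87d17065e7a3"
             "9b501c184e60913c79ec9e79042992024a2063b906c66a5c37735626f8f14zzz")

# What the column retains after truncation.
stored = submitted[:MD5_HEX_LENGTH]
print(stored)
```

The result is 734c9281adf72b9947eb9ab85f5e9db0, the "expected" value in the error message at the end of comment #10, and it contains no 'zzz'.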

Comment 12 Brian Rosmaita 2019-02-27 14:51:57 UTC
Answering my point 4 in comment #11.

The glanceclient released with Rocky was 2.12.1.  Multihash verification was introduced into master (Stein) and then backported to stable/rocky and released as python-glanceclient 2.13.0 (tag date: 2018-10-31 21:57:36 +0000).
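
To make the consequence concrete, here is a minimal sketch of the selection logic described in point 4 of comment #11. It is an illustration under stated assumptions, not glanceclient code: `pick_verification` and the `client_supports_multihash` flag are hypothetical, and the image values are taken from comment #10 (with the checksum as stored after the DB change).

```python
def pick_verification(image, client_supports_multihash):
    """Return (algorithm, expected_hex) that the client would check."""
    if (client_supports_multihash
            and image.get("os_hash_algo") and image.get("os_hash_value")):
        # Multihash is preferred; the md5 checksum is ignored.
        return image["os_hash_algo"], image["os_hash_value"]
    # Older clients only know about the md5 checksum.
    return "md5", image["checksum"]


image = {
    # Tampered value, truncated by the DB to 32 characters.
    "checksum": "734c9281adf72b9947eb9ab85f5e9db0",
    # Untouched multihash fields.
    "os_hash_algo": "sha512",
    "os_hash_value": ("734c9281adf72b9947eb9ab85f5e9db0fe388b742ebb6846"
                      "9d4e87d17065e7a39b501c184e60913c79ec9e7904299202"
                      "4a2063b906c66a5c37735626f8f14bae"),
}

print(pick_verification(image, client_supports_multihash=False))
print(pick_verification(image, client_supports_multihash=True)[0])
```

A client without multihash support checks the tampered md5 value and fails, as seen in comment #10; a client with multihash support would check the untouched sha512 value, which is why the download was expected to succeed.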

Comment 16 errata-xmlrpc 2019-03-18 12:56:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0586

