Bug 1642155

Summary: cinder always checks both image_volume_cache_max_size_gb and image_volume_cache_max_count when either of them is specified.
Product: Red Hat OpenStack Reporter: Elise Gafford <egafford>
Component: openstack-cinder Assignee: Elise Gafford <egafford>
Status: CLOSED ERRATA QA Contact: Avi Avraham <aavraham>
Severity: medium Docs Contact: Kim Nylander <knylande>
Priority: medium    
Version: 12.0 (Pike) CC: aavraham, abishop, amcleod, egafford, knoha, knylande, srevivo, tshefi
Target Milestone: --- Keywords: Triaged, ZStream
Target Release: 12.0 (Pike)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-cinder-11.1.0-21.el7ost Doc Type: Bug Fix
Doc Text:
The Block Storage service (cinder) uses two volume cache limit settings. Previously, when only one cache limit was configured, adding a new entry to the cache caused every existing entry to be evicted, so only a single entry would be cached regardless of the configured limit. With this update, the Block Storage service handles volume cache limits correctly.
Story Points: ---
Clone Of: 1622453 Environment:
Last Closed: 2018-12-05 18:49:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1622453, 1641111    
Bug Blocks:    

Description Elise Gafford 2018-10-23 19:29:35 UTC
+++ This bug was initially created as a clone of Bug #1622453 +++

Description of problem:
cinder always checks both image_volume_cache_max_size_gb and image_volume_cache_max_count when either of them is specified.

Version-Release number of selected component (if applicable):
Current RHOSP10 cinder.

How reproducible:
Always

Steps to Reproduce:
1. Set the image_volume_cache_max_count parameter to a value greater than 0.
2. Create volumes from images; no more than one image-cache entry is ever retained.

Actual results:
Only one image-cache entry is retained, and cinder logs the following message:
~~~
Reclaiming image-volume cache space; removing cache entry 
~~~

Expected results:
Image-cache entries should be retained up to the number specified by image_volume_cache_max_count.

Additional info:
image_volume_cache_max_size_gb and image_volume_cache_max_count are defined by the following code.

~~~
--volume/driver.py
    248     cfg.IntOpt('image_volume_cache_max_size_gb',
    249                default=0,
    250                help='Max size of the image volume cache for this backend in '
    251                     'GB. 0 => unlimited.'),
    252     cfg.IntOpt('image_volume_cache_max_count',
    253                default=0,
    254                help='Max number of entries allowed in the image volume cache. '
    255                     '0 => unlimited.'),
~~~

These values are then passed to image_cache.ImageVolumeCache().
~~~
--volume/manager.py
    265         if self.driver.configuration.safe_get(
    266                 'image_volume_cache_enabled'):
    267 
    268             max_cache_size = self.driver.configuration.safe_get(
    269                 'image_volume_cache_max_size_gb')
    270             max_cache_entries = self.driver.configuration.safe_get(
    271                 'image_volume_cache_max_count')
    272 
    273             self.image_volume_cache = image_cache.ImageVolumeCache(
    274                 self.db,
    275                 cinder_volume.API(),
    276                 max_cache_size,
    277                 max_cache_entries
    278             )
    279             LOG.info(_LI('Image-volume cache enabled for host %(host)s.'),
    280                      {'host': self.host})
~~~

The ImageVolumeCache class is defined as follows; the arguments are assigned to max_cache_size_gb and max_cache_size_count.
~~~
     31 class ImageVolumeCache(object):
     32     def __init__(self, db, volume_api, max_cache_size_gb=0,
     33                  max_cache_size_count=0):
     34         self.db = db
     35         self.volume_api = volume_api
     36         self.max_cache_size_gb = int(max_cache_size_gb)
     37         self.max_cache_size_count = int(max_cache_size_count)
     38         self.notifier = rpc.get_notifier('volume', CONF.host)
~~~

The 'Reclaiming image-volume cache space; removing cache entry' message is logged by ensure_space().

In this method, when both max_cache_size_gb and max_cache_size_count are zero, it returns True immediately (the cache is unlimited).
~~~
    109     def ensure_space(self, context, space_required, host):
    110         """Makes room for a cache entry.
    111 
    112         Returns True if successful, false otherwise.
    113         """
    114 
    115         # Check to see if the cache is actually limited.
    116         if self.max_cache_size_gb == 0 and self.max_cache_size_count == 0:
    117             return True
    118 
    119         # Make sure that we can potentially fit the image in the cache
    120         # and bail out before evicting everything else to try and make
    121         # room for it.
    122         if (self.max_cache_size_gb != 0 and
    123                 space_required > self.max_cache_size_gb):
    124             return False
    125 
    126         # Assume the entries are ordered by most recently used to least used.
    127         entries = self.db.image_volume_cache_get_all_for_host(context, host)
    128 
    129         current_count = len(entries)
    130 
    131         current_size = 0
    132         for entry in entries:
    133             current_size += entry['size']
    134 
    135         # Add values for the entry we intend to create.
    136         current_size += space_required
    137         current_count += 1
    138 
    139         LOG.debug('Image-volume cache for host %(host)s current_size (GB) = '
    140                   '%(size_gb)s (max = %(max_gb)s), current count = %(count)s '
    141                   '(max = %(max_count)s).',
    142                   {'host': host,
    143                    'size_gb': current_size,
    144                    'max_gb': self.max_cache_size_gb,
    145                    'count': current_count,
    146                    'max_count': self.max_cache_size_count})
    147 
    148         while ((current_size > self.max_cache_size_gb
    149                or current_count > self.max_cache_size_count)
    150                and len(entries)):
    151             entry = entries.pop()
    152             LOG.debug('Reclaiming image-volume cache space; removing cache '
    153                       'entry %(entry)s.', {'entry': self._entry_to_str(entry)})
    154             self._delete_image_volume(context, entry)
    155             current_size -= entry['size']
    156             current_count -= 1
    157             LOG.debug('Image-volume cache for host %(host)s new size (GB) = '
    158                       '%(size_gb)s, new count = %(count)s.',
    159                       {'host': host,
    160                        'size_gb': current_size,
    161                        'count': current_count})
~~~
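
To make the effect concrete, here is a minimal standalone sketch (plain Python, not cinder code; entry sizes and IDs are made up) of the loop above when only the count limit is configured:

~~~
# Standalone sketch (not cinder code) of the loop above with
# image_volume_cache_max_count = 2 and image_volume_cache_max_size_gb
# left at its default of 0.
max_cache_size_gb = 0      # default => meant to be "unlimited"
max_cache_size_count = 2   # operator-configured limit

# Two existing 1 GB cache entries, ordered most- to least-recently used.
entries = [{'id': 'image-a', 'size': 1}, {'id': 'image-b', 'size': 1}]

space_required = 1  # size (GB) of the entry we intend to add
current_size = sum(e['size'] for e in entries) + space_required
current_count = len(entries) + 1

# Buggy condition: max_cache_size_gb is 0, so "current_size > 0" is
# true whenever any entry exists, and every existing entry is evicted
# no matter what max_cache_size_count allows.
while ((current_size > max_cache_size_gb
       or current_count > max_cache_size_count)
       and len(entries)):
    entry = entries.pop()
    print('evicting %s' % entry['id'])
    current_size -= entry['size']
    current_count -= 1

print(len(entries))  # 0 -- only the newest entry will ever be cached
~~~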

On the other hand, when either parameter has a value greater than 0, the reclaiming decision is made by the following code, and both parameters are evaluated. A parameter left at 0 imposes no limit and therefore should not need to be compared, yet the comparison is performed for the zero-valued parameter as well. In the customer's case, the max_cache_size_gb comparison is always true, so the reclaiming process always runs.

~~~
    148         while ((current_size > self.max_cache_size_gb
    149                or current_count > self.max_cache_size_count)
    150                and len(entries)):
    151             entry = entries.pop()
    152             LOG.debug('Reclaiming image-volume cache space; removing cache '
    153                       'entry %(entry)s.', {'entry': self._entry_to_str(entry)})
    154             self._delete_image_volume(context, entry)
    155             current_size -= entry['size']
    156             current_count -= 1
    157             LOG.debug('Image-volume cache for host %(host)s new size (GB) = '
    158                       '%(size_gb)s, new count = %(count)s.',
    159                       {'host': host,
    160                        'size_gb': current_size,
    161                        'count': current_count})
~~~

This loop always checks both parameters; the comparison against a zero-valued limit is always true, which defeats the limit configured for the other parameter.

Users expect the configured limit to be the one enforced, but the current implementation does not honor the configured value.
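
The fix is to skip the comparison for a limit that is 0. A minimal sketch of that kind of guard (illustrative only; the actual upstream patch may differ in detail):

~~~
# Minimal sketch (not cinder code): a limit of 0 means "unlimited"
# and must be excluded from the eviction check.
max_cache_size_gb = 0      # left at default => unlimited
max_cache_size_count = 2   # operator-configured limit

entries = [{'id': 'image-a', 'size': 1}, {'id': 'image-b', 'size': 1}]
space_required = 1
current_size = sum(e['size'] for e in entries) + space_required
current_count = len(entries) + 1

# Each limit participates in the decision only when it is non-zero.
while (((max_cache_size_gb
         and current_size > max_cache_size_gb)
        or (max_cache_size_count
            and current_count > max_cache_size_count))
       and len(entries)):
    entry = entries.pop()
    current_size -= entry['size']
    current_count -= 1

print(len(entries))  # 1 -- a single eviction satisfies max_count = 2
~~~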

--- Additional comment from Alan Bishop on 2018-09-06 09:47:48 EDT ---

I see the problem in the upstream code.

--- Additional comment from Alan Bishop on 2018-09-27 10:41:27 EDT ---

Starting backports to upstream stable branches.

--- Additional comment from Keigo Noha on 2018-10-08 20:38:43 EDT ---

Hello Alan,

Is it possible to include the fix into RHOSP13 also?

Best Regards,
Keigo Noha

--- Additional comment from Alan Bishop on 2018-10-09 08:50:15 EDT ---

The fix will be available in OSP-13z3. The patch is on upstream stable/queens, and will be brought downstream during the next bulk import. You may soon see this BZ cloned to track the fix for all releases (except OSP-11, which is EOL) back down to OSP-10.

--- Additional comment from Alan Bishop on 2018-10-19 15:34:49 EDT ---

Just to clarify, if you follow bug #1641111 you will see the upstream patch for this issue will be included in the next OSP-13 release. Work continues on backporting the fix downstream for this OSP-10 release.

Comment 10 Tzach Shefi 2018-12-03 13:39:18 UTC
Verified on:
openstack-cinder-11.1.0-22.el7ost.noarch


Configure the cache settings:

UC
PROJECT_ID=$(openstack project list | grep admin | awk '{print $2}')
USER_ID=$(openstack user list | grep admin | awk '{print $2}')


crudini --set /etc/cinder/cinder.conf DEFAULT cinder_internal_tenant_project_id $PROJECT_ID
crudini --set /etc/cinder/cinder.conf DEFAULT cinder_internal_tenant_user_id $USER_ID
crudini --set /etc/cinder/cinder.conf netapp image_volume_cache_enabled True
crudini --set /etc/cinder/cinder.conf netapp image_volume_cache_max_count 2
systemctl stop openstack-cinder-volume.service
systemctl start openstack-cinder-volume.service
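
For reference, the behavior this configuration should produce is simple LRU eviction capped at two entries; a minimal standalone sketch (plain Python, not cinder code; image names mirror the ones used below):

~~~
from collections import OrderedDict

# Minimal standalone sketch (plain Python, not cinder code) of the
# expected behavior: with image_volume_cache_max_count = 2, caching a
# third image evicts the least recently used entry.
cache = OrderedDict()
MAX_COUNT = 2

def cache_image(image_id):
    if len(cache) >= MAX_COUNT:
        evicted, _ = cache.popitem(last=False)  # drop the oldest entry
        print('Evicting image cache entry: %s' % evicted)
    cache[image_id] = 'image-volume-for-%s' % image_id

for image in ('cirros', 'cirros.raw', 'dsl'):
    cache_image(image)

print(list(cache))  # ['cirros.raw', 'dsl'] -- only two entries retained
~~~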


Upload three different images:
glance image-create --disk-format qcow2 --container-format bare --file cirros-0.3.5-i386-disk.img --name cirros
glance image-create --disk-format raw --container-format bare --file cirros-0.3.5-i386-disk.raw --name cirros.raw
glance image-create --disk-format iso --container-format bare --file dsl-4.4.10.iso  --name dsl

Create a volume and see its associated image-xx cache volume.
cinder create 1 --image cirros --name cirros

cinder list
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status    | Name                                       | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+
| 4c0119e7-bb33-4061-a6f7-4a26afb98aea | available | image-2812df11-1830-4b42-8335-13977f2e75d6 | 1    | -           | false    |                                      |
| fd2aeb6a-91b8-4f55-82fa-bc2aab93407c | available | cirros                                     | 1    | -           | true     |                                      |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+

Create a second volume:
cinder create 1 --image cirros.raw --name cirros_raw

The volume is created, and we now have two cached image volumes.
cinder list
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status    | Name                                       | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+
| 4c0119e7-bb33-4061-a6f7-4a26afb98aea | available | image-2812df11-1830-4b42-8335-13977f2e75d6 | 1    | -           | false    |                                      |
| b955a352-6fed-48c5-95af-d64a36f89e72 | available | image-724f8f78-9a78-4ba2-b2cb-06bfee0f0d3b | 1    | -           | false    |                                      |
| c7ab48c0-efb1-4cee-816e-c94ea87108cb | available | cirros_raw                                 | 1    | -           | true     |                                      |
| fd2aeb6a-91b8-4f55-82fa-bc2aab93407c | available | cirros                                     | 1    | -           | true     |                                      |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+


Now create a third volume:
cinder create 1 --image dsl --name dsl

Notice there are now three user volumes but still only two cached image volumes; one of the cached entries was replaced.

cinder list
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status    | Name                                       | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+
| 4f463f8d-d15e-4038-9e68-44feca028f1f | available | dsl                                        | 1    | -           | true     |                                      |
| 584e8b1f-f581-4ccc-acd1-b39cdcfbaf87 | available | image-82443c02-cafe-4d7d-8391-ebae352c19f8 | 1    | -           | false    |                                      |
| b955a352-6fed-48c5-95af-d64a36f89e72 | available | image-724f8f78-9a78-4ba2-b2cb-06bfee0f0d3b | 1    | -           | false    |                                      |
| c7ab48c0-efb1-4cee-816e-c94ea87108cb | available | cirros_raw                                 | 1    | -           | true     |                                      |
| fd2aeb6a-91b8-4f55-82fa-bc2aab93407c | available | cirros                                     | 1    | -           | true     |                                      |
+--------------------------------------+-----------+--------------------------------------------+------+-------------+----------+--------------------------------------+


The new cache entry image-82443.. replaced image-2812df11..

C-vol log eviction notice:
ache [req-f1a95091-c50e-48be-b0d8-dd027a6d453b 21437ff80c704066ad2bc5b09dfe1609 26b9425705624361828aca58c3ccf31a - - -] Evicting image cache entry: {'last_used': datetime.datetime(2018, 12, 3, 13, 34, 7), 'image_updated_at': datetime.datetime(2018, 12, 3, 13, 21, 37), 'image_id': u'2812df11-1830-4b42-8335-13977f2e75d6', 'host': u'



Looks great. 

Tip for reference:
The internal tenant IDs are set under DEFAULT:
DEFAULT cinder_internal_tenant_project_id  ...
DEFAULT cinder_internal_tenant_user_id ....

These two are set under the backend section itself:
tripleo_iscsi image_volume_cache_enabled True
tripleo_iscsi image_volume_cache_max_count 2

Restart the service/docker container.

Related config bits
https://docs.openstack.org/cinder/latest/admin/blockstorage-image-volume-cache.html

Comment 12 errata-xmlrpc 2018-12-05 18:49:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3785