Bug 2100697

Summary: With an image-volume cache enabled backend, creating a volume smaller than the cached image-volume results in a volume the same size as the cached image-volume
Product: Red Hat OpenStack
Component: openstack-cinder
Version: 13.0 (Queens)
Reporter: Keigo Noha <knoha>
Assignee: Rajat Dhasmana <rdhasman>
QA Contact: Evelina Shames <eshames>
Docs Contact: RHOS Documentation Team <rhos-docs>
CC: abishop, gcharot, rdhasman, udesale, zaitcev
Status: CLOSED MIGRATED
Severity: medium
Priority: medium
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2024-01-05 11:17:52 UTC
Type: Bug

Description Keigo Noha 2022-06-24 02:57:50 UTC
Description of problem:
Consider the following situation.

A new image is around 800 MiB.
The first volume created from the image is 20 GiB.
The second volume created from the image is 1 GiB.
Both volume creation requests satisfy the condition that the volume size is larger than the image size.

However, this order of volume creation leads to an unexpected result:
The first volume created from the image is 20 GiB.
The second volume created from the image is also 20 GiB, even though the requested size is 1 GiB.

The reason for this behavior is that cinder currently creates the image cache volume entry based on the volume created during the first request.
 
--- volume/flows/manager/create_volume.py
~~~
    def _create_from_image_cache_or_download(self, context, volume,
                                              image_location, image_id,
                                              image_meta, image_service,
                                              update_cache=False):
...
            if should_create_cache_entry:
                # Update the newly created volume db entry before we clone it
                # for the image-volume creation.
                if model_update:
                    volume.update(model_update)
                    volume.save()
                self.manager._create_image_cache_volume_entry(internal_context,
                                                              volume,
                                                              image_id,
                                                              image_meta)
~~~
--- volume/manager.py
~~~
    def _create_image_cache_volume_entry(self, ctx, volume_ref,
                                         image_id, image_meta):
        """Create a new image-volume and cache entry for it.

        This assumes that the image has already been downloaded and stored
        in the volume described by the volume_ref.
        """
        cache_entry = self.image_volume_cache.get_entry(ctx,
                                                        volume_ref,
                                                        image_id,
                                                        image_meta)
        if cache_entry:
            LOG.debug('Cache entry already exists with image ID %'
                      '(image_id)s',
                      {'image_id': image_id})
            return

        image_volume = None
        try:
            if not self.image_volume_cache.ensure_space(ctx, volume_ref):
                LOG.warning('Unable to ensure space for image-volume in'
                            ' cache. Will skip creating entry for image'
                            ' %(image)s on %(service)s.',
                            {'image': image_id,
                             'service': volume_ref.service_topic_queue})
                return

            image_volume = self._clone_image_volume(ctx,
                                                    volume_ref,
                                                    image_meta)
            if not image_volume:
                LOG.warning('Unable to clone image_volume for image '
                            '%(image_id)s will not create cache entry.',
                            {'image_id': image_id})
                return
            self.image_volume_cache.create_cache_entry(
                ctx,
                image_volume,
                image_id,
                image_meta
            )
        except exception.CinderException as e:
            LOG.warning('Failed to create new image-volume cache entry.'
                        ' Error: %(exception)s', {'exception': e})
            if image_volume:
                self.delete_volume(ctx, image_volume)
~~~

Because of this, the image cache volume is 20 GiB, based on the first request.

For the second request, volume creation looks up the image cache by image ID
and then clones the cached volume without checking the requested volume size.

As a result, the second volume creation ends up 20 GiB on the backend instead of the requested 1 GiB.
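
To make the effect concrete, here is a toy model (plain Python, not cinder code) of the behavior described above: the cache is seeded with the size of the first volume created from the image, and every later request that hits the cache inherits that size on the backend.

~~~
# Toy illustration only -- not cinder code.
cache = {}  # image_id -> size (GiB) of the cached image-volume


def backend_size_for(image_id, requested_gib):
    if image_id in cache:
        # Cache hit: the new volume is a clone of the cached image-volume,
        # so on the backend it comes out at the cached size.
        return cache[image_id]
    # Cache miss: the image is downloaded into the new volume, and that
    # volume seeds the cache with its (requested) size.
    cache[image_id] = requested_gib
    return requested_gib


assert backend_size_for('rhel-image', 20) == 20  # first request seeds a 20 GiB entry
assert backend_size_for('rhel-image', 1) == 20   # second request ends up 20 GiB, not 1 GiB
~~~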

Version-Release number of selected component (if applicable):
RHOSP13

How reproducible:
Every time a volume smaller than the cached image-volume is created.

Steps to Reproduce:
1. Enable the image-volume cache for the backend in cinder.
2. Upload a RHEL image to glance.
3. Create a 20 GiB volume from the image.
4. Create a volume from the same image that is smaller than the one in step 3.
5. Check the actual size on the storage backend or from inside the instance (see the example after this list).
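
For reference, the same reproduction with python-cinderclient might look roughly like the sketch below; the auth URL, credentials, and image ID are placeholders for your environment, and this is not a tested script.

~~~
# Hypothetical reproduction sketch; auth_url, credentials and IMAGE_ID are
# placeholders, not values from this bug report.
from keystoneauth1 import loading, session
from cinderclient import client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(auth_url='http://keystone.example.com:5000/v3',
                                username='admin', password='secret',
                                project_name='admin',
                                user_domain_id='default',
                                project_domain_id='default')
cinder = client.Client('3', session=session.Session(auth=auth))

IMAGE_ID = '<glance ID of the ~800 MiB RHEL image>'

# Step 3: the first volume from the image seeds the image-volume cache at 20 GiB.
vol_big = cinder.volumes.create(size=20, imageRef=IMAGE_ID, name='vol-20g')

# Step 4: a smaller request from the same image is cloned from the cache.
vol_small = cinder.volumes.create(size=1, imageRef=IMAGE_ID, name='vol-1g')

# Step 5: cinder reports vol_small as 1 GiB, but the storage backend shows
# the corresponding volume actually consuming 20 GiB.
~~~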

Actual results:
The second and subsequent volume requests that are smaller than the cached image-volume end up the same size as the cached image-volume.

Expected results:
A volume request smaller than the cached image-volume results in a volume of the requested size.

Additional info:
To fix this issue, I think we need to modify the code in the following functions.

--- volume/manager.py
~~~
    def _create_image_cache_volume_entry(self, ctx, volume_ref,
                                         image_id, image_meta) -> None:
        """Create a new image-volume and cache entry for it.

        This assumes that the image has already been downloaded and stored
        in the volume described by the volume_ref.
        """
        assert self.image_volume_cache is not None
        cache_entry = self.image_volume_cache.get_entry(ctx,
                                                        volume_ref,
                                                        image_id,
                                                        image_meta)
        if cache_entry:
            LOG.debug('Cache entry already exists with image ID %'
                      '(image_id)s',
                      {'image_id': image_id})
            return

        image_volume = None
        try:
            if not self.image_volume_cache.ensure_space(ctx, volume_ref):
                LOG.warning('Unable to ensure space for image-volume in'
                            ' cache. Will skip creating entry for image'
                            ' %(image)s on %(service)s.',
                            {'image': image_id,
                             'service': volume_ref.service_topic_queue})
                return

            image_volume = self._clone_image_volume(ctx,
                                                    volume_ref,
                                                    image_meta)  <--- This should be changed to create the volume with the image size.
~~~

--- image/cache.py
~~~
    def get_entry(self,
                  context: context.RequestContext,
                  volume_ref: objects.Volume,
                  image_id: str,
                  image_meta: dict) -> Optional[dict]:
        cache_entry = self.db.image_volume_cache_get_and_update_last_used(
            context,
            image_id,
            **self._get_query_filters(volume_ref)
        )

        if cache_entry:
            LOG.debug('Found image-volume cache entry: %(entry)s.',
                      {'entry': self._entry_to_str(cache_entry)})

            if self._should_update_entry(cache_entry, image_meta):  <-- an if statement comparing the requested volume size with the cached image-volume's size needs to be added here.
                LOG.debug('Image-volume cache entry is out-dated, evicting: '
                          '%(entry)s.',
                          {'entry': self._entry_to_str(cache_entry)})
                self._delete_image_volume(context, cache_entry)
                cache_entry = None

        if cache_entry:
            self._notify_cache_hit(context, cache_entry['image_id'],
                                   cache_entry['host'])
        else:
            self._notify_cache_miss(context, image_id,
                                    volume_ref['host'])
        return cache_entry
~~~
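
As a rough illustration of the check suggested above (a hypothetical sketch, not a tested patch), get_entry() could treat an undersized request as a cache miss, along these lines:

~~~
        # Hypothetical addition inside get_entry(), after the existing
        # _should_update_entry() handling: if the caller asked for a volume
        # smaller than the cached image-volume, treat it as a cache miss so
        # the volume is created from the image instead of being cloned.
        if cache_entry and volume_ref.size < cache_entry['size']:
            LOG.info('Requested size %(req)s GiB is smaller than the cached '
                     'image-volume size %(cached)s GiB for image %(image)s; '
                     'bypassing the image-volume cache.',
                     {'req': volume_ref.size,
                      'cached': cache_entry['size'],
                      'image': image_id})
            cache_entry = None
~~~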

The same issue exists upstream.

Comment 1 Alan Bishop 2022-06-24 16:39:52 UTC
The cinder squad discussed this at length, and there's no clear/simple resolution that doesn't include its own side effects, and those side effects could in turn be considered another bug. For example, if initial volume cache entries are created using the image size, then many volumes created from the cache will need to be extended, and that can be an expensive operation that effectively reduces the value of the caching feature.

A full solution, without adverse side effects, will require a fair amount of design work, and might potentially require additional configuration options that will affect backporting the change. For that reason, we're targeting the work for a future release, and will assess backporting the changes at a later date.

Comment 3 Keigo Noha 2022-06-27 00:42:46 UTC
Hi Alan,

Thank you for your comment! I understand that creating the image cache volume with the image size may cause another issue: the volume would have to be resized every time a user requests a volume larger than the image size.

On the other hand, the current implementation causes an unexpected storage capacity shortage. Cinder records the volume at the requested size even when that size is smaller than the cached image-volume, which causes an inconsistency between the cinder DB and actual storage usage.

As a temporary solution, can we change the current behavior as follows? If the requested volume size is smaller than the image cache volume's size, cinder does not clone from the image cache and instead creates the volume from the original image. We should also add an INFO-level log message to make operators aware of this behavior.

What do you think about it?

Best Regards,
Keigo Noha

Comment 4 Alan Bishop 2022-06-27 13:22:08 UTC
The cinder squad did discuss the possibility of bypassing the cache when the new requested volume size is smaller than the one in the cache. That would satisfy some use cases, but potentially cause problems with other use cases. You're trading one set of problems for another. In this case, you're asking cinder to use the cache for some volume requests, but not use the cache for other requests, and there will be different behavior that depends on the size of the volume that happened to seed the cache.

Have you considered uploading the same image a second time (so there are two image IDs), and using one of them for the smaller volumes? That way there would be two cache entries, and users would get an appropriately sized volume by specifying one image or the other.

A less satisfactory workaround would be to eject the large volume from the cache, and seed the cache with a smaller volume. But bear in mind this will lead to performance issues, where all large volumes created from the small cache entry will need to be extended.

Comment 5 Keigo Noha 2022-06-28 07:59:31 UTC
Hi Alan,

Thank you for your comment! Today we had a meeting with the customer. The main problem in this issue is the inconsistency between cinder and the storage backend.
To align the data in cinder with the storage backend, the only way to fix it is to modify the cinder DB.
Is it allowed to update the cinder DB to align it with the storage backend?

Best Regards,
Keigo Noha

Comment 6 Alan Bishop 2022-06-28 18:29:50 UTC
If there's a DB consistency issue somewhere then that needs to be investigated. That was not apparent to me when I first read the BZ comments.

As a side note, this highlights why it's very important for BZ descriptions to focus on just explaining the problem, and not delve into proposed solutions. I fear we got distracted by discussing possible solutions, before we had a full understanding of the problem.

With this in mind, please restate the problem, and provide the usual data that Engineering requires to analyze a BZ (log files, what is the storage backend, etc.)

There's no need to add a needinfo, as the entire squad sees the BZ comments.

Comment 7 Keigo Noha 2022-06-29 00:29:12 UTC
Hi Alan,

Thank you for your comment. It looks like my previous comment confused you; I'm sorry for that.
What I meant by the inconsistency between cinder and the storage backend in my previous comment is the following.

As I wrote in the description, a volume created with a size smaller than the image cache volume ends up the same size as the image cache volume.
That means, when a 1 GiB volume is requested and the image cache volume is 20 GiB,
cinder shows the volume size as 1 GiB, but the storage backend shows that the corresponding volume consumes 20 GiB.
That is the inconsistency between cinder and the storage backend that I described in my previous comment.

As you wrote, we can't fix the issue described in this case. The customer needs to consider fixing the inconsistency between cinder and the storage backend themselves.
This inconsistency affects the volume quota, so the customer wants to close this gap between cinder and the storage backend.
Technically, shrinking the volume on the storage backend would corrupt the filesystem on the volume, so the only option is to align the size data in the cinder DB with the actual usage on the storage backend.

If my comment is not clear or something needs to be added, please let me know.

Comment 14 Keigo Noha 2022-08-23 00:38:24 UTC
Hi Alan,

We discussed with the customer how to improve cinder to avoid this issue.
We understand the current behavior, but we think the current implementation has room for improvement.

In the discussion, we came up with a rough idea to avoid this issue.

- Introduce a new tag for setting the minimum volume size, e.g. min_vol_size.
If the tag is set on the image, cinder reads it and creates the image cache volume with the size specified by the tag.
The tag can also be used as a size check, to verify that the requested size meets the minimum volume size for the environment.
Without the tag, the behavior stays the same as the current implementation (a rough sketch follows below).
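
A rough sketch of that idea (min_vol_size is a hypothetical image property, and none of this is existing cinder code):

~~~
# Hypothetical helper illustrating the min_vol_size idea; the property name
# and the sizing logic are assumptions, not existing cinder behavior.
import math


def cache_seed_size_gib(image_meta, requested_gib):
    props = image_meta.get('properties') or {}
    min_vol_size = int(props.get('min_vol_size', 0))
    # Fall back to the image's virtual size (bytes), rounded up to whole GiB.
    virtual_gib = math.ceil((image_meta.get('virtual_size') or 0) / 1024 ** 3)
    seed_gib = max(min_vol_size, virtual_gib, 1)
    if requested_gib < seed_gib:
        # The tag doubles as a minimum-size check on incoming requests.
        raise ValueError('requested %d GiB is below the minimum %d GiB'
                         % (requested_gib, seed_gib))
    return seed_gib
~~~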

I think this change doesn't affect current usage, and it benefits operators who want to require that their users create volumes larger than a specified minimum size.

What do you think about this idea?

Best Regards,
Keigo Noha

Comment 15 Alan Bishop 2022-08-23 17:18:54 UTC
Any design changes like that would require a full review by the entire cinder community. This likely entails writing a spec, and the need to add a microversion. You and the customer are certainly welcome to pursue this further, but it's not something that would be backportable.

Comment 31 Red Hat Bugzilla 2024-05-05 04:25:05 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days