Bug 1553737 - Delay in stats reporting in cinder
Summary: Delay in stats reporting in cinder
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 9.0 (Mitaka)
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 9.0 (Mitaka)
Assignee: Gorka Eguileor
QA Contact: Avi Avraham
Docs Contact: Kim Nylander
URL:
Whiteboard:
Depends On: 1553731
Blocks: 1553738 1553739 1553740
 
Reported: 2018-03-09 12:37 UTC by Gorka Eguileor
Modified: 2018-07-05 12:31 UTC
CC List: 14 users

Fixed In Version: openstack-cinder-8.1.1-17.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, race conditions in the volume service when reporting stats from the storage backend meant that the schedulers used out-of-date stats, allowing them to create more volumes than they should. This fix removes those race conditions. Schedulers now use more up-to-date stats, which brings the number of volumes allowed to be created closer to the configured limits.
Clone Of: 1553731
Clones: 1553738
Environment:
Last Closed: 2018-07-05 12:31:02 UTC
Target Upstream Version:
Embargoed:




Links:
  OpenStack gerrit 546717 (MERGED): Fix reporting old stats (last updated 2020-05-04 09:44:43 UTC)
  OpenStack gerrit 546983 (MERGED): Fix allocated_capacity_gb race on create volume (last updated 2020-05-04 09:44:43 UTC)
  Red Hat Product Errata RHBA-2018:2136 (last updated 2018-07-05 12:31:33 UTC)

Description Gorka Eguileor 2018-03-09 12:37:56 UTC
+++ This bug was initially created as a clone of Bug #1553731 +++

+++ This bug was initially created as a clone of Bug #1470392 +++

After looking at the logs, I have found that there are two problems: one that I
can explain and another that I can't explain at this moment.

The problem that I can explain is, in essence, a race condition between the
value of the allocated_capacity_gb in the scheduler service and the volume
service.

The one that I can't explain is that the scheduler seems to be receiving the
previous stats report, so at moment T it will be using the stats from T-1,
which amplifies the effects of the race condition.

For the purposes of this explanation, since the volume service is not
reporting a value for provisioned_capacity_gb and the scheduler will therefore
use the reported allocated_capacity_gb value instead, I'll be using
allocated_capacity_gb to refer to both.

Let's see why the race condition happens.

The volume service calculates the allocated_capacity_gb on start, and then
updates it on the completion of each operation.  So when a create operation is
completed the value is increased by the size of the volume, and when a volume
is deleted it is decreased in the same manner.

This value is reported to all the schedulers periodically; by default,
every 60 seconds.
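
As a minimal sketch (hypothetical names, not Cinder's actual classes), the volume-side bookkeeping works roughly like this:

  class VolumeBackendStats(object):
      """Toy model of the volume service's capacity bookkeeping."""

      def __init__(self, existing_volume_sizes_gb):
          # Computed once when the service starts, from existing volumes.
          self.allocated_capacity_gb = sum(existing_volume_sizes_gb)

      def on_create_completed(self, size_gb):
          # Only bumped once the backend has finished creating the volume.
          self.allocated_capacity_gb += size_gb

      def on_delete_completed(self, size_gb):
          self.allocated_capacity_gb -= size_gb

      def report_stats(self, schedulers):
          # Periodic task, every 60 seconds by default; broadcasts the
          # current value to all schedulers.
          for scheduler in schedulers:
              scheduler.receive_stats(
                  {'allocated_capacity_gb': self.allocated_capacity_gb})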

The scheduler keeps its own copy of the allocated_capacity_gb as received from
the volume service, but it also updates that copy itself whenever it schedules
a volume creation.

Now, as mentioned before, the problem is a race condition: the volume service
has a smaller allocated_capacity_gb than the scheduler, because it hasn't
completed the volume creation yet.  When it sends an update to the scheduler,
it overwrites the scheduler's value, resetting it to 0; only once all the
volume creation operations have completed on the volume service will its
report include all the space allocated for the volumes.
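
Continuing the sketch above (again with hypothetical names), the scheduler side shows how a racing report wipes out the scheduler's optimistic accounting:

  class SchedulerHostState(object):
      """Toy model of the scheduler's per-backend state."""

      def __init__(self):
          self.allocated_capacity_gb = 0

      def consume(self, size_gb):
          # Called when the scheduler places a volume on this backend.
          self.allocated_capacity_gb += size_gb

      def receive_stats(self, stats):
          # A periodic report blindly replaces the local value, discarding
          # the optimistic increments for volumes still being created.
          self.allocated_capacity_gb = stats['allocated_capacity_gb']

  host = SchedulerHostState()
  host.consume(100)   # vol1 scheduled: agb=100
  host.consume(100)   # vol2 scheduled: agb=200, the 200GB limit is reached
  host.receive_stats({'allocated_capacity_gb': 0})  # stale report: agb=0
  # The backend now looks empty again, so vol3 and vol4 pass the filters.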

To better illustrate this race condition, I have created a flow diagram
showing the 3 Cinder services and the NetApp storage (the situation is
independent of the storage backend), where the volume (VOL) and scheduler
(SCH) services are annotated with their current allocated_capacity_gb (agb) value.

In the diagram we see 4 volumes of 100GB each being created; assuming we
should only be able to create up to 200GB, we can see how the scheduler would
allow us to create 400GB.


             +-----+         +-----+         +-----+        +--------+
             | API |         | SCH |         | VOL |        | NETAPP |
             +--|--+         +--|--+         +--|--+        +---|----+
   create vol1  |               |agb=0          |agb=0          |
 -------------->|  create vol1  |               |               |
                |-------------->|               |               |
   create vol2  |               |agb=100        |               |
 -------------->|  create vol2  |               |               |
                |-------------->|               |               |
                |               |agb=200        |               |
                |               |               |               |
                |               |  create vol1  |               |
                |               |-------------->|               |
                |               |               |               |
                |               |  create vol2  |               |
                |               |-------------->|               |
                |               |               |               |
                |               | report stats  |               |
                |               |<--------------|               |
                |               |agb=0          |  create vol1  |
   create vol3  |               |               |-------------->|
 -------------->|  create vol3  |               |agb=100        |
                |-------------->|               |  create vol2  |
   create vol4  |               |agb=100        |-------------->|
 -------------->|  create vol4  |               |agb=200        |
                |-------------->|               |               |
                |               |agb=200        |               |
                |               |               |               |
                |               |  create vol3  |               |
                |               |-------------->|               |
                |               |               |               |
                |               |  create vol4  |               |
                |               |-------------->|  create vol1  |
                |               |               |-------------->|
                |               |               |agb=300        |
                |               |               |  create vol2  |
                |               |               |-------------->|
                |               |               |agb=400        |
                |               | report stats  |               |
                |               |<--------------|               |
                |               |agb=400        |               |
                |               |               |               |

In the logs we can observe both issues: the race condition and the use of what
should be older stats data.  I have trimmed some data from the log entries to improve readability.

The 3 schedulers receive the stats updates from the backend:

  Feb 16 09:53:20 lplospiuctlg1 cinder-scheduler: 2018-02-16 09:53:20.335 29418 DEBUG cinder.scheduler.host_manager [req-33153c67-af51-4bc5-a2f4-c1262a203a5b - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 0, ... u'allocated_capacity_gb': 0,
  Feb 16 09:53:20 lplospiuctlg0 cinder-scheduler: 2018-02-16 09:53:20.339 12297 DEBUG cinder.scheduler.host_manager [req-33153c67-af51-4bc5-a2f4-c1262a203a5b - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 0, ... u'allocated_capacity_gb': 0,
  Feb 16 09:53:20 lplospiuctlg2 cinder-scheduler: 2018-02-16 09:53:20.341 13703 DEBUG cinder.scheduler.host_manager [req-33153c67-af51-4bc5-a2f4-c1262a203a5b - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 0, ... u'allocated_capacity_gb': 0,


Then the creation of the volumes starts completing:

  Feb 16 09:53:51 lplospiuctlg2 cinder-volume: 2018-02-16 09:53:51.623 27158 INFO cinder.volume.manager [req-dcd2f426-aa8c-4ed0-a3d7-f0f101a1ad1e - - - - -] Created volume successfully.
  Feb 16 09:53:53 lplospiuctlg2 cinder-volume: 2018-02-16 09:53:53.250 27158 INFO cinder.volume.manager [req-a43960fb-a6d6-4fb2-b1d8-07b42301552b - - - - -] Created volume successfully.
  Feb 16 09:53:55 lplospiuctlg2 cinder-volume: 2018-02-16 09:53:55.246 27158 INFO cinder.volume.manager [req-6dfc3e7c-afb9-4169-9222-1194c68fe837 - - - - -] Created volume successfully.
  [... 15 more volumes are successfully created ...]


Then, on the next stats report, we still say that we haven't allocated anything:

  Feb 16 09:54:20 lplospiuctlg1 cinder-scheduler: 2018-02-16 09:54:20.331 29418 DEBUG cinder.scheduler.host_manager [req-a953be9d-7e8e-4d9a-bc35-27f8b258b095 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 0, ... u'allocated_capacity_gb': 0,
  Feb 16 09:54:20 lplospiuctlg0 cinder-scheduler: 2018-02-16 09:54:20.334 12297 DEBUG cinder.scheduler.host_manager [req-a953be9d-7e8e-4d9a-bc35-27f8b258b095 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 0, ... u'allocated_capacity_gb': 0,
  Feb 16 09:54:20 lplospiuctlg2 cinder-scheduler: 2018-02-16 09:54:20.335 13703 DEBUG cinder.scheduler.host_manager [req-a953be9d-7e8e-4d9a-bc35-27f8b258b095 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 0, ... u'allocated_capacity_gb': 0,


And then we have more volume creations:
  Feb 16 09:54:21 lplospiuctlg2 cinder-volume: 2018-02-16 09:54:21.390 27158 INFO cinder.volume.manager [req-bc5b631b-3e29-46fa-8ce5-5191183acad8 - - - - -] Created volume successfully.
  Feb 16 09:54:23 lplospiuctlg2 cinder-volume: 2018-02-16 09:54:23.068 27158 INFO cinder.volume.manager [req-286db4b3-5a9f-41ec-8cf8-e3f1b92eea76 - - - - -] Created volume successfully.
  Feb 16 09:54:24 lplospiuctlg2 cinder-volume: 2018-02-16 09:54:24.917 27158 INFO cinder.volume.manager [req-e6d97bc1-0818-4abc-9ddb-27bb90528ac0 - - - - -] Created volume successfully.
  [... 19 more volumes are successfully created ...]


Then we get the stats update we should have received before, which includes the
18 volumes that were created earlier (18 volumes of 100GB each result in the 2
pools allocating 900GB each):

  Feb 16 09:55:20 lplospiuctlg1 cinder-scheduler: 2018-02-16 09:55:20.333 29418 DEBUG cinder.scheduler.host_manager [req-92398086-ff5b-4e12-b357-53829a908272 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 900, ... u'allocated_capacity_gb': 900,
  Feb 16 09:55:20 lplospiuctlg0 cinder-scheduler: 2018-02-16 09:55:20.336 12297 DEBUG cinder.scheduler.host_manager [req-92398086-ff5b-4e12-b357-53829a908272 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 900, ... u'allocated_capacity_gb': 900,
  Feb 16 09:55:20 lplospiuctlg2 cinder-scheduler: 2018-02-16 09:55:20.337 13703 DEBUG cinder.scheduler.host_manager [req-92398086-ff5b-4e12-b357-53829a908272 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 900, ... u'allocated_capacity_gb': 900,


And a minute later, without any volume creation completing, we receive the stats
that include the remaining 22 volumes (22 volumes of 100GB each add
2200GB more between the pools):

  Feb 16 09:56:20 lplospiuctlg1 cinder-scheduler: 2018-02-16 09:56:20.334 29418 DEBUG cinder.scheduler.host_manager [req-01c06ec3-f347-4264-ae63-ffddb1379514 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 2100, ... u'allocated_capacity_gb': 1900,
  Feb 16 09:56:20 lplospiuctlg2 cinder-scheduler: 2018-02-16 09:56:20.339 13703 DEBUG cinder.scheduler.host_manager [req-01c06ec3-f347-4264-ae63-ffddb1379514 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 2100, ... u'allocated_capacity_gb': 1900,


Node 0 doesn't receive that last stats update at all, which is also somewhat
concerning.

--- Additional comment from Gorka Eguileor on 2018-02-21 12:08:02 EST ---

I have found a cause for the second issue, the one I wasn't able to explain in comment #43.

The reason why the stats received by the schedulers can be out of sync with the backend, and with the values that the cinder-volume service should have, is that the collection of stats and the publication of this information are done asynchronously.

So on one hand we have a periodic task that calls `_report_driver_status` to collect the data from the backend and prepare it for sending.  Once it's ready, it updates the value of the `last_capabilities` attribute.

On the other hand we have another periodic task called `_publish_service_capabilities` that will periodically send whatever is in the `last_capabilities` attribute.

If the publish happens before the report, then we'll have the aforementioned offset, which is what's happening in this case, as can be seen in the logs:

  Feb 16 09:54:20 lplospiuctlg2 cinder-volume: 2018-02-16 09:54:20.332 27158 DEBUG cinder.manager [req-a953be9d-7e8e-4d9a-bc35-27f8b258b095 - - - - -] Notifying Schedulers of capabilities ... _publish_service_capabilities
  Feb 16 09:54:20 lplospiuctlg2 cinder-volume: 2018-02-16 09:54:20.335 27158 DEBUG oslo_service.periodic_task [req-a953be9d-7e8e-4d9a-bc35-27f8b258b095 - - - - -] Running periodic task VolumeManager._report_driver_status
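
A minimal sketch of this structure (the backend and scheduler helpers are hypothetical; the task and attribute names are the real ones mentioned above):

  class VolumeManager(object):
      def __init__(self, backend, schedulers):
          self.backend = backend
          self.schedulers = schedulers
          self.last_capabilities = None

      def _report_driver_status(self):
          # Periodic task 1: collect fresh stats from the backend and
          # stash them for the publisher.
          self.last_capabilities = self.backend.get_stats()

      def _publish_service_capabilities(self):
          # Periodic task 2: send whatever is stashed right now.  If this
          # fires before _report_driver_status in a given period (as the
          # timestamps above show), schedulers get last period's stats.
          if self.last_capabilities is not None:
              for scheduler in self.schedulers:
                  scheduler.receive_stats(self.last_capabilities)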

--- Additional comment from Gorka Eguileor on 2018-02-21 13:04:50 EST ---

Created upstream bug [1] and proposed a patch [2] to fix the old stats issue.


[1]: https://bugs.launchpad.net/cinder/+bug/1750878
[2]: https://review.openstack.org/546717
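
One illustrative way to remove that one-period lag, not necessarily what the merged patch does, is to chain collection and publication in the toy VolumeManager sketched above, so the published stats can never trail the collected ones:

  def _report_and_publish(self):
      # Collect and publish in a single periodic task, so the stats sent
      # to the schedulers are always the ones just read from the backend.
      self.last_capabilities = self.backend.get_stats()
      for scheduler in self.schedulers:
          scheduler.receive_stats(self.last_capabilities)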

--- Additional comment from Gorka Eguileor on 2018-02-22 08:42:23 EST ---

Created upstream bug [1] and proposed a patch [2] to minimize the window where the race condition between the scheduler and the volume service can happen.


[1]: https://bugs.launchpad.net/cinder/+bug/1751057
[2]: https://review.openstack.org/546983
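
As an illustration of the idea only (not the actual patch): accounting for the volume's size before starting the slow backend call means a concurrent stats report already reflects in-flight creations, shrinking the window.

  def create_volume(self, volume):
      # Bump the counter up front, before the (slow) backend operation...
      self.allocated_capacity_gb += volume.size_gb
      try:
          self.driver.create_volume(volume)
      except Exception:
          # ...and roll the accounting back if the creation fails.
          self.allocated_capacity_gb -= volume.size_gb
          raise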

Comment 6 Avi Avraham 2018-06-13 14:17:48 UTC
Verified.
Installed RPM package:
[root@controller-0 cinder]# rpm -q openstack-cinder
openstack-cinder-8.1.1-20.el7ost.noarch

Verified according to the Polarion test cases.

Comment 8 errata-xmlrpc 2018-07-05 12:31:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2136

