+++ This bug was initially created as a clone of Bug #1553731 +++

+++ This bug was initially created as a clone of Bug #1470392 +++

After looking at the logs I have found that there are 2 problems: one that I can explain, and another that I can't explain at this moment.

The problem that I can explain is, in essence, a race condition between the value of allocated_capacity_gb in the scheduler service and in the volume service. The one that I can't explain is that the scheduler seems to be receiving the previous stats report, so at moment T it will be using the stats from T-1, which amplifies the effects of the race condition.

For the purposes of this explanation, and since the volume service is not reporting a value for provisioned_capacity_gb (so the reported allocated_capacity_gb value will be used by the scheduler instead), I'll be using allocated_capacity_gb to refer to them both.

Let's see why the race condition happens. The volume service calculates allocated_capacity_gb on start and then updates it on the completion of each operation: when a create operation completes the value is increased by the size of the volume, and when a volume is deleted it is decreased in the same manner. This value is reported to all the schedulers periodically, by default every 60 seconds. The scheduler keeps its own copy of allocated_capacity_gb as received from the volume service, but on volume creation it also updates that copy.

As mentioned before, the problem is a race condition; specifically, a window where the volume service has a smaller allocated_capacity_gb than the scheduler because it hasn't completed the volume creations yet. When the volume service sends an update to the scheduler it overwrites the scheduler's value, setting it back to 0, and only once all the volume creation operations have completed on the volume service will its report include all the space allocated for the volumes.

To better illustrate this race condition I have created a flow diagram showing the 3 Cinder services and the NetApp storage (the situation is independent of the storage); the volume (VOL) and scheduler (SCH) services are annotated with their current allocated_capacity_gb (agb) value. In the diagram 4 volumes of 100GB each are created, and assuming we should only be able to create up to 200GB, we can see how the scheduler allows us to create 400GB:

+-----+            +-----+            +-----+            +--------+
| API |            | SCH |            | VOL |            | NETAPP |
+--|--+            +--|--+            +--|--+            +---|----+
   | create vol1      |agb=0             |agb=0              |
   |----------------->| create vol1      |                   |
   |                  |----------------->|                   |
   | create vol2      |agb=100           |                   |
   |----------------->| create vol2      |                   |
   |                  |----------------->|                   |
   |                  |agb=200           |                   |
   |                  |                  | create vol1       |
   |                  |                  |------------------>|
   |                  |                  | create vol2       |
   |                  |                  |------------------>|
   |                  | report stats     |                   |
   |                  |<-----------------|                   |
   |                  |agb=0             |      vol1 created |
   | create vol3      |                  |<------------------|
   |----------------->| create vol3      |agb=100            |
   |                  |----------------->|                   |
   | create vol4      |agb=100           |      vol2 created |
   |----------------->| create vol4      |<------------------|
   |                  |----------------->|agb=200            |
   |                  |agb=200           | create vol3       |
   |                  |                  |------------------>|
   |                  |                  | create vol4       |
   |                  |                  |------------------>|
   |                  |                  |      vol3 created |
   |                  |                  |<------------------|
   |                  |                  |agb=300            |
   |                  |                  |      vol4 created |
   |                  |                  |<------------------|
   |                  | report stats     |agb=400            |
   |                  |<-----------------|                   |
   |                  |agb=400           |                   |

In the logs we can observe both issues: the race condition, and the scheduler using what should be older stats data.
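Before going through the logs, here is a minimal, self-contained Python model of the overwrite shown in the diagram. This is a toy sketch with illustrative class and method names, not Cinder code:

    # Toy model of the race. The scheduler optimistically bumps its local
    # allocated_capacity_gb as soon as it schedules a volume, while the
    # volume service only bumps its counter when a creation completes; the
    # periodic stats report blindly overwrites the scheduler's local value.

    class ToyVolumeService(object):
        def __init__(self):
            self.allocated_capacity_gb = 0  # updated only on completion

        def finish_create(self, size_gb):
            self.allocated_capacity_gb += size_gb

        def report_stats(self):
            return {'allocated_capacity_gb': self.allocated_capacity_gb}

    class ToyScheduler(object):
        def __init__(self, capacity_gb):
            self.capacity_gb = capacity_gb
            self.allocated_capacity_gb = 0  # local copy, bumped on scheduling

        def schedule_create(self, size_gb):
            if self.allocated_capacity_gb + size_gb > self.capacity_gb:
                raise RuntimeError('not enough capacity')
            self.allocated_capacity_gb += size_gb

        def receive_stats(self, stats):
            # The overwrite that causes the race: in-flight allocations
            # tracked locally by the scheduler are lost.
            self.allocated_capacity_gb = stats['allocated_capacity_gb']

    vol = ToyVolumeService()
    sch = ToyScheduler(capacity_gb=200)
    sch.schedule_create(100)               # vol1 accepted, SCH agb=100
    sch.schedule_create(100)               # vol2 accepted, SCH agb=200
    sch.receive_stats(vol.report_stats())  # stale report resets SCH agb=0
    sch.schedule_create(100)               # vol3 accepted, SCH agb=100
    sch.schedule_create(100)               # vol4 accepted, SCH agb=200
    vol.finish_create(100)                 # vol1 completes, VOL agb=100
    vol.finish_create(100)                 # vol2 completes, VOL agb=200
    # 400GB have now been scheduled against a 200GB limit.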
I have cut some data from the log entries to increase readability.

The 3 schedulers receive the stats updates from the backend:

Feb 16 09:53:20 lplospiuctlg1 cinder-scheduler: 2018-02-16 09:53:20.335 29418 DEBUG cinder.scheduler.host_manager [req-33153c67-af51-4bc5-a2f4-c1262a203a5b - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 0, ... u'allocated_capacity_gb': 0,
Feb 16 09:53:20 lplospiuctlg0 cinder-scheduler: 2018-02-16 09:53:20.339 12297 DEBUG cinder.scheduler.host_manager [req-33153c67-af51-4bc5-a2f4-c1262a203a5b - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 0, ... u'allocated_capacity_gb': 0,
Feb 16 09:53:20 lplospiuctlg2 cinder-scheduler: 2018-02-16 09:53:20.341 13703 DEBUG cinder.scheduler.host_manager [req-33153c67-af51-4bc5-a2f4-c1262a203a5b - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 0, ... u'allocated_capacity_gb': 0,

Then the creation of the volumes starts completing:

Feb 16 09:53:51 lplospiuctlg2 cinder-volume: 2018-02-16 09:53:51.623 27158 INFO cinder.volume.manager [req-dcd2f426-aa8c-4ed0-a3d7-f0f101a1ad1e - - - - -] Created volume successfully.
Feb 16 09:53:53 lplospiuctlg2 cinder-volume: 2018-02-16 09:53:53.250 27158 INFO cinder.volume.manager [req-a43960fb-a6d6-4fb2-b1d8-07b42301552b - - - - -] Created volume successfully.
Feb 16 09:53:55 lplospiuctlg2 cinder-volume: 2018-02-16 09:53:55.246 27158 INFO cinder.volume.manager [req-6dfc3e7c-afb9-4169-9222-1194c68fe837 - - - - -] Created volume successfully.
[... 15 more volumes are successfully created ...]

Then on the next stats report we still say that we haven't allocated anything:

Feb 16 09:54:20 lplospiuctlg1 cinder-scheduler: 2018-02-16 09:54:20.331 29418 DEBUG cinder.scheduler.host_manager [req-a953be9d-7e8e-4d9a-bc35-27f8b258b095 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 0, ... u'allocated_capacity_gb': 0,
Feb 16 09:54:20 lplospiuctlg0 cinder-scheduler: 2018-02-16 09:54:20.334 12297 DEBUG cinder.scheduler.host_manager [req-a953be9d-7e8e-4d9a-bc35-27f8b258b095 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 0, ... u'allocated_capacity_gb': 0,
Feb 16 09:54:20 lplospiuctlg2 cinder-scheduler: 2018-02-16 09:54:20.335 13703 DEBUG cinder.scheduler.host_manager [req-a953be9d-7e8e-4d9a-bc35-27f8b258b095 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 0, ... u'allocated_capacity_gb': 0,

And then we have more volume creations:

Feb 16 09:54:21 lplospiuctlg2 cinder-volume: 2018-02-16 09:54:21.390 27158 INFO cinder.volume.manager [req-bc5b631b-3e29-46fa-8ce5-5191183acad8 - - - - -] Created volume successfully.
Feb 16 09:54:23 lplospiuctlg2 cinder-volume: 2018-02-16 09:54:23.068 27158 INFO cinder.volume.manager [req-286db4b3-5a9f-41ec-8cf8-e3f1b92eea76 - - - - -] Created volume successfully.
Feb 16 09:54:24 lplospiuctlg2 cinder-volume: 2018-02-16 09:54:24.917 27158 INFO cinder.volume.manager [req-e6d97bc1-0818-4abc-9ddb-27bb90528ac0 - - - - -] Created volume successfully.
[... 19 more volumes are successfully created ...]
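As a pointer to where the scheduler-side log lines above come from: they are emitted by the scheduler's host manager when it receives a capabilities report, and the report replaces the cached state for that backend wholesale. A minimal sketch of that consumer follows; only the method name update_service_capabilities and the shape of the log message are taken from the real cinder.scheduler.host_manager, everything else is illustrative:

    import logging

    LOG = logging.getLogger(__name__)

    class ToyHostManager(object):
        """Caches the last capabilities reported by each volume backend."""

        def __init__(self):
            self.service_states = {}  # backend host -> last reported stats

        def update_service_capabilities(self, service_name, host,
                                        capabilities):
            if service_name != 'volume':
                return
            LOG.debug('Received volume service update from %(host)s: '
                      '%(cap)s', {'host': host, 'cap': capabilities})
            # Wholesale replacement: the report carries no sequence number
            # or timestamp check, so a report generated before in-flight
            # creations completed resets allocated_capacity_gb to its
            # older, smaller value.
            self.service_states[host] = dict(capabilities)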
Then we get the stats update we should have received before, which includes the 18 volumes that were created earlier (18 volumes of 100GB each result in the 2 pools allocating 900GB each):

Feb 16 09:55:20 lplospiuctlg1 cinder-scheduler: 2018-02-16 09:55:20.333 29418 DEBUG cinder.scheduler.host_manager [req-92398086-ff5b-4e12-b357-53829a908272 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 900, ... u'allocated_capacity_gb': 900,
Feb 16 09:55:20 lplospiuctlg0 cinder-scheduler: 2018-02-16 09:55:20.336 12297 DEBUG cinder.scheduler.host_manager [req-92398086-ff5b-4e12-b357-53829a908272 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 900, ... u'allocated_capacity_gb': 900,
Feb 16 09:55:20 lplospiuctlg2 cinder-scheduler: 2018-02-16 09:55:20.337 13703 DEBUG cinder.scheduler.host_manager [req-92398086-ff5b-4e12-b357-53829a908272 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 900, ... u'allocated_capacity_gb': 900,

And a minute later, without any volume creation completing in between, we receive the stats that include the remaining 22 volumes (22 volumes of 100GB each result in 2200GB more between the pools):

Feb 16 09:56:20 lplospiuctlg1 cinder-scheduler: 2018-02-16 09:56:20.334 29418 DEBUG cinder.scheduler.host_manager [req-01c06ec3-f347-4264-ae63-ffddb1379514 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 2100, ... u'allocated_capacity_gb': 1900,
Feb 16 09:56:20 lplospiuctlg2 cinder-scheduler: 2018-02-16 09:56:20.339 13703 DEBUG cinder.scheduler.host_manager [req-01c06ec3-f347-4264-ae63-ffddb1379514 - - - - -] Received volume service update from hostgroup@netapp1: {u'allocated_capacity_gb': 2100, ... u'allocated_capacity_gb': 1900,

Note that Node 0 doesn't receive this last stats update at all, which is also somewhat concerning.

--- Additional comment from Gorka Eguileor on 2018-02-21 12:08:02 EST ---

I have a cause for the second issue that I wasn't able to explain on comment #43.

The reason why the stats received on the schedulers can be out of sync with the backend, and with the values that the cinder-volume service should have, is that the collection of the stats and the publication of that information are done asynchronously.

On one hand we have a periodic task that calls `_report_driver_status` to collect the data from the backend and prepare it for sending; once it's ready it updates the value of the `last_capabilities` attribute. On the other hand we have another periodic task, `_publish_service_capabilities`, that periodically sends whatever is in the `last_capabilities` attribute.

If the publish happens before the report then we'll have the mentioned offset, which is what's happening in this case, as can be seen in the logs:

Feb 16 09:54:20 lplospiuctlg2 cinder-volume: 2018-02-16 09:54:20.332 27158 DEBUG cinder.manager [req-a953be9d-7e8e-4d9a-bc35-27f8b258b095 - - - - -] Notifying Schedulers of capabilities ... _publish_service_capabilities
Feb 16 09:54:20 lplospiuctlg2 cinder-volume: 2018-02-16 09:54:20.335 27158 DEBUG oslo_service.periodic_task [req-a953be9d-7e8e-4d9a-bc35-27f8b258b095 - - - - -] Running periodic task VolumeManager._report_driver_status

--- Additional comment from Gorka Eguileor on 2018-02-21 13:04:50 EST ---

Created upstream bug [1] and proposed a patch [2] to fix the old stats issue.
[1]: https://bugs.launchpad.net/cinder/+bug/1750878
[2]: https://review.openstack.org/546717

--- Additional comment from Gorka Eguileor on 2018-02-22 08:42:23 EST ---

Created upstream bug [1] and proposed a patch [2] to minimize the window where the race condition between the scheduler and the volume service can happen.

[1]: https://bugs.launchpad.net/cinder/+bug/1751057
[2]: https://review.openstack.org/546983
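To illustrate the asynchronous report/publish mechanism described in the comments above, here is a minimal sketch. The names `_report_driver_status`, `_publish_service_capabilities` and `last_capabilities` are the real ones quoted in the comment, but the bodies are simplified stand-ins, not the actual cinder.volume.manager code:

    class ToyVolumeManager(object):
        def __init__(self):
            self.last_capabilities = None  # cache filled by the report task

        def _get_backend_stats(self):
            # Stand-in for querying the storage backend through the driver.
            return {'allocated_capacity_gb': 900}

        def _report_driver_status(self):
            # Periodic task 1: collect fresh stats and cache them.
            self.last_capabilities = self._get_backend_stats()

        def _publish_service_capabilities(self):
            # Periodic task 2: send whatever is currently cached. The two
            # tasks run on the same interval with no ordering guarantee.
            if self.last_capabilities is not None:
                print('notifying schedulers: %s' % self.last_capabilities)

    mgr = ToyVolumeManager()
    # Period N: report runs first, so publish sends fresh stats.
    mgr._report_driver_status()
    mgr._publish_service_capabilities()
    # Period N+1: publish fires first (as at 09:54:20.332 vs 09:54:20.335
    # in the logs above), so schedulers get stats one full interval old.
    mgr._publish_service_capabilities()
    mgr._report_driver_status()

With the default 60-second periodic task interval, this ordering flip is exactly the one-minute lag seen between the volume creations and the 09:55:20 / 09:56:20 stats reports.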
Verified with the installed RPM package:

[root@controller-0 cinder]# rpm -q openstack-cinder
openstack-cinder-8.1.1-20.el7ost.noarch

Verification performed according to Polarion test cases.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2136