Bug 1465772

Summary: Using a NetApp storage array, the cinder CapacityFilter is filtering out hosts with free space available
Product: Red Hat OpenStack
Reporter: Nilesh <nchandek>
Component: openstack-cinder
Assignee: Gorka Eguileor <geguileo>
Status: CLOSED NOTABUG
QA Contact: Avi Avraham <aavraham>
Severity: urgent
Priority: urgent
Version: 8.0 (Liberty)
CC: asoni, cschwede, eharney, geguileo, jmelvin, mfuruta, scohen, srevivo, tshefi
Keywords: Reopened, Triaged, ZStream
Target Milestone: ---
Target Release: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Last Closed: 2017-11-06 15:52:44 UTC
Type: Bug

Comment 3 Gorka Eguileor 2017-06-28 11:26:05 UTC
This looks like a problem with the volume types: they do not specify that the volumes should be thin provisioned, so with only 400GB available, thick capacity computations allow only two 150GB volumes to be created.

Since the schedulers don't broadcast capacity changes among themselves and requests are sent to them in a round-robin fashion, you'll only be able to create 6 volumes at a time (3 schedulers x 2 volumes/scheduler).

Once those 6 volumes have been created, the backend (which looks like it's creating all volumes as thin) will report its current capacity and the schedulers will once again be able to create volumes.

The solution would be to change the volume types to specify that volumes on those backends are thin provisioned.
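
For illustration only, a minimal Python sketch of the thick accounting per scheduler (this is not Cinder's actual code, and it ignores reserved space):

 # Each volume is counted at its full size, so a scheduler that
 # believes 400GB are free admits only two 150GB requests between
 # capacity reports from the backend.
 free_gb, admitted = 400.0, 0
 while free_gb >= 150.0:
     free_gb -= 150.0   # thick accounting: reserve the full size
     admitted += 1
 print(admitted)        # -> 2 volumes per scheduler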

Comment 4 Jeremy 2017-06-28 16:21:24 UTC
Latest update from customer:

Where do we set thin on the volume type?

[root@lplospiuctlb1 ~]# df -h
Filesystem                                                                         Size  Used Avail Use% Mounted on
...
phx-e2-nas01-svm-cloud3-lif38.phx.aexp.com:/PE1NAS01_Ctrl5_BS_Cloud3_TESTCinder1   400G   92G  309G  23% /var/lib/cinder/mnt/553e9a28932ba400aca9c5a4cd2140b3
phx-e2-nas01-svm-cloud3-lif40.phx.aexp.com:/PE1NAS01_Ctrl6_NBS_Cloud3_TESTCinder3  400G   14G  387G   4% /var/lib/cinder/mnt/a53b406dcdea021e8f8216840349b42b
phx-e2-nas02-svm-cloud3-lif40.phx.aexp.com:/PE1NAS02_Ctrl6_BS_Cloud3_TESTCinder2   400G   72G  329G  18% /var/lib/cinder/mnt/520cbcf2bf2cab121f340bcacec0b927
phx-e2-nas02-svm-cloud3-lif38.phx.aexp.com:/PE1NAS02_Ctrl5_NBS_Cloud3_TESTCinder4  400G  2.8G  398G   1% /var/lib/cinder/mnt/5c303addd01c874c9a903f5f8c458e04

[root@lplospiuctlb1 ~]# df | grep cinder| awk '{print gensub(".*/","","",$1)" "$NF}'|while read name mount ; do ls -luh $mount/volume-*| sed -e 's/G//'| awk -v volume=$name '{sum=sum+$5;count=count+1} END{print volume" "sum" "count}';done
PE1NAS01_Ctrl5_BS_Cloud3_TESTCinder1 17103 84
PE1NAS01_Ctrl6_NBS_Cloud3_TESTCinder3 2200 22
PE1NAS02_Ctrl6_BS_Cloud3_TESTCinder2 8390 26
ls: cannot access /var/lib/cinder/mnt/5c303addd01c874c9a903f5f8c458e04/volume-*: No such file or directory
PE1NAS02_Ctrl5_NBS_Cloud3_TESTCinder4  

[root@lplospiuctlb1 ~]# du -sh --apparent-size --exclude *snapshot* /var/lib/cinder/mnt/553e9a28932ba400aca9c5a4cd2140b3
17T	/var/lib/cinder/mnt/553e9a28932ba400aca9c5a4cd2140b3
[root@lplospiuctlb1 ~]# 


[stack@lplospiudirb0 ~]$ cinder type-list
+--------------------------------------+--------------------+-------------+-----------+
|                  ID                  |        Name        | Description | Is_Public |
+--------------------------------------+--------------------+-------------+-----------+
| 3dafb34a-b968-45a1-9694-abec160bb45e | netapp-nobackup-c1 |      -      |    True   |
| 6171b3e0-6610-428e-b0b5-a76b8e9ad9fc | netapp-nobackup-c2 |      -      |    True   |
| ac9ed13a-15f3-4b70-b241-a2faa55afc40 |  netapp-backup-c2  |      -      |    True   |
| c870ac26-73a7-461b-a233-627e554232fa |  netapp-backup-c1  |      -      |    True   |
+--------------------------------------+--------------------+-------------+-----------+
[stack@lplospiudirb0 ~]$ cinder type-show c870ac26-73a7-461b-a233-627e554232fa
+---------------------------------+------------------------------------------+
|             Property            |                  Value                   |
+---------------------------------+------------------------------------------+
|           description           |                   None                   |
|           extra_specs           | {u'volume_backend_name': u'cluster1-bs'} |
|                id               |   c870ac26-73a7-461b-a233-627e554232fa   |
|            is_public            |                   True                   |
|               name              |             netapp-backup-c1             |
| os-volume-type-access:is_public |                   True                   |
+---------------------------------+------------------------------------------+


Here are the respective cinder get-pools details - not sure why it shows 20 for max_over_subscription_ratio when cinder.conf has 10? But even 20x400GB = 8TB and we're at 17TB and still getting new volumes created - shouldn't we be refusing any new creates?
[root@lplospiuctlb1 ~]# grep max_oversubscription_ratio /etc/cinder/cinder.conf
# Note that this option is deprecated in favor of "max_oversubscription_ratio"
max_oversubscription_ratio=10
max_oversubscription_ratio=10
max_oversubscription_ratio=10
max_oversubscription_ratio=10
[root@lplospiuctlb1 ~]# openstack-config --get /etc/cinder/cinder.conf cluster1-bs max_oversubscription_ratio
10


+-----------------------------+--------------------------------------------------------------------------------------------------------+
|           Property          |                                                 Value                                                  |
+-----------------------------+--------------------------------------------------------------------------------------------------------+
|         QoS_support         |                                                  True                                                  |
|    allocated_capacity_gb    |                                                 17103                                                  |
|        driver_version       |                                                 1.0.0                                                  |
|       free_capacity_gb      |                                                 309.03                                                 |
| max_over_subscription_ratio |                                                  20.0                                                  |
|             name            | hostgroup@cluster1-bs#phx-e2-nas01-svm-cloud3-lif38.phx.aexp.com:/PE1NAS01_Ctrl5_BS_Cloud3_TESTCinder1 |
|      netapp_compression     |                                                  true                                                  |
|         netapp_dedup        |                                                  true                                                  |
|       netapp_disk_type      |                                                  SAS                                                   |
|       netapp_mirrored       |                                                 false                                                  |
|     netapp_nocompression    |                                                 false                                                  |
|        netapp_nodedup       |                                                 false                                                  |
|       netapp_raid_type      |                                                raid_dp                                                 |
|   netapp_thick_provisioned  |                                                 false                                                  |
|   netapp_thin_provisioned   |                                                  true                                                  |
|      netapp_unmirrored      |                                                  true                                                  |
|          pool_name          |            phx-e2-nas01-svm-cloud3-lif38.phx.aexp.com:/PE1NAS01_Ctrl5_BS_Cloud3_TESTCinder1            |
|   provisioned_capacity_gb   |                                                 90.97                                                  |
|     reserved_percentage     |                                                   40                                                   |
|       storage_protocol      |                                                  nfs                                                   |
|  thick_provisioning_support |                                                 False                                                  |
|  thin_provisioning_support  |                                                  True                                                  |
|          timestamp          |                                       2017-06-28T15:33:45.611535                                       |
|      total_capacity_gb      |                                                 400.0                                                  |
|         vendor_name         |                                                 NetApp                                                 |
|     volume_backend_name     |                                              cluster1-bs                                               |
+-----------------------------+--------------------------------------------------------------------------------------------------------+

Comment 5 Angela Soni 2017-06-28 16:38:04 UTC
To set a thin-provisioned volume type, we need to specify it in the extra specs when creating the volume type, either via the CLI or Horizon. Via the CLI:

$ cinder type-create "demoVolumeType"
$ cinder type-key "demoVolumeType" set provisioning:type=thin

Gorka, can you check the question above as to why they can create new volumes even when the max over subscription ratio is exceeded?

> Here are the respective cinder get-pools details - not sure why it shows 20 for max_over_subscription_ratio when cinder.conf has 10? But even 20x400GB = 8TB and we're at 17TB and still getting new volumes created - shouldn't we be refusing any new creates?

Thanks
Angela

Comment 6 Angela Soni 2017-06-28 17:31:44 UTC
(In reply to Gorka Eguileor from comment #3)
> This looks like a problem with the volume types: they do not specify that
> the volumes should be thin provisioned, so with only 400GB available, thick
> capacity computations allow only two 150GB volumes to be created.
> 
> Since the schedulers don't broadcast capacity changes among themselves and
> requests are sent to them in a round-robin fashion, you'll only be able to
> create 6 volumes at a time (3 schedulers x 2 volumes/scheduler).
> 
> Once those 6 volumes have been created, the backend (which looks like it's
> creating all volumes as thin) will report its current capacity and the
> schedulers will once again be able to create volumes.
> 
> The solution would be to change the volume types to specify that volumes on
> those backends are thin provisioned.

Gorka,

Setting the volume type to thin did not solve their problem. Here is what they reported:
-------------------------------

Perhaps a bit worse now or unchanged?

[stack@lplospiudirb0 ~]$ cinder type-list
+--------------------------------------+--------------------+-------------+-----------+
|                  ID                  |        Name        | Description | Is_Public |
+--------------------------------------+--------------------+-------------+-----------+
| 3dafb34a-b968-45a1-9694-abec160bb45e | netapp-nobackup-c1 |      -      |    True   |
| 6171b3e0-6610-428e-b0b5-a76b8e9ad9fc | netapp-nobackup-c2 |      -      |    True   |
| ac9ed13a-15f3-4b70-b241-a2faa55afc40 |  netapp-backup-c2  |      -      |    True   |
| c870ac26-73a7-461b-a233-627e554232fa |  netapp-backup-c1  |      -      |    True   |
+--------------------------------------+--------------------+-------------+-----------+
[stack@lplospiudirb0 ~]$ cinder type-show netapp-backup-c1
+---------------------------------+-------------------------------------------------------------------------+
|             Property            |                                  Value                                  |
+---------------------------------+-------------------------------------------------------------------------+
|           description           |                                   None                                  |
|           extra_specs           | {u'provisioning:type': u'thin', u'volume_backend_name': u'cluster1-bs'} |
|                id               |                   c870ac26-73a7-461b-a233-627e554232fa                  |
|            is_public            |                                   True                                  |
|               name              |                             netapp-backup-c1                            |
| os-volume-type-access:is_public |                                   True                                  |
+---------------------------------+-------------------------------------------------------------------------+
[stack@lplospiudirb0 ~]$ 


 1026  cinder create --image-id 3917ed45-0669-4907-9078-6d7dc97785a0 --volume-type netapp-backup-c1 --name pattest-150-1 150
 1027  cinder create --image-id 3917ed45-0669-4907-9078-6d7dc97785a0 --volume-type netapp-backup-c1 --name pattest-150-2 150
 1028  cinder create --image-id 3917ed45-0669-4907-9078-6d7dc97785a0 --volume-type netapp-backup-c1 --name pattest-150-3 150
 1029  cinder create --image-id 3917ed45-0669-4907-9078-6d7dc97785a0 --volume-type netapp-backup-c1 --name pattest-150-4 150
 1030  cinder create --image-id 3917ed45-0669-4907-9078-6d7dc97785a0 --volume-type netapp-backup-c1 --name pattest-150-5 150
 1031  cinder create --image-id 3917ed45-0669-4907-9078-6d7dc97785a0 --volume-type netapp-backup-c1 --name pattest-150-6 150
 1032  cinder create --image-id 3917ed45-0669-4907-9078-6d7dc97785a0 --volume-type netapp-backup-c1 --name pattest-150-7 150
 1033  cinder create --image-id 3917ed45-0669-4907-9078-6d7dc97785a0 --volume-type netapp-backup-c1 --name pattest-150-8 150
 1034  cinder create --image-id 3917ed45-0669-4907-9078-6d7dc97785a0 --volume-type netapp-backup-c1 --name pattest-150-9 150
 1035  cinder create --image-id 3917ed45-0669-4907-9078-6d7dc97785a0 --volume-type netapp-backup-c1 --name pattest-150-10 150
... (1-2 min pause)
 1038  cinder create --image-id 3917ed45-0669-4907-9078-6d7dc97785a0 --volume-type netapp-backup-c1 --name pattest-150-11 150

[stack@lplospiudirb0 ~]$ cinder list --all | grep pat
| 04482732-7e54-4bfe-814a-9e4400fb7b75 | a842e6ff8f084acba7813eb05cef4392 | available |        -         |          pattest-150-3           | 150  |  netapp-backup-c1  |   true   |    False    |                                      |
| 09bb41af-eb3e-4101-b99a-a37e31be8778 | a842e6ff8f084acba7813eb05cef4392 | available |        -         |          pattest-150-11          | 150  |  netapp-backup-c1  |   true   |    False    |                                      |
| 18f9011b-af30-4246-9118-7ebbd429090e | a842e6ff8f084acba7813eb05cef4392 |   error   |        -         |          pattest-150-5           | 150  |  netapp-backup-c1  |  false   |    False    |                                      |
| 30d93f19-d8a0-49a5-92c9-f4345b839645 | a842e6ff8f084acba7813eb05cef4392 | available |        -         |          pattest-150-2           | 150  |  netapp-backup-c1  |   true   |    False    |                                      |
| 43c8cd29-f0d6-4277-bac7-7511a6888f29 | a842e6ff8f084acba7813eb05cef4392 |   error   |        -         |          pattest-150-8           | 150  |  netapp-backup-c1  |  false   |    False    |                                      |
| 64a35a73-b349-4249-bb90-4a7c853c8c8f | a842e6ff8f084acba7813eb05cef4392 |   error   |        -         |          pattest-150-4           | 150  |  netapp-backup-c1  |  false   |    False    |                                      |
| 78785bd1-ea87-4cb9-a8d0-2e4751f39e41 | a842e6ff8f084acba7813eb05cef4392 |   error   |        -         |          pattest-150-10          | 150  |  netapp-backup-c1  |  false   |    False    |                                      |
| 9178ac26-41a5-4fd0-9509-5ad4627f4006 | a842e6ff8f084acba7813eb05cef4392 | available |        -         |          pattest-150-1           | 150  |  netapp-backup-c1  |   true   |    False    |                                      |
| b77fe46d-aef5-4dab-bb0f-3f8bc63926d6 | a842e6ff8f084acba7813eb05cef4392 |   error   |        -         |          pattest-150-6           | 150  |  netapp-backup-c1  |  false   |    False    |                                      |
| d9e4d07d-a0e5-4331-a6ef-6569360b8934 | a842e6ff8f084acba7813eb05cef4392 |   error   |        -         |          pattest-150-7           | 150  |  netapp-backup-c1  |  false   |    False    |                                      |
| ee5d5345-5eb2-4e24-9b20-cd9d6db4eaed | a842e6ff8f084acba7813eb05cef4392 |   error   |        -         |          pattest-150-9           | 150  |  netapp-backup-c1  |  false   |    False    |                                      |

Comment 7 Gorka Eguileor 2017-06-28 17:38:56 UTC
In OSP10, thin provisioning is not configured like that; for information on over subscription, I recommend the documentation [1].

In this case we would just have to do:

$ cinder type-create "demoVolumeType"
$ cinder type-key "demoVolumeType" set capabilities:thin_provisioning_support="<is> True"

As for why it is reporting a max over subscription value of 20 instead of 10, I would say it's probably incorrectly configured in the [DEFAULT] section instead of in the driver-specific section.


[1] https://docs.openstack.org/admin-guide/blockstorage-over-subscription.html
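
To illustrate, a hedged sketch of how the backend section could be laid out (only the cluster1-bs section name is taken from the outputs above; the rest is assumption):

 [cluster1-bs]
 volume_backend_name = cluster1-bs
 # The option must live in the backend section and be spelled exactly
 # as the installed release expects (max_over_subscription_ratio in
 # the upstream docs); a value set only under [DEFAULT], or under an
 # unrecognized name, is silently ignored and the driver default of
 # 20.0 gets reported instead.
 max_over_subscription_ratio = 10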

Comment 8 Angela Soni 2017-06-28 17:42:19 UTC
(In reply to Gorka Eguileor from comment #7)
> In OSP10, thin provisioning is not configured like that; for information on
> over subscription, I recommend the documentation [1].
> 
> In this case we would just have to do:
> 
> $ cinder type-create "demoVolumeType"
> $ cinder type-key "demoVolumeType" set
> capabilities:thin_provisioning_support="<is> True"
> 
> As for why it is reporting a max over subscription value of 20 instead of
> 10, I would say it's probably incorrectly configured in the [DEFAULT]
> section instead of in the driver-specific section.
> 
> 
> [1]
> https://docs.openstack.org/admin-guide/blockstorage-over-subscription.html

Customer is running RHOSP 8, and I checked the following doc to set the thin provisioning type: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/8/html/configuration_reference_guide/ch_configuring-openstack-block-storage

Regardless, they are still seeing the issue after setting it to thin.

Comment 10 Gorka Eguileor 2017-06-28 19:58:07 UTC
The confusion between OSP versions was caused by the version reported in the BZ; I will change it now, but it's not relevant. I've reviewed the capacity calculations for this case, and it turns out the failures are caused by the conservative nature of the scheduler's capacity filter calculations.

Let's take lplospiuctlb1's scheduler as an example. Before we start creating any volumes, cluster1-bs reports the following:

 provisioned_capacity_gb: 86.23
 free_capacity_gb: 313.77
 total_capacity_gb: 400.0
 reserved_percentage: 40
 max_over_subscription_ratio: 20.0

Then the scheduler receives a request to allocate volume 312abaa8-f703-411b-a391-a24985b6018b and it fits in because:
 provisioned_ratio = (provisioned_capacity_gb + volume_size) / total_capacity_gb = 0.590575
 reserved = total_capacity_gb * reserved_percentage / 100 = 160
 free = free_capacity_gb - reserved = 153.77
 free_virtual = free * max_over_subscription_ratio = 3075.4

Since provisioned_ratio <= max_over_subscription_ratio and free_virtual >= the 150GB we want for the volume, the request succeeds.

It is at this point that the scheduler makes a conservative estimate of the volume's consumption:

 provisioned_capacity_gb += 150
 free_capacity_gb -= 150

So that means that when we try to create the next two volumes 794fd4aa-4b45-4225-bbd2-5ed960292cda and f25736d0-a061-4904-97f9-26fabee086ec we will have:
 free = 153.77 - 150 = 3.77
 free_virtual = 3.77 * 20 = 75.4

And since 75.4 < 150, the volume creation fails.

It is not until we receive an update from the volume service with the current values at 15:41:45.517, containing the same free space we had at the beginning, that the next volume creation (0befcf09-b901-4bf3-ac4e-05e39ee0d921) can succeed.
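
To make the walkthrough concrete, a minimal Python sketch of that arithmetic, using the cluster1-bs numbers (an illustration of the logic described above, not Cinder's actual CapacityFilter implementation):

 def passes_capacity_filter(total, free, provisioned, reserved_pct,
                            max_ratio, size):
     provisioned_ratio = (provisioned + size) / total
     reserved = total * reserved_pct / 100.0
     free_virtual = (free - reserved) * max_ratio
     return provisioned_ratio <= max_ratio and free_virtual >= size

 total, reserved_pct, max_ratio = 400.0, 40, 20.0
 free, provisioned = 313.77, 86.23

 # First request: free_virtual = (313.77 - 160) * 20 = 3075.4 >= 150
 print(passes_capacity_filter(total, free, provisioned, reserved_pct,
                              max_ratio, 150))   # True

 # Conservative local update after accepting the volume:
 free -= 150
 provisioned += 150

 # Next request on the same scheduler, before the backend reports back:
 # free_virtual = (163.77 - 160) * 20 = 75.4 < 150
 print(passes_capacity_filter(total, free, provisioned, reserved_pct,
                              max_ratio, 150))   # False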

Comment 15 Jeremy 2017-06-30 16:05:32 UTC
From customer:

Makes sense, and we're extending these shares to 1TB (in Production they are 10TB), so I don't think we'll hit this. Feel free to close.

Comment 19 Gorka Eguileor 2017-07-03 09:24:29 UTC
This behavior is caused by Cinder's conservative approach to calculating free capacity after a volume is created, and it self-regulates through feedback from the volume service.

If a less conservative calculation is desired, one can always create one's own host manager, filter, and scheduler driver to do the calculations differently.

I have created a simple Python package [1] that does precisely this: free space is decreased proportionally to the configured max over subscription ratio, which can serve to illustrate my point.

Installation is pretty straightforward:

 # pip install alt_cinder_sch

Then we’ll have to configure Cinder’s schedulers to use the package:

 scheduler_host_manager = alt_cinder_sch.host_managers.HostManagerThin
 scheduler_default_filters = AvailabilityZoneFilter,AltCapacityFilter,CapabilitiesFilter
 scheduler_driver = alt_cinder_sch.scheduler_drivers.FilterScheduler
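
As a hedged sketch of the idea (the package's exact formula may differ), the thin-aware accounting charges only size divided by the ratio against the tracked free space:

 # Illustration only: charge size/ratio against tracked free space
 # instead of the full requested size.
 def thin_aware_consume(free_gb, size_gb, max_ratio):
     return free_gb - size_gb / max_ratio

 # With a ratio of 20, a 150GB request costs only 7.5GB of tracked
 # free space, so the 153.77GB of non-reserved free space from
 # comment 10 admits ~20 such requests between backend updates
 # instead of 1.
 print(thin_aware_consume(153.77, 150, 20.0))   # -> 146.27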

Comment 20 Gorka Eguileor 2017-07-03 09:25:18 UTC
Forgot the link:  https://pypi.python.org/pypi/alt_cinder_sch