Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1597570

Summary: Possible issue with calculation of allocated_capacity_gb | -4
Product: Red Hat OpenStack Reporter: Tzach Shefi <tshefi>
Component: openstack-cinder Assignee: Eric Harney <eharney>
Status: CLOSED CURRENTRELEASE QA Contact: Avi Avraham <aavraham>
Severity: medium Docs Contact: Kim Nylander <knylande>
Priority: medium    
Version: 13.0 (Queens) CC: abishop, eharney, srevivo
Target Milestone: --- Keywords: TestOnly, Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-cinder-12.0.4-2.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-12-03 11:46:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Logs none

Description Tzach Shefi 2018-07-03 08:55:24 UTC
Description of problem: FYI, I'm unsure what was done on the system to reach this state; I just noticed this and it looks like a bug. When running
cinder get-pools --detail
+-----------------------------+------------------------------+
| Property                    | Value                        |
+-----------------------------+------------------------------+
| QoS_support                 | False                        |
| allocated_capacity_gb       | -4                           |

allocated_capacity_gb is -4?
How can this be a negative value? Did the calculation logic go wrong?

Version-Release number of selected component (if applicable):
rhel7.5
openstack-cinder-12.0.1-0.20180418194614.c476898.el7ost.noarch


How reproducible:
Unsure, as I found the system already in this state.
It may not be reproducible; possibly a fluke.

Steps to Reproduce:
1. The system was configured with an XtremIO backend.
2. Apologies, I can't supply any clues as to how this happened.
3. I just happened to log in to this system and see this state.


Actual results:
allocated_capacity_gb returns a negative value.

Expected results:
IMHO allocated_capacity_gb should always be >= 0, never negative.

Additional info:
Adding logs in hopes that something will show up to explain how this happened.

Comment 1 Tzach Shefi 2018-07-03 09:44:43 UTC
Created attachment 1456160 [details]
Logs

Adding cinder conf and logs.

Again, I apologize; I don't know when this happened, so I can't pinpoint any specific date/time in the logs.

Comment 3 Tzach Shefi 2018-07-03 10:23:41 UTC
To reduce the log haystack:
grep -r allocated_capacity_gb /var/log/containers/cinder   

I found the line below to be the first appearance of -4.
Before this, allocated_capacity_gb reported values >= 0, which is expected and fine. From this point onward I see -4.

/var/log/containers/cinder/cinder-scheduler.log.1:2018-06-24 12:29:37.060 1 DEBUG cinder.scheduler.host_manager [req-91a6f8b4-73fe-4fdf-a25b-d7dcdf00435a 73e77d860d6b4259b572e5308a090ad6 b00f80b47dd244ffb06a2cedfa24078c - default default] Received volume service update from controller-0@xtremio: {u'filter_function': None, u'goodness_function': None, u'multiattach': False, u'thick_provisioning_support': False, u'provisioned_capacity_gb': 374254, u'allocated_capacity_gb': -4, u'volume_backend_name': u'xtremio', u'thin_provisioning_support': True, u'free_capacity_gb': 259920.0, u'driver_version': u'1.0.10', u'total_capacity_gb': 7801, u'reserved_percentage': 0, u'QoS_support': False, u'max_over_subscription_ratio': 80.0, u'vendor_name': u'Dell EMC', u'consistencygroup_support': True, u'storage_protocol': u'iSCSI'} update_service_capabilities /usr/lib/python2.7/site-packages/cinder/scheduler/host_manager.py:544

The 2018-06-24 timestamp in turn leads to /var/log/containers/cinder/cinder-volume.log.2:
grep -r 2018-06-24 /var/log/containers/cinder/cinder-volume.log* | grep 12:29

But I got lost here; nothing stands out, and I'm not sure what to look at or for.

The API logs were already purged; they only go back to
2018-06-30 in cinder-api.log.14.

Comment 4 Eric Harney 2018-07-05 18:50:37 UTC
Cinder has some code that does:

try:
    self.stats['pools'][pool]['allocated_capacity_gb'] -= size
except KeyError:
    self.stats['pools'][pool] = dict(
        allocated_capacity_gb=-size)

This looks wrong to me.
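The problem with the snippet above is the KeyError branch: when a pool has never been tracked, it seeds the new entry with -size, which is exactly how a pool with no prior allocations can end up reporting -4. A minimal sketch of a safer decrement is shown below; this is hypothetical illustration only (the actual upstream patch is in the review linked in comment 5), and the function name `decrement_allocated` is invented for this example:

```python
def decrement_allocated(stats, pool, size):
    """Reduce a pool's allocated_capacity_gb, never going negative.

    If the pool has never been tracked, initialize it to 0 rather
    than -size: there is nothing allocated to subtract from.
    """
    pools = stats.setdefault('pools', {})
    entry = pools.setdefault(pool, {'allocated_capacity_gb': 0})
    entry['allocated_capacity_gb'] = max(
        0, entry['allocated_capacity_gb'] - size)
    return entry['allocated_capacity_gb']
```

With this shape, deleting a volume from an untracked pool yields 0 instead of a negative figure, while deletes from a tracked pool subtract normally.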

Comment 5 Eric Harney 2018-07-05 19:05:33 UTC
The code I found above was already fixed by Gorka in stable/queens.

https://review.openstack.org/#/c/550532/

This fix is already in the build you tested.

Comment 6 Alan Bishop 2018-07-18 11:47:34 UTC
Tentatively targeting for 13z.

Comment 7 Lon Hohberger 2018-11-14 11:42:51 UTC
According to our records, this should be resolved by openstack-cinder-12.0.4-2.el7ost.  This build is available now.

Comment 8 Tzach Shefi 2018-11-14 15:38:21 UTC
Eric, 
Any idea how I might go about triggering this issue? Can we deduce anything from the code as to how it might be tripped?

I didn't induce this situation myself; I only spotted it after the fact.
 
On my current system
openstack-cinder-12.0.4-2.el7ost.noarch

(overcloud) [stack@undercloud-0 ~]$ cinder get-pools --detail                                                                                                                                                                                
+-----------------------------+----------------------------------------------------+                                                                                                                                                         
| Property                    | Value                                              |                                                                                                                                                         
+-----------------------------+----------------------------------------------------+
| QoS_support                 | False                                              |
| allocated_capacity_gb       | 4       

No negative number; looking good.
Still, I wish I had the initial reproduction steps.
Without them, I can't do more than this to verify.

Comment 9 Tzach Shefi 2018-12-02 10:52:13 UTC
Based on the above comment plus code review (the "-" was removed), this is fixed.
I've booted up a few deployments and none of them showed a negative value.
OK to verify.