Bug 1597570
| Summary: | Possible issue with calculation of allocated_capacity_gb (-4) | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Tzach Shefi <tshefi> | ||||
| Component: | openstack-cinder | Assignee: | Eric Harney <eharney> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Avi Avraham <aavraham> | ||||
| Severity: | medium | Docs Contact: | Kim Nylander <knylande> | ||||
| Priority: | medium | ||||||
| Version: | 13.0 (Queens) | CC: | abishop, eharney, srevivo | ||||
| Target Milestone: | --- | Keywords: | TestOnly, Triaged, ZStream | ||||
| Target Release: | 13.0 (Queens) | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | openstack-cinder-12.0.4-2.el7ost | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2018-12-03 11:46:27 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Description
Tzach Shefi
2018-07-03 08:55:24 UTC
Created attachment 1456160 [details]
Logs
Attaching cinder.conf and logs.
Apologies, I don't know when or what happened, so I can't pinpoint a specific date/time in the logs.
To narrow down the haystack:

    grep -r allocated_capacity_gb /var/log/containers/cinder

The line below is the first appearance of -4. Before this point, allocated_capacity_gb reported values >= 0, which is expected and fine; from this point onward I see -4.
/var/log/containers/cinder/cinder-scheduler.log.1:2018-06-24 12:29:37.060 1 DEBUG cinder.scheduler.host_manager [req-91a6f8b4-73fe-4fdf-a25b-d7dcdf00435a 73e77d860d6b4259b572e5308a090ad6 b00f80b47dd244ffb06a2cedfa24078c - default default] Received volume service update from controller-0@xtremio: {u'filter_function': None, u'goodness_function': None, u'multiattach': False, u'thick_provisioning_support': False, u'provisioned_capacity_gb': 374254, u'allocated_capacity_gb': -4, u'volume_backend_name': u'xtremio', u'thin_provisioning_support': True, u'free_capacity_gb': 259920.0, u'driver_version': u'1.0.10', u'total_capacity_gb': 7801, u'reserved_percentage': 0, u'QoS_support': False, u'max_over_subscription_ratio': 80.0, u'vendor_name': u'Dell EMC', u'consistencygroup_support': True, u'storage_protocol': u'iSCSI'} update_service_capabilities /usr/lib/python2.7/site-packages/cinder/scheduler/host_manager.py:544
That timestamp (2018-06-24) in turn leads to /var/log/containers/cinder/cinder-volume.log.2:

    grep -r 2018-06-24 /var/log/containers/cinder/cinder-volume.log* | grep 12:29

But I got lost here; nothing stands out, and I'm not sure what to look at or for. The API logs were already rotated out, as they only go back to 2018-06-30 (cinder-api.log.14).
Cinder has some code that does:

    try:
        self.stats['pools'][pool]['allocated_capacity_gb'] -= size
    except KeyError:
        self.stats['pools'][pool] = dict(
            allocated_capacity_gb=-size)

This looks wrong to me.
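A minimal sketch of why that fallback is suspect (the `FakeVolumeManager` class and the stats layout here are assumptions modeled on the snippet above, not actual Cinder code): when the pool has no entry yet, the counter is seeded with `-size` instead of `0`, so a single delete against an untracked pool leaves it negative.

```python
# Hypothetical stand-in for the manager object holding self.stats;
# only the fields used by the quoted snippet are modeled.
class FakeVolumeManager:
    def __init__(self):
        # Pool stats start empty, e.g. right after a cinder-volume restart.
        self.stats = {'pools': {}}

    def decrement_allocated(self, pool, size):
        # The buggy pattern quoted in the bug report: an unknown pool is
        # initialized to -size instead of 0, so the counter goes negative.
        try:
            self.stats['pools'][pool]['allocated_capacity_gb'] -= size
        except KeyError:
            self.stats['pools'][pool] = dict(allocated_capacity_gb=-size)


mgr = FakeVolumeManager()
# Deleting a 4 GB volume before the pool entry exists reproduces the
# -4 seen in the scheduler log above.
mgr.decrement_allocated('xtremio', 4)
print(mgr.stats['pools']['xtremio']['allocated_capacity_gb'])  # prints -4
```

Once the counter is negative, later volume creates and deletes only shift it up and down; nothing resets it, which would explain the value persisting in the capability updates.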
The code I found above was already fixed by Gorka in stable/queens: https://review.openstack.org/#/c/550532/ This fix is already in the build you tested. Tentatively targeting for 13z.

According to our records, this should be resolved by openstack-cinder-12.0.4-2.el7ost. This build is available now.

Eric, any idea how I might go about triggering this issue? Can we deduce anything from the code as to how this might be tripped? I didn't initially induce this situation, I only spotted it.

On my current system, openstack-cinder-12.0.4-2.el7ost.noarch (overcloud):

    [stack@undercloud-0 ~]$ cinder get-pools --detail
    +-----------------------------+----------------------------------------------------+
    | Property                    | Value                                              |
    +-----------------------------+----------------------------------------------------+
    | QoS_support                 | False                                              |
    | allocated_capacity_gb       | 4                                                  |

No negative number, looking good. Yet I wish I had the initial reproduction steps; otherwise I can't do more than this to verify.

Based on the above comment plus code review (the "-" was removed), this is fixed.
I've booted up a few deployments; none of them showed a negative value.
OK to verify.