Bug 1489507

Summary: Simultaneous service catalog request do not honour quotas
Product: Red Hat CloudForms Management Engine Reporter: Satoe Imaishi <simaishi>
Component: AutomateAssignee: Tina Fitzgerald <tfitzger>
Status: CLOSED ERRATA QA Contact: Anurag <ansinha>
Severity: high Docs Contact:
Priority: medium    
Version: 5.7.0CC: akarol, alsilva, ansinha, cpelland, dajohnso, gmccullo, greartes, jhardy, jsisul, mfeifer, mkanoor, obarenbo, rmanes, rspagnol, sacpatil, simaishi, tfitzger, vparekh, wfitzger
Target Milestone: GAKeywords: ZStream
Target Release: 5.8.4   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: quota:service
Fixed In Version: 5.8.4.4 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1456819 Environment:
Last Closed: 2018-06-25 14:17:08 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: CFME Core Target Upstream Version:
Embargoed:
Bug Depends On: 1456819, 1497912    
Bug Blocks:    

Comment 2 CFME Bot 2017-09-07 18:16:17 UTC
New commit detected on ManageIQ/manageiq/fine:
https://github.com/ManageIQ/manageiq/commit/9e33806d17e8fcb631741d19fecc37d6a8c94094

commit 9e33806d17e8fcb631741d19fecc37d6a8c94094
Author:     Brandon Dunne <brandondunne>
AuthorDate: Mon Aug 21 16:04:12 2017 -0400
Commit:     Satoe Imaishi <simaishi>
CommitDate: Thu Sep 7 10:57:47 2017 -0400

    Merge pull request #15466 from tinaafitz/new_quota_requested
    
    Quota - Calculate quota values for active provisions.
    (cherry picked from commit 6d5cb119535f0ffe1b425893f831d633a5587030)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1489507

 app/models/mixins/miq_provision_quota_mixin.rb     | 157 ++++++++++++-----
 app/models/service_template_provision_request.rb   |   1 +
 spec/models/miq_provision_request_spec.rb          | 159 ++++++++++++++++-
 ...ervice_template_provision_request_quota_spec.rb | 191 +++++++++++++++++++++
 4 files changed, 464 insertions(+), 44 deletions(-)
 create mode 100644 spec/models/service_template_provision_request_quota_spec.rb

Comment 3 Sachin 2017-10-03 06:23:18 UTC
Until https://bugzilla.redhat.com/show_bug.cgi?id=1497912 is fixed, you can't test this BZ

Comment 4 Vatsal Parekh 2017-10-03 08:31:45 UTC
As mentioned in Comment 3, can't test it.

Comment 6 Tina Fitzgerald 2017-10-05 19:52:08 UTC
Hi Sachin/Vatsal,

There are 2 separate issue here. 

The instructions above specify:
Created a service catalog with vm_name, instance_type & number_of_vms as fields. 

There is an open issue with using Service dialog values to override quota related items such the number_of_vms and instance type.  

This fix can be tested by using specifying the number_of_vms and instance_type in the service item, rather than specifying it in the Service dialog.

These changes support the quota calculations for active provisions, but there is an Automate method change required as well.

I'll be happy to work with whoever will be testing this fix.

Thanks,
Tina

Comment 7 Sachin 2017-10-06 07:33:10 UTC
(In reply to Tina Fitzgerald from comment #6)
> Hi Sachin/Vatsal,
> 
> There are 2 separate issue here. 
> 
> The instructions above specify:
> Created a service catalog with vm_name, instance_type & number_of_vms as
> fields. 
> 
> There is an open issue with using Service dialog values to override quota
> related items such the number_of_vms and instance type.  
> 
> This fix can be tested by using specifying the number_of_vms and
> instance_type in the service item, rather than specifying it in the Service
> dialog.
> 
> These changes support the quota calculations for active provisions, but
> there is an Automate method change required as well.
> 
> I'll be happy to work with whoever will be testing this fix.
> 
> Thanks,
> Tina

I agree with Tina. Simultaneous request issue should be tested when,
- Provisioning VMs using lifecycle(BZ#1465087)
- Provisioning VMs using Service dialog/Catalog(BZ#1497912)

@Vatsal: Let me know if you need help from Tina or me for testing those cases. Of course a lot more need to be covered during testing like,
- Maximum parallel requests from,
  1. same browser but private windows with same user
  2. Same machine but different browsers with same user
  3. Different machine with same user 
  4. Multiple users belonging to same group &/or tenant(across providers) covering 1, 2 and 3.
- Storage quotas(attached disk)
- Quotas for flavors(instance_types)
- Memory
- CPUs
- and retirement days.

This will be hard to test manually. You need to come up with test cases.

Comment 8 Vatsal Parekh 2017-10-06 08:56:09 UTC
(In reply to Tina Fitzgerald from comment #6)
> Hi Sachin/Vatsal,
> 
> There are 2 separate issue here. 
> 
> The instructions above specify:
> Created a service catalog with vm_name, instance_type & number_of_vms as
> fields. 
> 
> There is an open issue with using Service dialog values to override quota
> related items such the number_of_vms and instance type.  
> 
> This fix can be tested by using specifying the number_of_vms and
> instance_type in the service item, rather than specifying it in the Service
> dialog.
> 
> These changes support the quota calculations for active provisions, but
> there is an Automate method change required as well.
> 
> I'll be happy to work with whoever will be testing this fix.
> 
> Thanks,
> Tina

Tina,
If this fix was for using number_of_vms via service items, it seems to be working. But doing it via dialog still seems broken.

Comment 10 William Fitzgerald 2017-11-16 21:49:44 UTC
Vatsal,

  The dialog override for number_of_vms is being tracked in a separate ticket.

Satoe, 

Can we backport these 3 PR to resolve this issue? 

https://github.com/ManageIQ/manageiq-automation_engine/pull/69
https://github.com/ManageIQ/manageiq-content/pull/191 - This needs to be applied first.
https://github.com/ManageIQ/manageiq-content/pull/196

https://github.com/ManageIQ/manageiq/pull/16170 - This PR is part of the solution but  has already been backported to fine.


Let me know if you have any questions.

Thanks

Billy

Comment 11 CFME Bot 2017-11-20 15:23:25 UTC
New commit detected on ManageIQ/manageiq-content/fine:
https://github.com/ManageIQ/manageiq-content/commit/137152446e08d45ccaaf326715e95cae169603f8

commit 137152446e08d45ccaaf326715e95cae169603f8
Author:     Madhu Kanoor <mkanoor>
AuthorDate: Thu Oct 5 09:59:18 2017 -0400
Commit:     Satoe Imaishi <simaishi>
CommitDate: Mon Nov 20 10:23:20 2017 -0500

    Merge pull request #191 from billfitzgerald0120/used_active_provisions
    
    Refactor used quota method and test.
    (cherry picked from commit 5cc73ae823ff9ec8ec3bf2c4253cb8f416a25fc8)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1489507

 .../QuotaMethods.class/__methods__/used.rb         | 51 ++++++++++++++---
 .../QuotaMethods.class/__methods__/used_spec.rb    | 65 ++++++++++++++++++++++
 2 files changed, 107 insertions(+), 9 deletions(-)
 create mode 100644 spec/content/automate/ManageIQ/System/CommonMethods/QuotaMethods.class/__methods__/used_spec.rb

Comment 12 CFME Bot 2017-11-20 15:34:00 UTC
New commit detected on ManageIQ/manageiq-content/fine:
https://github.com/ManageIQ/manageiq-content/commit/0db92735f28c731ff45652ec8ddc1164cb243165

commit 0db92735f28c731ff45652ec8ddc1164cb243165
Author:     Madhu Kanoor <mkanoor>
AuthorDate: Fri Oct 13 10:05:54 2017 -0400
Commit:     Satoe Imaishi <simaishi>
CommitDate: Mon Nov 20 10:28:09 2017 -0500

    Merge pull request #196 from billfitzgerald0120/used_new_active_provisions
    
    Added active provisions to quota count.
    (cherry picked from commit a38ebc9efffa62117cf717ca89ce168501736c53)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1489507

 .../QuotaMethods.class/__methods__/used.rb         | 29 +++++++++++++++++++-
 .../unit/method_validation/calculate_used_spec.rb  | 26 ------------------
 .../QuotaMethods.class/__methods__/used_spec.rb    | 32 +++++++++++++++++-----
 3 files changed, 53 insertions(+), 34 deletions(-)
 delete mode 100644 spec/automation/unit/method_validation/calculate_used_spec.rb

Comment 14 William Fitzgerald 2017-12-15 16:27:55 UTC
Vatsal,

I checked your appliance and logs. Your provision requests are 6 seconds apart.
It appears that the first request didn't add the 10 requested vm's to the active provision request fast enough to stop/block the second one.  Log lines below ...



[----] I, [2017-12-15T04:33:19.645508 #2142:12450f8]  INFO -- : Q-task_id([service_template_provision_request_7]) <AEMethod validate_quota> Item: vms Used: (0) Requested: (10) Max: (10) Warn: (10)


[----] I, [2017-12-15T04:33:25.778621 #2135:1245f6c]  INFO -- : Q-task_id([service_template_provision_request_8]) <AEMethod validate_quota> Item: vms Used: (0) Requested: (10) Max: (10) Warn: (10)


I need to look at this further ...

Thanks

Billy

Comment 15 William Fitzgerald 2017-12-15 18:16:24 UTC
Vatsal,

Can you try the same test but wait 1 minute before you start the second provision?

Please let me know. 

Thanks

Billy

Comment 16 Vatsal Parekh 2017-12-18 06:01:38 UTC
(In reply to William Fitzgerald from comment #15)
> Vatsal,
> 
> Can you try the same test but wait 1 minute before you start the second
> provision?
> 
> Please let me know. 
> 
> Thanks
> 
> Billy

I tried, it seems to work when requests are 1 minute+ apart.

Comment 17 Tina Fitzgerald 2017-12-18 20:36:26 UTC
Hi Vatsal,

Our current Quota implementation does a best effort to calculate active provisions. 

The active provision code looks for MiqQueue entries with:

1. class_name: MiqProvisionRequest or ServiceTemplateProvisionRequest
   method_name: create_request_tasks
   state: dequeue
or
2. class_name: MiqAeEngine
   method_name: deliver
   tracking_label includes: "_provision_"

It is possible that when 2 requests are submitted at exactly the same time, one of the requests could have an MiqQueue entry that falls outside the above criteria.  As you saw when you staggered the requests, the active provisions were successfully calculated. 

We have plans for enhancing quota in the next major release, but the current MiqQueue calculations have this very small window where the request would not be included as you have described. 

The main issue reported here was no active provisions were included in quota at all. Since the code has been changed to calculate active provisions, can we validate this ticket?

Let me know if you have any questions.

Thanks,
Tina

Comment 18 Vatsal Parekh 2017-12-19 12:36:13 UTC
(In reply to Tina Fitzgerald from comment #17)
> Hi Vatsal,
> 
> Our current Quota implementation does a best effort to calculate active
> provisions. 
> 
> The active provision code looks for MiqQueue entries with:
> 
> 1. class_name: MiqProvisionRequest or ServiceTemplateProvisionRequest
>    method_name: create_request_tasks
>    state: dequeue
> or
> 2. class_name: MiqAeEngine
>    method_name: deliver
>    tracking_label includes: "_provision_"
> 
> It is possible that when 2 requests are submitted at exactly the same time,
> one of the requests could have an MiqQueue entry that falls outside the
> above criteria.  As you saw when you staggered the requests, the active
> provisions were successfully calculated. 
> 
> We have plans for enhancing quota in the next major release, but the current
> MiqQueue calculations have this very small window where the request would
> not be included as you have described. 
> 
> The main issue reported here was no active provisions were included in quota
> at all. Since the code has been changed to calculate active provisions, can
> we validate this ticket?
> 
> Let me know if you have any questions.
> 
> Thanks,
> Tina

Hey Tina,
According to the BZ reported here, the issue is not reported for requests being 1 minute apart, it says specifically for simultaneous requests from 2 different user at the same time.
And for that scenario, I still find quota failing.

Comment 21 Tina Fitzgerald 2018-01-16 21:53:43 UTC
Update - I changed the active provision method to calculate counts based on active MiqRequests instead of specific MiqQueue entries.
Created a WIP PR and are still testing changes.

Comment 22 CFME Bot 2018-01-19 21:17:36 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/34a83cfa8d554c08cb811113a06d4bc06847d9f2

commit 34a83cfa8d554c08cb811113a06d4bc06847d9f2
Author:     Tina Fitzgerald <tfitzger>
AuthorDate: Tue Jan 16 13:30:37 2018 -0500
Commit:     Tina Fitzgerald <tfitzger>
CommitDate: Thu Jan 18 11:53:17 2018 -0500

    Change the way active_provisions are calculated by using MiqRequests
    instead of MiqQueue entries.
    
    Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1489507

 app/models/mixins/miq_provision_quota_mixin.rb     | 26 ++-----
 spec/models/miq_provision_request_spec.rb          | 82 ++++++----------------
 ...ervice_template_provision_request_quota_spec.rb | 55 +++------------
 spec/support/service_template_helper.rb            |  3 +
 4 files changed, 41 insertions(+), 125 deletions(-)

Comment 24 CFME Bot 2018-01-23 21:21:42 UTC
New commit detected on ManageIQ/manageiq/fine:
https://github.com/ManageIQ/manageiq/commit/1371379942d7500723842ebbea64312892bfe30d

commit 1371379942d7500723842ebbea64312892bfe30d
Author:     Tina Fitzgerald <tfitzger>
AuthorDate: Tue Jan 16 13:30:37 2018 -0500
Commit:     Tina Fitzgerald <tfitzger>
CommitDate: Fri Jan 19 17:46:19 2018 -0500

    Resolve conflicts from
    Cherry-pick 34a83cfa8d554c08cb811113a06d4bc06847d9f2
    
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1489507

 app/models/mixins/miq_provision_quota_mixin.rb     | 27 ++-----
 spec/models/miq_provision_request_spec.rb          | 82 ++++++----------------
 ...ervice_template_provision_request_quota_spec.rb | 55 +++------------
 spec/support/service_template_helper.rb            |  3 +
 4 files changed, 41 insertions(+), 126 deletions(-)

Comment 26 Tina Fitzgerald 2018-01-31 13:44:41 UTC
Hi Vatsal,

In the case of requests 6 and 7, I see the active count, counting each other, but I don't see the 1 used.


Request 6

It says 0 used:
[----] I, [2018-01-31T00:49:10.540519 #2115:57ffd3c]  INFO -- : Q-task_id([service_template_provision_request_6]) <AEMethod used> Quota Used: {:cpu=>0, :memory=>0, :vms=>0, :storage=>0, :provisioned_storage=>0}
[----] I, [2018-01-31T00:49:10.544002 #2115:57ffd3c]  INFO -- : Q-task_id([service_template_provision_request_6]) <AEMethod used> Quota source type: group

Plus 1 active(It's counting request 7)
[----] I, [2018-01-31T00:49:10.588145 #2115:57ffd3c]  INFO -- : Q-task_id([service_template_provision_request_6]) <AEMethod used> Quota active_provisions_by_group: {:cpu=>1, :memory=>1073741824, :vms=>1, :storage=>201326592, :provisioned_storage=>0}

Equals total used of 1
[----] I, [2018-01-31T00:49:10.591723 #2115:57ffd3c]  INFO -- : Q-task_id([service_template_provision_request_6]) <AEMethod used> Quota Totals: {:cpu=>1, :memory=>1073741824, :vms=>1, :storage=>201326592, :provisioned_storage=>0}


Request 7

It says 0 used:
[----] I, [2018-01-31T00:49:18.796192 #2115:54488ec]  INFO -- : Q-task_id([service_template_provision_request_7]) <AEMethod used> Quota Used: {:cpu=>0, :memory=>0, :vms=>0, :storage=>0, :provisioned_storage=>0}
[----] I, [2018-01-31T00:49:18.800805 #2115:54488ec]  INFO -- : Q-task_id([service_template_provision_request_7]) <AEMethod used> Quota source type: group

Plus 1 active(It's counting request 6)
[----] I, [2018-01-31T00:49:18.850577 #2115:54488ec]  INFO -- : Q-task_id([service_template_provision_request_7]) <AEMethod used> Quota active_provisions_by_group: {:cpu=>1, :memory=>1073741824, :vms=>1, :storage=>201326592, :provisioned_storage=>0}

Equals total used of 1
[----] I, [2018-01-31T00:49:18.854801 #2115:54488ec]  INFO -- : Q-task_id([service_template_provision_request_7]) <AEMethod used> Quota Totals: {:cpu=>1, :memory=>1073741824, :vms=>1, :storage=>201326592, :provisioned_storage=>0}
 

With 0 used, and 2 active, it makes sense that they both pass quota.

Assuming the same group is used in request 8, it properly denies the request with 2 used(requests 6 and 7).

Thanks,
Tina

Comment 28 Vatsal Parekh 2018-02-01 09:46:10 UTC
A similar issue for Tenant quota: https://bugzilla.redhat.com/show_bug.cgi?id=1540888

Comment 29 Tina Fitzgerald 2018-02-01 15:59:20 UTC
Vatsal and I had a BlueJeans session this morning to discuss quota and how active provisions are calculated.

The current quota calculations for active provisions includes all active requests and each active request considers all other active requests.

The following example was taken from the test appliance Vatsal setup for quota.

The automation.log shows 2 requests, request 8 and 9, both active.  The quota calculations for request 8 would include counts from request 9, and the quota calculations for request 9 would include counts from request 8.  This is by design, the goal being to not over allocate quota as previously reported.
 
[----] I, [2018-02-01T10:17:04.034729 #1985:5c62320]  INFO -- : Q-task_id([service_template_provision_request_8]) <AEMethod used> Quota Used: {:cpu=>0, :memory=>0, :vms=>0, :storage=>0, :provisioned_storage=>0}

[----] I, [2018-02-01T10:17:04.040796 #1985:5c62320]  INFO -- : Q-task_id([service_template_provision_request_8]) <AEMethod used> Quota source type: group

[----] I, [2018-02-01T10:17:04.108723 #1985:5c62320]  INFO -- : Q-
task_id([service_template_provision_request_8]) <AEMethod used> Quota active_provisions_by_group: {:cpu=>3, :memory=>3221225472, :vms=>3, :storage=>603979776, :provisioned_storage=>0}

[----] I, [2018-02-01T10:17:04.112748 #1985:5c62320]  INFO -- : Q-task_id([service_template_provision_request_8]) <AEMethod used> Quota Totals: {:cpu=>3, :memory=>3221225472, :vms=>3, :storage=>603979776, :provisioned_storage=>0}

----] I, [2018-02-01T10:17:05.611724 #1995:c24659c]  INFO -- : Q-task_id([service_template_provision_request_9]) <AEMethod used> Quota Used: {:cpu=>0, :memory=>0, :vms=>0, :storage=>0, :provisioned_storage=>0}
[----] I, [2018-02-01T10:17:05.615621 #1995:c24659c]  INFO -- : Q-task_id([service_template_provision_request_9]) <AEMethod used> Quota source type: group
[----] I, [2018-02-01T10:17:05.660655 #1995:c24659c]  INFO -- : Q-task_id([service_template_provision_request_9]) <AEMethod used> Quota active_provisions_by_group: {:cpu=>3, :memory=>3221225472, :vms=>3, :storage=>603979776, :provisioned_storage=>0}
[----] I, [2018-02-01T10:17:05.664677 #1995:c24659c]  INFO -- : Q-task_id([service_template_provision_request_9]) <AEMethod used> Quota Totals: {:cpu=>3, :memory=>3221225472, :vms=>3, :storage=>603979776, :provisioned_storage=>0}

Comment 30 Vatsal Parekh 2018-02-02 06:59:37 UTC
So from the discussion with Tina about this BZ and active provisioning,
some points to notice here,

1. The current fix is not for the case where you pass number_of_vms(or other resources?) via dialog to the service. If group quota is 10 and you pass 8 from dialog to service, and try to provision 2 such service at the same time(or say quite near that they would count as active_provision) from 2 different users in the same group, it lets both of them pass and provisions 16 even when quota was 10.

Current fix works only when the resource values you are requesting are hard-coded in the service you made. 

2. A observation that I filed in https://bugzilla.redhat.com/show_bug.cgi?id=1540888
If there are 'n' resources available and 2 requests come, both requesting <= 'n' resources, both gets error quota exceeded. They both see the other as active, and each other's resources requested as used, so they both fail saying quota exceeded.
Tina confirmed that this is not a BZ, but a hotfix.
This would stop provisioning over the quota, but would also give false errors even when quota was available.

@Tina, this seems a half fix, should I file a new ticket specifically for passing values from dialog?

Also, I think point 2 should be documented properly before going GA.

Comment 33 Tina Fitzgerald 2018-02-02 16:06:11 UTC
Hi Vatsal,

Your observations are correct. 
Yes, please open a new ticket specifically for passing the number of VMs from the service dialog.

Based on the above discussions, can we validate this ticket?

Thanks,
Tina

Comment 34 Vatsal Parekh 2018-02-05 06:44:36 UTC
Hey Tina, 
after having some discussion with QEs about this (half) fix, I still don't feel confident to verify this ticket. The description of this ticket itself mentions about ordering service catalog with passing values from the dialog. 
So the current fix doesn't seem to solve the primary issue here. 

And provisioning services with just using values given while creating, I think that issue was fixed in this BZ https://bugzilla.redhat.com/show_bug.cgi?id=1497912 itself.

Comment 36 Tina Fitzgerald 2018-02-08 22:28:26 UTC
*** Bug 1540888 has been marked as a duplicate of this bug. ***

Comment 38 CFME Bot 2018-05-22 13:38:42 UTC
New commit detected on ManageIQ/manageiq-content/fine:

https://github.com/ManageIQ/manageiq-content/commit/f9b5e6a4904df2ed5bbe463d3a01c210c63484ea
commit f9b5e6a4904df2ed5bbe463d3a01c210c63484ea
Author:     Madhu Kanoor <mkanoor>
AuthorDate: Wed Apr 18 15:31:51 2018 -0400
Commit:     Madhu Kanoor <mkanoor>
CommitDate: Wed Apr 18 15:31:51 2018 -0400

    Merge pull request #203 from billfitzgerald0120/quota_overrides

    Calculate quota using service dialogs overrides.
    (cherry picked from commit daf5f5031dbe9b516350ae37a1d5a9f43ad1afa0)

    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1489507

 content/automate/ManageIQ/System/CommonMethods/QuotaMethods.class/__methods__/requested.rb | 59 +-
 spec/automation/unit/method_validation/calculate_requested_spec.rb | 99 +-
 2 files changed, 141 insertions(+), 17 deletions(-)

Comment 40 Anurag 2018-06-13 08:53:40 UTC
Hi Team, 

This issue is resolved and now working fine. I have verified this on version: Version 5.8.4.4.20180611155620_d2f7ab6 hence changing the status of bug from on_qa to verified. Thanks for fixing this issue.

Thanks,
-Anurag

Comment 42 errata-xmlrpc 2018-06-25 14:17:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1972