Bug 1513053

Summary: Smartstate Analysis greyed out on workers not in a provider zone (webui zone)
Product: Red Hat CloudForms Management Engine
Reporter: Lynn Dixon <ldixon>
Component: SmartState Analysis
Assignee: Rich Oliveri <roliveri>
Status: CLOSED CURRENTRELEASE
QA Contact: Dave Johnson <dajohnso>
Severity: medium
Docs Contact:
Priority: medium
Version: 5.8.0
CC: brant.evans, cpelland, gtanzill, itewksbu, jcutter, jhardy, jmarc, jwarnica, lavenel, ldixon, mfeifer, myoder, obarenbo, rmanes, roliveri, tparsons
Target Milestone: GA
Keywords: PrioBumpPM, TestOnly
Target Release: 5.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 5.10.0.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1536557 1543150
Environment:
Last Closed: 2018-06-21 20:48:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1536557, 1543150
Attachments (description: flags):
VMDB Roles: none
WebUI Worker Roles: none
VMware Worker roles: none

Description Lynn Dixon 2017-11-14 16:30:41 UTC
Description of problem:
In an environment with multiple workers across zones, the button to launch a smartstate is greyed out when the user is logged in to a worker that's not in the provider's zone.
Example: a user tries to request a smartstate from a "WebUI" zone whose workers ONLY serve the web interface.  A "provider" zone for VMware houses all workers that have the smartstate role enabled.  The smartstate button is greyed out and unavailable.

I have to log in to an appliance in the provider's zone to request a smartstate, which then works as expected.

Version-Release number of selected component (if applicable):
5.8.2.3.20171016155816_aaec796 
Possibly other versions as well

How reproducible:
Deploy CloudForms with at least two zones, one set up for web services only and the other set up for provider and smartstate operations.  Log in to the webui zone and try to request a smartstate.

Steps to Reproduce:
1. Deploy CloudForms and set up two zones: a WebUI zone (with workers only doing web services for the UI) and a second zone for provider operations (including workers set up for smartstate in this zone).

2. Configure roles for workers in the webui zone to only have roles relevant for UI work (User Interface, Web Services, Notifier, etc.). DO NOT SET UP the smartstate role in this zone.

3. Configure a provider zone that includes roles for smartstate and all other relevant roles.  Also fully configure the provider to perform smartstates.

4. Log in to a worker in the "WebUI" zone and navigate to a VM.  Click on "Configuration" and try to request a SmartState Analysis. Notice that the button is greyed out.

5. NOW log in to a worker that is in the provider zone, navigate to a VM and request a smartstate.  Notice that the button is no longer greyed out, and you are able to perform a smartstate without issue.

Actual results:
Smartstate button greyed out with hover text "There is no server with the smartstate role enabled" when trying to request a smartstate while logged into a worker that's in a "webui" zone. Smartstates can only be requested when logged into a worker that's in the provider zone.

Expected results:
Able to perform smartstate when logged in to workers that are designated as "webui" only workers.

Additional info:

Comment 3 Dave Johnson 2017-11-14 16:44:34 UTC
Please assess the impact of this issue and update the severity accordingly.  Please refer to https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity for a reminder on each severity's definition.

If it's something like a tracker bug where it doesn't matter, please set the severity to Low.

Comment 4 Lynn Dixon 2017-11-14 16:56:49 UTC
Dave, 
I have set the priority on the BZ.  I tried to match the priority on the support case.  The customer is able to perform a smartstate, but they have to log out of the webui appliances and log in to an appliance in the provider zone.

Comment 9 myoder 2017-11-14 22:32:12 UTC
*** Bug 1439735 has been marked as a duplicate of this bug. ***

Comment 10 Mo Morsi 2017-11-16 19:25:59 UTC
Is this an actual bug or intended functionality? When logged into a zone that does not have smart state privileges, smart state functionality should not be available, and vice versa. Moreover, wouldn't this issue have more to do with the permission control system, and not SSA itself?

Comment 11 Lynn Dixon 2017-11-16 19:46:41 UTC
Mo,
This is a bug in my opinion.  When we deploy in large scales, we create worker appliances that ONLY serve WebUI functions, and we place those workers in their own zone.  We then create workers to service providers, typically in their own provider-specific zone. 

The problem is that when users log in to these "WebUI" workers to do their normal functions, they are not able to request a smart state.  Most customers want to use LDAP for their authentication, so we only configure the WebUI workers for auth, and the end users are not allowed to log in to any other workers, EXCEPT ones denoted as "WebUI" workers.  This helps us preserve user experience in the event of heavy load on the provider workers.  Since users are not typically allowed to log in on other workers, they are not able to request a smart state.

At scale, not allowing a user to request a smart state from a "webui" zone seems counter-intuitive. The customer I am working with has this exact use case (very large deployment, with WebUI zone) and their users are only allowed to log in to the webui zone.

Comment 12 Brant Evans 2017-11-16 19:55:29 UTC
I'll throw my 2 cents in....

I agree with Lynn that this is a bug. The vast majority of customer deployments are multi-zone and users are only able to interact with appliances in a UI zone.

The ability for users to initiate SSA from the UI they are logged into is required.

CloudForms also needs to be smart enough to queue the SSA for the zone the provider of the VM is in. Which in most cases will be a different zone than where the user is logged in.

Comment 13 Mo Morsi 2017-11-16 20:02:02 UTC
Hi Lynn, Brant, I'm afraid I don't know the details of the underlying permission system to comment further, we'll need to wait for Rich's feedback. I think we're all agreeing that as it stands when logged into the zone with WebUI workers that do _not_ have access to Smart State privileges, the user is _not_ able to invoke a Smart State Analysis (I haven't verified this on recent releases but it sounds accurate)

I can see value in the access control / privilege separation, as well as value in the desire to dispatch this type of request between zones, so perhaps a 'SmartStateDispatcher' (or similar) privilege is needed, if one does not exist.

Again I'm speculating here, as others should be able to provide more context with regard to the control structure around which smart state is invoked, something I am not too familiar with

Comment 15 Ian Tewksbury 2017-11-16 21:57:47 UTC
I will just echo Lynn and Brant and agree with everything they said.

Mo, if you need to see what our standard deployment of CFME looks like in the field I would be happy to show you and demonstrate why this is a large issue. I don't personally care if you call it a Bug or an RFE, but I do care that it gets addressed sooner rather than later.

Comment 16 Rich Oliveri 2017-11-17 03:05:15 UTC
Mo, it has nothing to do with permissions.
The request is made on an appliance that is servicing the UI.
That request should be queued, and picked up by an appliance that has the smartstate and smartproxy roles. However, that appliance also needs to be in the same zone as the VM's provider, that's the way we ensure the smartproxy will be able to access the provider and VM in question. That's by design.
The option should only be grayed out when no such appliance exists.
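
A minimal sketch of the rule described in this comment, in plain Ruby. The `Server` and `Vm` structs, the role strings, and the method names are hypothetical stand-ins for illustration and are not ManageIQ's model classes or API. The point is that the zone of the UI appliance never enters the decision:

```ruby
# Hypothetical stand-ins for appliances and VMs; NOT ManageIQ model classes.
Server = Struct.new(:name, :zone, :roles)
Vm     = Struct.new(:name, :provider_zone)

# Roles the servicing appliance needs, per the comment above (illustrative strings).
REQUIRED_ROLES = %w[smartstate smartproxy].freeze

# An appliance can service the scan only if it has both roles and
# sits in the same zone as the VM's provider.
def eligible_servers(vm, servers)
  servers.select do |s|
    s.zone == vm.provider_zone && (REQUIRED_ROLES - s.roles).empty?
  end
end

# The option should be greyed out only when no such appliance exists.
def ssa_greyed_out?(vm, servers)
  eligible_servers(vm, servers).empty?
end

servers = [
  Server.new("ui-1", "webui", %w[user_interface web_services]),
  Server.new("vmware-1", "vmware", %w[smartstate smartproxy])
]
vm = Vm.new("test-vm", "vmware")

puts ssa_greyed_out?(vm, servers)
```

With the two-zone layout from the description, the scan stays available because "vmware-1" qualifies, regardless of which appliance the user is logged in to.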

Comment 17 Mo Morsi 2017-11-17 13:45:31 UTC
Ian, please don't clear the needinfo flag unless you are providing the requested information. We are in the process of determining what the ultimate cause of the issue is, which is a prerequisite to a patch modifying the codebase being submitted (if a fix/enhancement can be deduced). I assure you this is being looked at and will be addressed in a timely manner.

Rich, Lynn, perhaps a chat session via gitter/bluejeans would sort this out quicker. From what I'm understanding, everything is working as expected, as the UI zone / workers do not have access to Smart State functionality because the workers with those permissions are in a different zone.

Lynn, would it be possible to assign workers to different zones (perhaps via a per-cloud zone configuration) such that the workers with UI privileges can dispatch queue requests to workers with Smart State privileges?

If this is not possible, then some workaround would be needed, though I'm not sure about the feasibility of dispatching smart state requests to remote zones / appliances. Rich, do we have any such system to do that now?

Comment 18 Ian Tewksbury 2017-11-17 13:55:00 UTC
@mo,

My apologies. I thought the requested information was understanding the user impact/issue.

Blue skies,
Ian

Comment 19 Rich Oliveri 2017-11-17 15:09:25 UTC
I don't think the issue is the relationship of the UI worker appliance's zone and the smartproxy zone. I think the issue is the relationship between the smartproxy zone and the provider's zone.

I believe this should work if the smartproxy is in the same zone as the target provider - either in the same appliance or an appliance in the same zone.

Have you determined that this is indeed the case?

Comment 20 Brant Evans 2017-11-17 16:06:19 UTC
Rich, Mo,

This issue can be seen on 5.9 on the CF implementation for the enablement sessions that are going on.

This setup has a UI appliance that everyone connects to and then each provider has worker appliances in provider specific zones.

If you navigate to Compute > Infrastructure > Virtual Machines and then drill down to a VM the option to "Perform SmartState Analysis" is greyed out. The same is true for instances under Compute > Clouds > Instances.

The appliances in the zone that are being used for the UI do not have the SmartProxy or SmartState Analysis roles enabled. The appliances in the zones for the providers have both roles enabled.

I also noticed this morning that from the VM or Instance list view that when VMs or Instances are checked that the "Perform SmartState Analysis" is not greyed out.

Comment 22 Lynn Dixon 2017-11-17 16:25:09 UTC
Hi Mo,
Sure, I'd welcome an opportunity to discuss the hows and whys of why this is needed.  It might be a good idea to set up a bluejeans session so that you, Brant, Ian, Jeff and I can all join together and discuss.

Consulting is typically the most available on Fridays.  Maybe we can take some time during one of the CloudForms Enablement sessions that is already scheduled to discuss?

Comment 23 Lynn Dixon 2017-11-17 17:43:56 UTC
I can confirm what Brant mentions in Comment #20. In this customer's environment, if I log in to the "webui" worker and navigate to the VM/Instance information page for a particular VM, Smart State is greyed out.

However, if from that same WebUI worker I navigate to Compute -> Infrastructure -> Virtual Machines to see the list view of all my VMs, I am able to select the same VM (as mentioned earlier) using a check box, and then request an SSA.

That SSA will subsequently get requested, queued, and performed in the proper zone.  I have verified this in the customer's smaller Dev environment.  I will verify in the larger, full-scale production environment shortly and report.

Comment 24 Mo Morsi 2017-11-17 20:51:13 UTC
This does sound like an issue, but it's not apparent what the root cause is. I'm more familiar with the Smart State internals than the surrounding permissions/worker context, but just to clarify Rich's point, in these situations are the workers on separate appliances?

The user should not be granted access to invoke the SSA through one section of the UI but denied in another. If it's a permission issue, where the appliances and zones are separate, then the user should not be able to invoke it at all. If the user should be able to run the SSA, then it's a UI issue where the relevant option should be available.

I'm available for a bluejeans session as needed (just suggest a time) though am not sure how much more insight I can offer until we have details pertaining to your appliance / zone / worker / privilege deployment. From there if I can access your environment (via both the UI and ssh) I can help debug, but most likely will need to defer to one of our UI or RBAC devs to help identify / handle the issue on that end.

Let me know about these and we can take it from there.

Comment 25 Jeff Warnica 2017-11-17 21:35:23 UTC
It may not be a "smart state" bug at all, but a failure to use (or the absence of) a mechanism for that UI task to be executed via the queue.

The UI "check authentication" for providers also does the wrong thing (tries to execute from the UI).

Comment 26 Rich Oliveri 2017-11-17 22:11:49 UTC
OK, based on Comment 20:

"The appliances in the zone that are being used for the UI do not have the SmartProxy or SmartState Analysis roles enabled. The appliances in the zones for the providers have both roles enabled."

This should work, so it is indeed a bug.

The UI calls a SmartState specific method to determine if SSA can be performed on the given target. The fault is probably in that SmartState specific method.

Mo, you should probably start by reaching out to someone in the UI group to see if they can point you to the method in question. This would save a lot of time, given digging through UI code isn't that easy.

The method checks for a number of things. Among them, it should check for the existence of a smartproxy in the VM's provider's zone - the zone of the UI shouldn't come into play. For some reason, it seems that check isn't working properly.

The reason it's not grayed out in the list view, is because VM-specific checks are not performed from that page - since multiple VMs can be selected, each requiring their own check.

So Mo,

It's not a permissions issue. It seems the SSA option is grayed out because the UI thinks there isn't an SSA proxy available to service the request - even though, apparently there is. So, find out what method the UI is calling, then we can drill into it to determine why it's not working as expected.

Comment 27 Mo Morsi 2017-11-21 17:00:09 UTC
Just an update, I've been working with the UI team to understand the call chain logic around how the SSA is invoked. There's a good chance that we won't be able to pinpoint the problem through static analysis, in which case I'll need to either access an environment where the issue occurs (ideally with root shell access to the appliance(s) to debug) or otherwise will need the details necessary to set up the same environment locally (cloud/infra providers being scanned, how they were set up and configured in CFME).

Comment 28 Lynn Dixon 2017-11-21 17:13:15 UTC
Mo, 
This is pretty easy to set up and duplicate, and requires no special configuration.  Here's how:

1. Deploy three appliances (one for VMDB, one for WebUI, and one for VMware Provider)

2. Set up a single region on the VMDB

3. Create three zones (one for vmdb, one for vmware and one for webui) and place your worker appliances accordingly

4. On the WebUI zone/worker, only use the following roles: Notifier, Reporting, Scheduler, user interface, web services, websocket.

5. On the vmware zone/worker, set up all the roles, including smartstate (don't need DB role, or RHN or Ansible here)

6. The vmdb zone/worker really doesn't need any specific roles.

Then add a vmware provider.  It can be a simple vCenter. Configure the vmware worker appliance with the VDDK.

Log in to the WebUI worker and try to perform a smartstate...you'll see the option greyed out.

If you guys don't already have a lab set up for this, I can duplicate it in my personal lab, or I can ask the customer to let us do a screenshare (would be done through the support case so there is a proper chain of responsibility for their environment).

Comment 29 Mo Morsi 2017-11-22 21:15:35 UTC
OK, I've begun the process of setting up the environment to replicate the issue locally. We have vmware resources that I should be able to use locally so no worries on that end. Some specific questions are below inline.

---

(In reply to Lynn Dixon from comment #28)
> Mo, 
> This is pretty easy to setup to duplicate, and requires no special
> configurations.  Here's how:
> 
> 1. Deploy three appliances (one for VMDB, one for WebUI, and one for VMware
> Provider)
> 

Easy enough


> 2. Setup a single region on the VMDB 

Will using the default region on the VMDB appliance suffice?



> 
> 3. Create three zones (one for vmdb, one for vmware and one for webui) and
> place your worker appliances accordingly


All three zones are to be created on the VMDB appliance correct? Can the default zone be used for the VMDB zone?

How many of each particular worker did you enable / disable for each zone? 


> 
> 4. On the WebUI zone/worker, only use the following roles: Notifier,
> Reporting, Scheduler, user interface, web services, websocket.
> 
> 5. On the vmware zone/worker setup all the roles, including smartstate (dont
> need DB role, or RHN or Ansible here)
> 
> 6. The vmdb zone/worker really doesnt need any specific roles.


Do you mean disabling all zones or leaving them with their default assignments?


> 
> Then add a vmware provider.  Can be a simple vcenter, and configure the
> vmware worker appliance with the vddk,  

The provider is to be added on the vmware appliance correct?


> 
> Login to the WebUI worker and try to perform a smartstate...you'll see the
> option greyed out.

Sounds good. Once I have the full setup locally and can reproduce, I should be able to debug. As a side note, I'll be off the next couple of days (holiday in the states), so if we don't make more progress before then, we can resume after.

Enjoy!

Comment 30 Lynn Dixon 2017-11-23 06:34:00 UTC
Hi Mo! Here are my replies to your questions (trimmed out the excess text).


>> 2. Setup a single region on the VMDB 

>Will using the default region on the VMDB appliance suffice?

Yeah, I think that should be fine.


>> 3. Create three zones (one for vmdb, one for vmware and one for webui) and
>> place your worker appliances accordingly


>All three zones are to be created on the VMDB appliance correct? Can the default zone be used for the VMDB zone?

I'm not sure it matters which appliance you are logged in to when you create the zones.  So long as you create more than one zone in your region, you can re-create the problem of launching an SSA from a zone that doesn't have the role enabled.

>How many of each particular worker did you enable / disable for each zone? 

In this customer's environment, there is one VMDB worker in the VMDB zone.  There are two workers in the WebUI zone (they are using a load balancer; one worker in the webui zone should be fine to replicate the issue).  In their VMware provider zone they have around 20 workers (their vCenter is very large).

>Do you mean disabling all zones or leaving them with their default assignments?

So far as I know, zones don't have an option to be disabled or not?  Nor are roles assigned to a zone?  Roles are either turned on/off on each of the workers in a particular zone.  

>The provider is to be added on the vmware appliance correct?

Yes. Once you have your three zones (vmdb, webui, and vmware) created, and your appliances placed into those zones with the proper roles enabled, you'll want to add your vCenter provider and during this process select your "vmware" zone.

I'll put some screenshots I grabbed from my lab (no customer data so no worries) in this reply showing the roles for each zone.  These are setup identical to how this customer is setup, and is exactly how the majority of consulting is deploying Cloudforms in large scale environments.

So, in summary,  deploy three CFME appliances: 
1. VMDB appliance,
2. WebUI appliance
3. VMware provider appliance.  Make sure you setup the VDDK on this appliance.

Then configure three zones in your region:
1. vmdb (with only the vmdb appliance in it)
2. webui (with only your webui appliance in it)
3. vmware (with only your vmware provider appliance in it).  

Set up the roles on each of these appliances using the screenshots in the attachments as a guide to replicate both my lab and our customer's environment on this BZ.

Add the vmware provider, and make sure you have it placed in your vmware zone.  Also, once you have the provider added and an inventory refresh has completed, don't forget to include the vmware hypervisors' root password in the provider hosts section so SSA can function correctly.

Comment 31 Lynn Dixon 2017-11-23 06:40:59 UTC
Created attachment 1358013 [details]
VMDB Roles

Comment 32 Lynn Dixon 2017-11-23 06:41:23 UTC
Created attachment 1358014 [details]
WebUI Worker Roles

Comment 33 Lynn Dixon 2017-11-23 06:41:52 UTC
Created attachment 1358015 [details]
VMware Worker roles

Comment 35 Mo Morsi 2017-11-29 22:29:53 UTC
OK I setup the environment locally, verified the situation, and poked around the code a bit. What I discovered is the following module which corresponds to the 'Smart State' analysis button in the toolbar:

https://github.com/ManageIQ/manageiq-ui-classic/blob/master/app/helpers/application_helper/button/smart_state_scan.rb#L5

Here we see that the known servers are iterated over and their roles checked to see if they can perform the smart scan. The caveat? The check also verifies the server is in the same zone as the local server (the UI server through which you are attempting to invoke the smart scan). I verified that once the zone check is removed, the smart state option is available.

So everything is working as "intended"; the only question is why this check was added in the first place and whether it can be removed (Rich?)
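
A rough sketch of the behaviour described in this comment, assuming the three-zone lab from comment 30. It is not the code from smart_state_scan.rb; the `Appliance` struct, the role strings, and both method names are hypothetical and only mirror the logic as described (role check plus a comparison against the UI server's own zone):

```ruby
# Hypothetical model of an appliance; NOT a ManageIQ class.
Appliance = Struct.new(:name, :zone, :roles)

# Check as described above: a server only counts if it has the role AND
# shares a zone with the UI server that is handling the request.
def button_enabled_with_zone_check?(ui_server, all_servers)
  all_servers.any? do |s|
    s.roles.include?("smartstate") && s.zone == ui_server.zone
  end
end

# The same check with the zone comparison removed.
def button_enabled_without_zone_check?(all_servers)
  all_servers.any? { |s| s.roles.include?("smartstate") }
end

vmdb   = Appliance.new("vmdb-1", "vmdb", [])
webui  = Appliance.new("webui-1", "webui", %w[user_interface web_services])
vmware = Appliance.new("vmware-1", "vmware", %w[smartstate smartproxy])
all    = [vmdb, webui, vmware]

puts button_enabled_with_zone_check?(webui, all)   # logged in to the WebUI appliance
puts button_enabled_with_zone_check?(vmware, all)  # logged in to the provider appliance
puts button_enabled_without_zone_check?(all)
```

In this model the button is disabled only when the user is logged in to the WebUI appliance, which matches the reported symptom. Dropping the zone comparison enables it everywhere, but it no longer says anything about whether a proxy can reach the VM's provider.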

Comment 36 Rich Oliveri 2017-11-30 16:48:59 UTC
In the past, if there were no proxy available to service the SSA request, the user could initiate the SSA but it would just timeout on the queue. This appears to be an attempt to prevent that, but the logic employed is wrong.

Instead of checking to see if the proxy is in "my_zone", it should check to see if it's in the same zone as the target VM of the request.

In my opinion, this type of logic shouldn't be in the application_helper. Instead a method should be made available from the backend for them to call.

I believe the proper check would be for the UI to call "has_active_proxy?" on the target VM (this is defined in vm_or_template.rb). Given this check is performed from the VM detail screen, I think the model object for the VM in question should be available.

One word of caution: SSA can be performed on things that aren't VMs, containers for example. So we may need to ensure that all types of SSA-able things have a "has_active_proxy?" method.
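
To illustrate the word of caution, here is a small sketch in which the UI asks the scan target itself and handles targets that lack the method. Only the method name `has_active_proxy?` comes from this comment; the classes, the `Proxy` struct, and the choice to disable the button when the method is missing are assumptions made for the example:

```ruby
# Hypothetical proxy record; NOT a ManageIQ class.
Proxy = Struct.new(:zone, :active)

# Hypothetical VM-like target that knows its provider's zone.
class SketchVm
  def initialize(provider_zone, proxies)
    @provider_zone = provider_zone
    @proxies = proxies
  end

  # True when an active proxy sits in the same zone as this VM's provider.
  def has_active_proxy?
    @proxies.any? { |p| p.active && p.zone == @provider_zone }
  end
end

# Hypothetical SSA-able target that does not define has_active_proxy?.
class SketchContainerImage
end

# Assumption: a target that cannot answer the question keeps the button disabled.
def ssa_button_disabled?(target)
  return true unless target.respond_to?(:has_active_proxy?)

  !target.has_active_proxy?
end

proxies = [Proxy.new("vmware", true)]
puts ssa_button_disabled?(SketchVm.new("vmware", proxies))
puts ssa_button_disabled?(SketchVm.new("openstack", proxies))
puts ssa_button_disabled?(SketchContainerImage.new)
```

The `respond_to?` guard is one way to avoid a NoMethodError for target types that have not gained the method yet; giving every SSA-able type the method, as suggested above, would make the guard unnecessary.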

Comment 37 Dávid Halász 2017-12-04 14:58:06 UTC
*** Bug 1515955 has been marked as a duplicate of this bug. ***

Comment 38 Mo Morsi 2017-12-12 03:15:08 UTC
I just submitted manageiq-ui-classic PR #3006 which I believe tackles this issue:

https://github.com/ManageIQ/manageiq-ui-classic/pull/3006

Upon application, the previously disabled smart state menu option is now enabled. This works by relying on the vm#has_active_proxy? method, which correctly verifies that smart state can be performed. Once the PR is accepted upstream it will be merged downstream for eventual deployment.

Comment 39 Ian Tewksbury 2017-12-12 04:08:45 UTC
Yippy! Thank you!

Comment 40 Oleg Barenboim 2017-12-19 14:24:46 UTC
Pull Request in https://github.com/ManageIQ/manageiq-ui-classic/pull/3006

Comment 41 Lynn Dixon 2018-01-05 15:49:15 UTC
Hi all, any chance the ManageIQ PR will get merged into a 4.5.* release?

Comment 43 Mo Morsi 2018-01-08 22:40:52 UTC
Hey Marianne, the last few days I've been having trouble getting the existing manageiq-ui-classic spec suite working. I've been attempting to run the tests on my local environment and in various appliance environments but am running into various issues, including one related to another outstanding bug filed on github. I've been slowly moving forward with the process, and will be working with some UI individuals tomorrow (hopefully expediting the process), but if this is a time-critical issue, I suggest we merge the PR as is and then worry about speccing out the edge case later. Otherwise this fix will be contingent on getting the spec suite working again.

Comment 44 Mo Morsi 2018-01-17 17:11:53 UTC
Issues w/ upstream testing environment have been resolved and the PR is good to go, once that is merged it should be a fairly fast process to backport it.

Comment 45 Ian Tewksbury 2018-01-17 18:26:25 UTC
SO EXCITED!!!!

Comment 50 Rich Oliveri 2018-05-07 19:27:13 UTC
*** Bug 1513052 has been marked as a duplicate of this bug. ***