Bug 1145132 - Domain validation fails when adding size due to previously removed size
Summary: Domain validation fails when adding size due to previously removed size
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
: ---
Assignee: Abhishek Gupta
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks: 1144057
Reported: 2014-09-22 12:49 UTC by Luke Meyer
Modified: 2018-12-09 18:37 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1144057
Environment:
Last Closed: 2015-02-18 16:51:07 UTC
Target Upstream Version:
Embargoed:



Description Luke Meyer 2014-09-22 12:49:17 UTC
+++ This bug was initially created as a clone of Bug #1144057 +++

Description of problem:
When adding a gear size to a user, the gear size is first added to the CloudUser, and further validity checks are made only before adding the gear size to the user's Domain. If those checks fail, the user will see the new gear size in their 'rhc account' output but will be unable to create gears of that size, because it is missing from the domain.

This is confusing to end users. The validity checks should be made before any changes are made to the database, or before each database change.
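A minimal sketch of the ordering suggested above, written as shell functions so it can be tried without a broker. USER_SIZES and DOMAIN_SIZES are hypothetical stand-ins for the CloudUser and Domain records, and add_gear_size and all_valid are illustrative names, not broker code. The point is only that both candidate records are validated before either one is written, so a failure leaves both unchanged.

```shell
# Hypothetical sketch: USER_SIZES and DOMAIN_SIZES stand in for the
# CloudUser and Domain gear-size lists; they are not real broker variables.

# Succeeds if every comma-separated size in $1 appears in the valid list $2.
all_valid() {
  for s in $(printf '%s' "$1" | tr ',' ' '); do
    case ",$2," in
      *",$s,"*) ;;
      *) return 1 ;;
    esac
  done
  return 0
}

# add_gear_size NEW_SIZE VALID_SIZES
add_gear_size() {
  new_user="${USER_SIZES:+$USER_SIZES,}$1"
  new_domain="${DOMAIN_SIZES:+$DOMAIN_SIZES,}$1"
  # Validate both candidate records first; nothing is written on failure.
  if ! all_valid "$new_user" "$2" || ! all_valid "$new_domain" "$2"; then
    echo "Validation failed; no records changed" >&2
    return 1
  fi
  # Only after both checks pass are the two "records" updated.
  USER_SIZES="$new_user"
  DOMAIN_SIZES="$new_domain"
}
```

With USER_SIZES=small and DOMAIN_SIZES=small, running add_gear_size newsize "medium,newsize" fails and leaves both variables set to small, which matches the expected result in this report: the new size does not show up anywhere.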

Steps to Reproduce:
1. Start with an OpenShift environment with at least two gear sizes defined with VALID_GEAR_SIZES in /etc/openshift/broker.conf
2. Create an application on the node with one of the gear sizes
3. Edit broker.conf to remove the gear size used in step 2 and add a new gear size to VALID_GEAR_SIZES
4. Restart the openshift-broker service
5. Attempt to add the new gear size to the user:
     $ oo-admin-ctl-user -l login --addgearsize new_gear_size
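To confirm which sizes the broker will accept before running step 5, something like the following could read VALID_GEAR_SIZES out of broker.conf. It assumes the setting is a single line of the form VALID_GEAR_SIZES="small,medium" (the last uncommented one wins); valid_gear_sizes and is_valid_gear_size are hypothetical helper names.

```shell
# Print the sizes listed in VALID_GEAR_SIZES, one per line.
# Assumes a line like VALID_GEAR_SIZES="small,medium" in the given file.
valid_gear_sizes() {
  conf="${1:-/etc/openshift/broker.conf}"
  # Keep the last uncommented assignment, drop the key, strip quotes and
  # spaces, then split the comma-separated list onto separate lines.
  grep '^[[:space:]]*VALID_GEAR_SIZES=' "$conf" | tail -n 1 \
    | cut -d= -f2- | tr -d "\"' " | tr ',' '\n'
}

# is_valid_gear_size SIZE [CONF]: succeeds if SIZE is currently configured.
is_valid_gear_size() {
  valid_gear_sizes "$2" | grep -qxF -- "$1"
}
```

For the reproduction, is_valid_gear_size new_gear_size should succeed and is_valid_gear_size on the removed size should fail once step 3 is done.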

Actual results:
oo-admin-ctl-user output:
-=~~~~~~~~~=-
Adding gear size <new_gear_size> for user <login>...
Problem:
  Validation of Domain failed.
Summary:
  The following errors were found: Allowed gear sizes The following gear sizes are invalid: <removed_gear_size>
Resolution:
  Try persisting the document with valid data or remove the validations.
-=~~~~~~~~~=-

'rhc account' output shows the new gear size as available.

Expected results:
oo-admin-ctl-user fails and rhc account output does not show the new gear size, since it was not added to the domain.

--- Additional comment from Luke Meyer on 2014-09-22 08:45:00 EDT ---

If you remove a valid gear size in broker.conf after it's already in use (users, domains, apps created) then it causes validation problems later, specifically when you try to modify an existing domain having that gear size ("Validation of Domain failed."). There could be other ramifications too, but this is the one reported.

After a change like this to broker.conf, nothing goes back through the existing records to "true them up." Frankly, it would be unclear what we should do with apps/nodes/districts/regions that the broker no longer recognizes. Technically, the administrator should disable all of the nodes in the profile, destroy all of the apps (or perhaps move them to a different profile), remove any districts/regions for the profile, and only then remove the profile from broker.conf.

None of this would fix the existing users and domains. A decent workaround would be to use oo-admin-ctl-user with a list of all users to remove the gear size from them prior to removing it from broker.conf. Since there is no way I know of to *get* a list of all OpenShift users (aside from a MongoDB query), it would be really nice to have a "--all" flag of some kind on oo-admin-ctl-user for situations like this.
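A sketch of the workaround described above, assuming you already have a file with one login per line (for example, exported by a MongoDB query, which is not shown here). The --removegearsize flag is assumed to be the counterpart of --addgearsize; confirm it with oo-admin-ctl-user --help before relying on it. remove_size_from_users and the OO_ADMIN_CTL_USER override are hypothetical, the latter only to allow a dry run.

```shell
# remove_size_from_users SIZE LOGINS_FILE
# Runs oo-admin-ctl-user once per login; --removegearsize is an assumption.
# Set OO_ADMIN_CTL_USER to another command (e.g. echo) for a dry run.
remove_size_from_users() {
  size="$1"; logins_file="$2"; failed=0
  while IFS= read -r login || [ -n "$login" ]; do
    [ -z "$login" ] && continue   # skip blank lines
    # </dev/null keeps the tool from consuming the rest of the logins file.
    ${OO_ADMIN_CTL_USER:-oo-admin-ctl-user} -l "$login" \
      --removegearsize "$size" </dev/null || failed=1
  done < "$logins_file"
  return "$failed"   # nonzero if any user could not be updated
}
```

Running OO_ADMIN_CTL_USER=echo remove_size_from_users small logins.txt prints the commands without executing them; drop the override to apply the change, then remove the size from broker.conf.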

This should be covered in the docs: both the current state of affairs and any improvements that come.

It might also be good if domain and user capability validation silently removed invalid gear sizes rather than choking on them while doing something unrelated.
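What "silently removing" invalid gear sizes would compute can be sketched as a filter over a record's gear sizes. prune_gear_sizes is a hypothetical helper, not broker code: it prints the sizes that are still valid and reports the dropped ones on stderr instead of failing the whole validation.

```shell
# prune_gear_sizes RECORD_SIZES VALID_SIZES
# Prints the still-valid sizes (comma-separated) on stdout and reports any
# sizes that would be dropped on stderr.
prune_gear_sizes() {
  kept=""; dropped=""
  for s in $(printf '%s' "$1" | tr ',' ' '); do
    case ",$2," in
      *",$s,"*) kept="${kept:+$kept,}$s" ;;
      *) dropped="${dropped:+$dropped,}$s" ;;
    esac
  done
  [ -n "$dropped" ] && echo "dropping invalid gear sizes: $dropped" >&2
  printf '%s\n' "$kept"
}
```

For example, prune_gear_sizes "small,newsize" "medium,newsize" prints newsize and reports small as dropped.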

Comment 1 Abhishek Gupta 2014-10-01 20:39:34 UTC
We are adding checks and repair logic to the oo-admin-chk and oo-admin-repair scripts.
https://github.com/openshift/origin-server/pull/5849

Comment 2 openshift-github-bot 2014-10-01 21:26:02 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/63dfd54e020d41ef12728cb9b5d954660a5a0b9c
Bug 1145132 - Domain validation fails when adding size due to previously removed size

Comment 3 Abhishek Gupta 2014-10-01 21:50:40 UTC
We have not made any changes in the domain validations and are not removing invalid gear sizes from users/domains silently. The approach taken is to just leave the cleanup of the bad data (invalid gear sizes) to admins.

Comment 4 Luke Meyer 2014-10-02 13:20:27 UTC
Just to clarify what's changing here... it sounds like you've added something to allow the admin to fix up existing domains with oo-admin-repair. In that case, the procedure for removing a gear size would be something like:

1. Nuke all of the nodes for that size
2. Remove the size from broker.conf
3. Run oo-admin-repair to remove now-invalid gears and districts from MongoDB
4. Run oo-admin-repair --gear-sizes to remove the invalid size from users and domains.

Is that about right? I think we can work with that, just need to document it somewhere. 
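Step 2 of the procedure above can be scripted roughly as follows. This is a sketch under the assumption that VALID_GEAR_SIZES is a single, quoted, comma-separated line; remove_size_from_conf is a hypothetical name, and it keeps a .bak copy of the original file.

```shell
# remove_size_from_conf SIZE CONF
# Rewrites the VALID_GEAR_SIZES line in CONF without SIZE; the original file
# is kept as CONF.bak. All other lines are copied unchanged.
remove_size_from_conf() {
  size="$1"; conf="$2"
  cp "$conf" "$conf.bak" || return 1
  awk -v size="$size" '
    /^[ \t]*VALID_GEAR_SIZES=/ {
      val = $0
      sub(/^[^=]*=/, "", val)     # drop the key
      gsub(/[" \t]/, "", val)     # drop quotes and whitespace
      n = split(val, parts, ",")
      out = ""
      for (i = 1; i <= n; i++)
        if (parts[i] != size && parts[i] != "")
          out = out (out == "" ? "" : ",") parts[i]
      print "VALID_GEAR_SIZES=\"" out "\""
      next
    }
    { print }
  ' "$conf.bak" > "$conf"
}
```

After it runs, restart the openshift-broker service so the change takes effect, then continue with the oo-admin-repair steps (3 and 4).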

Incidentally, could oo-admin-repair be used to implement step 1 more cleanly? I.e. if a node has a now-invalid gear profile, remove all its gears and un-district it such that it can easily be repurposed for another profile? Otherwise the user has to manually remove gears and reset district.info, or reinstall.

Comment 5 Abhishek Gupta 2014-10-02 19:33:00 UTC
In addition to the PR above, an earlier commit was also part of this fix:

https://github.com/abhgupta/origin-server/commit/972efc1ce95a06ec3c4f85c0b32bd1b3f4d007b0

Comment 6 Abhishek Gupta 2014-10-02 19:35:45 UTC
Yes, today we have a means to identify unresponsive nodes and deal with the gears on them. We could extend the script to treat a node with a now-invalid gear profile as another trigger for reaching that same state, and then take the same actions.

That will need to be dealt with as a separate RFE/trello card.

Comment 7 Jianwei Hou 2014-10-08 06:21:22 UTC
Verified on devenv_5219

Steps:
1. Create a small gear
2. Remove the small gear size from /etc/openshift/broker-dev.conf and add 'newsize' to VALID_GEAR_SIZES. Run `rhc domain-list`; the default small gear size has already been removed from the domain.
3. Attempt to add 'newsize' to the user; the new gear size is added successfully.
4. Run oo-admin-chk; the inconsistent user capability is detected.
5. Fix it with oo-admin-repair, and run oo-admin-chk again.

Result:
After step 3:
[root@ip-10-167-171-74 ~]# oo-admin-ctl-user -l jhou --addgearsize newsize


Adding gear size newsize for user jhou... Done.

User jhou:
                            plan: free
                   plan quantity: 1
            plan expiration date: 
                consumed domains: 1
                     max domains: 1
                  consumed gears: 1
                       max gears: 3
    max tracked storage per gear: 0
  max untracked storage per gear: 0
                       max teams: 0
viewing all global teams allowed: false
            plan upgrade enabled: true
                      gear sizes: newsize
            sub accounts allowed: false
private SSL certificates allowed: false
              inherit gear sizes: false
                      HA allowed: false

After step 4:
Finished at: 2014-10-08 10:06:56 UTC
Total time: 41.021s
Some users have invalid gear sizes in their capabilities: small
FAILED
Please refer to the oo-admin-repair tool to resolve some of these inconsistencies.

After step 5:
[root@ip-10-167-171-74 ~]# oo-admin-repair --gear-sizes  -v
Started at: 2014-10-08 10:16:22 UTC
Total gears found in mongo: 1
Some users have invalid gear sizes in their capabilities: small
Removing invalid gear sizes (small) from all users...	Done.

Finished at: 2014-10-08 10:16:22 UTC
Total time: 0.112s
SUCCESS
[root@ip-10-167-171-74 ~]# oo-admin-chk 
Started at: 2014-10-08 10:16:55 UTC

User data populated in 0 seconds

Domain data populated in 0 seconds

District data populated in 0 seconds

Total gears found in mongo: 1
Application data populated in 0 seconds

Usage data populated in 0 seconds

Fetched all gears in 20 seconds
Total gears found on the nodes: 1
Total nodes that responded: 1
Checked application gears on nodes in 0 seconds

Checked application gears on nodes (reverse match) in 0 seconds


Finished at: 2014-10-08 10:17:16 UTC
Total time: 20.898s
SUCCESS

