Bug 1056906
| Summary: | openshift.ks/openshift.sh: RHSM/RHN is all or nothing (registration, configuring pools and channels, validation, etc.) | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Miciah Dashiel Butler Masters <mmasters> |
| Component: | Installer | Assignee: | Luke Meyer <lmeyer> |
| Status: | CLOSED ERRATA | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 2.0.0 | CC: | libra-bugs, mmasters, xiama |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-05-15 14:41:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Miciah Dashiel Butler Masters
2014-01-23 06:58:13 UTC
I tested the pull request with Steps 1, 2, and 5 with subscriptions properly configured. I have not yet tested those steps without configuring the system appropriately beforehand, and I have not tested with oo-install.

(In reply to Miciah Dashiel Butler Masters from comment #0)
> While working on this pull request, I realised that the validate_preflight
> function is specifically checking that these options are provided. I feel
> bad about removing sanity checks. However, I get the impression that
> validate_preflight is primarily intended with oo-install in mind as the
> user. Is this the case?

No, it is primarily intended for those who don't quite understand how RHSM or RHN classic work (i.e. probably 80% of eval users), to fail fast and with a clear error message when they haven't provided the necessary items. Previously, if you didn't supply a user/pass or didn't supply a pool id, it could fail in the following confusing ways:

1. Hang forever waiting for you to enter a user/pass at registration
2. Choke several minutes down the line when it tries to install something, so you are wondering why you can't install stuff when successfully registered.

It's important to fail helpfully regarding subscription management since it's one of the more variable and frustrating parts of initial OpenShift deployment, and it's right up front so there's a high risk of abandonment if it doesn't work out.

That said, it *is* a reasonable use case to want to handle registration yourself and have openshift.sh do the rest. And the existing process/checks haven't been re-evaluated in the light of actually having a tool to validate that everything turned out OK (yum-validator). So I'm all for this enhancement, as long as the failure modes continue to be pretty clear even to those lacking familiarity with RHN. "You're missing channel X, Y, and Z" is probably not clear enough - it would need to be closer to "you don't seem to have attached an OpenShift subscription - maybe you should run subscription-manager list --available and pick out a pool ID from there"

> Alternatively, we could make validate_preflight a bit more sophisticated so
> that if RHN/RHSM credentials were not provided, it would check also that the
> host were not already registered before issuing a warning, and similar for
> the pool id. Would this be a better approach?

Yes. By the way, keep in mind that you will need the RHN user/pass in order to enable channels when using the "rhn" method, but they might not necessarily want to re-register from scratch (because they already have the host registered with the desired activation key / org / profile name / etc.). On the whole, expecting the registration to run from scratch is the simplest and most fool-proof method, but if you can enable the new use case without sacrificing the sanity of eval users, it would be great.

Thanks!

I updated the pull request so that it keeps the validate_preflight checks but does a little more checking when credentials and (in the case of RHSM) a pool id are not provided in order to determine whether the configuration that openshift.sh cannot do for lack of information has already been done. I couldn't find any really awesome ways to do these checks, but what I have works all right in my testing.
I'm sure there's a better way to convey the following, but here are the combinations and results (including, I hope, the most important output) from my testing:
| install_method | Registered with RHN | Registered with RHSM | Channels added | Credentials provided | Pool id provided | Result |
|---|---|---|---|---|---|---|
| rhn | no | no | no | no | no | B A |
| rhn | no | yes | no | no | yes | B A |
| rhn | no | yes | no | yes | yes | E I |
| rhn | yes | no | yes | yes | no | E I |
| rhsm | no | no | no | no | no | C D A |
| rhsm | no | no | no | no | yes | C A |
| rhsm | no | yes | no | no | no | D A |
| rhsm | no | yes | no | no | yes | F G I |
| rhsm | yes | no | no | no | yes | C A |
| rhsm | yes | no | no | yes | yes | H G I |

A='OpenShift: Aborting Installation.' (script aborts execution)
B='OpenShift: Install method rhn requires an RHN user and password.'
C='OpenShift: Install method rhsm requires an RHN user and password.'
D='OpenShift: Install method rhsm requires a poolid.'
E='OpenShift: Register to RHN Classic with username and password'
F='OpenShift: No credentials given for RHSM; assuming already configured'
G='OpenShift: Registering subscription from pool id [redacted]'
H='OpenShift: Register with RHSM'
I='OpenShift: Completed configuring OpenShift.' (script executes to completion)
Any other combinations I should try?
Does the code look OK to you?
Again, I have not tested with oo-install.
I would not worry about oo-install. It's just providing inputs to this.
Hm, question about these:
| install_method | Registered with RHN | Registered with RHSM | Channels added | Credentials provided | Pool id provided | Result |
|---|---|---|---|---|---|---|
| rhn | yes | no | yes | yes | no | E I |
| rhn | yes | no | NO | yes | no | J? I |

J="OpenShift: Already registered with RHN classic; configuring channels"
Am I correct in understanding that the first re-registers the host? Should that not work similarly to RHSM - don't re-register if already registered?
I mention the 2nd one because they might well register to RHN classic, do nothing with channels (or just enable one channel to get o-e-release), and expect openshift.sh to work without re-registering.
If "rhn" method needs to work differently regarding re-registering, I guess we can do it, it just seems like it would be best to be consistent.
My brain cannot process all this right now so I will accept your incredible level of detail as evidence you've thought the rest of this through :) Only extra thing I can think you might want to check is what happens when the existing RHN registration is broken; for example our own images on OpenStack tend to have an RHN systemid left over that gets locked out due to so many images coming up with it. I could totally imagine this happening to our customers and we should make sure when it does... openshift.sh fails helpfully.
I figure that if the user provides credentials, we should go ahead and re-register. Now that I think about it, maybe we shouldn't because that generates a new RHN key, if I understand correctly. (Do I understand correctly? Is re-registering a bad thing with RHN?)
However, the rhn and rhsm methods are consistent: in both cases, if you provide credentials, openshift.sh will re-register.
These are good test cases, so I went ahead and ran them. The first and third rows are from my earlier table, for comparison, and the second and fourth rows are new:
| install_method | Registered with RHN | Registered with RHSM | Channels added | Credentials provided | Pool id provided | Result |
|---|---|---|---|---|---|---|
| rhn | yes | no | yes | yes | no | E I |
| rhn | yes | no | NO | yes | no | E I |
| rhsm | no | yes | no | no | yes | F G I |
| rhsm | no | yes | yes | yes | yes | H G I |

The behaviour is consistent. If possibly re-registering isn't a major issue, I would like to get these changes to QE and have them merged sooner rather than later.
I have not tested the PR with broken RHN registration. However, I realised that simply checking for the existence of /etc/sysconfig/rhn/systemid would be insufficient, and so I made validate_preflight both check for that file and check the output of `rhn-channel -l` for ose-2.0 repos, which I expect would fail (or not report the required repos) if the registration is broken. Because credentials are required to add repos with RHN, it already made sense to require both checks to pass if credentials are missing. I think we're covered.
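The two-part check described in the comment above can be tried offline. What follows is a minimal sketch, not the code from the pull request: the function name `rhn_looks_configured` is hypothetical, the channel list is read from stdin so that `rhn-channel -l` does not have to be run, and matching the bare string `ose-2.0` is an assumption about how the output is inspected.

```shell
#!/bin/bash
# Hypothetical helper (not taken from openshift.sh): decide whether an
# existing RHN Classic registration looks usable without credentials.
#   $1    path to the systemid file (defaults to the standard location)
#   stdin channel list, one channel per line, as `rhn-channel -l` prints it
rhn_looks_configured() {
  local systemid=${1:-/etc/sysconfig/rhn/systemid}
  # No systemid file: the host was never registered with RHN Classic.
  [ -f "$systemid" ] || return 1
  # The file alone is not enough (it may be stale or locked out), so also
  # require at least one ose-2.0 channel in the list.
  grep -q 'ose-2\.0'
}
```

On a real host this would be fed as `rhn-channel -l | rhn_looks_configured`; if the registration is broken, the expectation stated above is that the channel list is empty or lacks the required entries, so the check fails.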
Re-registering isn't a major issue, as it's the current behavior :) I just thought you were trying to avoid having to do it. The inconsistency is that with RHSM, you can register, attach the poolid you want, and then proceed with openshift.sh and not re-register (possibly separating out the second part to a different admin with no account access). With RHN classic, you're forced to either set up all the channels yourself, or give your user+pass so openshift.sh can do it, in which case it will re-register. I suppose that's OK; I don't see a good way around it. It makes things somewhat nicer for RHSM and doesn't make the RHN method any worse (in fact, with an activation key, you can have all the channels set up automatically and still hand off the actual installation). I think the preflight rhn-channel check should suffice.

My goal is to have it so that I can register the system ahead of time and then run openshift.sh and still have it enable channels, set priorities/excludes, and validate. Because we provide credentials to the rhn-channel command when we add channels, I am assuming that we need to do so, in which case there is no getting around the fact that one must provide credentials to openshift.sh if one wants it to add channels. Because we don't provide credentials to subscription-manager when we add channels using the rhsm method, we can get around that, and that is the particular use-case that I want to address: I want to register the host ahead of time so that somebody else can do the rest without needing to put in credentials. So as I understand it, the inconsistency is inherent in RHN and RHSM; the inconsistency does not arise from our use of the tools per se.

I'm marking the bug ON_QA in hopes that we can get my PR merged quickly.

I'm curious whether we can use rhn-channel --add on a registered system without providing credentials. Do you know the answer off-hand? If not, I'm happy to experiment.
I'm also happy to continue discussing further improvements to this aspect of the installation beyond what my PR does.

(In reply to Miciah Dashiel Butler Masters from comment #8)

works for me.

> I'm curious whether we can use rhn-channel --add on a registered system
> without providing credentials. Do you know the answer off-hand?

Nope, you can't.

1.register all required repos by rhsm

    #yum repolist
    rhel-6-server-cf-tools-1-rpms
    rhel-6-server-ose-2.0-infra-rpms
    rhel-6-server-ose-2.0-node-rpms
    rhel-6-server-ose-2.0-rhc-rpms
    rhel-6-server-rhev-agent-rpms
    rhel-6-server-rpms
    rhel-server-rhscl-6-rpms

2.set the env

    export CONF_INSTALL_METHOD=rhsm
    export CONF_RHN_USER=$user
    export CONF_RHN_PASS=$passwd

3.setup openshift

    # ./openshift.sh

Output:

    <--snip-->
    + subscription-manager identity
    + [[ ! -n '' ]]
    + tac
    + grep -q '^Repo ID:\s*rhel-6-server-ose-2.0-\(infra\|node\)-rpms$'
    + subscription-manager repos
    + echo 'OpenShift: Install method rhsm requires a poolid.'
    OpenShift: Install method rhsm requires a poolid.
    + preflight_failure=1
    + '[' rhsm = yum -a '!' '' ']'
    + '[' 1 ']'
    + abort_install
    + [[ x == x ]]
    + echo 'OpenShift: Aborting Installation.'
    OpenShift: Aborting Installation.
    + exit 1

(In reply to Ma xiaoqiang from comment #10)
> 1.register all required repos by rhsm
> 2.set the env
> export CONF_INSTALL_METHOD=rhsm
> export CONF_RHN_USER=$user
> export CONF_RHN_PASS=$passwd
> 3.setup openshift
> # ./openshift.sh
> + echo 'OpenShift: Install method rhsm requires a poolid.'
> OpenShift: Install method rhsm requires a poolid.

That seems like the correct behavior; because a user and pass were provided, it expects to re-register, but wasn't given a poolid to use. If you're already registered and subscription attached, don't supply user + pass (but still use "rhsm" method so it configures things as expected).
Miciah, this is the sort of confusion we can expect :) Perhaps for the purposes of this bug, we should define for QE the scenarios and how they are expected to succeed or fail. I can't believe you didn't already number your table :) Probably best to run "openshift.sh actions=validate_preflight,configure_repos" since all that really needs testing is that the registration succeeds and channel config is validated.

I agree that the behaviour that Xiaoqiang reports is correct behaviour, but I can't reproduce it. In fact, that usage of the script (registering, adding channels, and running openshift.sh with install_method=rhsm and RHN credentials provided but without a pool id provided) fails for me because it re-registers and loses the attached pool in doing so.

Xiaoqiang, how did you configure the repositories before you ran openshift.sh? Did you use `subscription-manager repos --enable=...` or some other means? Does `subscription-manager repos` not list the ose-2.0 repositories that `yum repolist` listed in the output you provided? I don't understand why the `subscription-manager repos` test failed.

For this combination of inputs, we can do one of two things:

(a) Abort in preflight_validate if the install method is rhsm and RHN credentials are provided but the pool id is not.

(b) Do not abort in preflight_validate if the install method is rhsm as long as the host is registered and already has the pool attached, but do skip registration in configure_rhsm_channels if we have no pool id.

Either approach is easy to implement. Please let me know your thoughts. Meanwhile, I'll work on better automating of testing of preflight_validate and configure_repos per Luke's suggestion.

(In reply to Miciah Dashiel Butler Masters from comment #12)
> (a) Abort in preflight_validate if the install method is rhsm and RHN
> credentials are provided but the pool id is not.

IMHO this is vastly less confusing.
Under RHSM you only need credentials in order to register, so if you provide them... expect to register from scratch.

(Whoops, I typed "preflight_validate" in my last comment, but I meant "validate_preflight.")

I updated my pull request per comment 13 so that it aborts in validate_preflight if install_method is rhsm, credentials are specified, but pool id is not specified. I also updated it so that it does not attach a pool that is already attached in configure_rhsm_channels (see below).

I also tested all combinations with both the previous version of openshift.sh in the pull request and the new version in the updated pull request: rhn/rhsm, registered with RHN or not, registered with RHSM or not, channels added or not, credentials provided or not, and pool id provided or not. (Note that if the host is registered with neither RHN nor RHSM, "channels added" is meaningless, so my results have a few redundant tests.)

Rather than test all 64 combinations by hand with each version of openshift.sh, I wrote a script to do it. I will attach three files to this bug report: the script, the results with the pull request prior to this new update, and the results with the updated pull request.

Note the following four combinations from the older results:

| test # | method | Registered with RHN | Registered with RHSM | Channels added | Credentials provided | Pool id provided | Result |
|---|---|---|---|---|---|---|---|
| 46 | rhsm | no | yes | yes | no | yes | ERROR |
| 47 | rhsm | no | yes | yes | yes | no | ERROR |
| 62 | rhsm | yes | yes | yes | no | yes | ERROR |
| 63 | rhsm | yes | yes | yes | yes | no | ERROR |

"ERROR" means that the script passed validate_preflight and hit an error later on (which is bad: validate_preflight should have caught the issue that resulted in the error), whereas "ABORT" means that the script aborted in validate_preflight (which is not necessarily good or bad but in these two cases is good), and "OK" means both validate_preflight and configure_repos completed successfully.
The "ERROR" result for Tests 46 and 62 surprised me. In these tests, the pool is attached before openshift.sh runs, and because the pool id is provided, openshift.sh tries to attach the pool again. It turned out that subscription-manager returns an error if openshift.sh tells it to attach a pool that is already attached, so I added the additional check in configure_rhsm_channels that I mentioned earlier in this comment to skip attaching any pool id that is already attached.

The "ERROR" result for Tests 47 and 63 occurred for the reason we discussed earlier: In these tests, the credentials are provided and the pool id is not, so openshift.sh was passing the validate_preflight checks (on the erroneous assumption that because the pool was already attached, the pool id was not needed) and then re-registering in configure_rhsm_channels, which removed all previously attached pools.

With the updated pull request, these results changed as follows:

| test # | method | Registered with RHN | Registered with RHSM | Channels added | Credentials provided | Pool id provided | Result |
|---|---|---|---|---|---|---|---|
| 46 | rhsm | no | yes | yes | no | yes | OK |
| 47 | rhsm | no | yes | yes | yes | no | ABORT |
| 62 | rhsm | yes | yes | yes | no | yes | OK |
| 63 | rhsm | yes | yes | yes | yes | no | ABORT |

Tests 46 and 62 are OK: We provide openshift.sh no credentials, but the host is already registered with RHSM; we provide openshift.sh with a pool id, but it checks and determines that the pool is already attached and does not try to add it again.

Tests 47 and 63 abort in validate_preflight: We provide openshift.sh with credentials but no pool id, which implies that openshift.sh should re-register, but it would need to attach a pool afterwards and would be unable to do so, so it aborts.

Scanning over all 64 rows, I think the results we see with the latest version of the pull request make sense. Please let me know if anything looks off to you!

Created attachment 855305 [details]
Test script for RHN/RHSM configuration
Created attachment 855306 [details]
Test results with previous version of pull request
Created attachment 855307 [details]
Test results with new version of pull request
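The validate_preflight rules that the comment above settles on, and the 64-row numbering it uses, can be modelled without touching a real host. The sketch below is an illustration under stated assumptions, not the attached test script or the code in the pull request: `preflight_decision` and `run_matrix` are hypothetical names, every input is a literal "yes" or "no", and only the rows quoted in this thread have been checked against it. It reports OK or ABORT for preflight only; it cannot reproduce the later "ERROR" outcomes.

```shell
#!/bin/bash
# Hypothetical model of the validate_preflight outcome discussed in this bug.
# Usage: preflight_decision METHOD REG_RHN REG_RHSM CHANNELS CREDS POOL
# Every argument after METHOD is the literal string "yes" or "no".
preflight_decision() {
  local method=$1 reg_rhn=$2 reg_rhsm=$3 channels=$4 creds=$5 pool=$6
  case $method in
    rhn)
      # With credentials the script registers and adds channels itself.
      [ "$creds" = yes ] && { echo OK; return; }
      # Without them, the host must already be registered with channels added.
      [ "$reg_rhn" = yes ] && [ "$channels" = yes ] && { echo OK; return; }
      echo ABORT ;;
    rhsm)
      if [ "$creds" = yes ]; then
        # Credentials imply re-registering, which drops attached pools,
        # so a pool id is mandatory in that case.
        if [ "$pool" = yes ]; then echo OK; else echo ABORT; fi
      elif [ "$reg_rhsm" != yes ]; then
        echo ABORT   # not registered and no way to register
      elif [ "$pool" = yes ] || [ "$channels" = yes ]; then
        echo OK      # already registered; pool given or repos already present
      else
        echo ABORT   # registered, but no pool id and no repos
      fi ;;
    *) echo ABORT ;;
  esac
}

# Enumerate all 2 x 2^5 = 64 combinations. Counting from 1 with the method
# as the slowest-varying field reproduces the test numbers quoted above
# (for example, test 46 is "rhsm no yes yes no yes").
run_matrix() {
  local n=0 method a b c d e
  for method in rhn rhsm; do
    for a in no yes; do for b in no yes; do for c in no yes; do
    for d in no yes; do for e in no yes; do
      n=$((n + 1))
      printf '%2d %-4s %-3s %-3s %-3s %-3s %-3s %s\n' "$n" "$method" \
        "$a" "$b" "$c" "$d" "$e" \
        "$(preflight_decision "$method" "$a" "$b" "$c" "$d" "$e")"
    done; done; done; done; done
  done
}
```

Running `run_matrix | sed -n '45,47p'` prints the rows for Tests 45 to 47, which can be compared by eye with the tables in this report.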
1.get the PR 277

    #https://raw2.github.com/Miciah/openshift-extras/1bde8f6a9a5e51a768b37075838d6d8248adf8dd/enterprise/install-scripts/generic/openshift.sh

2.download the Test script from attachment 855305 [details]. configure the passwd, user, and pool.

    #vim test-repo-configuration.sh
    #!/bin/bash
    poolid=${POOL_ID}
    rhnuser=${USER}
    rhnpass=${PASSWD}
    profile_name="OpenShift-`hostname`"
    <--snip-->

3.run the script on a new server.

    ./test-repo-configuration.sh

4.compare "test_results.log" with attachment 855307 [details]

The two files should be the same.

    #diff test_results.log expect
    48,49c48,49
    < 45 rhsm no yes yes no no ABORT
    < 46 rhsm no yes yes no yes ERROR
    ---
    > 45 rhsm no yes yes no no OK
    > 46 rhsm no yes yes no yes OK
    64,65c64,65
    < 61 rhsm yes yes yes no no ABORT
    < 62 rhsm yes yes yes no yes ERROR
    ---
    > 61 rhsm yes yes yes no no OK
    > 62 rhsm yes yes yes no yes OK

Is PR 277 the latest version?

Yes, PR 277 has the latest version, and it has the version with which I got those test results. I compared MD5 sums to be certain:

    # curl -s https://raw2.github.com/Miciah/openshift-extras/1bde8f6a9a5e51a768b37075838d6d8248adf8dd/enterprise/install-scripts/generic/openshift.sh | md5sum - openshift.sh
    8c7d38f37be007e524386661c30dd1ea  -
    8c7d38f37be007e524386661c30dd1ea  openshift.sh

Could you attach test-45-output.log, test-46-output.log, test-61-output.log, and test-62-output.log from your test run to the bug report? I'll attach my results for the same test runs. I can also re-run the tests myself to see whether I can get the results you are seeing.

Created attachment 855630 [details]
Results for Test 45
Created attachment 855631 [details]
Results for Test 46
Created attachment 855632 [details]
Results for Test 61
Created attachment 855633 [details]
Results for Test 62
I checked this problem again and compared the result with the "expect" file:

    #diff test_results.log expect
    48,49c48,49
    < 45 rhsm no yes yes no no ABORT
    < 46 rhsm no yes yes no yes ERROR
    ---
    > 45 rhsm no yes yes no no OK
    > 46 rhsm no yes yes no yes OK
    64,65c64,65
    < 61 rhsm yes yes yes no no ABORT
    < 62 rhsm yes yes yes no yes ERROR
    ---
    > 61 rhsm yes yes yes no no OK
    > 62 rhsm yes yes yes no yes OK

I ran the scenario manually.

1.export the env

    export CONF_INSTALL_METHOD=rhsm
    export CONF_RHN_USER=${user}
    export CONF_RHN_PASS=${passwd}
    export CONF_SM_REG_POOL=${pool_id}

2.run the "openshift.sh"

    #sh openshift.sh actions=validate_preflight,configure_repos

3.unset all the env

    unset CONF_RHN_USER
    unset CONF_RHN_PASS
    unset CONF_SM_REG_POOL

4.run the "openshift.sh" again.

    #sh openshift.sh actions=validate_preflight,configure_repos

Output:

    <--snip-->
    rpm-4.8.0-32.el6.x86_64
    yum-3.2.29-40.el6.noarch
    + [[ rhsm = rhn ]]
    + [[ rhsm = rhsm ]]
    + subscription-manager identity
    + grep -q 'identity is:'
    + [[ ! -n '' ]]
    + [[ -n '' ]]
    + subscription-manager repos
    + tac
    + grep -q '^Repo ID:\s*rhel-6-server-ose-2.0-\(infra\|node\)-rpms$'
    + echo 'OpenShift: Install method rhsm requires a poolid.'
    OpenShift: Install method rhsm requires a poolid.
    + preflight_failure=1
    + '[' rhsm = yum -a '!' '' ']'
    + '[' 1 ']'
    + abort_install
    + [[ x == x ]]
    + echo 'OpenShift: Aborting Installation.'
    OpenShift: Aborting Installation.

I copied and pasted your steps (1 through 4), and I see the following output:

    + [[ rhsm = rhn ]]
    + [[ rhsm = rhsm ]]
    + grep -q 'identity is:'
    + subscription-manager identity
    + [[ ! -n '' ]]
    + [[ -n '' ]]
    + grep -q '^Repo ID:\s*rhel-6-server-ose-2.0-\(infra\|node\)-rpms$'
    + tac
    + subscription-manager repos
    + '[' rhsm = yum -a '!' '' ']'
    + '[' '' ']'
    + echo 'OpenShift: Completed preflight validation.'

I'm curious why the grep command fails. I'm also curious why the commands are output in a slightly different order in your output. Are you using RHEL65? (I am.)
Can you run the following command?

    # subscription-manager repos | grep '^Repo ID:\s*rhel-6-server-ose-2.0-\(infra\|node\)-rpms$'

I see the following output:

    Repo ID: rhel-6-server-ose-2.0-infra-rpms
    Repo ID: rhel-6-server-ose-2.0-node-rpms

Are you using the same shell that I am using? Please check with the following commands.

    # ls -l /bin/sh ; rpm -qf /bin/sh `readlink -f /bin/sh`
    lrwxrwxrwx. 1 root root 4 Oct 1 14:52 /bin/sh -> bash
    bash-4.1.2-15.el6_4.x86_64
    bash-4.1.2-15.el6_4.x86_64

My last comment might have been confusing. I meant to say, I'm curious why the grep command fails for you when it succeeds for me.

OK, I repeated Steps 1-2 4 times with the rhel-server-x86_64-kvm-6.4_20130130.0-4 image and 4 times with the rhel-guest-image-6-6.5-20131115.0-1.qcow2 image, using a new instance on OS1 Internal each time.

3 out of 4 times with the rhel-guest-image-6-6.5-20131115.0-1.qcow2 image, Steps 1-2 fail. (One time, Yum died and gave me a traceback.)

4 out of 4 times with the rhel-server-x86_64-kvm-6.4_20130130.0-4 image, Steps 1-2 succeed. Furthermore, Steps 3-4 failed on rhel-server-x86_64-kvm-6.4_20130130.0-4 after Steps 1-2 succeeded, which is the behaviour that you reported.

I checked, and it turns out that subscription-manager's output format has changed.

rhel-server-x86_64-kvm-6.4_20130130.0-4 has subscription-manager-1.1.23-1.el6.x86_64.
rhel-guest-image-6-6.5-20131115.0-1.qcow2 has subscription-manager-1.9.11-1.el6.x86_64.

The older version has 'Repo Id' where the newer one has 'Repo ID'. I will drop '^Repo ID:\s*' from the search pattern and rerun my test script.

I pushed the updated version to the PR.

It turns out that the different versions of subscription-manager have different output for `subscription-manager list --consumed` as well. The older version does not output the pool id, so there is no clear way to determine whether a given pool is already attached.
Unfortunately, subscription-manager exits with an error if told to attach a pool that is already attached, so the script was still failing in two test cases on rhel-server-x86_64-kvm-6.4_20130130.0-4.

Meanwhile, the script failed in numerous tests on rhel-guest-image-6-6.5-20131115.0-1.qcow2 because the channels are missing even after attaching the pool. I guess this is a problem with the image, not with openshift.sh.

I have updated the PR again so that it removes all attached pools before it tries to attach any pools, and I'm running the tests again.

OK, my last test run with rhel-server-x86_64-kvm-6.4_20130130.0-4 got the expected results (exactly the same output as attachment 855307 [details], no errors).
rhel-guest-image-6-6.5-20131115.0-1.qcow2 is still inexplicably failing with RHSM.
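The label difference between subscription-manager versions ('Repo Id' versus 'Repo ID') is easy to reproduce offline. The following is a minimal sketch of a repo check that ignores the label, in the spirit of dropping '^Repo ID:\s*' from the pattern as described above. The function name `has_ose_repos` is hypothetical, the input is read from stdin rather than from a live `subscription-manager repos` call, and `grep -E` is used here for readability in place of the escaped form shown in the traces.

```shell
#!/bin/bash
# Hypothetical helper (not taken from openshift.sh): succeed if the OSE 2.0
# infra or node repo appears in `subscription-manager repos` output on stdin.
# The "Repo ID:" / "Repo Id:" label is deliberately left out of the pattern
# because it differs between subscription-manager versions.
has_ose_repos() {
  grep -Eq 'rhel-6-server-ose-2\.0-(infra|node)-rpms$'
}
```

On a registered host the intended use would be `subscription-manager repos | has_ose_repos`; the sample lines used to try it out only need to end with the repo name, whichever label precedes it.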
Luke pointed me to the workaround to get RHSM to work on the rhel-guest-image-6-6.5-20131115.0-1.qcow2 image. With that workaround, I get the exact same results with the rhel-guest-image-6-6.5-20131115.0-1.qcow2 image as I get on the rhel-server-x86_64-kvm-6.4_20130130.0-4 image (i.e., the exact same output as attachment 855307 [details]).
Please test the latest version of the script in the pull request.
Launch an instance from RHEL6.4-qcow2.

1.get the PR 277

    #https://raw2.github.com/Miciah/openshift-extras/1bde8f6a9a5e51a768b37075838d6d8248adf8dd/enterprise/install-scripts/generic/openshift.sh

2.download the Test script from attachment 855305 [details]. configure the passwd, user, and pool.

    #vim test-repo-configuration.sh
    #!/bin/bash
    poolid=${POOL_ID}
    rhnuser=${USER}
    rhnpass=${PASSWD}
    profile_name="OpenShift-`hostname`"
    <--snip-->

3.run the script on a new server.

    ./test-repo-configuration.sh

4.compare "test_results.log" with attachment 855307 [details]

The two files should be the same.

    #diff test_results.log expect

Update the instance.

    # lsb_release -a
    LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
    Distributor ID: RedHatEnterpriseServer
    Description: Red Hat Enterprise Linux Server release 6.5 (Santiago)
    Release: 6.5
    Codename: Santiago

Then run the script again; the result is the same as the "expect" file.