Bug 1412784
| Summary: | [Satellite] host registration fails | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Quickstart Cloud Installer | Reporter: | Tasos Papaioannou <tpapaioa> | ||||
| Component: | Installation - OpenShift | Assignee: | John Matthews <jmatthew> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Tasos Papaioannou <tpapaioa> | ||||
| Severity: | medium | Docs Contact: | Derek <dcadzow> | ||||
| Priority: | medium | ||||||
| Version: | 1.1 | CC: | apagac, arubin, bbuckingham, bthurber, ehelms, jcallaha, jkim, jmontleo, katello-qa-list, llasmith, qci-bugzillas, rplevka, satellite6-bugs, tpapaioa | ||||
| Target Milestone: | --- | Keywords: | Triaged | ||||
| Target Release: | 1.1 | ||||||
| Hardware: | All | ||||||
| OS: | All | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | 1405085 | ||||||
| : | 1421259 (view as bug list) | Environment: | |||||
| Last Closed: | 2017-02-28 01:43:39 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 1405085, 1419953 | ||||||
| Bug Blocks: | 1404744, 1421259 | ||||||
| Attachments: |
|
||||||
|
Comment 3
John Matthews
2017-01-16 16:18:25 UTC
I hit this with OSP today:
Jan 16 13:50:19 host-192-168-150-10 os-collect-config: [2017-01-16 13:50:19,994] (heat-config) [INFO] {"deploy_stdout": "HTTP/1.1 200 OK\r\nStatus: 200 OK\r\nSatellite 6 detected at https://qci.example.com\nPreparing... ########################################\nUpdating / installing...\nkatello-ca-consumer-qci.example.com-1.########################################\n", "deploy_stderr": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 6526 100 6526 0 0 43376 0 --:--:-- --:--:-- --:--:-- 43506\nERROR: current transaction is aborted, commands ignored until end of transaction block\n\n", "deploy_status_code": 70}
Jan 16 13:50:19 host-192-168-150-10 os-collect-config: [2017-01-16 13:50:19,994] (heat-config) [DEBUG] [2017-01-16 13:45:33,900] (heat-config) [INFO] REG_ACTIVATION_KEY=OpenStack_Undercloud-osp1-OpenStack_Undercloud
It is not OCP specific.
Created attachment 1241665 [details]
excerpt from production.log from failed OCP HA
Hit this with OCP HA. From production.log:
"
| TASK [satellite_registration : Register to Satellite] **************************
| fatal: [depl2-ocp-master3.example.com]: FAILED! => {"changed": false, "cmd": "subscription-manager register --activationkey OpenShift-depl2-OpenShift --org Default_Organization", "failed": true, "msg": "ERROR: current transaction is aborted, commands ignored until end of transaction block", "rc": 70, "stderr": "ERROR: current transaction is aborted, commands ignored until end of transaction block
"
See attachment for excerpt from production.log.
With QCI-1.1-RHEL-7-20170116.t.0, 4 hosts managed to register and 4 did not:
"
2017-01-17 08:33:48,983 p=7116 u=foreman | fatal: [depl1-ocp-master3.example.com]: FAILED! => {"changed": false, "cmd": "subscription-manager register --activationkey OpenShift-depl1-OpenShift --org Default_Organization", "failed": true, "msg": "ERROR: current transaction is aborted, commands ignored until end of transaction block", "rc": 70, "stderr": "ERROR: current transaction is aborted, commands ignored until end of transaction block\n\n", "stdout": "", "stdout_lines": []}
2017-01-17 08:33:49,003 p=7116 u=foreman | fatal: [depl1-ocp-node2.example.com]: FAILED! => {"changed": false, "cmd": "subscription-manager register --activationkey OpenShift-depl1-OpenShift --org Default_Organization", "failed": true, "msg": "ERROR: current transaction is aborted, commands ignored until end of transaction block", "rc": 70, "stderr": "ERROR: current transaction is aborted, commands ignored until end of transaction block\n\n", "stdout": "", "stdout_lines": []}
2017-01-17 08:33:49,010 p=7116 u=foreman | fatal: [depl1-ocp-master2.example.com]: FAILED! => {"changed": false, "cmd": "subscription-manager register --activationkey OpenShift-depl1-OpenShift --org Default_Organization", "failed": true, "msg": "ERROR: current transaction is aborted, commands ignored until end of transaction block", "rc": 70, "stderr": "ERROR: current transaction is aborted, commands ignored until end of transaction block\n\n", "stdout": "", "stdout_lines": []}
2017-01-17 08:33:49,013 p=7116 u=foreman | fatal: [depl1-ocp-master1.example.com]: FAILED! => {"changed": false, "cmd": "subscription-manager register --activationkey OpenShift-depl1-OpenShift --org Default_Organization", "failed": true, "msg": "ERROR: current transaction is aborted, commands ignored until end of transaction block", "rc": 70, "stderr": "ERROR: current transaction is aborted, commands ignored until end of transaction block\n\n", "stdout": "", "stdout_lines": []}
2017-01-17 08:33:57,772 p=7116 u=foreman | changed: [depl1-ocp-node1.example.com]
2017-01-17 08:34:03,346 p=7116 u=foreman | changed: [depl1-ocp-ha1.example.com]
2017-01-17 08:34:03,576 p=7116 u=foreman | changed: [depl1-ocp-ha2.example.com]
2017-01-17 08:34:03,734 p=7116 u=foreman | changed: [depl1-ocp-node3.example.com]
"
I Hit this issue in QCI-1.1-RHEL-7-20170116.t.0
Did two deployments in parallel on two separate machines, and both deployments failed at the host registration. setup the same way as follows
- RHV & CFME & OCP
- 1 RHV Engine & 1 RHV Hypervisor
- 1 OCP Master & 1 OCP Worker nodes
Paste of the Error from the Ansible Run Below:
TASK [satellite_registration : Register to Satellite] **************************
fatal: [t1-ocp-node1.example.com]: FAILED! => {"changed": false, "cmd": "subscription-manager register --activationkey OpenShift-t1-OpenShift --org Default_Organization", "failed": true, "msg": "ERROR: current transaction is aborted, commands ignored until end of transaction block", "rc": 70, "stderr": "ERROR: current transaction is aborted, commands ignored until end of transaction block
Workaround for OCP: resume the failed OCP task Actions::Fusor::Deployment::OpenShift::Deploy. I had to do it three times, finally it got through registration and now I'm at 60% of OCP deployment. Unable to reproduce using QCI-1.1-RHEL-7-20170120.t.0. Task "[satellite_registration : Register to Satellite]" succeeded without errors for me. Verified on QCI-1.1-RHEL-7-20170120.t.0. This seems to be an issue again, on QCI-1.1-RHEL-7-20170203.t.0.
OCP (1 master + 1 worker) deployment failed during registration of the nodes to the Satellite. /var/log/foreman/production.log shows:
****
2017-02-07 13:24:13 [app] [I] Started POST "/rhsm/consumers?owner=Default_Organization&activation_keys=OpenShift-tpapaioa_2-OpenShift" for 192.168.100.128 at 2017-02-07 13:24:13 -0500
2017-02-07 13:24:13 [app] [I] Processing by Katello::Api::Rhsm::CandlepinProxiesController#consumer_activate as JSON
2017-02-07 13:24:13 [app] [I] Parameters: {"facts"=>"[FILTERED]", "installedProducts"=>[{"productName"=>"Red Hat Enterprise Linux Server", "arch"=>"x86_64", "version"=>"7.3", "productId"=>"69"}], "type"=>"system", "name"=>"tpapaioa-2-ocp-master1.example.com", "contentTags"=>["rhel-7", "rhel-7-server"], "owner"=>"Default_Organization", "activation_keys"=>"OpenShift-tpapaioa_2-OpenShift"}
2017-02-07 13:24:14 [app] [I] Started POST "/rhsm/consumers?owner=Default_Organization&activation_keys=OpenShift-tpapaioa_2-OpenShift" for 192.168.100.130 at 2017-02-07 13:24:14 -0500
2017-02-07 13:24:14 [app] [I] Processing by Katello::Api::Rhsm::CandlepinProxiesController#consumer_activate as JSON
2017-02-07 13:24:14 [app] [I] Parameters: {"facts"=>"[FILTERED]", "installedProducts"=>[{"productName"=>"Red Hat Enterprise Linux Server", "arch"=>"x86_64", "version"=>"7.3", "productId"=>"69"}], "type"=>"system", "name"=>"tpapaioa-2-ocp-node1.example.com", "contentTags"=>["rhel-7", "rhel-7-server"], "owner"=>"Default_Organization", "activation_keys"=>"OpenShift-tpapaioa_2-OpenShift"}
[...]
2017-02-07 13:24:20 [foreman-tasks/action] [E] Validation failed: Name has already been taken (ActiveRecord::RecordInvalid)
| /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/validations.rb:57:in `save!'
| /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/attribute_methods/dirty.rb:29:in `save!'
| /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:273:in `block in save!'
| /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:329:in `block in with_transaction_returning_status'
| /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:201:in `block in transaction'
| /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:209:in `within_new_transaction'
| /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:201:in `transaction'
| /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:208:in `transaction'
| /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:326:in `with_transaction_returning_status'
| /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:273:in `save!'
| /opt/theforeman/tfm/root/usr/share/gems/gems/protected_attributes-1.1.3/lib/active_record/mass_assignment_security/validations.rb:17:in `create!'
| /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/relation.rb:140:in `block in create!'
| /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/relation.rb:285:in `scoping'
| /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/relation.rb:140:in `create!'
| /opt/theforeman/tfm/root/usr/share/gems/gems/protected_attributes-1.1.3/lib/active_record/mass_assignment_security/relation.rb:42:in `first_or_create!'
| /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.95/app/models/katello/rhsm_fact_importer.rb:29:in `add_fact_name'
| /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.95/app/models/katello/rhsm_fact_importer.rb:21:in `block in add_new_facts'
| /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.95/app/models/katello/rhsm_fact_importer.rb:21:in `each'
| /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.95/app/models/katello/rhsm_fact_importer.rb:21:in `add_new_facts'
| /usr/share/foreman/app/services/fact_importer.rb:41:in `import!'
| /usr/share/foreman/app/models/host/base.rb:148:in `import_facts'
| /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.95/app/models/katello/host/subscription_facet.rb:83:in `update_facts'
| /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.95/app/lib/actions/katello/host/register.rb:54:in `run'
| /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/action.rb:506:in `block (3 levels) in execute_run'
| /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/middleware/stack.rb:26:in `call'
[...]
| /opt/theforeman/tfm/root/usr/share/gems/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_context'
2017-02-07 13:24:20 [katello/cp_proxy] [E] ForemanTasks::TaskError: Task 52ac260a-46b7-4ee8-9544-4127660f41ec: ActiveRecord::RecordInvalid: Validation failed: Name has already been taken
| /opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.7.14.11/lib/foreman_tasks.rb:40:in `block in sync_task'
| /opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.7.14.11/lib/foreman_tasks.rb:39:in `tap'
| /opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.7.14.11/lib/foreman_tasks.rb:39:in `sync_task'
| /opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.7.14.11/lib/foreman_tasks/triggers.rb:22:in `sync_task'
| /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.95/app/controllers/katello/api/rhsm/candlepin_proxies_controller.rb:227:in `consumer_activate'
[...]
| /opt/theforeman/tfm/root/usr/share/gems/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_context'
2017-02-07 13:24:20 [app] [I] Completed 500 Internal Server Error in 6432ms (Views: 7.3ms | ActiveRecord: 83.6ms)
2017-02-07 13:24:21 [app] [I] Import facts for 'tpapaioa-2-ocp-node1.example.com' completed. Added: 142, Updated: 1, Deleted 0 facts
****
There is nothing logged in /var/lib/pgsql/data/pg_log/postgresql-Tue.log this time, however. So the Satellite code change may have fixed the postgres error but not the overall race condition when updating/importing rhsm facts.
Updating https://bugzilla.redhat.com/show_bug.cgi?id=1419953 as well.
*** Bug 1420899 has been marked as a duplicate of this bug. *** tfm-rubygem-katello-3.0.0.95-2qci.el7sat should be in QCI-1.1-RHEL-7-20170210.t.1 This contains a PR to fix http://projects.theforeman.org/issues/18122 / https://bugzilla.redhat.com/show_bug.cgi?id=1405085 Verified on QCI-1.1-RHEL-7-20170213.t.0. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:0335 |