Bug 1412784 - [Satellite] host registration fails
Summary: [Satellite] host registration fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Quickstart Cloud Installer
Classification: Red Hat
Component: Installation - OpenShift
Version: 1.1
Hardware: All
OS: All
medium
medium
Target Milestone: ---
: 1.1
Assignee: John Matthews
QA Contact: Tasos Papaioannou
Derek
URL:
Whiteboard:
: 1420899 (view as bug list)
Depends On: 1405085 1419953
Blocks: 1404744 1421259
TreeView+ depends on / blocked
 
Reported: 2017-01-12 19:06 UTC by Tasos Papaioannou
Modified: 2017-02-28 01:43 UTC (History)
14 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1405085
: 1421259 (view as bug list)
Environment:
Last Closed: 2017-02-28 01:43:39 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
excerpt from production.log from failed OCP HA (62.63 KB, text/plain)
2017-01-17 10:44 UTC, Antonin Pagac
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1413926 0 unspecified CLOSED OCP HA fails at 10% with "Failed to connect to the host via ssh: Read from socket failed: Connection reset by peer" 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHEA-2017:0335 0 normal SHIPPED_LIVE Red Hat Quickstart Installer 1.1 2017-02-28 06:36:13 UTC

Internal Links: 1413926

Comment 3 John Matthews 2017-01-16 16:18:25 UTC
Moving this to 1.2.

If this occurs more frequently during 1.1 testing we will consider addressing in 1.1

Comment 5 Jason Montleon 2017-01-16 19:07:29 UTC
I hit this with OSP today:

Jan 16 13:50:19 host-192-168-150-10 os-collect-config: [2017-01-16 13:50:19,994] (heat-config) [INFO] {"deploy_stdout": "HTTP/1.1 200 OK\r\nStatus: 200 OK\r\nSatellite 6 detected at https://qci.example.com\nPreparing...                          ########################################\nUpdating / installing...\nkatello-ca-consumer-qci.example.com-1.########################################\n", "deploy_stderr": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r100  6526  100  6526    0     0  43376      0 --:--:-- --:--:-- --:--:-- 43506\nERROR:  current transaction is aborted, commands ignored until end of transaction block\n\n", "deploy_status_code": 70}
Jan 16 13:50:19 host-192-168-150-10 os-collect-config: [2017-01-16 13:50:19,994] (heat-config) [DEBUG] [2017-01-16 13:45:33,900] (heat-config) [INFO] REG_ACTIVATION_KEY=OpenStack_Undercloud-osp1-OpenStack_Undercloud


It is not OCP specific.

Comment 6 Antonin Pagac 2017-01-17 10:44:17 UTC
Created attachment 1241665 [details]
excerpt from production.log from failed OCP HA

Hit this with OCP HA. From production.log:

"
 | TASK [satellite_registration : Register to Satellite] **************************
 | fatal: [depl2-ocp-master3.example.com]: FAILED! => {"changed": false, "cmd": "subscription-manager register --activationkey OpenShift-depl2-OpenShift --org Default_Organization", "failed": true, "msg": "ERROR:  current transaction is aborted, commands ignored until end of transaction block", "rc": 70, "stderr": "ERROR:  current transaction is aborted, commands ignored until end of transaction block
"

See attachment for excerpt from production.log.

Comment 7 Antonin Pagac 2017-01-17 13:52:49 UTC
With QCI-1.1-RHEL-7-20170116.t.0, 4 hosts managed to register and 4 did not:

"
2017-01-17 08:33:48,983 p=7116 u=foreman |  fatal: [depl1-ocp-master3.example.com]: FAILED! => {"changed": false, "cmd": "subscription-manager register --activationkey OpenShift-depl1-OpenShift --org Default_Organization", "failed": true, "msg": "ERROR:  current transaction is aborted, commands ignored until end of transaction block", "rc": 70, "stderr": "ERROR:  current transaction is aborted, commands ignored until end of transaction block\n\n", "stdout": "", "stdout_lines": []}
2017-01-17 08:33:49,003 p=7116 u=foreman |  fatal: [depl1-ocp-node2.example.com]: FAILED! => {"changed": false, "cmd": "subscription-manager register --activationkey OpenShift-depl1-OpenShift --org Default_Organization", "failed": true, "msg": "ERROR:  current transaction is aborted, commands ignored until end of transaction block", "rc": 70, "stderr": "ERROR:  current transaction is aborted, commands ignored until end of transaction block\n\n", "stdout": "", "stdout_lines": []}
2017-01-17 08:33:49,010 p=7116 u=foreman |  fatal: [depl1-ocp-master2.example.com]: FAILED! => {"changed": false, "cmd": "subscription-manager register --activationkey OpenShift-depl1-OpenShift --org Default_Organization", "failed": true, "msg": "ERROR:  current transaction is aborted, commands ignored until end of transaction block", "rc": 70, "stderr": "ERROR:  current transaction is aborted, commands ignored until end of transaction block\n\n", "stdout": "", "stdout_lines": []}
2017-01-17 08:33:49,013 p=7116 u=foreman |  fatal: [depl1-ocp-master1.example.com]: FAILED! => {"changed": false, "cmd": "subscription-manager register --activationkey OpenShift-depl1-OpenShift --org Default_Organization", "failed": true, "msg": "ERROR:  current transaction is aborted, commands ignored until end of transaction block", "rc": 70, "stderr": "ERROR:  current transaction is aborted, commands ignored until end of transaction block\n\n", "stdout": "", "stdout_lines": []}
2017-01-17 08:33:57,772 p=7116 u=foreman |  changed: [depl1-ocp-node1.example.com]
2017-01-17 08:34:03,346 p=7116 u=foreman |  changed: [depl1-ocp-ha1.example.com]
2017-01-17 08:34:03,576 p=7116 u=foreman |  changed: [depl1-ocp-ha2.example.com]
2017-01-17 08:34:03,734 p=7116 u=foreman |  changed: [depl1-ocp-node3.example.com]
"

Comment 9 jkim 2017-01-17 14:41:10 UTC
I Hit this issue in QCI-1.1-RHEL-7-20170116.t.0

Did two deployments in parallel on two separate machines, and both deployments failed at the host registration. setup the same way as follows
- RHV & CFME & OCP
- 1 RHV Engine & 1 RHV Hypervisor
- 1 OCP Master & 1 OCP Worker nodes


Paste of the Error from the Ansible Run Below:

TASK [satellite_registration : Register to Satellite] **************************
fatal: [t1-ocp-node1.example.com]: FAILED! => {"changed": false, "cmd": "subscription-manager register --activationkey OpenShift-t1-OpenShift --org Default_Organization", "failed": true, "msg": "ERROR:  current transaction is aborted, commands ignored until end of transaction block", "rc": 70, "stderr": "ERROR:  current transaction is aborted, commands ignored until end of transaction block

Comment 10 Antonin Pagac 2017-01-17 16:25:24 UTC
Workaround for OCP: resume the failed OCP task Actions::Fusor::Deployment::OpenShift::Deploy.

I had to do it three times, finally it got through registration and now I'm at 60% of OCP deployment.

Comment 11 Antonin Pagac 2017-01-23 13:59:46 UTC
Unable to reproduce using QCI-1.1-RHEL-7-20170120.t.0.

Task "[satellite_registration : Register to Satellite]" succeeded without errors for me.

Comment 12 Tasos Papaioannou 2017-01-24 15:15:25 UTC
Verified on QCI-1.1-RHEL-7-20170120.t.0.

Comment 13 Tasos Papaioannou 2017-02-07 19:22:15 UTC
This seems to be an issue again, on QCI-1.1-RHEL-7-20170203.t.0.

OCP (1 master + 1 worker) deployment failed during registration of the nodes to the Satellite. /var/log/foreman/production.log shows:


****
2017-02-07 13:24:13 [app] [I] Started POST "/rhsm/consumers?owner=Default_Organization&activation_keys=OpenShift-tpapaioa_2-OpenShift" for 192.168.100.128 at 2017-02-07 13:24:13 -0500
2017-02-07 13:24:13 [app] [I] Processing by Katello::Api::Rhsm::CandlepinProxiesController#consumer_activate as JSON
2017-02-07 13:24:13 [app] [I]   Parameters: {"facts"=>"[FILTERED]", "installedProducts"=>[{"productName"=>"Red Hat Enterprise Linux Server", "arch"=>"x86_64", "version"=>"7.3", "productId"=>"69"}], "type"=>"system", "name"=>"tpapaioa-2-ocp-master1.example.com", "contentTags"=>["rhel-7", "rhel-7-server"], "owner"=>"Default_Organization", "activation_keys"=>"OpenShift-tpapaioa_2-OpenShift"}
2017-02-07 13:24:14 [app] [I] Started POST "/rhsm/consumers?owner=Default_Organization&activation_keys=OpenShift-tpapaioa_2-OpenShift" for 192.168.100.130 at 2017-02-07 13:24:14 -0500
2017-02-07 13:24:14 [app] [I] Processing by Katello::Api::Rhsm::CandlepinProxiesController#consumer_activate as JSON
2017-02-07 13:24:14 [app] [I]   Parameters: {"facts"=>"[FILTERED]", "installedProducts"=>[{"productName"=>"Red Hat Enterprise Linux Server", "arch"=>"x86_64", "version"=>"7.3", "productId"=>"69"}], "type"=>"system", "name"=>"tpapaioa-2-ocp-node1.example.com", "contentTags"=>["rhel-7", "rhel-7-server"], "owner"=>"Default_Organization", "activation_keys"=>"OpenShift-tpapaioa_2-OpenShift"}
[...]
2017-02-07 13:24:20 [foreman-tasks/action] [E] Validation failed: Name has already been taken (ActiveRecord::RecordInvalid)
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/validations.rb:57:in `save!'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/attribute_methods/dirty.rb:29:in `save!'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:273:in `block in save!'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:329:in `block in with_transaction_returning_status'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:201:in `block in transaction'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:209:in `within_new_transaction'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/connection_adapters/abstract/database_statements.rb:201:in `transaction'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:208:in `transaction'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:326:in `with_transaction_returning_status'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/transactions.rb:273:in `save!'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/protected_attributes-1.1.3/lib/active_record/mass_assignment_security/validations.rb:17:in `create!'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/relation.rb:140:in `block in create!'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/relation.rb:285:in `scoping'
 | /opt/rh/rh-ror41/root/usr/share/gems/gems/activerecord-4.1.5/lib/active_record/relation.rb:140:in `create!'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/protected_attributes-1.1.3/lib/active_record/mass_assignment_security/relation.rb:42:in `first_or_create!'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.95/app/models/katello/rhsm_fact_importer.rb:29:in `add_fact_name'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.95/app/models/katello/rhsm_fact_importer.rb:21:in `block in add_new_facts'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.95/app/models/katello/rhsm_fact_importer.rb:21:in `each'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.95/app/models/katello/rhsm_fact_importer.rb:21:in `add_new_facts'
 | /usr/share/foreman/app/services/fact_importer.rb:41:in `import!'
 | /usr/share/foreman/app/models/host/base.rb:148:in `import_facts'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.95/app/models/katello/host/subscription_facet.rb:83:in `update_facts'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.95/app/lib/actions/katello/host/register.rb:54:in `run'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/action.rb:506:in `block (3 levels) in execute_run'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.13.3/lib/dynflow/middleware/stack.rb:26:in `call'
[...]
 | /opt/theforeman/tfm/root/usr/share/gems/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_context'
2017-02-07 13:24:20 [katello/cp_proxy] [E] ForemanTasks::TaskError: Task 52ac260a-46b7-4ee8-9544-4127660f41ec: ActiveRecord::RecordInvalid: Validation failed: Name has already been taken
 | /opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.7.14.11/lib/foreman_tasks.rb:40:in `block in sync_task'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.7.14.11/lib/foreman_tasks.rb:39:in `tap'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.7.14.11/lib/foreman_tasks.rb:39:in `sync_task'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-0.7.14.11/lib/foreman_tasks/triggers.rb:22:in `sync_task'
 | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.0.0.95/app/controllers/katello/api/rhsm/candlepin_proxies_controller.rb:227:in `consumer_activate'
[...]
 | /opt/theforeman/tfm/root/usr/share/gems/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_context'
2017-02-07 13:24:20 [app] [I] Completed 500 Internal Server Error in 6432ms (Views: 7.3ms | ActiveRecord: 83.6ms)
2017-02-07 13:24:21 [app] [I] Import facts for 'tpapaioa-2-ocp-node1.example.com' completed. Added: 142, Updated: 1, Deleted 0 facts
****

There is nothing logged in /var/lib/pgsql/data/pg_log/postgresql-Tue.log this time, however. So the Satellite code change may have fixed the postgres error but not the overall race condition when updating/importing rhsm facts.

Updating https://bugzilla.redhat.com/show_bug.cgi?id=1419953 as well.

Comment 14 John Matthews 2017-02-10 17:20:19 UTC
*** Bug 1420899 has been marked as a duplicate of this bug. ***

Comment 15 Jason Montleon 2017-02-13 14:06:23 UTC
tfm-rubygem-katello-3.0.0.95-2qci.el7sat should be in QCI-1.1-RHEL-7-20170210.t.1

This contains a PR to fix http://projects.theforeman.org/issues/18122 / https://bugzilla.redhat.com/show_bug.cgi?id=1405085

Comment 16 Tasos Papaioannou 2017-02-13 19:23:27 UTC
Verified on QCI-1.1-RHEL-7-20170213.t.0.

Comment 18 errata-xmlrpc 2017-02-28 01:43:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:0335


Note You need to log in before you can comment on or make changes to this bug.