Bug 1466260 - Embedded Ansible role nginx.service failed, control process exited 1
Status: CLOSED WORKSFORME
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Appliance
Version: 5.8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.9.0
Assigned To: Nick Carboni
QA Contact: luke couzens
Whiteboard: ansible_embed:server_role
Depends On:
Blocks:
Reported: 2017-06-29 07:12 EDT by Gellert Kis
Modified: 2017-07-31 13:40 EDT (History)
CC: 6 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
5.8.0.17
Last Closed: 2017-07-31 13:40:17 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Gellert Kis 2017-06-29 07:12:05 EDT
Description of problem:

   17 [----] E, [2017-06-29T03:35:13.177487 #1406:9744704] ERROR -- : MIQ(EmbeddedAnsibleWorker::Runner) ID [99000000602066] PID [1406] GUID [2af81bac-5c6b-11e7-a1ce-005056957282] /bin/systemctl exit code: 1 Worker exiting.
   18 [----] E, [2017-06-29T03:35:25.496324 #1406:9083744] ERROR -- : AwesomeSpawn: /bin/systemctl exit code: 1
   19 [----] E, [2017-06-29T03:35:25.496418 #1406:9083744] ERROR -- : AwesomeSpawn: Job for nginx.service failed because the control process exited with error code. See "systemctl status nginx.service" and "journalctl -xe" for details.
   20 
   21 [----] E, [2017-06-29T03:35:25.496521 #1406:9083744] ERROR -- : [AwesomeSpawn::CommandResultError]: /bin/systemctl exit code: 1  Method:[rescue in do_before_work_loop]
   22 [----] E, [2017-06-29T03:35:25.496576 #1406:9083744] ERROR -- : /opt/rh/cfme-gemset/gems/awesome_spawn-1.4.1/lib/awesome_spawn.rb:105:in `run!'
   23 /opt/rh/cfme-gemset/gems/linux_admin-0.20.1/lib/linux_admin/common.rb:24:in `run!'
   24 /opt/rh/cfme-gemset/gems/linux_admin-0.20.1/lib/linux_admin/service/systemd_service.rb:18:in `start'
   25 /var/www/miq/vmdb/lib/embedded_ansible.rb:47:in `block in start'
   26 /var/www/miq/vmdb/lib/embedded_ansible.rb:47:in `each'
   27 /var/www/miq/vmdb/lib/embedded_ansible.rb:47:in `start'
   28 /var/www/miq/vmdb/app/models/embedded_ansible_worker/runner.rb:34:in `setup_ansible'
   29 /var/www/miq/vmdb/app/models/embedded_ansible_worker/runner.rb:13:in `do_before_work_loop'
   30 /var/www/miq/vmdb/app/models/embedded_ansible_worker/runner.rb:7:in `prepare'
   31 /var/www/miq/vmdb/app/models/miq_worker/runner.rb:129:in `start'
   32 /var/www/miq/vmdb/app/models/miq_worker/runner.rb:21:in `start_worker'
   33 /var/www/miq/vmdb/app/models/embedded_ansible_worker.rb:10:in `block in start_runner'


A similar error (except for the exit code: 1) appears in Bug 1439783 - Embedded Ansible role does not migrate cleanly to another appliance: https://bugzilla.redhat.com/show_bug.cgi?id=1439783

We have enough memory (14 GB).

We tried 
* Disable the Ansible role (and perhaps a few others to reduce memory usage)
* systemctl restart evmserverd # to be sure memory is free
* rm /etc/tower/SECRET_KEY
* vmdb; bin/rails c; > MiqDatabase.first.ansible_secret_key  = nil; MiqDatabase.first.save
* enable Ansible role
* check /var/log/tower/setup-<timestamp>.log for status

but the files in /var/log/tower are zero bytes in size.

Version-Release number of selected component (if applicable):
Embedded Ansible role


How reproducible:
NA

Steps to Reproduce:

Actual results:


Expected results:


Additional info:
Comment 2 Gellert Kis 2017-06-29 08:55:29 EDT
From another support case, 01874009:

19397 [----] E, [2017-06-29T04:09:24.413984 #21442:b9913c] ERROR -- : AwesomeSpawn: /bin/systemctl exit code: 1
 19398 [----] E, [2017-06-29T04:09:24.414187 #21442:b9913c] ERROR -- : AwesomeSpawn: Job for chronyd.service failed because start of the service was attempted too often. See "systemctl status chronyd.service" and "journalctl -xe" for details.
 19399 To force a start use "systemctl reset-failed chronyd.service" followed by "systemctl start chronyd.service" again.
 19400 
 19401 [----] E, [2017-06-29T04:09:24.414412 #21442:b9913c] ERROR -- : MIQ(MiqQueue#deliver) Message id: [1000014852781], Error: [/bin/systemctl exit code: 1]
 19402 [----] E, [2017-06-29T04:09:24.414543 #21442:b9913c] ERROR -- : [AwesomeSpawn::CommandResultError]: /bin/systemctl exit code: 1  Method:[rescue in deliver]
 19403 [----] E, [2017-06-29T04:09:24.414684 #21442:b9913c] ERROR -- : /opt/rh/cfme-gemset/gems/awesome_spawn-1.4.1/lib/awesome_spawn.rb:105:in `run!'
 19404 /opt/rh/cfme-gemset/gems/linux_admin-0.19.0/lib/linux_admin/common.rb:24:in `run!'
 19405 /opt/rh/cfme-gemset/gems/linux_admin-0.19.0/lib/linux_admin/service/systemd_service.rb:21:in `start'
 19406 /opt/rh/cfme-gemset/gems/linux_admin-0.19.0/lib/linux_admin/service/systemd_service.rb:40:in `restart'
 19407 /opt/rh/cfme-gemset/gems/linux_admin-0.19.0/lib/linux_admin/chrony.rb:28:in `restart_service_if_running'
 19408 /opt/rh/cfme-gemset/gems/linux_admin-0.19.0/lib/linux_admin/chrony.rb:21:in `add_servers'
 19409 /var/www/miq/vmdb/app/models/miq_server/ntp_management.rb:63:in `apply_ntp_server_settings'
 19410 /var/www/miq/vmdb/app/models/miq_server/ntp_management.rb:51:in `ntp_reload'
 19411 /var/www/miq/vmdb/app/models/miq_queue.rb:347:in `block in deliver'
 19412 /opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:91:in `block in timeout'
 19413 /opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `block in catch'
 19414 /opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
 19415 /opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
 19416 /opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:106:in `timeout'
 19417 /var/www/miq/vmdb/app/models/miq_queue.rb:343:in `deliver'
 19418 /var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:106:in `deliver_queue_message'
 19419 /var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:134:in `deliver_message'
 19420 /var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:152:in `block in do_work'
 19421 /var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `loop'
 19422 /var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `do_work'
 19423 /var/www/miq/vmdb/app/models/miq_worker/runner.rb:334:in `block in do_work_loop'
 19424 /var/www/miq/vmdb/app/models/miq_worker/runner.rb:331:in `loop'
 19425 /var/www/miq/vmdb/app/models/miq_worker/runner.rb:331:in `do_work_loop'
 19426 /var/www/miq/vmdb/app/models/miq_worker/runner.rb:153:in `run'
 19427 /var/www/miq/vmdb/app/models/miq_worker/runner.rb:128:in `start'
 19428 /var/www/miq/vmdb/app/models/miq_worker/runner.rb:21:in `start_worker'
 19429 /var/www/miq/vmdb/app/models/miq_worker.rb:343:in `block in start'
 19430 /opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
 19431 /opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
 19432 /var/www/miq/vmdb/app/models/miq_worker.rb:341:in `start'
 19433 /var/www/miq/vmdb/app/models/miq_worker.rb:270:in `start_worker'
 19434 /var/www/miq/vmdb/app/models/miq_worker.rb:150:in `block in sync_workers'
 19435 /var/www/miq/vmdb/app/models/miq_worker.rb:150:in `times'
 19436 /var/www/miq/vmdb/app/models/miq_worker.rb:150:in `sync_workers'
 19437 /var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:52:in `block in sync_workers'
 19438 /var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `each'
 19439 /var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `sync_workers'
 19440 /var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:22:in `monitor_workers'
 19441 /var/www/miq/vmdb/app/models/miq_server.rb:346:in `block in monitor'
 19442 /var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
 19443 /var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
 19444 /var/www/miq/vmdb/app/models/miq_server.rb:346:in `monitor'
 19445 /var/www/miq/vmdb/app/models/miq_server.rb:368:in `block (2 levels) in monitor_loop'
 19446 /var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
 19447 /var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
 19448 /var/www/miq/vmdb/app/models/miq_server.rb:368:in `block in monitor_loop'
 19449 /var/www/miq/vmdb/app/models/miq_server.rb:367:in `loop'
 19450 /var/www/miq/vmdb/app/models/miq_server.rb:367:in `monitor_loop'
 19451 /var/www/miq/vmdb/app/models/miq_server.rb:250:in `start'
 19452 /var/www/miq/vmdb/lib/workers/evm_server.rb:65:in `start'
 19453 /var/www/miq/vmdb/lib/workers/evm_server.rb:92:in `start'
 19454 /var/www/miq/vmdb/lib/workers/bin/evm_server.rb:4:in `<main>'
Comment 3 Gellert Kis 2017-06-29 10:34:27 EDT
In comment #2 the other customer doesn't use the Ansible role at all.
Comment 8 Nick Carboni 2017-07-06 13:33:18 EDT
Comment 2 is unrelated to the embedded ansible role and should either be opened as a separate issue or disregarded.

Based on the output of the nginx service, it seems like the setup was never run (after setup, nginx would not have been configured to bind to port 80).

And if the setup had been run on that appliance, there would be logs in /var/log/tower/.

It looks like there is an issue with the workaround you posted.
> vmdb; bin/rails c; > MiqDatabase.first.ansible_secret_key  = nil; MiqDatabase.first.save

should be

db = MiqDatabase.first
db.ansible_secret_key = nil
db.save

Give that a shot.
I'll take a look at the reproducer when I get a chance.
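The original one-liner loses the change because each call to `MiqDatabase.first` returns a separate model instance: the assignment happens on one object and the save on another. A minimal, self-contained Ruby sketch of the pattern (a hypothetical stand-in class, not the actual CFME model):

```ruby
# Hypothetical stand-in for an ActiveRecord-style model, illustrating
# why the one-liner loses the change: each `first` returns a fresh
# instance, so saving a second instance persists nothing.
class FakeDatabase
  @@row = { ansible_secret_key: "abc" }

  def self.first
    new
  end

  def initialize
    @attrs = @@row.dup # each instance starts from the stored row
  end

  def ansible_secret_key=(value)
    @attrs[:ansible_secret_key] = value
  end

  def save
    @@row = @attrs.dup # persist this instance's attributes
  end

  def self.stored_key
    @@row[:ansible_secret_key]
  end
end

# Broken pattern: assignment and save act on different instances.
FakeDatabase.first.ansible_secret_key = nil
FakeDatabase.first.save
FakeDatabase.stored_key # => "abc" (the key was never cleared)

# Correct pattern, as in the comment above: keep one instance.
db = FakeDatabase.first
db.ansible_secret_key = nil
db.save
FakeDatabase.stored_key # => nil
```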
Comment 10 Gellert Kis 2017-07-18 10:36:30 EDT
Excerpt from the /var/log/tower setup log; the last line is a fatal error:

TASK [awx_install : Migrate the Tower database schema (may take awhile when upgrading).] ***

fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["tower-manage", "migrate", "--noinput", "--fake-initial"], "delta": "0:00:00.833555", "end": "2017-07-13 08:52:04.209103", "failed": true, "rc": 1, "start": "2017-07-13 08:52:03.375548", "stderr": "Traceback (most recent call last):\n  File \"/bin/tower-manage\", line 9, in <module>\n    load_entry_point('ansible-tower==3.1.2', 'console_scripts', 'tower-manage')()\n  File \"/lib/python2.7/site-packages/awx/__init__.py\", line 105, in manage\n  File \"/var/lib/awx/venv/tower/lib/python2.7/site-packages/django/core/management/__init__.py\", line 354, in execute_from_command_line\n    utility.execute()\n  File \"/var/lib/awx/venv/tower/lib/python2.7/site-packages/django/core/management/__init__.py\", line 328, in execute\n    django.setup()\n  File \"/var/lib/awx/venv/tower/lib/python2.7/site-packages/django/__init__.py\", line 17, in setup\n    configure_logging(settings.LOGGING_CONFIG, settings.LOGGING)\n  File \"/var/lib/awx/venv/tower/lib/python2.7/site-packages/django/utils/log.py\", line 86, in configure_logging\n    logging_config_func(logging_settings)\n  File \"/usr/lib64/python2.7/logging/config.py\", line 803, in dictConfig\n    dictConfigClass(config).configure()\n  File \"/usr/lib64/python2.7/logging/config.py\", line 585, in configure\n    '%r: %s' % (name, e))\nValueError: Unable to configure handler 'callback_receiver': [Errno 13] Permission denied: '/var/log/tower/callback_receiver.log'", "stdout": "", "stdout_lines": [], "warnings": []}
	to retry, use: --limit @/var/lib/awx/setup/install.retry
Comment 11 Nick Carboni 2017-07-20 09:48:15 EDT
The last line in the backtrace from comment 10 seems to say that the process cannot write to /var/log/tower/callback_receiver.log

I'm not sure why that would be, as this process should be running as the root user.

Are there any changes to the environment that would cause this kind of permission error?
Comment 12 Gellert Kis 2017-07-21 05:00:09 EDT
On my appliance this step is skipped.

"
TASK [awx_install : Migrate the Tower database schema (may take awhile when upgrading).] ***
skipping: [localhost] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}
"

In the customer environment they have a Tower database schema? Can we delete it? We need a better understanding of what the Tower database schema is, and whether it is possible to make this step "skip".

Basically, the permissions are the following:
drwxr-xr-x.  2 awx  awx  4096 Jul 15 03:30 .
drwxr-xr-x. 18 root root 4096 Jul 17 03:22 ..
-rw-r--r--.  1 root root    0 Jun 29 07:19 callback_receiver.log
-rw-r--r--.  1 root root    0 Jun 29 07:19 fact_receiver.log
-rw-r--r--.  1 root root    0 Jul 15 03:30 setup-2017-07-13-08:51:33.log
-rw-r--r--.  1 root root 8734 Jul 13 08:52 setup-2017-07-13-08:51:33.log-20170715.gz
-rw-r--r--.  1 root root    0 Jun 29 07:19 task_system.log
-rw-r--r--.  1 root root    0 Jun 29 07:19 tower.log
-rw-r--r--.  1 root root    0 Jun 29 07:19 tower_rbac_migrations.log
-rw-r--r--.  1 root root    0 Jun 29 07:19 tower_system_tracking_migrations.log
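The listing above shows the mismatch that produces the Errno 13: the directory is owned by awx, but the log files themselves are root:root, so a process running as awx cannot open them for writing. An ownership audit could be scripted like this (illustrative helper, not part of CFME; the path and user come from this report):

```ruby
require "etc"

# List files in a directory that are not owned by the expected user.
# Against the listing above, wrong_owner("/var/log/tower", "awx")
# would return every log file, since they are owned by root.
def wrong_owner(dir, expected_user)
  Dir.glob(File.join(dir, "*")).reject do |path|
    Etc.getpwuid(File.stat(path).uid).name == expected_user
  end
end
```

Anything the helper returns would need the `chown awx:awx` fix suggested in comment 14.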

According to the setup.log, there are several Ansible tasks which configure permissions:

TASK [awx_install : Configure the Tower settings directory.] *******************
ok: [localhost] => {"changed": false, "gid": 988, "group": "awx", "mode": "0755", "owner": "awx", "path": "/etc/tower", "secontext": "system_u:object_r:etc_t:s0", "size": 54, "state": "directory", "uid": 991}

SELinux is on:

[mn@rlx-v399 ~]$ sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28
Comment 13 Gellert Kis 2017-07-24 05:03:23 EDT

More detail from the customer:
"
i actually found something in /var/lib/awx/setup/setup.log which indicates that requiretty in /etc/sudoers causes the playbook to stop.
So i've changed that and did all the workaround steps again, to enable ansible worker again.

Now i got an setup.log for the first time in /var/log/tower.
"

Then we got the error (see comment #10).
Comment 14 Gellert Kis 2017-07-24 05:41:07 EDT
Most probably our issue is incorrect ownership in /var/log/tower. If the files needed for the tower-manage --fake-initial migration step are not owned by the awx user, it throws a permission denied error.

[Errno 13] Permission denied: '/var/log/tower/callback_receiver.log'", "stdout": "", "stdout_lines": [], "warnings": []}

The resolution for the permission denied errors is to identify whichever file is throwing the error and then chown it:

chown awx:awx /var/log/tower/*
Comment 15 Nick Carboni 2017-07-31 13:40:17 EDT
(In reply to Gellert Kis from comment #12)
> In my appliance this step is skipped . 
> 
> "
> TASK [awx_install : Migrate the Tower database schema (may take awhile when
> upgrading).] ***
> skipping: [localhost] => {"changed": false, "skip_reason": "Conditional
> check failed", "skipped": true}
> "
> 
> In customer env. they have Tower database schema ?

Yes, embedded ansible uses a headless Tower installation internally.

> Can we delete it ?

No

> We need more understanding of what is Tower database schema and if possible to
> make this step "skip" ? 

Ansible skips steps that are already completed. It works against a desired state, so if no work is needed it doesn't execute the step. This just means that your appliance had previously executed this step successfully and it doesn't need any more work.
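The skip behavior can be sketched as a desired-state check (illustrative Ruby, not Ansible's actual implementation): a task acts only when the observed state differs from the declared one, otherwise it reports itself as skipped, which is what the "Conditional check failed" output means for an already-migrated schema.

```ruby
# Illustrative desired-state task: run only when the observed state
# differs from the declared one; otherwise report "skipped".
Task = Struct.new(:name, :desired, :current) do
  def run
    return :skipped if current == desired # already converged, nothing to do
    self.current = desired                # perform the change once
    :changed
  end
end

migrate = Task.new("Migrate the Tower database schema", :migrated, :unmigrated)
migrate.run # => :changed  (first run does the work)
migrate.run # => :skipped  (already converged, as on a healthy appliance)
```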

(In reply to Gellert Kis from comment #13)
> 
> more detail from customer :
> "
> i actually found something in /var/lib/awx/setup/setup.log which indicates
> that requiretty in /etc/sudoers causes the playbook to stop.
> So i've changed that and did all the workaround steps again, to enable
> ansible worker again.
> 
> Now i got an setup.log for the first time in /var/log/tower.
> "
> 
> Then we got error (see comment #10)

Is this something that was added by the customer? I don't see this in the /etc/sudoers file shipped on our appliance.

(In reply to Gellert Kis from comment #14)
> Most probably our issue is not correct owner in /var/log/tower . If the
> permissions on files that are needed for the tower-manage --fake migration
> step are not owned by the awx user, it throws a permissions denied error.
> 
> [Errno 13] Permission denied: '/var/log/tower/callback_receiver.log'",
> "stdout": "", "stdout_lines": [], "warnings": []}

I'm not sure how something like this could have happened. Is there anything else that was done to this appliance before attempting to enable the embedded ansible role?

> 
> The resolution for the permission denied errors is to ensure whatever file
> that's throwing the error and then chown :
> 
> chown awx:awx /var/log/tower/*

When trying to reproduce this, after a clean deploy and setup, all the logs in /var/log/tower are owned by the awx user (except for the setup log).

I'm not sure there is anything else I can do for this.

I'm going to close this as WORKSFORME.
