+++ This bug was initially created as a clone of Bug #1456562 +++ Description of problem: (undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --libvirt-type kvm --ntp-server clock.redhat.com --control-scale 1 --templates $THT -e $THT/environments/low-memory-usage.yaml -e $THT/environments/docker.yaml -e $THT/environments/major-upgrade-composable-steps-docker.yaml -e $THT/environments/containers-upgrade-repos.yaml -e $THT/environments/docker-osp12.yaml Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 17ba8909-34a5-4738-a169-47eafa3c7b53 Waiting for messages on queue 'e6be1d56-3d22-45d2-9e5e-e991c1e786b4' with no timeout. (undercloud) [stack@undercloud-0 ~]$ date Mon May 29 10:58:10 EDT 2017 (undercloud) [stack@undercloud-0 ~]$ heat stack-list WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead ^[[A+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+ | id | stack_name | stack_status | creation_time | updated_time | project | +--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+ | 61252786-3df4-4fa8-b91b-868a5af148ad | overcloud | CREATE_COMPLETE | 2017-05-29T08:38:37Z | None | 853b4c20691043b58ab0b5fa344cf23d | +--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+ (undercloud) [stack@undercloud-0 ~]$ ps -aux |grep overcloud stack 3471 0.1 0.3 359000 62840 pts/0 S+ 10:49 0:00 /usr/bin/python2 /usr/bin/openstack overcloud deploy --libvirt-type kvm --ntp-server clock.redhat.com --control-scale 1 --templates /usr/share/openstack-tripleo-heat-templates -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/containers-upgrade-repos.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml stack 3908 0.0 0.0 112664 972 pts/1 R+ 10:58 0:00 grep --color=auto overcloud Version-Release number of selected component (if applicable): OSP11->OSP12 How reproducible: alway Steps to Reproduce: 1) Deploy OSP11 Undercloud & Overcloud 2) perform minor update of undercloud node >sudo rhos-release 11 -r 7.4 >sudo systemctl stop 'openstack-*' 'neutron-*' httpd >sudo yum update -y instack-undercloud openstack-puppet-modules openstack-tripleo-common python-tripleoclient >openstack undercloud upgrade >sudo reboot 3) upgrade osp11 undrcloud to osp12 >sudo rhos-release 12-director >sudo systemctl stop openstack-* >sudo systemctl stop neutron-* >sudo systemctl stop httpd Apply workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1456458 >sudo yum -y update instack-undercloud openstack-puppet-modules openstack-tripleo-common python-tripleoclient >openstack undercloud upgrade #Overcloud upgrade source stackrc export IMAGE_TAG=`bash /usr/bin/puddle-version http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/12.0-RHEL-7/latest_containers` sudo wget -O /home/stack/overcloud_containers.yaml http://file.rdu.redhat.com/~ohochman/containers/overcloud_containers.yaml sudo chown stack:stack /home/stack/overcloud_containers.yaml sudo sed -i 's:<IMAGE_TAG>:'$IMAGE_TAG':g' /home/stack/overcloud_containers.yaml sudo sed -i 's/--insecure-registry 192.168.24.3:8787/--insecure-registry docker-registry.engineering.redhat.com/' /etc/sysconfig/docker sudo service docker restart source /home/stack/stackrc && openstack overcloud container image upload --verbose --config-file /home/stack/overcloud_containers.yaml sudo wget -O /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml http://file.rdu.redhat.com/~ohochman/containers/docker-osp12.yaml sudo chown stack:stack /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml sudo sed -i 's:<IMAGE_TAG>:'$IMAGE_TAG':g' /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml # use master repos cat > ~/containers-upgrade-repos.yaml <<EOEF parameter_defaults: UpgradeInitCommand: | set -e curl -LO http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm rpm -ivh rhos-release-latest.noarch.rpm || true rhos-release 12-director yum clean all EOEF export THT=/usr/share/openstack-tripleo-heat-templates # migrate to containers openstack overcloud deploy \ --libvirt-type qemu \ --ntp-server clock.redhat.com \ --control-scale 1 \ --templates $THT \ -e $THT/environments/low-memory-usage.yaml \ -e $THT/environments/docker.yaml \ -e $THT/environments/major-upgrade-composable-steps-docker.yaml \ -e ~/containers-upgrade-repos.yaml \ -e ~/docker-osp12.yaml Actual results: overcloud upgrade stuck Expected results: overcloud successfully upgraded to next major version Additional info: --- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-29 11:09:24 EDT --- This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release. --- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-29 11:14:07 EDT --- This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release. --- Additional comment from Dougal Matthews on 2017-06-07 08:54:05 EDT --- In the Mistral executor logs I found the following error: 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor Traceback (most recent call last): 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 112, in run_action 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor result = action.run() 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/base.py", line 159, in run 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor (self.__class__.__name__, self.client_method_name, e_str) 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor ActionException: ZaqarAction.queue_post failed: <class 'zaqarclient.transport.errors.InternalServerError'>: Error response from Zaqar. Code: 500. Text: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <html><head> 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <title>500 Internal Server Error</title> 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor </head><body> 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <h1>Internal Server Error</h1> 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <p>The server encountered an internal error or 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor misconfiguration and was unable to complete 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor your request.</p> 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <p>Please contact the server administrator at 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor [no address given] to inform them of the time this error occurred, 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor and the actions you performed just before this error.</p> 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <p>More information about this error may be available 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor in the server error log.</p> 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor </body></html> 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor . 2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor It seems that Zaqar is returning an error every time Mistral attempts to access it. --- Additional comment from Thomas Hervé on 2017-06-07 10:35:35 EDT --- So from the apache logs: [Wed Jun 07 09:56:16.069984 2017] [:error] [pid 1906] [remote 192.168.24.1:144] mod_wsgi (pid=1906): Target WSGI script '/var/www/cgi-bin/zaqar/zaqar-server' cannot be loaded as Python module. [Wed Jun 07 09:56:16.070035 2017] [:error] [pid 1906] [remote 192.168.24.1:144] mod_wsgi (pid=1906): Exception occurred processing WSGI script '/var/www/cgi-bin/zaqar/zaqar-server'. [Wed Jun 07 09:56:16.070054 2017] [:error] [pid 1906] [remote 192.168.24.1:144] Traceback (most recent call last): [Wed Jun 07 09:56:16.070068 2017] [:error] [pid 1906] [remote 192.168.24.1:144] File "/var/www/cgi-bin/zaqar/zaqar-server", line 41, in <module> [Wed Jun 07 09:56:16.070091 2017] [:error] [pid 1906] [remote 192.168.24.1:144] log.register_options(conf) [Wed Jun 07 09:56:16.070098 2017] [:error] [pid 1906] [remote 192.168.24.1:144] File "/usr/lib/python2.7/site-packages/oslo_log/log.py", line 253, in register_options [Wed Jun 07 09:56:16.070110 2017] [:error] [pid 1906] [remote 192.168.24.1:144] conf.register_cli_opts(_options.common_cli_opts) [Wed Jun 07 09:56:16.070115 2017] [:error] [pid 1906] [remote 192.168.24.1:144] File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2318, in __inner [Wed Jun 07 09:56:16.070124 2017] [:error] [pid 1906] [remote 192.168.24.1:144] result = f(self, *args, **kwargs) [Wed Jun 07 09:56:16.070129 2017] [:error] [pid 1906] [remote 192.168.24.1:144] File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2540, in register_cli_opts [Wed Jun 07 09:56:16.070136 2017] [:error] [pid 1906] [remote 192.168.24.1:144] self.register_cli_opt(opt, group, clear_cache=False) [Wed Jun 07 09:56:16.070140 2017] [:error] [pid 1906] [remote 192.168.24.1:144] File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2322, in __inner [Wed Jun 07 09:56:16.070147 2017] [:error] [pid 1906] [remote 192.168.24.1:144] return f(self, *args, **kwargs) [Wed Jun 07 09:56:16.070151 2017] [:error] [pid 1906] [remote 192.168.24.1:144] File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2532, in register_cli_opt [Wed Jun 07 09:56:16.070185 2017] [:error] [pid 1906] [remote 192.168.24.1:144] raise ArgsAlreadyParsedError("cannot register CLI option") [Wed Jun 07 09:56:16.070201 2017] [:error] [pid 1906] [remote 192.168.24.1:144] ArgsAlreadyParsedError: arguments already parsed: cannot register CLI option Restarting apache seems to fix it, but I don't know why. --- Additional comment from Thomas Hervé on 2017-06-08 15:50:25 EDT --- After the issue reappeared, it seemed that an additional step was required: move /var/log/zaqar/zaqar.log. I wasn't able to know why yet, but maybe permissions error? --- Additional comment from Artem Hrechanychenko on 2017-06-19 11:50:30 EDT --- w/a restart httpd service and delete /var/log/zaqar/zaqar.log --- Additional comment from Dan Prince on 2017-06-19 11:53:40 EDT --- Any chance this is related to https://review.openstack.org/#/c/472618 --- Additional comment from Marius Cornea on 2017-07-11 17:46:54 EDT --- Looks like an SELinux issue: [root@undercloud-0 stack]# grep zaqar /var/log/audit/audit.log type=AVC msg=audit(1499809225.125:6424): avc: denied { open } for pid=23130 comm="httpd" path="/var/log/zaqar/zaqar.log" dev="vda1" ino=12686192 scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file type=AVC msg=audit(1499809391.677:6514): avc: denied { open } for pid=23563 comm="httpd" path="/var/log/zaqar/zaqar.log" dev="vda1" ino=12686192 scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file [root@undercloud-0 stack]# ls -lZ /var/log/zaqar/zaqar.log -rw-r--r--. zaqar zaqar system_u:object_r:var_log_t:s0 /var/log/zaqar/zaqar.log --- Additional comment from Marius Cornea on 2017-07-11 17:49:36 EDT --- After removing the file and restarting httpd: [root@undercloud-0 stack]# ls -lZ /var/log/zaqar/zaqar.log -rw-r--r--. zaqar zaqar system_u:object_r:httpd_log_t:s0 /var/log/zaqar/zaqar.log --- Additional comment from Mike Orazi on 2017-07-25 10:49:15 EDT --- Lon, Can you help confirm this is an selinux bug? --- Additional comment from Marius Cornea on 2017-07-25 17:14 EDT --- After undercloud upgrade: [root@undercloud-0 stack]# rpm -qa | grep openstack-selinux openstack-selinux-0.8.8-0.20170622195307.74ddc0e.el7ost.noarch [root@undercloud-0 stack]# ls -lZ /var/log/zaqar/zaqar.log -rw-r--r--. zaqar zaqar system_u:object_r:var_log_t:s0 /var/log/zaqar/zaqar.log Attaching /var/log/audit/audit.log after undercloud upgrade. --- Additional comment from Marius Cornea on 2017-08-02 10:06:43 EDT --- --- Additional comment from Lon Hohberger on 2017-08-02 11:18:52 EDT --- We need to set /var/log/zaqar/zaqar.log to a different file context and then call 'restorecon' after deployment. --- Additional comment from Lon Hohberger on 2017-08-02 11:55:03 EDT --- https://github.com/redhat-openstack/openstack-selinux/commit/f7297a77903fcb8963c7796e49d357cb7cce2b28 This still needs a restorecon for it to operate correctly.
I created this after initial discussions with Lon suggested that we would need to run restorecon outside of the openstack-selinux package, however that may no longer be the case. I'll close this out if we don't need anything in instack etc.
AFAIK this was fixed elsewhere and thus this bug isn't required anymore.