Bug 1456562 - Cannot upgrade OSP11 to OSP12 due to freeze after executing of overcloud deploy command [NEEDINFO]
Summary: Cannot upgrade OSP11 to OSP12 due to freeze after executing of overcloud depl...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-selinux
Version: 12.0 (Pike)
Hardware: x86_64
OS: Linux
high
urgent
Target Milestone: ga
: 12.0 (Pike)
Assignee: Lon Hohberger
QA Contact: Udi Shkalim
URL:
Whiteboard:
: 1477611 (view as bug list)
Depends On:
Blocks: 1478004
TreeView+ depends on / blocked
 
Reported: 2017-05-29 15:09 UTC by Artem Hrechanychenko
Modified: 2018-02-05 19:07 UTC (History)
19 users (show)

Fixed In Version: openstack-selinux-0.8.8-0.20170804200925.ad96ed3.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1478004 (view as bug list)
Environment:
Last Closed: 2017-12-13 21:29:42 UTC
Target Upstream Version:
dprince: needinfo? (shardy)


Attachments (Terms of Use)
audit.log (7.75 MB, text/plain)
2017-07-25 21:14 UTC, Marius Cornea
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:3462 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-16 01:43:25 UTC

Description Artem Hrechanychenko 2017-05-29 15:09:19 UTC
Description of problem:
(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --libvirt-type kvm --ntp-server clock.redhat.com     --control-scale 1     --templates $THT     -e $THT/environments/low-memory-usage.yaml     -e $THT/environments/docker.yaml     -e $THT/environments/major-upgrade-composable-steps-docker.yaml     -e $THT/environments/containers-upgrade-repos.yaml     -e $THT/environments/docker-osp12.yaml
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 17ba8909-34a5-4738-a169-47eafa3c7b53
Waiting for messages on queue 'e6be1d56-3d22-45d2-9e5e-e991c1e786b4' with no timeout.

(undercloud) [stack@undercloud-0 ~]$ date
Mon May 29 10:58:10 EDT 2017
(undercloud) [stack@undercloud-0 ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
^[[A+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time | project                          |
+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+
| 61252786-3df4-4fa8-b91b-868a5af148ad | overcloud  | CREATE_COMPLETE | 2017-05-29T08:38:37Z | None         | 853b4c20691043b58ab0b5fa344cf23d |
+--------------------------------------+------------+-----------------+----------------------+--------------+----------------------------------+
(undercloud) [stack@undercloud-0 ~]$ ps -aux |grep overcloud
stack     3471  0.1  0.3 359000 62840 pts/0    S+   10:49   0:00 /usr/bin/python2 /usr/bin/openstack overcloud deploy --libvirt-type kvm --ntp-server clock.redhat.com --control-scale 1 --templates /usr/share/openstack-tripleo-heat-templates -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/containers-upgrade-repos.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml
stack     3908  0.0  0.0 112664   972 pts/1    R+   10:58   0:00 grep --color=auto overcloud


Version-Release number of selected component (if applicable):
OSP11->OSP12

How reproducible:
alway

Steps to Reproduce:
1) Deploy OSP11 Undercloud & Overcloud
2) perform minor update of undercloud node
>sudo rhos-release 11 -r 7.4
>sudo systemctl stop 'openstack-*' 'neutron-*' httpd
>sudo yum update -y instack-undercloud openstack-puppet-modules openstack-tripleo-common python-tripleoclient
>openstack undercloud upgrade
>sudo reboot

3) upgrade osp11 undrcloud to osp12

>sudo rhos-release 12-director
>sudo systemctl stop openstack-*
>sudo systemctl stop neutron-*
>sudo systemctl stop httpd

Apply workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1456458

>sudo yum -y update instack-undercloud openstack-puppet-modules openstack-tripleo-common python-tripleoclient
>openstack undercloud upgrade

#Overcloud upgrade 
source stackrc

export IMAGE_TAG=`bash /usr/bin/puddle-version http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/12.0-RHEL-7/latest_containers`

 sudo wget -O /home/stack/overcloud_containers.yaml http://file.rdu.redhat.com/~ohochman/containers/overcloud_containers.yaml
    sudo chown stack:stack /home/stack/overcloud_containers.yaml
    sudo sed -i 's:<IMAGE_TAG>:'$IMAGE_TAG':g' /home/stack/overcloud_containers.yaml
    sudo sed -i 's/--insecure-registry 192.168.24.3:8787/--insecure-registry docker-registry.engineering.redhat.com/' /etc/sysconfig/docker
    sudo service docker restart
    source /home/stack/stackrc && openstack overcloud container image upload --verbose --config-file /home/stack/overcloud_containers.yaml 
    
sudo wget -O /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml http://file.rdu.redhat.com/~ohochman/containers/docker-osp12.yaml
    sudo chown stack:stack /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml

    sudo sed -i 's:<IMAGE_TAG>:'$IMAGE_TAG':g' /usr/share/openstack-tripleo-heat-templates/environments/docker-osp12.yaml


# use master repos
cat > ~/containers-upgrade-repos.yaml <<EOEF
parameter_defaults:
  UpgradeInitCommand: |
    set -e
    curl -LO http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm
    rpm -ivh rhos-release-latest.noarch.rpm || true
    rhos-release 12-director
    yum clean all
EOEF

export THT=/usr/share/openstack-tripleo-heat-templates

# migrate to containers
openstack overcloud deploy \
    --libvirt-type qemu \
    --ntp-server clock.redhat.com \
    --control-scale 1 \
    --templates $THT \
    -e $THT/environments/low-memory-usage.yaml \
    -e $THT/environments/docker.yaml \
    -e $THT/environments/major-upgrade-composable-steps-docker.yaml \
    -e ~/containers-upgrade-repos.yaml \
    -e ~/docker-osp12.yaml

Actual results:
overcloud upgrade stuck

Expected results:

overcloud successfully upgraded to next major version
Additional info:

Comment 1 Red Hat Bugzilla Rules Engine 2017-05-29 15:09:24 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 2 Red Hat Bugzilla Rules Engine 2017-05-29 15:14:07 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 3 Dougal Matthews 2017-06-07 12:54:05 UTC
In the Mistral executor logs I found the following error:

2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor Traceback (most recent call last):
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 112, in run_action
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor     result = action.run()
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/base.py", line 159, in run
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor     (self.__class__.__name__, self.client_method_name, e_str)
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor ActionException: ZaqarAction.queue_post failed: <class 'zaqarclient.transport.errors.InternalServerError'>: Error response from Zaqar. Code: 500. Text: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <html><head>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <title>500 Internal Server Error</title>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor </head><body>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <h1>Internal Server Error</h1>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <p>The server encountered an internal error or
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor misconfiguration and was unable to complete
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor your request.</p>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <p>Please contact the server administrator at 
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor  [no address given] to inform them of the time this error occurred,
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor  and the actions you performed just before this error.</p>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor <p>More information about this error may be available
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor in the server error log.</p>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor </body></html>
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor .
2017-06-07 08:31:42.569 1446 ERROR mistral.executors.default_executor 



It seems that Zaqar is returning an error every time Mistral attempts to access it.

Comment 4 Thomas Hervé 2017-06-07 14:35:35 UTC
So from the apache logs:

[Wed Jun 07 09:56:16.069984 2017] [:error] [pid 1906] [remote 192.168.24.1:144] mod_wsgi (pid=1906): Target WSGI script '/var/www/cgi-bin/zaqar/zaqar-server' cannot be loaded as Python module.
[Wed Jun 07 09:56:16.070035 2017] [:error] [pid 1906] [remote 192.168.24.1:144] mod_wsgi (pid=1906): Exception occurred processing WSGI script '/var/www/cgi-bin/zaqar/zaqar-server'.
[Wed Jun 07 09:56:16.070054 2017] [:error] [pid 1906] [remote 192.168.24.1:144] Traceback (most recent call last):
[Wed Jun 07 09:56:16.070068 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/var/www/cgi-bin/zaqar/zaqar-server", line 41, in <module>
[Wed Jun 07 09:56:16.070091 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     log.register_options(conf)
[Wed Jun 07 09:56:16.070098 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_log/log.py", line 253, in register_options
[Wed Jun 07 09:56:16.070110 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     conf.register_cli_opts(_options.common_cli_opts)
[Wed Jun 07 09:56:16.070115 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2318, in __inner
[Wed Jun 07 09:56:16.070124 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     result = f(self, *args, **kwargs)
[Wed Jun 07 09:56:16.070129 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2540, in register_cli_opts
[Wed Jun 07 09:56:16.070136 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     self.register_cli_opt(opt, group, clear_cache=False)
[Wed Jun 07 09:56:16.070140 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2322, in __inner
[Wed Jun 07 09:56:16.070147 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     return f(self, *args, **kwargs)
[Wed Jun 07 09:56:16.070151 2017] [:error] [pid 1906] [remote 192.168.24.1:144]   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2532, in register_cli_opt
[Wed Jun 07 09:56:16.070185 2017] [:error] [pid 1906] [remote 192.168.24.1:144]     raise ArgsAlreadyParsedError("cannot register CLI option")
[Wed Jun 07 09:56:16.070201 2017] [:error] [pid 1906] [remote 192.168.24.1:144] ArgsAlreadyParsedError: arguments already parsed: cannot register CLI option


Restarting apache seems to fix it, but I don't know why.

Comment 5 Thomas Hervé 2017-06-08 19:50:25 UTC
After the issue reappeared, it seemed that an additional step was required: move /var/log/zaqar/zaqar.log. I wasn't able to know why yet, but maybe permissions error?

Comment 6 Artem Hrechanychenko 2017-06-19 15:50:30 UTC
w/a restart httpd service and delete /var/log/zaqar/zaqar.log

Comment 7 Dan Prince 2017-06-19 15:53:40 UTC
Any chance this is related to https://review.openstack.org/#/c/472618

Comment 8 Marius Cornea 2017-07-11 21:46:54 UTC
Looks like an SELinux issue:

 [root@undercloud-0 stack]# grep zaqar /var/log/audit/audit.log                                                                                                                                                                                
type=AVC msg=audit(1499809225.125:6424): avc:  denied  { open } for  pid=23130 comm="httpd" path="/var/log/zaqar/zaqar.log" dev="vda1" ino=12686192 scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file
type=AVC msg=audit(1499809391.677:6514): avc:  denied  { open } for  pid=23563 comm="httpd" path="/var/log/zaqar/zaqar.log" dev="vda1" ino=12686192 scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:object_r:var_log_t:s0 tclass=file


 [root@undercloud-0 stack]# ls -lZ /var/log/zaqar/zaqar.log 
-rw-r--r--. zaqar zaqar system_u:object_r:var_log_t:s0   /var/log/zaqar/zaqar.log

Comment 9 Marius Cornea 2017-07-11 21:49:36 UTC
After removing the file and restarting httpd:

[root@undercloud-0 stack]# ls -lZ /var/log/zaqar/zaqar.log
-rw-r--r--. zaqar zaqar system_u:object_r:httpd_log_t:s0 /var/log/zaqar/zaqar.log

Comment 10 Mike Orazi 2017-07-25 14:49:15 UTC
Lon,

Can you help confirm this is an selinux bug?

Comment 11 Marius Cornea 2017-07-25 21:14:42 UTC
Created attachment 1304472 [details]
audit.log

After undercloud upgrade:

[root@undercloud-0 stack]# rpm -qa | grep openstack-selinux
openstack-selinux-0.8.8-0.20170622195307.74ddc0e.el7ost.noarch

[root@undercloud-0 stack]#  ls -lZ /var/log/zaqar/zaqar.log
-rw-r--r--. zaqar zaqar system_u:object_r:var_log_t:s0   /var/log/zaqar/zaqar.log

Attaching /var/log/audit/audit.log after undercloud upgrade.

Comment 12 Marius Cornea 2017-08-02 14:06:43 UTC
*** Bug 1477611 has been marked as a duplicate of this bug. ***

Comment 13 Lon Hohberger 2017-08-02 15:18:52 UTC
We need to set /var/log/zaqar/zaqar.log to a different file context and then call 'restorecon' after deployment.

Comment 14 Lon Hohberger 2017-08-02 15:55:03 UTC
https://github.com/redhat-openstack/openstack-selinux/commit/f7297a77903fcb8963c7796e49d357cb7cce2b28

This still needs a restorecon for it to operate correctly.

Comment 21 errata-xmlrpc 2017-12-13 21:29:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462


Note You need to log in before you can comment on or make changes to this bug.