Bug 1698555 - overcloud node import hangs - mistral_executor is down due to /var/lib/undercloud.conf missing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 15.0 (Stein)
Assignee: Michele Baldessari
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Duplicates: 1698540
Depends On:
Blocks:
Reported: 2019-04-10 15:25 UTC by Pavel Sedlák
Modified: 2019-09-26 10:49 UTC
CC List: 9 users

Fixed In Version: openstack-tripleo-heat-templates-10.4.1-0.20190412000410.b934fdd.el8ost.noarch
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-21 11:21:11 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status  Summary                                                 Last Updated
OpenStack gerrit        651602          0        None      MERGED  Bind mount undercloud.conf with ,z in mistral_executor  2020-10-07 21:05:27 UTC
Red Hat Product Errata  RHEA-2019:2811  0        None      None    None                                                    2019-09-21 11:21:35 UTC

Description Pavel Sedlák 2019-04-10 15:25:26 UTC
During an OSP 15 deployment, importing instackenv.json (overcloud node import) hangs, in some cases for hours.

The list of failed systemd units shows that the mistral_executor container service is down:
> (undercloud) [stack@undercloud-0 ~]$ systemctl --state=failed
>   UNIT                                         LOAD   ACTIVE SUB    DESCRIPTION                 
> ● NetworkManager-wait-online.service           loaded failed failed Network Manager Wait Online 
> ● tripleo_mistral_executor.service             loaded failed failed mistral_executor container  
> ● tripleo_mistral_executor_healthcheck.service loaded failed failed mistral_executor healthcheck
> 
> LOAD   = Reflects whether the unit definition was properly loaded.
> ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
> SUB    = The low-level unit activation state, values depend on unit type.
> 
> 3 loaded units listed. Pass --all to see loaded but inactive units, too.
> To show all installed unit files use 'systemctl list-unit-files'.

journalctl -u tripleo_mistral_executor.service shows:
> Apr 10 12:39:50 undercloud-0.redhat.local podman[52332]: INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/mistral/mistral.conf to /etc/mistral/mistral.conf
> Apr 10 12:39:50 undercloud-0.redhat.local podman[52332]: INFO:__main__:Deleting /etc/my.cnf.d/tripleo.cnf
> Apr 10 12:39:50 undercloud-0.redhat.local podman[52332]: INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/my.cnf.d/tripleo.cnf to /etc/my.cnf.d/tripleo.cnf
> Apr 10 12:39:50 undercloud-0.redhat.local podman[52332]: INFO:__main__:Deleting /var/www/cgi-bin/mistral/app
> Apr 10 12:39:50 undercloud-0.redhat.local podman[52332]: INFO:__main__:Copying /var/lib/kolla/config_files/src/var/www/cgi-bin/mistral/app to /var/www/cgi-bin/mistral/app
> Apr 10 12:39:50 undercloud-0.redhat.local podman[52332]: ERROR:__main__:MissingRequiredSource: /var/lib/undercloud.conf file is not found
> Apr 10 12:39:50 undercloud-0.redhat.local systemd[1]: tripleo_mistral_executor.service: Main process exited, code=exited, status=1/FAILURE
> Apr 10 12:39:50 undercloud-0.redhat.local systemd[1]: tripleo_mistral_executor.service: Failed with result 'exit-code'.
> Apr 10 12:39:50 undercloud-0.redhat.local systemd[1]: tripleo_mistral_executor.service: Service RestartSec=100ms expired, scheduling restart.
> Apr 10 12:39:50 undercloud-0.redhat.local systemd[1]: tripleo_mistral_executor.service: Scheduled restart job, restart counter is at 29.
> Apr 10 12:39:50 undercloud-0.redhat.local systemd[1]: Stopped mistral_executor container.
> Apr 10 12:39:50 undercloud-0.redhat.local systemd[1]: tripleo_mistral_executor.service: Start request repeated too quickly.
> Apr 10 12:39:50 undercloud-0.redhat.local systemd[1]: tripleo_mistral_executor.service: Failed with result 'exit-code'.
> Apr 10 12:39:50 undercloud-0.redhat.local systemd[1]: Failed to start mistral_executor container.
> Apr 10 12:43:20 undercloud-0.redhat.local systemd[1]: mistral_executor container is not active.
> Apr 10 12:44:50 undercloud-0.redhat.local systemd[1]: mistral_executor container is not active.

In the "config_data" "volumes" list from podman image inspect mistral_executor, I see "/home/stack/undercloud.conf:/var/lib/undercloud.conf:ro".
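For reference, that mount follows podman's SOURCE:DESTINATION[:OPTIONS] volume syntax. Its only option is ro; there is no z or Z, the options that ask podman to relabel the source for SELinux, so the file is mounted with its existing host label (user_home_t, per the AVC denials below). The sketch below uses two hypothetical shell helpers, split_volume_spec and has_relabel_option (not part of podman or TripleO), and assumes neither path contains a colon:

```shell
# Hypothetical helpers (not part of podman or TripleO) for inspecting a
# volume spec of the form SOURCE:DESTINATION[:OPTIONS].
# Assumes neither path contains a colon.

# Print source, destination and options of a volume spec, one per line.
split_volume_spec() {
    spec=$1
    src=${spec%%:*}          # everything before the first colon
    rest=${spec#*:}          # everything after the first colon
    dest=${rest%%:*}         # destination path
    if [ "$rest" = "$dest" ]; then
        opts=""              # no options field present
    else
        opts=${rest#*:}      # comma-separated options, e.g. "ro" or "ro,z"
    fi
    printf 'source=%s\ndest=%s\noptions=%s\n' "$src" "$dest" "$opts"
}

# Succeed if a comma-separated options string contains z or Z,
# the podman options that request SELinux relabeling of the source.
has_relabel_option() {
    case ",$1," in
        *,z,*|*,Z,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running split_volume_spec "/home/stack/undercloud.conf:/var/lib/undercloud.conf:ro" prints source=/home/stack/undercloud.conf, dest=/var/lib/undercloud.conf and options=ro on three lines; has_relabel_option ro fails, while has_relabel_option ro,z succeeds.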

Switching SELinux to permissive mode allows the service to start successfully:
> [root@undercloud-0 ~]# systemctl start tripleo_mistral_executor
> [root@undercloud-0 ~]# systemctl status tripleo_mistral_executor
> ● tripleo_mistral_executor.service - mistral_executor container
>    Loaded: loaded (/etc/systemd/system/tripleo_mistral_executor.service; enabled; vendor preset: disabled)
>    Active: active (running) since Wed 2019-04-10 15:13:48 UTC; 3s ago
>   Process: 227926 ExecStop=/usr/bin/podman stop -t 10 mistral_executor (code=exited, status=0/SUCCESS)
>  Main PID: 237380 (podman)
>     Tasks: 14 (limit: 26213)
>    Memory: 21.5M
>    CGroup: /system.slice/tripleo_mistral_executor.service
>            └─237380 /usr/bin/podman start -a mistral_executor

audit.log then shows these denials (permissive=1, so they were logged but not enforced):
> type=AVC msg=audit(1554909228.545:11915): avc:  denied  { read } for  pid=237729 comm="python" name="undercloud.conf" dev="vda1" ino=75552388 scontext=system_u:system_r:container_t:s0:c349,c825 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
> type=AVC msg=audit(1554909228.545:11915): avc:  denied  { open } for  pid=237729 comm="python" path="/var/lib/undercloud.conf" dev="vda1" ino=75552388 scontext=system_u:system_r:container_t:s0:c349,c825 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
> type=AVC msg=audit(1554909228.545:11916): avc:  denied  { ioctl } for  pid=237729 comm="python" path="/var/lib/undercloud.conf" dev="vda1" ino=75552388 ioctlcmd=0x5401 scontext=system_u:system_r:container_t:s0:c349,c825 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
> type=AVC msg=audit(1554909228.545:11917): avc:  denied  { relabelto } for  pid=237729 comm="python" name="undercloud.conf" dev="vda1" ino=10041677 scontext=system_u:system_r:container_t:s0:c349,c825 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
> type=AVC msg=audit(1554909228.545:11918): avc:  denied  { setattr } for  pid=237729 comm="python" name="undercloud.conf" dev="vda1" ino=10041677 scontext=system_u:system_r:container_t:s0:c349,c825 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
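Each AVC record names the denied permission(s), the source context (here the container_t process domain) and the target context (here user_home_t, the label the file carries in /home/stack). A rough way to condense such records is sketched below; summarize_avc is a hypothetical shell helper that assumes the field layout shown above, and it is not a substitute for audit2why or ausearch:

```shell
# Hypothetical helper: condense SELinux AVC denial records read from stdin
# into "<permissions> <source type> -> <target type> (<class>)".
# Assumes the scontext/tcontext/tclass layout of the audit.log lines above;
# lines that do not match are silently dropped.
summarize_avc() {
    sed -n 's/.*denied  *{ \([^}]*\) }.*scontext=[^:]*:[^:]*:\([^:]*\):.*tcontext=[^:]*:[^:]*:\([^:]*\):.*tclass=\([^ ]*\).*/\1 \2 -> \3 (\4)/p'
}
```

For example, grep 'avc:  denied' /var/log/audit/audit.log | summarize_avc | sort | uniq -c groups the denials by permission and type pair; the first record above condenses to "read container_t -> user_home_t (file)".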


Some of the packages and images involved:
> container-selinux-2.75-1.git99e2cfd.module+el8+2769+577ad176.noarch
> libselinux-2.8-6.el8.x86_64
> libselinux-ruby-2.8-6.el8.x86_64
> libselinux-utils-2.8-6.el8.x86_64
> openstack-selinux-0.8.18-0.20190329040328.4c5ed0f.el8ost.noarch
> openvswitch-selinux-extra-policy-1.0-10.el8fdb.noarch
> python3-libselinux-2.8-6.el8.x86_64
> rpm-plugin-selinux-4.14.2-9.el8.x86_64
> selinux-policy-3.14.1-61.el8.noarch
> selinux-policy-targeted-3.14.1-61.el8.noarch
> ansible-role-tripleo-modify-image-1.0.1-0.20190402220346.012209a.el8ost.noarch
> ansible-tripleo-ipsec-9.0.1-0.20190220162047.f60ad6c.el8ost.noarch
> openstack-tripleo-common-10.6.2-0.20190408160359.d8fded9.el8ost.noarch
> openstack-tripleo-common-containers-10.6.2-0.20190408160359.d8fded9.el8ost.noarch
> openstack-tripleo-heat-templates-10.4.1-0.20190409050352.58ff7df.el8ost.noarch
> openstack-tripleo-image-elements-10.3.1-0.20190325204940.253fe88.el8ost.noarch
> openstack-tripleo-puppet-elements-10.2.1-0.20190408131411.a72c6b3.el8ost.noarch
> openstack-tripleo-validations-10.3.1-0.20190404130349.6ecfb48.el8ost.noarch
> puppet-mistral-14.4.1-0.20190328231250.f9e938d.el8ost.noarch
> puppet-tripleo-10.3.1-0.20190405000342.566703d.el8ost.noarch
> python3-mistral-lib-1.1.0-0.20190312192103.bac92db.el8ost.noarch
> python3-mistralclient-3.8.1-0.20190318115402.0cd6b28.el8ost.noarch
> python3-tripleo-common-10.6.2-0.20190408160359.d8fded9.el8ost.noarch
> python3-tripleoclient-11.3.1-0.20190409084327.be9b7ef.el8ost.noarch
> python3-tripleoclient-heat-installer-11.3.1-0.20190409084327.be9b7ef.el8ost.noarch
> 
> [root@undercloud-0 ~]# podman images|grep -i mistr
> 192.168.24.1:8787/rhosp15/openstack-mistral-event-engine        20190409.1   250f01a40e46   18 hours ago   989 MB
> 192.168.24.1:8787/rhosp15/openstack-mistral-api                 20190409.1   d063f718d5e5   18 hours ago   1.01 GB
> 192.168.24.1:8787/rhosp15/openstack-mistral-engine              20190409.1   a78d46f6594b   19 hours ago   989 MB
> 192.168.24.1:8787/rhosp15/openstack-mistral-executor            20190409.1   68c1f09c2bfa   19 hours ago   1.23 GB

Comment 2 Julie Pichon 2019-04-10 16:03:54 UTC
My first thought was that undercloud.conf keeps its original home-directory SELinux context when mounted, when it should switch to a mistral- or container-specific context. However, AlistairT ran ls -Z in an environment where this works in enforcing mode, and it appears that it should work even with the unconfined user_home_t context:

podman exec -u root mistral_executor bash

ls -lZ /var/lib/undercloud.conf
-rwxr-xr-x. 1 1001 1001 unconfined_u:object_r:user_home_t:s0 891 Apr  4 10:54 /var/lib/undercloud.conf

Investigating further. I doubt we want to give containers read access to the home directory in general.

Comment 3 Michele Baldessari 2019-04-10 16:16:17 UTC
A) Enforcing on
# 68c1f09c2bfa is the mistral-executor image
podman run -it --rm --user=root --net=host -e KOLLA_INSTALL_METATYPE=rhos -e KOLLA_INSTALL_TYPE=binary \
  -e KOLLA_BASE_DISTRO=rhel -e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS -e KOLLA_DISTRO_PYTHON_VERSION=3.6 \
  -v /home/stack/undercloud.conf:/var/lib/undercloud.conf \
  -v /var/lib/kolla/config_files/mistral_executor.json:/var/lib/kolla/config_files/config.json \
  -v /var/lib/config-data/puppet-generated/mistral/:/var/lib/kolla/config_files/src 68c1f09c2bfa sh
[root@undercloud-0 ~]# sh x.sh
()[root@undercloud-0 /]$ kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
....snip....
INFO:__main__:Copying /var/lib/kolla/config_files/src/var/www/cgi-bin/mistral/app to /var/www/cgi-bin/mistral/app
ERROR:__main__:MissingRequiredSource: /var/lib/undercloud.conf file is not found

The error is misleading, because the file is actually there:
()[root@undercloud-0 /]$ ls -1 /var/lib/ |grep -i undercloud.conf
undercloud.conf

The problem is that we cannot access it:
()[root@undercloud-0 /]$ ls -lZ /var/lib/undercloud.conf
ls: cannot access '/var/lib/undercloud.conf': Permission denied

[root@undercloud-0 ~]# ls -ldZ /home/stack/ ; ls -lZ /home/stack/undercloud.conf
drwx------. 9 stack stack unconfined_u:object_r:user_home_dir_t:s0 4096 Apr 10 11:06 /home/stack/
-rwxr-xr-x. 1 stack stack unconfined_u:object_r:user_home_t:s0 891 Apr 10 10:23 /home/stack/undercloud.conf

The only denials I see around this time are about dbus and sudo (likely https://bugs.launchpad.net/tripleo/+bug/1819461),
so I am not sure they are relevant:
type=AVC msg=audit(1554910473.290:10321): avc:  denied  { connectto } for  pid=130205 comm="sudo" path="/run/dbus/system_bus_socket" scontext=system_u:system_r:container_t:s0:c363,c968 tcontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=0
type=AVC msg=audit(1554910473.293:10325): avc:  denied  { connectto } for  pid=130205 comm="sudo" path="/run/dbus/system_bus_socket" scontext=system_u:system_r:container_t:s0:c363,c968 tcontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=0
                       
                       
B) Enforcing off
podman run -it --user=root --rm --net=host -e KOLLA_INSTALL_METATYPE=rhos -e KOLLA_INSTALL_TYPE=binary \
  -e KOLLA_BASE_DISTRO=rhel -e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS -e KOLLA_DISTRO_PYTHON_VERSION=3.6 \
  -v /home/stack/undercloud.conf:/var/lib/undercloud.conf \
  -v /var/lib/kolla/config_files/mistral_executor.json:/var/lib/kolla/config_files/config.json \
  -v /var/lib/config-data/puppet-generated/mistral/:/var/lib/kolla/config_files/src 68c1f09c2bfa sh
()[root@undercloud-0 /]$ ls -lZ /var/lib/undercloud.conf
-rwxr-xr-x. 1 1001 1001 unconfined_u:object_r:user_home_t:s0 891 Apr 10 14:23 /var/lib/undercloud.conf

This works correctly.

What I do not fully understand is why we see no denials for these ls -l commands, which run as root (without sudo).

Comment 4 Michele Baldessari 2019-04-10 16:27:14 UTC
Julie++'s suggestion seems to have worked:
1) Edited /var/lib/tripleo-config/hashed-container-startup-config-step_4.json and appended ',z' to the undercloud.conf volume in the mistral_executor stanza only:
  "/home/stack/undercloud.conf:/var/lib/undercloud.conf:ro,z",

2) Re-ran paunch for step 4:
paunch --debug apply --default-runtime podman --file /var/lib/tripleo-config/hashed-container-startup-config-step_4.json --config-id tripleo_step4 --managed-by tripleo-Undercloud 2>&1 | tee /tmp/paunch.log
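The manual edit in step 1 can be scripted. The z option tells podman to relabel the mount source with a shared container label so that container_t processes can read it; note that this changes the label of /home/stack/undercloud.conf on the host. A minimal sketch follows; add_z_to_undercloud_conf_mount is a hypothetical helper, it assumes the volume string appears exactly as shown in step 1, and it keeps a .bak copy. Hand edits to the hashed startup config are only a workaround and may be overwritten on the next undercloud install or upgrade; the permanent fix is the ,z change merged in gerrit 651602.

```shell
# Hypothetical helper: append ",z" to the undercloud.conf bind mount in a
# paunch startup-config JSON file, mirroring the manual edit in step 1.
# Every occurrence in the file is changed, so first confirm with grep that
# only the mistral_executor stanza mounts undercloud.conf.
# A backup is written to "<file>.bak". Running it twice leaves the file
# content unchanged, because the pattern only matches the unedited ":ro" form.
add_z_to_undercloud_conf_mount() {
    file=$1
    sed -i.bak \
        's#"/home/stack/undercloud\.conf:/var/lib/undercloud\.conf:ro"#"/home/stack/undercloud.conf:/var/lib/undercloud.conf:ro,z"#g' \
        "$file"
}
```

Usage: grep -n undercloud.conf /var/lib/tripleo-config/hashed-container-startup-config-step_4.json to check the occurrences, then add_z_to_undercloud_conf_mount /var/lib/tripleo-config/hashed-container-startup-config-step_4.json, then re-run the paunch command from step 2.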
                           
3) mistral_executor is up and running:                     
[root@undercloud-0 ~]# podman ps |grep mistral_ex          
e220c54e027d  192.168.24.1:8787/rhosp15/openstack-mistral-executor:20190409.1           dumb-init --singl...  About a minute ago  Up About a minute ago         mistral_executor
                         
[root@undercloud-0 ~]# podman logs mistral_executor 2>&1|tail -n10
+ . kolla_extend_start     
++ [[ ! -d /var/log/kolla/mistral ]]                       
++ mkdir -p /var/log/kolla/mistral                         
+++ stat -c %a /var/log/kolla/mistral                      
++ [[ 2755 != \7\5\5 ]]    
++ chmod 755 /var/log/kolla/mistral                        
Running command: '/usr/bin/mistral-server --config-file=/etc/mistral/mistral.conf --log-file=/var/log/mistral/executor.log --server=executor'
++ . /usr/local/bin/kolla_mistral_extend_start             
+ echo 'Running command: '\''/usr/bin/mistral-server --config-file=/etc/mistral/mistral.conf --log-file=/var/log/mistral/executor.log --server=executor'\'''
+ exec /usr/bin/mistral-server --config-file=/etc/mistral/mistral.conf --log-file=/var/log/mistral/executor.log --server=executor

Comment 6 Michele Baldessari 2019-04-11 04:53:41 UTC
*** Bug 1698540 has been marked as a duplicate of this bug. ***

Comment 8 Sasha Smolyak 2019-04-22 11:03:14 UTC
Node import passed successfully; mistral_executor no longer appears in the list of failed units.

Comment 11 errata-xmlrpc 2019-09-21 11:21:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811

