Bug 1608529

Summary: Overcloud deployment may fail with OSError: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/1a47efe7-07f0-41df-932f-ea48ec23a7ac'
Product: Red Hat OpenStack
Reporter: Marius Cornea <mcornea>
Component: openstack-tripleo-heat-templates
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED ERRATA
QA Contact: Filip Hubík <fhubik>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 14.0 (Rocky)
CC: agurenko, apevec, aschultz, bdobreli, bfournie, dbecker, dpeacock, emacchi, fhubik, lhh, mburns, morazi, racedoro, rhos-maint, srevivo, therve, tvignaud
Target Milestone: Upstream M3
Keywords: Triaged
Target Release: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-9.0.0-0.20180726103746.5fefd0b.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-11 11:50:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Marius Cornea 2018-07-25 17:54:14 UTC
Description of problem:
Overcloud deployment may fail with OSError: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/1a47efe7-07f0-41df-932f-ea48ec23a7ac'

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-9.0.0-0.20180717094148.d8b7b19.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP14 overcloud

Actual results:
/var/log/containers/ironic/ironic-conductor.log shows errors like OSError: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/1a47efe7-07f0-41df-932f-ea48ec23a7ac' and deployment fails.

Expected results:
Deployment passes.

Additional info:

Comment 1 David Peacock 2018-07-25 18:10:51 UTC
Fixed upstream already - needs downstream.

https://bugs.launchpad.net/tripleo/+bug/1782267

Comment 4 Marius Cornea 2018-07-26 02:37:36 UTC
The patch landed in the latest puddle, but the issue is still present. I suspect it's related to the ownership of /var/lib/ironic/httpboot:

[root@undercloud-0 stack]# ls -lah /var/lib/ironic/
total 4.0K
drwxr-xr-x.  4 42422 42422   38 Jul 25 20:39 .
drwxr-xr-x. 61 root  root  4.0K Jul 25 20:53 ..
drwxr-xr-x.  2 root  root    86 Jul 25 21:04 httpboot
drwxr-xr-x.  4 42422 42422  135 Jul 25 21:57 tftpboot

[root@undercloud-0 stack]# ls -lah /var/lib/ironic/httpboot/
total 412M
drwxr-xr-x. 2 root  root    86 Jul 25 21:04 .
drwxr-xr-x. 4 42422 42422   38 Jul 25 20:39 ..
-rwxr-xr-x. 1 42422 42422 6.1M Jul 25 21:00 agent.kernel
-rw-r--r--. 1 42422 42422 406M Jul 25 21:00 agent.ramdisk
-rw-r--r--. 1 42422 42422  758 Jul 25 20:54 boot.ipxe
-rw-r--r--. 1 42461 42461  470 Jul 25 20:47 inspector.ipxe
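
For anyone reproducing this, the ownership check done by hand above can be scripted. The following is a minimal Python sketch and not part of the fix: the function name find_foreign_owned is made up for illustration, and 42422 is simply the uid/gid that appears in the listings in this report.

```python
import os


def find_foreign_owned(root, uid, gid):
    """Return sorted paths under root (root included) not owned by uid:gid.

    Hypothetical helper for illustration; symlinks are inspected
    themselves (lstat) rather than followed.
    """
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            candidates.append(os.path.join(dirpath, name))

    mismatched = []
    for path in candidates:
        st = os.lstat(path)
        if (st.st_uid, st.st_gid) != (uid, gid):
            mismatched.append(path)
    return sorted(mismatched)


# Example call (42422 is taken from the listings in this report):
#   find_foreign_owned("/var/lib/ironic/httpboot", 42422, 42422)
```

Run against the state shown in the listing above, such a call would be expected to report the httpboot directory itself, which is owned by root, and also inspector.ipxe, which shows 42461.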

Comment 5 Filip Hubík 2018-07-26 11:11:03 UTC
I have a similar issue with the newer openstack-tripleo-heat-templates-9.0.0-0.20180717094150.el7ost.noarch (2018-07-25 18:08:27), which raises the error:

DriverLoadError: Driver, hardware type or interface ilo-pxe could not be loaded. Reason: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/boot.ipxe'.

The error appears in /var/log/containers/ironic/ironic-conductor.log, with the patch https://code.engineering.redhat.com/gerrit/#/c/145094/ present.

Since I can see neither the error mentioned in this BZ in /var/log/containers/ironic/ironic-conductor.log nor an existing BZ for this specific issue, I'll create a separate BZ to track it (https://bugzilla.redhat.com/show_bug.cgi?id=1608829), but I suspect the root cause will be similar for both:

$ ls -lahZ /var/lib/ironic/httpboot/
drwxr-xr-x. 42422 42422 unconfined_u:object_r:var_lib_t:s0 .
drwxr-xr-x. 42422 42422 unconfined_u:object_r:var_lib_t:s0 ..
drwxr-xr-x. 42422 42422 system_u:object_r:var_lib_t:s0   2e005e19-b2ef-4ee5-b9f2-99d62856b328
drwxr-xr-x. 42422 42422 system_u:object_r:var_lib_t:s0   78e7549a-8020-49b6-9679-0fd1e630bce2
-rwxr-xr-x. root  root  unconfined_u:object_r:var_lib_t:s0 agent.kernel
-rw-r--r--. root  root  unconfined_u:object_r:var_lib_t:s0 agent.ramdisk
drwxr-xr-x. 42422 42422 system_u:object_r:var_lib_t:s0   b5db2f3c-42b0-4d4d-9261-c332ce30f329
-rw-r--r--. 42422 42422 system_u:object_r:var_lib_t:s0   boot.ipxe
-rw-r--r--. 42422 42422 system_u:object_r:var_lib_t:s0   inspector.ipxe
drwxr-xr-x. 42422 42422 system_u:object_r:var_lib_t:s0   pxelinux.cfg

Comment 6 Bogdan Dobrelya 2018-07-26 12:07:36 UTC
*** Bug 1608829 has been marked as a duplicate of this bug. ***

Comment 7 Bogdan Dobrelya 2018-07-26 12:10:23 UTC
The upstream fix (cherry-picked as https://code.engineering.redhat.com/gerrit/#/c/145094/) requires an adjustment: /var/lib/ironic/httpboot/ itself should be owned by 42422:42422, like the files in it, rather than by root as in the example command output:

 [zuul@undercloud ~]$ ls -lah /var/lib/ironic/httpboot/
total 377M
drwxr-xr-x. 2 root  root    86 Jul 26 11:52 .
drwxr-xr-x. 4 42422 42422   59 Jun 26 12:56 ..
-rwxr-xr-x. 1 42422 42422 6.0M Jun 22 15:27 agent.kernel
-rw-r--r--. 1 42422 42422 371M Jun 22 15:27 agent.ramdisk
-rw-r--r--. 1 42422 42422  758 Jun 22 14:28 boot.ipxe
-rw-r--r--. 1 42461 42461  470 Jun 22 14:11 inspector.ipxe

I hope that will resolve the issue.

Comment 8 Bogdan Dobrelya 2018-07-26 12:29:54 UTC
Actually, I can see that some of the ironic_* containers are missing the kolla_config to chown the /var/lib/ironic* paths, which causes data ownership races across the ironic containers.

Comment 9 Filip Hubík 2018-07-30 10:21:58 UTC
I cannot confirm that openstack-tripleo-heat-templates-9.0.0-0.20180720154239.959e1d7.el7ost is the package version in which this issue is supposed to be fixed. I am still hitting the issue mentioned earlier (reported in https://bugzilla.redhat.com/show_bug.cgi?id=1608829, which was marked as a duplicate of this bug):

uc $ rpm -qa | grep "openstack-tripleo-heat-templates"
openstack-tripleo-heat-templates-9.0.0-0.20180720154239.959e1d7.el7ost.noarch

uc $ cat /var/log/containers/ironic/ironic-conductor.log | grep "denied"
2018-07-30 05:32:48.223 1 ERROR oslo_service.service [req-279245e2-842d-407c-b6d5-99866889244b - - - - -] Error starting thread.: DriverLoadError: Driver, hardware type or interface ilo-pxe could not be loaded. Reason: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/boot.ipxe'.
2018-07-30 05:32:48.223 1 ERROR oslo_service.service DriverLoadError: Driver, hardware type or interface ilo-pxe could not be loaded. Reason: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/boot.ipxe'

The OC nodes are stuck in the BUILD state. iPXE fails with these errors:

No more network devices

No bootable device.
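
To see which paths are affected without reading the whole log, the "Permission denied" messages can be reduced to a list of unique paths. This is a minimal Python sketch, assuming the message format matches the lines quoted in this report; the names DENIED_RE and denied_paths are hypothetical.

```python
import re

# Matches the message format quoted in this report, for example:
#   [Errno 13] Permission denied: '/var/lib/ironic/httpboot/boot.ipxe'
DENIED_RE = re.compile(r"\[Errno 13\] Permission denied: '([^']+)'")


def denied_paths(lines):
    """Return the unique denied paths found in lines, in first-seen order."""
    seen = []
    for line in lines:
        for path in DENIED_RE.findall(line):
            if path not in seen:
                seen.append(path)
    return seen


# Example use against the conductor log mentioned in this report:
#   with open("/var/log/containers/ironic/ironic-conductor.log") as log:
#       for path in denied_paths(log):
#           print(path)
```

Lines in which the quoted path is not closed with a matching quote are skipped by this pattern.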

Comment 10 Filip Hubík 2018-07-30 11:28:13 UTC
On second look, there might be multiple issues combined here. The OC nodes being stuck in the "BUILD" state might also be related to https://bugzilla.redhat.com/show_bug.cgi?id=1608508, but that doesn't change the fact that the error "ilo-pxe could not be loaded" can still be seen in the ironic-conductor log.

Comment 11 Bogdan Dobrelya 2018-07-30 13:05:10 UTC
The updated package, which includes https://code.engineering.redhat.com/gerrit/#/c/145208/, at least fixes the wrong root owner of /var/lib/ironic/httpboot, which I thought was the root cause of the problems with inter-container communication over that host path. I'm afraid I have no more ideas for a proper fix from the DF side yet. Would you mind handing it back to the ironic folks?

Comment 13 Filip Hubík 2018-07-30 14:04:40 UTC
With SELinux disabled on the UC I don't hit the "ilo-pxe could not be loaded" issue, but the OC deployment step fails again with:

OSError: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/4f872d4a-6691-4e21-8144-48da5ea452a9'

Comment 14 Marius Cornea 2018-07-30 21:21:21 UTC
One more observation: it looks like restarting the ironic_inspector container resets /var/lib/ironic/httpboot to be owned by root:


[root@undercloud-0 stack]# ls -lah /var/lib/ironic
total 4.0K
drwxr-xr-x.  4 42422 42422   38 Jul 30 14:18 .
drwxr-xr-x. 64 root  root  4.0K Jul 30 14:44 ..
drwxr-xr-x.  3 42422 42422  106 Jul 30 17:13 httpboot
drwxr-xr-x.  4 42422 42422  135 Jul 30 17:12 tftpboot
[root@undercloud-0 stack]# docker restart ironic_inspector
ironic_inspector
[root@undercloud-0 stack]# ls -lah /var/lib/ironic
total 4.0K
drwxr-xr-x.  4 42422 42422   38 Jul 30 14:18 .
drwxr-xr-x. 64 root  root  4.0K Jul 30 14:44 ..
drwxr-xr-x.  3 root  root   106 Jul 30 17:15 httpboot
drwxr-xr-x.  4 42422 42422  135 Jul 30 17:12 tftpboot
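
The before/after comparison above can also be captured programmatically, which may help when checking which container restart changes ownership. This is a minimal Python sketch under stated assumptions: ownership_snapshot and ownership_changes are hypothetical helper names, and the restart itself is left to the operator (for example the docker restart shown above).

```python
import os


def ownership_snapshot(root):
    """Map each path under root (root included) to its (uid, gid)."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))

    snapshot = {}
    for path in paths:
        st = os.lstat(path)  # inspect symlinks themselves, do not follow
        snapshot[path] = (st.st_uid, st.st_gid)
    return snapshot


def ownership_changes(before, after):
    """Return {path: (old_owner, new_owner)} for paths whose owner changed.

    Only paths present in both snapshots are compared.
    """
    return {
        path: (before[path], after[path])
        for path in before
        if path in after and before[path] != after[path]
    }


# Intended use, with the restart done by hand in between:
#   before = ownership_snapshot("/var/lib/ironic")
#   ... restart the container under suspicion ...
#   after = ownership_snapshot("/var/lib/ironic")
#   print(ownership_changes(before, after))
```

For the two listings above, the comparison would be expected to show /var/lib/ironic/httpboot changing from (42422, 42422) to (0, 0), assuming root maps to uid and gid 0.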

Comment 17 Bob Fournier 2018-07-31 00:59:16 UTC
Good catch Marius!!

Comment 26 errata-xmlrpc 2019-01-11 11:50:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

Comment 27 Red Hat Bugzilla 2023-09-14 04:32:08 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days