Bug 1608529 - Overcloud deployment may fail with OSError: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/1a47efe7-07f0-41df-932f-ea48ec23a7ac'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: Upstream M3
Target Release: 14.0 (Rocky)
Assignee: RHOS Maint
QA Contact: Filip Hubík
URL:
Whiteboard:
Duplicates: 1608829
Depends On:
Blocks:
Reported: 2018-07-25 17:54 UTC by Marius Cornea
Modified: 2023-09-14 04:32 UTC
CC List: 17 users

Fixed In Version: openstack-tripleo-heat-templates-9.0.0-0.20180726103746.5fefd0b.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-11 11:50:58 UTC
Target Upstream Version:
Embargoed:



Links
System                   ID              Last Updated
Launchpad                1783762         2018-07-26 12:32:39 UTC
Red Hat Product Errata   RHEA-2019:0045  2019-01-11 11:51:13 UTC

Description Marius Cornea 2018-07-25 17:54:14 UTC
Description of problem:
Overcloud deployment may fail with OSError: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/1a47efe7-07f0-41df-932f-ea48ec23a7ac'

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-9.0.0-0.20180717094148.d8b7b19.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP14 overcloud

Actual results:
/var/log/containers/ironic/ironic-conductor.log shows errors like OSError: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/1a47efe7-07f0-41df-932f-ea48ec23a7ac' and deployment fails.

Expected results:
Deployment passes.

Additional info:

Comment 1 David Peacock 2018-07-25 18:10:51 UTC
Fixed upstream already - needs downstream.

https://bugs.launchpad.net/tripleo/+bug/1782267

Comment 4 Marius Cornea 2018-07-26 02:37:36 UTC
The patch landed in the latest puddle, but the issue is still present. I suspect it's related to the ownership of /var/lib/ironic/httpboot:

[root@undercloud-0 stack]# ls -lah /var/lib/ironic/
total 4.0K
drwxr-xr-x.  4 42422 42422   38 Jul 25 20:39 .
drwxr-xr-x. 61 root  root  4.0K Jul 25 20:53 ..
drwxr-xr-x.  2 root  root    86 Jul 25 21:04 httpboot
drwxr-xr-x.  4 42422 42422  135 Jul 25 21:57 tftpboot

[root@undercloud-0 stack]# ls -lah /var/lib/ironic/httpboot/
total 412M
drwxr-xr-x. 2 root  root    86 Jul 25 21:04 .
drwxr-xr-x. 4 42422 42422   38 Jul 25 20:39 ..
-rwxr-xr-x. 1 42422 42422 6.1M Jul 25 21:00 agent.kernel
-rw-r--r--. 1 42422 42422 406M Jul 25 21:00 agent.ramdisk
-rw-r--r--. 1 42422 42422  758 Jul 25 20:54 boot.ipxe
-rw-r--r--. 1 42461 42461  470 Jul 25 20:47 inspector.ipxe
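The listing above shows mixed ownership: the httpboot directory itself is root:root, while the files in it belong to 42422:42422 (inspector.ipxe belongs to 42461:42461). A minimal sketch for spotting such mismatches, assuming GNU findutils (for `-printf`) and taking 42422:42422 as the expected ironic UID:GID from this listing; the helper name `list_foreign_owned` is hypothetical and not part of TripleO or ironic.

```shell
# Hypothetical helper (not part of TripleO/ironic).
# Print every entry of DIR, including DIR itself, whose numeric owner or
# group differs from EXPECTED_UID:EXPECTED_GID. Assumes GNU find (-printf).
# Usage: list_foreign_owned DIR UID:GID
list_foreign_owned() {
  dir=$1
  uid=${2%%:*}   # text before the colon
  gid=${2##*:}   # text after the colon
  # -maxdepth 1 without -mindepth keeps DIR itself (depth 0) in the output.
  find "$dir" -maxdepth 1 \( ! -uid "$uid" -o ! -gid "$gid" \) \
    -printf '%U:%G %p\n'
}
```

For example, `list_foreign_owned /var/lib/ironic/httpboot 42422:42422` would print the directory itself and inspector.ipxe for the listing above. inspector.ipxe is presumably written by the ironic-inspector service under its own UID, so that entry may be expected.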

Comment 5 Filip Hubík 2018-07-26 11:11:03 UTC
I have a similar issue with the newer openstack-tripleo-heat-templates-9.0.0-0.20180717094150.el7ost.noarch (2018-07-25 18:08:27), which raises the error:

DriverLoadError: Driver, hardware type or interface ilo-pxe could not be loaded. Reason: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/boot.ipxe'.

It appears in /var/log/containers/ironic/ironic-conductor.log with patch https://code.engineering.redhat.com/gerrit/#/c/145094/ present.

Since I can't see the error mentioned in this BZ in /var/log/containers/ironic/ironic-conductor.log, nor an existing BZ for this specific issue, I'll create a separate BZ to track it (https://bugzilla.redhat.com/show_bug.cgi?id=1608829), but I suspect the root cause is similar for both:

$ ls -lahZ /var/lib/ironic/httpboot/
drwxr-xr-x. 42422 42422 unconfined_u:object_r:var_lib_t:s0 .
drwxr-xr-x. 42422 42422 unconfined_u:object_r:var_lib_t:s0 ..
drwxr-xr-x. 42422 42422 system_u:object_r:var_lib_t:s0   2e005e19-b2ef-4ee5-b9f2-99d62856b328
drwxr-xr-x. 42422 42422 system_u:object_r:var_lib_t:s0   78e7549a-8020-49b6-9679-0fd1e630bce2
-rwxr-xr-x. root  root  unconfined_u:object_r:var_lib_t:s0 agent.kernel
-rw-r--r--. root  root  unconfined_u:object_r:var_lib_t:s0 agent.ramdisk
drwxr-xr-x. 42422 42422 system_u:object_r:var_lib_t:s0   b5db2f3c-42b0-4d4d-9261-c332ce30f329
-rw-r--r--. 42422 42422 system_u:object_r:var_lib_t:s0   boot.ipxe
-rw-r--r--. 42422 42422 system_u:object_r:var_lib_t:s0   inspector.ipxe
drwxr-xr-x. 42422 42422 system_u:object_r:var_lib_t:s0   pxelinux.cfg

Comment 6 Bogdan Dobrelya 2018-07-26 12:07:36 UTC
*** Bug 1608829 has been marked as a duplicate of this bug. ***

Comment 7 Bogdan Dobrelya 2018-07-26 12:10:23 UTC
The upstream fix (cherry-picked as https://code.engineering.redhat.com/gerrit/#/c/145094/) requires an adjustment so that /var/lib/ironic/httpboot/ itself is owned by 42422:42422, the same owner as the files inside it in the example command output:

 [zuul@undercloud ~]$ ls -lah /var/lib/ironic/httpboot/
total 377M
drwxr-xr-x. 2 root  root    86 Jul 26 11:52 .
drwxr-xr-x. 4 42422 42422   59 Jun 26 12:56 ..
-rwxr-xr-x. 1 42422 42422 6.0M Jun 22 15:27 agent.kernel
-rw-r--r--. 1 42422 42422 371M Jun 22 15:27 agent.ramdisk
-rw-r--r--. 1 42422 42422  758 Jun 22 14:28 boot.ipxe
-rw-r--r--. 1 42461 42461  470 Jun 22 14:11 inspector.ipxe

I hope that resolves the issue.
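The point of the adjustment is that the httpboot directory itself, not just its contents, must be owned by 42422:42422. A minimal sketch of a check, assuming GNU coreutils `stat` (`-c '%u:%g'` prints the numeric owner and group); `check_owner` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if PATH is owned by EXPECTED (numeric UID:GID).
# Returns 0 on match, 1 on mismatch, 2 if PATH cannot be stat'ed.
# Assumes GNU coreutils stat.
# Usage: check_owner PATH UID:GID
check_owner() {
  path=$1
  expected=$2
  actual=$(stat -c '%u:%g' "$path") || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK: $path is owned by $actual"
    return 0
  fi
  echo "MISMATCH: $path is owned by $actual, expected $expected" >&2
  return 1
}
```

On the undercloud one might run `check_owner /var/lib/ironic/httpboot 42422:42422 || chown 42422:42422 /var/lib/ironic/httpboot` as root. This is only a manual workaround, not the packaged fix, and later comments in this bug suggest that a container restart can reset the ownership again.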

Comment 8 Bogdan Dobrelya 2018-07-26 12:29:54 UTC
Actually, I can see that some of the ironic_* containers are missing the kolla_config entries that chown the /var/lib/ironic* paths, which causes file-ownership races across the ironic containers.

Comment 9 Filip Hubík 2018-07-30 10:21:58 UTC
I cannot confirm openstack-tripleo-heat-templates-9.0.0-0.20180720154239.959e1d7.el7ost as the package version in which this issue is supposed to be fixed; I am still hitting the issue reported in https://bugzilla.redhat.com/show_bug.cgi?id=1608829 (marked as a duplicate of this bug):

uc $ rpm -qa | grep "openstack-tripleo-heat-templates"
openstack-tripleo-heat-templates-9.0.0-0.20180720154239.959e1d7.el7ost.noarch

uc $ cat /var/log/containers/ironic/ironic-conductor.log | grep "denied"
2018-07-30 05:32:48.223 1 ERROR oslo_service.service [req-279245e2-842d-407c-b6d5-99866889244b - - - - -] Error starting thread.: DriverLoadError: Driver, hardware type or interface ilo-pxe could not be loaded. Reason: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/boot.ipxe'.
2018-07-30 05:32:48.223 1 ERROR oslo_service.service DriverLoadError: Driver, hardware type or interface ilo-pxe could not be loaded. Reason: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/boot.ipxe'
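When several permission errors are mixed in ironic-conductor.log (boot.ipxe here, per-node UUID directories elsewhere in this bug), it can help to count the distinct denied paths. A minimal sketch using only grep, sed, sort and uniq; `denied_paths` is a hypothetical helper, and it assumes the messages contain the literal text `Permission denied: '<path>'` as in the lines above.

```shell
# Hypothetical helper: count how often each path appears in
# "Permission denied: '<path>'" messages of a log file, most frequent first.
# Usage: denied_paths LOGFILE
denied_paths() {
  grep -o "Permission denied: '[^']*'" "$1" \
    | sed "s/^Permission denied: '//; s/'\$//" \
    | sort | uniq -c | sort -rn
}
```

For example: `denied_paths /var/log/containers/ironic/ironic-conductor.log`.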

OC nodes are stuck in BUILD state. iPXE fails with the errors:

No more network devices

No bootable device.

Comment 10 Filip Hubík 2018-07-30 11:28:13 UTC
On second look, there might be multiple issues combined here. The OC nodes being stuck in the "BUILD" state might also be related to https://bugzilla.redhat.com/show_bug.cgi?id=1608508, but that doesn't change the fact that the "ilo-pxe could not be loaded" error can still be seen in the ironic-conductor log.

Comment 11 Bogdan Dobrelya 2018-07-30 13:05:10 UTC
The updated package with https://code.engineering.redhat.com/gerrit/#/c/145208/ included at least fixes the wrong root ownership of /var/lib/ironic/httpboot, which I thought was the root cause of the broken inter-container communication over that host path. I'm afraid I have no more ideas for a proper fix from the DF side yet. Would the ironic folks mind taking it back over?

Comment 13 Filip Hubík 2018-07-30 14:04:40 UTC
With SELinux disabled on the UC, I don't hit the "ilo-pxe could not be loaded" issue, but the OC deployment step fails again with:

OSError: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/4f872d4a-6691-4e21-8144-48da5ea452a9'

Comment 14 Marius Cornea 2018-07-30 21:21:21 UTC
Adding one observation: it looks like restarting the ironic_inspector container resets the ownership of /var/lib/ironic/httpboot to root:


[root@undercloud-0 stack]# ls -lah /var/lib/ironic
total 4.0K
drwxr-xr-x.  4 42422 42422   38 Jul 30 14:18 .
drwxr-xr-x. 64 root  root  4.0K Jul 30 14:44 ..
drwxr-xr-x.  3 42422 42422  106 Jul 30 17:13 httpboot
drwxr-xr-x.  4 42422 42422  135 Jul 30 17:12 tftpboot
[root@undercloud-0 stack]# docker restart ironic_inspector
ironic_inspector
[root@undercloud-0 stack]# ls -lah /var/lib/ironic
total 4.0K
drwxr-xr-x.  4 42422 42422   38 Jul 30 14:18 .
drwxr-xr-x. 64 root  root  4.0K Jul 30 14:44 ..
drwxr-xr-x.  3 root  root   106 Jul 30 17:15 httpboot
drwxr-xr-x.  4 42422 42422  135 Jul 30 17:12 tftpboot
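The observation above can be turned into a repeatable check: record the owner of a path, run a command, and compare. A minimal sketch, assuming GNU coreutils `stat`; `owner_changed_by` is a hypothetical helper name.

```shell
# Hypothetical helper: run a command and report whether it changed the
# numeric owner (UID:GID) of PATH. Prints CHANGED or UNCHANGED and returns 0;
# returns 2 if PATH cannot be stat'ed or the command itself fails.
# Assumes GNU coreutils stat.
# Usage: owner_changed_by PATH COMMAND [ARGS...]
owner_changed_by() {
  path=$1
  shift
  before=$(stat -c '%u:%g' "$path") || return 2
  "$@" || return 2
  after=$(stat -c '%u:%g' "$path") || return 2
  if [ "$before" != "$after" ]; then
    echo "CHANGED: $path $before -> $after"
  else
    echo "UNCHANGED: $path $before"
  fi
}
```

For example, `owner_changed_by /var/lib/ironic/httpboot docker restart ironic_inspector`, run as root on an affected undercloud, would be expected to report a change from 42422:42422 to 0:0, matching the listings above.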

Comment 17 Bob Fournier 2018-07-31 00:59:16 UTC
Good catch Marius!!

Comment 26 errata-xmlrpc 2019-01-11 11:50:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

Comment 27 Red Hat Bugzilla 2023-09-14 04:32:08 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days