Bug 1608529
Summary: | Overcloud deployment may fail with OSError: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/1a47efe7-07f0-41df-932f-ea48ec23a7ac' | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea> |
Component: | openstack-tripleo-heat-templates | Assignee: | RHOS Maint <rhos-maint> |
Status: | CLOSED ERRATA | QA Contact: | Filip Hubík <fhubik> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 14.0 (Rocky) | CC: | agurenko, apevec, aschultz, bdobreli, bfournie, dbecker, dpeacock, emacchi, fhubik, lhh, mburns, morazi, racedoro, rhos-maint, srevivo, therve, tvignaud |
Target Milestone: | Upstream M3 | Keywords: | Triaged |
Target Release: | 14.0 (Rocky) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-tripleo-heat-templates-9.0.0-0.20180726103746.5fefd0b.el7ost | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2019-01-11 11:50:58 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Marius Cornea
2018-07-25 17:54:14 UTC
Fixed upstream already - needs downstream.
https://bugs.launchpad.net/tripleo/+bug/1782267

The patch landed in the latest puddle but the issue is still present. I suspect it's related to the /var/lib/ironic/httpboot ownership:

    [root@undercloud-0 stack]# ls -lah /var/lib/ironic/
    total 4.0K
    drwxr-xr-x. 4 42422 42422 38 Jul 25 20:39 .
    drwxr-xr-x. 61 root root 4.0K Jul 25 20:53 ..
    drwxr-xr-x. 2 root root 86 Jul 25 21:04 httpboot
    drwxr-xr-x. 4 42422 42422 135 Jul 25 21:57 tftpboot
    [root@undercloud-0 stack]# ls -lah /var/lib/ironic/httpboot/
    total 412M
    drwxr-xr-x. 2 root root 86 Jul 25 21:04 .
    drwxr-xr-x. 4 42422 42422 38 Jul 25 20:39 ..
    -rwxr-xr-x. 1 42422 42422 6.1M Jul 25 21:00 agent.kernel
    -rw-r--r--. 1 42422 42422 406M Jul 25 21:00 agent.ramdisk
    -rw-r--r--. 1 42422 42422 758 Jul 25 20:54 boot.ipxe
    -rw-r--r--. 1 42461 42461 470 Jul 25 20:47 inspector.ipxe

I have a similar issue with the newer openstack-tripleo-heat-templates-9.0.0-0.20180717094150.el7ost.noarch (2018-07-25 18:08:27), which raises the following error in /var/log/containers/ironic/ironic-conductor.log with patch https://code.engineering.redhat.com/gerrit/#/c/145094/ present:

    DriverLoadError: Driver, hardware type or interface ilo-pxe could not be loaded. Reason: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/boot.ipxe'.

Since I cannot see the error mentioned in this BZ in /var/log/containers/ironic/ironic-conductor.log, nor an existing BZ with this specific issue, I'll create a separate BZ to track it - https://bugzilla.redhat.com/show_bug.cgi?id=1608829 - but I suspect the root cause will be similar for both:

    $ ls -lahZ /var/lib/ironic/httpboot/
    drwxr-xr-x. 42422 42422 unconfined_u:object_r:var_lib_t:s0 .
    drwxr-xr-x. 42422 42422 unconfined_u:object_r:var_lib_t:s0 ..
    drwxr-xr-x. 42422 42422 system_u:object_r:var_lib_t:s0 2e005e19-b2ef-4ee5-b9f2-99d62856b328
    drwxr-xr-x. 42422 42422 system_u:object_r:var_lib_t:s0 78e7549a-8020-49b6-9679-0fd1e630bce2
    -rwxr-xr-x. root root unconfined_u:object_r:var_lib_t:s0 agent.kernel
    -rw-r--r--. root root unconfined_u:object_r:var_lib_t:s0 agent.ramdisk
    drwxr-xr-x. 42422 42422 system_u:object_r:var_lib_t:s0 b5db2f3c-42b0-4d4d-9261-c332ce30f329
    -rw-r--r--. 42422 42422 system_u:object_r:var_lib_t:s0 boot.ipxe
    -rw-r--r--. 42422 42422 system_u:object_r:var_lib_t:s0 inspector.ipxe
    drwxr-xr-x. 42422 42422 system_u:object_r:var_lib_t:s0 pxelinux.cfg

*** Bug 1608829 has been marked as a duplicate of this bug. ***

The upstream fix (cherry-picked as https://code.engineering.redhat.com/gerrit/#/c/145094/) requires an adjustment: /var/lib/ironic/httpboot/ itself should be owned by the same 42422:42422 as the files in this example command output, where it is still owned by root:

    [zuul@undercloud ~]$ ls -lah /var/lib/ironic/httpboot/
    total 377M
    drwxr-xr-x. 2 root root 86 Jul 26 11:52 .
    drwxr-xr-x. 4 42422 42422 59 Jun 26 12:56 ..
    -rwxr-xr-x. 1 42422 42422 6.0M Jun 22 15:27 agent.kernel
    -rw-r--r--. 1 42422 42422 371M Jun 22 15:27 agent.ramdisk
    -rw-r--r--. 1 42422 42422 758 Jun 22 14:28 boot.ipxe
    -rw-r--r--. 1 42461 42461 470 Jun 22 14:11 inspector.ipxe

I hope that would resolve the issue.

Actually, I can see that some of the ironic_* containers are missing the kolla_config to chown the /var/lib/ironic* paths, and that causes data ownership races across the ironic containers.

I cannot confirm openstack-tripleo-heat-templates-9.0.0-0.20180720154239.959e1d7.el7ost as the package version where this issue is supposed to be fixed. I am still hitting the mentioned issue (marked as a clone of https://bugzilla.redhat.com/show_bug.cgi?id=1608829):

    uc $ rpm -qa | grep "openstack-tripleo-heat-templates".
    openstack-tripleo-heat-templates-9.0.0-0.20180720154239.959e1d7.el7ost.noarch
    uc $ cat /var/log/containers/ironic/ironic-conductor.log | grep "denied"
    2018-07-30 05:32:48.223 1 ERROR oslo_service.service [req-279245e2-842d-407c-b6d5-99866889244b - - - - -] Error starting thread.: DriverLoadError: Driver, hardware type or interface ilo-pxe could not be loaded. Reason: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/boot.ipxe'.
    2018-07-30 05:32:48.223 1 ERROR oslo_service.service DriverLoadError: Driver, hardware type or interface ilo-pxe could not be loaded. Reason: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/boot.ipxe'

OC nodes are stuck in the BUILD state. iPXE fails with the errors "No more network devices" and "No bootable device".

On second look, it seems that there might be multiple issues combined here. OC nodes being stuck in the "BUILD" state might also be related to https://bugzilla.redhat.com/show_bug.cgi?id=1608508 , but that doesn't change the fact that the error "ilo-pxe could not be loaded" can still be seen in ironic-conductor's log.

The updated package with https://code.engineering.redhat.com/gerrit/#/c/145208/ included at least fixes the wrong root owner for /var/lib/ironic/httpboot, which I thought was the root cause of the inter-container communication problems over that host path. I'm afraid I have no more ideas for the proper fix from the DF side yet. Mind taking it back over to the ironic folks?

With selinux disabled on the UC I don't hit the "ilo-pxe could not be loaded" issue, but the OC deployment step fails again on:

    OSError: [Errno 13] Permission denied: '/var/lib/ironic/httpboot/4f872d4a-6691-4e21-8144-48da5ea452a9

Adding one observation: it looks like restarting the ironic_inspector container resets the ownership of /var/lib/ironic/httpboot to root:

    [root@undercloud-0 stack]# ls -lah /var/lib/ironic
    total 4.0K
    drwxr-xr-x. 4 42422 42422 38 Jul 30 14:18 .
    drwxr-xr-x. 64 root root 4.0K Jul 30 14:44 ..
    drwxr-xr-x. 3 42422 42422 106 Jul 30 17:13 httpboot
    drwxr-xr-x. 4 42422 42422 135 Jul 30 17:12 tftpboot
    [root@undercloud-0 stack]# docker restart ironic_inspector
    ironic_inspector
    [root@undercloud-0 stack]# ls -lah /var/lib/ironic
    total 4.0K
    drwxr-xr-x. 4 42422 42422 38 Jul 30 14:18 .
    drwxr-xr-x. 64 root root 4.0K Jul 30 14:44 ..
    drwxr-xr-x. 3 root root 106 Jul 30 17:15 httpboot
    drwxr-xr-x. 4 42422 42422 135 Jul 30 17:12 tftpboot

Good catch Marius!!
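For anyone re-checking this after a container restart, the manual `ls -lah` comparisons in the comments above could be scripted roughly as below. This is only a sketch and not part of any fix: the function name `find_ownership_mismatches` is made up for illustration, and treating 42422:42422 as the expected owner is an assumption taken from the listings in this report.

```python
import os

# 42422 is the UID/GID shown on most ironic files in the listings above;
# using it as the "expected" owner is an assumption of this sketch.
EXPECTED_UID = 42422
EXPECTED_GID = 42422


def find_ownership_mismatches(root, uid=EXPECTED_UID, gid=EXPECTED_GID):
    """Return the sorted paths under root (root included) not owned by uid:gid."""
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            candidates.append(os.path.join(dirpath, name))

    mismatches = []
    for path in candidates:
        st = os.lstat(path)  # lstat inspects a symlink itself, not its target
        if (st.st_uid, st.st_gid) != (uid, gid):
            mismatches.append(path)
    return sorted(mismatches)


# Example call on the undercloud (hypothetical usage, run as root):
#   for path in find_ownership_mismatches("/var/lib/ironic/httpboot"):
#       print("unexpected owner:", path)
```

Run before and after `docker restart ironic_inspector`, this should show whether the directory flips back to root. Note that inspector.ipxe is owned by 42461 in the listings above, so it would be reported as well; this report does not establish whether that is a problem.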
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2019:0045

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.