Description of problem:
In TripleO CI we use diskimage-builder to create overcloud images on RHEL 8.1 with SELinux in enforcing mode.
Here is the build script used to create the image:
http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-rhel-8-buildimage-overcloud-full-master/b7e6593/build_images.sh
Here is the full log:
http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-rhel-8-buildimage-overcloud-full-master/b7e6593/build.log
During the build process, network-scripts-10.00.4-1.el8.x86_64 is used. At this step the build fails with:
2019-12-12 00:31:12.054 | + set -o pipefail
2019-12-12 00:31:12.054 | + chkconfig network on
2019-12-12 00:31:12.055 | failed to glob pattern /etc/rc0.d/[SK][0-9][0-9]network: No such file or directory
We have no idea what is wrong there.
This issue is tracked externally here: https://bugs.launchpad.net/tripleo/+bug/1853028/
Can you run it under strace and post the output here?
Or is it possible to upload somewhere the content of the image right before this step?
(In reply to Lukáš Nykrýn from comment #3)
> Or is it possible to upload somewhere the content of the image right before
> this step?
Not really. The image is built in a chroot by DIB, and when the build fails everything is cleaned up. The error is not always reproducible: it fails in most cases, but sometimes the build passes.
I ran a simple debug script right before chkconfig and captured one passing run and one failing run: https://review.opendev.org/#/c/699178/
echo "Start debug"
ls -alsh /etc/rc0.d/ || true
ls -alsh /etc/rc0.d || true
mkdir -p /etc/rc0.d || true
Here is the output of the passing case and the failing case:
PASS:
dib-run-parts Running /tmp/in_target.d/post-install.d/51-enable-network-service
+ set -o pipefail
+ ls -alsh /etc/rc0.d/
total 0
0 drwxr-xr-x. 2 root root 24 Dec 16 09:52 .
0 drwxr-xr-x. 10 root root 127 Aug 30 04:52 ..
0 lrwxrwxrwx. 1 root root 17 Dec 16 09:52 K90network -> ../init.d/network
+ ls -alsh /etc/rc0.d
0 lrwxrwxrwx. 1 root root 10 Aug 23 06:17 /etc/rc0.d -> rc.d/rc0.d
+ mkdir -p /etc/rc0.d
+ chkconfig network on
dib-run-parts 51-enable-network-service completed
FAIL:
dib-run-parts Running /tmp/in_target.d/post-install.d/51-enable-network-service
+ set -o pipefail
+ echo 'Start debug'
Start debug
+ ls -alsh /etc/rc0.d/
ls: cannot access '/etc/rc0.d/': No such file or directory
+ true
+ ls -alsh /etc/rc0.d
0 lrwxrwxrwx. 1 root root 10 Aug 23 06:17 /etc/rc0.d -> rc.d/rc0.d
+ mkdir -p /etc/rc0.d
mkdir: cannot create directory '/etc/rc0.d': File exists
+ true
+ chkconfig network on
failed to glob pattern /etc/rc0.d/[SK][0-9][0-9]network: No such file or directory
I attach the strace of the failing case.
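The FAIL output above is consistent with /etc/rc0.d being a dangling symlink: the link itself is listed, but following it fails, and mkdir -p reports "File exists". A minimal sketch of a check that tells these cases apart, assuming a POSIX shell; the helper name check_link is hypothetical and is not part of DIB or chkconfig:

```shell
#!/bin/sh
# Hypothetical debug helper: classify a path as "dangling", "ok" or "missing".
# [ -L ] tests the link itself; [ -e ] follows the link to its target.
check_link() {
    path="$1"
    if [ -L "$path" ] && [ ! -e "$path" ]; then
        # The symlink exists but its target does not.
        echo "dangling"
    elif [ -e "$path" ]; then
        # The path (or the target of the symlink) exists.
        echo "ok"
    else
        # Neither a symlink nor an existing file or directory.
        echo "missing"
    fi
}
```

Calling check_link /etc/rc0.d right before chkconfig network on would show whether the rc.d/rc0.d target had already disappeared at that point in the build.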
In case you want to compare with how chkconfig behaves under strace on CentOS 7, that log is here (look for 'strace chkconfig network on'):
https://88763c12f13d1aeca43c-63681721353a54dab1064b012b97b3cb.ssl.cf1.rackcdn.com/699221/3/check/tripleo-buildimage-overcloud-full-centos-7/09d02a8/build.log
The strace for the RHEL 8 failure is here, and is also attached to this bug:
http://logs.rdoproject.org/21/699221/3/openstack-check/tripleo-rhel-8-buildimage-overcloud-full/da44408/build.log
Created attachment 1645629: strace of failed case of chkconfig
This is assigned to me, but it seems like it has been fixed in [1] via [2]? Or is this part of something else that has started happening and that I should look at?
[1] https://bugs.launchpad.net/tripleo/+bug/1823353
[2] https://review.openstack.org/650305
https://review.openstack.org/650305 did not fix the issue in terms of the image building process. For whatever reason files are disappearing while building the image. It started in newer versions of RHEL/CentOS and is not something we saw when running under 7.
This seems to be related to the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1875266 which I think is saying tmpfile cleanup is randomly removing things? If so, I'd suggest the same thing as there, set DIB_TMP to some scratch space not in global /tmp. Would we agree this is the same issue, or is there more going on here?
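A minimal sketch of that suggestion, assuming a POSIX shell. DIB_TMP is the variable name used in the comment above and should be checked against the diskimage-builder version in use; SCRATCH_BASE and the dib-scratch directory are hypothetical names chosen for illustration:

```shell
#!/bin/sh
# Assumption: DIB_TMP is the variable named in this thread; verify it against
# your diskimage-builder version before relying on it.
# SCRATCH_BASE is a hypothetical location outside the global /tmp.
SCRATCH_BASE="${SCRATCH_BASE:-${HOME:-$PWD}/dib-scratch}"
mkdir -p "$SCRATCH_BASE"

# Create a private per-build directory under the scratch space.
DIB_TMP="$(mktemp -d "$SCRATCH_BASE/build.XXXXXX")"
export DIB_TMP
echo "DIB_TMP=$DIB_TMP"
```

The build would then be started from the same shell so that it inherits the exported variable. The idea is only to keep the build tree out of the directory that the tmpfiles cleanup is suspected of pruning.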
> If so, I'd suggest the same thing as there, set DIB_TMP to some scratch space not in global /tmp.
> Would we agree this is the same issue, or is there more going on here?
I think it's the tmpfiles cleanup issue. We discussed this a couple of months back in relation to tmpfiles cleanup: http://eavesdrop.openstack.org/irclogs/%23tripleo/%23tripleo.2020-06-29.log.html#t2020-06-29T15:17:03
Only if the failure can still be reproduced with the tmpfiles cleanup workaround or DIB_TMP applied should we consider that something else is going on.
Closing; this was a temporary issue in early RHEL 8.