Bug 1784001

Summary: chkconfig network on returns "failed to glob pattern /etc/rc0.d/[SK][0-9][0-9]network"
Product: Red Hat OpenStack
Reporter: Chandan Kumar <chkumar>
Component: diskimage-builder
Assignee: Ian Wienand <iwienand>
Status: CLOSED CURRENTRELEASE
QA Contact: nlevinki <nlevinki>
Severity: urgent
Priority: unspecified
Version: 16.2 (Train)
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-10-06 19:47:48 UTC
Type: Bug
Attachments:
- strace of failed case of chkconfig (flags: none)

Description Chandan Kumar 2019-12-16 13:06:18 UTC
Description of problem:
On the TripleO CI side, we use diskimage-builder (DIB) to create overcloud images on RHEL 8.1 with SELinux in enforcing mode.
Here is the build script used to create the images: http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-rhel-8-buildimage-overcloud-full-master/b7e6593/build_images.sh

Here is the full log: http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-rhel-8-buildimage-overcloud-full-master/b7e6593/build.log

During the build process, network-scripts-10.00.4-1.el8.x86_64 is used.

At this step, it returns the following:
2019-12-12 00:31:12.054 | + set -o pipefail
2019-12-12 00:31:12.054 | + chkconfig network on
2019-12-12 00:31:12.055 | failed to glob pattern /etc/rc0.d/[SK][0-9][0-9]network: No such file or directory


We have no idea what is wrong there.

This issue is tracked externally here: https://bugs.launchpad.net/tripleo/+bug/1853028/
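The error message shows the glob chkconfig was evaluating. As a rough illustration only (an assumption based on the pattern in the message, not on chkconfig's source), the per-runlevel lookup it implies can be approximated in bash. The function name `check_runlevel_links` and its arguments are hypothetical; for each of rc0.d through rc6.d under a given root, it reports whether the directory can be opened and how many `[SK][0-9][0-9]<service>` entries it contains.

```shell
#!/usr/bin/env bash
# Hypothetical approximation (not chkconfig's actual code) of the lookup implied
# by the error message: look for [SK][0-9][0-9]<service> entries in
# <root>/rc0.d ... <root>/rc6.d.
check_runlevel_links() {
    local root="$1" service="$2" level dir
    local -a links
    for level in 0 1 2 3 4 5 6; do
        dir="$root/rc$level.d"
        # [ -d ] follows symlinks, so a dangling rc$level.d symlink counts as missing.
        if [ ! -d "$dir" ]; then
            echo "rc$level.d: cannot open directory"
            continue
        fi
        # nullglob makes a pattern with no matches expand to nothing
        # (this sketch simply turns it off again afterwards).
        shopt -s nullglob
        links=("$dir"/[SK][0-9][0-9]"$service")
        shopt -u nullglob
        echo "rc$level.d: ${#links[@]} link(s)"
    done
}

# Example, inside the image chroot: check_runlevel_links /etc network
```

A "cannot open directory" line for a runlevel directory that should exist would point at the directory (or the symlink to it) having gone missing, rather than at chkconfig itself.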

Comment 2 Lukáš Nykrýn 2019-12-16 14:40:42 UTC
Can you run it under strace and post the output here?

Comment 3 Lukáš Nykrýn 2019-12-16 15:09:46 UTC
Alternatively, is it possible to upload the contents of the image, as they are right before this step, somewhere?

Comment 4 Sagi Shnaidman 2019-12-16 16:49:53 UTC
(In reply to Lukáš Nykrýn from comment #3)
> Or is it possible to upload somewhere the content of the image right before
> this step?

Not really: the image is built in a chroot by DIB, and when the build fails everything is cleaned up. The error does not reproduce every time (it does in most cases); sometimes the build passes.
I added a simple debug script before the chkconfig call and captured its output from both a passing and a failing run: https://review.opendev.org/#/c/699178/

echo "Start debug"
ls -alsh /etc/rc0.d/ || true
ls -alsh /etc/rc0.d || true
mkdir -p /etc/rc0.d || true

Here is the output of the passing and failing cases:

PASS:

 dib-run-parts Running /tmp/in_target.d/post-install.d/51-enable-network-service
 + set -o pipefail
 + ls -alsh /etc/rc0.d/
 total 0
 0 drwxr-xr-x.  2 root root  24 Dec 16 09:52 .
 0 drwxr-xr-x. 10 root root 127 Aug 30 04:52 ..
 0 lrwxrwxrwx.  1 root root  17 Dec 16 09:52 K90network -> ../init.d/network
 + ls -alsh /etc/rc0.d
 0 lrwxrwxrwx. 1 root root 10 Aug 23 06:17 /etc/rc0.d -> rc.d/rc0.d
 + mkdir -p /etc/rc0.d
 + chkconfig network on
 dib-run-parts 51-enable-network-service completed

FAIL:

 dib-run-parts Running /tmp/in_target.d/post-install.d/51-enable-network-service
 + set -o pipefail
 + echo 'Start debug'
 Start debug
 + ls -alsh /etc/rc0.d/
 ls: cannot access '/etc/rc0.d/': No such file or directory
 + true
 + ls -alsh /etc/rc0.d
 0 lrwxrwxrwx. 1 root root 10 Aug 23 06:17 /etc/rc0.d -> rc.d/rc0.d
 + mkdir -p /etc/rc0.d
 mkdir: cannot create directory '/etc/rc0.d': File exists
 + true
 + chkconfig network on
 failed to glob pattern /etc/rc0.d/[SK][0-9][0-9]network: No such file or directory
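The FAIL output appears consistent with /etc/rc0.d being a dangling symlink: `ls /etc/rc0.d` still lists the link itself (-> rc.d/rc0.d), `ls /etc/rc0.d/` has to follow the link and fails with "No such file or directory", and `mkdir -p` reports "File exists" because the link already occupies that name. In the PASS output the same link resolves to a directory containing K90network. A minimal bash sketch for checking that condition directly; `check_dangling` is a hypothetical helper, not part of DIB or chkconfig:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether PATH is a symlink whose target is missing.
# [ -L ] tests the link itself; [ -e ] follows it, so it fails for a dangling link.
check_dangling() {
    local path="$1"
    if [ -L "$path" ] && [ ! -e "$path" ]; then
        echo "dangling: $path -> $(readlink "$path")"
        return 0
    fi
    echo "not dangling: $path"
    return 1
}

# Example, inside the image chroot: check_dangling /etc/rc0.d
```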

I have attached the strace of the failed case.
In case you want to compare with how chkconfig behaves under strace on CentOS 7, that log is here (look for 'strace chkconfig network on'): https://88763c12f13d1aeca43c-63681721353a54dab1064b012b97b3cb.ssl.cf1.rackcdn.com/699221/3/check/tripleo-buildimage-overcloud-full-centos-7/09d02a8/build.log

The strace for the RHEL 8 problem is here: http://logs.rdoproject.org/21/699221/3/openstack-check/tripleo-rhel-8-buildimage-overcloud-full/da44408/build.log
It is also attached to this bug.

Comment 5 Sagi Shnaidman 2019-12-16 16:50:46 UTC
Created attachment 1645629 [details]
strace of failed case of chkconfig

Comment 6 Ian Wienand 2020-08-11 03:37:17 UTC
This is assigned to me, but it seems it has been fixed in [1] via [2]. Is this part of something else that has started happening that I should look at?

[1] https://bugs.launchpad.net/tripleo/+bug/1823353
[2] https://review.openstack.org/650305

Comment 8 Alex Schultz 2020-08-11 13:19:18 UTC
https://review.openstack.org/650305 did not fix the issue for the image building process. For whatever reason, files are disappearing while the image is being built. This started with newer versions of RHEL/CentOS and is not something we saw when running under 7.

Comment 11 Ian Wienand 2020-09-10 02:01:09 UTC
This seems to be the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1875266, which I think is saying that tmpfiles cleanup is randomly removing things?

If so, I'd suggest the same fix as there: set DIB_TMP to some scratch space outside the global /tmp.
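A minimal sketch of that suggestion, assuming bash and that DIB reads DIB_TMP from the environment as the comment implies. `use_dib_scratch` and the example base path are hypothetical; the guard only rejects absolute paths under /tmp and does not resolve symlinks or relative paths.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a scratch directory under BASE (an absolute path
# outside the global /tmp) and export DIB_TMP to point at it.
use_dib_scratch() {
    local base="$1"
    case "$base" in
        /tmp|/tmp/*) echo "refusing: $base is under /tmp" >&2; return 1 ;;
    esac
    mkdir -p "$base/dib-tmp" || return 1
    export DIB_TMP="$base/dib-tmp"
}

# Example (the base path is an assumption; use any scratch space you control):
#   use_dib_scratch /var/lib/dib-scratch && disk-image-create ...
```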

Would we agree this is the same issue, or is there more going on here?

Comment 12 Yatin Karel 2020-09-15 09:54:25 UTC
> If so, I'd suggest the same thing as there, set DIB_TMP to some scratch space not in global /tmp.
> Would we agree this is the same issue, or is there more going on here?

I think it's the tmpfiles cleanup issue; we actually discussed this a couple of months back in relation to tmpfiles cleanup: http://eavesdrop.openstack.org/irclogs/%23tripleo/%23tripleo.2020-06-29.log.html#t2020-06-29T15:17:03. Only if it can still be reproduced with the tmpfiles cleanup workaround or DIB_TMP applied should we consider that something else is going on.
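The comment does not spell out which tmpfiles cleanup workaround was discussed. One possible form, sketched here as an assumption, is a tmpfiles.d drop-in whose `x` lines tell systemd-tmpfiles to skip matching paths (and, for directories, their contents) during age-based cleanup. The `dib_build.*` and `dib_image.*` patterns assume DIB's default temporary directory names under /tmp, and the helper name and default file name are hypothetical; verify both on your build host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a tmpfiles.d drop-in whose "x" lines exclude
# matching paths (and their contents) from systemd-tmpfiles age-based cleanup.
# The dib_build.* / dib_image.* patterns are an assumption about how DIB names
# its temporary directories under /tmp.
write_dib_tmpfiles_exclude() {
    local conf="${1:-/etc/tmpfiles.d/dib-exclude.conf}"
    cat > "$conf" <<'EOF'
x /tmp/dib_build.*
x /tmp/dib_image.*
EOF
}
```

Called with no argument it writes /etc/tmpfiles.d/dib-exclude.conf, which requires root.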

Comment 14 Steve Baker 2020-10-06 19:47:48 UTC
Closing; this was a temporary issue in early RHEL 8.