Bug 1670032

Summary:

Running lorax-composer or anaconda modifies files in /etc on the host

Product:

Red Hat Enterprise Linux 7

Reporter:

Alexander Todorov <atodorov>

Component:

lorax-composer

Assignee:

Brian Lane <bcl>

Status:

CLOSED NOTABUG

QA Contact:

Alexander Todorov <atodorov>

Severity:

urgent

Docs Contact:

Priority:

urgent

Version:

7.6

CC:

atodorov, bcl, sbueno, tborcin, wwoods

Target Milestone:

Keywords:

Extras, TestBlocker

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

1766785

Bug Blocks:

Attachments:

Description	Flags
/var/log/lorax-composer/composer.log	none
/var/log/lorax-composer/program.log	none
/var/log/lorax-composer/server.log	none
/var/log/lorax-composer/yum.log	none
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/anaconda.log	none
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/packaging.log	none
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/program.log	none
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/storage.log	none

Description Alexander Todorov 2019-01-28 13:14:15 UTC

Description of problem:

I am seeing a problem where building an AMI image with lorax-composer results in files under /etc/ on the host to be modified.

In particular the system is configured to sinchronize via NTP with amazon.pool.ntp.org and I've verified the time is correct (same as on my laptop).

After the image build completes (or somewhere around that time) /etc/localtime is modified and now points to /usr/share/zoneinfo/America/New_York giving several hours difference in the clock.

There result is that copying files to Amazon S3 will fail with RequestTimeTooSkewed because the time mismatch is too big!

Version-Release number of selected component (if applicable):

anaconda-core-21.48.22.147-1.el7.x86_64
lorax-composer from rhel7-extras branch (including some WIP patches)

How reproducible:
Almost always

Steps to Reproduce:

# export AWS_ACCESS_KEY_ID=xxxxxxxx
# export AWS_SECRET_ACCESS_KEY=yyyy

1. Assuming all necessary dependencies are installed (incl. beakerlib) then from the lorax git checkout do:

# ./tests/test_cli.sh tests/cli/test_build_and_deploy_aws.sh

Actual results:

Test fails b/c upload to S3 fails

Expected results:

Test passes, host files are not modified.

Additional info:

Last month I also saw the same issue on a RHEL 7.5 slave and upgraded to a 7.6 image in OpenStack which seemed to resolve it.

Maybe this is not happening on every occasion b/c back then I was able to reproduce on 7.5 and not on 7.6.

Maybe it has to do with what other packages are installed or their versions.

Comment 3 Alexander Todorov 2019-01-28 13:26:50 UTC

Created attachment 1524211 [details]
/var/log/lorax-composer/composer.log

Comment 4 Alexander Todorov 2019-01-28 13:27:03 UTC

Created attachment 1524212 [details]
/var/log/lorax-composer/program.log

Comment 5 Alexander Todorov 2019-01-28 13:27:13 UTC

Created attachment 1524213 [details]
/var/log/lorax-composer/server.log

Comment 6 Alexander Todorov 2019-01-28 13:27:31 UTC

Created attachment 1524214 [details]
/var/log/lorax-composer/yum.log

Comment 7 Alexander Todorov 2019-01-28 13:28:12 UTC

Created attachment 1524215 [details]
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/anaconda.log

Comment 8 Alexander Todorov 2019-01-28 13:28:29 UTC

Created attachment 1524217 [details]
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/packaging.log

Comment 9 Alexander Todorov 2019-01-28 13:28:39 UTC

Created attachment 1524218 [details]
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/program.log

Comment 10 Alexander Todorov 2019-01-28 13:28:49 UTC

Created attachment 1524219 [details]
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/storage.log

Comment 12 Alexander Todorov 2019-01-29 11:48:36 UTC

More info from devel:

(13,26,56) mkolman: atodorov: this is what seems to touch /etc/localtime from Anaconda: https://github.com/rhinstaller/anaconda/blob/rhel7-branch/pyanaconda/timezone.py#L90
(13,27,51) mkolman: atodorov: but it seems to be called correctly here: https://github.com/rhinstaller/anaconda/blob/rhel7-branch/pyanaconda/kickstart.py#L1888
(13,28,14) mkolman: if iutil.getSysroot() returned incorrect values, much more would be broken than just this
(13,30,34) mkolman: atodorov: also, looks like we just symlink /etc/localtime to some tz file
(13,30,57) mkolman: atodorov: so if the file is overwritten in place, it's something else doing that

In lorax sources I see:

share/runtime-cleanup.tmpl:removefrom glibc /etc/gai.conf /etc/ld.so.conf /etc/localtime /etc/rpc

but I don't know ATM why and how this is used.

Brian,
does this lead you in the right direction? This one is a serious issue?

Comment 13 Brian Lane 2019-01-29 19:56:29 UTC

runtime-cleanup isn't used by composer so that's not related.

I'm trying to reproduce this in a 7.6 VM and am not seeing it happen. I changed my /etc/localtime to point to a zone in Asia and setup an audit to log changes to it, and it hasn't changed. Tried building ami, qcow2, and tar.

Anaconda is what sets up the timezone from the kickstart, so I'm pretty confident that whatever the issue is, it's an anaconda bug.

Comment 14 Alexander Todorov 2019-01-29 20:03:00 UTC

Brian,
can you distil this to an anaconda command which I can use for reproduction ?

Also there could be differences between the 7.6 host you are using and the one I am using (has updates applied to it and pulp repos enabled).

Comment 15 Brian Lane 2019-01-29 21:39:50 UTC

(In reply to Alexander Todorov from comment #14)
> Brian,
> can you distil this to an anaconda command which I can use for reproduction ?
> 
> Also there could be differences between the 7.6 host you are using and the
> one I am using (has updates applied to it and pulp repos enabled).

Well, since I can't reproduce it, I'm not 100% sure, but you should be able to take a final-kickstart.ks and run it like this:

anaconda --kickstart final-kickstart.ks --cmdline --repo http://url-of-repo --dirinstall

which will leave the files in /mnt/sysimage/

Comment 21 Alexander Todorov 2019-02-12 18:21:38 UTC

> if the time isn't close enough I know that some crypto protocols
> will fail, which is what seems to be happening here.
> 

for the record: Amazon S3 will check timestamps when you try to upload files and if they differ with more than 15 minutes it will fail with RequestTimeTooSkewed. Unfortunately this is very common error and Google tells you to configure NTP to point to Amazon's servers to sync the time, which we already do.

Comment 22 Alexander Todorov 2019-05-09 16:26:07 UTC

More updates:
I provisioned a slave manually, applied the ansible configuration and left it idle. 3+hrs and the clock is still in sync. /etc/localtime symlink hasn't been changed.

Tomorrow I will try running a bash script from Jenkins that doesn't do anything but print on the terminal.

Comment 23 Alexander Todorov 2019-05-10 07:54:29 UTC

(In reply to Alexander Todorov from comment #22)
> More updates:
> I provisioned a slave manually, applied the ansible configuration and left
> it idle. 3+hrs and the clock is still in sync. /etc/localtime symlink hasn't
> been changed.
> 

After about 20hrs still no drift in the clock.

Comment 29 Brian Lane 2019-07-18 20:18:55 UTC

I haven't seen any evidence of the original bug here in any of my testing -- as far as I can tell it is not modifying the host system. Can we close this? Or push it to 7.8?

Comment 30 Alexander Todorov 2019-07-22 08:15:10 UTC

(In reply to Brian Lane from comment #29)
> I haven't seen any evidence of the original bug here in any of my testing --
> as far as I can tell it is not modifying the host system. Can we close this?
> Or push it to 7.8?

We still see this in CI, even after the switch to Cockpit CI. You can push to 7.8 but the problem remains and I've not been able to make any progress towards figuring out what is happening.

Comment 49 Brian Lane 2020-03-16 15:43:41 UTC

Track this with bug #1766785

Comment 50 Brian Lane 2020-03-16 15:43:42 UTC

Track this with bug #1766785