Bug 1670032

Summary: Running lorax-composer or anaconda modifies files in /etc on the host
Product: Red Hat Enterprise Linux 7 Reporter: Alexander Todorov <atodorov>
Component: lorax-composerAssignee: Brian Lane <bcl>
Status: CLOSED NOTABUG QA Contact: Alexander Todorov <atodorov>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.6CC: atodorov, bcl, sbueno, tborcin, wwoods
Target Milestone: rcKeywords: Extras, TestBlocker
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1766785    
Bug Blocks:    
Attachments:
Description Flags
/var/log/lorax-composer/composer.log
none
/var/log/lorax-composer/program.log
none
/var/log/lorax-composer/server.log
none
/var/log/lorax-composer/yum.log
none
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/anaconda.log
none
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/packaging.log
none
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/program.log
none
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/storage.log none

Description Alexander Todorov 2019-01-28 13:14:15 UTC
Description of problem:


I am seeing a problem where building an AMI image with lorax-composer results in files under /etc/ on the host to be modified.

In particular the system is configured to sinchronize via NTP with amazon.pool.ntp.org and I've verified the time is correct (same as on my laptop).

After the image build completes (or somewhere around that time) /etc/localtime is modified and now points to /usr/share/zoneinfo/America/New_York giving several hours difference in the clock.

There result is that copying files to Amazon S3 will fail with RequestTimeTooSkewed because the time mismatch is too big!


Version-Release number of selected component (if applicable):

anaconda-core-21.48.22.147-1.el7.x86_64
lorax-composer from rhel7-extras branch (including some WIP patches)

How reproducible:
Almost always

Steps to Reproduce:

0. 

# export AWS_ACCESS_KEY_ID=xxxxxxxx
# export AWS_SECRET_ACCESS_KEY=yyyy

1. Assuming all necessary dependencies are installed (incl. beakerlib) then from the lorax git checkout do:

# ./tests/test_cli.sh tests/cli/test_build_and_deploy_aws.sh 

Actual results:

Test fails b/c upload to S3 fails


Expected results:

Test passes, host files are not modified.

Additional info:

Last month I also saw the same issue on a RHEL 7.5 slave and upgraded to a 7.6 image in OpenStack which seemed to resolve it.

Maybe this is not happening on every occasion b/c back then I was able to reproduce on 7.5 and not on 7.6.

Maybe it has to do with what other packages are installed or their versions.

Comment 3 Alexander Todorov 2019-01-28 13:26:50 UTC
Created attachment 1524211 [details]
/var/log/lorax-composer/composer.log

Comment 4 Alexander Todorov 2019-01-28 13:27:03 UTC
Created attachment 1524212 [details]
/var/log/lorax-composer/program.log

Comment 5 Alexander Todorov 2019-01-28 13:27:13 UTC
Created attachment 1524213 [details]
/var/log/lorax-composer/server.log

Comment 6 Alexander Todorov 2019-01-28 13:27:31 UTC
Created attachment 1524214 [details]
/var/log/lorax-composer/yum.log

Comment 7 Alexander Todorov 2019-01-28 13:28:12 UTC
Created attachment 1524215 [details]
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/anaconda.log

Comment 8 Alexander Todorov 2019-01-28 13:28:29 UTC
Created attachment 1524217 [details]
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/packaging.log

Comment 9 Alexander Todorov 2019-01-28 13:28:39 UTC
Created attachment 1524218 [details]
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/program.log

Comment 10 Alexander Todorov 2019-01-28 13:28:49 UTC
Created attachment 1524219 [details]
/var/lib/lorax/composer/results/72e435fb-e7b9-47d6-abce-8866e886d08f/logs/anaconda/storage.log

Comment 12 Alexander Todorov 2019-01-29 11:48:36 UTC
More info from devel:

(13,26,56) mkolman: atodorov: this is what seems to touch /etc/localtime from Anaconda: https://github.com/rhinstaller/anaconda/blob/rhel7-branch/pyanaconda/timezone.py#L90
(13,27,51) mkolman: atodorov: but it seems to be called correctly here: https://github.com/rhinstaller/anaconda/blob/rhel7-branch/pyanaconda/kickstart.py#L1888
(13,28,14) mkolman: if iutil.getSysroot() returned incorrect values, much more would be broken than just this
(13,30,34) mkolman: atodorov: also, looks like we just symlink /etc/localtime to some tz file
(13,30,57) mkolman: atodorov: so if the file is overwritten in place, it's something else doing that

In lorax sources I see:

share/runtime-cleanup.tmpl:removefrom glibc /etc/gai.conf /etc/ld.so.conf /etc/localtime /etc/rpc

but I don't know ATM why and how this is used.

Brian,
does this lead you in the right direction? This one is a serious issue?

Comment 13 Brian Lane 2019-01-29 19:56:29 UTC
runtime-cleanup isn't used by composer so that's not related.

I'm trying to reproduce this in a 7.6 VM and am not seeing it happen. I changed my /etc/localtime to point to a zone in Asia and setup an audit to log changes to it, and it hasn't changed. Tried building ami, qcow2, and tar.

Anaconda is what sets up the timezone from the kickstart, so I'm pretty confident that whatever the issue is, it's an anaconda bug.

Comment 14 Alexander Todorov 2019-01-29 20:03:00 UTC
Brian,
can you distil this to an anaconda command which I can use for reproduction ?

Also there could be differences between the 7.6 host you are using and the one I am using (has updates applied to it and pulp repos enabled).

Comment 15 Brian Lane 2019-01-29 21:39:50 UTC
(In reply to Alexander Todorov from comment #14)
> Brian,
> can you distil this to an anaconda command which I can use for reproduction ?
> 
> Also there could be differences between the 7.6 host you are using and the
> one I am using (has updates applied to it and pulp repos enabled).

Well, since I can't reproduce it, I'm not 100% sure, but you should be able to take a final-kickstart.ks and run it like this:

anaconda --kickstart final-kickstart.ks --cmdline --repo http://url-of-repo --dirinstall

which will leave the files in /mnt/sysimage/

Comment 21 Alexander Todorov 2019-02-12 18:21:38 UTC
> if the time isn't close enough I know that some crypto protocols
> will fail, which is what seems to be happening here.
> 

for the record: Amazon S3 will check timestamps when you try to upload files and if they differ with more than 15 minutes it will fail with RequestTimeTooSkewed. Unfortunately this is very common error and Google tells you to configure NTP to point to Amazon's servers to sync the time, which we already do.

Comment 22 Alexander Todorov 2019-05-09 16:26:07 UTC
More updates:
I provisioned a slave manually, applied the ansible configuration and left it idle. 3+hrs and the clock is still in sync. /etc/localtime symlink hasn't been changed.

Tomorrow I will try running a bash script from Jenkins that doesn't do anything but print on the terminal.

Comment 23 Alexander Todorov 2019-05-10 07:54:29 UTC
(In reply to Alexander Todorov from comment #22)
> More updates:
> I provisioned a slave manually, applied the ansible configuration and left
> it idle. 3+hrs and the clock is still in sync. /etc/localtime symlink hasn't
> been changed.
> 

After about 20hrs still no drift in the clock.

Comment 29 Brian Lane 2019-07-18 20:18:55 UTC
I haven't seen any evidence of the original bug here in any of my testing -- as far as I can tell it is not modifying the host system. Can we close this? Or push it to 7.8?

Comment 30 Alexander Todorov 2019-07-22 08:15:10 UTC
(In reply to Brian Lane from comment #29)
> I haven't seen any evidence of the original bug here in any of my testing --
> as far as I can tell it is not modifying the host system. Can we close this?
> Or push it to 7.8?

We still see this in CI, even after the switch to Cockpit CI. You can push to 7.8 but the problem remains and I've not been able to make any progress towards figuring out what is happening.

Comment 49 Brian Lane 2020-03-16 15:43:41 UTC
Track this with bug #1766785

Comment 50 Brian Lane 2020-03-16 15:43:42 UTC
Track this with bug #1766785