Bug 1105594 - Wrong vmlinuz/initrd files downloaded by Foreman
Summary: Wrong vmlinuz/initrd files downloaded by Foreman
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: foreman-proxy
Version: 5.0 (RHEL 7)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z1
: Installer
Assignee: Dmitri Dolguikh
QA Contact: Omri Hochman
URL:
Whiteboard:
Duplicates: 1102876 1105595
Depends On: 1110378 1121172
Blocks:
Reported: 2014-06-06 13:20 UTC by James Slagle
Modified: 2018-09-07 19:14 UTC (History)

Fixed In Version: foreman-proxy-1.6.0.28-1.el6sat
Doc Type: Known Issue
Doc Text:
Currently, in rare cases, Foreman might corrupt the downloaded vmlinuz and initrd files. As a consequence, hosts deployed using the corrupted images fail to boot and provision. Workaround: Manually copy or download the files to the /var/lib/tftpboot/boot directory replacing the corrupted files. The files can be found on the ISO under the images/pxeboot directory. Once you have copied over the correct files, the provisioning will work with the updated image files.
Clone Of:
: 1110378 (view as bug list)
Environment:
Last Closed: 2014-10-01 13:24:45 UTC
Target Upstream Version:
Embargoed:


Attachments
proxy log (11.73 KB, text/plain)
2014-06-13 12:05 UTC, James Slagle
proxy log showing additional download tasks started before previous ones finish (303.33 KB, text/plain)
2014-06-17 14:03 UTC, James Slagle


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2014:1350 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise Linux OpenStack Platform Bug Fix Advisory | 2014-10-01 17:22:34 UTC

Description James Slagle 2014-06-06 13:20:55 UTC
Description of problem:
I started a deployment of OpenStack. One of the assigned VM hosts reboots and downloads the vmlinuz and initrd for RHEL 7, which seems correct for an installation, but then it immediately reboots again.

It keeps repeating this same pattern, stuck in an endless boot loop.

The deployment status page never updates and is stuck at 15% deployment for the first node.

I tried a Force Stop from the deployment page, and that had no effect on actually stopping the attempted deployment either.

Let me know if there are any logs to attach to help troubleshoot/debug.

Version-Release number of selected component (if applicable):
[root@staypuft ~]# rpm -q foreman-installer-staypuft
foreman-installer-staypuft-0.0.14-1.el6ost.noarch

Comment 2 Mike Burns 2014-06-06 13:36:46 UTC
*** Bug 1105595 has been marked as a duplicate of this bug. ***

Comment 3 James Slagle 2014-06-06 20:06:21 UTC
on advice from mburns I replaced /var/lib/tftpboot/boot/RedHat-7.0-x86_64-initrd.img and /var/lib/tftpboot/boot/RedHat-7.0-x86_64-vmlinuz with the initrd and vmlinuz from http://download.eng.rdu2.redhat.com/rel-eng/RHEL-7.0-RC-2.0/compose/Server/x86_64/os/isolinux/, and then I got a booting instance.

No idea why the original initrd/vmlinuz didn't work, or where Foreman even got them from.

Comment 4 Perry Myers 2014-06-08 21:41:03 UTC
(In reply to James Slagle from comment #3)
> on advice from mburns I replaced
> /var/lib/tftpboot/boot/RedHat-7.0-x86_64-initrd.img and
> /var/lib/tftpboot/boot/RedHat-7.0-x86_64-vmlinuz with the initrd and vmlinuz
> from
> http://download.eng.rdu2.redhat.com/rel-eng/RHEL-7.0-RC-2.0/compose/Server/
> x86_64/os/isolinux/, and then I got a booting instance.
> 
> No idea why the original initrd/vmlinuz didn't work, or where Foreman even
> got them from.

James, is this reproducible or was just a one time thing?

If the issue is that Foreman had an older version of the initrd/vmlinuz files and then you corrected the issue by using the correct versions of these files, then this is likely notabug?

Yaniv, I'm curious about your rationale for marking this a blocker given James' comments in Comment #3.  Based on his comments, this bug isn't even really confirmed, much less a genuine blocker.

Comment 5 James Slagle 2014-06-09 20:52:01 UTC
I reinstalled my Foreman staypuft host, and this time I got something slightly different, but it is still related to wrong initrd and vmlinuz files. This time they were 0-byte files:

-rw-r--r--. 1 foreman-proxy foreman-proxy         0 Jun  9 16:02 RedHat-7.0-x86_64-initrd.img
-rw-r--r--. 1 foreman-proxy foreman-proxy         0 Jun  9 16:02 RedHat-7.0-x86_64-vmlinuz

This resulted in a "Could not find kernel image" error on the booting VMs, followed by a boot: prompt.

Again I downloaded the files as in comment #3, and things worked after that.

I'm updating the bug title to just be "Corrupted/missing RHEL 7 vmlinuz/initrd files".

I'm following the instructions from:
http://etherpad.corp.redhat.com/Create-staypuft-test-environment

Comment 6 James Slagle 2014-06-09 20:56:51 UTC
A little more info that is likely relevant.

The first time I installed, when prompted for the subscription-manager credentials from staypuft-installer, I used the following values:

Enter your subscription manager credentials?:
1. Subscription manager username:       jslagle
2. Subscription manager password:       ********
3. Comma separated repositories:        rhel-6-server-openstack-4.0-rpms
4. RHEL repo path (http(s) or nfs URL): http://download.eng.rdu2.redhat.com/rel-eng/RHEL-7.0-RC-2.0/compose/Server/x86_64/os/
5. Subscription manager pool (optional): 
6. Proceed with configuration
7. Skip this step (provisioning won't subscribe your machines)

That resulted in the wrong vmlinuz/initrd files that gave me the boot loop.

The second time I reinstalled, I followed the instructions from http://etherpad.corp.redhat.com/Create-staypuft-test-environment and just skipped the sub-man step completely by pressing 7.

And that's likely why I got 0 byte vmlinuz and initrd files.

So, I really can't tell if there's a bug here or user error. 

What do we expect people to do at this step? Where should they get the vmlinuz/initrd files from?

Comment 7 James Slagle 2014-06-12 12:42:17 UTC
Dug into this a bit more: Foreman is downloading the wrong vmlinuz/initrd.

Here's the path I set for my RHEL 7 repo:
http://download.eng.rdu2.redhat.com/rel-eng/latest-RHEL-7/compose/Server/x86_64/os/

Foreman downloaded the  vmlinuz/initrd file from under:
http://download.eng.rdu2.redhat.com/rel-eng/latest-RHEL-7/compose/Server/x86_64/os/images/pxeboot/

This causes the endless reboot loop.

If I replace the vmlinuz/initrd under /var/lib/tftpboot/boot with the ones from the following instead:
http://download.eng.rdu2.redhat.com/rel-eng/latest-RHEL-7/compose/Server/x86_64/os/isolinux/

Then the PXE deployment process works as expected.

The ones from under isolinux should be downloaded automatically, not the ones under images/pxeboot.

Comment 8 James Slagle 2014-06-12 12:45:57 UTC
Just to clarify from the above questions...

(In reply to Perry Myers from comment #4)
> (In reply to James Slagle from comment #3)
> > on advice from mburns I replaced
> > /var/lib/tftpboot/boot/RedHat-7.0-x86_64-initrd.img and
> > /var/lib/tftpboot/boot/RedHat-7.0-x86_64-vmlinuz with the initrd and vmlinuz
> > from
> > http://download.eng.rdu2.redhat.com/rel-eng/RHEL-7.0-RC-2.0/compose/Server/
> > x86_64/os/isolinux/, and then I got a booting instance.
> > 
> > No idea why the original initrd/vmlinuz didn't work, or where Foreman even
> > got them from.
> 
> James, is this reproducible or was just a one time thing?

Yes, 100% reproducible, see comment #7.

> 
> If the issue is that Foreman had an older version of the initrd/vmlinuz
> files and then you corrected the issue by using the correct versions of
> these files, then this is likely notabug?

This is a bug: Foreman is downloading the wrong vmlinuz/initrd.

> 
> Yaniv, I'm curious about your rationale for marking this a blocker given
> James' comments in Comment #3.  Based on his comments, this bug isn't even
> really confirmed, much less a genuine blocker.

Likely not a blocker, given there is a workaround.

Comment 9 Mike Burns 2014-06-12 13:40:55 UTC
Moving to foreman.  This is core foreman functionality, not staypuft.

Comment 10 Dominic Cleal 2014-06-12 13:59:38 UTC
(In reply to James Slagle from comment #7)
> Foreman downloaded the  vmlinuz/initrd file from under:
> http://download.eng.rdu2.redhat.com/rel-eng/latest-RHEL-7/compose/Server/
> x86_64/os/images/pxeboot/
> 
> This causes the endless reboot loop.

Are the downloaded files zero bytes, or fully complete?  Do the md5sums match those from the mirror?

Please supply corresponding debug logs from /var/log/foreman-proxy/proxy.log by setting :log_level: DEBUG in /etc/foreman-proxy/settings.yml.

Comment 11 James Slagle 2014-06-13 12:05:04 UTC
Possible NOTABUG here. After reproducing it 3 times yesterday, I can no longer reproduce it today on a new install. I've attached the proxy.log DEBUG output. Interestingly enough, it shows an error downloading the vmlinuz and initrd, but the downloaded files actually *do* seem to work fine.

The md5sums of the downloaded files do match the ones from the server:
[root@staypuft boot]# md5sum RedHat-7.0-x86_64-vmlinuz RedHat-7.0-x86_64-initrd.img
8edbd2e995aa094b8fb850eb1b0a9399  RedHat-7.0-x86_64-vmlinuz
5960d2340c6fded06f52d06d29878025  RedHat-7.0-x86_64-initrd.img

[jslagle@sh-el6 pxeboot]$ md5sum vmlinuz initrd.img 
8edbd2e995aa094b8fb850eb1b0a9399  vmlinuz
5960d2340c6fded06f52d06d29878025  initrd.img
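
The comparison above can be scripted. The following is a minimal sketch, assuming GNU coreutils md5sum and awk are available; verify_image is a hypothetical helper name, not part of Foreman. It rejects a missing or zero-byte file (the symptom from comment 5) before comparing checksums.

```shell
#!/usr/bin/env bash
# verify_image FILE EXPECTED_MD5
# Hypothetical helper (not part of Foreman): succeeds only if FILE is
# non-empty and its MD5 checksum equals EXPECTED_MD5.
verify_image() {
  local file=$1 expected=$2 actual

  # -s is false for a missing file and for a zero-byte file.
  if [ ! -s "$file" ]; then
    echo "$file: missing or empty" >&2
    return 1
  fi

  # md5sum prints "<hash>  <name>"; keep only the hash.
  actual=$(md5sum "$file" | awk '{print $1}')
  if [ "$actual" != "$expected" ]; then
    echo "$file: md5 is $actual, expected $expected" >&2
    return 1
  fi
}
```

For example, running verify_image RedHat-7.0-x86_64-vmlinuz 8edbd2e995aa094b8fb850eb1b0a9399 from /var/lib/tftpboot/boot would check the kernel against the checksum reported above.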

Comment 12 James Slagle 2014-06-13 12:05:43 UTC
Created attachment 908530
proxy log

Comment 13 Dominic Cleal 2014-06-13 12:17:28 UTC
Yeah, that message shouldn't be logged at error level; that's incorrect. It does look like it's working properly now. Perhaps it was an issue with the mirror, or the download hadn't completed yet (it's async). Thanks for the update, will close until further notice.

Comment 14 Lars Kellogg-Stedman 2014-06-13 14:41:55 UTC
I saw this also; manually downloading the images corrected the problem.

Comment 15 Perry Myers 2014-06-13 14:43:40 UTC
Reopening this.  We have tons of users seeing this issue intermittently.

Comment 16 James Slagle 2014-06-13 14:45:00 UTC
We've had a handful of staypuft users report this same issue. I think there's something deeper going on here, possibly something intermittent in the async download task causing corrupted downloads.

The symptoms may differ (endless boot loop, kernel panic, etc.), but the solution always seems to be the same: manually download the vmlinuz/initrd, and then everything works.

Comment 17 Eoghan Glynn 2014-06-13 14:46:42 UTC
I saw this also, where the md5sum for the vmlinuz matched the master version but the md5sum for the initrd did not.

And in a prior failed attempt to install staypuft, I noticed both of these files in the tftp directory had zero length. 

Resolving it required manually downloading the images, running chown/chgrp to foreman-proxy on them, and running restorecon.
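
The manual workaround can be sketched as a small script. This is an illustration under assumptions, not a supported procedure: replace_boot_images is a hypothetical helper, SRC_DIR stands for wherever the images/pxeboot directory from the ISO or mirror is available locally, and the target file names are the RedHat-7.0-x86_64 ones seen in this report. It needs to run as root so that chown and restorecon succeed.

```shell
#!/usr/bin/env bash
# replace_boot_images SRC_DIR [BOOT_DIR]
# Hypothetical helper: copies vmlinuz and initrd.img from SRC_DIR over the
# files Foreman serves via TFTP, then fixes ownership, mode, and SELinux
# context. BOOT_DIR defaults to the directory named in this report.
replace_boot_images() {
  local src=$1
  local boot=${2:-/var/lib/tftpboot/boot}
  local prefix=RedHat-7.0-x86_64   # file name prefix seen in this report
  local kernel="$boot/$prefix-vmlinuz"
  local initrd="$boot/$prefix-initrd.img"

  # Overwrite the corrupted files with known-good copies.
  cp -f "$src/vmlinuz" "$kernel" || return 1
  cp -f "$src/initrd.img" "$initrd" || return 1

  # Match the owner, group, and mode shown in the directory listings.
  chown foreman-proxy:foreman-proxy "$kernel" "$initrd" || return 1
  chmod 0644 "$kernel" "$initrd" || return 1

  # Restore the default SELinux context for the new files.
  restorecon -v "$kernel" "$initrd"
}
```

It might be called as replace_boot_images /mnt/iso/images/pxeboot, where /mnt/iso is an assumed mount point for the RHEL 7 ISO.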

Comment 18 Dominic Cleal 2014-06-13 14:48:15 UTC
Please provide the information about MD5 sums, file sizes and logs requested in comment #10 when it occurs.  The only data provided so far was when it worked (comment #11).

Comment 20 James Slagle 2014-06-17 14:02:27 UTC
I have been able to reliably reproduce this now, and I believe what triggers it is assigning a 2nd host to a host group before the vmlinuz/initrd download started by the first host-to-host-group assignment has finished.

According to the foreman-proxy.log, the 2nd assignment causes an additional background task to be started to download the files. You then have multiple downloads clobbering each other.

The same happens for any subsequent assignment made while a download is already running.

I've attached my foreman-proxy.log. It doesn't show anything different really from a successful download, but you can at least see it starting an additional download task before the first has even finished.

I end up with files that are much larger than they should be:
[root@staypuft boot]# pwd
/var/lib/tftpboot/boot
[root@staypuft boot]# ll -h
total 277M
-rw-r--r--. 1 foreman-proxy root          165M Jun  5 14:45 foreman-discovery-image-latest.el6.iso-img
-rw-r--r--. 1 foreman-proxy root          3.9M Jun  5 14:45 foreman-discovery-image-latest.el6.iso-vmlinuz
-rw-r--r--. 1 foreman-proxy foreman-proxy 101M May  7 03:39 RedHat-7.0-x86_64-initrd.img
-rw-r--r--. 1 foreman-proxy foreman-proxy 7.4M May  5 11:21 RedHat-7.0-x86_64-vmlinuz


The repo path I have configured is http://download.eng.rdu2.redhat.com/rel-eng/RHEL-7.0-RC-3.1/compose/Server/x86_64/os/
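
One way to avoid the clobbering described in this comment is to serialize downloads per destination file and to move a file into place only once it is complete. The following is a sketch of that idea in shell, not the fix that shipped in foreman-proxy; fetch_atomic is a hypothetical helper, and it assumes flock (util-linux), mktemp, and curl are installed.

```shell
#!/usr/bin/env bash
# fetch_atomic URL DEST
# Hypothetical helper: downloads URL to DEST without letting concurrent
# callers write to the same file. Each caller takes an exclusive lock,
# downloads to a private temporary file, and renames it over DEST only
# after the transfer succeeded.
fetch_atomic() {
  local url=$1 dest=$2
  local lock="${dest}.lock" tmp

  (
    # Block until no other caller holds the lock for this destination.
    flock -x 9 || exit 1

    # Design choice for this sketch: if an earlier caller already
    # produced a non-empty file while we waited, reuse it.
    [ -s "$dest" ] && exit 0

    tmp=$(mktemp "${dest}.XXXXXX") || exit 1
    if curl -fsS -o "$tmp" "$url"; then
      chmod 0644 "$tmp"       # mktemp creates the file with mode 0600
      mv -f "$tmp" "$dest"    # rename within one directory is atomic
    else
      rm -f "$tmp"            # never leave a partial file behind
      exit 1
    fi
  ) 9>"$lock"
}
```

With this shape, a second host assignment made during a running download would wait on the lock and then find the finished file, rather than starting a second writer. Note that the reuse check trusts any non-empty existing file, so a file that is already corrupted would still have to be removed by hand first.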

Comment 21 James Slagle 2014-06-17 14:03:44 UTC
Created attachment 909601
proxy log showing additional download tasks started before previous ones finish

Comment 22 Dominic Cleal 2014-06-17 14:06:35 UTC
Thanks for the data James.

Comment 29 Ami Jeain 2014-07-28 08:17:39 UTC
This is still happening with:

# rpm -qa |grep foreman
foreman-1.6.0.21-2.el6sat.noarch
ruby193-rubygem-foreman_discovery-1.3.0-2.el6sat.noarch
foreman-postgresql-1.6.0.21-2.el6sat.noarch
foreman-proxy-1.6.0.8-1.el6sat.noarch
foreman-mysql2-1.6.0.21-2.el6sat.noarch
foreman-installer-1.5.0-0.6.RC2.el6ost.noarch
ruby193-rubygem-foreman-tasks-0.6.4-2.el6sat.noarch
rubygem-foreman_api-0.1.11-4.el6sat.noarch
openstack-foreman-installer-2.0.16-1.el6ost.noarch
foreman-selinux-1.6.0.3-2.el6sat.noarch
foreman-discovery-image-6.5-20140620.2.el6sat.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el6ost.noarch

basically, from poodle:
http://ayanami.boston.devel.redhat.com/poodles/rhos-devel-ci/foreman.el6/2014-07-25.5

Comment 31 Mike Burns 2014-08-05 13:35:49 UTC
*** Bug 1102876 has been marked as a duplicate of this bug. ***

Comment 33 Alexander Chuzhoy 2014-08-14 18:17:51 UTC
Reproduced with rhel-osp-installer-0.1.10-2.el6ost.noarch

Comment 38 Omri Hochman 2014-09-16 12:53:17 UTC
Unable to reproduce using staypuft puddle /Foreman/2014-09-12.1


Environment:
------------- 
rhel-osp-installer-0.3.4-3.el6ost.noarch
ruby193-rubygem-staypuft-0.3.4-2.el6ost.noarch
foreman-1.6.0.44-2.el6ost.noarch
foreman-installer-1.6.0-0.2.RC1.el6ost.noarch
openstack-puppet-modules-2014.1-21.8.el6ost.noarch
puppet-3.6.2-1.1.el6.noarch


(* The ticket will be reopened if the issue pops up again.)

Comment 40 errata-xmlrpc 2014-10-01 13:24:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1350.html

