Description of problem:
I started a deployment of OpenStack. One of the assigned VM hosts reboots and downloads the vmlinuz and initrd for RHEL 7, which seems correct for an installation, but then it immediately reboots. It keeps repeating this pattern, stuck in an endless boot loop. The deployment status page never updates and is stuck at 15% deployment for the first node. I tried a Force Stop from the deployment page, and that had no effect on stopping the attempted deployment either. Let me know if there are any logs to attach to help troubleshoot/debug.

Version-Release number of selected component (if applicable):
[root@staypuft ~]# rpm -q foreman-installer-staypuft
foreman-installer-staypuft-0.0.14-1.el6ost.noarch
*** Bug 1105595 has been marked as a duplicate of this bug. ***
On advice from mburns, I replaced /var/lib/tftpboot/boot/RedHat-7.0-x86_64-initrd.img and /var/lib/tftpboot/boot/RedHat-7.0-x86_64-vmlinuz with the initrd and vmlinuz from http://download.eng.rdu2.redhat.com/rel-eng/RHEL-7.0-RC-2.0/compose/Server/x86_64/os/isolinux/, and then I got a booting instance. No idea why the original initrd/vmlinuz didn't work, or where Foreman even got them from.
(In reply to James Slagle from comment #3)
> on advice from mburns I replaced
> /var/lib/tftpboot/boot/RedHat-7.0-x86_64-initrd.img and
> /var/lib/tftpboot/boot/RedHat-7.0-x86_64-vmlinuz with the initrd and vmlinuz
> from
> http://download.eng.rdu2.redhat.com/rel-eng/RHEL-7.0-RC-2.0/compose/Server/x86_64/os/isolinux/,
> and then I got a booting instance.
>
> No idea why the original initrd/vmlinuz didn't work, or where Foreman even
> got them from.

James, is this reproducible or was it just a one-time thing?

If the issue is that Foreman had an older version of the initrd/vmlinuz files and you corrected it by using the correct versions of these files, then this is likely notabug?

Yaniv, I'm curious about your rationale for marking this a blocker given James' comments in Comment #3. Based on his comments, this bug isn't even really confirmed, much less a genuine blocker.
I reinstalled my foreman staypuft host, and this time I got something slightly different, but it is still related to wrong initrd and vmlinuz files. This time they were 0 byte files:

-rw-r--r--. 1 foreman-proxy foreman-proxy 0 Jun 9 16:02 RedHat-7.0-x86_64-initrd.img
-rw-r--r--. 1 foreman-proxy foreman-proxy 0 Jun 9 16:02 RedHat-7.0-x86_64-vmlinuz

This resulted in a "Could not find kernel image" error on the booting VMs and a boot: prompt. Again I downloaded the files as in comment #3, and things worked after that.

I'm updating the bug title to just be "Corrupted/missing RHEL 7 vmlinuz/initrd files".

I'm following the instructions from: http://etherpad.corp.redhat.com/Create-staypuft-test-environment
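For anyone checking whether they are hitting the zero-byte case described above, here is a minimal sketch that lists empty files in the TFTP boot directory. The function name `find_empty_boot_files` is hypothetical and not part of Foreman; only the directory path is taken from this report.

```python
import os


def find_empty_boot_files(boot_dir):
    """Return the sorted names of zero-byte regular files in boot_dir."""
    empty = []
    for name in sorted(os.listdir(boot_dir)):
        path = os.path.join(boot_dir, name)
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(name)
    return empty


if __name__ == "__main__":
    # Path taken from this bug report; skipped when the directory is absent.
    boot_dir = "/var/lib/tftpboot/boot"
    if os.path.isdir(boot_dir):
        print(find_empty_boot_files(boot_dir))
```

An empty result only rules out the zero-byte symptom; it says nothing about files that are non-empty but corrupted.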
A little more info that is likely relevant.

The first time I installed, when prompted for the subscription-manager credentials from staypuft-installer, I used the following values:

Enter your subscription manager credentials?:
1. Subscription manager username: jslagle
2. Subscription manager password: ********
3. Comma separated repositories: rhel-6-server-openstack-4.0-rpms
4. RHEL repo path (http(s) or nfs URL): http://download.eng.rdu2.redhat.com/rel-eng/RHEL-7.0-RC-2.0/compose/Server/x86_64/os/
5. Subscription manager pool (optional):
6. Proceed with configuration
7. Skip this step (provisioning won't subscribe your machines)

That resulted in the wrong vmlinuz/initrd files that gave me the boot loop.

The second time I reinstalled, I followed the instructions from http://etherpad.corp.redhat.com/Create-staypuft-test-environment and just skipped the sub-man step completely by pressing 7. And that's likely why I got 0 byte vmlinuz and initrd files.

So, I really can't tell if there's a bug here or user error. What do we expect people to do at this step? Where should they get the vmlinuz/initrd files from?
Dug into this a bit more: Foreman is downloading the wrong vmlinuz/initrd.

Here's the path I set for my RHEL 7 repo:
http://download.eng.rdu2.redhat.com/rel-eng/latest-RHEL-7/compose/Server/x86_64/os/

Foreman downloaded the vmlinuz/initrd file from under:
http://download.eng.rdu2.redhat.com/rel-eng/latest-RHEL-7/compose/Server/x86_64/os/images/pxeboot/

This causes the endless reboot loop.

If I replace the vmlinuz/initrd under /var/lib/tftpboot/boot with the ones from the following instead:
http://download.eng.rdu2.redhat.com/rel-eng/latest-RHEL-7/compose/Server/x86_64/os/isolinux/

Then the PXE deployment process works as expected. The ones from under isolinux should be downloaded automatically, not the ones under images/pxeboot.
Just to clarify from the above questions...

(In reply to Perry Myers from comment #4)
> (In reply to James Slagle from comment #3)
> > on advice from mburns I replaced
> > /var/lib/tftpboot/boot/RedHat-7.0-x86_64-initrd.img and
> > /var/lib/tftpboot/boot/RedHat-7.0-x86_64-vmlinuz with the initrd and vmlinuz
> > from
> > http://download.eng.rdu2.redhat.com/rel-eng/RHEL-7.0-RC-2.0/compose/Server/x86_64/os/isolinux/,
> > and then I got a booting instance.
> >
> > No idea why the original initrd/vmlinuz didn't work, or where Foreman even
> > got them from.
>
> James, is this reproducible or was just a one time thing?

Yes, 100% reproducible, see comment #7.

> If the issue is that Foreman had an older version of the initrd/vmlinuz
> files and then you corrected the issue by using the correct versions of
> these files, then this is likely notabug?

This is a bug: Foreman is downloading the wrong vmlinuz/initrd.

> Yaniv, I'm curious about your rationale for marking this a blocker given
> James' comments in Comment #3. Based on his comments, this bug isn't even
> really confirmed, much less a genuine blocker.

Likely not a blocker, given there is a workaround.
Moving to foreman. This is core foreman functionality, not staypuft.
(In reply to James Slagle from comment #7)
> Foreman downloaded the vmlinuz/initrd file from under:
> http://download.eng.rdu2.redhat.com/rel-eng/latest-RHEL-7/compose/Server/x86_64/os/images/pxeboot/
>
> This causes the endless reboot loop.

Are the downloaded files zero bytes, or fully complete? Do the md5sums match those from the mirror?

Please supply corresponding debug logs from /var/log/foreman-proxy/proxy.log by setting :log_level: DEBUG in /etc/foreman-proxy/settings.yml.
Possible NOTABUG here. After reproducing 3 times yesterday, I can no longer reproduce it today on a new install.

I've attached the proxy.log DEBUG output. Interestingly enough, it shows an error downloading the vmlinuz and initrd. But the downloaded files actually *do* seem to work fine.

The md5sums of the downloaded files do match the ones from the server:

[root@staypuft boot]# md5sum RedHat-7.0-x86_64-vmlinuz RedHat-7.0-x86_64-initrd.img
8edbd2e995aa094b8fb850eb1b0a9399 RedHat-7.0-x86_64-vmlinuz
5960d2340c6fded06f52d06d29878025 RedHat-7.0-x86_64-initrd.img

[jslagle@sh-el6 pxeboot]$ md5sum vmlinuz initrd.img
8edbd2e995aa094b8fb850eb1b0a9399 vmlinuz
5960d2340c6fded06f52d06d29878025 initrd.img
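The manual md5sum comparison above could be scripted roughly as follows. This is only a sketch: the helper names `md5_of_file` and `mismatched_files` are hypothetical, and the expected checksums have to be obtained from the mirror separately (for example with md5sum, as in the comment above).

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(expected):
    """Given {local_path: expected_md5_hex}, return the paths whose digest differs."""
    return [
        path
        for path, want in expected.items()
        if md5_of_file(path) != want.lower()
    ]
```

An empty list from `mismatched_files` means every local file matched the checksum supplied for it.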
Created attachment 908530 [details] proxy log
Yeah, the log level shouldn't be error; that's incorrect. It does look like it's working properly now. Perhaps it was an issue with the mirror, or the download hadn't completed (it's async). Thanks for the update; will close until further notice.
I saw this also; manually downloading the images corrected the problem.
Reopening this. We have tons of users seeing this issue intermittently.
We've had a handful of staypuft users report this same issue. I think there's something deeper going on here, possibly something intermittent in the async download task causing corrupted downloads. The symptoms might differ (endless boot loop, kernel panic, etc.), but the solution always seems to be the same: manually download the vmlinuz/initrd, and then everything works.
I saw this also, where the md5sum for the vmlinuz matched the master version but the md5sum for the initrd did not. And in a prior failed attempt to install staypuft, I noticed both of these files in the tftp directory had zero length. Manually downloading the images, running chown/chgrp to foreman-proxy, and running restorecon were required to resolve it.
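The manual workaround described above (download, fix ownership, restore the SELinux context) might be written down as the following dry-run sketch, which only builds the command lines and does not run them. The helper name `workaround_commands` is hypothetical. The source directory URL is a parameter because comments in this report disagree on which directory to use. The file names and the foreman-proxy owner come from the listings in this report.

```python
import os
from urllib.parse import urljoin


def workaround_commands(source_url,
                        boot_dir="/var/lib/tftpboot/boot",
                        prefix="RedHat-7.0-x86_64"):
    """Build (but do not run) the commands for replacing the boot files by hand."""
    base = source_url if source_url.endswith("/") else source_url + "/"
    commands = []
    for name in ("vmlinuz", "initrd.img"):
        target = os.path.join(boot_dir, prefix + "-" + name)
        # -f makes curl fail on HTTP errors instead of saving an error page.
        commands.append(["curl", "-f", "-o", target, urljoin(base, name)])
        commands.append(["chown", "foreman-proxy:foreman-proxy", target])
        commands.append(["restorecon", "-v", target])
    return commands


if __name__ == "__main__":
    # example.invalid is a placeholder; substitute the real repo directory.
    for argv in workaround_commands("http://example.invalid/os/images/pxeboot/"):
        print(" ".join(argv))
```

Printing the commands first leaves the choice of whether and where to run them to the operator.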
Please provide the information about MD5 sums, file sizes and logs requested in comment #10 when it occurs. The only data provided so far was when it worked (comment #11).
I've been able to reliably reproduce this now, and I believe what triggers it is assigning a 2nd host to a host group before the vmlinuz/initrd download started by the first host-to-host-group assignment has finished. According to the foreman-proxy.log, the 2nd assignment causes an additional background task to be started to download the files. You then have multiple downloads clobbering each other. The same goes for any subsequent assignment if there is already a download running.

I've attached my foreman-proxy.log. It doesn't really show anything different from a successful download, but you can at least see it starting an additional download task before the first has even finished.

I end up with files that are much larger than they should be:

[root@staypuft boot]# pwd
/var/lib/tftpboot/boot
[root@staypuft boot]# ll -h
total 277M
-rw-r--r--. 1 foreman-proxy root 165M Jun 5 14:45 foreman-discovery-image-latest.el6.iso-img
-rw-r--r--. 1 foreman-proxy root 3.9M Jun 5 14:45 foreman-discovery-image-latest.el6.iso-vmlinuz
-rw-r--r--. 1 foreman-proxy foreman-proxy 101M May 7 03:39 RedHat-7.0-x86_64-initrd.img
-rw-r--r--. 1 foreman-proxy foreman-proxy 7.4M May 5 11:21 RedHat-7.0-x86_64-vmlinuz

The repo path I have configured is http://download.eng.rdu2.redhat.com/rel-eng/RHEL-7.0-RC-3.1/compose/Server/x86_64/os/
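To illustrate the kind of race described above, here is a minimal sketch of one common way to keep concurrent downloads of the same file from clobbering each other: take an exclusive lock per destination, write to a temporary file, and rename it into place only when complete. This is not Foreman's code and not necessarily how the issue was fixed; `fetch_once` and its `opener` parameter are hypothetical names used for illustration.

```python
import fcntl
import os
import shutil
import tempfile
import urllib.request


def fetch_once(url, dest, opener=urllib.request.urlopen):
    """Download url to dest, serializing concurrent callers for the same dest.

    Returns True if this call downloaded the file, False if a complete
    (non-empty) file was already present once the lock was acquired.
    """
    with open(dest + ".lock", "w") as lock:
        # Blocks until any other download of the same dest has finished.
        fcntl.flock(lock, fcntl.LOCK_EX)
        if os.path.exists(dest) and os.path.getsize(dest) > 0:
            return False
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(dest) or ".")
        try:
            with os.fdopen(fd, "wb") as tmp, opener(url) as response:
                shutil.copyfileobj(response, tmp)
            # Atomic on POSIX: readers see the old state or the full file.
            os.replace(tmp_path, dest)
        finally:
            # Clean up the partial file if the download or rename failed.
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
        return True
```

With this shape, a second host assignment arriving mid-download waits on the lock and then finds the finished file instead of starting a competing write. The sketch treats any non-empty existing file as complete, so it would not refresh a stale or corrupted file by itself.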
Created attachment 909601 [details] proxy log showing additional download tasks started before previous ones finish
Thanks for the data James.
This is still happening with:

# rpm -qa |grep foreman
foreman-1.6.0.21-2.el6sat.noarch
ruby193-rubygem-foreman_discovery-1.3.0-2.el6sat.noarch
foreman-postgresql-1.6.0.21-2.el6sat.noarch
foreman-proxy-1.6.0.8-1.el6sat.noarch
foreman-mysql2-1.6.0.21-2.el6sat.noarch
foreman-installer-1.5.0-0.6.RC2.el6ost.noarch
ruby193-rubygem-foreman-tasks-0.6.4-2.el6sat.noarch
rubygem-foreman_api-0.1.11-4.el6sat.noarch
openstack-foreman-installer-2.0.16-1.el6ost.noarch
foreman-selinux-1.6.0.3-2.el6sat.noarch
foreman-discovery-image-6.5-20140620.2.el6sat.noarch
ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el6ost.noarch

Basically, from poodle: http://ayanami.boston.devel.redhat.com/poodles/rhos-devel-ci/foreman.el6/2014-07-25.5
*** Bug 1102876 has been marked as a duplicate of this bug. ***
Reproduced with rhel-osp-installer-0.1.10-2.el6ost.noarch
Unable to reproduce using staypuft puddle /Foreman/2014-09-12.1

Environment:
-------------
rhel-osp-installer-0.3.4-3.el6ost.noarch
ruby193-rubygem-staypuft-0.3.4-2.el6ost.noarch
foreman-1.6.0.44-2.el6ost.noarch
foreman-installer-1.6.0-0.2.RC1.el6ost.noarch
openstack-puppet-modules-2014.1-21.8.el6ost.noarch
puppet-3.6.2-1.1.el6.noarch

(*The ticket will be reopened if the issue pops up again.)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-1350.html