Bug 1110378 - vmlinuz/initrd files corrupted when multiple hosts created simultaneously
Summary: vmlinuz/initrd files corrupted when multiple hosts created simultaneously
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Provisioning
Version: 6.0.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: Unspecified
Assignee: Dmitri Dolguikh
QA Contact: Katello QA List
URL: http://projects.theforeman.org/issues...
Whiteboard:
Depends On:
Blocks: 1105594
Reported: 2014-06-17 14:09 UTC by Dominic Cleal
Modified: 2016-04-26 16:27 UTC

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Bare-metal provisioning of multiple hosts can fail because a corrupt kernel or initrd image is downloaded. This occurs only when the hosts boot the same operating system. The capsule downloads the kernel and initrd image asynchronously, and because of a race condition, an image that Anaconda is currently downloading can be overwritten by the download started for another host. To prevent this, wait until the Anaconda installer has loaded on each host before creating another host with the same operating system and version.
Clone Of: 1105594
Environment:
Last Closed: 2014-09-11 12:30:14 UTC
Target Upstream Version:



Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 6289 0 Normal Closed vmlinuz/initrd files corrupted during when multiple hosts created simultaneously 2020-07-08 01:56:47 UTC

Comment 1 Dominic Cleal 2014-06-17 14:10:04 UTC
"
--- Additional comment from James Slagle on 2014-06-17 15:02:27 BST ---

I have been able to reliably reproduce this now, and I believe it is triggered by assigning a second host to a host group before the vmlinuz/initrd download started by the first host's assignment has finished.

According to foreman-proxy.log, the second assignment starts an additional background task to download the files. You then have multiple downloads clobbering each other.

The same happens for any subsequent assignment made while a download is already running.

I've attached my foreman-proxy.log. It doesn't show anything different really from a successful download, but you can at least see it starting an additional download task before the first has even finished.

I end up with files that are much larger than they should be:
[root@staypuft boot]# pwd
/var/lib/tftpboot/boot
[root@staypuft boot]# ll -h
total 277M
-rw-r--r--. 1 foreman-proxy root          165M Jun  5 14:45 foreman-discovery-image-latest.el6.iso-img
-rw-r--r--. 1 foreman-proxy root          3.9M Jun  5 14:45 foreman-discovery-image-latest.el6.iso-vmlinuz
-rw-r--r--. 1 foreman-proxy foreman-proxy 101M May  7 03:39 RedHat-7.0-x86_64-initrd.img
-rw-r--r--. 1 foreman-proxy foreman-proxy 7.4M May  5 11:21 RedHat-7.0-x86_64-vmlinuz
"
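
The race described above can be illustrated with a minimal sketch. This is not the upstream Foreman fix, only one common way to keep concurrent requests for the same boot file from clobbering each other: serialize requests per destination path, write to a temporary file, and rename it into place. The function name fetch_boot_file is borrowed from the proxy endpoint for readability; the download callable, the lock registry, and the file mode are assumptions made for illustration.

```python
import os
import tempfile
import threading

# Hypothetical registry: one lock per destination path, so a second request
# for the same file waits instead of starting a competing download.
_locks = {}
_locks_guard = threading.Lock()


def _lock_for(dest):
    with _locks_guard:
        return _locks.setdefault(dest, threading.Lock())


def fetch_boot_file(dest, download):
    """Write the chunks produced by download() to dest without clobbering.

    `download` stands in for the real transfer: any callable returning an
    iterable of byte chunks. Returns True if this call fetched the file,
    False if an earlier request had already completed it.
    """
    with _lock_for(dest):
        if os.path.exists(dest):
            return False
        # Write to a temporary file in the same directory, then rename it
        # into place. os.replace is atomic on POSIX within one filesystem,
        # so a reader never sees a partially written file.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dest) or ".")
        try:
            with os.fdopen(fd, "wb") as out:
                for chunk in download():
                    out.write(chunk)
            # mkstemp creates the file as 0600; 0644 is an assumed mode
            # that lets a separate TFTP daemon read it.
            os.chmod(tmp, 0o644)
            os.replace(tmp, dest)
        except BaseException:
            os.unlink(tmp)
            raise
        return True
```

With this shape, a second host assignment that arrives while the first download is still running blocks on the lock and then finds the finished file, rather than starting another writer on the same path. The lock only covers threads within one process; separate proxy processes would need a file lock instead.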

Comment 4 Bryan Kearney 2014-06-19 16:04:58 UTC
Upstream bug assigned to ddolguik@redhat.com

Comment 5 Bryan Kearney 2014-06-25 16:52:36 UTC
Per discussions with QE, moving this to 6.0.4... keeping as a blocker.

Comment 6 Bryan Kearney 2014-07-07 14:05:04 UTC
Moving to POST since upstream bug http://projects.theforeman.org/issues/6289 has been closed

Comment 8 Dominic Cleal 2014-09-01 10:41:23 UTC
Verified using two browsers and a particularly slow installation medium.

Created two hosts simultaneously, saw two requests for TFTP files (x2 files) reach the proxy:

redacted - - [01/Sep/2014 06:35:14] "POST /tftp/fetch_boot_file HTTP/1.1" 200 - 0.0035
redacted - - [01/Sep/2014 06:35:14] "POST /tftp/fetch_boot_file HTTP/1.1" 200 - 0.0032
redacted - - [01/Sep/2014 06:35:17] "POST /tftp/fetch_boot_file HTTP/1.1" 200 - 0.0009
redacted - - [01/Sep/2014 06:35:17] "POST /tftp/fetch_boot_file HTTP/1.1" 200 - 0.0008

And verified the TFTP files took a while to download and weren't corrupt:

9f281e85900e73e6fe9b2422ff6ef1d2  /var/lib/tftpboot/boot/CentOS-6.5-x86_64-initrd.img
206748238490c0e50a88bc053d3d5f87  /var/lib/tftpboot/boot/CentOS-6.5-x86_64-vmlinuz
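
For anyone repeating this check, the comparison can be scripted. The following is a minimal sketch, assuming the reference digests are obtained separately (for example, by checksumming the same files on the installation medium); the helper names md5sum and find_mismatches are hypothetical and not part of any Satellite or Foreman tooling.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, reading in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected):
    """Given {path: expected_hex_digest}, return {path: actual_digest}
    for every file whose checksum differs from the expected value."""
    mismatches = {}
    for path, want in expected.items():
        got = md5sum(path)
        if got != want.lower():
            mismatches[path] = got
    return mismatches
```

An empty result from find_mismatches means every listed file matches its reference digest. MD5 is used here only because it is what the output above shows; it detects accidental corruption but is not suitable as a security check.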

Comment 10 Kedar Bidarkar 2014-09-02 10:16:58 UTC
Tested with sat6-GA-snap7.

QE VERIFIED

I was able to successfully PXE boot a CentOS host using the above initrd.img and vmlinuz TFTP files, which indicates that the vmlinuz/initrd files are no longer corrupted when multiple hosts are created simultaneously.

NOTE: As the above test for multiple hosts has already been performed, I only confirmed that the TFTP files are not corrupted by PXE booting a host.



Installed Packages

    candlepin-0.9.23-1.el6_5.noarch
    candlepin-common-1.0.1-1.el6_5.noarch
    candlepin-scl-1-5.el6_4.noarch
    candlepin-scl-quartz-2.1.5-5.el6_4.noarch
    candlepin-scl-rhino-1.7R3-1.el6_4.noarch
    candlepin-scl-runtime-1-5.el6_4.noarch
    candlepin-selinux-0.9.23-1.el6_5.noarch
    candlepin-tomcat6-0.9.23-1.el6_5.noarch
    createrepo-0.9.9-21.2.pulp.el6sat.noarch
    elasticsearch-0.90.10-6.el6sat.noarch
    katello-1.5.0-30.el6sat.noarch
    katello-certs-tools-1.5.6-1.el6sat.noarch
    katello-default-ca-1.0-1.noarch
    katello-installer-0.0.64-1.el6sat.noarch
    katello-server-ca-1.0-1.noarch
    mod_wsgi-3.4-1.pulp.el6sat.x86_64
    pulp-katello-0.3-4.el6sat.noarch
    pulp-nodes-common-2.4.1-0.5.rc1.el6sat.noarch
    pulp-nodes-parent-2.4.1-0.5.rc1.el6sat.noarch
    pulp-puppet-plugins-2.4.1-0.5.rc1.el6sat.noarch
    pulp-puppet-tools-2.4.1-0.5.rc1.el6sat.noarch
    pulp-rpm-plugins-2.4.1-0.6.beta.el6sat.noarch
    pulp-selinux-2.4.1-0.5.rc1.el6sat.noarch
    pulp-server-2.4.1-0.5.rc1.el6sat.noarch
    python-gofer-qpid-1.3.0-1.el6sat.noarch
    python-isodate-0.5.0-1.pulp.el6sat.noarch
    python-kombu-3.0.15-12.pulp.el6sat.noarch
    python-pulp-bindings-2.4.1-0.5.rc1.el6sat.noarch
    python-pulp-common-2.4.1-0.5.rc1.el6sat.noarch
    python-pulp-puppet-common-2.4.1-0.5.rc1.el6sat.noarch
    python-pulp-rpm-common-2.4.1-0.6.beta.el6sat.noarch
    python-qpid-0.22-14.el6sat.noarch
    python-qpid-qmf-0.22-37.el6.x86_64
    qpid-cpp-client-0.22-42.el6.x86_64
    qpid-cpp-server-0.22-42.el6.x86_64
    qpid-cpp-server-linearstore-0.22-42.el6.x86_64
    qpid-java-client-0.22-6.el6.noarch
    qpid-java-common-0.22-6.el6.noarch
    qpid-proton-c-0.7-1.el6.x86_64
    qpid-qmf-0.22-37.el6.x86_64
    qpid-tools-0.22-12.el6.noarch
    ruby193-rubygem-katello-1.5.0-86.el6sat.noarch
    rubygem-hammer_cli_katello-0.0.4-14.el6sat.noarch
    rubygem-smart_proxy_pulp-1.0.1-1.1.el6sat.noarch

Comment 11 Bryan Kearney 2014-09-11 12:30:14 UTC
This was delivered with Satellite 6.0 which was released on 10 September 2014.

