Bug 1110378

Summary: vmlinuz/initrd files corrupted when multiple hosts created simultaneously
Product: Red Hat Satellite Reporter: Dominic Cleal <dcleal>
Component: Provisioning Assignee: Dmitri Dolguikh <ddolguik>
Status: CLOSED CURRENTRELEASE QA Contact: Katello QA List <katello-qa-list>
Severity: high Docs Contact:
Priority: high    
Version: 6.0.3 CC: bkearney, eglynn, jmontleo, jslagle, kbidarka, lars, lhh, lzap, mburns, ohochman, rhos-maint, yeylon
Target Milestone: Unspecified Keywords: Reopened, Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
URL: http://projects.theforeman.org/issues/6289
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
Bare-metal provisioning of multiple hosts can fail because a corrupt kernel or initrd image is downloaded. This occurs only when the hosts boot the same operating system. The capsule downloads the kernel and initrd image asynchronously, and because of a race condition an image that one host is currently fetching can be overwritten by the download triggered for another host. To prevent this, wait until the Anaconda installer has loaded on the first host before creating another host with the same operating system and version.
Story Points: ---
Clone Of: 1105594 Environment:
Last Closed: 2014-09-11 12:30:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1105594    

Comment 1 Dominic Cleal 2014-06-17 14:10:04 UTC
"
--- Additional comment from James Slagle on 2014-06-17 15:02:27 BST ---

I've been able to reliably reproduce this now, and I believe the trigger is assigning a 2nd host to a host group before the vmlinuz/initrd download started by the first host's assignment has finished.

According to the foreman-proxy.log, the 2nd assignment causes an additional background task to get started to download the files. You then have multiple downloads clobbering each other.

The same happens for any subsequent assignment if there is already a download running.

I've attached my foreman-proxy.log. It doesn't show anything different really from a successful download, but you can at least see it starting an additional download task before the first has even finished.

I end up with files that are much larger than they should be:
[root@staypuft boot]# pwd
/var/lib/tftpboot/boot
[root@staypuft boot]# ll -h
total 277M
-rw-r--r--. 1 foreman-proxy root          165M Jun  5 14:45 foreman-discovery-image-latest.el6.iso-img
-rw-r--r--. 1 foreman-proxy root          3.9M Jun  5 14:45 foreman-discovery-image-latest.el6.iso-vmlinuz
-rw-r--r--. 1 foreman-proxy foreman-proxy 101M May  7 03:39 RedHat-7.0-x86_64-initrd.img
-rw-r--r--. 1 foreman-proxy foreman-proxy 7.4M May  5 11:21 RedHat-7.0-x86_64-vmlinuz
"

Comment 4 Bryan Kearney 2014-06-19 16:04:58 UTC
Upstream bug assigned to ddolguik
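
For illustration only, the clobbering described in Comment 1 (several background tasks writing the same vmlinuz/initrd path at once) can be avoided by serialising downloads per destination and publishing the file with an atomic rename. The sketch below is in Python and is not the upstream fix; `fetch_boot_file` and the `download` callable are hypothetical names, and it assumes a POSIX system where `fcntl.flock` is available.

```python
import fcntl
import os
import tempfile


def fetch_boot_file(dest, download):
    """Fetch a boot file to `dest` so that readers never see a partial file.

    `download` is a hypothetical callable returning an iterable of byte
    chunks. Returns True if this call wrote the file, False if another
    caller had already produced it.
    """
    lock_path = dest + ".lock"
    with open(lock_path, "w") as lock:
        # Block until no other caller holds the lock for this destination.
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            if os.path.exists(dest):
                # A previous caller finished the download while we waited.
                return False
            directory = os.path.dirname(dest) or "."
            # Write into a temporary file in the same directory so that the
            # final rename stays on one filesystem and is atomic.
            fd, tmp = tempfile.mkstemp(dir=directory, prefix=".partial-")
            try:
                with os.fdopen(fd, "wb") as out:
                    for chunk in download():
                        out.write(chunk)
                os.chmod(tmp, 0o644)  # mkstemp creates the file as 0600
                os.replace(tmp, dest)
            except BaseException:
                if os.path.exists(tmp):
                    os.unlink(tmp)
                raise
            return True
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

With this shape, a second request for the same file waits for the first and then returns without starting another download, so two writers never touch the same path.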

Comment 5 Bryan Kearney 2014-06-25 16:52:36 UTC
Per discussions with QE, moving this to 6.0.4... keeping as a blocker.

Comment 6 Bryan Kearney 2014-07-07 14:05:04 UTC
Moving to POST since upstream bug http://projects.theforeman.org/issues/6289 has been closed

Comment 8 Dominic Cleal 2014-09-01 10:41:23 UTC
Verified using two browsers and a particularly slow installation medium.

Created two hosts simultaneously, saw two requests for TFTP files (x2 files) reach the proxy:

redacted - - [01/Sep/2014 06:35:14] "POST /tftp/fetch_boot_file HTTP/1.1" 200 - 0.0035
redacted - - [01/Sep/2014 06:35:14] "POST /tftp/fetch_boot_file HTTP/1.1" 200 - 0.0032
redacted - - [01/Sep/2014 06:35:17] "POST /tftp/fetch_boot_file HTTP/1.1" 200 - 0.0009
redacted - - [01/Sep/2014 06:35:17] "POST /tftp/fetch_boot_file HTTP/1.1" 200 - 0.0008

And verified the TFTP files took a while to download and weren't corrupt:

9f281e85900e73e6fe9b2422ff6ef1d2  /var/lib/tftpboot/boot/CentOS-6.5-x86_64-initrd.img
206748238490c0e50a88bc053d3d5f87  /var/lib/tftpboot/boot/CentOS-6.5-x86_64-vmlinuz
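
A check like the one above can be scripted. The following Python sketch computes an MD5 digest in chunks and compares it with a reference value; `md5_of` and `is_intact` are hypothetical helper names, and it assumes a known-good digest for the file is available from elsewhere (for example, computed from the installation medium).

```python
import hashlib


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """Compare the file's digest with a known-good reference digest."""
    return md5_of(path) == expected_md5.lower()
```

A file that was clobbered by concurrent downloads would normally differ in size and digest from the reference, so `is_intact` would return False for it.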

Comment 10 Kedar Bidarkar 2014-09-02 10:16:58 UTC
Tested with sat6-GA-snap7.

QE VERIFIED

I was able to successfully PXE boot a CentOS host using the above initrd.img and vmlinuz TFTP files, which indicates that the vmlinuz/initrd files are no longer corrupted when multiple hosts are created simultaneously.

NOTE: As the above test for multiple hosts has already been performed, I only confirmed that the TFTP files are not corrupted by PXE booting a host.



Installed Packages

    candlepin-0.9.23-1.el6_5.noarch
    candlepin-common-1.0.1-1.el6_5.noarch
    candlepin-scl-1-5.el6_4.noarch
    candlepin-scl-quartz-2.1.5-5.el6_4.noarch
    candlepin-scl-rhino-1.7R3-1.el6_4.noarch
    candlepin-scl-runtime-1-5.el6_4.noarch
    candlepin-selinux-0.9.23-1.el6_5.noarch
    candlepin-tomcat6-0.9.23-1.el6_5.noarch
    createrepo-0.9.9-21.2.pulp.el6sat.noarch
    elasticsearch-0.90.10-6.el6sat.noarch
    katello-1.5.0-30.el6sat.noarch
    katello-certs-tools-1.5.6-1.el6sat.noarch
    katello-default-ca-1.0-1.noarch
    katello-installer-0.0.64-1.el6sat.noarch
    katello-server-ca-1.0-1.noarch
    mod_wsgi-3.4-1.pulp.el6sat.x86_64
    pulp-katello-0.3-4.el6sat.noarch
    pulp-nodes-common-2.4.1-0.5.rc1.el6sat.noarch
    pulp-nodes-parent-2.4.1-0.5.rc1.el6sat.noarch
    pulp-puppet-plugins-2.4.1-0.5.rc1.el6sat.noarch
    pulp-puppet-tools-2.4.1-0.5.rc1.el6sat.noarch
    pulp-rpm-plugins-2.4.1-0.6.beta.el6sat.noarch
    pulp-selinux-2.4.1-0.5.rc1.el6sat.noarch
    pulp-server-2.4.1-0.5.rc1.el6sat.noarch
    python-gofer-qpid-1.3.0-1.el6sat.noarch
    python-isodate-0.5.0-1.pulp.el6sat.noarch
    python-kombu-3.0.15-12.pulp.el6sat.noarch
    python-pulp-bindings-2.4.1-0.5.rc1.el6sat.noarch
    python-pulp-common-2.4.1-0.5.rc1.el6sat.noarch
    python-pulp-puppet-common-2.4.1-0.5.rc1.el6sat.noarch
    python-pulp-rpm-common-2.4.1-0.6.beta.el6sat.noarch
    python-qpid-0.22-14.el6sat.noarch
    python-qpid-qmf-0.22-37.el6.x86_64
    qpid-cpp-client-0.22-42.el6.x86_64
    qpid-cpp-server-0.22-42.el6.x86_64
    qpid-cpp-server-linearstore-0.22-42.el6.x86_64
    qpid-java-client-0.22-6.el6.noarch
    qpid-java-common-0.22-6.el6.noarch
    qpid-proton-c-0.7-1.el6.x86_64
    qpid-qmf-0.22-37.el6.x86_64
    qpid-tools-0.22-12.el6.noarch
    ruby193-rubygem-katello-1.5.0-86.el6sat.noarch
    rubygem-hammer_cli_katello-0.0.4-14.el6sat.noarch
    rubygem-smart_proxy_pulp-1.0.1-1.1.el6sat.noarch

Comment 11 Bryan Kearney 2014-09-11 12:30:14 UTC
This was delivered with Satellite 6.0 which was released on 10 September 2014.