Bug 1318019

Summary: PXE provision booting up with missing pxe boot entry
Product: Red Hat CloudForms Management Engine Reporter: Josh Carter <jocarter>
Component: Provisioning Assignee: Brandon Dunne <bdunne>
Status: CLOSED ERRATA QA Contact: Shveta <sshveta>
Severity: high Docs Contact:
Priority: unspecified    
Version: 5.5.0 CC: bdunne, cpelland, fdewaley, gmccullo, jhardy, jprause, mfeifer, mkanoor, obarenbo, sshveta, tfitzger
Target Milestone: GA Keywords: ZStream
Target Release: 5.6.0   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: 5.6.0.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1318436 Environment:
Last Closed: 2016-06-29 15:42:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1318436    
Attachments:
Description Flags
automation.log none

Description Josh Carter 2016-03-15 18:55:56 UTC
Description of problem:

Case Summary

"the pxe boot entry is deleted after 30 seconds, before the vm can even complete its boot (vmware under high load when that happens, not always the same though)

pxe configured from infrastructure > vm

high probability that there is a problem with the provision checks that run from the fact that the pxe has been run (after 30 seconds the kickstart can't even have run; there is no OS installed on the system, so the system does not have the time to run through the installation of the OS and reach the wget callback to cloudforms).

cloudforms keeps waiting, reporting that the service is awaiting provision and to "retry", which, if it actually waits for the kickstart callback, would explain the problem... The email error is then just a side problem that we should not focus on in this case; it has no direct tie to what we are looking into.

the issue happens more often when HA is enabled in vmware, but not only then (it also happens when it isn't)

placement is not automatic, and set to go to a host known to not be highly loaded in the current setup. When we try to reproduce the issue it is moved to a highly used environment." 

Version-Release number of selected component (if applicable): 5.5.2


Comment 5 CFME Bot 2016-03-16 21:10:58 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/568b380ff83ed31c80a5f137dba13bbe40fd6bd2

commit 568b380ff83ed31c80a5f137dba13bbe40fd6bd2
Author:     Brandon Dunne <bdunne>
AuthorDate: Tue Mar 15 16:33:23 2016 -0400
Commit:     Brandon Dunne <bdunne>
CommitDate: Wed Mar 16 11:20:46 2016 -0400

    Extract #powered_on_in_provider? in preparation for refactoring
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1318019

 .../providers/redhat/infra_manager/provision/state_machine.rb       | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comment 6 CFME Bot 2016-03-16 21:11:02 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/5e5e3306bb1ed5ae7e5237b9a180a5d1506c245b

commit 5e5e3306bb1ed5ae7e5237b9a180a5d1506c245b
Author:     Brandon Dunne <bdunne>
AuthorDate: Tue Mar 15 17:21:36 2016 -0400
Commit:     Brandon Dunne <bdunne>
CommitDate: Wed Mar 16 11:20:58 2016 -0400

    Wait for the VM to be powered on in the provider
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1318019

 .../redhat/infra_manager/provision/state_machine.rb         | 13 -------------
 .../vmware/infra_manager/provision/state_machine.rb         |  4 ++++
 .../vmware/infra_manager/provision_via_pxe/state_machine.rb |  5 +----
 app/models/miq_provision/state_machine.rb                   | 13 +++++++++++++
 4 files changed, 18 insertions(+), 17 deletions(-)
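
The idea named in this commit message, waiting until the provider itself reports the VM as powered on before moving to later provisioning steps, can be illustrated with a small standalone sketch. This is not the ManageIQ implementation: only the predicate name powered_on_in_provider? comes from the commit messages on this bug. The classes FakeProvider and PxeProvisionSketch, the step names, the max_polls limit, and the ordering of the later steps are all hypothetical and exist only for illustration.

```ruby
# Hypothetical stand-in for a provider API; it only reports a power state.
class FakeProvider
  def initialize(states)
    @states = states.dup
  end

  # Returns the next reported power state; the last state repeats forever.
  def power_state
    @states.size > 1 ? @states.shift : @states.first
  end
end

# Minimal sketch (assumption, not the real state machine) of a provisioning
# flow that refuses to advance until the provider reports the VM as on.
class PxeProvisionSketch
  attr_reader :log

  def initialize(provider, max_polls: 10)
    @provider  = provider
    @max_polls = max_polls
    @log       = []
  end

  # Predicate name taken from the commit message; the body is an assumption.
  def powered_on_in_provider?
    @provider.power_state == "on"
  end

  def run
    @log << :create_pxe_entry
    @log << :start_vm

    # Poll the provider instead of assuming the VM is up after a fixed delay.
    polls = 0
    until powered_on_in_provider?
      polls += 1
      raise "VM never powered on" if polls >= @max_polls
      @log << :requeue
    end

    # Only now continue with the steps that depend on the VM having booted.
    @log << :post_power_on_steps
    @log << :delete_pxe_entry
    @log
  end
end

provider = FakeProvider.new(%w[off off on])
result   = PxeProvisionSketch.new(provider).run
```

In this sketch the PXE entry is removed only after the provider has confirmed power-on, so a slow power-on under load results in extra requeues rather than an early cleanup. A real state machine would requeue asynchronously rather than loop in place.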

Comment 7 CFME Bot 2016-03-16 21:11:07 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/cf2266d14fff4cead426edb3fc4a9fb4d15d4cac

commit cf2266d14fff4cead426edb3fc4a9fb4d15d4cac
Author:     Brandon Dunne <bdunne>
AuthorDate: Wed Mar 16 11:20:01 2016 -0400
Commit:     Brandon Dunne <bdunne>
CommitDate: Wed Mar 16 11:21:04 2016 -0400

    Move polling destination power status specs to shared examples
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1318019

 spec/factories/miq_provision_vmware.rb             |  5 ++++
 .../infra_manager/provision/state_machine_spec.rb  | 35 +---------------------
 .../infra_manager/provision/state_machine_spec.rb  | 16 ++++++++++
 .../provision_via_pxe/state_machine_spec.rb        | 16 ++++++++++
 ...ared_examples_for_provisioning_state_machine.rb | 35 ++++++++++++++++++++++
 5 files changed, 73 insertions(+), 34 deletions(-)
 create mode 100644 spec/models/manageiq/providers/vmware/infra_manager/provision/state_machine_spec.rb
 create mode 100644 spec/models/manageiq/providers/vmware/infra_manager/provision_via_pxe/state_machine_spec.rb

Comment 8 CFME Bot 2016-03-16 21:11:11 UTC
New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/1a6747c78e45a12b17f2fe88ab990e05c45e5859

commit 1a6747c78e45a12b17f2fe88ab990e05c45e5859
Author:     Brandon Dunne <bdunne>
AuthorDate: Tue Mar 15 17:22:30 2016 -0400
Commit:     Brandon Dunne <bdunne>
CommitDate: Wed Mar 16 11:21:01 2016 -0400

    Wait for the VM to be powered off in the provider
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1318019

 .../providers/redhat/infra_manager/provision/state_machine.rb  | 10 ----------
 .../providers/vmware/infra_manager/provision/state_machine.rb  |  4 ++++
 app/models/miq_provision/state_machine.rb                      | 10 ++++++++++
 3 files changed, 14 insertions(+), 10 deletions(-)

Comment 11 CFME Bot 2016-03-17 16:01:25 UTC
New commit detected on cfme/5.5.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=dd89b827fae79e665d0ef4ce5803c9eecedaaa25

commit dd89b827fae79e665d0ef4ce5803c9eecedaaa25
Author:     Brandon Dunne <bdunne>
AuthorDate: Tue Mar 15 17:21:36 2016 -0400
Commit:     Brandon Dunne <bdunne>
CommitDate: Wed Mar 16 17:32:11 2016 -0400

    Wait for the VM to be powered on in the provider
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1318019

 .../redhat/infra_manager/provision/state_machine.rb         | 13 -------------
 .../vmware/infra_manager/provision/state_machine.rb         |  4 ++++
 .../vmware/infra_manager/provision_via_pxe/state_machine.rb |  5 +----
 app/models/miq_provision/state_machine.rb                   | 13 +++++++++++++
 4 files changed, 18 insertions(+), 17 deletions(-)

Comment 12 CFME Bot 2016-03-17 16:01:29 UTC
New commit detected on cfme/5.5.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=97926ee7a08e40597980be2069da482803469912

commit 97926ee7a08e40597980be2069da482803469912
Author:     Brandon Dunne <bdunne>
AuthorDate: Tue Mar 15 17:22:30 2016 -0400
Commit:     Brandon Dunne <bdunne>
CommitDate: Wed Mar 16 17:32:12 2016 -0400

    Wait for the VM to be powered off in the provider
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1318019

 .../providers/redhat/infra_manager/provision/state_machine.rb  | 10 ----------
 .../providers/vmware/infra_manager/provision/state_machine.rb  |  4 ++++
 app/models/miq_provision/state_machine.rb                      | 10 ++++++++++
 3 files changed, 14 insertions(+), 10 deletions(-)

Comment 13 CFME Bot 2016-03-17 16:01:34 UTC
New commit detected on cfme/5.5.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=c0fee3b009a9931ccd1451f93ee1d7d58bc59ac6

commit c0fee3b009a9931ccd1451f93ee1d7d58bc59ac6
Author:     Brandon Dunne <bdunne>
AuthorDate: Thu Mar 17 09:48:26 2016 -0400
Commit:     Brandon Dunne <bdunne>
CommitDate: Thu Mar 17 09:48:26 2016 -0400

    Move polling destination power status specs to shared examples
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1318019

 spec/factories/miq_provision_vmware.rb             |  5 +++
 .../infra_manager/provision/state_machine_spec.rb  | 47 ++--------------------
 .../infra_manager/provision/state_machine_spec.rb  | 18 +++++++++
 .../provision_via_pxe/state_machine_spec.rb        | 18 +++++++++
 ...ared_examples_for_provisioning_state_machine.rb | 46 +++++++++++++++++++++
 5 files changed, 90 insertions(+), 44 deletions(-)
 create mode 100644 spec/models/manageiq/providers/vmware/infra_manager/provision/state_machine_spec.rb
 create mode 100644 spec/models/manageiq/providers/vmware/infra_manager/provision_via_pxe/state_machine_spec.rb
 create mode 100644 spec/support/examples_group/shared_examples_for_provisioning_state_machine.rb

Comment 14 CFME Bot 2016-03-17 16:01:38 UTC
New commit detected on cfme/5.5.z:
https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=55924864b5bdc392112b681f21b6867fd6e16907

commit 55924864b5bdc392112b681f21b6867fd6e16907
Author:     Brandon Dunne <bdunne>
AuthorDate: Tue Mar 15 16:33:23 2016 -0400
Commit:     Brandon Dunne <bdunne>
CommitDate: Wed Mar 16 17:32:11 2016 -0400

    Extract #powered_on_in_provider? in preparation for refactoring
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1318019

 .../providers/redhat/infra_manager/provision/state_machine.rb       | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comment 17 Shveta 2016-04-17 21:28:22 UTC
Created attachment 1148082 [details]
automation.log

I tried to verify this bug by PXE provisioning, but it failed with the error: "The operation is not allowed in the current state.')]"
The automation logs are attached.

Comment 18 Brandon Dunne 2016-04-18 14:11:28 UTC
Shveta,

The automation log just tells us that something went wrong.  We would need the evm.log and possibly the provider log to understand what went wrong.

Comment 19 Shveta 2016-04-18 20:33:06 UTC
Tried on another system.
PXE provision succeeded.
Verified in 5.6.0.1-beta2.20160413141124_e25ac0e

Comment 21 errata-xmlrpc 2016-06-29 15:42:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1348