Bug 1385038 - Error provisioning VM, incompatible marshal file format
Summary: Error provisioning VM, incompatible marshal file format
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Provisioning
Version: 5.6.0
Hardware: All
OS: All
Target Milestone: GA
Target Release: 5.10.0
Assignee: Adam Grare
QA Contact: Dave Johnson
Whiteboard: vsphere:vm:provision
Duplicates: 1392047
Depends On:
Blocks: 1472481 1481378 1481677 1542735
Reported: 2016-10-14 15:28 UTC by myoder
Modified: 2021-06-10 11:36 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1472481 1481378 1481677 1542735
Last Closed: 2018-08-01 02:48:44 UTC
Category: ---
Cloudforms Team: CFME Core
Target Upstream Version:
myoder: needinfo+

Description myoder 2016-10-14 15:28:44 UTC
Description of problem: When provisioning, the VM clones successfully from the template but never powers on. CloudForms shows the error: Error: incompatible marshal file format (can't be read) format version 4.8 required; 0.0 given

Version-Release number of selected component (if applicable):

How reproducible: Only happens every few provision requests

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Comment 2 Greg McCullough 2016-10-17 12:46:30 UTC
Please provide additional details, like what provider and provider version is being used and if possible any related logging.

Can you elaborate on the statement "Only happens every few provision requests"?  Does the same template show the issue randomly, or is it random across all templates?

Comment 3 myoder 2016-10-17 21:54:40 UTC
The provider is vCenter 6 update 2.

I have inquired about the statement "Only happens every few provision requests" and will report once I hear back.

Comment 6 Adam Grare 2016-10-20 15:24:39 UTC
There are a lot of DRb connection issues that manifest in different ways: I see provision errors, refresh errors, and metrics collection errors.

A small summary of errors:
ERROR -- : [DRb::DRbConnError]: too large packet 67651907  Method:[rescue in perf_collect_metrics]
ERROR -- : [TypeError]: incompatible marshal file format (can't be read)

[Danbury] [Broker]  is VC, API version: true
Error: [undefined method `queryProviderSummary' for "6.0":VimString]

I don't see any errors from the broker worker in these logs, but could you enable vim debug and upload the vim.log so we can see whether the broker is hiccuping while these DRb errors are happening, or whether it is just DRb?

Another case (https://bugzilla.redhat.com/show_bug.cgi?id=1384113) mentions broker errors likely caused by memory issues. Could that be causing this as well?
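
For readers following the two errors quoted in the summary above, here is a minimal sketch of how each can arise when a reader starts decoding at the wrong offset in a DRb-style stream. It assumes only the standard framing (a 4-byte big-endian length followed by Marshal data); the helper name `drb_style_frame` is made up for illustration. Re-encoding the logged "packet size" 67651907 as four bytes gives 04 08 49 43, which begins with the Marshal 4.8 header. That is consistent with a reader consuming the start of a Marshal body as a length prefix, although nobody in this thread states that this is what happened.

```ruby
# Hypothetical helper: frame an object as a 4-byte big-endian length
# followed by its Marshal dump.
def drb_style_frame(obj)
  body = Marshal.dump(obj)
  [body.bytesize].pack("N") + body
end

frame = drb_style_frame("reply")
declared_size   = frame.byteslice(0, 4).unpack1("N")   # length prefix
marshal_version = frame.byteslice(4, 2).bytes          # [4, 8] => format 4.8

# Re-encode the size from the log as the 4 bytes a reader would have consumed.
logged_size_bytes = [67_651_907].pack("N").bytes       # [4, 8, 73, 67]

# Loading data whose first two bytes are zero reproduces the TypeError text.
marshal_error =
  begin
    Marshal.load("\x00\x00\x00\x00".b)
    nil
  rescue TypeError => e
    e.message
  end

puts "declared size:     #{declared_size}"
puts "marshal version:   #{marshal_version.join('.')}"
puts "logged size bytes: #{logged_size_bytes.inspect}"
puts "marshal error:     #{marshal_error.inspect}"
```

Both symptoms are what a client would see if it read bytes from a reply it was not aligned with: either the length prefix or the Marshal header ends up being taken from the wrong part of the stream.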

Comment 7 Leo Khomenko 2016-11-01 15:02:58 UTC
What is the provider reply during this issue? What is the status of the provisioning (clone) task?

Comment 8 Adam Grare 2016-11-17 18:44:21 UTC
Can you have the customer apply the debug hotfix from https://bugzilla.redhat.com/show_bug.cgi?id=1392047 ?

Comment 10 Adam Grare 2016-11-21 14:05:37 UTC
*** Bug 1392047 has been marked as a duplicate of this bug. ***

Comment 11 Adam Grare 2016-12-05 13:25:48 UTC
Hey Michael, can you find out if the customer has hit this again yet?

Comment 15 Sachin 2017-04-27 11:27:47 UTC
Let me know if you want CU to reproduce the error. I can definitely ask CU to do so.

Comment 29 CFME Bot 2017-07-18 19:24:00 UTC
New commit detected on ManageIQ/vmware_web_service/master:

commit b4548c269d09a9b621b411197e9791770cf669e8
Author:     Adam Grare <agrare@redhat.com>
AuthorDate: Tue Jul 18 15:10:58 2017 -0400
Commit:     Adam Grare <agrare@redhat.com>
CommitDate: Tue Jul 18 15:13:34 2017 -0400

    Add marshal version and drb message size checks
    Add diagnostic code to dump out the buffer and object sent from the
    broker if either the DRbMessage size or Marshal version are invalid.

 lib/VMwareWebService/DMiqVimSync.rb | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

Comment 30 Adam Grare 2017-07-18 19:27:30 UTC
Added additional logging on the server for the case where either the DRbMessage size or the Marshal version is invalid.  This will let us catch the error on the server side so that we can dump the buffer and the object, which will give us more information about what happened.

This is still diagnostic code; it is not expected to fix the issue.
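
As a rough illustration of the kind of check described above, the sketch below validates a framed message and returns a list of problems plus a hex dump of its first bytes. It is not the code from the commit: `frame_problems`, its `load_limit:` parameter, and the message wording are all assumptions made for this example.

```ruby
# Expected first two bytes of any Marshal payload (4.8 on current Rubies).
MARSHAL_HEADER = [Marshal::MAJOR_VERSION, Marshal::MINOR_VERSION].pack("C2").freeze

# Hypothetical helper: inspect a length-prefixed Marshal frame and report
# anything that would trip the receiver.
def frame_problems(frame, load_limit:)
  frame = frame.b                              # compare as raw bytes
  return ["frame shorter than the 4-byte length prefix"] if frame.bytesize < 4

  size = frame.byteslice(0, 4).unpack1("N")
  body = frame.byteslice(4, frame.bytesize - 4)

  problems = []
  problems << "declared size #{size} exceeds limit #{load_limit}" if size > load_limit
  problems << "marshal header is not #{MARSHAL_HEADER.bytes.join('.')}" unless body.start_with?(MARSHAL_HEADER)
  problems
end

# Hex dump of the first bytes, useful when logging a bad frame.
def hex_prefix(frame, count = 16)
  frame.b.byteslice(0, count).unpack1("H*")
end

good_body  = Marshal.dump("vm" => "ok")
good_frame = [good_body.bytesize].pack("N") + good_body
bad_frame  = good_body                         # length prefix missing: reader is misaligned

puts frame_problems(good_frame, load_limit: 1024).inspect
puts frame_problems(bad_frame, load_limit: 1024).inspect
puts hex_prefix(bad_frame)
```

A misaligned frame typically fails both checks at once, which matches the pairing of "too large packet" and "incompatible marshal file format" errors reported earlier in this bug.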

Comment 44 CFME Bot 2017-07-28 16:18:21 UTC
New commit detected on ManageIQ/manageiq-content/master:

commit 4bda33b6359f9564894d8988772d6f02d5f242ff
Author:     mkanoor <mkanoor@redhat.com>
AuthorDate: Fri Jul 28 10:07:38 2017 -0400
Commit:     mkanoor <mkanoor@redhat.com>
CommitDate: Fri Jul 28 10:07:38 2017 -0400

    Log the full error message from the provider
    There was some confusion in this ticket if the error was being
    generated by Automate or if it came from the provider. If we log the
    full error message from the provider it would help debugging.

 .../StateMachines/Methods.class/__methods__/check_provisioned.rb         | 1 +
 .../StateMachines/Methods.class/__methods__/check_provisioned.rb         | 1 +
 2 files changed, 2 insertions(+)

Comment 59 CFME Bot 2017-10-24 13:28:42 UTC
New commit detected on ManageIQ/vmware_web_service/master:

commit 760374d5a76caa2d53ba99ff314004b446eb36fa
Author:     Adam Grare <agrare@redhat.com>
AuthorDate: Thu Oct 12 09:51:37 2017 -0400
Commit:     Adam Grare <agrare@redhat.com>
CommitDate: Mon Oct 23 15:54:16 2017 -0400

    Add client side DrbMessage#load checks
    Add checks for TypeError and packet too large and print out the raw
    buffer from the socket.

 lib/VMwareWebService/MiqVimBroker.rb   | 2 ++
 lib/VMwareWebService/MiqVimDrbDebug.rb | 2 ++
 2 files changed, 4 insertions(+)

Comment 69 CFME Bot 2018-02-06 18:21:58 UTC
New commit detected on ManageIQ/manageiq/master:

commit 271ae4023aa67b5d6a4e57d9decebf14479151a2
Author:     Adam Grare <agrare@redhat.com>
AuthorDate: Mon Feb 5 17:03:01 2018 -0500
Commit:     Adam Grare <agrare@redhat.com>
CommitDate: Tue Feb 6 11:04:17 2018 -0500

    Close open connections from parent after fork
    DRb::DRbConn keeps a global pool of open connections which is shared by
    child processes when they are forked from a parent.  If this parent
    executes a DRb call prior to forking a child process the child picks up
    this open connection and uses it which can cause replies from the server
    to go to the wrong DRb client.
    There is a long standing ruby bug https://bugs.ruby-lang.org/issues/2718
    which describes the issue and has reproducer code attached.
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1385038

 app/models/miq_worker.rb | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
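
The mechanism in the commit message can be shown without DRb at all. The sketch below is an analogy using a plain `UNIXSocket` pair, not the code added to miq_worker.rb: the parent sends a request before forking, the child inherits the same open socket, and the reply meant for the parent is read by the child. Closing the inherited socket in the child right after the fork avoids this. The method name `reply_seen_by_child` and its `close_inherited:` flag are made up for this example, and it requires a platform where `fork` is available.

```ruby
require "socket"

# Returns [what the child read from the shared connection, what the parent read].
def reply_seen_by_child(close_inherited:)
  client, server = UNIXSocket.pair
  client.write("request from parent\n")     # parent makes a call before forking
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    server.close
    if close_inherited
      client.close                          # mitigation: drop the inherited connection
    else
      writer.write(client.gets)             # child consumes the parent's reply
    end
    writer.close
    exit!(0)
  end

  writer.close
  server.gets                               # "server" receives the parent's request
  server.write("reply for parent\n")
  seen_by_child = reader.read               # empty string if the child read nothing
  Process.wait(pid)
  seen_by_parent = close_inherited ? client.gets : nil
  [seen_by_child, seen_by_parent]
ensure
  [client, server, reader].each { |io| io&.close unless io&.closed? }
end

puts reply_seen_by_child(close_inherited: false).inspect
puts reply_seen_by_child(close_inherited: true).inspect
```

In the first call the reply ends up in the child; in the second the child has closed its copy of the socket, so the reply reaches the parent as intended.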
