Created attachment 1217448 [details]
VMWARE EMS Refresh worker exhibiting issue

Description of problem:
EMS refresh fails with error:
=====
[----] E, [2016-10-27T23:22:52.767614 #18397:e5798c] ERROR -- : MIQ(ManageIQ::Providers::Vmware::InfraManager::Refresher#refresh) EMS: [Sacramento WLS VCenter 1], id: [50000000000004] Refresh failed
[----] E, [2016-10-27T23:22:52.768265 #18397:e5798c] ERROR -- : [TypeError]: incompatible marshal file format (can't be read) format version 4.8 required; 58.12 given Method:[rescue in block in refresh]
======

Version-Release number of selected component (if applicable): 5.6.1.2

How reproducible: unknown

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
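For readers unfamiliar with this TypeError: Ruby's Marshal.load treats the first two bytes of its input as the format version, so "58.12 given" means the data handed to it began with bytes 58 (":") and 12 rather than 4 and 8. The sketch below reproduces the same message by skipping the two header bytes of a valid payload. The symbol :refresh is only an illustrative value, not taken from the logs, and this does not establish how the bytes became misaligned in the reported failure.

```ruby
# A valid Marshal payload starts with the version bytes 4 and 8.
payload = Marshal.dump(:refresh)          # illustrative value, not from the logs
first_bytes = payload.bytes.first(4)      # version header, then ":" and the length byte
puts first_bytes.inspect

# Simulate a reader that starts two bytes late, i.e. past the version header.
shifted = payload.byteslice(2..-1)

begin
  Marshal.load(shifted)
  error_message = nil
rescue TypeError => e
  # Marshal reports the two bytes it found where the version was expected.
  error_message = e.message
end

puts error_message
```

A 7-character symbol is serialized as ":" followed by the byte 12, which is why this particular value yields 58.12; an 8-character symbol would yield 58.13.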
Created attachment 1217450 [details] most recent incident of error
This looks like it could be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1385038

It is interesting that both of the errors you attached were hit after a successful get_vc_data, when connecting back to the broker to call get_vc_data_host_scsi or get_vc_data_ems_customization_specs. I'm not sure what this means yet; I'll look through the full logs and see if I can find anything.
This might be related to the workers being killed and having their broker sessions cleaned up.

The MiqScheduleWorker is killed:
[2016-11-03T07:20:41.438335 #18428:e5798c] WARN -- : MIQ(MiqScheduleWorker#kill) Worker ID [50000001663114] PID [24231] GUID [3cb6663e-a1cf-11e6-b77e-005056827e39] has been killed

There is a DRb failure when cleaning up broker connections:
[2016-11-03T07:21:47.103457 #18428:e5798c] INFO -- : MIQ(MiqVimBrokerWorker.cleanup_for_pid) Releasing any broker connections for pid: [24231], ERROR: too large packet 67651907

Then at essentially the same time we hit the corrupt message error:
[2016-11-03T07:21:47.118597 #22879:e5798c] ERROR -- : MIQ(ManageIQ::Providers::Vmware::InfraManager::Refresher#refresh) EMS: [Sacramento WLS VCenter 1], id: [50000000000004] Refresh failed
[2016-11-03T07:21:47.120601 #22879:e5798c] ERROR -- : [TypeError]: incompatible marshal file format (can't be read) format version 4.8 required; 58.13 given Method:[rescue in block in refresh]
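One observation about the "too large packet" number, offered as a sketch rather than a diagnosis: DRb frames each message as a 4-byte big-endian length followed by Marshal data, and 67651907 written as four big-endian bytes begins with 4 and 8, the Marshal version header. That would be consistent with a reader taking the start of a Marshal payload for a length field. The frame helper below is a simplified stand-in for DRb's framing, not its actual code, and the "hello" payload is made up.

```ruby
# Simplified stand-in for DRb framing: 4-byte big-endian length, then Marshal data.
def frame(obj)
  body = Marshal.dump(obj)
  [body.bytesize].pack("N") + body
end

# The size reported in "too large packet 67651907", viewed as raw bytes.
reported_size = 67_651_907
size_bytes = [reported_size].pack("N").bytes
puts size_bytes.inspect
header_like = size_bytes.first(2) == [Marshal::MAJOR_VERSION, Marshal::MINOR_VERSION]

# Reading the length at the right offset versus four bytes too late.
stream = frame("hello")                                  # made-up payload
aligned_size = stream.byteslice(0, 4).unpack("N").first  # the real body length
misread_size = stream.byteslice(4, 4).unpack("N").first  # Marshal bytes read as a length
puts "aligned=#{aligned_size} misread=#{misread_size}"
```

With any payload, the misread length lands in the tens of millions because its top two bytes are always 4 and 8, which is the same range as the value in the log.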
Created attachment 1220456 [details]
Broker Debug Patch

With this patch the broker server will log DRb message sizes and checksums, and the broker client will print the original message when it hits a marshal error.
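The attached patch is not reproduced here. As a rough illustration of the kind of diagnostics it describes, the following standalone sketch logs the size and checksum of an outgoing message and, on a marshal error, logs the start of the raw message before re-raising. The helper names describe_message and load_with_diagnostics, the use of MD5, and the sample payload are all assumptions made for this example.

```ruby
require "digest/md5"
require "logger"
require "stringio"

# Hypothetical helper: summarize a raw message by size and MD5 checksum.
def describe_message(raw)
  "size=#{raw.bytesize} md5=#{Digest::MD5.hexdigest(raw)}"
end

# Hypothetical helper: unmarshal a message; on failure, log what was received.
def load_with_diagnostics(raw, logger)
  Marshal.load(raw)
rescue TypeError => e
  summary = e.message.lines.first.chomp
  logger.error("marshal failure: #{summary}; #{describe_message(raw)}; " \
               "head=#{raw.byteslice(0, 32).inspect}")
  raise
end

log_io = StringIO.new
logger = Logger.new(log_io)

# "Server" side: log every outgoing message at debug level.
raw = Marshal.dump({ "vm" => "example" })   # made-up payload
logger.debug("outgoing #{describe_message(raw)}")

# "Client" side: a deliberately misaligned copy triggers the error path.
corrupted = raw.byteslice(2..-1)
begin
  load_with_diagnostics(corrupted, logger)
rescue TypeError
  # Already logged above; swallowed here only to keep the example running.
end

puts log_io.string
```

Comparing the size and checksum logged by the sender with what the receiver reports is one way to tell whether a message was altered in transit or simply read from the wrong offset.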
Adam,

What instructions can you provide for applying this patch?

Tom Hennessy
Tom,

scp the file to /var/www/miq/vmdb and un-tar it, then restart evmserverd.

How much is logged depends on the vim log level. With level_vim set to debug, the size and checksum of every outgoing message are printed (this is very verbose). With the vim log level left at warn, the extra information is shown only when a marshaling issue is hit.
From the customer, who has received two hotfixes and is testing them:
=====
Most recent comment: On 2016-11-17 04:01:54, Trieu, Daniel commented:
"Hello RedHat Team,

Updating the ticket: The hotfix to address reported heartbeat failure issues associated with pglogical has been pushed to 3 (out of 8) regions. I confirmed within 30 minutes that the heartbeat issue was resolved. Out of caution, the plan is to push the hotfix to another 2 regions today and another 3 regions the day after.

To be clear, there is another hotfix on this ticket for marshal errors, which is in test/dev/uat right now and has not been pushed to any production region.

Daniel"
=====
This is the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1385038

*** This bug has been marked as a duplicate of bug 1385038 ***