Bug 1411655 - "Unrecognized message received" message in the UI
Summary: "Unrecognized message received" message in the UI
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.1.1
Target Release: 4.1.1
Assignee: Piotr Kliczewski
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-10 08:59 UTC by Nelly Credi
Modified: 2017-04-21 09:47 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-21 09:47:00 UTC
oVirt Team: Infra
Embargoed:
rule-engine: ovirt-4.1+


Attachments
log collector (18.00 MB, application/x-xz), 2017-01-10 15:30 UTC, Nelly Credi
log collector (15.49 MB, application/octet-stream), 2017-01-10 15:33 UTC, Nelly Credi

Description Nelly Credi 2017-01-10 08:59:42 UTC
Description of problem:
After adding a host that failed to configure the management network,
the following message appears in the UI every ~20 seconds:
"Unrecognized message received"

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0-0.4.master.20170109222652.git53fd6cb.el7.centos.noarch

How reproducible:


Steps to Reproduce:
1. Add a host
2. The host addition fails on configuring the management network (no ovirtmgmt)
3. A few relevant errors appear in the 'Events' tab
4. The "Unrecognized message received" message is printed every ~20 seconds

Actual results:
"Unrecognized message received" message

Expected results:
I want to see the relevant underlying issue, not a generic message

Additional info:

	
Engine events (all from origin "oVirt"):

Jan 10, 2017 10:27:39 AM  VDSM host_mixed_3 command failed: Unrecognized message received
Jan 10, 2017 10:27:16 AM  VDSM host_mixed_3 command failed: Unrecognized message received
Jan 10, 2017 10:26:53 AM  VDSM host_mixed_3 command failed: Unrecognized message received
Jan 10, 2017 10:26:30 AM  VDSM host_mixed_3 command failed: Unrecognized message received
Jan 10, 2017 10:26:07 AM  VDSM host_mixed_3 command failed: Unrecognized message received
Jan 10, 2017 10:25:44 AM  Host host_mixed_3 is non responsive.
Jan 10, 2017 10:25:43 AM  VDSM host_mixed_3 command failed: Unrecognized message received
Jan 10, 2017 10:25:21 AM  Host host_mixed_3 was autorecovered. (correlation ID 2c2fe369)
Jan 10, 2017 10:25:20 AM  VDSM host_mixed_3 command failed: Unrecognized message received
Jan 10, 2017 10:24:49 AM  Host host_mixed_3 installation failed. Failed to configure management network on the host. (correlation ID 1fdd96cb)
Jan 10, 2017 10:24:48 AM  Host host_mixed_3 is not responding. Host cannot be fenced automatically because power management for the host is disabled.
Jan 10, 2017 10:24:48 AM  Failed to configure management network on host host_mixed_3 due to setup networks failure.
Jan 10, 2017 10:24:48 AM  VDSM host_mixed_3 command failed: Heartbeat exceeded
Comment 1 Yaniv Kaul 2017-01-10 13:01:03 UTC
Please attach relevant logs.

Comment 2 Piotr Kliczewski 2017-01-10 13:06:56 UTC
Nelly, was this found in the environment with the recovery issue?

Comment 3 Piotr Kliczewski 2017-01-10 13:10:17 UTC
If this is the case, here is the cause of the failure:

2017-01-10 10:24:04,832+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-1) [39da46a2] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSRecoveringException: Recovering from crash or Initializing

The question is why a newly added host triggered the recovery flow. It seems that we may want to handle it in the NetworkConfigurer class.
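
For illustration only, a minimal, self-contained sketch of the kind of handling suggested here. The VDSRecoveringException name and the "Recovering from crash or Initializing" text come from the log line above; NetworkConfigurerSketch, HostPoller, awaitHostUp and the retry parameters are hypothetical stand-ins, not actual ovirt-engine code.

// Hypothetical sketch: retry while the host reports recovery, and surface a
// specific message instead of the generic "Unrecognized message received".
public class NetworkConfigurerSketch {

    // Stand-in for org.ovirt.engine.core.vdsbroker.vdsbroker.VDSRecoveringException
    static class VDSRecoveringException extends RuntimeException {
        VDSRecoveringException(String msg) { super(msg); }
    }

    // Stand-in for a PollVDSCommand-style call against the host
    interface HostPoller {
        void poll() throws VDSRecoveringException;
    }

    static final int MAX_ATTEMPTS = 10;       // assumed bound, for illustration
    static final long RETRY_DELAY_MS = 3_000; // assumed delay, for illustration

    static String awaitHostUp(HostPoller poller) throws InterruptedException {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                poller.poll();
                return "Host is up, proceeding with management network setup";
            } catch (VDSRecoveringException e) {
                // Specific, actionable message instead of "Unrecognized message received"
                System.out.println("Host is still recovering/initializing ("
                        + e.getMessage() + "), attempt " + attempt + "/" + MAX_ATTEMPTS);
                Thread.sleep(RETRY_DELAY_MS);
            }
        }
        return "Host did not leave recovery state; aborting management network setup";
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated host that finishes recovery after two failed polls.
        final int[] calls = {0};
        HostPoller poller = () -> {
            if (++calls[0] < 3) {
                throw new VDSRecoveringException("Recovering from crash or Initializing");
            }
        };
        System.out.println(awaitHostUp(poller));
    }
}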

Comment 4 Nelly Credi 2017-01-10 13:23:00 UTC
Yes, same environment.

Comment 5 Yaniv Kaul 2017-01-10 13:28:28 UTC
(In reply to Nelly Credi from comment #4)
> yes same env

So this is either a duplicate, or we still need the logs, please.

Comment 6 Piotr Kliczewski 2017-01-10 13:37:50 UTC
I think we need to update the BZ title to reflect what it is about. I see 2 issues:
- why the host entered recovery mode when started on a clean (was it really clean?) VM
- NetworkConfigurer seems unable to handle recovery mode properly. I wonder why the initial getCaps was OK (at least I haven't seen recovery mode in the logs).

Comment 7 Nelly Credi 2017-01-10 15:10:24 UTC
I think this is a different, unrelated bug. It was caused by the issue you are talking about, but it's a general bad-logging problem, IMO.

Comment 8 Nelly Credi 2017-01-10 15:30:34 UTC
Created attachment 1239143 [details]
log collector

Comment 9 Nelly Credi 2017-01-10 15:33:00 UTC
Created attachment 1239144 [details]
log collector

Comment 10 Piotr Kliczewski 2017-01-10 15:53:43 UTC
The audit entries, in my opinion, are just a side effect of the recovery issue.

BTW, I am not able to extract the logs from either attachment.

Comment 11 Nelly Credi 2017-01-11 07:23:55 UTC
It's a split, because the file was too big to upload as one file.

Comment 12 Piotr Kliczewski 2017-01-11 08:17:46 UTC
I saw the logs in the env, so my 2 questions from comment #6 still apply. Dan, do you know?

Comment 13 Dan Kenigsberg 2017-01-11 08:25:09 UTC
(In reply to Piotr Kliczewski from comment #6)
> I think we need to update the BZ title to reflect what it is about. I see 2
> issues:
> - why the host entered recovery mode when started on a clean (was it really
> clean?) VM

Whenever Vdsm starts, it starts in "recovery" mode. It would stay there longer if there are running VMs that it needs to recover, but due to random fluctuations it could take longer anyway.

> - NetworkConfigurer seems unable to handle recovery mode properly. I wonder
> why the initial getCaps was OK (at least I haven't seen recovery mode in
> the logs).

Could it be that Vdsm crashed after a first successful getCaps? (NetworkConfigurer should not attempt to handle this, of course)
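
A small, hypothetical illustration of the timing described above (not VDSM code; Daemon, finishRecovery, restart and call are made-up names): every verb answered while the daemon is in its startup recovery state fails with "Recovering from crash or Initializing", so a restart between a successful getCapabilities and the later setupNetworks would produce exactly the error quoted in comment 3.

// Hypothetical sketch of the start-in-recovery lifecycle described above.
public class VdsmLifecycleSketch {

    enum State { RECOVERING, UP }

    static class Daemon {
        State state = State.RECOVERING;           // every start begins in recovery

        void finishRecovery() { state = State.UP; }

        void restart() { state = State.RECOVERING; } // e.g. a crash followed by a respawn

        String call(String verb) {
            if (state == State.RECOVERING) {
                return verb + " -> error: Recovering from crash or Initializing";
            }
            return verb + " -> ok";
        }
    }

    public static void main(String[] args) {
        Daemon vdsm = new Daemon();
        vdsm.finishRecovery();
        System.out.println(vdsm.call("getCapabilities")); // ok: matches the initial getCaps
        vdsm.restart();                                   // hypothetical crash after getCaps
        System.out.println(vdsm.call("setupNetworks"));   // error: what the engine then sees
    }
}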

Comment 14 Piotr Kliczewski 2017-01-11 08:41:16 UTC
(In reply to Dan Kenigsberg from comment #13)
> (In reply to Piotr Kliczewski from comment #6)
> > I think we need to update the BZ title to reflect what it is about. I see 2
> > issues:
> > - why the host entered recovery mode when started on a clean (was it really
> > clean?) VM
> 
> Whenever Vdsm starts, it starts in "recovery" mode. It would stay there
> longer if there are running VMs that it needs to recover, but due to
> random fluctuations it could take longer anyway.

I was told it was a freshly created VM, and I wonder why we see this in the log.

> 
> > - NetworkConfigurer seems unable to handle recovery mode properly. I wonder
> > why the initial getCaps was OK (at least I haven't seen recovery mode in
> > the logs).
> 
> Could it be that Vdsm crashed after a first successful getCaps?
> (NetworkConfigurer should not attempt to handle this, of course)

The logs do not confirm it.

Comment 15 Yaniv Kaul 2017-01-11 09:38:28 UTC
(In reply to Nelly Credi from comment #11)
> its a split
> because the file was too big to upload as 1 file

Sane, relevant logs are welcome. Do you expect us to join them? Saving them in Google Drive is an option as well.

Comment 16 Nelly Credi 2017-01-11 13:05:23 UTC
Well, I don't get why the limit is 20 MB per file.
I need to find creative ways to upload logs each time;
it's annoying :-/

https://drive.google.com/open?id=0BzKxtECDfsbIMFctTGtDMUJCa0E

Comment 17 Martin Perina 2017-01-18 08:57:53 UTC
Missed the RC build, moving to 4.1.1.

Comment 18 Piotr Kliczewski 2017-02-13 14:59:56 UTC
This BZ was fixed together with jsonrpc 1.3.8 for 4.1.

Comment 19 Israel Pinto 2017-03-01 07:27:57 UTC
Verified with:
Engine: Red Hat Virtualization Manager Version: 4.1.1.2-0.1.el7
Host:
OS Version: RHEL - 7.3 - 7.el7
Kernel Version: 3.10.0 - 550.el7.x86_64
KVM Version: 2.6.0 - 28.el7_3.3.1
LIBVIRT Version: libvirt-2.0.0-10.el7_3.5
VDSM Version: vdsm-4.19.6-1.el7ev
SPICE Version: 0.12.4 - 20.el7_3

Steps:
1. Add new host to cluster
2. Watch logs and event tab

Results:
Host added and no error messages in the logs.

