1411655 – "Unrecognized message received" message in the UI

Bug 1411655 - "Unrecognized message received" message in the UI

Summary: "Unrecognized message received" message in the UI

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	ovirt-engine
Classification:	oVirt
Component:	Backend.Core
Sub Component:
Version:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	ovirt-4.1.1
Target Release:	4.1.1
Assignee:	Piotr Kliczewski
QA Contact:	Israel Pinto
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2017-01-10 08:59 UTC by Nelly Credi
Modified:	2017-04-21 09:47 UTC (History)
CC List:	6 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2017-04-21 09:47:00 UTC
oVirt Team:	Infra
Embargoed:
Dependent Products:
Flags:	rule-engine: ovirt-4.1+

Attachments	(Terms of Use)
log collector (18.00 MB, application/x-xz) 2017-01-10 15:30 UTC, Nelly Credi	no flags	Details
log collector (15.49 MB, application/octet-stream) 2017-01-10 15:33 UTC, Nelly Credi	no flags	Details
View All

Description Nelly Credi 2017-01-10 08:59:42 UTC

Description of problem:
after add host which failed to configure management network,
we see every 20sec this message:
"Unrecognized message received" message in the UI

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0-0.4.master.20170109222652.git53fd6cb.el7.centos.noarch

How reproducible:

Steps to Reproduce:
1. add host
2. add hosts failed on configure mgmt network (no ovirtmgmt)
3. there are a few relevant errors in the 'events' tab
4. "Unrecognized message received" message is printed every ~20sec

Actual results:
"Unrecognized message received" message

Expected results:
I want to see the relevant issue

Additional info:

Jan 10, 2017 10:27:39 AM

VDSM host_mixed_3 command failed: Unrecognized message received

oVirt

Jan 10, 2017 10:27:16 AM

VDSM host_mixed_3 command failed: Unrecognized message received

oVirt

Jan 10, 2017 10:26:53 AM

VDSM host_mixed_3 command failed: Unrecognized message received

oVirt

Jan 10, 2017 10:26:30 AM

VDSM host_mixed_3 command failed: Unrecognized message received

oVirt

Jan 10, 2017 10:26:07 AM

VDSM host_mixed_3 command failed: Unrecognized message received

oVirt

Jan 10, 2017 10:25:44 AM

Host host_mixed_3 is non responsive.

oVirt

Jan 10, 2017 10:25:43 AM

VDSM host_mixed_3 command failed: Unrecognized message received

oVirt

Jan 10, 2017 10:25:21 AM

Host host_mixed_3 was autorecovered.

2c2fe369

oVirt

Jan 10, 2017 10:25:20 AM

VDSM host_mixed_3 command failed: Unrecognized message received

oVirt

Jan 10, 2017 10:24:49 AM

Host host_mixed_3 installation failed. Failed to configure management network on the host.

1fdd96cb

oVirt

Jan 10, 2017 10:24:48 AM

Host host_mixed_3 is not responding. Host cannot be fenced automatically because power management for the host is disabled.

oVirt

Jan 10, 2017 10:24:48 AM

Failed to configure management network on host host_mixed_3 due to setup networks failure.

oVirt

Jan 10, 2017 10:24:48 AM

VDSM host_mixed_3 command failed: Heartbeat exceeded

Comment 1 Yaniv Kaul 2017-01-10 13:01:03 UTC

Please attach relevant logs.

Comment 2 Piotr Kliczewski 2017-01-10 13:06:56 UTC

Nelly was this found in environment with recovery issue?

Comment 3 Piotr Kliczewski 2017-01-10 13:10:17 UTC

If this is the case here is the cause of the failure

2017-01-10 10:24:04,832+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-1) [39da46a2] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSRecoveringException: Recovering from crash or Initializing

The question is why newly added host triggered recovery flow. It seems that we may want to handle it in NetworkConfigurer class.

Comment 4 Nelly Credi 2017-01-10 13:23:00 UTC

yes same env

Comment 5 Yaniv Kaul 2017-01-10 13:28:28 UTC

(In reply to Nelly Credi from comment #4)
> yes same env

So either a dup, or we still need the logs, please.

Comment 6 Piotr Kliczewski 2017-01-10 13:37:50 UTC

I think we need to update the BZ title to reflect what it is about. I see 2 issues:
- why the host entered recovery mode when started on clean (was it really clean?) vm
- NetworkConfigurer seems not to be able to handle recovery mode properly. I wonder why initial getCaps was OK (at least I haven't seen recovery mode in the logs).

Comment 7 Nelly Credi 2017-01-10 15:10:24 UTC

i think this bug is a different bug, unrelated. it was caused by the issus you are talking about, but its a general bad logging problem imo

Comment 8 Nelly Credi 2017-01-10 15:30:34 UTC

Created attachment 1239143 [details]
log collector

Comment 9 Nelly Credi 2017-01-10 15:33:00 UTC

Created attachment 1239144 [details]
log collector

Comment 10 Piotr Kliczewski 2017-01-10 15:53:43 UTC

The audit entries in my opinion are just side effect of the recovery issue.

BTW I am not able to extract the logs from both attachments.

Comment 11 Nelly Credi 2017-01-11 07:23:55 UTC

its a split
because the file was too big to upload as 1 file

Comment 12 Piotr Kliczewski 2017-01-11 08:17:46 UTC

I saw the logs in the env so my 2 questions from comment #6 still apply. Dan do you know?

Comment 13 Dan Kenigsberg 2017-01-11 08:25:09 UTC

(In reply to Piotr Kliczewski from comment #6)
> I think we need to update the BZ title to reflect what it is about. I see 2
> issues:
> - why the host entered recovery mode when started on clean (was it really
> clean?) vm

Whenever Vdsm starts, it starts in "recovery" mode. It would stay there for a longer while if there are running VMs that it needs to recover, but due to random fluctuation it could be longer anyway.

> - NetworkConfigurer seems not to be able to handle recovery mode properly. I
> wonder why initial getCaps was OK (at least I haven't seen recovery mode in
> the logs).

Could it be that Vdsm crashed after a first successful getCaps? (NetworkConfigurer should not attempt to handle this, of course)

Comment 14 Piotr Kliczewski 2017-01-11 08:41:16 UTC

(In reply to Dan Kenigsberg from comment #13)
> (In reply to Piotr Kliczewski from comment #6)
> > I think we need to update the BZ title to reflect what it is about. I see 2
> > issues:
> > - why the host entered recovery mode when started on clean (was it really
> > clean?) vm
> 
> Whenever Vdsm starts, it starts in "recovery" mode. It would stay there for
> a longer while if there are running VMs that it needs to recover, but due to
> random fluctuation it could be longer anyway.

I was told it was freshly created vm and I wonder why we see this in the log.

> 
> > - NetworkConfigurer seems not to be able to handle recovery mode properly. I
> > wonder why initial getCaps was OK (at least I haven't seen recovery mode in
> > the logs).
> 
> Could it be that Vdsm crashed after a first successful getCaps?
> (NetworkConfigurer should not attempt to handle this, of course)

Logs do not confirm it.

Comment 15 Yaniv Kaul 2017-01-11 09:38:28 UTC

(In reply to Nelly Credi from comment #11)
> its a split
> because the file was too big to upload as 1 file

Sane logs that are relevant are welcome. You expect us to join them? Save them in Google drive is an option as well.

Comment 16 Nelly Credi 2017-01-11 13:05:23 UTC

well i dont get why the limit is 20m per file
i need to find creative ways to upload logs each time
its annoying :-/

https://drive.google.com/open?id=0BzKxtECDfsbIMFctTGtDMUJCa0E

Comment 17 Martin Perina 2017-01-18 08:57:53 UTC

Missed RC build, moving to 4.1.1

Comment 18 Piotr Kliczewski 2017-02-13 14:59:56 UTC

This BZ was fixed together with jsonrpc 1.3.8 for 4.1.

Comment 19 Israel Pinto 2017-03-01 07:27:57 UTC

Verify with:
Engine: Red Hat Virtualization Manager Version: 4.1.1.2-0.1.el7
Host:
OS Version:RHEL - 7.3 - 7.el7
Kernel Version:3.10.0 - 550.el7.x86_64
KVM Version:2.6.0 - 28.el7_3.3.1
LIBVIRT Version:libvirt-2.0.0-10.el7_3.5
VDSM Version:vdsm-4.19.6-1.el7ev
SPICE Version:0.12.4 - 20.el7_3

Steps:
1. Add new host to cluster
2. Watch logs and event tab

Results:
Host added and not error messages in the logs

Note You need to log in before you can comment on or make changes to this bug.