Bug 1411655

Summary:

"Unrecognized message received" message in the UI

Product:

[oVirt] ovirt-engine

Reporter:

Nelly Credi <ncredi>

Component:

Backend.Core

Assignee:

Piotr Kliczewski <pkliczew>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Israel Pinto <ipinto>

Severity:

medium

Docs Contact:

Priority:

unspecified

Version:

4.1.0

CC:

bugs, danken, mburman, mperina, ncredi, oourfali

Target Milestone:

ovirt-4.1.1

Flags:

rule-engine: ovirt-4.1+

Target Release:

4.1.1

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2017-04-21 09:47:00 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

Infra

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
log collector	none
log collector	none

Description Nelly Credi 2017-01-10 08:59:42 UTC

Description of problem:
after add host which failed to configure management network,
we see every 20sec this message:
"Unrecognized message received" message in the UI

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0-0.4.master.20170109222652.git53fd6cb.el7.centos.noarch

How reproducible:

Steps to Reproduce:
1. add host
2. add hosts failed on configure mgmt network (no ovirtmgmt)
3. there are a few relevant errors in the 'events' tab
4. "Unrecognized message received" message is printed every ~20sec

Actual results:
"Unrecognized message received" message

Expected results:
I want to see the relevant issue

Additional info:

Jan 10, 2017 10:27:39 AM

VDSM host_mixed_3 command failed: Unrecognized message received

oVirt

Jan 10, 2017 10:27:16 AM

VDSM host_mixed_3 command failed: Unrecognized message received

oVirt

Jan 10, 2017 10:26:53 AM

VDSM host_mixed_3 command failed: Unrecognized message received

oVirt

Jan 10, 2017 10:26:30 AM

VDSM host_mixed_3 command failed: Unrecognized message received

oVirt

Jan 10, 2017 10:26:07 AM

VDSM host_mixed_3 command failed: Unrecognized message received

oVirt

Jan 10, 2017 10:25:44 AM

Host host_mixed_3 is non responsive.

oVirt

Jan 10, 2017 10:25:43 AM

VDSM host_mixed_3 command failed: Unrecognized message received

oVirt

Jan 10, 2017 10:25:21 AM

Host host_mixed_3 was autorecovered.

2c2fe369

oVirt

Jan 10, 2017 10:25:20 AM

VDSM host_mixed_3 command failed: Unrecognized message received

oVirt

Jan 10, 2017 10:24:49 AM

Host host_mixed_3 installation failed. Failed to configure management network on the host.

1fdd96cb

oVirt

Jan 10, 2017 10:24:48 AM

Host host_mixed_3 is not responding. Host cannot be fenced automatically because power management for the host is disabled.

oVirt

Jan 10, 2017 10:24:48 AM

Failed to configure management network on host host_mixed_3 due to setup networks failure.

oVirt

Jan 10, 2017 10:24:48 AM

VDSM host_mixed_3 command failed: Heartbeat exceeded

Comment 1 Yaniv Kaul 2017-01-10 13:01:03 UTC

Please attach relevant logs.

Comment 2 Piotr Kliczewski 2017-01-10 13:06:56 UTC

Nelly was this found in environment with recovery issue?

Comment 3 Piotr Kliczewski 2017-01-10 13:10:17 UTC

If this is the case here is the cause of the failure

2017-01-10 10:24:04,832+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-1) [39da46a2] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSRecoveringException: Recovering from crash or Initializing

The question is why newly added host triggered recovery flow. It seems that we may want to handle it in NetworkConfigurer class.

Comment 4 Nelly Credi 2017-01-10 13:23:00 UTC

yes same env

Comment 5 Yaniv Kaul 2017-01-10 13:28:28 UTC

(In reply to Nelly Credi from comment #4)
> yes same env

So either a dup, or we still need the logs, please.

Comment 6 Piotr Kliczewski 2017-01-10 13:37:50 UTC

I think we need to update the BZ title to reflect what it is about. I see 2 issues:
- why the host entered recovery mode when started on clean (was it really clean?) vm
- NetworkConfigurer seems not to be able to handle recovery mode properly. I wonder why initial getCaps was OK (at least I haven't seen recovery mode in the logs).

Comment 7 Nelly Credi 2017-01-10 15:10:24 UTC

i think this bug is a different bug, unrelated. it was caused by the issus you are talking about, but its a general bad logging problem imo

Comment 8 Nelly Credi 2017-01-10 15:30:34 UTC

Created attachment 1239143 [details]
log collector

Comment 9 Nelly Credi 2017-01-10 15:33:00 UTC

Created attachment 1239144 [details]
log collector

Comment 10 Piotr Kliczewski 2017-01-10 15:53:43 UTC

The audit entries in my opinion are just side effect of the recovery issue.

BTW I am not able to extract the logs from both attachments.

Comment 11 Nelly Credi 2017-01-11 07:23:55 UTC

its a split
because the file was too big to upload as 1 file

Comment 12 Piotr Kliczewski 2017-01-11 08:17:46 UTC

I saw the logs in the env so my 2 questions from comment #6 still apply. Dan do you know?

Comment 13 Dan Kenigsberg 2017-01-11 08:25:09 UTC

(In reply to Piotr Kliczewski from comment #6)
> I think we need to update the BZ title to reflect what it is about. I see 2
> issues:
> - why the host entered recovery mode when started on clean (was it really
> clean?) vm

Whenever Vdsm starts, it starts in "recovery" mode. It would stay there for a longer while if there are running VMs that it needs to recover, but due to random fluctuation it could be longer anyway.

> - NetworkConfigurer seems not to be able to handle recovery mode properly. I
> wonder why initial getCaps was OK (at least I haven't seen recovery mode in
> the logs).

Could it be that Vdsm crashed after a first successful getCaps? (NetworkConfigurer should not attempt to handle this, of course)

Comment 14 Piotr Kliczewski 2017-01-11 08:41:16 UTC

(In reply to Dan Kenigsberg from comment #13)
> (In reply to Piotr Kliczewski from comment #6)
> > I think we need to update the BZ title to reflect what it is about. I see 2
> > issues:
> > - why the host entered recovery mode when started on clean (was it really
> > clean?) vm
> 
> Whenever Vdsm starts, it starts in "recovery" mode. It would stay there for
> a longer while if there are running VMs that it needs to recover, but due to
> random fluctuation it could be longer anyway.

I was told it was freshly created vm and I wonder why we see this in the log.

> 
> > - NetworkConfigurer seems not to be able to handle recovery mode properly. I
> > wonder why initial getCaps was OK (at least I haven't seen recovery mode in
> > the logs).
> 
> Could it be that Vdsm crashed after a first successful getCaps?
> (NetworkConfigurer should not attempt to handle this, of course)

Logs do not confirm it.

Comment 15 Yaniv Kaul 2017-01-11 09:38:28 UTC

(In reply to Nelly Credi from comment #11)
> its a split
> because the file was too big to upload as 1 file

Sane logs that are relevant are welcome. You expect us to join them? Save them in Google drive is an option as well.

Comment 16 Nelly Credi 2017-01-11 13:05:23 UTC

well i dont get why the limit is 20m per file
i need to find creative ways to upload logs each time
its annoying :-/

https://drive.google.com/open?id=0BzKxtECDfsbIMFctTGtDMUJCa0E

Comment 17 Martin Perina 2017-01-18 08:57:53 UTC

Missed RC build, moving to 4.1.1

Comment 18 Piotr Kliczewski 2017-02-13 14:59:56 UTC

This BZ was fixed together with jsonrpc 1.3.8 for 4.1.

Comment 19 Israel Pinto 2017-03-01 07:27:57 UTC

Verify with:
Engine: Red Hat Virtualization Manager Version: 4.1.1.2-0.1.el7
Host:
OS Version:RHEL - 7.3 - 7.el7
Kernel Version:3.10.0 - 550.el7.x86_64
KVM Version:2.6.0 - 28.el7_3.3.1
LIBVIRT Version:libvirt-2.0.0-10.el7_3.5
VDSM Version:vdsm-4.19.6-1.el7ev
SPICE Version:0.12.4 - 20.el7_3

Steps:
1. Add new host to cluster
2. Watch logs and event tab

Results:
Host added and not error messages in the logs