Description of problem:
After adding a host that fails to configure the management network, the message "Unrecognized message received" appears in the UI about every 20 seconds.

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0-0.4.master.20170109222652.git53fd6cb.el7.centos.noarch

How reproducible:

Steps to Reproduce:
1. Add a host.
2. Adding the host fails on configuring the management network (no ovirtmgmt).
3. There are a few relevant errors in the 'events' tab.
4. The "Unrecognized message received" message is printed every ~20 sec.

Actual results:
"Unrecognized message received" message

Expected results:
The message should point to the relevant issue.

Additional info:
Jan 10, 2017 10:27:39 AM VDSM host_mixed_3 command failed: Unrecognized message received oVirt
Jan 10, 2017 10:27:16 AM VDSM host_mixed_3 command failed: Unrecognized message received oVirt
Jan 10, 2017 10:26:53 AM VDSM host_mixed_3 command failed: Unrecognized message received oVirt
Jan 10, 2017 10:26:30 AM VDSM host_mixed_3 command failed: Unrecognized message received oVirt
Jan 10, 2017 10:26:07 AM VDSM host_mixed_3 command failed: Unrecognized message received oVirt
Jan 10, 2017 10:25:44 AM Host host_mixed_3 is non responsive. oVirt
Jan 10, 2017 10:25:43 AM VDSM host_mixed_3 command failed: Unrecognized message received oVirt
Jan 10, 2017 10:25:21 AM Host host_mixed_3 was autorecovered. 2c2fe369 oVirt
Jan 10, 2017 10:25:20 AM VDSM host_mixed_3 command failed: Unrecognized message received oVirt
Jan 10, 2017 10:24:49 AM Host host_mixed_3 installation failed. Failed to configure management network on the host. 1fdd96cb oVirt
Jan 10, 2017 10:24:48 AM Host host_mixed_3 is not responding. Host cannot be fenced automatically because power management for the host is disabled. oVirt
Jan 10, 2017 10:24:48 AM Failed to configure management network on host host_mixed_3 due to setup networks failure. oVirt
Jan 10, 2017 10:24:48 AM VDSM host_mixed_3 command failed: Heartbeat exceeded
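The repeat interval can be read straight off the event timestamps. The snippet below is a minimal sketch for doing that; the function name `unrecognized_intervals` and the `EVENTS` constant are made up for illustration, and the parsing assumes the "Jan 10, 2017 10:27:39 AM" timestamp layout shown in the events above (English month names).

```python
from datetime import datetime

# A subset of the event lines from the report, newest first.
EVENTS = """\
Jan 10, 2017 10:27:39 AM VDSM host_mixed_3 command failed: Unrecognized message received
Jan 10, 2017 10:27:16 AM VDSM host_mixed_3 command failed: Unrecognized message received
Jan 10, 2017 10:26:53 AM VDSM host_mixed_3 command failed: Unrecognized message received
Jan 10, 2017 10:26:30 AM VDSM host_mixed_3 command failed: Unrecognized message received
Jan 10, 2017 10:26:07 AM VDSM host_mixed_3 command failed: Unrecognized message received
Jan 10, 2017 10:25:44 AM Host host_mixed_3 is non responsive.
Jan 10, 2017 10:25:43 AM VDSM host_mixed_3 command failed: Unrecognized message received
Jan 10, 2017 10:25:21 AM Host host_mixed_3 was autorecovered.
Jan 10, 2017 10:25:20 AM VDSM host_mixed_3 command failed: Unrecognized message received
"""


def unrecognized_intervals(text):
    """Return the gaps, in seconds, between consecutive
    'Unrecognized message received' events, oldest first."""
    stamps = []
    for line in text.splitlines():
        if "Unrecognized message received" not in line:
            continue
        # The timestamp is the first five whitespace-separated tokens,
        # e.g. "Jan 10, 2017 10:27:39 AM".
        stamp = " ".join(line.split()[:5])
        stamps.append(datetime.strptime(stamp, "%b %d, %Y %I:%M:%S %p"))
    stamps.sort()
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


print(unrecognized_intervals(EVENTS))  # prints [23, 24, 23, 23, 23, 23]
```

For the events listed here the message recurs every 23 to 24 seconds, which matches the "~20 sec" observation.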
Please attach relevant logs.
Nelly, was this found in the environment with the recovery issue?
If this is the case, here is the cause of the failure:

2017-01-10 10:24:04,832+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-1) [39da46a2] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSRecoveringException: Recovering from crash or Initializing

The question is why a newly added host triggered the recovery flow. It seems that we may want to handle it in the NetworkConfigurer class.
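As a rough illustration of what tolerating the recovery state during polling could look like, here is a minimal Python sketch. It is not the engine's code (the engine is Java): `HostRecovering`, `poll_until_ready` and all of its parameters are hypothetical names, and the only idea shown is that a "recovering" reply is treated as "try again later" instead of a hard failure.

```python
import time


class HostRecovering(Exception):
    """Hypothetical stand-in for the engine's VDSRecoveringException."""


def poll_until_ready(poll, attempts=5, delay=2.0, sleep=time.sleep):
    """Call poll() until it stops raising HostRecovering.

    Re-raises HostRecovering if the host is still recovering after
    `attempts` tries. `sleep` is injectable so tests need not wait.
    """
    for attempt in range(1, attempts + 1):
        try:
            return poll()
        except HostRecovering:
            if attempt == attempts:
                raise
            sleep(delay)


def make_fake_poll(recovering_replies):
    """Build a fake poll() that reports 'recovering' a given number of
    times before succeeding (purely for demonstration)."""
    state = {"calls": 0}

    def poll():
        state["calls"] += 1
        if state["calls"] <= recovering_replies:
            raise HostRecovering("Recovering from crash or Initializing")
        return "ok"

    return poll, state
```

With `make_fake_poll(2)`, the third call to `poll()` succeeds, so `poll_until_ready` returns after two retries. Whether such a retry belongs in NetworkConfigurer or elsewhere is a separate question.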
yes same env
(In reply to Nelly Credi from comment #4)
> yes same env

So either a dup, or we still need the logs, please.
I think we need to update the BZ title to reflect what it is about. I see 2 issues:
- why the host entered recovery mode when started on a clean (was it really clean?) VM
- NetworkConfigurer seems not to be able to handle recovery mode properly. I wonder why the initial getCaps was OK (at least I haven't seen recovery mode in the logs).
I think this bug is a different, unrelated bug. It was caused by the issues you are talking about, but it's a general bad-logging problem, IMO.
Created attachment 1239143 [details] log collector
Created attachment 1239144 [details] log collector
The audit entries are, in my opinion, just a side effect of the recovery issue. BTW, I am not able to extract the logs from either attachment.
It's a split, because the file was too big to upload as one file.
I saw the logs in the env, so my 2 questions from comment #6 still apply. Dan, do you know?
(In reply to Piotr Kliczewski from comment #6)
> I think we need to update the BZ title to reflect what it is about. I see 2
> issues:
> - why the host entered recovery mode when started on clean (was it really
> clean?) vm

Whenever Vdsm starts, it starts in "recovery" mode. It would stay there for a longer while if there are running VMs that it needs to recover, but due to random fluctuation it could be longer anyway.

> - NetworkConfigurer seems not to be able to handle recovery mode properly. I
> wonder why initial getCaps was OK (at least I haven't seen recovery mode in
> the logs).

Could it be that Vdsm crashed after a first successful getCaps?
(NetworkConfigurer should not attempt to handle this, of course)
(In reply to Dan Kenigsberg from comment #13)
> (In reply to Piotr Kliczewski from comment #6)
> > I think we need to update the BZ title to reflect what it is about. I see 2
> > issues:
> > - why the host entered recovery mode when started on clean (was it really
> > clean?) vm
>
> Whenever Vdsm starts, it starts in "recovery" mode. It would stay there for
> a longer while if there are running VMs that it needs to recover, but due to
> random fluctuation it could be longer anyway.

I was told it was a freshly created VM, and I wonder why we see this in the log.

> > - NetworkConfigurer seems not to be able to handle recovery mode properly. I
> > wonder why initial getCaps was OK (at least I haven't seen recovery mode in
> > the logs).
>
> Could it be that Vdsm crashed after a first successful getCaps?
> (NetworkConfigurer should not attempt to handle this, of course)

The logs do not confirm it.
(In reply to Nelly Credi from comment #11)
> its a split
> because the file was too big to upload as 1 file

Sane logs that are relevant are welcome. Do you expect us to join them? Saving them in Google Drive is an option as well.
Well, I don't get why the limit is 20M per file. I need to find creative ways to upload logs each time; it's annoying :-/

https://drive.google.com/open?id=0BzKxtECDfsbIMFctTGtDMUJCa0E
Missed RC build, moving to 4.1.1
This BZ was fixed together with jsonrpc 1.3.8 for 4.1.
Verified with:

Engine:
Red Hat Virtualization Manager Version: 4.1.1.2-0.1.el7

Host:
OS Version: RHEL - 7.3 - 7.el7
Kernel Version: 3.10.0 - 550.el7.x86_64
KVM Version: 2.6.0 - 28.el7_3.3.1
LIBVIRT Version: libvirt-2.0.0-10.el7_3.5
VDSM Version: vdsm-4.19.6-1.el7ev
SPICE Version: 0.12.4 - 20.el7_3

Steps:
1. Add a new host to the cluster.
2. Watch the logs and the events tab.

Results:
Host added and no error messages in the logs.