1117504 – Constant, repeating messages "User <x> is connected to vm <y> in Events.

Bug 1117504 - Constant, repeating messages "User <x> is connected to vm <y> in Events.

Summary: Constant, repeating messages "User <x> is connected to vm <y> in Events.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-guest-agent
Sub Component:
Version:	3.4.0
Hardware:	x86_64
OS:	Windows
Priority:	unspecified
Severity:	high
Target Milestone:	ovirt-3.6.3
Target Release:	---
Assignee:	Vinzenz Feenstra [evilissimo]
QA Contact:	Israel Pinto
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1172039 (view as bug list)
Depends On:
Blocks:	1302319
TreeView+	depends on / blocked

Reported:	2014-07-08 19:54 UTC by Robert McSwain
Modified:	2019-10-10 09:20 UTC (History)
CC List:	23 users (show)
Fixed In Version:	rhev-guest-tools-iso-3.6-5
Doc Type:	Bug Fix
Doc Text:	A logical mistake caused a buffer with random data to be passed as 'user' value to the engine which caused the engine to fail on the data. This has been fixed now - And in case of an error a fallback value will be used to ensure consistency of the data.
Clone Of:
Clones:	1302319 (view as bug list)
Environment:
Last Closed:	2016-03-21 14:00:03 UTC
oVirt Team:	Virt
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
VM sanctuary2 fails (442.61 KB, application/gzip) 2014-07-29 07:49 UTC, Alexandros Gkesos	no flags	Details
postgresql and server logs from 01135008 case (781.10 KB, application/octet-stream) 2014-08-29 14:12 UTC, Alexandros Gkesos	no flags	Details
verbose logging engine.log (294.47 KB, application/x-xz) 2014-10-13 11:42 UTC, Jaison Raju	no flags	Details
upgrade scrit to turn all varchar 255 columns in vm_dynamic to text (489 bytes, text/plain) 2014-10-22 09:20 UTC, Eli Mesika	no flags	Details
script output (57.60 KB, text/plain) 2015-01-07 12:26 UTC, Jaison Raju	no flags	Details
script output (14.30 KB, text/plain) 2015-01-08 15:45 UTC, Jaison Raju	no flags	Details
View All

Links
System	ID	Priority	Status	Summary	Last Updated
oVirt gerrit	51849	master	MERGED	win: Try to revert to previous value on TranslateName failure	2020-03-20 11:52:02 UTC
oVirt gerrit	51857	ovirt-3.6	MERGED	win: Try to revert to previous value on TranslateName failure	2020-03-20 11:52:02 UTC
oVirt gerrit	51858	ovirt-3.5	MERGED	win: Try to revert to previous value on TranslateName failure	2020-03-20 11:52:02 UTC

Description Robert McSwain 2014-07-08 19:54:37 UTC

Description of problem:
Since running an upgrade of RHEV Manager to version 3.4.0-0.22.el6ev, messages are displayed every 30 seconds about "User <x> is connected to vm <y> in the events log for some VMs. Last night it was 1 VM and now it is now up to 2. Rebooting the RHEV Manager server has not served as a fix or workaround.


Version-Release number of selected component (if applicable):
3.4.0-0.22.el6ev, upgraded

How reproducible:
Unknown

Steps to Reproduce:
Presumably stopping a VM, putting it into an export domain, and restarting it. The actual cause is currently unknown.

Actual results:
Constant, repeating messages "User <x> is connected to vm <y> in Events log

Expected results:
No messages unless user is connected, and only the once.

Comment 2 Omer Frenkel 2014-07-09 08:25:03 UTC

this is logged when the clientIp reported from vdsm is changed (different from the value saved in the engine db).
can you please check if this value change in vdsClient where the vm is running?
you can check the vm id in the engine UI, in the vm general sub tab, and use this command on the host it is running on:
watch vdsClient -s 0 getVmStats <vmId> | grep clientIp

Comment 3 acrow 2014-07-09 14:31:54 UTC

Hi,

I've not yet been able to reproduce the problem (since upgrading my hypervisors to the latest version) - but -

In the UI, the same IP address is shown 3 times for quickbooks, separated by spaces.

vmstat shows this:

guestIPs = 192.168.20.53 192.168.20.53 192.168.20.53

And at least one of the VMs has the issue post-export that I cannot import it to another cluster. I've just made the 3rd attempt to export/import it (fresh export) and get this:

Error while executing action:

quickbooks:

    Cannot import VM. The snapshot configuration is corrupted (snapshot ID is empty). Please contact the system administrator.

Now after starting the above vm again (on the original cluster) the UI shows it stuck with a grey "shadowed up-arrow" and "wait for launch". However the VM is up and I can VNC directly to it - but can't get a console.


I have to say I've had a lot of issues since 3.4 whereas all previous upgrades went really well. I have had a separate issue where I could not start a VM (that had been working fine in 3.3) with 2 sockets, 6 cores per socket - got a topology error. I /could/ start it with 6 sockets/2 cps! Will raise that in another support ticket if I can reproduce it.

Thanks

Alex

Comment 4 Martin Betak 2014-07-16 11:52:46 UTC

Hi,

from the log collector I can see that the first problem (of repeated connection messages) was caused by database constraint violation. The database logs point to that the engine was trying to update the current logged in user name (vm_dynamic.guest_cur_user_name) and the username was larger than the maximum 255 characters which sounds really strange for a username. Sadly the logs don't show what exactly was the "username" it was trying to persist. (be it corrupted or mangled data)

This may be a case of database corruption, which is probably causing the other  problems with import/export. 
Could You please attach logs related to the subsequent import/export and run, so we may diagnose the problem further? 

Thanks

Martin

Comment 8 Martin Betak 2014-07-22 12:14:14 UTC

Also could you please show the output of the command 
$ psql -l
(to determine the locale)

Comment 10 Martin Betak 2014-07-28 15:19:56 UTC

The logs provided so far can be distilled to the following repeated SQL error:

2014-07-07 03:21:19,557 ERROR [org.ovirt.engine.core.dal.dbbroker.BatchProcedureExecutionConnectionCallback] (DefaultQuartzScheduler_Worker-65) Can't execute batch. Next exception is: : org.postgresql.util.PSQLException: ERROR: value too long for type character varying(255)
  Where: SQL statement "UPDATE vm_dynamic SET app_list =  $1 ,guest_cur_user_name =  $2 , console_cur_user_name =  $3 , guest_last_login_time =  $4 , guest_last_logout_time =  $5 , console_user_id =  $6 , guest_os =  $7 ,migrating_to_vds =  $8 ,RUN_ON_VDS =  $9 , status =  $10 ,vm_host =  $11 ,vm_ip =  $12 ,vm_fqdn =  $13 , last_start_time =  $14 , last_stop_time =  $15 , vm_pid =  $16 ,display =  $17 ,acpi_enable =  $18 , session =  $19 ,display_ip =  $20 , display_type =  $21 ,kvm_enable =  $22 ,boot_sequence =  $23 , display_secure_port =  $24 , utc_diff =  $25 ,last_vds_run_on =  $26 ,client_ip =  $27 , guest_requested_memory =  $28 , hibernation_vol_handle =  $29 ,exit_status =  $30 , pause_status =  $31 ,exit_message =  $32 , hash= $33 , guest_agent_nics_hash =  $34 , last_watchdog_event =  $35 , last_watchdog_action =  $36 , is_run_once =  $37 , cpu_name =  $38 , current_cd =  $39 , guest_cpu_count =  $40  WHERE vm_guid =  $41 "
PL/pgSQL function "updatevmdynamic" line 2 at SQL statement

which would point to the v_guest_cur_user_name being longer than 255 characters.

Comment 11 acrow 2014-07-28 15:23:54 UTC

(In reply to Martin Betak from comment #10)
> The logs provided so far can be distilled to the following repeated SQL
> error:
> 
> 2014-07-07 03:21:19,557 ERROR
> [org.ovirt.engine.core.dal.dbbroker.
> BatchProcedureExecutionConnectionCallback]
> (DefaultQuartzScheduler_Worker-65) Can't execute batch. Next exception is: :
> org.postgresql.util.PSQLException: ERROR: value too long for type character
> varying(255)
>   Where: SQL statement "UPDATE vm_dynamic SET app_list =  $1
> ,guest_cur_user_name =  $2 , console_cur_user_name =  $3 ,
> guest_last_login_time =  $4 , guest_last_logout_time =  $5 , console_user_id
> =  $6 , guest_os =  $7 ,migrating_to_vds =  $8 ,RUN_ON_VDS =  $9 , status = 
> $10 ,vm_host =  $11 ,vm_ip =  $12 ,vm_fqdn =  $13 , last_start_time =  $14 ,
> last_stop_time =  $15 , vm_pid =  $16 ,display =  $17 ,acpi_enable =  $18 ,
> session =  $19 ,display_ip =  $20 , display_type =  $21 ,kvm_enable =  $22
> ,boot_sequence =  $23 , display_secure_port =  $24 , utc_diff =  $25
> ,last_vds_run_on =  $26 ,client_ip =  $27 , guest_requested_memory =  $28 ,
> hibernation_vol_handle =  $29 ,exit_status =  $30 , pause_status =  $31
> ,exit_message =  $32 , hash= $33 , guest_agent_nics_hash =  $34 ,
> last_watchdog_event =  $35 , last_watchdog_action =  $36 , is_run_once = 
> $37 , cpu_name =  $38 , current_cd =  $39 , guest_cpu_count =  $40  WHERE
> vm_guid =  $41 "
> PL/pgSQL function "updatevmdynamic" line 2 at SQL statement
> 
> which would point to the v_guest_cur_user_name being longer than 255
> characters.

That is extremely strange as all our usernames are 3 characters!

This user was ah1.

Cheers

Alex

Comment 12 Eli Mesika 2014-07-29 07:20:19 UTC

Hi 
If this scenario can be re-produced, please 

1) edit /var/lib/pgsql/data/postgresql.conf
2) set
   log_min_messages="DEBUG5"
   and
   log_error_verbosity="VERBOSE"
3) service postgresql restart
4) reproduce 
5) attach server.log

Comment 13 Alexandros Gkesos 2014-07-29 07:49:42 UTC

Created attachment 922050 [details]
VM sanctuary2 fails

VM sanctuary2 is down. Exit message: Maximum CPUs greater than topology limit.

After these incidents we have only the "user X is connected" logs

Comment 14 Alexandros Gkesos 2014-07-29 07:51:40 UTC

I've attached the requested engine.log from 07/11/2014

No errors after that, but the known logs.

Comment 15 Michal Skrivanek 2014-08-04 14:50:03 UTC

(In reply to Alexandros Gkesos from comment #13)
> Created attachment 922050 [details]
> VM sanctuary2 is down. Exit message: Maximum CPUs greater than topology
> limit.

this is a known bug 1121295; doesn't look relevant to this bug
did you succeed with comment #12?

Comment 16 Alexandros Gkesos 2014-08-06 12:28:34 UTC

(In reply to Michal Skrivanek from comment #15)
> (In reply to Alexandros Gkesos from comment #13)
> > Created attachment 922050 [details]
> > VM sanctuary2 is down. Exit message: Maximum CPUs greater than topology
> > limit.
> 
> this is a known bug 1121295; doesn't look relevant to this bug
> did you succeed with comment #12?

Customer hasn't replied yet. I will update you, as soon as i have an answer.

Comment 17 Michal Skrivanek 2014-08-14 09:06:13 UTC

ok, please clear the needinfo once the customer replies with something useful...

Comment 20 Alexandros Gkesos 2014-08-29 14:12:05 UTC

Created attachment 932693 [details]
postgresql and server logs from 01135008 case

Customer provided the requested logs (attached)

Comment 25 Jaison Raju 2014-10-13 11:42:34 UTC

Created attachment 946341 [details]
verbose logging engine.log

Comment 26 Omer Frenkel 2014-10-20 11:21:06 UTC

Eli, can you please see if the new logs help?

Comment 27 Eli Mesika 2014-10-20 12:10:57 UTC

(In reply to Omer Frenkel from comment #26)
> Eli, can you please see if the new logs help?

Omer
No new info here , the same " value too long for type character" message is reported 

The only thing I think of is 

1) tweak the table to have all varchar(255) as text fields 
2) run the scenario 
3) check which values exceeded the 255 characters limit
4) investigate why this happens 


Is that acceptable ?

Comment 28 Michal Skrivanek 2014-10-21 08:28:54 UTC

seems like the only things to try. Eli, can you please handle the orchestration of trying to alter the DB to accommodate ridiculously long chars, run the scenario and get the data?

flagging infra as it seems to be something weird with postgres

Comment 29 Oved Ourfali 2014-10-22 08:40:31 UTC

Michal - Eli will provide the script, please take it from there.
Flagging as virt for further investigation, with the provided script.

Comment 30 Eli Mesika 2014-10-22 09:20:39 UTC

Created attachment 949307 [details]
upgrade scrit to turn all varchar 255 columns in vm_dynamic to text

Comment 37 Omer Frenkel 2014-11-19 10:02:23 UTC

Eli can you please look at the results?

Comment 39 Eli Mesika 2014-12-01 10:22:09 UTC

(In reply to Omer Frenkel from comment #37)
> Eli can you please look at the results?

From the results in comments 35 I think that my instructions in comment 31 were not fully applied 

The upgrade process drops views and SP before the upgrade takes place in the pre-upgrade step and restore them afterwards 

Can you please provide a step-by-step log of comment 31 instructions aligned with the output of each step ?

Thanks

Comment 40 Jaison Raju 2014-12-02 09:29:43 UTC

(In reply to Eli Mesika from comment #39)
> (In reply to Omer Frenkel from comment #37)
> > Eli can you please look at the results?
> 
> From the results in comments 35 I think that my instructions in comment 31
> were not fully applied 
> 
> The upgrade process drops views and SP before the upgrade takes place in the
> pre-upgrade step and restore them afterwards 
> 
> Can you please provide a step-by-step log of comment 31 instructions aligned
> with the output of each step ?
> 
> Thanks

Hi,

Since while running the script , we didnt perform rest of the commands .
#31 : Customer didnt proceed after step2 .
Should we still proceed with all the steps & confirm the output from step 7 ?

Regards,
Jaison R

Comment 41 Michal Skrivanek 2014-12-02 14:23:37 UTC

moving back to infra as at this point it still seems as some generic issue.

Comment 42 Eli Mesika 2014-12-02 21:02:37 UTC

(In reply to Jaison Raju from comment #40)

> Since while running the script , we didnt perform rest of the commands .
> #31 : Customer didnt proceed after step2 .
> Should we still proceed with all the steps & confirm the output from step 7 ?

there is no run action in step #2 and this is probably your mistake probably 

2) download attachment " upgrade script to turn all varchar 255 columns in
> vm_dynamic to text " to dbscripts/upgrade directory and add a prefix
> numbering such this script will be the last , for example if the last script
> prefix is 03_04_0810 , the attached script should have 03_04_0820 numbering.

This is only a file creation in a certain location and with a certain name , please do not run this with psql , just put the file exactly as defined and follow other steps

Comment 44 Eli Mesika 2014-12-18 09:56:33 UTC

*** Bug 1172039 has been marked as a duplicate of this bug. ***

Comment 48 Jaison Raju 2015-01-07 12:26:44 UTC

Created attachment 977246 [details]
script output

Comment 51 Jaison Raju 2015-01-08 15:45:35 UTC

Created attachment 977832 [details]
script output

Comment 52 Eli Mesika 2015-01-11 08:57:29 UTC

Arik, can you please take a look and see what can cause the guest_cur_user_name value to be corrupted and exceed the 255 character limit

We had extended here the column to be text value and the attached result in comment 51 shows illegal values 

You can clearly see that the values are in a format of a sql error that engine attempts to insert to the database 
(".... ExecQuery(self, strQuery=pythoncom.Missing, strQueryLanguage=u'WQL', ...)

Please advice

Comment 53 Arik 2015-01-11 16:12:15 UTC

(In reply to Eli Mesika from comment #52)
Vinzenz, is it possible that this error comes from the guest agent and somehow sent to the engine as the current user name?

Comment 54 Vinzenz Feenstra [evilissimo] 2015-01-12 09:13:59 UTC

Yes this definitely comes from the guest agent. How this is possible, however is a puzzling me big time right now.
This is not an insert, this is a WQL Query which queries information from windows about the system, which can be user name, hostname (FQDN) etc.

However there's also quite a lot of corrupted data in it. Something seems to be totally corrupted there.

@Jaison
Can you please gather the guest agent log files from at least two of the affected systems?

Thanks.

Comment 55 Vinzenz Feenstra [evilissimo] 2015-01-20 08:33:20 UTC

@Jaison: Is there any update on this available?

Comment 65 Michal Skrivanek 2015-04-27 13:43:50 UTC

no new updates for a long time, closing

Comment 76 Vinzenz Feenstra [evilissimo] 2016-01-14 13:03:24 UTC

Sorry for the delay - I am working on a potential fix - The problem is a bit more complex as it turns out.

Comment 88 Israel Pinto 2016-02-22 09:03:36 UTC

Verify:
RHEVM: 3.6.3.2-0.1.el6
Host:
OS Version: RHEL-7.2-9.el7_2.1
Kernel:3.10.0-327.10.1.el7.x86_64
KVM: 2.3.0-31.el7_2.7
LIBVIRT: libvirt-1.2.17-13.el7_2.3
VDSM: vdsm-4.17.21-0.el7ev
SPICE: 0.12.4-15.el7

Scenario: 
1. Create VM in Admin portal
2. Create VM in user portal
3. Start VM in user portal
4. Connect to VM via Console
5. Monitor event log for 0.5 hour

Results:
PASS 
The message is shown only time time per VM.

Note You need to log in before you can comment on or make changes to this bug.

acrow
agkesos
ahadas
audgiri
emesika
iheim
jraju
lsurette
lveyde
mavital
mgoldboi
michal.skrivanek
mkenneth
nashok
ofrenkel
rbalakri
Rhev-m-bugs
rhodain
rmcswain
tdosek
vfeenstr
yeylon
ykaul