Description of problem: Since running an upgrade of RHEV Manager to version 3.4.0-0.22.el6ev, messages are displayed every 30 seconds about "User <x> is connected to vm <y> in the events log for some VMs. Last night it was 1 VM and now it is now up to 2. Rebooting the RHEV Manager server has not served as a fix or workaround. Version-Release number of selected component (if applicable): 3.4.0-0.22.el6ev, upgraded How reproducible: Unknown Steps to Reproduce: Presumably stopping a VM, putting it into an export domain, and restarting it. The actual cause is currently unknown. Actual results: Constant, repeating messages "User <x> is connected to vm <y> in Events log Expected results: No messages unless user is connected, and only the once.
this is logged when the clientIp reported from vdsm is changed (different from the value saved in the engine db). can you please check if this value change in vdsClient where the vm is running? you can check the vm id in the engine UI, in the vm general sub tab, and use this command on the host it is running on: watch vdsClient -s 0 getVmStats <vmId> | grep clientIp
Hi, I've not yet been able to reproduce the problem (since upgrading my hypervisors to the latest version) - but - In the UI, the same IP address is shown 3 times for quickbooks, separated by spaces. vmstat shows this: guestIPs = 192.168.20.53 192.168.20.53 192.168.20.53 And at least one of the VMs has the issue post-export that I cannot import it to another cluster. I've just made the 3rd attempt to export/import it (fresh export) and get this: Error while executing action: quickbooks: Cannot import VM. The snapshot configuration is corrupted (snapshot ID is empty). Please contact the system administrator. Now after starting the above vm again (on the original cluster) the UI shows it stuck with a grey "shadowed up-arrow" and "wait for launch". However the VM is up and I can VNC directly to it - but can't get a console. I have to say I've had a lot of issues since 3.4 whereas all previous upgrades went really well. I have had a separate issue where I could not start a VM (that had been working fine in 3.3) with 2 sockets, 6 cores per socket - got a topology error. I /could/ start it with 6 sockets/2 cps! Will raise that in another support ticket if I can reproduce it. Thanks Alex
Hi, from the log collector I can see that the first problem (of repeated connection messages) was caused by database constraint violation. The database logs point to that the engine was trying to update the current logged in user name (vm_dynamic.guest_cur_user_name) and the username was larger than the maximum 255 characters which sounds really strange for a username. Sadly the logs don't show what exactly was the "username" it was trying to persist. (be it corrupted or mangled data) This may be a case of database corruption, which is probably causing the other problems with import/export. Could You please attach logs related to the subsequent import/export and run, so we may diagnose the problem further? Thanks Martin
Also could you please show the output of the command $ psql -l (to determine the locale)
The logs provided so far can be distilled to the following repeated SQL error: 2014-07-07 03:21:19,557 ERROR [org.ovirt.engine.core.dal.dbbroker.BatchProcedureExecutionConnectionCallback] (DefaultQuartzScheduler_Worker-65) Can't execute batch. Next exception is: : org.postgresql.util.PSQLException: ERROR: value too long for type character varying(255) Where: SQL statement "UPDATE vm_dynamic SET app_list = $1 ,guest_cur_user_name = $2 , console_cur_user_name = $3 , guest_last_login_time = $4 , guest_last_logout_time = $5 , console_user_id = $6 , guest_os = $7 ,migrating_to_vds = $8 ,RUN_ON_VDS = $9 , status = $10 ,vm_host = $11 ,vm_ip = $12 ,vm_fqdn = $13 , last_start_time = $14 , last_stop_time = $15 , vm_pid = $16 ,display = $17 ,acpi_enable = $18 , session = $19 ,display_ip = $20 , display_type = $21 ,kvm_enable = $22 ,boot_sequence = $23 , display_secure_port = $24 , utc_diff = $25 ,last_vds_run_on = $26 ,client_ip = $27 , guest_requested_memory = $28 , hibernation_vol_handle = $29 ,exit_status = $30 , pause_status = $31 ,exit_message = $32 , hash= $33 , guest_agent_nics_hash = $34 , last_watchdog_event = $35 , last_watchdog_action = $36 , is_run_once = $37 , cpu_name = $38 , current_cd = $39 , guest_cpu_count = $40 WHERE vm_guid = $41 " PL/pgSQL function "updatevmdynamic" line 2 at SQL statement which would point to the v_guest_cur_user_name being longer than 255 characters.
(In reply to Martin Betak from comment #10) > The logs provided so far can be distilled to the following repeated SQL > error: > > 2014-07-07 03:21:19,557 ERROR > [org.ovirt.engine.core.dal.dbbroker. > BatchProcedureExecutionConnectionCallback] > (DefaultQuartzScheduler_Worker-65) Can't execute batch. Next exception is: : > org.postgresql.util.PSQLException: ERROR: value too long for type character > varying(255) > Where: SQL statement "UPDATE vm_dynamic SET app_list = $1 > ,guest_cur_user_name = $2 , console_cur_user_name = $3 , > guest_last_login_time = $4 , guest_last_logout_time = $5 , console_user_id > = $6 , guest_os = $7 ,migrating_to_vds = $8 ,RUN_ON_VDS = $9 , status = > $10 ,vm_host = $11 ,vm_ip = $12 ,vm_fqdn = $13 , last_start_time = $14 , > last_stop_time = $15 , vm_pid = $16 ,display = $17 ,acpi_enable = $18 , > session = $19 ,display_ip = $20 , display_type = $21 ,kvm_enable = $22 > ,boot_sequence = $23 , display_secure_port = $24 , utc_diff = $25 > ,last_vds_run_on = $26 ,client_ip = $27 , guest_requested_memory = $28 , > hibernation_vol_handle = $29 ,exit_status = $30 , pause_status = $31 > ,exit_message = $32 , hash= $33 , guest_agent_nics_hash = $34 , > last_watchdog_event = $35 , last_watchdog_action = $36 , is_run_once = > $37 , cpu_name = $38 , current_cd = $39 , guest_cpu_count = $40 WHERE > vm_guid = $41 " > PL/pgSQL function "updatevmdynamic" line 2 at SQL statement > > which would point to the v_guest_cur_user_name being longer than 255 > characters. That is extremely strange as all our usernames are 3 characters! This user was ah1. Cheers Alex
Hi If this scenario can be re-produced, please 1) edit /var/lib/pgsql/data/postgresql.conf 2) set log_min_messages="DEBUG5" and log_error_verbosity="VERBOSE" 3) service postgresql restart 4) reproduce 5) attach server.log
Created attachment 922050 [details] VM sanctuary2 fails VM sanctuary2 is down. Exit message: Maximum CPUs greater than topology limit. After these incidents we have only the "user X is connected" logs
I've attached the requested engine.log from 07/11/2014 No errors after that, but the known logs.
(In reply to Alexandros Gkesos from comment #13) > Created attachment 922050 [details] > VM sanctuary2 is down. Exit message: Maximum CPUs greater than topology > limit. this is a known bug 1121295; doesn't look relevant to this bug did you succeed with comment #12?
(In reply to Michal Skrivanek from comment #15) > (In reply to Alexandros Gkesos from comment #13) > > Created attachment 922050 [details] > > VM sanctuary2 is down. Exit message: Maximum CPUs greater than topology > > limit. > > this is a known bug 1121295; doesn't look relevant to this bug > did you succeed with comment #12? Customer hasn't replied yet. I will update you, as soon as i have an answer.
ok, please clear the needinfo once the customer replies with something useful...
Created attachment 932693 [details] postgresql and server logs from 01135008 case Customer provided the requested logs (attached)
Created attachment 946341 [details] verbose logging engine.log
Eli, can you please see if the new logs help?
(In reply to Omer Frenkel from comment #26) > Eli, can you please see if the new logs help? Omer No new info here , the same " value too long for type character" message is reported The only thing I think of is 1) tweak the table to have all varchar(255) as text fields 2) run the scenario 3) check which values exceeded the 255 characters limit 4) investigate why this happens Is that acceptable ?
seems like the only things to try. Eli, can you please handle the orchestration of trying to alter the DB to accommodate ridiculously long chars, run the scenario and get the data? flagging infra as it seems to be something weird with postgres
Michal - Eli will provide the script, please take it from there. Flagging as virt for further investigation, with the provided script.
Created attachment 949307 [details] upgrade scrit to turn all varchar 255 columns in vm_dynamic to text
Eli can you please look at the results?
(In reply to Omer Frenkel from comment #37) > Eli can you please look at the results? From the results in comments 35 I think that my instructions in comment 31 were not fully applied The upgrade process drops views and SP before the upgrade takes place in the pre-upgrade step and restore them afterwards Can you please provide a step-by-step log of comment 31 instructions aligned with the output of each step ? Thanks
(In reply to Eli Mesika from comment #39) > (In reply to Omer Frenkel from comment #37) > > Eli can you please look at the results? > > From the results in comments 35 I think that my instructions in comment 31 > were not fully applied > > The upgrade process drops views and SP before the upgrade takes place in the > pre-upgrade step and restore them afterwards > > Can you please provide a step-by-step log of comment 31 instructions aligned > with the output of each step ? > > Thanks Hi, Since while running the script , we didnt perform rest of the commands . #31 : Customer didnt proceed after step2 . Should we still proceed with all the steps & confirm the output from step 7 ? Regards, Jaison R
moving back to infra as at this point it still seems as some generic issue.
(In reply to Jaison Raju from comment #40) > Since while running the script , we didnt perform rest of the commands . > #31 : Customer didnt proceed after step2 . > Should we still proceed with all the steps & confirm the output from step 7 ? there is no run action in step #2 and this is probably your mistake probably 2) download attachment " upgrade script to turn all varchar 255 columns in > vm_dynamic to text " to dbscripts/upgrade directory and add a prefix > numbering such this script will be the last , for example if the last script > prefix is 03_04_0810 , the attached script should have 03_04_0820 numbering. This is only a file creation in a certain location and with a certain name , please do not run this with psql , just put the file exactly as defined and follow other steps
*** Bug 1172039 has been marked as a duplicate of this bug. ***
Created attachment 977246 [details] script output
Created attachment 977832 [details] script output
Arik, can you please take a look and see what can cause the guest_cur_user_name value to be corrupted and exceed the 255 character limit We had extended here the column to be text value and the attached result in comment 51 shows illegal values You can clearly see that the values are in a format of a sql error that engine attempts to insert to the database (".... ExecQuery(self, strQuery=pythoncom.Missing, strQueryLanguage=u'WQL', ...) Please advice
(In reply to Eli Mesika from comment #52) Vinzenz, is it possible that this error comes from the guest agent and somehow sent to the engine as the current user name?
Yes this definitely comes from the guest agent. How this is possible, however is a puzzling me big time right now. This is not an insert, this is a WQL Query which queries information from windows about the system, which can be user name, hostname (FQDN) etc. However there's also quite a lot of corrupted data in it. Something seems to be totally corrupted there. @Jaison Can you please gather the guest agent log files from at least two of the affected systems? Thanks.
@Jaison: Is there any update on this available?
no new updates for a long time, closing
Sorry for the delay - I am working on a potential fix - The problem is a bit more complex as it turns out.
Verify: RHEVM: 3.6.3.2-0.1.el6 Host: OS Version: RHEL-7.2-9.el7_2.1 Kernel:3.10.0-327.10.1.el7.x86_64 KVM: 2.3.0-31.el7_2.7 LIBVIRT: libvirt-1.2.17-13.el7_2.3 VDSM: vdsm-4.17.21-0.el7ev SPICE: 0.12.4-15.el7 Scenario: 1. Create VM in Admin portal 2. Create VM in user portal 3. Start VM in user portal 4. Connect to VM via Console 5. Monitor event log for 0.5 hour Results: PASS The message is shown only time time per VM.