Bug 1154389
| Field | Value |
|---|---|
| Summary | [scale] VM shutdown causes errors under load: exception in acpiShutdown() |
| Product | Red Hat Enterprise Virtualization Manager |
| Component | vdsm |
| Status | CLOSED CURRENTRELEASE |
| Severity | low |
| Priority | low |
| Version | 3.5.0 |
| Target Milestone | ovirt-3.6.0-rc |
| Target Release | 3.6.0 |
| Hardware | x86_64 |
| OS | Linux |
| Fixed In Version | 4.17.4 |
| Doc Type | Bug Fix |
| Reporter | Yuri Obshansky <yobshans> |
| Assignee | Francesco Romani <fromani> |
| QA Contact | Eldad Marciano <emarcian> |
| CC | bazulay, fromani, gklein, istein, lpeer, lsurette, michal.skrivanek, rbalakri, Rhev-m-bugs, srevivo, ykaul, yobshans |
| Last Closed | 2016-04-20 01:29:06 UTC |
| Type | Bug |
| oVirt Team | Virt |
| Regression | --- |
| Story Points | --- |
Description

Yuri Obshansky, 2014-10-19 12:17:08 UTC
please attach vdsm logs

Sorry, I don't have vdsm.log. The test was performed on 15.10, and as of 17.10 vdsm had already deleted all log files created earlier.

Created attachment 948946 [details]
vdsm.log

Created attachment 948947 [details]
engine.log
Could you send the libvirt logs too, please?

Unfortunately, I don't have a libvirt.log file created under /var/log/libvirt. I looked at the /etc/libvirt/libvirtd.conf file and found that all log configuration is commented out. Is that OK?

```
#################################################################
#
# Logging controls
#
# Logging level: 4 errors, 3 warnings, 2 information, 1 debug
# basically 1 will log everything possible
#log_level = 3

# Logging filters:
# A filter allows to select a different logging level for a given category
# of logs
# The format for a filter is one of:
#    x:name
#    x:+name
# where name is a string which is matched against source file name,
# e.g., "remote", "qemu", or "util/json", the optional "+" prefix
# tells libvirt to log stack trace for each message matching name,
# and x is the minimal level where matching messages should be logged:
#    1: DEBUG
#    2: INFO
#    3: WARNING
#    4: ERROR
#
# Multiple filters can be defined in a single @filters, they just need to be
# separated by spaces.
#
# e.g. to only get warning or errors from the remote layer and only errors
# from the event layer:
#log_filters="3:remote 4:event"

# Logging outputs:
# An output is one of the places to save logging information
# The format for an output can be:
#    x:stderr           # output goes to stderr
#    x:syslog:name      # use syslog for the output and use the given name as the ident
#    x:file:file_path   # output to a file, with the given filepath
# In all cases the x prefix is the minimal level, acting as a filter
#    1: DEBUG
#    2: INFO
#    3: WARNING
#    4: ERROR
#
# Multiple outputs can be defined, they just need to be separated by spaces.
# e.g. to log all warnings and errors to syslog under the libvirtd ident:
#log_outputs="3:syslog:libvirtd"

# Log debug buffer size: default 64
# The daemon keeps an internal debug log buffer which will be dumped in case
# of crash or upon receiving a SIGUSR2 signal. This setting allows to override
# the default buffer size in kilobytes.
# If value is 0 or less the debug log buffer is deactivated
#log_buffer_size = 64
```

I can rerun the test. Please suggest which log settings to uncomment and what level you prefer.

Created attachment 949744 [details]
libvirt.log 1
Created attachment 949746 [details]
libvirt.log 2
Created attachment 949747 [details]
libvirt.log 3
The logs point to QEMU dying earlier than expected, but no reason is given for it. This looks like an issue with QEMU rather than with the engine or VDSM.

Duplicate of bug 1143968?

Part of this problem (the "dictionary changed size during iteration" error) has to do with the unsafe handling of Vm.conf that is mentioned in that BZ; another part (the "'NoneType' object has no attribute 'shutdownFlags'" error) is related to QEMU dying too quickly after the shutdown command is issued, which is a different problem.

(In reply to Vitor de Lima from comment #12)
> Part of this problem (the "dictionary changed size during iteration" error)
> has to do with the unsafe handling of Vm.conf that is mentioned in that BZ,
> another part (the "'NoneType' object has no attribute 'shutdownFlags'"
> error) is related to QEMU dying too quickly after the shutdown command is
> issued, which is a different problem.

Correct. Unfortunately, lots of flows are potentially affected, so VDSM logs will definitely help shed some light here and let us prioritize. We know QEMU death and shutdown/destroy are racy and potentially buggy, even though the specific error reported should do little harm. A refactoring of those flows was also planned but slipped because of other, more urgent issues; maybe it is time to bump it up. BZ1143968 is related but tackles a different subset of the bigger problem, one for which we have evidence and a quite clean reproducer. I don't think device handling is the root cause of this issue, but to be sure we need the full backtrace.

Patch http://gerrit.ovirt.org/#/c/34879/ should help with this issue.

The "dictionary changed size during iteration" error is tracked in bug 1143968. Changing this one to cover the shutdown race described in comment #14. The bug is harmless.
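The two errors discussed above can be illustrated in isolation. The following sketch is not VDSM code; the class and attribute names (other than `shutdownFlags`, which appears in the traceback) are hypothetical, and it only shows the general shape of the two failure modes and the usual defensive patterns against them:

```python
# 1) "dictionary changed size during iteration": mutating a dict while
#    iterating over it raises RuntimeError in Python.
conf = {"devices": 1, "memory": 2}

def unsafe_walk(d):
    for key in d:              # iterating the live dict
        if key == "devices":
            d["balloon"] = 3   # mutation during iteration -> RuntimeError

try:
    unsafe_walk(dict(conf))
except RuntimeError as e:
    print("caught:", e)        # dictionary changed size during iteration

# Safe pattern: snapshot the keys before iterating.
for key in list(conf):
    if key == "devices":
        conf["balloon"] = 3    # fine: we iterate over a copy of the keys

# 2) "'NoneType' object has no attribute 'shutdownFlags'": if QEMU dies
#    right after shutdown is requested, the object tracking the in-flight
#    shutdown may already have been cleared by another thread. Guarding
#    before dereferencing avoids the AttributeError.
class Vm:
    def __init__(self):
        self._shutdown_task = None   # hypothetical attribute name

    def on_qemu_death(self):
        task = self._shutdown_task
        if task is None:             # QEMU died with no shutdown in flight
            return "no pending shutdown"
        return task.shutdownFlags
```

The snapshot-the-keys pattern is the standard fix when a dict can be mutated concurrently or by the loop body itself; the None guard is the standard fix for the race between QEMU death notification and shutdown bookkeeping.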
The bug was verified on the following versions:

- rhevm-setup-plugin-ovirt-engine-common-3.6.3-0.1.el6.noarch
- ovirt-engine-extension-aaa-jdbc-1.0.5-1.el6ev.noarch
- rhevm-setup-plugin-ovirt-engine-3.6.3-0.1.el6.noarch
- vdsm-python-4.17.19-0.el7ev.noarch
- vdsm-hook-vmfex-dev-4.17.19-0.el7ev.noarch
- vdsm-jsonrpc-4.17.19-0.el7ev.noarch
- vdsm-yajsonrpc-4.17.19-0.el7ev.noarch
- vdsm-xmlrpc-4.17.19-0.el7ev.noarch
- vdsm-cli-4.17.19-0.el7ev.noarch
- vdsm-4.17.19-0.el7ev.noarch
- vdsm-infra-4.17.19-0.el7ev.noarch

The same load test was performed with 50 concurrent threads. No exceptions were raised during the 2 hours of testing (see the attached engine.log, timeframe 2016-02-04 11:35:00 - 2016-02-04 13:35:00). The 90% line of the REST API StopVM call response time was 5385 msec, which is worse than in 3.4 but better than in 3.5 (see the attached AggregateTable.csv for details and the StopVM-response-time.png graph). The bug can be closed.

Created attachment 1121083 [details]
Bug verification engine.log
Created attachment 1121084 [details]
Bug verification API response time table
Created attachment 1121085 [details]
Bug verification API response time graph