Bug 853725
Summary: [vdsm] vm disk stats failures flood the log (every 2ms for each vm) in case the storage domain containing the vm disk is not reachable

Product: Red Hat Enterprise Linux 6
Component: vdsm
Version: 6.3
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Whiteboard: infra
Reporter: Haim <hateya>
Assignee: Yaniv Bronhaim <ybronhei>
QA Contact: Haim <hateya>
CC: abaron, bazulay, danken, fsimonce, iheim, lpeer, yeylon, ykaul
Target Milestone: rc
Doc Type: Bug Fix
Type: Bug
Last Closed: 2012-10-23 17:14:29 UTC
Description (Haim, 2012-09-02 12:54:44 UTC)
Created attachment 609078 [details]
vdsm logs
Thread-1997::ERROR::2012-09-02 17:30:02,522::utils::399::vm.Vm::(collect) vmId=`e5f954b1-99c0-49e6-8b30-4948e922a21c`::Stats function failed: <AdvancedStatsFunction _highWrite at 0x1c765a8>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 89, in _highWrite
    self._vm._dom.blockInfo(vmDrive.path, 0)
  File "/usr/share/vdsm/libvirtvm.py", line 492, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1664, in blockInfo
    if ret is None: raise libvirtError ('virDomainGetBlockInfo() failed', dom=self)
libvirtError: failed to open path '/rhev/data-center/520180ca-5062-4abd-a973-6246282b2812/29741863-4295-4794-bfcb-7b9e7e8775f9/images/b01b5f3a-08b2-49e5-b350-cf5ffb2c320e/571744fd-259d-4ac8-8739-85e682160d54': No such file or directory

All the log contains is 8 such messages, of which 4 are unique.
So I don't understand what the bug is:

IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/50fa7537-79da-468f-9cab-d671a23035c9'
IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/50fa7537-79da-468f-9cab-d671a23035c9'
IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/f8389e21-0de6-4508-9623-3ac600e99727'
IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/f8389e21-0de6-4508-9623-3ac600e99727'
IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/9f5af493-24e0-4e14-84cc-654988159e33'
IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/9f5af493-24e0-4e14-84cc-654988159e33'
IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/51dbcf41-fbb2-4cab-a76c-f73c4e1dd82f'
IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/51dbcf41-fbb2-4cab-a76c-f73c4e1dd82f'

(In reply to comment #4)
> All the log contains is 8 such messages, of which 4 are unique.
> So I don't understand what the bug is:
> [the eight IOError lines quoted above]

How?

[root@nott-vds2 ~]# xzgrep 'Stats function failed:' /var/log/vdsm/vdsm.log.1.xz | wc -l
3956

Then you've attached the wrong log, because the one attached doesn't contain even one 'Stats function failed:'... ?

Created attachment 610351 [details]
correct vdsm logs
The problem is with the frequency of sampling stats:

Thread-2002::ERROR::2012-09-02 17:22:29,200::utils::399::vm.Vm::(collect) vmId=`c01dc87b-f962-4eda-815a-49684c69c228`::Stats function failed: <AdvancedStatsFunction _highWrite at 0x1c765a8>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 89, in _highWrite
    self._vm._dom.blockInfo(vmDrive.path, 0)
  File "/usr/share/vdsm/libvirtvm.py", line 492, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1664, in blockInfo
    if ret is None: raise libvirtError ('virDomainGetBlockInfo() failed', dom=self)
libvirtError: failed to open path '/rhev/data-center/520180ca-5062-4abd-a973-6246282b2812/29741863-4295-4794-bfcb-7b9e7e8775f9/images/50a0888a-2ac7-4464-b61c-d9ab8b766515/b40766dd-b705-47a7-a441-4bb93d097707': No such file or directory

Do you prefer to change the frequency of this print, or to handle this specific libvirt error somehow (e.g. print something other than "Stats function failed")?

(In reply to comment #9)
> Do you prefer to change the frequency of this print or handling this
> specific libvirt error somehow?

First, I'm not sure we need a traceback here. Second, we have to log the problem, we just don't need to do it so frequently. Note that there are 20 unique VMs here, and for each one we get a log message every 10 seconds. With 100 VMs it would be worse. What we probably really need here is RCA and suppression of most of these events (replacing them with one generic message).
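The suppression suggested above (keep logging the problem, just not every cycle) could be done with a small per-key throttle. This is only a sketch, not vdsm code; `ThrottledLogger` and its parameters are hypothetical names:

```python
import logging
import time

class ThrottledLogger(object):
    """Emit a repeating error at most once per `interval` seconds per key,
    noting how many similar messages were suppressed in between."""

    def __init__(self, logger, interval=300):
        self._log = logger
        self._interval = interval
        self._last = {}  # key -> (last emit time, suppressed count)

    def error(self, key, msg, *args):
        now = time.time()
        last, suppressed = self._last.get(key, (0, 0))
        if now - last >= self._interval:
            if suppressed:
                msg += " (%d similar messages suppressed)" % suppressed
            self._log.error(msg, *args)
            self._last[key] = (now, 0)
        else:
            self._last[key] = (last, suppressed + 1)
```

Keyed on, say, the VM id, this would turn a message every sampling cycle into one message per VM per interval, with a count of what was dropped.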
Thread-9289::DEBUG::2012-09-02 16:01:02,661::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`8eea11d0-103f-4aa3-8694-405bfa450919`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,680::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`21efbc74-b677-487f-9db4-fdbbeae179ca`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,681::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`5f04f6ab-2cd9-43cb-98a8-78b5eea25ef7`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,681::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`12aec5c0-3e1d-48e2-9921-db6b344057eb`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,682::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`dd402cf2-6237-44f6-9d98-bf0dae55f1f8`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,683::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`74d21a9c-516b-44b6-8fe5-d74279345c86`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,684::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`8a7169ca-49c0-42c5-b1b7-0e49defc3ed8`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,684::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`41c16001-a028-4844-a1fe-40558b99215a`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,685::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`2bebcde4-0be2-447b-8eed-6368054dab0f`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,686::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`88a4b827-2d89-4501-9d0c-bdece761d0d6`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,687::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`d47149f9-ab9e-4501-83e8-1547d25c2fe8`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,687::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`7f4ec82f-1c18-4fbe-99d4-e53d60e5b05b`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,688::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`f78850bf-46cd-4606-86ab-1c72823edf76`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,689::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`c3ddabed-5ec5-4883-9b15-33db959ec656`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,690::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`052a871e-794a-4005-bc8a-e23f2bd353a0`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,691::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`1e9b7f40-cb7a-47c4-941f-8347c9171025`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,692::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`c01dc87b-f962-4eda-815a-49684c69c228`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,693::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`b5a5739a-3b9e-4b2a-91b3-bb7137b35e5b`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,694::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`23049264-993d-4427-89d0-2bcd7f04b92d`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,695::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`e5f954b1-99c0-49e6-8b30-4948e922a21c`::Disk hdc stats not available

Another issue here is that for some reason disk stats and disk latency are sampled separately. This seems inefficient; moreover, in case of a storage problem both fail, but since their scope is separate, logging happens for both. Dan, why are these two methods separate? _sampleDisk, _sampleDiskLatency

*** Bug 859103 has been marked as a duplicate of this bug. ***

(In reply to comment #12)
> Dan, why are these 2 methods separate? _sampleDisk, _sampleDiskLatency

They have different sampling intervals and different means of acquiring data. What would you like to consolidate here?

I can stop collecting stats about storage issues once getRealDomain fails to find the domain, until the next successful call to getRealDomain, when the connection is back. How does that sound?
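The "stop collecting until the domain comes back" idea proposed above could look roughly like the following circuit-breaker sketch. This is not the actual vdsm implementation; `DiskStatsSampler`, `sample_fn`, and `is_domain_up` are hypothetical names (in vdsm the availability check would hang off getRealDomain):

```python
class DiskStatsSampler(object):
    """Skip disk sampling while the storage domain is down, and log a
    single message on each down/up transition instead of every cycle."""

    def __init__(self, log, sample_fn, is_domain_up):
        self._log = log
        self._sample = sample_fn            # e.g. wraps blockInfo/blockStats
        self._is_domain_up = is_domain_up   # e.g. reflects getRealDomain success
        self._domain_up = True

    def collect(self, vm_id):
        if not self._is_domain_up():
            if self._domain_up:             # log only on the transition
                self._log.error("storage domain down, pausing disk stats for %s", vm_id)
                self._domain_up = False
            return None
        if not self._domain_up:
            self._log.info("storage domain back, resuming disk stats for %s", vm_id)
            self._domain_up = True
        return self._sample(vm_id)
```

With this shape, a blocked storage domain produces one ERROR at the outage start and one INFO at recovery, rather than one traceback per VM per sampling interval.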
The sampling approach itself is OK; when the connection is up, it looks like we do want to run all those samplers for each VM.

Created attachment 631400 [details]
reproduce log
I attached the log I get while blocking storage. It seems I don't hit the libvirtError you got; maybe I reproduce it differently, or maybe it was fixed by a different patch, but this exception doesn't appear.
If a sample function throws an exception, it means we have something wrong that we want to fix. After blocking the storage domain, no exception should be raised while sampling highWrite.
http://gerrit.ovirt.org/#/c/8288

see comment

It looks like the stack trace is displayed in the log only when libvirt is down. When the SD is down there is a log line per VM every 10 secs, which looks like valid behaviour. See the comment in the patch. Hence moving this BZ to CLOSED WONTFIX.

(In reply to comment #17)
> It looks like the stack trace is displayed in log only when libvirt is down.
> When the SD is down it has a log per VM every 10 secs, this looks like a
> valid behaviour.

It's not per VM every 10 seconds; it's per VM multiplied by the number of disks, every 10 seconds. And from the log above, it looks like it actually prints twice for every disk:

Thread-356342::DEBUG::2012-10-22 12:27:15,283::libvirtvm::224::vm.Vm::(_getDiskStats) vmId=`c75ec2fe-2f73-4ed4-98bf-0d7286160a87`::Disk hdc stats not available
Thread-356342::DEBUG::2012-10-22 12:27:15,283::libvirtvm::224::vm.Vm::(_getDiskStats) vmId=`c75ec2fe-2f73-4ed4-98bf-0d7286160a87`::Disk vda stats not available
Thread-356342::DEBUG::2012-10-22 12:27:15,283::libvirtvm::263::vm.Vm::(_getDiskLatency) vmId=`c75ec2fe-2f73-4ed4-98bf-0d7286160a87`::Disk hdc latency not available
Thread-356342::DEBUG::2012-10-22 12:27:15,283::libvirtvm::263::vm.Vm::(_getDiskLatency) vmId=`c75ec2fe-2f73-4ed4-98bf-0d7286160a87`::Disk vda latency not available

So for 50 VMs with 1 disk, you'll get 10 lines/second, which is quite a lot, IMHO - it will flood the log pretty quickly.

> see comment in the patch.

The problem was not the stack trace, but the log flood.

> Hence moving this BZ to CLOSED WONTFIX
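One way to tame the per-disk flood described above would be to collect all the "not available" disks during a sampling cycle and emit a single summary line per cycle instead of two lines per disk per VM. A sketch with hypothetical names (not the actual _getDiskStats code):

```python
def summarize_unavailable(unavailable):
    """Turn {vm_id: [disk names]} into one summary log line.

    `unavailable` would be filled during one sampling pass, e.g. whenever
    a blockStats/blockInfo call fails for a drive. Returns None when
    every disk was sampled successfully.
    """
    if not unavailable:
        return None
    total = sum(len(disks) for disks in unavailable.values())
    detail = "; ".join(
        "%s: %s" % (vm, ",".join(sorted(disks)))
        for vm, disks in sorted(unavailable.items())
    )
    return "disk stats unavailable for %d disks on %d VMs (%s)" % (
        total, len(unavailable), detail)
```

For the 50-VM case in the comment above, this would replace roughly 100 DEBUG lines per 10-second cycle with one line that still names every affected VM and disk.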