Bug 853725 - [vdsm] vm disk stats failures are flooding the log (every 2ms for each vm) when the storage domain containing the vm disk is not reachable
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Yaniv Bronhaim
QA Contact: Haim
URL:
Whiteboard: infra
Duplicates: 859103
Depends On:
Blocks:
 
Reported: 2012-09-02 12:54 UTC by Haim
Modified: 2014-01-13 00:54 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-10-23 17:14:29 UTC
Target Upstream Version:
Embargoed:


Attachments
- vdsm logs (785.68 KB, application/x-xz), 2012-09-02 12:58 UTC, Haim
- correct vdsm logs (752.29 KB, application/x-xz), 2012-09-06 14:29 UTC, Haim
- reproduce log (3.23 MB, text/x-log), 2012-10-22 10:49 UTC, Yaniv Bronhaim

Description Haim 2012-09-02 12:54:44 UTC
Description of problem:

vm disk stats failures are flooding the log (every 2ms for each vm) when the storage domain containing the vm disk is not reachable:

[root@nott-vds2 ~]# xzgrep 'Stats function failed:'  /var/log/vdsm/vdsm.log.1.xz | wc -l 
3956

storage was not accessible for 7 minutes and I got 3956 errors.

expected results: there is no need to run disk stats when storage is already reported as faulty (repo stats

Comment 1 Haim 2012-09-02 12:58:10 UTC
Created attachment 609078 [details]
vdsm logs

Comment 2 Haim 2012-09-02 12:58:36 UTC
Thread-1997::ERROR::2012-09-02 17:30:02,522::utils::399::vm.Vm::(collect) vmId=`e5f954b1-99c0-49e6-8b30-4948e922a21c`::Stats function failed: <AdvancedStatsFunction _highWrite at 0x1c765a8>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 89, in _highWrite
    self._vm._dom.blockInfo(vmDrive.path, 0)
  File "/usr/share/vdsm/libvirtvm.py", line 492, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1664, in blockInfo
    if ret is None: raise libvirtError ('virDomainGetBlockInfo() failed', dom=self)
libvirtError: failed to open path '/rhev/data-center/520180ca-5062-4abd-a973-6246282b2812/29741863-4295-4794-bfcb-7b9e7e8775f9/images/b01b5f3a-08b2-49e5-b350-cf5ffb2c320e/571744fd-259d-4ac8-8739-85e682160d54': No such file or directory

Comment 4 Ayal Baron 2012-09-02 13:40:58 UTC
All the log contains is 8 such message out of which 4 are unique.
So I don't understand what the bug is:

IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/50fa7537-79da-468f-9cab-d671a23035c9'
IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/50fa7537-79da-468f-9cab-d671a23035c9'
IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/f8389e21-0de6-4508-9623-3ac600e99727'
IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/f8389e21-0de6-4508-9623-3ac600e99727'
IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/9f5af493-24e0-4e14-84cc-654988159e33'
IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/9f5af493-24e0-4e14-84cc-654988159e33'
IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/51dbcf41-fbb2-4cab-a76c-f73c4e1dd82f'
IOError: [Errno 2] No such file or directory: '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/51dbcf41-fbb2-4cab-a76c-f73c4e1dd82f'

Comment 5 Haim 2012-09-02 14:01:07 UTC
(In reply to comment #4)
> All the log contains is 8 such message out of which 4 are unique.
> So I don't understand what the bug is:
> 
> IOError: [Errno 2] No such file or directory:
> '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/50fa7537-79da-468f-9cab-
> d671a23035c9'
> IOError: [Errno 2] No such file or directory:
> '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/50fa7537-79da-468f-9cab-
> d671a23035c9'
> IOError: [Errno 2] No such file or directory:
> '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/f8389e21-0de6-4508-9623-
> 3ac600e99727'
> IOError: [Errno 2] No such file or directory:
> '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/f8389e21-0de6-4508-9623-
> 3ac600e99727'
> IOError: [Errno 2] No such file or directory:
> '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/9f5af493-24e0-4e14-84cc-
> 654988159e33'
> IOError: [Errno 2] No such file or directory:
> '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/9f5af493-24e0-4e14-84cc-
> 654988159e33'
> IOError: [Errno 2] No such file or directory:
> '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/51dbcf41-fbb2-4cab-a76c-
> f73c4e1dd82f'
> IOError: [Errno 2] No such file or directory:
> '/dev/29741863-4295-4794-bfcb-7b9e7e8775f9/51dbcf41-fbb2-4cab-a76c-
> f73c4e1dd82f'

how? 

[root@nott-vds2 ~]# xzgrep 'Stats function failed:'  /var/log/vdsm/vdsm.log.1.xz   | wc -l 
3956

Comment 6 Ayal Baron 2012-09-06 11:40:57 UTC
Then you've attached the wrong log, because the one attached doesn't contain even one 'Stats function failed:' message.

Comment 7 Haim 2012-09-06 14:29:37 UTC
Created attachment 610351 [details]
correct vdsm logs

Comment 8 Ayal Baron 2012-09-16 06:36:52 UTC
Problem is with frequency of sampling stats:

Thread-2002::ERROR::2012-09-02 17:22:29,200::utils::399::vm.Vm::(collect) vmId=`c01dc87b-f962-4eda-815a-49684c69c228`::Stats function failed: <AdvancedStatsFunction _highWrite at 0x1c765a8>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 89, in _highWrite
    self._vm._dom.blockInfo(vmDrive.path, 0)
  File "/usr/share/vdsm/libvirtvm.py", line 492, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1664, in blockInfo
    if ret is None: raise libvirtError ('virDomainGetBlockInfo() failed', dom=self)
libvirtError: failed to open path '/rhev/data-center/520180ca-5062-4abd-a973-6246282b2812/29741863-4295-4794-bfcb-7b9e7e8775f9/images/50a0888a-2ac7-4464-b61c-d9ab8b766515/b40766dd-b705-47a7-a441-4bb93d097707': No such file or directory

Comment 9 Yaniv Bronhaim 2012-09-27 16:57:11 UTC
Do you prefer to change the frequency of this print, or to handle this specific libvirt error somehow (like printing something other than "Stats function failed")?

Comment 10 Ayal Baron 2012-09-29 13:56:49 UTC
(In reply to comment #9)
> Do you prefer to change the frequency of this print or handling this
> specific  libvirt error somehow (like printing something else than stats
> function failed..)?

First, I'm not sure we need a traceback here.
Second, we have to log the problem; we just don't need to do it so frequently.

Comment 11 Ayal Baron 2012-09-29 14:03:51 UTC
Note that there are 20 unique VMs here, and for each one we get a log message every 10s. With 100 VMs it would be worse. What we probably really need here is RCA (root-cause analysis) and suppression of most of these events (replacing them with one generic message).

Thread-9289::DEBUG::2012-09-02 16:01:02,661::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`8eea11d0-103f-4aa3-8694-405bfa450919`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,680::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`21efbc74-b677-487f-9db4-fdbbeae179ca`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,681::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`5f04f6ab-2cd9-43cb-98a8-78b5eea25ef7`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,681::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`12aec5c0-3e1d-48e2-9921-db6b344057eb`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,682::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`dd402cf2-6237-44f6-9d98-bf0dae55f1f8`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,683::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`74d21a9c-516b-44b6-8fe5-d74279345c86`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,684::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`8a7169ca-49c0-42c5-b1b7-0e49defc3ed8`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,684::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`41c16001-a028-4844-a1fe-40558b99215a`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,685::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`2bebcde4-0be2-447b-8eed-6368054dab0f`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,686::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`88a4b827-2d89-4501-9d0c-bdece761d0d6`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,687::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`d47149f9-ab9e-4501-83e8-1547d25c2fe8`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,687::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`7f4ec82f-1c18-4fbe-99d4-e53d60e5b05b`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,688::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`f78850bf-46cd-4606-86ab-1c72823edf76`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,689::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`c3ddabed-5ec5-4883-9b15-33db959ec656`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,690::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`052a871e-794a-4005-bc8a-e23f2bd353a0`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,691::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`1e9b7f40-cb7a-47c4-941f-8347c9171025`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,692::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`c01dc87b-f962-4eda-815a-49684c69c228`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,693::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`b5a5739a-3b9e-4b2a-91b3-bb7137b35e5b`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,694::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`23049264-993d-4427-89d0-2bcd7f04b92d`::Disk hdc stats not available
Thread-9289::DEBUG::2012-09-02 16:01:02,695::libvirtvm::241::vm.Vm::(_getDiskStats) vmId=`e5f954b1-99c0-49e6-8b30-4948e922a21c`::Disk hdc stats not available
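The suppression idea above (keep logging the problem, but collapse repeats into one generic message) could look something like the following. This is only a sketch, not vdsm code; the class name, key scheme, and interval are hypothetical:

```python
import time

class ThrottledLogger:
    """Emit a repeated message at most once per `interval` seconds per key,
    then report how many identical messages were suppressed in between.
    Hypothetical helper, not part of vdsm."""

    def __init__(self, log, interval=60.0):
        self._log = log            # any object with a debug(fmt, *args) method
        self._interval = interval
        self._state = {}           # key -> (last emit time, suppressed count)

    def debug(self, key, fmt, *args):
        now = time.monotonic()
        last, suppressed = self._state.get(key, (None, 0))
        if last is None or now - last >= self._interval:
            if suppressed:
                # one generic summary line instead of N identical lines
                self._log.debug("%s (%d similar messages suppressed)",
                                fmt % args, suppressed)
            else:
                self._log.debug(fmt, *args)
            self._state[key] = (now, 0)
        else:
            self._state[key] = (last, suppressed + 1)
```

With the per-vm, per-disk message used as the key, a storage outage would then produce one line plus a suppression summary per interval instead of a line every sampling cycle.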

Comment 12 Ayal Baron 2012-09-29 14:09:58 UTC
Another issue here is that, for some reason, disk stats and disk latency are sampled separately (this seems inefficient; also, when storage fails, both fail, but each has its own scope, so logging is done for both).
Dan, why are these 2 methods separate? _sampleDisk, _sampleDiskLatency

Comment 13 Ayal Baron 2012-09-29 14:10:15 UTC
*** Bug 859103 has been marked as a duplicate of this bug. ***

Comment 14 Dan Kenigsberg 2012-09-29 19:55:23 UTC
(In reply to comment #12)
> Dan, why are these 2 methods separate? _sampleDisk, _sampleDiskLatency

They have different sampling intervals, and different means to acquire data. What would you like to consolidate here?
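As a rough illustration of why different intervals keep the samplers separate: one collection loop can host several sampling functions, each firing on its own period. The structure below is hypothetical and much simpler than vdsm's actual AdvancedStats code:

```python
import time

def collect(samplers, now=None):
    """Run each sampler whose interval has elapsed. One loop can host
    samplers with different periods, which is why _sampleDisk and
    _sampleDiskLatency can stay separate functions while sharing a
    single collection thread. Illustrative sketch only."""
    now = time.monotonic() if now is None else now
    for sampler in samplers:
        if now - sampler["last"] >= sampler["interval"]:
            sampler["last"] = now
            try:
                sampler["fn"]()
            except Exception:
                pass  # real code logs the failure; that logging is this bug's flood
```

Consolidating the two would mean either forcing a common interval or multiplexing two periods through one function, which is what the separate-function design avoids.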

Comment 15 Yaniv Bronhaim 2012-10-01 17:13:50 UTC
I can stop collecting storage stats once getRealDomain fails to look up the domain, and resume on the next successful getRealDomain call when the connection is back up. How does that sound?
The sampling approach itself is OK; when the connection is up, it looks like we want to sample all those tests for each vm.
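A minimal sketch of that suspend-until-recovery idea, assuming a getRealDomain-like callable that raises while the domain is unreachable (the guard class and all names here are hypothetical, not vdsm's API):

```python
class DomainGuard:
    """Skip stat sampling while the storage domain is unreachable,
    log once on the transition, and resume after the next successful
    lookup. Hypothetical sketch, not vdsm code."""

    def __init__(self, get_real_domain, log=print):
        self._get_real_domain = get_real_domain  # raises when domain is unreachable
        self._log = log
        self._available = True

    def sample(self, stats_function):
        try:
            dom = self._get_real_domain()
        except Exception:
            if self._available:  # log only on the up -> down transition
                self._log("storage domain unreachable; suspending stat sampling")
            self._available = False
            return None
        self._available = True
        return stats_function(dom)
```

This turns "one error per vm per disk per cycle" into a single transition message, at the cost of a probe call per cycle to detect recovery.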

Comment 16 Yaniv Bronhaim 2012-10-22 10:49:49 UTC
Created attachment 631400 [details]
reproduce log

I attached the log I got while blocking storage. It seems I don't hit the libvirtError you saw; maybe I reproduced it differently, or it was fixed by a different patch, but this exception doesn't appear.

If a sampling function throws an exception, it means something is wrong that we want to fix. After blocking a storage domain, no exception should be raised while sampling highWrite.

Comment 17 Barak 2012-10-23 17:14:29 UTC
http://gerrit.ovirt.org/#/c/8288

see comment in the patch

It looks like the stack trace is displayed in the log only when libvirt is down.

When the SD is down, there is one log line per VM every 10 secs, which looks like valid behaviour.

Hence moving this BZ to CLOSED WONTFIX

Comment 18 Yaniv Kaul 2012-10-23 17:32:28 UTC
(In reply to comment #17)
> http://gerrit.ovirt.org/#/c/8288
> 
> see comment 
> 
> It looks like the stack trace is displayed in log only when libvirt is down.
> 
> When the SD is down it has a log per VM every 10 secs, this looks like a
> valid behaviour.

It's not one line per VM every 10 seconds; it's per VM multiplied by the number of disks, every 10 seconds. And from the log above, it looks like it actually prints twice for every disk (once for stats, once for latency):
Thread-356342::DEBUG::2012-10-22 12:27:15,283::libvirtvm::224::vm.Vm::(_getDiskStats) vmId=`c75ec2fe-2f73-4ed4-98bf-0d7286160a87`::Disk hdc stats not available
Thread-356342::DEBUG::2012-10-22 12:27:15,283::libvirtvm::224::vm.Vm::(_getDiskStats) vmId=`c75ec2fe-2f73-4ed4-98bf-0d7286160a87`::Disk vda stats not available
Thread-356342::DEBUG::2012-10-22 12:27:15,283::libvirtvm::263::vm.Vm::(_getDiskLatency) vmId=`c75ec2fe-2f73-4ed4-98bf-0d7286160a87`::Disk hdc latency not available
Thread-356342::DEBUG::2012-10-22 12:27:15,283::libvirtvm::263::vm.Vm::(_getDiskLatency) vmId=`c75ec2fe-2f73-4ed4-98bf-0d7286160a87`::Disk vda latency not available

So for 50 VMs with 1 disk, you'll get 10 lines/second, which is quite a lot, IMHO - it will flood the log pretty quickly.
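That arithmetic (VMs times disks times two lines per disk, per 10-second cycle) can be captured in a small helper; the function is purely illustrative:

```python
def flood_rate(vms, disks_per_vm, lines_per_disk=2, interval_s=10):
    """Log lines per second when every disk of every VM emits
    `lines_per_disk` messages every `interval_s` seconds.
    Hypothetical helper for the estimate in this comment."""
    return vms * disks_per_vm * lines_per_disk / interval_s

# 50 VMs with 1 disk each -> 10.0 lines/second, as noted above
```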

> 
> 
> see comment in the patch.

The problem was not the stack trace, but the log flood.

> 
> Hence moving this BZ to CLOSED WONTFIX

