Bug 1612152

Summary: [GSS] Crashes in glusterVdoVolumeList seen in messages file.
Product: Red Hat Enterprise Virtualization Manager Reporter: Ben Turner <bturner>
Component: vdsmAssignee: Kaustav Majumder <kmajumde>
Status: CLOSED ERRATA QA Contact: SATHEESARAN <sasundar>
Severity: high Docs Contact:
Priority: high    
Version: 4.2.2CC: bturner, danken, dblack, dfediuck, godas, guillaume.pavese, kmajumde, lsurette, mkalinin, rdlugyhe, sabose, sbose, seamurph, sgoodman, srevivo, tnisan, ycui
Target Milestone: ovirt-4.4.1Flags: lsvaty: testing_plan_complete-
Target Release: 4.4.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: vdsm-4.40.20 Doc Type: Bug Fix
Doc Text:
Previously, Virtual Data Optimizer (VDO) statistics were not available for VDO volumes with an error, so VDO monitoring from VDSM caused a traceback. This update fixes the issue by correctly handling the different outputs from the VDO statistics tool.
Story Points: ---
Clone Of:
: 1721097 (view as bug list) Environment:
Last Closed: 2020-08-04 13:26:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Gluster RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1721097    

Description Ben Turner 2018-08-03 15:30:38 UTC
Description of problem:

We are seeing the following crash:

Aug  1 13:47:15 rhv-van-001 vdsm[59733]: ERROR Internal server error#012Traceback (most recent call last):#012  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in _handle_request#012    res = method
(**params)#012  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 197, in _dynamicMethod#012    result = fn(*methodArgs)#012  File "/usr/lib/python2.7/site-packages/vdsm/gluster/apiwrapper.py", line 91, in vd
oVolumeList#012    return self._gluster.vdoVolumeList()#012  File "/usr/lib/python2.7/site-packages/vdsm/gluster/api.py", line 90, in wrapper#012    rv = func(*args, **kwargs)#012  File "/usr/lib/python2.7/site-packages/vds
m/gluster/api.py", line 818, in vdoVolumeList#012    status = self.svdsmProxy.glusterVdoVolumeList()#012  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in __call__#012    return callMethod()#012
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 53, in <lambda>#012    **kwargs)#012  File "<string>", line 2, in glusterVdoVolumeList#012  File "/usr/lib64/python2.7/multiprocessing/managers.py", l
ine 773, in _callmethod#012    raise convert_to_error(kind, result)#012AttributeError: 'str' object has no attribute 'iteritems'

In the messages file.  It looks to be VDSM querying gluster volumes, #012AttributeError: 'str' object has no attribute 'iteritems' is the exception from the look of it.

Version-Release number of selected component (if applicable):

RHEV 4.2

How reproducible:

This happens consistently on the setup in question.

Steps to Reproduce:
1.  Setup RHHI
2.  Store VMs on gluster volume
3.  Check messages

Actual results:

Crashes happening every 5 minutes.

Expected results:

Normal operation.

Additional info:

Comment 5 Sahina Bose 2018-08-31 07:07:24 UTC
This is not a vdsm crash, but an error querying vdo stats, as the vdo module is not present. The error is safe to be ignored as per Denis. Denis, do you want to comment?

Is this on a RHEL 7.5 system?

Comment 6 Marina Kalinin 2018-09-07 19:30:29 UTC
Ben, can you help to answer Sahina's question?

Comment 7 Denis Chaplygin 2018-09-12 10:02:41 UTC
Right, it means vdo module is not installed/working correctly. It can be safely ignored.

Comment 8 Marina Kalinin 2018-09-12 19:50:12 UTC
(In reply to Denis Chaplygin from comment #7)
> Right, it means vdo module is not installed/working correctly. It can be
> safely ignored.

So, is there another bug confirming this behavior or that it is fixed in next version or something so that we can close that bug as duplicate of it?

Comment 9 Sahina Bose 2018-09-21 09:07:36 UTC
Hi Denis,

I see this when vdo is installed and running. We need to handle cases where -  VDO statistics: not available

Error in supervdsm.log
MainProcess|jsonrpc/5::DEBUG::2018-09-21 14:31:47,941::supervdsm_server::96::SuperVdsm.ServerCallback::(wrapper) call vdoVolumeList with () {}
MainProcess|jsonrpc/5::DEBUG::2018-09-21 14:31:47,942::commands::65::root::(execCmd) /usr/bin/taskset --cpu-list 0-31 /bin/vdo status (cwd None)
MainProcess|jsonrpc/5::DEBUG::2018-09-21 14:31:48,413::commands::86::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|jsonrpc/5::ERROR::2018-09-21 14:31:48,490::supervdsm_server::100::SuperVdsm.ServerCallback::(wrapper) Error in vdoVolumeList
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/supervdsm_server.py", line 98, in wrapper
    res = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/gluster/thinstorage.py", line 95, in vdoVolumeList
    for mapper, stats in iteritems(data["VDO statistics"]):
  File "/usr/lib/python2.7/site-packages/six.py", line 576, in iteritems
    return iter(d.iteritems(**kw))
AttributeError: 'str' object has no attribute 'iteritems'


vdo status output:
VDO status:
  Date: '2018-09-21 14:36:42+05:30'
  Node: rhsdev-grafton4-nic2.lab.eng.blr.redhat.com
Kernel module:
  Loaded: true
  Name: kvdo
  Version information:
    kvdo version: 6.1.0.181
Configuration:
  File: /etc/vdoconf.yml
  Last modified: '2018-07-06 18:27:02'
VDOs:
  vdo_sdb:
    Acknowledgement threads: 1
    Activate: enabled
    Bio rotation interval: 64
    Bio submission threads: 4
    Block map cache size: 128M
    Block map period: 16380
    Block size: 4096
    CPU-work threads: 2
    Compression: enabled
    Configured write policy: sync
    Deduplication: enabled
    Device mapper status: 0 2936012800 vdo /dev/sdb albserver online cpu=2,bio=4,ack=1,bioRotationInterval=64
    Emulate 512 byte: enabled
    Hash zone threads: 1
    Index checkpoint frequency: 0
    Index memory setting: 0.25
    Index parallel factor: 0
    Index sparse: disabled
    Index status: online
    Logical size: 1400G
    Logical threads: 1
    Physical size: 190240M
    Physical threads: 1
    Read cache: enabled
    Read cache size: 20M
    Slab size: 32G
    Storage device: /dev/sdb
    VDO statistics:
      /dev/mapper/vdo_sdb:
        1K-blocks: 194805760
        1K-blocks available: 158031652
        1K-blocks used: 36774108
        512 byte emulation: true
        KVDO module bios used: 63766
        KVDO module bytes used: 586508896
        KVDO module peak bio count: 160524
        KVDO module peak bytes used: 1143910024
        bios acknowledged discard: 0
        bios acknowledged flush: 39010980
        bios acknowledged fua: 34634281
        bios acknowledged partial discard: 0
        bios acknowledged partial flush: 0
        bios acknowledged partial fua: 0
        bios acknowledged partial read: 29881488
        bios acknowledged partial write: 915785
        bios acknowledged read: 65605641
        bios acknowledged write: 106957648
        bios in discard: 0
        bios in flush: 39010980
        bios in fua: 34634281
        bios in partial discard: 0
        bios in partial flush: 0
        bios in partial fua: 0
        bios in partial read: 101917
        bios in partial write: 915785
        bios in progress discard: 0
        bios in progress flush: 0
        bios in progress fua: 0
        bios in progress read: 0
        bios in progress write: 0
        bios in read: 65605641
        bios in write: 106957648
        bios journal completed discard: 0
        bios journal completed flush: 0
        bios journal completed fua: 0
        bios journal completed read: 32768
        bios journal completed write: 176081539
        bios journal discard: 0
        bios journal flush: 176081539
        bios journal fua: 176081539
        bios journal read: 32768
        bios journal write: 176081539
        bios meta completed discard: 0
        bios meta completed flush: 0
        bios meta completed fua: 0
        bios meta completed read: 38712
        bios meta completed write: 200858893
        bios meta discard: 0
        bios meta flush: 177013669
        bios meta fua: 176081541
        bios meta read: 38712
        bios meta write: 200858893
        bios out completed discard: 0
        bios out completed flush: 0
        bios out completed fua: 0
        bios out completed read: 32552330
        bios out completed write: 66473542
        bios out discard: 0
        bios out flush: 0
        bios out fua: 0
        bios out read: 32552330
        bios out write: 66473542
        bios page cache completed discard: 0
        bios page cache completed flush: 0
        bios page cache completed fua: 0
        bios page cache completed read: 3334
        bios page cache completed write: 28898
        bios page cache discard: 0
        bios page cache flush: 3
        bios page cache fua: 0
        bios page cache read: 3334
        bios page cache write: 28898
        block map cache pressure: 0
        block map cache size: 134217728
        block map clean pages: 2709
        block map dirty pages: 625
        block map discard required: 0
        block map failed pages: 0
        block map failed reads: 0
        block map failed writes: 0
        block map fetch required: 3334
        block map flush count: 23887
        block map found in cache: 264958494
        block map free pages: 29434
        block map incoming pages: 0
        block map outgoing pages: 0
        block map pages loaded: 3334
        block map pages saved: 28895
        block map read count: 134317562
        block map read outgoing: 44
        block map reclaimed: 79
        block map wait for page: 787162
        block map write count: 131431428
        block size: 4096
        completed recovery count: 1
        compressed blocks written: 23624169
        compressed fragments in packer: 31
        compressed fragments written: 53649807
        current VDO IO requests in progress: 31
        current dedupe queries: 0
        data blocks used: 2414920
        dedupe advice stale: 7455
        dedupe advice timeouts: 0
        dedupe advice valid: 9756065
        entries indexed: 63432491
        flush out: 39010980
        instance: 1
        invalid advice PBN count: 0
        journal blocks batching: 0
        journal blocks committed: 176081539
        journal blocks started: 176081539
        journal blocks writing: 0
        journal blocks written: 176081539
        journal commits requested count: 0
        journal disk full count: 0
        journal entries batching: 0
        journal entries committed: 262861901
        journal entries started: 262861901
        journal entries writing: 0
        journal entries written: 262861901
        logical blocks: 367001600
        logical blocks used: 6580537
        maximum VDO IO requests in progress: 2000
        maximum dedupe queries: 1984
        no space error count: 0
        operating mode: normal
        overhead blocks used: 6778607
        physical blocks: 48701440
        posts found: 9780277
        posts not found: 55994283
        queries found: 0
        queries not found: 0
        read cache accesses: 13447159
        read cache data hits: 10471636
        read cache hits: 10887649
        read only error count: 0
        read-only recovery count: 0
        recovery progress (%): N/A
        reference blocks written: 740026
        release version: 131337
        saving percent: 63
        slab count: 5
        slab journal blocked count: 0
        slab journal blocks written: 192096
        slab journal disk full count: 0
        slab journal flush count: 43296
        slab journal tail busy count: 0
        slab summary blocks written: 192096
        slabs opened: 1
        slabs reopened: 15
        updates found: 53197347
        updates not found: 19
        used percent: 18
        version: 26
        write amplification ratio: 2.5
        write policy: sync
  vdo_sdd1:
    Acknowledgement threads: 1
    Activate: enabled
    Bio rotation interval: 64
    Bio submission threads: 4
    Block map cache size: 128M
    Block map period: 16380
    Block size: 4096
    CPU-work threads: 2
    Compression: enabled
    Configured write policy: auto
    Deduplication: enabled
    Device mapper status: not available
    Emulate 512 byte: enabled
    Hash zone threads: 1
    Index checkpoint frequency: 0
    Index memory setting: 0.25
    Index parallel factor: 0
    Index sparse: disabled
    Index status: not available
    Logical size: 5000G
    Logical threads: 1
    Physical size: 50G
    Physical threads: 1
    Read cache: enabled
    Read cache size: 20M
    Slab size: 32G
    Storage device: /dev/sdd1
    VDO statistics: not available

Comment 10 Sandro Bonazzola 2019-01-28 09:39:44 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 12 Gobinda Das 2019-02-27 10:58:28 UTC
Ovirt-4.3.1 already released, so moving to ovirt-4.3.2

Comment 13 Sahina Bose 2019-03-18 07:06:17 UTC
Denis, any update on this patch?

Comment 14 Denis Chaplygin 2019-04-09 11:46:03 UTC
PInged reviewers, will try to get it merged asap

Comment 15 Sahina Bose 2019-04-23 06:51:36 UTC
Any update?

Comment 17 Sahina Bose 2019-06-10 05:21:20 UTC
Tal, need some help with patch reviews. Can you help move this forward?

Comment 18 Tal Nisan 2019-06-16 09:05:27 UTC
(In reply to Sahina Bose from comment #17)
> Tal, need some help with patch reviews. Can you help move this forward?

Discussed offline.

Comment 19 Sahina Bose 2019-06-17 05:09:35 UTC
Can you provide pm_ack?

Comment 20 Daniel Gur 2019-08-28 13:11:21 UTC
sync2jira

Comment 21 Daniel Gur 2019-08-28 13:15:32 UTC
sync2jira

Comment 22 Sahina Bose 2019-11-12 07:00:10 UTC
Kaustav, can you rebase the patch and own this?

Comment 23 Sandro Bonazzola 2020-05-18 14:46:31 UTC
Moved to 4.4.1 not being marked as blocker for 4.4.0 and we are preparing to GA.

Comment 27 SATHEESARAN 2020-07-14 05:48:07 UTC
Tested with RHV Manager Version 4.4.1.8-0.7.el8ev  and vdsm-4.40.22-1.el8ev.x86_64

Ran the soak test for about 4 weeks, with the setup up and running with VDO volumes, no
such errors popped up in the vdsm.log

Comment 32 errata-xmlrpc 2020-08-04 13:26:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3246