Bug 1563794
| Summary: | vdostats displays incorrect information when used with logical volumes | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | David Galloway <dgallowa> |
| Component: | vdo | Assignee: | Bryan Gurney <bgurney> |
| Status: | CLOSED NOTABUG | QA Contact: | Filip Suba <fsuba> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.5 | CC: | awalsh, bgurney, corwin, dgallowa, hansjoerg.maurer, limershe, pasik |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-05-12 19:05:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
David Galloway
2018-04-04 17:39:47 UTC
The "vdostats" command displays a df-style output, and in this output, the "Used" column contains, according to the man page for vdostats(8), "The total number of 1K blocks used on a VDO volume". This is more than just the blocks used by the data itself; it also includes VDO's metadata. If you run "vdostats --verbose vdo_sda", you'll see many more statistics for the VDO volume. Among those statistics will be "data blocks used" ("the number of physical blocks currently in use by a VDO volume to store data") and "overhead blocks used" ("the number of physical blocks currently in use by a VDO volume to store VDO metadata"). The "Used" statistic in "vdostats" corresponds to "(data blocks used + overhead blocks used) * 4096 / 1024". (In "vdostats --verbose", this is also shown as "1K-blocks used".) "vdostats --verbose" also reports a statistic called "logical blocks used"; this tracks the number of logical blocks currently mapped.

Also, don't forget to mount with "mount -o discard", in order to ensure that the OSD will perform discards on the device, which will reclaim space. If you have a chance, can you run the command "vdostats --verbose vdo_sda | grep blocks" to show the block statistics for your VDO volume?

Ah, the 'mount -o discard' may need to be added to ceph-volume then, since that was the tool used to create the OSD. Here's the output requested.
[root@reesi004 ~]# vdostats --verbose vdo_sda | grep blocks
  data blocks used            : 113125669
  overhead blocks used        : 1879291
  logical blocks used         : 133219360
  physical blocks             : 976754646
  logical blocks              : 973838646
  1K-blocks                   : 3907018584
  1K-blocks used              : 460019840
  1K-blocks available         : 3446998744
  compressed blocks written   : 9318395
  journal blocks batching     : 0
  journal blocks started      : 2277538
  journal blocks writing      : 0
  journal blocks written      : 2277538
  journal blocks committed    : 2277538
  slab journal blocks written : 105095
  slab summary blocks written : 104693
  reference blocks written    : 38655

Note that more data was written since I opened the bug.

# ceph osd df | grep 'ID\|^55'
ID CLASS WEIGHT  REWEIGHT SIZE  USE  AVAIL %USE  VAR  PGS
55   hdd 3.65810  1.00000 3745G 542G 3203G 14.48 0.58   0

# vdostats --hu
Device              Size   Used Available Use% Space saving%
/dev/mapper/vdo_sda 3.6T 438.7G      3.2T  11%           15%

I also opened a bug for Ceph to make it aware of VDO OSDs: https://tracker.ceph.com/issues/23554

Bug filed for ceph-volume: https://tracker.ceph.com/issues/23581

So I see that the "logical blocks used" statistic is 133219360 (this is the number of 4096-byte blocks that are used in the logical space of the VDO volume). If I compare that to the "data blocks used" statistic of 113125669, and calculate the savings percentage:

(133219360 - 113125669) / 133219360.0

...the result is "0.1508316...", which corresponds to the "Space saving%" value of "15%".

One question that I have for the "ceph osd df" command: does the "USE" statistic add the used space for the "db" device (in this OSD, /dev/journals/sda) and the block device (/dev/vg_sda/lv_sda)?

The VDO statistic "logical blocks used" tracks the number of logical blocks currently mapped. A block will be "mapped" until it is discarded, which could occur manually on a filesystem via "fstrim", or by the filesystem driver for a filesystem mounted via "mount -o discard".
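The arithmetic above can be checked directly against the quoted counters. A standalone sketch (not part of vdostats itself) that recomputes the df-style "Used" column and the savings percentage:

```python
# Recompute vdostats' df-style numbers from the verbose counters quoted above.
# VDO's block size on this volume is 4096 bytes ("block size : 4096").
BLOCK_SIZE = 4096

data_blocks_used = 113125669      # physical blocks holding user data
overhead_blocks_used = 1879291    # physical blocks holding VDO metadata
logical_blocks_used = 133219360   # logical blocks currently mapped

# "Used" in 1K blocks = (data blocks used + overhead blocks used) * 4096 / 1024
used_1k = (data_blocks_used + overhead_blocks_used) * BLOCK_SIZE // 1024
print(used_1k)  # 460019840, matching the "1K-blocks used" line

# Space saving% = (logical blocks used - data blocks used) / logical blocks used
saving = (logical_blocks_used - data_blocks_used) / logical_blocks_used
print(f"{saving:.4f}")  # 0.1508, i.e. the reported 15%
```

This confirms that "Used" counts VDO metadata as well as data, while the saving percentage is computed from logical vs. physical data blocks only.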
Another question (which I think we should ask Sage): is there a way to configure Bluestore to issue discards to a VDO volume, in order to reclaim space from blocks on the VDO volume that are no longer used?

I did see a pull request in Luminous for adding discard support to bluestore (https://github.com/ceph/ceph/pull/14727), but it was in the context of discarding blocks on solid state drives (and it is disabled by default). Discarding will also be important for a VDO volume, since it's a thin-provisioned device.

Now that I think of it, you would be able to see on your test system if any discards were sent to the VDO volume. If you run "vdostats --verbose", there should be a statistic called "bios in discard"; does it have a number that is greater than zero?

(In reply to Bryan Gurney from comment #6)
> Another question (which I think we should ask Sage): is there a way to
> configure Bluestore to issue discards to a VDO volume, in order to reclaim
> space from blocks on the VDO volume that are no longer used?
>
> I did see a pull request in Luminous for adding discard support to bluestore
> (https://github.com/ceph/ceph/pull/14727), but it was in the context of
> discarding blocks on solid state drives (and it is disabled by default).
> Discarding will also be important for a VDO volume, since it's a
> thin-provisioned device.

I filed a ticket to have the volume/disk preparation tool (ceph-volume) mount VDO devices using '-o discard': https://tracker.ceph.com/issues/23581

(In reply to Bryan Gurney from comment #7)
> Now that I think of it, you would be able to see on your test system if any
> discards were sent to the VDO volume. If you run "vdostats --verbose",
> there should be a statistic called "bios in discard"; does it have a number
> that is greater than zero?
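That check can also be scripted: `vdostats --verbose` prints simple "name : value" lines, so a few lines of parsing are enough to read "bios in discard". This is only a sketch; the helper name is mine, and the sample text stands in for real vdostats output:

```python
def parse_vdostats(text):
    """Parse 'name : value' lines (as printed by `vdostats --verbose`) into a dict."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, _, value = line.partition(":")
        stats[name.strip()] = value.strip()
    return stats

# Sample lines in the same format as the output quoted in this bug.
sample = """\
bios in read : 24592
bios in write : 133386744
bios in discard : 0
"""

stats = parse_vdostats(sample)
discards = int(stats["bios in discard"])
print("discards received" if discards > 0 else "no discards received yet")
```

In practice the text would come from running `vdostats --verbose` on a host with VDO installed (e.g. via the subprocess module).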
[root@reesi004 ~]# vdostats --verbose /dev/mapper/vdo_sda
/dev/mapper/vdo_sda :
  version : 26
  release version : 131337
  data blocks used : 113125669
  overhead blocks used : 1879291
  logical blocks used : 133219360
  physical blocks : 976754646
  logical blocks : 973838646
  1K-blocks : 3907018584
  1K-blocks used : 460019840
  1K-blocks available : 3446998744
  used percent : 11
  saving percent : 15
  block map cache size : 134217728
  write policy : async
  block size : 4096
  completed recovery count : 0
  read-only recovery count : 0
  operating mode : normal
  recovery progress (%) : N/A
  compressed fragments written : 28111173
  compressed blocks written : 9318395
  compressed fragments in packer : 0
  slab count : 1861
  slabs opened : 272
  slabs reopened : 0
  journal disk full count : 0
  journal commits requested count : 2645565
  journal entries batching : 0
  journal entries started : 266606857
  journal entries writing : 0
  journal entries written : 266606857
  journal entries committed : 266606857
  journal blocks batching : 0
  journal blocks started : 2277538
  journal blocks writing : 0
  journal blocks written : 2277538
  journal blocks committed : 2277538
  slab journal disk full count : 0
  slab journal flush count : 12761
  slab journal blocked count : 0
  slab journal blocks written : 105095
  slab journal tail busy count : 0
  slab summary blocks written : 104693
  reference blocks written : 38655
  block map dirty pages : 3160
  block map clean pages : 29608
  block map free pages : 0
  block map failed pages : 0
  block map incoming pages : 0
  block map outgoing pages : 0
  block map cache pressure : 0
  block map read count : 133244416
  block map write count : 133220519
  block map failed reads : 0
  block map failed writes : 0
  block map reclaimed : 0
  block map read outgoing : 0
  block map found in cache : 203948315
  block map discard required : 132653
  block map wait for page : 62351199
  block map fetch required : 32768
  block map pages loaded : 165421
  block map pages saved : 162272
  block map flush count : 161721
  invalid advice PBN count : 0
  no space error count : 0
  read only error count : 0
  instance : 1
  512 byte emulation : off
  current VDO IO requests in progress : 0
  maximum VDO IO requests in progress : 2000
  dedupe advice valid : 762443
  dedupe advice stale : 0
  dedupe advice timeouts : 0
  flush out : 166229
  write amplification ratio : 0.87
  bios in read : 24592
  bios in write : 133386744
  bios in discard : 0
  bios in flush : 166225
  bios in fua : 4
  bios in partial read : 0
  bios in partial write : 0
  bios in partial discard : 0
  bios in partial flush : 0
  bios in partial fua : 0
  bios out read : 767487
  bios out write : 103807477
  bios out discard : 0
  bios out flush : 0
  bios out fua : 0
  bios meta read : 165482
  bios meta write : 12504681
  bios meta discard : 0
  bios meta flush : 640090
  bios meta fua : 1
  bios journal read : 0
  bios journal write : 2443939
  bios journal discard : 0
  bios journal flush : 166401
  bios journal fua : 0
  bios page cache read : 165421
  bios page cache write : 486230
  bios page cache discard : 0
  bios page cache flush : 323958
  bios page cache fua : 0
  bios out completed read : 767487
  bios out completed write : 103807477
  bios out completed discard : 0
  bios out completed flush : 0
  bios out completed fua : 0
  bios meta completed read : 165482
  bios meta completed write : 12173036
  bios meta completed discard : 0
  bios meta completed flush : 308445
  bios meta completed fua : 1
  bios journal completed read : 0
  bios journal completed write : 2277538
  bios journal completed discard : 0
  bios journal completed flush : 0
  bios journal completed fua : 0
  bios page cache completed read : 165421
  bios page cache completed write : 324509
  bios page cache completed discard : 0
  bios page cache completed flush : 162237
  bios page cache completed fua : 0
  bios acknowledged read : 24592
  bios acknowledged write : 133386744
  bios acknowledged discard : 0
  bios acknowledged flush : 166225
  bios acknowledged fua : 4
  bios acknowledged partial read : 3546
  bios acknowledged partial write : 0
  bios acknowledged partial discard : 0
  bios acknowledged partial flush : 0
  bios acknowledged partial fua : 0
  bios in progress read : 0
  bios in progress write : 0
  bios in progress discard : 0
  bios in progress flush : 0
  bios in progress fua : 0
  read cache accesses : 0
  read cache hits : 0
  read cache data hits : 0
  KVDO module bytes used : 1468797760
  KVDO module peak bytes used : 1468800640
  KVDO module bios used : 37286
  KVDO module peak bio count : 37574
  entries indexed : 65180178
  posts found : 761330
  posts not found : 131918538
  queries found : 0
  queries not found : 0
  updates found : 28111255
  updates not found : 0
  current dedupe queries : 0
  maximum dedupe queries : 1037

Hi David,

I can see from the stats that "bios in discard" is 0, confirming that the VDO volume has not received any discards.

Have you tried a test with the "mount -o discard" option?

Hi,

I observe the same behavior with thin LVM on top of VDO. I have mounted the filesystem with discard:

/dev/mapper/VolGroupData-lvdata /data xfs rw,relatime,attr2,discard,inode64,sunit=1024,swidth=2048,noquota 0 0

vdostats
Device           1K-blocks     Used Available Use% Space saving%
/dev/mapper/vdo1 314570752 46212788 268357964  14%           99%

When I write files to it, the occupied space grows:

vdostats --hu
Device            Size  Used Available Use% Space saving%
/dev/mapper/vdo1 300.0G 45.3G    254.7G  15%           78%
...
vdostats --hu
Device            Size  Used Available Use% Space saving%
/dev/mapper/vdo1 300.0G 47.9G    252.1G  15%           64%

When I delete the data, it does not get discarded.
vdostats shows the same output, and vdostats --verbose | grep discard gives:

  block map discard required : 0
  bios in discard : 0
  bios in partial discard : 0
  bios out discard : 0
  bios meta discard : 0
  bios journal discard : 0
  bios page cache discard : 0
  bios out completed discard : 0
  bios meta completed discard : 0
  bios journal completed discard : 0
  bios page cache completed discard : 0
  bios acknowledged discard : 0
  bios acknowledged partial discard : 0
  bios in progress discard : 0

Even after an fstrim -v /data:

/data: 1,8 TiB (1931789565952 bytes) trimmed

nothing changes. The LVs are created with discards=passdown:

lvs -o lv_name,pool_lv,discards,data_percent
  LV          Pool Discards Data%
  lvhomelocal
  lvroot
  lvvar
  lvdata      pool passdown 0,05
  pool             passdown 0,05
  pool_meta0

If I run blkdiscard /dev/mapper/VolGroupData-pool (which takes a long time), the discarded number grows:

vdostats --verbose | grep discard
  block map discard required : 0
  bios in discard : 22750953
  bios in partial discard : 0
  bios out discard : 0
  bios meta discard : 0
  bios journal discard : 0
  bios page cache discard : 0
  bios out completed discard : 0
  bios meta completed discard : 0
  bios journal completed discard : 0
  bios page cache completed discard : 0
  bios acknowledged discard : 22749493
  bios acknowledged partial discard : 0
  bios in progress discard : 1460

Why does the discard not run automatically when deleting files, and why does the space saving stay at 36% after the manual discard?

Regards,
Hansjörg

This may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1600156:

kernel: device-mapper: thin: Data device (dm-5) max discard sectors smaller than a block: Disabling discard passdown.
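The size mismatch in that kernel message can be made concrete: dm-thin wants to pass discards down in units of one pool chunk, while kvdo's default max_discard_sectors is 8, i.e. one 4096-byte VDO block. A sketch of the arithmetic, using the 1,00m pool chunk size and the kvdo default of 8 reported in this thread:

```python
# Sketch: why dm-thin disables discard passdown on top of kvdo.
SECTOR_SIZE = 512        # bytes per 512-byte sector (the block-layer unit)
VDO_BLOCK_SIZE = 4096    # VDO's block size, per the stats in this bug

# Thin-pool chunk size from `lvs -o name,chunksize`: 1,00m = 1 MiB.
chunk_sectors = (1 * 1024 * 1024) // SECTOR_SIZE
print(chunk_sectors)  # 2048, the value echoed into /sys/kvdo/max_discard_sectors

# kvdo's default max_discard_sectors covers exactly one VDO block...
default_max_discard_sectors = 8
print(default_max_discard_sectors * SECTOR_SIZE == VDO_BLOCK_SIZE)  # True

# ...which is smaller than the pool chunk, so dm-thin disables passdown.
print(default_max_discard_sectors < chunk_sectors)  # True
```

Raising max_discard_sectors to the chunk size (2048 here) makes the two limits compatible, which is exactly the workaround described below.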
The problem that thin-LVM discards are not passed down to kvdo can be fixed as follows:

[root@rmc-cs57 ~]# lvs -o name,chunksize
  LV          Chunk
  lvhomelocal     0
  lvroot          0
  lvvar           0
  lvdata          0
  pool        1,00m
  pool_meta0      0

-> The chunk size is 1,00m = 1024k = 2048 * 512 bytes.

An

echo 2048 > /sys/kvdo/max_discard_sectors

and both sizes fit. This has to be done:

- after the kvdo module is loaded
- before the VDO device is started and the LVM is assembled

After that, the "max discard sectors smaller than a block: Disabling discard passdown" message disappears, and removing files in the LV frees space in the VDO volume.

BUT: This is not stable in our case under heavy load. If I copy about half a million files to the LVM (XFS on top of it) and remove them, I get errors like:

Nov 14 16:03:22 rmc-cs57 kernel: INFO: task xfsaild/dm-8:1175 blocked for more than 120 seconds.
Nov 14 16:03:22 rmc-cs57 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 14 16:03:22 rmc-cs57 kernel: xfsaild/dm-8    D ffff8ad5d484eeb0     0  1175      2 0x00000000
Nov 14 16:03:22 rmc-cs57 kernel: Call Trace:
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffff8d718f39>] schedule+0x29/0x70
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffff8d7168a9>] schedule_timeout+0x239/0x2c0
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffff8d060603>] ? x2apic_send_IPI_mask+0x13/0x20
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffff8d0d1d8c>] ? try_to_wake_up+0x18c/0x350
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffff8d7192ed>] wait_for_completion+0xfd/0x140
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffff8d0d2010>] ? wake_up_state+0x20/0x20
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffff8d0b68bd>] flush_work+0xfd/0x190
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffff8d0b36b0>] ? move_linked_works+0x90/0x90
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffffc05ced7a>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffff8d0a697e>] ? try_to_del_timer_sync+0x5e/0x90
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffffc05ccce5>] _xfs_log_force+0x85/0x2c0 [xfs]
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffff8d0a6220>] ? internal_add_timer+0x70/0x70
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffffc05d8fdc>] ? xfsaild+0x16c/0x6f0 [xfs]
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffffc05ccf4c>] xfs_log_force+0x2c/0x70 [xfs]
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffffc05d8e70>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffffc05d8fdc>] xfsaild+0x16c/0x6f0 [xfs]
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffffc05d8e70>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffff8d0bdf21>] kthread+0xd1/0xe0
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffff8d0bde50>] ? insert_kthread_work+0x40/0x40
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffff8d7255dd>] ret_from_fork_nospec_begin+0x7/0x21
Nov 14 16:03:22 rmc-cs57 kernel: [<ffffffff8d0bde50>] ? insert_kthread_work+0x40/0x40

and the discard does not free the VDO volume completely.

Even if I place the XFS directly on top of VDO and mount it with the discard mount option (without thin LVM, and with the default /sys/kvdo/max_discard_sectors of 8), I get XFS errors when removing hundreds of thousands of files:

Nov 15 11:44:33 rmc-cs57 kernel: INFO: task rm:30539 blocked for more than 120 seconds.
Nov 15 11:44:33 rmc-cs57 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 15 11:44:33 rmc-cs57 kernel: rm              D ffff97fc672f8fd0     0 30539   7743 0x00000080
Nov 15 11:44:33 rmc-cs57 kernel: Call Trace:
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffb2d18f39>] schedule+0x29/0x70
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffb2d168a9>] schedule_timeout+0x239/0x2c0
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffb26cea5a>] ? check_preempt_curr+0x8a/0xa0
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffb26cea89>] ? ttwu_do_wakeup+0x19/0xe0
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffb26d1d8c>] ? try_to_wake_up+0x18c/0x350
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffb2d192ed>] wait_for_completion+0xfd/0x140
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffb26d2010>] ? wake_up_state+0x20/0x20
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffb26b68bd>] flush_work+0xfd/0x190
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffb26b36b0>] ? move_linked_works+0x90/0x90
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffc0256d7a>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffb26d7979>] ? select_task_rq_fair+0x549/0x700
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffc0255110>] _xfs_log_force_lsn+0x80/0x340 [xfs]
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffb2955f34>] ? __radix_tree_lookup+0x84/0xf0
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffc024303c>] ? __xfs_iunpin_wait+0x9c/0x150 [xfs]
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffc0255404>] xfs_log_force_lsn+0x34/0x70 [xfs]
Nov 15 11:44:33 rmc-cs57 kernel: [<ffffffffc02462a9>] ? xfs_iunpin_wait+0x19/0x20 [xfs]

If I mount the XFS without the discard option, the 'rm' finishes without errors, and if I do an fstrim afterwards, the VDO volume gets cleaned. The 'rm' with the discard mount option takes about twice as long as an 'rm' without the discard mount option AND a subsequent fstrim.

There seems to be a performance bottleneck in the interaction between the XFS discard mount option and VDO.

Regards,
Hansjörg

Mass migration to Filip.

(In reply to Bryan Gurney from comment #11)
> Hi David,
>
> I can see from the stats that "bios in discard" is 0, confirming that the
> VDO volume has not received any discards.
>
> Have you tried a test with the "mount -o discard" option?

The discard mount option was added to the tool that creates OSDs for Ceph, so as far as I know, this is resolved from my perspective.

https://tracker.ceph.com/issues/23581