Bug 1490517 - Request to speed up thin_check on large volumes
Summary: Request to speed up thin_check on large volumes
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: device-mapper-persistent-data
Version: 7.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Joe Thornber
QA Contact: Lin Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-09-11 20:43 UTC by John Pittman
Modified: 2021-12-10 15:15 UTC (History)
17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-11 21:41:37 UTC
Target Upstream Version:



Description John Pittman 2017-09-11 20:43:09 UTC
Description of problem:

RFE to request that thin_check be sped up, or allowed to run after activation (if possible).  In the most recent case, it takes about an hour on a 30T volume.

* Is it possible to speed it up to the point that it will not have an impact on boot?  We can use 'global { thin_check_options = ["-q", "--clear-needs-check-flag", "--skip-mappings"] }' to work around the slowness.

* Can it run asynchronously with activation, like an mdadm resync?

* If the workaround is used, is the expectation to manually check later with no workaround?
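
For reference, a minimal sketch of the workaround mentioned above as an lvm.conf fragment (assuming the stock location /etc/lvm/lvm.conf; the option names are the ones quoted in this report):

    # /etc/lvm/lvm.conf
    # Pass faster options to thin_check at pool activation.
    # --skip-mappings skips the block-mapping scan (the slowest part on
    # large pools); --clear-needs-check-flag clears the needs-check flag
    # if the check succeeds.
    global {
        thin_check_options = [ "-q", "--clear-needs-check-flag", "--skip-mappings" ]
    }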
 
Version-Release number of selected component (if applicable):

kernel-3.10.0-693.1.1.el7.x86_64                                                                                                                                          
lvm2-2.02.171-8.el7.x86_64                                                                                                                                                
device-mapper-1.02.140-8.el7.x86_64 

Expected results:

A fast(er) thin_check, or a check that somehow runs asynchronously from boot

Additional info:

If you need any info at all, please let me know; we'll be glad to grab it.

Comment 2 Mike Snitzer 2017-09-12 13:56:03 UTC
Reassigning component to device-mapper-persistent-data (and to Joe).

Please provide the version of the device-mapper-persistent-data package (but I assume whatever ships with 7.4)

Comment 3 Joe Thornber 2017-09-12 14:03:53 UTC
I need to know what version of thin_check you're running:

    thin_check --version

Comment 5 Zdenek Kabelac 2017-09-12 14:49:43 UTC
A few words from lvm2 side:

lvm2 waits for the check command to complete successfully before proceeding, to avoid 'activation' of a thin-pool with bad metadata.

There is not yet any lvm2 code that runs thin_check in 'parallel' with activation.
(It would need some sort of lvm2 support for metadata snapshot usage, which could theoretically cause more serious metadata damage if there were consistency problems.)

lvm2 supports configurable options to be passed to the command (or the whole thin_check invocation can even be skipped completely).  The decision is currently left to the user, who must decide how much trust to put in the kernel, since lvm2 runs with a wide range of kernels.

Recent releases of dmpd (>=0.7) dramatically speed up thin_check by using a faster crc32 algorithm (though thin_restore is still a relatively slow tool, even on a fast CPU, if your metadata size is approaching ~16GB).

For larger volumes I'd probably recommend adding --skip-mappings to the thin_check options in lvm.conf.

We are already considering making this option a default for current thin-pools, as the thin-pool kernel metadata format has matured and is very good at consistency protection.
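
As an illustration of the trade-off described above, a sketch of comparing a full check against a --skip-mappings check on a pool's metadata device (the device path is hypothetical; the pool must be inactive so the metadata is not changing underneath the tool):

    # Full check: validates superblock, device tree, and all block mappings.
    time thin_check /dev/mapper/vg1-pool1_tmeta
    # Faster check: skips the block-mapping scan, the dominant cost on
    # large pools.
    time thin_check --skip-mappings /dev/mapper/vg1-pool1_tmeta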

Comment 6 John Pittman 2017-09-13 13:16:46 UTC
Joe,

Here are the versions.

device-mapper-event-1.02.140-8.el7.x86_64
device-mapper-multipath-libs-0.4.9-111.el7.x86_64
device-mapper-1.02.140-8.el7.x86_64
device-mapper-persistent-data-0.7.0-0.1.rc6.el7.x86_64
device-mapper-event-libs-1.02.140-8.el7.x86_64
device-mapper-multipath-0.4.9-111.el7.x86_64
device-mapper-libs-1.02.140-8.el7.x86_64

# thin_check --version
0.7.0-0.1.rc6.el7

Comment 7 Joe Thornber 2017-09-15 11:00:39 UTC
Are you storing the metadata on a spindle or ssd?

Comment 9 John Pittman 2017-09-27 20:07:07 UTC
Joe, it's stored on the spindle disks.

  [pool1_tdata_rmeta_0]                   vg1 ewi-aor---   4.00m                                                                /dev/sda1(950649)
  [pool1_tdata_rmeta_1]                   vg1 ewi-aor---   4.00m                                                                /dev/sdb1(950649)
  [pool1_tdata_rmeta_2]                   vg1 ewi-aor---   4.00m                                                                /dev/sdd1(950649)
  [pool1_tdata_rmeta_3]                   vg1 ewi-aor---   4.00m                                                                /dev/sdf1(950649)
  [pool1_tdata_rmeta_4]                   vg1 ewi-aor---   4.00m                                                                /dev/sdh1(950649)
  [pool1_tdata_rmeta_5]                   vg1 ewi-aor---   4.00m                                                                /dev/sdj1(2561)
  [pool1_tdata_rmeta_6]                   vg1 ewi-aor---   4.00m                                                                /dev/sdm1(950649)
  [pool1_tdata_rmeta_7]                   vg1 ewi-aor---   4.00m                                                                /dev/sdn1(950649)
  [pool1_tmeta]                           vg1 ewi-aor---  10.00g                                       100.00                   pool1_tmeta_rimage_0(0),pool1_tmeta_rimage_1(0),pool1_tmeta_rimage_2(0),pool1_tmeta_rimage_3(0),pool1_tmeta_rimage_4(0),pool1_tmeta_rimage_5(0),pool1_tmeta_rimage_6(0),pool1_tmeta_rimage_7(0)
  [pool1_tmeta_rmeta_0]                   vg1 ewi-aor---   4.00m                                                                /dev/sda1(2560)
  [pool1_tmeta_rmeta_1]                   vg1 ewi-aor---   4.00m                                                                /dev/sdb1(2560)
  [pool1_tmeta_rmeta_2]                   vg1 ewi-aor---   4.00m                                                                /dev/sdd1(2560)
  [pool1_tmeta_rmeta_3]                   vg1 ewi-aor---   4.00m                                                                /dev/sdf1(2560)
  [pool1_tmeta_rmeta_4]                   vg1 ewi-aor---   4.00m                                                                /dev/sdh1(2560)
  [pool1_tmeta_rmeta_5]                   vg1 ewi-aor---   4.00m                                                                /dev/sdj1(0)
  [pool1_tmeta_rmeta_6]                   vg1 ewi-aor---   4.00m                                                                /dev/sdm1(2560)
  [pool1_tmeta_rmeta_7]                   vg1 ewi-aor---   4.00m                                                                /dev/sdn1(2560)                                         

[    3.095185] scsi 0:0:1:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6   <=== /dev/sda
[    3.120115] scsi 0:0:3:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6   <=== /dev/sdb
[    3.232810] scsi 0:0:4:0: Direct-Access     ATA      WDC WD60EFRX-68L 0A82 PQ: 0 ANSI: 6   <=== /dev/sdc
[    3.370059] scsi 0:0:5:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6   <=== /dev/sdd
[    3.401832] scsi 0:0:6:0: Direct-Access     ATA      WDC WD60EFRX-68L 0A82 PQ: 0 ANSI: 6   <=== /dev/sde
[    3.425039] scsi 0:0:7:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6   <=== /dev/sdf
[    3.441778] scsi 0:0:8:0: Direct-Access     ATA      Samsung SSD 850  2B6Q PQ: 0 ANSI: 6   <=== /dev/sdg
[    3.454998] scsi 0:0:9:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6   <=== /dev/sdh
[    3.472646] scsi 0:0:10:0: Direct-Access     ATA      Samsung SSD 850  2B6Q PQ: 0 ANSI: 6  <=== /dev/sdi
[    3.494036] scsi 0:0:11:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6  <=== /dev/sdj
[    3.509801] scsi 0:0:12:0: Direct-Access     ATA      WDC WD60EFRX-68L 0A82 PQ: 0 ANSI: 6  <=== /dev/sdk
[    3.529512] scsi 0:0:13:0: Direct-Access     ATA      WDC WD6001FFWX-6 0A81 PQ: 0 ANSI: 6  <=== /dev/sdl
[    3.552152] scsi 0:0:15:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6  <=== /dev/sdm
[    3.595170] scsi 0:0:17:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6  <=== /dev/sdn
[    3.633032] scsi 0:2:0:0: Direct-Access     DELL     PERC H730 Adp    4.24 PQ: 0 ANSI: 5   
[    3.662651] scsi 0:2:1:0: Direct-Access     DELL     PERC H730 Adp    4.24 PQ: 0 ANSI: 5

Comment 17 Zdenek Kabelac 2019-06-21 21:03:41 UTC
Can we obtain the 'metadata' that is perceived as slow to 'check'?

Especially when checked with '--skip-mappings', which we do not normally advise, as the 'safety' obtained by running the full scan typically far outweighs the rather 'small' speedup in overall running time.

However, if the goal is to quickly activate and deactivate large thin-pools, using '--skip-mappings' might be required to achieve the wanted performance.


So can we also obtain some 'performance' numbers seen by the customer, along with the storage layout?
(We normally expect metadata devices to be located on SSD; thin_check running on rotational storage may be further impacted on performance, and the thin-pool itself even more so.)

And also, which exact version of device-mapper-persistent-data was in use?

Comment 18 Jonathan Earl Brassow 2019-06-27 21:55:59 UTC
Firstly, we need to know the following from the customer:
1) what is their configuration
2) what version of the software are they running
3) can they provide us with a copy of the metadata if they are still experiencing slow check times

We have made some improvements in recent rhel releases that we feel should speed things up.  If not, we'd like to know why.  Having the metadata and being able to run the tools ourselves against that set would help us investigate.

At this stage in rhel7, if it is not a minimal change (or an urgently necessary one), we would likely not pursue the problem in rhel7.  There are still options though.  You could move your metadata onto fast storage, or provide us the metadata in the event it is truly something anomalous.
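
For reference, one way to capture a copy of the metadata requested above (the device path is hypothetical, based on the vg1/pool1 names in this report; the pool must be inactive while the copy is taken):

    # Take a raw image of the metadata device:
    dd if=/dev/mapper/vg1-pool1_tmeta of=/tmp/pool1_tmeta.bin bs=1M
    # Or dump it as XML; thin_dump writes to stdout:
    thin_dump /dev/mapper/vg1-pool1_tmeta > /tmp/pool1_tmeta.xml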

Comment 21 Jakub Krysl 2019-10-02 11:41:03 UTC
Mass migration to lilin.

Comment 27 Chris Williams 2020-11-11 21:41:37 UTC
Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7.
From initial triage it does not appear the remaining Bugzillas meet the inclusion criteria for Maintenance Phase 2, and they will now be closed.

From the RHEL life cycle page:
https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes:
https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook  

Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.  

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7

