Bug 1490517 - Request to speed up thin_check on large volumes
Summary: Request to speed up thin_check on large volumes
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: device-mapper-persistent-data
Version: 7.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Joe Thornber
QA Contact: Lin Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-09-11 20:43 UTC by John Pittman
Modified: 2021-12-10 15:15 UTC (History)
17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-11 21:41:37 UTC
Target Upstream Version:



Description John Pittman 2017-09-11 20:43:09 UTC
Description of problem:

RFE to request that thin_check be sped up, or allowed to run after activation (if possible).  In the most recent case, it takes about an hour on a 30T volume.

* Is it possible to speed it up to the point that it will not have an impact on boot?  We can use 'global { thin_check_options = ["-q", "--clear-needs-check-flag", "--skip-mappings"] }' to work around the slowness.

* Can it run asynchronously with activation, like an mdadm resync?

* If the workaround is used, is the expectation to manually check later with no workaround?
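
For reference, a minimal sketch of the workaround mentioned above as an lvm.conf fragment (assuming the stock location /etc/lvm/lvm.conf; the option names are the ones quoted in this report):

    # /etc/lvm/lvm.conf
    # Pass faster options to thin_check at pool activation.
    # --skip-mappings skips the block-mapping scan (the slowest part on
    # large pools); --clear-needs-check-flag clears the needs-check flag
    # if the check succeeds.
    global {
        thin_check_options = [ "-q", "--clear-needs-check-flag", "--skip-mappings" ]
    }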
 
Version-Release number of selected component (if applicable):

kernel-3.10.0-693.1.1.el7.x86_64                                                                                                                                          
lvm2-2.02.171-8.el7.x86_64                                                                                                                                                
device-mapper-1.02.140-8.el7.x86_64 

Expected results:

A fast(er) thin_check, or a check that somehow runs asynchronously from boot

Additional info:

If you need any info at all, please let me know; we'll be glad to grab it.

Comment 2 Mike Snitzer 2017-09-12 13:56:03 UTC
Reassigning component to device-mapper-persistent-data (and to Joe).

Please provide the version of the device-mapper-persistent-data package (but I assume whatever ships with 7.4)

Comment 3 Joe Thornber 2017-09-12 14:03:53 UTC
I need to know what version of thin_check you're running:

    thin_check --version

Comment 5 Zdenek Kabelac 2017-09-12 14:49:43 UTC
A few words from lvm2 side:

lvm2 waits for the check command to complete successfully before proceeding, to avoid 'activation' of a thin-pool with bad metadata.

There is not yet any lvm2 code that runs thin_check in 'parallel' with activation.
(It would need some sort of lvm2 support for metadata snapshot usage, which could theoretically cause more serious metadata damage if there were consistency problems.)

lvm2 supports configurable options to be passed to the command (or the whole thin_check invocation can even be skipped completely).  The decision is currently left to the user, who must decide how much trust to put in the kernel, since lvm2 runs with a wide range of kernels.

Recent releases of dmpd (>=0.7) dramatically speed up thin_check by using a faster crc32 algorithm (though thin_restore is still a relatively slow tool, even on a fast CPU, if your metadata size is approaching ~16GB).

For larger volumes I'd probably recommend adding --skip-mappings to the thin_check options in lvm.conf.

We are already considering making this option a default for current thin-pools, as the thin-pool kernel metadata format has matured and is very good at consistency protection.
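
As an illustration of the trade-off described above, a sketch of comparing a full check against a --skip-mappings check on a pool's metadata device (the device path is hypothetical; the pool must be inactive so the metadata is not changing underneath the tool):

    # Full check: validates superblock, device tree, and all block mappings.
    time thin_check /dev/mapper/vg1-pool1_tmeta
    # Faster check: skips the block-mapping scan, the dominant cost on
    # large pools.
    time thin_check --skip-mappings /dev/mapper/vg1-pool1_tmeta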

Comment 6 John Pittman 2017-09-13 13:16:46 UTC
Joe,

Here are the versions.

device-mapper-event-1.02.140-8.el7.x86_64
device-mapper-multipath-libs-0.4.9-111.el7.x86_64
device-mapper-1.02.140-8.el7.x86_64
device-mapper-persistent-data-0.7.0-0.1.rc6.el7.x86_64
device-mapper-event-libs-1.02.140-8.el7.x86_64
device-mapper-multipath-0.4.9-111.el7.x86_64
device-mapper-libs-1.02.140-8.el7.x86_64

# thin_check --version
0.7.0-0.1.rc6.el7

Comment 7 Joe Thornber 2017-09-15 11:00:39 UTC
Are you storing the metadata on a spindle or ssd?

Comment 9 John Pittman 2017-09-27 20:07:07 UTC
Joe, it's stored on the spindle disks.

  [pool1_tdata_rmeta_0]                   vg1 ewi-aor---   4.00m                                                                /dev/sda1(950649)
  [pool1_tdata_rmeta_1]                   vg1 ewi-aor---   4.00m                                                                /dev/sdb1(950649)
  [pool1_tdata_rmeta_2]                   vg1 ewi-aor---   4.00m                                                                /dev/sdd1(950649)
  [pool1_tdata_rmeta_3]                   vg1 ewi-aor---   4.00m                                                                /dev/sdf1(950649)
  [pool1_tdata_rmeta_4]                   vg1 ewi-aor---   4.00m                                                                /dev/sdh1(950649)
  [pool1_tdata_rmeta_5]                   vg1 ewi-aor---   4.00m                                                                /dev/sdj1(2561)
  [pool1_tdata_rmeta_6]                   vg1 ewi-aor---   4.00m                                                                /dev/sdm1(950649)
  [pool1_tdata_rmeta_7]                   vg1 ewi-aor---   4.00m                                                                /dev/sdn1(950649)
  [pool1_tmeta]                           vg1 ewi-aor---  10.00g                                       100.00                   pool1_tmeta_rimage_0(0),pool1_tmeta_rimage_1(0),pool1_tmeta_rimage_2(0),pool1_tmeta_rimage_3(0),pool1_tmeta_rimage_4(0),pool1_tmeta_rimage_5(0),pool1_tmeta_rimage_6(0),pool1_tmeta_rimage_7(0)
  [pool1_tmeta_rmeta_0]                   vg1 ewi-aor---   4.00m                                                                /dev/sda1(2560)
  [pool1_tmeta_rmeta_1]                   vg1 ewi-aor---   4.00m                                                                /dev/sdb1(2560)
  [pool1_tmeta_rmeta_2]                   vg1 ewi-aor---   4.00m                                                                /dev/sdd1(2560)
  [pool1_tmeta_rmeta_3]                   vg1 ewi-aor---   4.00m                                                                /dev/sdf1(2560)
  [pool1_tmeta_rmeta_4]                   vg1 ewi-aor---   4.00m                                                                /dev/sdh1(2560)
  [pool1_tmeta_rmeta_5]                   vg1 ewi-aor---   4.00m                                                                /dev/sdj1(0)
  [pool1_tmeta_rmeta_6]                   vg1 ewi-aor---   4.00m                                                                /dev/sdm1(2560)
  [pool1_tmeta_rmeta_7]                   vg1 ewi-aor---   4.00m                                                                /dev/sdn1(2560)                                         

[    3.095185] scsi 0:0:1:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6   <=== /dev/sda
[    3.120115] scsi 0:0:3:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6   <=== /dev/sdb
[    3.232810] scsi 0:0:4:0: Direct-Access     ATA      WDC WD60EFRX-68L 0A82 PQ: 0 ANSI: 6   <=== /dev/sdc
[    3.370059] scsi 0:0:5:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6   <=== /dev/sdd
[    3.401832] scsi 0:0:6:0: Direct-Access     ATA      WDC WD60EFRX-68L 0A82 PQ: 0 ANSI: 6   <=== /dev/sde
[    3.425039] scsi 0:0:7:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6   <=== /dev/sdf
[    3.441778] scsi 0:0:8:0: Direct-Access     ATA      Samsung SSD 850  2B6Q PQ: 0 ANSI: 6   <=== /dev/sdg
[    3.454998] scsi 0:0:9:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6   <=== /dev/sdh
[    3.472646] scsi 0:0:10:0: Direct-Access     ATA      Samsung SSD 850  2B6Q PQ: 0 ANSI: 6  <=== /dev/sdi
[    3.494036] scsi 0:0:11:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6  <=== /dev/sdj
[    3.509801] scsi 0:0:12:0: Direct-Access     ATA      WDC WD60EFRX-68L 0A82 PQ: 0 ANSI: 6  <=== /dev/sdk
[    3.529512] scsi 0:0:13:0: Direct-Access     ATA      WDC WD6001FFWX-6 0A81 PQ: 0 ANSI: 6  <=== /dev/sdl
[    3.552152] scsi 0:0:15:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6  <=== /dev/sdm
[    3.595170] scsi 0:0:17:0: Direct-Access     HGST     HUH728080AL5200  A7J0 PQ: 0 ANSI: 6  <=== /dev/sdn
[    3.633032] scsi 0:2:0:0: Direct-Access     DELL     PERC H730 Adp    4.24 PQ: 0 ANSI: 5   
[    3.662651] scsi 0:2:1:0: Direct-Access     DELL     PERC H730 Adp    4.24 PQ: 0 ANSI: 5

Comment 17 Zdenek Kabelac 2019-06-21 21:03:41 UTC
Can we obtain the 'metadata' that is perceived as slow to 'check'?

Especially when checked with '--skip-mappings', which we do not normally advise, as the 'safety' obtained by running the full scan typically far outweighs the rather 'small' speedup in overall running time.

However, if the goal is to quickly activate and deactivate large thin-pools, using '--skip-mappings' might be required to achieve the wanted performance.


So can we also obtain some 'performance' numbers seen by the customer, along with the storage layout?
(We normally expect metadata devices to be located on SSD; thin_check running on rotational storage may be further impacted on performance, and the thin-pool itself even more so.)

And also, which exact version of device-mapper-persistent-data was in use?

Comment 18 Jonathan Earl Brassow 2019-06-27 21:55:59 UTC
Firstly, we need to know the following from the customer:
1) what is their configuration
2) what version of the software are they running
3) can they provide us with a copy of the metadata if they are still experiencing slow check times

We have made some improvements in recent rhel releases that we feel should speed things up.  If not, we'd like to know why.  Having the metadata and being able to run the tools ourselves against that set would help us investigate.

At this stage in rhel7, if it is not a minimal change (or an urgently necessary one), we would likely not pursue the problem in rhel7.  There are still options though.  You could move your metadata onto fast storage, or provide us the metadata in the event it is truly something anomalous.
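
For reference, one way to capture a copy of the metadata requested above (the device path is hypothetical, based on the vg1/pool1 names in this report; the pool must be inactive while the copy is taken):

    # Take a raw image of the metadata device:
    dd if=/dev/mapper/vg1-pool1_tmeta of=/tmp/pool1_tmeta.bin bs=1M
    # Or dump it as XML; thin_dump writes to stdout:
    thin_dump /dev/mapper/vg1-pool1_tmeta > /tmp/pool1_tmeta.xml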

Comment 21 Jakub Krysl 2019-10-02 11:41:03 UTC
Mass migration to lilin.

Comment 27 Chris Williams 2020-11-11 21:41:37 UTC
Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7.
From initial triage it does not appear the remaining Bugzillas meet the inclusion criteria for Maintenance Phase 2, and they will now be closed.

From the RHEL life cycle page:
https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes:
https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook  

Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.  

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7

