Bug 1490517
| Summary: | Request to speed up thin_check on large volumes | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | John Pittman <jpittman> |
| Component: | device-mapper-persistent-data | Assignee: | Joe Thornber <thornber> |
| Status: | CLOSED WONTFIX | QA Contact: | Lin Li <lilin> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 7.4 | CC: | agk, akarvi, awang, heinzm, jbrassow, loberman, lvm-team, msnitzer, nweddle, prajnoha, pvlasin, revers, rhandlin, tcleveng, thornber, tonay, zkabelac |
| Target Milestone: | rc | Keywords: | FutureFeature |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-11-11 21:41:37 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Reassigning component to device-mapper-persistent-data (and to Joe). Please provide the version of the device-mapper-persistent-data package (I assume it is whatever ships with 7.4).

I need to know what version of thin_check you're running:
thin_check --version
A few words from the lvm2 side: lvm2 waits for a successful result from the command before activating, to avoid activation of a thin-pool with bad metadata. There is not yet any lvm2 code doing 'parallel' thin_check execution (that would need some sort of lvm2 support for metadata snapshot usage, which might theoretically cause more serious damage to the metadata if there were consistency problems). lvm2 supports configurable options to be passed to the command (or the whole thin_check command can be skipped completely); the decision is typically left to the user at the moment, who must decide how much trust to put in the kernel, since lvm2 runs with a wide range of kernels.

Recent editions of dmpd (>= 0.7) dramatically speed up thin_check by using a faster crc32 algorithm (though thin_restore is still a relatively slow tool even on a fast CPU if your metadata size approaches ~16 GB). For larger volumes I'd probably recommend adding --skip-mappings in lvm.conf. We are already thinking of making this option the default for current thin-pools, as the thin-pool kernel metadata format has matured and is very good at consistency protection.

Joe, here are the versions:

```
device-mapper-event-1.02.140-8.el7.x86_64
device-mapper-multipath-libs-0.4.9-111.el7.x86_64
device-mapper-1.02.140-8.el7.x86_64
device-mapper-persistent-data-0.7.0-0.1.rc6.el7.x86_64
device-mapper-event-libs-1.02.140-8.el7.x86_64
device-mapper-multipath-0.4.9-111.el7.x86_64
device-mapper-libs-1.02.140-8.el7.x86_64

# thin_check --version
0.7.0-0.1.rc6.el7
```

Are you storing the metadata on a spindle or an SSD?

Joe, it's stored on the spindle disks.
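The configurable options mentioned above are passed to thin_check via `thin_check_options` in lvm.conf; the workaround quoted later in this report uses exactly this mechanism. A sketch of the relevant fragment (the option list is illustrative and trades thoroughness for speed, so tune it to your own risk tolerance):

```
# /etc/lvm/lvm.conf (fragment, sketch only)
global {
    # Options lvm2 passes to thin_check before activating a thin-pool.
    # --skip-mappings checks only the top-level metadata structures,
    # which is much faster on large pools but less thorough;
    # --clear-needs-check-flag clears the needs_check flag on success.
    thin_check_options = [ "-q", "--clear-needs-check-flag", "--skip-mappings" ]
}
```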
```
[pool1_tdata_rmeta_0] vg1 ewi-aor--- 4.00m /dev/sda1(950649)
[pool1_tdata_rmeta_1] vg1 ewi-aor--- 4.00m /dev/sdb1(950649)
[pool1_tdata_rmeta_2] vg1 ewi-aor--- 4.00m /dev/sdd1(950649)
[pool1_tdata_rmeta_3] vg1 ewi-aor--- 4.00m /dev/sdf1(950649)
[pool1_tdata_rmeta_4] vg1 ewi-aor--- 4.00m /dev/sdh1(950649)
[pool1_tdata_rmeta_5] vg1 ewi-aor--- 4.00m /dev/sdj1(2561)
[pool1_tdata_rmeta_6] vg1 ewi-aor--- 4.00m /dev/sdm1(950649)
[pool1_tdata_rmeta_7] vg1 ewi-aor--- 4.00m /dev/sdn1(950649)
[pool1_tmeta] vg1 ewi-aor--- 10.00g 100.00 pool1_tmeta_rimage_0(0),pool1_tmeta_rimage_1(0),pool1_tmeta_rimage_2(0),pool1_tmeta_rimage_3(0),pool1_tmeta_rimage_4(0),pool1_tmeta_rimage_5(0),pool1_tmeta_rimage_6(0),pool1_tmeta_rimage_7(0)
[pool1_tmeta_rmeta_0] vg1 ewi-aor--- 4.00m /dev/sda1(2560)
[pool1_tmeta_rmeta_1] vg1 ewi-aor--- 4.00m /dev/sdb1(2560)
[pool1_tmeta_rmeta_2] vg1 ewi-aor--- 4.00m /dev/sdd1(2560)
[pool1_tmeta_rmeta_3] vg1 ewi-aor--- 4.00m /dev/sdf1(2560)
[pool1_tmeta_rmeta_4] vg1 ewi-aor--- 4.00m /dev/sdh1(2560)
[pool1_tmeta_rmeta_5] vg1 ewi-aor--- 4.00m /dev/sdj1(0)
[pool1_tmeta_rmeta_6] vg1 ewi-aor--- 4.00m /dev/sdm1(2560)
[pool1_tmeta_rmeta_7] vg1 ewi-aor--- 4.00m /dev/sdn1(2560)

[ 3.095185] scsi 0:0:1:0: Direct-Access HGST HUH728080AL5200 A7J0 PQ: 0 ANSI: 6 <=== /dev/sda
[ 3.120115] scsi 0:0:3:0: Direct-Access HGST HUH728080AL5200 A7J0 PQ: 0 ANSI: 6 <=== /dev/sdb
[ 3.232810] scsi 0:0:4:0: Direct-Access ATA WDC WD60EFRX-68L 0A82 PQ: 0 ANSI: 6 <=== /dev/sdc
[ 3.370059] scsi 0:0:5:0: Direct-Access HGST HUH728080AL5200 A7J0 PQ: 0 ANSI: 6 <=== /dev/sdd
[ 3.401832] scsi 0:0:6:0: Direct-Access ATA WDC WD60EFRX-68L 0A82 PQ: 0 ANSI: 6 <=== /dev/sde
[ 3.425039] scsi 0:0:7:0: Direct-Access HGST HUH728080AL5200 A7J0 PQ: 0 ANSI: 6 <=== /dev/sdf
[ 3.441778] scsi 0:0:8:0: Direct-Access ATA Samsung SSD 850 2B6Q PQ: 0 ANSI: 6 <=== /dev/sdg
[ 3.454998] scsi 0:0:9:0: Direct-Access HGST HUH728080AL5200 A7J0 PQ: 0 ANSI: 6 <=== /dev/sdh
[ 3.472646] scsi 0:0:10:0: Direct-Access ATA Samsung SSD 850 2B6Q PQ: 0 ANSI: 6 <=== /dev/sdi
[ 3.494036] scsi 0:0:11:0: Direct-Access HGST HUH728080AL5200 A7J0 PQ: 0 ANSI: 6 <=== /dev/sdj
[ 3.509801] scsi 0:0:12:0: Direct-Access ATA WDC WD60EFRX-68L 0A82 PQ: 0 ANSI: 6 <=== /dev/sdk
[ 3.529512] scsi 0:0:13:0: Direct-Access ATA WDC WD6001FFWX-6 0A81 PQ: 0 ANSI: 6 <=== /dev/sdl
[ 3.552152] scsi 0:0:15:0: Direct-Access HGST HUH728080AL5200 A7J0 PQ: 0 ANSI: 6 <=== /dev/sdm
[ 3.595170] scsi 0:0:17:0: Direct-Access HGST HUH728080AL5200 A7J0 PQ: 0 ANSI: 6 <=== /dev/sdn
[ 3.633032] scsi 0:2:0:0: Direct-Access DELL PERC H730 Adp 4.24 PQ: 0 ANSI: 5
[ 3.662651] scsi 0:2:1:0: Direct-Access DELL PERC H730 Adp 4.24 PQ: 0 ANSI: 5
```

Can we obtain the metadata that is perceived as slow to check? Especially as checked with '--skip-mappings', which we do not normally advise, since the safety gained by running the full scan typically far outweighs the rather small speedup relative to the overall running time. However, if the goal is to quickly activate and deactivate large thin-pools, using '--skip-mappings' might be required to achieve the wanted performance.

Can we also obtain some performance numbers seen by the customer, and the storage layout? (We normally expect metadata devices to be located on SSD; thin_check running on rotational storage is further impacted in performance, and the thin-pool itself even more so.) And also, which exact version of device-mapper-persistent-data was in use?

Firstly, we need to know the following from the customer:
1) What is their configuration?
2) What version of the software are they running?
3) Can they provide us with a copy of the metadata if they are still experiencing slow check times?

We have made some improvements in recent RHEL releases that we feel should speed things up. If not, we'd like to know why. Having the metadata and being able to run the tools ourselves against that set would help us investigate.
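The request above for a copy of the metadata can usually be satisfied by copying the pool's _tmeta device while the pool is inactive. The sketch below uses the names from this report (vg1/pool1); `vg1/spare_meta` is a hypothetical spare LV of the same size as the metadata volume, and on RHEL 7 the hidden _tmeta subvolume generally has to be swapped out with lvconvert before it is directly accessible, so treat this as an outline rather than an exact recipe:

```
# Sketch only: names are from this report; verify on your own system.
# Deactivate the pool so the metadata is quiescent.
lvchange -an vg1/pool1

# Swap the pool metadata out into a visible LV (vg1/spare_meta is a
# hypothetical, pre-created LV the same size as pool1_tmeta). After the
# swap, spare_meta holds the pool's former metadata.
lvconvert --thinpool vg1/pool1 --poolmetadata vg1/spare_meta
lvchange -ay vg1/spare_meta

# Copy the raw metadata for the developers:
dd if=/dev/vg1/spare_meta of=/tmp/pool1_tmeta.img bs=1M

# Optionally produce a portable XML dump (thin_dump writes to stdout):
thin_dump /tmp/pool1_tmeta.img > /tmp/pool1_metadata.xml
```

Swap the original metadata back with another `lvconvert --thinpool ... --poolmetadata ...` invocation before reactivating the pool.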
At this stage in RHEL 7, unless it is a minimal change (or an urgently necessary one), we would likely not pursue the problem in RHEL 7. There are still options though: you could move your metadata onto fast storage, or provide us the metadata in the event it is truly something anomalous.

Mass migration to lilin.

Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7. From initial triage it does not appear the remaining Bugzillas meet the inclusion criteria for Maintenance Phase 2, and they will now be closed. From the RHEL life cycle page: https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase

"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes: https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook

Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7
Description of problem:
RFE to request that thin_check be sped up or be allowed to run after activation (if possible). In the most recent case, it takes about an hour on a 30T volume.

* Is it possible to speed it up to the point that it will not have an impact on boot? We can use 'global { thin_check_options = ["-q", "--clear-needs-check-flag", "--skip-mappings"] }' to work around the slowness.
* Can it run asynchronously to activation, like an mdadm resync?
* If the workaround is used, is the expectation to manually run a full check later, without the workaround?

Version-Release number of selected component (if applicable):
kernel-3.10.0-693.1.1.el7.x86_64
lvm2-2.02.171-8.el7.x86_64
device-mapper-1.02.140-8.el7.x86_64

Expected results:
A faster thin_check, or one run asynchronously to boot somehow.

Additional info:
If you need any info at all, please let us know; we'll be glad to grab it.
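On the reporter's last question: if the --skip-mappings workaround is applied at boot, a full check can be run manually later during a maintenance window while the pool is deactivated. A sketch using the names from this report; the /dev/mapper path is hypothetical, and on RHEL 7 the hidden _tmeta subvolume may not be exposed while the pool is inactive, in which case the metadata would first need to be swapped out to a visible LV with lvconvert:

```
# Deactivate the pool so thin_check sees stable metadata (names from this report):
lvchange -an vg1/pool1

# Full check (no --skip-mappings). --clear-needs-check-flag clears the
# needs_check flag on success so the next activation is not blocked.
# The device path below is a hypothetical example.
thin_check --clear-needs-check-flag /dev/mapper/vg1-pool1_tmeta

# Reactivate the pool:
lvchange -ay vg1/pool1
```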