1274834 – check_cache 'slowness'

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1274834 - check_cache 'slowness'

Summary: check_cache 'slowness'

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	device-mapper-persistent-data
Sub Component:
Version:	7.2
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Joe Thornber
QA Contact:	Jakub Krysl
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1469559
TreeView+	depends on / blocked

Reported:	2015-10-23 15:25 UTC by Corey Marthaler
Modified:	2021-09-03 12:07 UTC (History)
CC List:	8 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-04-10 13:15:56 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2018:0776	0	None	None	None	2018-04-10 13:16:51 UTC

Description Corey Marthaler 2015-10-23 15:25:54 UTC

Description of problem:

Copying comment #15 from bug 1189051

dmsetup create --table '0 2000 error'  errdev

cache_check  /dev/mapper/errdev

examining superblock
  superblock is corrupt
    incomplete io for block 0, e.res = 18446744073709551611, e.res2 = 0, offset = 0, nbytes = 4096


--- very slooooowly going here  ---

(gdb) bt
#0  0x00007f6cc456c644 in __io_getevents_0_4 (ctx=0x7f6cc4c9d000, min_nr=1, nr=3949, events=0x555ce9562820, timeout=0x0)
    at io_getevents.c:25
#1  0x00007f6cc456c67d in io_getevents_0_4 (ctx=<optimized out>, min_nr=min_nr@entry=1, nr=<optimized out>, 
    events=<optimized out>, timeout=timeout@entry=0x0) at io_getevents.c:54
#2  0x0000555ce7e3a7f0 in bcache::block_cache::wait_io (this=this@entry=0x555ce955e6c8) at block-cache/block_cache.cc:202
#3  0x0000555ce7e3add0 in bcache::block_cache::wait_all (this=<optimized out>) at block-cache/block_cache.cc:261
#4  bcache::block_cache::flush (this=this@entry=0x555ce955e6c8) at block-cache/block_cache.cc:674
#5  0x0000555ce7e3aecc in bcache::block_cache::~block_cache (this=0x555ce955e6c8, __in_chrg=<optimized out>)
    at block-cache/block_cache.cc:491
#6  0x0000555ce7eb1ef3 in persistent_data::block_manager<4096u>::~block_manager (this=0x555ce955e6c0, 
    __in_chrg=<optimized out>) at persistent-data/block.h:42
#7  boost::checked_delete<persistent_data::block_manager<4096u> > (x=0x555ce955e6c0)
    at /usr/include/boost/core/checked_delete.hpp:34
#8  boost::detail::sp_counted_impl_p<persistent_data::block_manager<4096u> >::dispose (this=<optimized out>)
    at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:78
#9  0x0000555ce7e3eaf8 in boost::detail::sp_counted_base::release (this=0x555ce95815d0)
    at /usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:146
#10 boost::detail::shared_count::~shared_count (this=0x7ffc84793ad8, __in_chrg=<optimized out>)
    at /usr/include/boost/smart_ptr/detail/shared_count.hpp:467
#11 boost::shared_ptr<persistent_data::block_manager<4096u> >::~shared_ptr (this=0x7ffc84793ad0, __in_chrg=<optimized out>)
    at /usr/include/boost/smart_ptr/shared_ptr.hpp:330
#12 (anonymous namespace)::metadata_check (fs=<synthetic pointer>, path="/dev/mapper/errdev") at caching/cache_check.cc:224
#13 (anonymous namespace)::check (fs=<synthetic pointer>, path="/dev/mapper/errdev") at caching/cache_check.cc:300
#14 (anonymous namespace)::check_with_exception_handling (fs=<synthetic pointer>, path="/dev/mapper/errdev")
    at caching/cache_check.cc:318
#15 cache_check_main (argc=<optimized out>, argv=<optimized out>) at caching/cache_check.cc:410
#16 0x0000555ce7e364d4 in base::command::run (this=0x555ce815d6c0 <caching::cache_check_cmd>, argv=0x7ffc84794798, argc=2)
    at base/application.h:26
#17 base::application::run (this=0x7ffc84794680, argc=2, argv=0x7ffc84794798) at base/application.cc:32
#18 0x0000555ce7e35431 in main (argc=2, argv=0x7ffc84794798) at main.cc:39


---  at some point it will finish ---

but this clearly relates to the size of  errored device

For 'reproducer'  with 2000 sectors - it's about 40 seconds
with  20000 sectors  it has NOT finished in 10 minutes - so it's not even linear time increase...

So - new bug for cache_check tool to address this 'slowness' should be created.


As for actual testing of this NEEDSCHECKFLAG - I'd probably suggest  to restore 'previous' table content before 'vgchange -an' -  since  cache_check  is  executed on 'deactivation' as well as on 'activation' phase.

So if you want to check 'needs' has been cleared in this test case - you would need to give it back original device - after  cache target itself switched to 'Fail' mode.


Version-Release number of selected component (if applicable):
3.10.0-325.el7.x86_64

lvm2-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-libs-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-cluster-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015
cmirror-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
sanlock-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015

Comment 3 Joe Thornber 2017-09-29 10:28:37 UTC

I just tried to reproduce and it works for me:


ejt@reti:~/work/RedHat/thin-provisioning-tools$ time sudo bin/cache_check /dev/mapper/errdev
examining superblock
  superblock is corrupt
    bad checksum in superblock

real    0m0.041s
user    0m0.008s
sys     0m0.000s

So it's quite possible I've already fixed this one, I've certainly done a fair amount of work on the libaio block cache in the last couple of years.

Could we retest this one with the latest version please?

Comment 4 Jakub Krysl 2017-09-29 13:06:44 UTC

Tested with build device-mapper-persistent-data-0.7.2-1.el7:
# dmsetup create --table '0 20000 error'  errdev                                                                                                                                              
# time cache_check /dev/mapper/errdev                                                                                                                                                         
examining superblock                                
  superblock is corrupt                             
    bad checksum in superblock                      

real    0m0.008s                                    
user    0m0.001s                                    
sys     0m0.001s 

I tried 0.7.0-0.1.rc6.el7 with the same result:
# time cache_check /dev/mapper/errdev 
examining superblock
  superblock is corrupt
    bad checksum in superblock

real    0m0.008s
user    0m0.002s
sys     0m0.004s

However in 0.6.3-1.el7 it is still broken:
# time cache_check /dev/mapper/errdev 
examining superblock
  superblock is corrupt
    incomplete io for block 0, e.res = 18446744073709551611, e.res2 = 0, offset = 0, nbytes = 4096
^C

real    0m42.958s
user    0m0.003s
sys     0m0.000s

So this got fixed somewhere between 0.7.0-0.1.rc6.el7 and 0.6.3-1.el7.

Comment 7 errata-xmlrpc 2018-04-10 13:15:56 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0776

Note You need to log in before you can comment on or make changes to this bug.