Bug 990583
Summary: lvremove of thin snapshots takes 5 to 20 minutes (single core cpu bound?)

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 6 |
| Component | kernel |
| Version | 6.4 |
| Reporter | Lars Ellenberg <lars.ellenberg> |
| Assignee | Joe Thornber <thornber> |
| QA Contact | Storage QE <storage-qe> |
| Status | CLOSED WONTFIX |
| Severity | unspecified |
| Priority | unspecified |
| CC | agk, dwysocha, heinzm, jbrassow, lars.ellenberg, msnitzer, prajnoha, prockai, thornber, zkabelac |
| Target Milestone | rc |
| Keywords | Reopened |
| Hardware | Unspecified |
| OS | Unspecified |
| Type | Bug |
| Last Closed | 2017-09-15 10:57:24 UTC |
Description
Lars Ellenberg
2013-07-31 13:38:12 UTC
I guess it would be nice to have the compressed thin-pool kernel metadata and the related lvm2 metadata attached. What size are they compressed? (The size of the thin-pool metadata is not visible in plain `lvs` output; it needs `-a`.) Please attach the output of `lvs -a -o+metadata_percent`.

May I assume all the volumes are active when you run lvremove? (Removal of an inactive thin volume while the thin pool is inactive would trigger two metadata consistency checks: one when the pool is activated and one when it is deactivated.)

Also, please try to attach memory-usage output from `slabtop -o` and `cat /proc/vmstat`. A `perf record -ag` and `perf report` of the long-running lvremove command would also help.

---

Created attachment 783260 [details]
slabtop

Created attachment 783261 [details]
vmstat

Created attachment 783262 [details]
meminfo
---

In the original post: metadata_percent: 24.02. In our usage it seems to vary between 23 and 25%. Right now:

```
pool1           bu4db twi-a-tz  1.95t 71.43 23.42
[pool1_tdata]   bu4db Twi-aot-  1.95t
[pool1_tmeta]   bu4db ewi-aot- 16.00g
```

All volumes are "active" (mappings exist); only one volume is "open" (the "main" volume is mounted). There was no application IO at that time. I don't see any thin_check hanging around, only:

```
10854 root  20  0 23220 4468 3312 R 99.0 0.0 6:46.28 lvremove -f bu4db/db.20130802.030003.raw
```

meaning 99% CPU (sys) for almost 7 minutes now... slabtop, vmstat and meminfo attached just now.

Regarding the metadata dump, we are talking about ~1.6 GB bzip2-compressed (the raw dd'ed tmeta volume). I'll also try to produce some perf output. Due to the communication overhead involved [ :-) ] this may take a few days, though.

---

Created attachment 783547 [details]
perf-report

Grabbed by:

```shell
while ! pid=$(pgrep lvremove); do sleep 1; done; perf record -ag -p "$pid"
```

The full perf.data is available, or different perf report options, if you like. Or maybe it is useful enough as it is?
---

Patches went in in October 2014 to prefetch metadata, which greatly speeds up this situation.

---

Uhm, it was *CPU* limited: 100% actual CPU usage on a single core, NOT IO-limited at all. So how can prefetching metadata IO "greatly speed up" something that is CPU-bound? Just curious...

---

My guess is that the underlying issue is still to do with metadata paging in and out due to the very large thin device. You're using a very fast device for the metadata, so the IO delay will be minimal. But every time a metadata page is read or written, a checksum of it is calculated. Try tweaking the bufio parameters to increase the cache size.
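As a back-of-envelope illustration of why per-block checksumming over a metadata volume of this size can become CPU-bound: the 16 GiB figure comes from the `[pool1_tmeta]` line in the lvs output above, while the 4 KiB block size is an assumption for illustration, not something stated in this report.

```python
# Rough estimate of the checksum workload (hypothetical sketch, not measured).
metadata_bytes = 16 * 2**30   # 16 GiB tmeta volume, from the lvs output above
block_bytes = 4096            # assumed metadata block size

# Each block that is paged in or out gets checksummed, so the number of
# blocks bounds the number of checksum computations in the worst case.
blocks = metadata_bytes // block_bytes
print(blocks)  # → 4194304
```

Even if only a fraction of those blocks cycle through the cache during one lvremove, millions of checksum computations on a single core fit the observed 99% sys CPU profile.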
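The bufio cache can be inspected and enlarged through the dm_bufio module parameters in sysfs. A minimal sketch, assuming the dm_bufio module is loaded and that `max_cache_size_bytes` is the relevant knob on this kernel (parameter names and defaults may differ between kernel versions; the 512 MiB value is an arbitrary example, not a recommendation from this report):

```shell
# Hypothetical example: inspect and raise the dm-bufio cache limit (run as root).
cat /sys/module/dm_bufio/parameters/max_cache_size_bytes
echo $((512 * 1024 * 1024)) > /sys/module/dm_bufio/parameters/max_cache_size_bytes
```

A larger cache keeps more metadata blocks resident, so fewer blocks are re-read and re-checksummed while lvremove walks the btrees.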