Bug 1194446
| Summary: | GFS2: mkfs.gfs2 scalability issue on large devices | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Nate Straz <nstraz> |
| Component: | gfs2-utils | Assignee: | Andrew Price <anprice> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Priority: | unspecified |
| Version: | 7.1 | CC: | cluster-maint, gfs2-maint |
| Target Milestone: | rc | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Fixed In Version: | gfs2-utils-3.1.8-1.el7 | Doc Type: | Bug Fix |
| Last Closed: | 2015-11-19 03:53:50 UTC | Type: | Bug |
| Bug Depends On: | 1184482 | Bug Blocks: | 1111393, 1497636 |
Created attachment 993801 [details]
Patch submitted upstream
With this patch I'm seeing much better performance with a 250T volume, and far lower CPU usage:

```
Before: 13034.77user 41.25system 3:47:21elapsed 95%CPU (0avgtext+0avgdata 416248maxresident)k
        2840inputs+41337136outputs (0major+449613minor)pagefaults 0swaps
After:  7.07user 32.58system 29:16.12elapsed 2%CPU (0avgtext+0avgdata 416308maxresident)k
        3368inputs+41337136outputs (1major+105705minor)pagefaults 0swaps
```
Patch is now upstream and will land in RHEL 7 with the gfs2-utils rebase.

BEFORE, with gfs2-utils-3.1.7-6.el7.x86_64:

```
[root@dash-02 ~]# /usr/bin/time blkdiscard /dev/fsck/large
0.00user 66.44system 5:10.35elapsed 21%CPU (0avgtext+0avgdata 632maxresident)k
40inputs+0outputs (1major+198minor)pagefaults 0swaps
[root@dash-02 ~]# /usr/bin/time mkfs.gfs2 -O -p lock_nolock -j 1 -K /dev/fsck/large
/dev/fsck/large is a symbolic link to /dev/dm-16
This will destroy any data on /dev/dm-16
Device:                    /dev/fsck/large
Block size:                4096
Device size:               256000.00 GB (67108866048 blocks)
Filesystem size:           255999.98 GB (67108860932 blocks)
Journals:                  1
Resource groups:           1023251
Locking protocol:          "lock_nolock"
Lock table:                ""
UUID:                      ec8d8197-816a-fafc-330f-1d2e3543dfb3
20854.75user 21.14system 5:48:04elapsed 99%CPU (0avgtext+0avgdata 416772maxresident)k
2960inputs+41387248outputs (0major+576524minor)pagefaults 0swaps
[root@dash-02 ~]# /usr/bin/time mkfs.gfs2 -O -p lock_nolock -j 1 -K /dev/fsck/large
It appears to contain an existing filesystem (gfs2)
/dev/fsck/large is a symbolic link to /dev/dm-16
This will destroy any data on /dev/dm-16
Device:                    /dev/fsck/large
Block size:                4096
Device size:               256000.00 GB (67108866048 blocks)
Filesystem size:           255999.98 GB (67108860932 blocks)
Journals:                  1
Resource groups:           1023251
Locking protocol:          "lock_nolock"
Lock table:                ""
UUID:                      8e15a54a-dc63-acb5-c92b-1a16059f7218
21262.27user 20.98system 5:54:51elapsed 99%CPU (0avgtext+0avgdata 416812maxresident)k
2952inputs+41387248outputs (0major+633204minor)pagefaults 0swaps
```

AFTER, with gfs2-utils-3.1.8-4.el7.x86_64:

```
[root@dash-03 ~]# blkdiscard /dev/fsck/large
[root@dash-03 ~]# /usr/bin/time mkfs.gfs2 -O -p lock_nolock -j 1 -K /dev/fsck/large
/dev/fsck/large is a symbolic link to /dev/dm-16
This will destroy any data on /dev/dm-16
Device:                    /dev/fsck/large
Block size:                4096
Device size:               256000.00 GB (67108866048 blocks)
Filesystem size:           255999.98 GB (67108860932 blocks)
Journals:                  1
Resource groups:           1023251
Locking protocol:          "lock_nolock"
Lock table:                ""
UUID:                      feb57a73-338d-70bf-eacc-212d200340b1
2.39user 24.40system 3:07.09elapsed 14%CPU (0avgtext+0avgdata 416800maxresident)k
2960inputs+41387248outputs (0major+170243minor)pagefaults 0swaps
[root@dash-03 ~]# /usr/bin/time mkfs.gfs2 -O -p lock_nolock -j 1 -K /dev/fsck/large
It appears to contain an existing filesystem (gfs2)
/dev/fsck/large is a symbolic link to /dev/dm-16
This will destroy any data on /dev/dm-16
Device:                    /dev/fsck/large
Block size:                4096
Device size:               256000.00 GB (67108866048 blocks)
Filesystem size:           255999.98 GB (67108860932 blocks)
Journals:                  1
Resource groups:           1023251
Locking protocol:          "lock_nolock"
Lock table:                ""
UUID:                      9d17c0bc-57b7-dcca-fe5a-667ae8e98ced
2.31user 24.08system 1:13.30elapsed 36%CPU (0avgtext+0avgdata 416816maxresident)k
2952inputs+41387248outputs (0major+193953minor)pagefaults 0swaps
[root@dash-03 ~]# /usr/bin/time mkfs.gfs2 -O -p lock_nolock -j 1 -K /dev/fsck/large
It appears to contain an existing filesystem (gfs2)
/dev/fsck/large is a symbolic link to /dev/dm-16
This will destroy any data on /dev/dm-16
Device:                    /dev/fsck/large
Block size:                4096
Device size:               256000.00 GB (67108866048 blocks)
Filesystem size:           255999.98 GB (67108860932 blocks)
Journals:                  1
Resource groups:           1023251
Locking protocol:          "lock_nolock"
Lock table:                ""
UUID:                      9f8bb1f9-7a75-13ab-8df9-dadfefa59650
2.47user 21.97system 1:06.49elapsed 36%CPU (0avgtext+0avgdata 416808maxresident)k
2952inputs+41387248outputs (0major+204423minor)pagefaults 0swaps
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2178.html
Description of problem:

A mkfs.gfs2 on a 250TB device took 7 hours last night and used a lot of CPU time.

```
# /usr/bin/time mkfs -t gfs2 -p lock_nolock -O /dev/XL/gfs2
24723.05user 59.37system 7:08:46elapsed 96%CPU (0avgtext+0avgdata 417080maxresident)k
```

I collected perf data on file systems from 100GB to 100TB and found that lgfs2_rgrps_append quickly dominates all other functions.

```
==> perf-100G.txt <==
# Overhead  Command    Symbol
# ........  .........  ............................
     3.15%  mkfs.gfs2  [.] gfs2_disk_hash
     0.30%  mkfs.gfs2  [.] gfs2_meta_header_out_bh

==> perf-1T.txt <==
# Overhead  Command    Symbol
# ........  .........  ............................
    17.48%  mkfs.gfs2  [.] lgfs2_rgrps_append
     1.29%  mkfs.gfs2  [.] gfs2_disk_hash
     0.12%  mkfs.gfs2  [.] lgfs2_rgrp_write

==> perf-10T.txt <==
# Overhead  Command    Symbol
# ........  .........  ............................
    84.46%  mkfs.gfs2  [.] lgfs2_rgrps_append
     0.03%  mkfs.gfs2  [.] lgfs2_rgrp_bitbuf_alloc

==> perf-100T.txt <==
# Overhead  Command    Symbol
# ........  .........  ............................
    98.85%  mkfs.gfs2  [.] lgfs2_rgrps_append
     0.00%  mkfs.gfs2  [.] __errno_location@plt
```

perf annotate:

```
Sorted summary for file /usr/sbin/mkfs.gfs2
----------------------------------------------
   97.15 /usr/src/debug/gfs2-utils-3.1.7/gfs2/libgfs2/../../gfs2/include/osi_tree.h:320
    2.81 /usr/src/debug/gfs2-utils-3.1.7/gfs2/libgfs2/../../gfs2/include/osi_tree.h:321

313 static inline struct osi_node *osi_last(struct osi_root *root)
...
320         while (n->osi_right)
321                 n = n->osi_right;
322         return n;
```

It looks like the tree being used isn't balanced and has probably degenerated into a list, which is traversed every time an RG is added. That ends up being a list 400k entries deep on a 100TB file system with 256MB RGs.

Version-Release number of selected component (if applicable):
gfs2-utils-3.1.7-6.el7.x86_64

How reproducible:
Easily

Steps to Reproduce:
1. perf record mkfs.gfs2 -O -p lock_nolock -j 1 /dev/foo
2. perf report --stdio -d mkfs.gfs2 | head -n 30