Bug 1401380
Summary: | [Compound FOPs]: Memory leaks while doing deep directory creation | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> |
Component: | replicate | Assignee: | Krutika Dhananjay <kdhananj> |
Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | rhgs-3.2 | CC: | amukherj, mchangir, pkarampu, ravishankar, rcyriac, rhs-bugs, storage-qa-internal |
Target Milestone: | --- | ||
Target Release: | RHGS 3.2.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.8.4-10 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-03-23 05:54:07 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1401218, 1402212 | ||
Bug Blocks: | 1351528 |
Description (Nag Pavan Chilakam, 2016-12-05 05:18:19 UTC)
I found this while verifying bug 1393709 ([Compound FOPs] Client side IObuff leaks at a high pace consume complete client memory and hence make the gluster volume inaccessible). That bug and this one appear to have different root causes, hence raising a new bug. Krutika has an upstream bug for the same issue: https://bugzilla.redhat.com/show_bug.cgi?id=1401218 - Fix compound fops memory leaks

Giving qa_ack as this is a must-fix for compound FOPs to go in as a feature.

upstream mainline: http://review.gluster.org/16020
downstream: https://code.engineering.redhat.com/gerrit/#/c/92061/

sosreports:
[qe@rhsqe-repo clients]$ pwd
/var/www/html/sosreports/nchilaka/bug.1401380/clients

Moving it to POST as I will use this BZ to backport http://review.gluster.org/#/c/16029/

ON_QA validation:
While validating, I still found memory consumption to be high; will have to fail the fix. Below are the findings:

Clients:

client1 (rhs-client45), without CFOPs:
Mon Dec 19 17:01:35 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
10306 root      20   0  778260  55824   4024 S 43.8  0.3  18:16.37 gluster+
Mon Dec 19 17:17:10 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
10306 root      20   0  778260  76924   4036 S 43.8  0.5  24:17.27 gluster+
===> consumed about 20 MB in 15 min

client1 (rhs-client45), with CFOPs:
Mon Dec 19 17:24:14 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 6669 root      20   0  775092  53316   3908 S 37.5  0.3   0:05.77 gluster+
=========> started dir create
Mon Dec 19 17:24:44 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 6669 root      20   0  775092  71252   3908 S 31.2  0.4   0:17.29 gluster+
Mon Dec 19 17:25:14 IST 2016
Mon Dec 19 17:41:20 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 6669 root      20   0 1299380 634920   3924 S 43.8  3.9   6:33.27 gluster+
====> stopped creations
============> increased by 550 MB in about 15 min

####################################################################################

client2 (rhs-client24), without CFOPs:
========> started dir create from client45 and client24
Mon Dec 19 17:04:29 IST 2016
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
20323 root      20   0  696m  44m 3972 S 45.8  0.3  0:12.85 glusterfs
Mon Dec 19 17:20:16 IST 2016
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
20323 root      20   0  761m  61m 3984 S 45.4  0.4  7:30.26 glusterfs
===> consumed about 15 MB in 15 min

client2 (rhs-client24), with CFOPs:
=========> started dir create
Mon Dec 19 17:27:32 IST 2016
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 3599 root      20   0  696m  53m 3992 S 48.4  0.3  0:14.33 glusterfs
Mon Dec 19 17:28:02 IST 2016
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 3599 root      20   0  696m  65m 3992 S 46.3  0.4  0:28.36 glusterfs
====> stopped creations
Mon Dec 19 17:44:51 IST 2016
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 3599 root      20   0 1145m 442m 3992 S  0.0  2.8  7:59.31 glusterfs
====> consumed about 440 MB in 16 min

####################################################################################

Server side:

brick1, without CFOPs:
Mon Dec 19 17:08:18 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
24524 root      20   0 1382176  37840   3132 S 50.0  0.5   2:17.61 gluster+
Mon Dec 19 17:23:23 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
24524 root      20   0 1382176  62904   3160 S 50.0  0.8  10:10.55 gluster+
===> consumed about 25 MB in 15 min

brick1, with CFOPs:
Mon Dec 19 17:29:54 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+
25058 root      20   0 1250076  39396   4312 S 37.5  0.5   0:07.09
=========> started dir create
Mon Dec 19 17:30:25 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+
25058 root      20   0 1250076  43620   4312 S 47.1  0.5   0:22.52
Mon Dec 19 17:30:55 IST 2016
Mon Dec 19 17:47:00 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+
25058 root      20   0 1513248 191188   4328 S 56.2  2.4   8:40.34
===> consumed about 150 MB in 16 min

################################################################################

brick2, without CFOPs:
Mon Dec 19 17:08:29 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
23156 root      20   0 1385296  51752   3124 S 50.0  0.6  12:41.95 gluster+
Mon Dec 19 17:23:34 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
23156 root      20   0 1385296  75764   3128 S  0.0  0.9  19:54.83 gluster+
===> consumed about 15 MB in 15 min

brick2, with CFOPs:
Mon Dec 19 17:29:48 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
24747 root      20   0 1250080  41108   4320 S 43.8  0.5   0:03.97 glusterfsd
=========> started dir create
Mon Dec 19 17:46:54 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
24747 root      20   0 1447716 193404   4336 S 50.0  2.4   8:37.14 glusterfsd
===> consumed about 140 MB

As discussed offline, I will be moving this to FailedQA.
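The per-interval figures above were read off top by hand. As a minimal sketch, RES samples for a given glusterfs PID could also be collected automatically at a fixed interval with a small helper like the one below (hypothetical script, not the tooling used in this run):

    #!/bin/bash
    # rss-sample.sh <pid> [interval-seconds]  -- hypothetical helper
    # Logs a timestamp plus the resident set size (KiB) of the target process
    # at a fixed interval, so growth over a window can be diffed afterwards.
    PID=$1
    INTERVAL=${2:-30}
    while kill -0 "$PID" 2>/dev/null; do
        echo "$(date '+%a %b %d %T %Z %Y') RSS_KB=$(ps -o rss= -p "$PID" | tr -d ' ')"
        sleep "$INTERVAL"
    done

For example, running ./rss-sample.sh 10306 30 >> client_rss.log on rhs-client45 would produce a log whose first and last entries can be subtracted to get the growth over the run.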
upstream mainline patch http://review.gluster.org/#/c/16210/ posted for review.
downstream patch: https://code.engineering.redhat.com/gerrit/#/c/93488/

BZ added to erratum https://errata.devel.redhat.com/advisory/24866. Moving to ON_QA.

ON_QA validation:
==================
Reran the same test case, i.e. simultaneous deep directory creations from two different clients (one RHEL 6.7 and the other RHEL 7.3, the same clients used when the BZ was raised), on a 1x2 volume with CFOPs enabled as below:

[root@dhcp35-37 ~]# gluster v info cfp

Volume Name: cfp
Type: Replicate
Volume ID: d84aaa47-62d3-485b-92ff-83f3b5cfc46b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick1/cfp
Brick2: 10.70.35.116:/rhs/brick1/cfp
Options Reconfigured:
cluster.use-compound-fops: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

[root@dhcp35-37 ~]# gluster v status cfp
Status of volume: cfp
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.37:/rhs/brick1/cfp           49155     0          Y       7255
Brick 10.70.35.116:/rhs/brick1/cfp          49155     0          Y       3352
Self-heal Daemon on localhost               N/A       N/A        Y       7275
Self-heal Daemon on dhcp35-116.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       3428
Self-heal Daemon on dhcp35-239.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       8537
Self-heal Daemon on dhcp35-196.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       13185
Self-heal Daemon on dhcp35-135.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       14433
Self-heal Daemon on dhcp35-8.lab.eng.blr.re
dhat.com                                    N/A       N/A        Y       14371

Task Status of Volume cfp
------------------------------------------------------------------------------
There are no active volume tasks
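For reference, here is a minimal sketch of how compound FOPs can be toggled on such a volume and how a deep-directory-creation workload of this kind might be driven from a FUSE mount on each client. The mount point, directory depth, and iteration count are illustrative assumptions, not the exact reproducer used in this test:

    # Toggle compound FOPs on the volume (the option shown in the vol info above).
    gluster volume set cfp cluster.use-compound-fops on
    # gluster volume set cfp cluster.use-compound-fops off   # for the comparison run

    # On each client: mount the volume and create deep directory chains.
    mount -t glusterfs 10.70.35.37:/cfp /mnt/cfp
    for i in $(seq 1 1000); do
        # Each iteration creates one deep chain, e.g. /mnt/cfp/<hostname>/run3/d1/d2/.../d50
        mkdir -p "/mnt/cfp/$(hostname)/run${i}/$(seq -s/ -f 'd%g' 1 50)"
    done

Running the same loop simultaneously from both clients, once with the option on and once with it off, would correspond to the with/without CFOPs comparison captured in the top samples.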
Over a span of about 15 min, the memory consumed was as below.

client45 (RHEL 7.3):
Mon Dec 26 20:00:12 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
30215 root      20   0  775108 151512   3924 S 37.5  0.9  24:27.19 glusterfs
##############
Mon Dec 26 20:15:17 IST 2016
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
30215 root      20   0  775108 181924   3924 S 33.3  1.1  30:06.44 glusterfs
So the memory consumed was about 30 MB.

client24 (RHEL 6.7):
##############
Mon Dec 26 20:00:14 IST 2016
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
15038 root      20   0  761m  88m 4000 S 43.5  0.6  27:32.00 glusterfs
##############
Mon Dec 26 20:15:31 IST 2016
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
15038 root      20   0  761m 103m 4000 S 51.6  0.6  34:31.00 glusterfs
So the memory consumed was about 15 MB.

These numbers are almost equal to what the clients consume without CFOPs enabled. Hence marking as fixed.

Test version:
glusterfs-libs-3.8.4-10.el7.x86_64
glusterfs-devel-3.8.4-10.el7.x86_64
glusterfs-api-devel-3.8.4-10.el7.x86_64
glusterfs-cli-3.8.4-10.el7.x86_64
glusterfs-3.8.4-10.el7.x86_64
glusterfs-api-3.8.4-10.el7.x86_64
glusterfs-fuse-3.8.4-10.el7.x86_64
glusterfs-client-xlators-3.8.4-10.el7.x86_64
glusterfs-rdma-3.8.4-10.el7.x86_64

The brick process memory consumption too was normal.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html