| Summary: | fuse mount crashed with OOM kill | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> |
| Component: | core | Assignee: | Vijay Bellur <vbellur> |
| Status: | CLOSED WORKSFORME | QA Contact: | Rahul Hinduja <rhinduja> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.2 | CC: | amukherj, csaba, nchilaka, rhs-bugs, storage-qa-internal |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-02-07 04:26:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description (Nag Pavan Chilakam, 2016-09-26 11:33:58 UTC)
With the same cluster setup, I am rerunning the lookups on a new client. The issue appears to be with the lookups: every minute, the memory consumption of the glusterfs mount process increases by about 0.1% (roughly 8 MB on this 8 GB client).
From this client I created a directory and started lookups on the root of the mount using the command below:
for i in {1..1000};do ls -lRt;stat *;sleep 500;done
Also, note that I am collecting top and free -h output by appending it to a file in the directory I created from this mount. This is the only write I/O being done from *THIS* mount point (there can be some other I/O from other clients).
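A minimal sketch of how this per-minute sampling could be scripted (the pgrep pattern, 60-second interval, and log path are assumptions; in practice the log file lives in the directory created on the mount):

```bash
#!/bin/bash
# Hypothetical sampler: append a timestamped top/free snapshot for the fuse
# mount process once a minute. Assumes a single glusterfs client on the box.
PID=$(pgrep -o -x glusterfs)
LOG=./mem-usage.log   # placeholder; the reporter writes into the mounted directory
while true; do
    {
        date '+%F %T'
        top -b -n 1 -p "$PID" | tail -n 2   # column header plus the glusterfs row
        free -h
    } >> "$LOG"
    sleep 60
done
```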
The following is the memory consumption of the glusterfs mount process, sampled every minute:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14567 root 20 0 720424 64348 3716 S 0.0 0.8 0:00.29 glusterfs
14567 root 20 0 720424 68472 3716 S 0.0 0.9 0:00.32 glusterfs
14567 root 20 0 785960 68676 3732 S 0.0 0.9 0:00.38 glusterfs
14567 root 20 0 785960 68708 3732 S 0.0 0.9 0:00.42 glusterfs
14567 root 20 0 985732 80352 4016 S 18.8 1.0 0:06.80 glusterfs
14567 root 20 0 985732 89856 4016 S 31.2 1.1 0:24.41 glusterfs
14567 root 20 0 985732 101584 4016 S 31.2 1.3 0:42.36 glusterfs
14567 root 20 0 985732 111684 4016 S 0.0 1.4 0:59.91 glusterfs
14567 root 20 0 985732 123144 4016 S 31.2 1.5 1:18.09 glusterfs
14567 root 20 0 985732 135124 4016 S 31.2 1.7 1:36.66 glusterfs
14567 root 20 0 985732 144356 4016 S 31.2 1.8 1:55.16 glusterfs
14567 root 20 0 985732 157132 4016 S 26.7 2.0 2:12.72 glusterfs
14567 root 20 0 985732 168676 4016 S 40.0 2.1 2:30.25 glusterfs
14567 root 20 0 985732 182460 4016 S 26.7 2.3 2:49.70 glusterfs
14567 root 20 0 985732 193696 4016 S 29.4 2.4 3:07.33 glusterfs
14567 root 20 0 985732 208052 4032 S 33.3 2.6 3:24.99 glusterfs
14567 root 20 0 985732 219024 4032 S 31.2 2.7 3:42.46 glusterfs
Another sample:
I took two samples in a span of about 3 minutes.
First sample:
[root@dhcp35-179 logs]# free -h
total used free shared buff/cache available
Mem: 7.6G 357M 6.0G 763M 1.3G 6.2G
Swap: 4.0G 108M 3.9G
[root@dhcp35-179 logs]# top -n 1 -b
top - 18:24:26 up 67 days, 5:06, 1 user, load average: 0.29, 0.23, 0.14
Tasks: 225 total, 1 running, 224 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.7 us, 3.4 sy, 0.0 ni, 93.1 id, 0.0 wa, 0.0 hi, 0.2 si, 0.6 st
KiB Mem : 8017096 total, 6336308 free, 368696 used, 1312092 buff/cache
KiB Swap: 4194300 total, 4082984 free, 111316 used. 6525196 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14567 root 20 0 985732 134596 4016 S 53.3 1.7 1:35.92 glusterfs
1013 root 20 0 4376 528 496 S 20.0 0.0 54:06.94 rngd
15233 root 20 0 157680 2140 1488 R 6.7 0.0 0:00.01 top
1 root 20 0 191868 6656 3524 S 0.0 0.1 16:33.93 systemd
Second Sample:
============
[root@dhcp35-179 logs]# free -h
total used free shared buff/cache available
Mem: 7.6G 401M 6.0G 773M 1.3G 6.2G
Swap: 4.0G 108M 3.9G
[root@dhcp35-179 logs]# top -n 1 -b
top - 18:28:16 up 67 days, 5:10, 1 user, load average: 0.22, 0.26, 0.17
Tasks: 224 total, 1 running, 223 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.7 us, 3.4 sy, 0.0 ni, 93.1 id, 0.0 wa, 0.0 hi, 0.2 si, 0.6 st
KiB Mem : 8017096 total, 6262540 free, 413840 used, 1340716 buff/cache
KiB Swap: 4194300 total, 4082984 free, 111316 used. 6453096 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14567 root 20 0 985732 179488 4016 S 25.0 2.2 2:44.08 glusterfs
1 root 20 0 191868 6656 3524 S 0.0 0.1 16:33.94 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:02.81 kthreadd
I have also taken the process dumps between two intervals; they are available at /var/run (not in /var/run/gluster, just /var/run).
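For reference, a minimal sketch of how such client statedumps are typically triggered (this relies on the stock glusterfs behaviour of writing a statedump on SIGUSR1; the output location can vary, as seen here):

```bash
# Send SIGUSR1 to the fuse client; glusterfs writes a statedump on receipt.
kill -USR1 "$(pgrep -o -x glusterfs)"

# Dumps are normally named glusterdump.<pid>.dump.<timestamp> and land under
# /var/run/gluster, though in this report they appeared directly under /var/run.
ls -l /var/run/glusterdump.* /var/run/gluster/glusterdump.* 2>/dev/null
```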
(In reply to nchilaka from comment #2)
> With the same cluster setup, I am rerunning the lookups on a new client. The
> issue appears to be with the lookups: every minute, the memory consumption of
> the glusterfs mount process increases by about 0.1%.
> From this client I created a directory and started lookups on the root of the
> mount using the command below:

What is the exact setup now? I.e., the particular questions I have:
- This time, are you doing lookups only from the new client, or do the four (five?) clients exhibit simultaneous activity?
- Also, what does the file system tree in the volume look like? Is it just the single dir you created now? If not, and there is already stuff there, what is the significance of creating a new directory?

My other type of question is about reproducibility. Do you see the same issue if you create a similar setup on another set of nodes? And if yes, what if you simplify the cluster, i.e. cut back on the number of replicas / the size of the distribution set (maybe simplifying as far as only distributed / only replicated)?

(In reply to Csaba Henk from comment #3)
> What is the exact setup now? I.e., the particular questions I have:
> - This time, are you doing lookups only from the new client, or do the four
>   (five?) clients exhibit simultaneous activity?
> - Also, what does the file system tree in the volume look like? Is it just
>   the single dir you created now? If not, and there is already stuff there,
>   what is the significance of creating a new directory?

I have a 4x2 volume spanning 4 gluster nodes.
Regarding I/O, I have 4 clients, each mounting the volume from a different cluster node IP:
client1: untar of the Linux kernel
client2: using dd to create 2 million files of 10 KB each
client3: mkdir creation with a lot of depth and breadth
client4: lookups using ls and stat from the root ===> hitting the OOM kill only here

> My other type of question is about reproducibility. Do you see the same
> issue if you create a similar setup on another set of nodes? And if yes,
> what if you simplify the cluster?

Reproducible: I observed this on at least 2 other machines while doing lookups, with the same workload as client4. I have yet to try reproducing it on a smaller/simplified setup, but this setup is meant for systemic-style testing.

The reason I have directory names like that: they are all client hostnames, and I am collecting log information such as memory and CPU usage into those directories. This tackles two things: one is log collection and the other is continuous append I/O.

(In reply to nchilaka from comment #4)
Also, the file structure is as below. I created the directories in the dhcp35-86 folder using:

for i in {1..1000};do for j in {1..1000};do for k in {1..1000};do for l in {1..1000};do for m in {1..1000};do mkdir -p level1.$i;mkdir -p level1.$i/level2.$j;mkdir -p level1.$i/level2.$j/level3.$k;mkdir -p level1.$i/level2.$j/level3.$k/level4.$l;mkdir -p level1.$i/level2.$j/level3.$k/level4.$l;mkdir -p level1.$i/level2.$j/level3.$k/level4.$l/level5.$m;done;done;done;done;done

The current structure is as below and growing:
distrepvol/rootdir1/dhcp35-86.lab.eng.blr.redhat.com/dirs/level1.1/level2.1/level{3.1..3.12 and growing}/level{4.1..4.1000}/level{1..1000}

The other host directories are not of much depth.

Can you please attach the statedumps you have?

(In reply to nchilaka from comment #2)
> Also, note that I am collecting top and free -h output by appending it to a
> file in the directory I created from this mount. This is the only write I/O
> being done from *THIS* mount point (there can be some other I/O from other
> clients)
> [...]

Methodologically, it's a bit weird that you write these stats to the filesystem under investigation, but probably you are right and this does not make much of a difference.

What would be a useful extension: record these stats not at regular times but after each run of the "ls -lRt; stat *" loop. Or, even better, keep making records at regular times but *also* after the runs (indicating that a given record is at a run boundary), with timestamps too. Then we could see the amount of non-freed memory originating from a given run.

> I have also taken the process dumps between two intervals; they are available
> at /var/run (not in /var/run/gluster, just /var/run)

Do you mean statedumps? Would it be possible to attach them to the BZ entry?
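A minimal sketch of the extended recording being suggested here (reusing the placeholder sampler from above; the tag names and log path are assumptions):

```bash
#!/bin/bash
# Hypothetical sketch: keep the periodic sampling, but also write a tagged,
# timestamped record at every run boundary of the lookup loop.
PID=$(pgrep -o -x glusterfs)
LOG=./mem-usage.log   # placeholder path

snapshot() {          # $1 = tag: "periodic" or "run-boundary"
    {
        echo "=== $1 $(date '+%F %T') ==="
        top -b -n 1 -p "$PID" | tail -n 2
        free -h
    } >> "$LOG"
}

# Background sampler: one record per minute.
( while true; do snapshot periodic; sleep 60; done ) &

# The lookup workload, with an extra record after every run.
for i in {1..1000}; do
    ls -lRt > /dev/null
    stat * > /dev/null
    snapshot run-boundary
    sleep 500
done
```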
(In reply to Csaba Henk from comment #7)
> Do you mean statedumps? Would it be possible to attach them to the BZ entry?

I have attached some of the statedumps.
Also, note that the last lookup message displayed was a transport endpoint error:

./dhcp35-86.lab.eng.blr.redhat.com/dirs/level1.1/level2.1/level3.11/level4.267: ls: ./dhcp35-86.lab.eng.blr.redhat.com/dirs/level1.1/level2.1/level3.11/level4.267/level5.275: Transport endpoint is not connected

I tried to cd to the directory but the session hangs.
I have the tree structure of the directory which hosts many directories, but the problem is that it is a 600 MB file.

Created attachment 1206220 [details]
statedump1
Created attachment 1206221 [details]
statedump2
Created attachment 1206222 [details]
statedump3
Created attachment 1206223 [details]
5 statedumps tar image
Csaba,
You can look into the statedumps available in attachment 1206223 [details]; ignore the other attachments.
The tree-path log, even after zipping and tarring, is still 31 MB, which cannot be attached here.
Nag, the RPM http://download.eng.bos.redhat.com/brewroot/work/tasks/434/11900434/glusterfs-fuse-3.8.4-0.el7rhgs.x86_64.rpm delivers a glusterfs fuse client patched to make statedumps with more details about memory usage. So could you please:
- install the RPM
- repeat the test
- meanwhile take statedumps the same way you did before
- attach the statedumps

Created attachment 1216808 [details]
glusterfs-fuse rpm for debugging
The RPM is no longer available at the location indicated in the previous comment, so it is provided here as an attachment.
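For completeness, a hypothetical sketch of the requested debug cycle (the mount point /mnt/distrepvol and the server name server1 are placeholders; only the volume name distrepvol and the RPM file name come from the comments above):

```bash
# Install the patched fuse client from the downloaded RPM.
yum -y install ./glusterfs-fuse-3.8.4-0.el7rhgs.x86_64.rpm

# Remount so the patched client binary is the one actually serving the mount.
umount /mnt/distrepvol
mount -t glusterfs server1:/distrepvol /mnt/distrepvol

# Re-run the lookup workload from the description, taking statedumps as before:
kill -USR1 "$(pgrep -o -x glusterfs)"

# Collect the dumps for attaching to the BZ.
tar czf statedumps.tar.gz /var/run/glusterdump.*
```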
Can we retest the same with the RPMs shared in comment 15 and share the results?

Comment on attachment 1216808 [details] (glusterfs-fuse rpm for debugging): for the record, the change applied on the source is this one: http://review.gluster.org/13736

We have noticed that the bug is not reproduced in the latest version of the product (RHGS-3.3.1+). If the bug is still relevant and is being reproduced, feel free to reopen the bug.