Bug 1369349
Summary: | enable trash, then truncate a large file leads to glusterfsd segfault | | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | jiademing.dd <iesool> |
Component: | trash-xlator | Assignee: | Jiffin <jthottan> |
Status: | CLOSED DEFERRED | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | mainline | CC: | anoopcs, atumball, bugs, jbyers, jthottan, wzmvincent |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-05-09 20:07:05 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
jiademing.dd
2016-08-23 07:57:00 UTC

When truncating a large file, trash reads the source and writes the destination until the end of the file. However, the whole process is a chain of recursive calls inside trash.so, so if the file is too large, the many recursive calls lead to a stack overflow. Can we use syncop instead of STACK_WIND to avoid the recursive calls? Right now, truncating a large file also makes trash take a lot of time (trash-max-filesize=1GB).

Comment 3
Jiffin

We cannot use the syncop infra here; it is usually used in the GlusterFS client code path. Thanks for pointing out this issue. I will try to reproduce it and let you know.

Comment 4
WuVT

(In reply to Jiffin from comment #3)
> We cannot use the syncop infra here; it is usually used in the GlusterFS
> client code path. Thanks for pointing out this issue. I will try to
> reproduce it and let you know.

Hi Jiffin, I've hit the same problem:

1) gluster volume set v1 features.trash on
2) gluster volume set v1 features.trash-max-filesize 1GB
3) mount -t glusterfs 127.0.0.1:v1 /mnt/test
4) dd if=/dev/zero of=/mnt/test/d1 bs=1M count=150
5) dd if=/dev/zero of=/mnt/test/d1 bs=1M count=150 (second time)

After the fifth step, glusterfsd went down. I changed the stack size to unlimited, but the problem still exists.
Here's some info:

[root@node12 7]# gluster v info v1

Volume Name: v1
Type: Distribute
Volume ID: 50429860-d368-49fe-aa8e-1b06a1ec5a44
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: node12:/data/2fe9ae62-3e0c-4f7f-be1d-d023732e4c36/v2/brick
Options Reconfigured:
features.trash-max-filesize: 1GB
diagnostics.brick-log-level: INFO
nfs.disable: on
user.smb: disable
auth.allow: node12,node13,,
performance.client-io-threads: on
performance.io-thread-count: 16
performance.write-behind: on
performance.flush-behind: on
performance.strict-o-direct: on
performance.write-behind-window-size: 32MB
performance.io-cache: on
performance.cache-size: 64MB
performance.cache-refresh-timeout: 1
features.trash: on
diagnostics.client-log-level: INFO

[root@node12 7]# ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 3878
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 3878
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Is there any workaround to avoid this problem, or any plan to solve it? Thanks!

Comment 5
Jiffin

(In reply to WuVT from comment #4)

IMO the findings of jiademing.dd (iesool) are correct.
It happens only when truncating very large files (files up to about 20M in size are fine). The proper solution requires a lot of change in the code base (maybe a target for 3.11). As a workaround, I can send a patch which does not store files larger than 10M in the trash directory during a truncate operation.

Comment 6
WuVT

(In reply to Jiffin from comment #5)
> As a workaround, I can send a patch which does not store files larger than
> 10M in the trash directory during a truncate operation.

In my production env, they would like to keep files in trash for a few days, and the sizes range from hundreds of MBs to tens of GBs. In the code of 3.7.20, the old file is copied to the trash directory. Is it possible to implement the trash truncate by calling posix_rename?

Comment 7
Jiffin

(In reply to WuVT from comment #6)
> In the code of 3.7.20, the old file is copied to the trash directory. Is it
> possible to implement the trash truncate by calling posix_rename?

Sorry, I didn't get this. In case of a truncate, we need to copy the old (original) file to the trash directory before performing the truncate, so I don't understand how a rename would help here.

The change I am talking about as a workaround will only affect truncated files.
For deleted files it will work based on the limit which has been set (trash-max-file-size).

Comment 8
WuVT

(In reply to Jiffin from comment #7)
> Sorry, I didn't get this. In case of a truncate, we need to copy the old
> (original) file to the trash directory before performing the truncate, so I
> don't understand how a rename would help here.

Sorry for my poor English. I misunderstood the meaning of the trash truncate; I need to learn the function of vfs->fops.

Another question: I tried to comment out truncate and ftruncate in the trash fops table, like this:

struct xlator_fops fops = {
        .unlink    = trash_unlink,
        // .truncate  = trash_truncate,
        // .ftruncate = trash_ftruncate,
        .rmdir     = trash_rmdir,
        .mkdir     = trash_mkdir,
        .rename    = trash_rename,
};

Are there any bad effects from doing this?

Comment 9
Jiffin

(In reply to WuVT from comment #8)
> I tried to comment out truncate and ftruncate in the trash fops table.
> Are there any bad effects from doing this?
It will disable the trash feature for truncate operations. If you are not worried about truncated files, then it is perfectly okay to do it.

> I tested the recycle module of samba-4.2.3, and it seems that recycle
> doesn't deal with truncated files.

As a workaround exists to get over the issue, we would like to fix it later (if we get to it). For now, CLOSING with DEFERRED.