Bug 1218535
| Field | Value |
|---|---|
| Summary | FSAL_GLUSTER: ganesha daemon crashes when write exceeds brick size |
| Product | [Community] GlusterFS |
| Reporter | Jiffin <jthottan> |
| Component | libgfapi |
| Assignee | Niels de Vos <ndevos> |
| Status | CLOSED DUPLICATE |
| QA Contact | Sudhir D <sdharane> |
| Severity | urgent |
| Priority | urgent |
| Version | 3.7.3 |
| CC | ansubram, bugs, gluster-bugs, jthottan, kkeithle, mmadhusu, ndevos, skoduri |
| Hardware | All |
| OS | All |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2015-08-25 11:24:08 UTC |
| Bug Blocks | 1248533, 1255471, 1262798 |
Trying to reproduce this, I got a similar segfault (though it was on a getattrs/glfs_fstat call):
(gdb) f 0
#0 0x00007f732dbd827f in __glfs_entry_fd (fd=0x7f730c7a30a0) at glfs-internal.h:193
193 THIS = fd->fd->inode->table->xl->ctx->master;
(gdb) l
188
189
190 static inline void
191 __glfs_entry_fd (struct glfs_fd *fd)
192 {
193 THIS = fd->fd->inode->table->xl->ctx->master;
194 }
195
196
197 /*
(gdb) p *fd
$1 = {
openfds = { <--- empty list
next = 0x7f730c7a30a0,
prev = 0x7f730c7a30a0
},
fs = 0x7f732e6b4660,
offset = 796860416,
fd = 0x7f732ab3703c,
entries = {
next = 0x0,
prev = 0x0
},
next = 0x0,
readdirbuf = 0x0
}
(gdb) p *fd->fd
$2 = {
pid = 1039,
flags = 2,
refcount = 0, <--- refcount is 0!
inode_list = { <--- empty list
next = 0x7f732ab3704c,
prev = 0x7f732ab3704c
},
inode = 0xaaaaaaaa, <--- invalid pointer
lock = 1,
_ctx = 0x7f730c559060,
xl_count = 11,
lk_ctx = 0x7f730c5e9e20,
anonymous = _gf_false
}
(gdb) p *fd->fd->inode
Cannot access memory at address 0xaaaaaaaa
Upstream prevents this issue through the checks added in http://review.gluster.org/10759 (which also returns EBADFD instead of EINVAL).
http://review.gluster.org/9797 replaces __glfs_entry_fs() with __GLFS_ENTRY_VALIDATE_FS().
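As a rough sketch of the validation pattern those patches introduce (the struct and function names below are simplified stand-ins for illustration, not the real gluster definitions), the fix amounts to checking every link in the pointer chain before dereferencing it, and failing with a bad-descriptor error instead of crashing:

```c
/* Hypothetical sketch of the fd validation the upstream patches add.
 * "fake_fd" and "fake_glfs_fd" are simplified stand-ins for fd_t and
 * struct glfs_fd; the real macros walk a longer chain. */
#include <errno.h>
#include <stddef.h>

struct fake_fd {
        void *inode;            /* stand-in for fd->inode */
};

struct fake_glfs_fd {
        struct fake_fd *fd;     /* stand-in for glfd->fd */
};

/* Returns 0 when the descriptor looks usable; -1 with errno set to
 * EBADF when any link in the pointer chain is missing, instead of
 * dereferencing a stale pointer like the old __glfs_entry_fd() did. */
static int
glfs_fd_validate(struct fake_glfs_fd *glfd)
{
        if (!glfd || !glfd->fd || !glfd->fd->inode) {
                errno = EBADF;
                return -1;
        }
        return 0;
}
```

With this in place, a caller that holds a stale glfs_fd gets a clean EBADF back rather than the segfault shown in the gdb session above.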
In /tmp/gfapi.log (yes, the location changed in newer versions) the following message is repeated many times:
[2015-08-24 09:19:15.610646] W [client-rpc-fops.c:851:client3_3_writev_cbk] 0-bz1218535-client-0: remote operation failed: No space left on device
My guess is that there is a race where the error is returned by Gluster, but not correctly handled by libgfapi. This would cause an incorrect error to be passed to NFS-Ganesha, which then re-uses the fd/handle that should have been marked invalid.
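The theory above can be sketched as a small simulation (the handle struct, write_chunk(), and the sizes are hypothetical illustrations, not gfapi or NFS-Ganesha APIs): once a write fails, the handle must be marked invalid so later calls get a clean error rather than touching freed state:

```c
/* Hypothetical simulation of the suspected bug: a write error must
 * invalidate the handle, or a later caller will reuse it. Nothing
 * here is a real gfapi or NFS-Ganesha API. */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct handle {
        bool valid;
};

/* Simulated write: fails with ENOSPC once the brick is full,
 * and with EBADF if the handle was already invalidated. */
static int
write_chunk(struct handle *h, size_t remaining_space, size_t len)
{
        if (!h->valid) {
                errno = EBADF;
                return -1;
        }
        if (len > remaining_space) {
                errno = ENOSPC;
                return -1;
        }
        return 0;
}

/* The safe pattern: on any write error, mark the handle invalid so
 * subsequent operations see EBADF instead of reusing stale state. */
static int
write_and_track(struct handle *h, size_t remaining_space, size_t len)
{
        if (write_chunk(h, remaining_space, len) < 0) {
                h->valid = false;
                return -1;
        }
        return 0;
}
```

The crash reported here corresponds to the unsafe variant: the ENOSPC from the brick is mishandled, the handle stays "valid", and the next operation dereferences freed memory.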
Including the above two patches should prevent the problem from occurring.
*** This bug has been marked as a duplicate of bug 1240920 ***
Created attachment 1022108 [details]
core file

Description of problem:
If the nfs-client tries to write to a file and the write exceeds the brick size, the ganesha daemon crashes with a core.

Version-Release number of selected component (if applicable):
mainline sources, installed from the latest ganesha (2.3dev-1) and gluster (mainline) sources

How reproducible:
always

Steps to Reproduce:
1. Mount the exported volume using nfs-ganesha v3.
2. Perform a write operation that exceeds the brick size:
   dd if=/dev/zero of=/mnt/nfs/2/file1 bs=11000K count=1000 conv=sync
   which writes ~11GB of data to a brick of size 10GB.

Actual results:
The ganesha daemon crashes after the brick size limit is reached.

Expected results:
The write should complete up to the brick size and then stop, returning an input/output error.

Additional info:
The crash did not occur with gluster-nfs, v4 ganesha, or a native mount.

volume configuration:
Volume Name: test
Type: Distribute
Volume ID: f84a1ad8-2b19-42ae-89b6-d0b8243410be
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.70.42.219:/brick/test
Options Reconfigured:
features.cache-invalidation: off
nfs.disable: on

size of brick = size of volume mount
10.70.42.219:/test 9.8G 3.2G 6.7G 33% /mnt/nfs/2

nfs client - fedora20 (3.17.6-200.fc20.x86_64)

bt of the core:
#0  0x00000030b540e273 in __glfs_entry_fd (glfd=<value optimized out>) at glfs-internal.h:222
#1  pub_glfs_close (glfd=<value optimized out>) at glfs-fops.c:161
#2  0x00007fdd64579b5d in file_close (obj_hdl=0x7fdd24002358) at /root/lat_nfs_ganesha/src/FSAL/FSAL_GLUSTER/handle.c:1307
#3  0x00000000004f06b2 in cache_inode_close (entry=0x7fdd24002620, flags=168) at /root/lat_nfs_ganesha/src/cache_inode/cache_inode_open_close.c:305
#4  0x00000000004e5b44 in cache_inode_rdwr_plus (entry=0x7fdd24002620, io_direction=CACHE_INODE_WRITE, offset=10112888832, io_size=1048576, bytes_moved=0x7fdd41b75e28, buffer=0x7fdd044c2430, eof=0x7fdd41b75e27, sync=0x7fdd41b75e26, info=0x0) at /root/lat_nfs_ganesha/src/cache_inode/cache_inode_rdwr.c:227
#5  0x00000000004e64cd in cache_inode_rdwr (entry=0x7fdd24002620, io_direction=CACHE_INODE_WRITE, offset=10112888832, io_size=1048576, bytes_moved=0x7fdd41b75e28, buffer=0x7fdd044c2430, eof=0x7fdd41b75e27, sync=0x7fdd41b75e26) at /root/lat_nfs_ganesha/src/cache_inode/cache_inode_rdwr.c:304
#6  0x000000000045eed2 in nfs3_write (arg=0x7fdd040c1858, worker=0x7fdd1c0008c0, req=0x7fdd040c1718, res=0x7fdd1c009fa0) at /root/lat_nfs_ganesha/src/Protocols/NFS/nfs3_write.c:234
#7  0x00000000004548ac in nfs_rpc_execute (req=0x7fdd04000f10, worker_data=0x7fdd1c0008c0) at /root/lat_nfs_ganesha/src/MainNFSD/nfs_worker_thread.c:1268
#8  0x0000000000455646 in worker_run (ctx=0x101ac60) at /root/lat_nfs_ganesha/src/MainNFSD/nfs_worker_thread.c:1535
#9  0x000000000051bf5e in fridgethr_start_routine (arg=0x101ac60) at /root/lat_nfs_ganesha/src/support/fridgethr.c:562
#10 0x00000039090079d1 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003908ce88fd in clone () from /lib64/libc.so.6