Created attachment 991272 [details]
Client mount log

Description of problem:
=======================
Enabled epoll with server.event-threads and client.event-threads set to 4, and then started renames of files, 1000 at a time, on the client. After some time the client crashed. I am not sure whether this has anything to do with epoll; the crash was seen after enabling it.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3.7dev built on Feb 8 2015 01:04:29

Volume options:
==============
[root@ninja ~]# gluster volume get testvol all
Option                                   Value
------                                   -----
cluster.lookup-unhashed                  on
cluster.min-free-disk                    10%
cluster.min-free-inodes                  5%
cluster.rebalance-stats                  off
cluster.subvols-per-directory            (null)
cluster.readdir-optimize                 off
cluster.rsync-hash-regex                 (null)
cluster.extra-hash-regex                 (null)
cluster.dht-xattr-name                   trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid     off
cluster.local-volume-name                (null)
cluster.weighted-rebalance               on
cluster.switch-pattern                   (null)
cluster.entry-change-log                 on
cluster.read-subvolume                   (null)
cluster.read-subvolume-index             -1
cluster.read-hash-mode                   1
cluster.background-self-heal-count       16
cluster.metadata-self-heal               on
cluster.data-self-heal                   on
cluster.entry-self-heal                  on
cluster.self-heal-daemon                 on
cluster.heal-timeout                     600
cluster.self-heal-window-size            1
cluster.data-change-log                  on
cluster.metadata-change-log              on
cluster.data-self-heal-algorithm         (null)
cluster.eager-lock                       on
cluster.quorum-type                      none
cluster.quorum-count                     (null)
cluster.choose-local                     true
cluster.self-heal-readdir-size           1KB
cluster.post-op-delay-secs               1
cluster.ensure-durability                on
cluster.stripe-block-size                128KB
cluster.stripe-coalesce                  true
diagnostics.latency-measurement          off
diagnostics.dump-fd-stats                off
diagnostics.count-fop-hits               off
diagnostics.brick-log-level              INFO
diagnostics.client-log-level             INFO
diagnostics.brick-sys-log-level          CRITICAL
diagnostics.client-sys-log-level         CRITICAL
diagnostics.brick-logger                 (null)
diagnostics.client-logger                (null)
diagnostics.brick-log-format             (null)
diagnostics.client-log-format            (null)
diagnostics.brick-log-buf-size           5
diagnostics.client-log-buf-size          5
diagnostics.brick-log-flush-timeout      120
diagnostics.client-log-flush-timeout     120
performance.cache-max-file-size          0
performance.cache-min-file-size          0
performance.cache-refresh-timeout        1
performance.cache-priority
performance.cache-size                   32MB
performance.io-thread-count              16
performance.high-prio-threads            16
performance.normal-prio-threads          16
performance.low-prio-threads             16
performance.least-prio-threads           1
performance.enable-least-priority        on
performance.least-rate-limit             0
performance.cache-size                   128MB
performance.flush-behind                 on
performance.nfs.flush-behind             on
performance.write-behind-window-size     1MB
performance.nfs.write-behind-window-size 1MB
performance.strict-o-direct              off
performance.nfs.strict-o-direct          off
performance.strict-write-ordering        off
performance.nfs.strict-write-ordering    off
performance.lazy-open                    yes
performance.read-after-open              no
performance.read-ahead-page-count        4
performance.md-cache-timeout             1
features.encryption                      off
encryption.master-key                    (null)
encryption.data-key-size                 256
encryption.block-size                    4096
network.frame-timeout                    1800
network.ping-timeout                     42
network.tcp-window-size                  (null)
features.lock-heal                       off
features.grace-timeout                   10
network.remote-dio                       disable
client.event-threads                     4
network.tcp-window-size                  (null)
network.inode-lru-limit                  16384
auth.allow                               *
auth.reject                              (null)
transport.keepalive                      (null)
server.allow-insecure                    (null)
server.root-squash                       off
server.anonuid                           65534
server.anongid                           65534
server.statedump-path                    /var/run/gluster
server.outstanding-rpc-limit             64
features.lock-heal                       off
features.grace-timeout                   (null)
server.ssl                               (null)
auth.ssl-allow                           *
server.manage-gids                       off
client.send-gids                         on
server.gid-timeout                       2
server.own-thread                        (null)
server.event-threads                     4
performance.write-behind                 on
performance.read-ahead                   on
performance.readdir-ahead                off
performance.io-cache                     on
performance.quick-read                   on
performance.open-behind                  on
performance.stat-prefetch                on
performance.client-io-threads            off
performance.nfs.write-behind             on
performance.nfs.read-ahead               off
performance.nfs.io-cache                 off
performance.nfs.quick-read               off
performance.nfs.stat-prefetch            off
performance.nfs.io-threads               off
performance.force-readdirp               true
features.file-snapshot                   off
features.uss                             off
features.snapshot-directory              .snaps
features.show-snapshot-directory         off
network.compression                      off
network.compression.window-size          -15
network.compression.mem-level            8
network.compression.min-size             0
network.compression.compression-level    -1
network.compression.debug                false
features.limit-usage                     (null)
features.quota-timeout                   0
features.default-soft-limit              80%
features.soft-timeout                    60
features.hard-timeout                    5
features.alert-time                      86400
features.quota-deem-statfs               off
geo-replication.indexing                 off
geo-replication.indexing                 off
geo-replication.ignore-pid-check         off
geo-replication.ignore-pid-check         off
features.quota                           on
debug.trace                              off
debug.log-history                        no
debug.log-file                           no
debug.exclude-ops                        (null)
debug.include-ops                        (null)
debug.error-gen                          off
debug.error-failure                      (null)
debug.error-number                       (null)
debug.random-failure                     off
debug.error-fops                         (null)
nfs.enable-ino32                         no
nfs.mem-factor                           15
nfs.export-dirs                          on
nfs.export-volumes                       on
nfs.addr-namelookup                      off
nfs.dynamic-volumes                      off
nfs.register-with-portmap                on
nfs.outstanding-rpc-limit                16
nfs.port                                 2049
nfs.rpc-auth-unix                        on
nfs.rpc-auth-null                        on
nfs.rpc-auth-allow                       all
nfs.rpc-auth-reject                      none
nfs.ports-insecure                       off
nfs.trusted-sync                         off
nfs.trusted-write                        off
nfs.volume-access                        read-write
nfs.export-dir
nfs.disable                              false
nfs.nlm                                  on
nfs.acl                                  on
nfs.mount-udp                            off
nfs.mount-rmtab                          /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd                            /sbin/rpc.statd
nfs.server-aux-gids                      off
nfs.drc                                  off
nfs.drc-size                             0x20000
nfs.read-size                            (1 * 1048576ULL)
nfs.write-size                           (1 * 1048576ULL)
nfs.readdir-size                         (1 * 1048576ULL)
features.read-only                       off
features.worm                            off
storage.linux-aio                        off
storage.batch-fsync-mode                 reverse-fsync
storage.batch-fsync-delay-usec           0
storage.owner-uid                        -1
storage.owner-gid                        -1
storage.node-uuid-pathinfo               off
storage.health-check-interval            30
storage.build-pgfid                      off
storage.bd-aio                           off
cluster.server-quorum-type               off
cluster.server-quorum-ratio              0
changelog.changelog                      off
changelog.changelog-dir                  (null)
changelog.encoding                       ascii
changelog.rollover-time                  15
changelog.fsync-interval                 5
changelog.changelog-barrier-timeout      120
features.barrier                         disable
features.barrier-timeout                 120
locks.trace                              disable
cluster.disperse-self-heal-daemon        enable

Gluster volume status:
======================
[root@ninja ~]# gluster volume status
Status of volume: testvol
Gluster process                                  Port   Online  Pid
------------------------------------------------------------------------------
Brick ninja:/rhs/brick1/b1                       49152  Y       29212
Brick vertigo:/rhs/brick1/b1                     49152  Y       13852
Brick ninja:/rhs/brick2/b2                       49153  Y       13041
Brick vertigo:/rhs/brick2/b2                     49153  Y       13864
Brick ninja:/rhs/brick3/b3                       49154  Y       13053
Brick vertigo:/rhs/brick3/b3                     49158  Y       13876
NFS Server on localhost                          2049   Y       29220
Quota Daemon on localhost                        N/A    Y       29237
NFS Server on vertigo                            2049   Y       13891
Quota Daemon on vertigo                          N/A    Y       13907

Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks

Gluster volume info:
====================
[root@ninja ~]# gluster volume info

Volume Name: testvol
Type: Disperse
Volume ID: 21ed8908-3458-4834-b93d-161b694c3e37
Status: Started
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: ninja:/rhs/brick1/b1
Brick2: vertigo:/rhs/brick1/b1
Brick3: ninja:/rhs/brick2/b2
Brick4: vertigo:/rhs/brick2/b2
Brick5: ninja:/rhs/brick3/b3
Brick6: vertigo:/rhs/brick3/b3
Options Reconfigured:
client.event-threads: 4
server.event-threads: 4
features.barrier: disable
features.quota: on
cluster.disperse-self-heal-daemon: enable
[root@ninja ~]#

How reproducible:
=================
Tried only once

Steps to Reproduce:
1. Fuse mount a 1x(4+2) disperse volume
2. Create a huge number of files/directories and rename them, 1000 at a time, in parallel (an illustrative sketch of such a workload is included at the end of this comment)
3.

Actual results:
===============
Client crashed

Expected results:
=================
No crash or other errors should be seen

Additional info:
Attaching the client mount log and corefile.
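For reference, a minimal sketch of the kind of parallel create-and-rename workload described in the steps above, written in C since the original reproduction script is not attached. The mount point, file names, thread count, and per-thread file count are illustrative assumptions, not the exact values used in the test.

/* repro_rename.c - illustrative sketch of the create/rename workload
 * described above; NOT the original reproduction script. The mount
 * point, counts, and thread count are assumptions for illustration. */
#include <pthread.h>
#include <stdio.h>

#define THREADS        4      /* parallel rename streams (assumed) */
#define FILES_PER_THR  1000   /* "renames of files 1000 at a time" */

static const char *mountpoint = "/mnt/testvol";   /* assumed fuse mount */

static void *worker(void *arg)
{
    long id = (long)arg;
    char src[256], dst[256];

    for (int i = 0; i < FILES_PER_THR; i++) {
        snprintf(src, sizeof(src), "%s/t%ld_f%d",       mountpoint, id, i);
        snprintf(dst, sizeof(dst), "%s/t%ld_f%d.moved", mountpoint, id, i);

        FILE *fp = fopen(src, "w");          /* create the file */
        if (!fp) { perror("fopen"); continue; }
        fputs("data\n", fp);
        fclose(fp);

        if (rename(src, dst) != 0)           /* rename it immediately */
            perror("rename");
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[THREADS];

    for (long t = 0; t < THREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);
    for (long t = 0; t < THREADS; t++)
        pthread_join(tid[t], NULL);
    return 0;
}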
Created attachment 991275 [details]
Client corefile
Attached the client corefile.
REVIEW: http://review.gluster.org/9995 (cluster/ec: Use fd when appropriate for updating size/version) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/9995 committed in master by Vijay Bellur (vbellur)
------
commit 1f655aed1e439935c24da86fa70f19fd9151e7e8
Author: Pranith Kumar K <pkarampu>
Date:   Wed Mar 25 15:32:42 2015 +0530

    cluster/ec: Use fd when appropriate for updating size/version

    Change-Id: I5d3aca101c8cdda406d31d06c40404fa6a2b7170
    BUG: 1192378
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/9995
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Reviewed-by: Dan Lambright <dlambrig>
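For context, a simplified sketch of the idea named in the patch summary above ("Use fd when appropriate for updating size/version"): when a fop carries an open fd, route the size/version update through that fd rather than through the path, since under heavy parallel renames the path may no longer resolve while an open fd still refers to the same inode. The types and helper names below are hypothetical placeholders chosen only for illustration; they are not the actual ec-common.c code.

/* fd_vs_loc_sketch.c - hypothetical illustration of "prefer the fd when
 * the fop has one"; these types and helpers are NOT GlusterFS internals. */
#include <stdio.h>

struct fop {                 /* hypothetical stand-in for an ec fop */
    const char *path;        /* path-based reference, may go stale after a rename */
    int         fd;          /* open descriptor, -1 if the fop has none */
};

static void update_by_fd(int fd)          { printf("update via fd %d\n", fd); }
static void update_by_path(const char *p) { printf("update via path %s\n", p); }

/* Prefer the fd: under parallel renames the path may no longer resolve
 * by the time the size/version update runs, but the fd stays valid. */
static void update_size_version(struct fop *fop)
{
    if (fop->fd >= 0)
        update_by_fd(fop->fd);
    else
        update_by_path(fop->path);
}

int main(void)
{
    struct fop with_fd   = { "/mnt/testvol/file", 7 };
    struct fop path_only = { "/mnt/testvol/file", -1 };

    update_size_version(&with_fd);    /* takes the fd route */
    update_size_version(&path_only);  /* falls back to the path */
    return 0;
}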
The client has crashed again with the same backtrace. Reopening the bug. This is on the latest 3.7 nightly build.

[root@rhs-client32 core]# gluster --version
glusterfs 3.7.0beta1 built on May 1 2015 01:45:36
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@rhs-client32 core]#

(gdb) bt
#0  dht_writev_cbk (frame=0x7f653ab709e8, cookie=0x7f653ab92164, this=0x7f652c00ed80, op_ret=-1, op_errno=2, prebuf=0x0, postbuf=0x0, xdata=0x0) at dht-inode-write.c:59
#1  0x0000003a37e2fd5c in default_writev_cbk (frame=0x7f653ab92164, cookie=<value optimized out>, this=<value optimized out>, op_ret=-1, op_errno=2, prebuf=<value optimized out>, postbuf=0x0, xdata=0x0) at defaults.c:1019
#2  0x00007f653061d26d in ec_manager_writev (fop=0x7f652a423724, state=<value optimized out>) at ec-inode-write.c:1632
#3  0x00007f6530604b74 in __ec_manager (fop=0x7f652a423724, error=<value optimized out>) at ec-common.c:1642
#4  0x00007f6530604981 in ec_resume (fop=0x7f652a423724, error=0) at ec-common.c:313
#5  0x00007f6530620286 in ec_combine (newcbk=0x7f6529dc3e94, combine=<value optimized out>) at ec-combine.c:936
#6  0x00007f653061ec8e in ec_inode_write_cbk (frame=<value optimized out>, this=0x7f652c00d500, cookie=<value optimized out>, op_ret=-1, op_errno=<value optimized out>, prestat=0x7f652af83b10, poststat=0x7f652af83aa0, xdata=0x0) at ec-inode-write.c:60
#7  0x00007f653087b94c in client3_3_writev_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7f653ab9506c) at client-rpc-fops.c:856
#8  0x0000003a3820ec45 in rpc_clnt_handle_reply (clnt=0x7f652c1171f0, pollin=0x7f65241c4740) at rpc-clnt.c:766
#9  0x0000003a382100e2 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x7f652c117220, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:894
#10 0x0000003a3820b7b8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:543
#11 0x00007f65318b7bcd in socket_event_poll_in (this=0x7f652c126e60) at socket.c:2290
#12 0x00007f65318b96fd in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7f652c126e60, poll_in=1, poll_out=0, poll_err=0) at socket.c:2403
#13 0x0000003a37e7d4b0 in event_dispatch_epoll_handler (data=0x7f652c027700) at event-epoll.c:572
#14 event_dispatch_epoll_worker (data=0x7f652c027700) at event-epoll.c:674
#15 0x0000003d144079d1 in start_thread (arg=0x7f652af84700) at pthread_create.c:301
#16 0x0000003d13ce88fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)
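For orientation, frame #0 above shows dht_writev_cbk being entered with op_ret=-1 and both prebuf and postbuf NULL, so a callback that unconditionally dereferences those stat pointers will crash on this error path. Below is a minimal, hypothetical sketch of the defensive guard such a callback needs; it is only an illustration of the failure mode, not the actual dht-inode-write.c code or the committed fix.

/* writev_cbk_sketch.c - hypothetical illustration of why a NULL postbuf
 * crashes a write callback and the usual guard; NOT the real DHT code. */
#include <stdio.h>

struct iatt { long ia_size; };   /* simplified stand-in for the real struct iatt */

static int writev_cbk(int op_ret, int op_errno,
                      struct iatt *prebuf, struct iatt *postbuf)
{
    /* On a failed write the layer below may pass NULL stat buffers
     * (as in frame #0: op_ret=-1, prebuf=0x0, postbuf=0x0), so bail
     * out before touching them. */
    if (op_ret < 0 || !prebuf || !postbuf) {
        fprintf(stderr, "write failed: errno=%d\n", op_errno);
        return -1;
    }

    printf("file size after write: %ld\n", postbuf->ia_size);
    return 0;
}

int main(void)
{
    writev_cbk(-1, 2, NULL, NULL);       /* failure path: no dereference */

    struct iatt pre = { 0 }, post = { 4096 };
    writev_cbk(4096, 0, &pre, &post);    /* success path */
    return 0;
}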
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report. glusterfs-3.7.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution. [1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939 [2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user