Description of problem:
During heavy Gluster-NFS runs I see a noisy gluster log that makes it difficult to tell whether something is wrong. Can we quiet the log down so that in normal mode it doesn't emit so many informational messages? Otherwise it effectively makes the log unusable. I can filter, but shouldn't have to. Ideally there would be a way to turn the log level up or down without restarting the gluster daemons. I tried:

gluster volume set testfs diagnostics.brick-log-level WARNING
gluster volume set testfs diagnostics.client-log-level WARNING

But it didn't seem to help.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa33-1.x86_64

How reproducible:
Run netmist with Gluster-NFS. I haven't evaluated log traffic while running it with the native client yet, and haven't evaluated with other applications.

Steps to Reproduce:
1. Start the workload.
2. tail -f the file below.
3. Try using the gluster command to filter out informational log messages.

Actual results:
No filtering seen.

Expected results:
Expected informational messages to stop; just warnings and higher-severity messages.

Additional info:
There were kernel stack traces occurring on the system at the same time (different bz that I filed); maybe these somehow caused the extra informational messages.
For example, the noisy /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:

[2012-04-13 13:13:39.609879] I [glusterd-op-sm.c:2222:glusterd_op_ac_send_commit_op] 0-management: Sent op req to 3 peers
[2012-04-13 13:13:39.613276] I [glusterd-rpc-ops.c:1294:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: 51b63711-df88-465f-94f5-64c3bf6c17b2
[2012-04-13 13:13:39.614645] I [glusterd-rpc-ops.c:1294:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: 22b84f37-b80f-43e9-857c-dcd8ab295e28
[2012-04-13 13:13:39.616463] I [glusterd-rpc-ops.c:1294:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: f7504c34-e863-41c6-9078-0fd8f9472d08
[2012-04-13 13:13:39.618502] I [glusterd-rpc-ops.c:606:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: 51b63711-df88-465f-94f5-64c3bf6c17b2
[2012-04-13 13:13:39.618561] I [glusterd-rpc-ops.c:606:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: f7504c34-e863-41c6-9078-0fd8f9472d08
[2012-04-13 13:13:39.618593] I [glusterd-rpc-ops.c:606:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: 22b84f37-b80f-43e9-857c-dcd8ab295e28
[2012-04-13 13:13:39.618625] I [glusterd-op-sm.c:2491:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
[2012-04-13 13:13:40.281756] I [glusterd-handler.c:495:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: f7504c34-e863-41c6-9078-0fd8f9472d08
[2012-04-13 13:13:40.281817] I [glusterd-utils.c:271:glusterd_lock] 0-glusterd: Cluster lock held by f7504c34-e863-41c6-9078-0fd8f9472d08
[2012-04-13 13:13:40.281856] I [glusterd-handler.c:1313:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-04-13 13:13:40.282221] I [glusterd-handler.c:540:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: f7504c34-e863-41c6-9078-0fd8f9472d08
[2012-04-13 13:13:40.282324] I [glusterd-handler.c:1415:glusterd_op_stage_send_resp] 0-glusterd: Responded to stage, ret: 0
[2012-04-13 13:13:40.284855] I [glusterd-handler.c:540:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: f7504c34-e863-41c6-9078-0fd8f9472d08
[2012-04-13 13:13:40.288025] I [glusterd-handler.c:1456:glusterd_op_commit_send_resp] 0-glusterd: Responded to commit, ret: 0
[2012-04-13 13:13:40.292972] I [glusterd-handler.c:1357:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: f7504c34-e863-41c6-9078-0fd8f9472d08
[2012-04-13 13:13:40.293024] I [glusterd-handler.c:1333:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0
[2012-04-13 13:13:43.032425] I [glusterd-handler.c:495:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: 22b84f37-b80f-43e9-857c-dcd8ab295e28
[2012-04-13 13:13:43.032486] I [glusterd-utils.c:271:glusterd_lock] 0-glusterd: Cluster lock held by 22b84f37-b80f-43e9-857c-dcd8ab295e28
[2012-04-13 13:13:43.032525] I [glusterd-handler.c:1313:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-04-13 13:13:43.032857] I [glusterd-handler.c:540:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: 22b84f37-b80f-43e9-857c-dcd8ab295e28
[2012-04-13 13:13:43.032955] I [glusterd-handler.c:1415:glusterd_op_stage_send_resp] 0-glusterd: Responded to stage, ret: 0
[2012-04-13 13:13:43.035372] I [glusterd-handler.c:540:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: 22b84f37-b80f-43e9-857c-dcd8ab295e28
[2012-04-13 13:13:43.038426] I [glusterd-handler.c:1456:glusterd_op_commit_send_resp] 0-glusterd: Responded to commit, ret: 0
[2012-04-13 13:13:43.043831] I [glusterd-handler.c:1357:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: 22b84f37-b80f-43e9-857c-dcd8ab295e28
[2012-04-13 13:13:43.043878] I [glusterd-handler.c:1333:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0
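As a stopgap while the log-level options don't take effect, informational entries can be filtered out on the severity letter that follows the timestamp (the third whitespace-separated field, since the date and time are split by a space). A minimal sketch, using sample lines in place of the live log:

```shell
# Drop informational (I) entries, keeping warnings (W) and errors (E).
# $3 is the severity letter: "[date" "time]" "I|W|E" ...
printf '%s\n' \
  '[2012-04-13 13:13:39.609879] I [glusterd-op-sm.c:2222:glusterd_op_ac_send_commit_op] 0-management: Sent op req to 3 peers' \
  '[2013-05-07 15:37:25.337381] W [glusterfsd.c:970:cleanup_and_exit] 0-: received signum (15), shutting down' \
  '[2013-05-07 15:37:25.351662] E [run.c:190:runner_log] 0-management: Failed to execute script' \
| awk '$3 != "I"'
```

Against the live log the same filter would be applied as `tail -f /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | awk '$3 != "I"'` (which is exactly the manual filtering the reporter shouldn't have to do).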
Planning to solve this command by command. First effort is at http://review.gluster.org/4353
CHANGE: http://review.gluster.org/4353 (glusterd: log enhancements for volume start) merged in master by Vijay Bellur (vbellur)
CHANGE: http://review.gluster.org/4382 (glusterd: log changes in volume stop (and in op sm codepath)) merged in master by Anand Avati (avati)
CHANGE: http://review.gluster.org/4462 (glusterd: log changes in volume create and delete codepath) merged in master by Anand Avati (avati)
CHANGE: http://review.gluster.org/4463 (glusterd: remove extra call to glusterd_volume_compute_cksum()) merged in master by Anand Avati (avati)
CHANGE: http://review.gluster.org/4543 (glusterd: log changes in volume set and volume reset codepath) merged in master by Anand Avati (avati)
CHANGE: http://review.gluster.org/4585 (mgmt/glusterd: Move found-brick logs to DEBUG) merged in master by Vijay Bellur (vbellur)
Krutika, wanted to check if rhs-2.1 branch has all the patches listed above. If not, we would need a backport.
Amar, rhs-2.1 does have all the patches submitted for this bug.
When running volume start I am still seeing in etc-glusterfs-glusterd.vol.log:

[2013-05-07 15:26:47.763213] I [rpc-clnt.c:961:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-05-07 15:26:47.763269] I [socket.c:3482:socket_init] 0-management: SSL support is NOT enabled
[2013-05-07 15:26:47.763281] I [socket.c:3497:socket_init] 0-management: using system polling thread
[2013-05-07 15:26:48.241131] I [rpc-clnt.c:961:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-05-07 15:26:48.241204] I [socket.c:3482:socket_init] 0-management: SSL support is NOT enabled
[2013-05-07 15:26:48.241216] I [socket.c:3497:socket_init] 0-management: using system polling thread
[2013-05-07 15:26:48.241287] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-05-07 15:26:48.254656] I [rpc-clnt.c:961:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-05-07 15:26:48.254767] I [socket.c:3482:socket_init] 0-management: SSL support is NOT enabled
[2013-05-07 15:26:48.254781] I [socket.c:3497:socket_init] 0-management: using system polling thread
[2013-05-07 15:26:48.254861] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now

Could we clean those up as well?
On both volume start and volume stop, the /var/log/glusterfs/bricks/<my brick> log looks much improved. A couple pieces of feedback:

1. Easier-to-decipher messages. When I run gluster volume stop, the only message that gets generated is:

[2013-05-07 15:37:25.337381] W [glusterfsd.c:970:cleanup_and_exit] (-->/lib64/libc.so.6() [0x36ac843b70] (-->/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12) [0x7fb782140c02] (-->/usr/sbin/glusterfsd(glusterfs_handle_terminate+0x15) [0x40a825]))) 0-: received signum (15), shutting down

This is a great improvement with respect to cleaning up messages we don't need to see, but I feel the messages we are displaying could be a bit more informative. What was shut down? If it was a volume, which volume was shut down? Without having just run the command, it would be difficult to look back at this and know that the volume stop command was run. The same goes for volume start:

[2013-05-07 17:18:38.501063] I [glusterfsd.c:1878:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.4.0.4rhs (/usr/sbin/glusterfsd -s 10.13.129.122 --volfile-id repvol.10.13.129.122.brick -p /var/lib/glusterd/vols/repvol/run/10.13.129.122-brick.pid -S /var/run/f6183931da455b0aa8d77208625f95a4.socket --brick-name /brick -l /var/log/glusterfs/bricks/brick.log --xlator-option *-posix.glusterd-uuid=449ceab6-5a85-4cbb-a4cc-f6738bbc85e6 --brick-port 49152 --xlator-option repvol-server.listen-port=49152)

2. While the looks of the brick log are much improved, I am still seeing some noise in nfs.log and etc-glusterfs-glusterd.vol.log. Can we revisit those for volume start/stop?
Comment #13 has what I am seeing for volume start; here is what I am seeing for volume stop:

[2013-05-07 15:37:25.341411] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-05-07 15:37:25.351662] E [run.c:190:runner_log] 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/stop/pre/S40ufo-stop.pyc --volname=repvol --last=yes
[2013-05-07 15:37:25.358633] E [run.c:190:runner_log] 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/stop/pre/S40ufo-stop.pyo --volname=repvol --last=yes
[2013-05-07 15:37:25.358705] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=2236 max=1 total=1
[2013-05-07 15:37:25.358723] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=124 max=1 total=1
[2013-05-07 15:37:25.358747] I [glusterd-utils.c:1182:glusterd_service_stop] 0-management: brick already stopped
[2013-05-07 15:37:25.358792] E [glusterd-utils.c:1477:glusterd_brick_unlink_socket_file] 0-management: Failed to remove /var/run/f6183931da455b0aa8d77208625f95a4.socket error: No such file or directory
[2013-05-07 15:37:25.774854] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=2236 max=0 total=0
[2013-05-07 15:37:25.774895] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2013-05-07 15:37:26.778168] E [glusterd-utils.c:3625:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/b504062ac1189ba0e64032b11e3a9887.socket error: Permission denied
[2013-05-07 15:37:26.778631] I [glusterd-utils.c:3659:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2013-05-07 15:37:26.778876] I [glusterd-utils.c:3664:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2013-05-07 15:37:26.779158] I [glusterd-utils.c:3669:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2013-05-07 15:37:26.779399] I [glusterd-utils.c:3674:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
[2013-05-07 15:37:26.779657] I [glusterd-utils.c:3679:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
[2013-05-07 15:37:26.779751] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=2236 max=0 total=0
[2013-05-07 15:37:26.779771] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2013-05-07 15:37:27.781631] E [glusterd-utils.c:3625:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/ac127e4a6f1335d530a773df4265854f.socket error: No such file or directory
[2013-05-07 15:37:27.781904] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /brick on port 49152
[2013-05-07 15:37:27.782296] W [socket.c:515:__socket_rwv] 0-management: readv on /var/run/b504062ac1189ba0e64032b11e3a9887.socket failed (No data available)
[2013-05-07 15:37:27.782410] W [socket.c:515:__socket_rwv] 0-management: readv on /var/run/ac127e4a6f1335d530a773df4265854f.socket failed (No data available)
I checked volume create/delete and everything looks good there, with the exception of:

[2013-05-07 18:07:38.498637] I [socket.c:3482:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-05-07 18:07:38.498720] I [socket.c:3497:socket_init] 0-glusterfs: using system polling thread

Are those informational messages something we could move to debug as well? It looks like they get logged in cli.log when running most gluster commands. I even just ran gluster volume help and it logged the SSL and polling thread messages.
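To back up the "logged on most commands" observation, the remaining noise can be quantified by tallying entries per severity letter (field 3, after the split date/time stamp). A small sketch with sample lines standing in for cli.log; on a real system the input would come from /var/log/glusterfs/cli.log instead of printf:

```shell
# Count log entries per severity (I/W/E) to show the informational ratio.
printf '%s\n' \
  '[2013-05-07 18:07:38.498637] I [socket.c:3482:socket_init] 0-glusterfs: SSL support is NOT enabled' \
  '[2013-05-07 18:07:38.498720] I [socket.c:3497:socket_init] 0-glusterfs: using system polling thread' \
  '[2013-05-07 15:37:25.351662] E [run.c:190:runner_log] 0-management: Failed to execute script' \
| awk '{count[$3]++} END {for (s in count) print s, count[s]}' | sort
```

For this sample it prints "E 1" and "I 2"; run over a full cli.log it would show how dominant the I-level entries are before and after any fix.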
The code fixes submitted for this BZ have been verified on QA, hence I am marking this BZ as verified. I will open new BZs for anything mentioned above that warrants it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html