Description of problem:
During heavy Gluster-NFS runs I see a noisy gluster log that makes it difficult to tell whether something is wrong. Can we quiet the log down so that in normal mode it doesn't emit so many informational messages? Otherwise it effectively makes the log unusable. I can filter, but shouldn't have to. Ideally there would be a way to turn the log level up or down without restarting the gluster daemons. I tried:

gluster volume set testfs diagnostics.brick-log-level WARNING
gluster volume set testfs diagnostics.client-log-level WARNING

But it didn't seem to help.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa33-1.x86_64

How reproducible:
Run netmist with Gluster-NFS. I haven't evaluated log traffic while running it with the native client yet, and haven't evaluated with other applications.

Steps to Reproduce:
1. Start the workload.
2. tail -f the file below.
3. Try using the gluster command to filter out informational log messages.

Actual results:
No filtering seen.

Expected results:
Expected informational messages to stop; just warnings and higher-severity messages.

Additional info:
There were kernel stack traces occurring on the system at the same time (different bz that I filed); maybe these somehow caused the extra informational messages.
For example, the noisy /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:

[2012-04-13 13:13:39.609879] I [glusterd-op-sm.c:2222:glusterd_op_ac_send_commit_op] 0-management: Sent op req to 3 peers
[2012-04-13 13:13:39.613276] I [glusterd-rpc-ops.c:1294:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: 51b63711-df88-465f-94f5-64c3bf6c17b2
[2012-04-13 13:13:39.614645] I [glusterd-rpc-ops.c:1294:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: 22b84f37-b80f-43e9-857c-dcd8ab295e28
[2012-04-13 13:13:39.616463] I [glusterd-rpc-ops.c:1294:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: f7504c34-e863-41c6-9078-0fd8f9472d08
[2012-04-13 13:13:39.618502] I [glusterd-rpc-ops.c:606:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: 51b63711-df88-465f-94f5-64c3bf6c17b2
[2012-04-13 13:13:39.618561] I [glusterd-rpc-ops.c:606:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: f7504c34-e863-41c6-9078-0fd8f9472d08
[2012-04-13 13:13:39.618593] I [glusterd-rpc-ops.c:606:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: 22b84f37-b80f-43e9-857c-dcd8ab295e28
[2012-04-13 13:13:39.618625] I [glusterd-op-sm.c:2491:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
[2012-04-13 13:13:40.281756] I [glusterd-handler.c:495:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: f7504c34-e863-41c6-9078-0fd8f9472d08
[2012-04-13 13:13:40.281817] I [glusterd-utils.c:271:glusterd_lock] 0-glusterd: Cluster lock held by f7504c34-e863-41c6-9078-0fd8f9472d08
[2012-04-13 13:13:40.281856] I [glusterd-handler.c:1313:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-04-13 13:13:40.282221] I [glusterd-handler.c:540:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: f7504c34-e863-41c6-9078-0fd8f9472d08
[2012-04-13 13:13:40.282324] I [glusterd-handler.c:1415:glusterd_op_stage_send_resp] 0-glusterd: Responded to stage, ret: 0
[2012-04-13 13:13:40.284855] I [glusterd-handler.c:540:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: f7504c34-e863-41c6-9078-0fd8f9472d08
[2012-04-13 13:13:40.288025] I [glusterd-handler.c:1456:glusterd_op_commit_send_resp] 0-glusterd: Responded to commit, ret: 0
[2012-04-13 13:13:40.292972] I [glusterd-handler.c:1357:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: f7504c34-e863-41c6-9078-0fd8f9472d08
[2012-04-13 13:13:40.293024] I [glusterd-handler.c:1333:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0
[2012-04-13 13:13:43.032425] I [glusterd-handler.c:495:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: 22b84f37-b80f-43e9-857c-dcd8ab295e28
[2012-04-13 13:13:43.032486] I [glusterd-utils.c:271:glusterd_lock] 0-glusterd: Cluster lock held by 22b84f37-b80f-43e9-857c-dcd8ab295e28
[2012-04-13 13:13:43.032525] I [glusterd-handler.c:1313:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2012-04-13 13:13:43.032857] I [glusterd-handler.c:540:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: 22b84f37-b80f-43e9-857c-dcd8ab295e28
[2012-04-13 13:13:43.032955] I [glusterd-handler.c:1415:glusterd_op_stage_send_resp] 0-glusterd: Responded to stage, ret: 0
[2012-04-13 13:13:43.035372] I [glusterd-handler.c:540:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: 22b84f37-b80f-43e9-857c-dcd8ab295e28
[2012-04-13 13:13:43.038426] I [glusterd-handler.c:1456:glusterd_op_commit_send_resp] 0-glusterd: Responded to commit, ret: 0
[2012-04-13 13:13:43.043831] I [glusterd-handler.c:1357:glusterd_handle_cluster_unlock] 0-glusterd: Received UNLOCK from uuid: 22b84f37-b80f-43e9-857c-dcd8ab295e28
[2012-04-13 13:13:43.043878] I [glusterd-handler.c:1333:glusterd_op_unlock_send_resp] 0-glusterd: Responded to unlock, ret: 0
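As a stopgap while the log-level options don't take effect, informational entries can be filtered out on the severity letter that follows the timestamp (the third whitespace-separated field, since the date and time are split by a space). A minimal sketch, using sample lines in place of the live log:

```shell
# Drop informational (I) entries, keeping warnings (W) and errors (E).
# $3 is the severity letter: "[date" "time]" "I|W|E" ...
printf '%s\n' \
  '[2012-04-13 13:13:39.609879] I [glusterd-op-sm.c:2222:glusterd_op_ac_send_commit_op] 0-management: Sent op req to 3 peers' \
  '[2013-05-07 15:37:25.337381] W [glusterfsd.c:970:cleanup_and_exit] 0-: received signum (15), shutting down' \
  '[2013-05-07 15:37:25.351662] E [run.c:190:runner_log] 0-management: Failed to execute script' \
| awk '$3 != "I"'
```

Against the live log the same filter would be applied as `tail -f /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | awk '$3 != "I"'` (which is exactly the manual filtering the reporter shouldn't have to do).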
Planning to solve this command by command. First effort is at http://review.gluster.org/4353
CHANGE: http://review.gluster.org/4353 (glusterd: log enhancements for volume start) merged in master by Vijay Bellur (vbellur)
CHANGE: http://review.gluster.org/4382 (glusterd: log changes in volume stop (and in op sm codepath)) merged in master by Anand Avati (avati)
CHANGE: http://review.gluster.org/4462 (glusterd: log changes in volume create and delete codepath) merged in master by Anand Avati (avati)
CHANGE: http://review.gluster.org/4463 (glusterd: remove extra call to glusterd_volume_compute_cksum()) merged in master by Anand Avati (avati)
CHANGE: http://review.gluster.org/4543 (glusterd: log changes in volume set and volume reset codepath) merged in master by Anand Avati (avati)
CHANGE: http://review.gluster.org/4585 (mgmt/glusterd: Move found-brick logs to DEBUG) merged in master by Vijay Bellur (vbellur)
Krutika, wanted to check if rhs-2.1 branch has all the patches listed above. If not, we would need a backport.
Amar, rhs-2.1 does have all the patches submitted for this bug.
When running volume start I am still seeing in etc-glusterfs-glusterd.vol.log:

[2013-05-07 15:26:47.763213] I [rpc-clnt.c:961:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-05-07 15:26:47.763269] I [socket.c:3482:socket_init] 0-management: SSL support is NOT enabled
[2013-05-07 15:26:47.763281] I [socket.c:3497:socket_init] 0-management: using system polling thread
[2013-05-07 15:26:48.241131] I [rpc-clnt.c:961:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-05-07 15:26:48.241204] I [socket.c:3482:socket_init] 0-management: SSL support is NOT enabled
[2013-05-07 15:26:48.241216] I [socket.c:3497:socket_init] 0-management: using system polling thread
[2013-05-07 15:26:48.241287] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-05-07 15:26:48.254656] I [rpc-clnt.c:961:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-05-07 15:26:48.254767] I [socket.c:3482:socket_init] 0-management: SSL support is NOT enabled
[2013-05-07 15:26:48.254781] I [socket.c:3497:socket_init] 0-management: using system polling thread
[2013-05-07 15:26:48.254861] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now

Could we clean those up as well?
On both volume start and volume stop, the /var/log/glusterfs/bricks/<my brick> log looks much improved. A couple pieces of feedback:

1. Easier-to-decipher messages. When I run gluster volume stop, the only message that gets generated is:

[2013-05-07 15:37:25.337381] W [glusterfsd.c:970:cleanup_and_exit] (-->/lib64/libc.so.6() [0x36ac843b70] (-->/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12) [0x7fb782140c02] (-->/usr/sbin/glusterfsd(glusterfs_handle_terminate+0x15) [0x40a825]))) 0-: received signum (15), shutting down

This is a great improvement with respect to cleaning up messages we don't need to see, but I feel the messages we are displaying could be a bit more informative. What was shut down? If it was a volume, which volume was shut down? Without having just run the command, it would be difficult to look back at this and know that the volume stop command was run. The same goes for volume start:

[2013-05-07 17:18:38.501063] I [glusterfsd.c:1878:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.4.0.4rhs (/usr/sbin/glusterfsd -s 10.13.129.122 --volfile-id repvol.10.13.129.122.brick -p /var/lib/glusterd/vols/repvol/run/10.13.129.122-brick.pid -S /var/run/f6183931da455b0aa8d77208625f95a4.socket --brick-name /brick -l /var/log/glusterfs/bricks/brick.log --xlator-option *-posix.glusterd-uuid=449ceab6-5a85-4cbb-a4cc-f6738bbc85e6 --brick-port 49152 --xlator-option repvol-server.listen-port=49152)

2. While the looks of the brick log are much improved, I am still seeing some noise in nfs.log and etc-glusterfs-glusterd.vol.log. Can we revisit those for volume start/stop?
Comment #13 has what I am seeing for volume start; here is what I am seeing for volume stop:

[2013-05-07 15:37:25.341411] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-05-07 15:37:25.351662] E [run.c:190:runner_log] 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/stop/pre/S40ufo-stop.pyc --volname=repvol --last=yes
[2013-05-07 15:37:25.358633] E [run.c:190:runner_log] 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/stop/pre/S40ufo-stop.pyo --volname=repvol --last=yes
[2013-05-07 15:37:25.358705] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=2236 max=1 total=1
[2013-05-07 15:37:25.358723] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=124 max=1 total=1
[2013-05-07 15:37:25.358747] I [glusterd-utils.c:1182:glusterd_service_stop] 0-management: brick already stopped
[2013-05-07 15:37:25.358792] E [glusterd-utils.c:1477:glusterd_brick_unlink_socket_file] 0-management: Failed to remove /var/run/f6183931da455b0aa8d77208625f95a4.socket error: No such file or directory
[2013-05-07 15:37:25.774854] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=2236 max=0 total=0
[2013-05-07 15:37:25.774895] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2013-05-07 15:37:26.778168] E [glusterd-utils.c:3625:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/b504062ac1189ba0e64032b11e3a9887.socket error: Permission denied
[2013-05-07 15:37:26.778631] I [glusterd-utils.c:3659:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2013-05-07 15:37:26.778876] I [glusterd-utils.c:3664:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2013-05-07 15:37:26.779158] I [glusterd-utils.c:3669:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2013-05-07 15:37:26.779399] I [glusterd-utils.c:3674:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
[2013-05-07 15:37:26.779657] I [glusterd-utils.c:3679:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
[2013-05-07 15:37:26.779751] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=2236 max=0 total=0
[2013-05-07 15:37:26.779771] I [mem-pool.c:541:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2013-05-07 15:37:27.781631] E [glusterd-utils.c:3625:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/ac127e4a6f1335d530a773df4265854f.socket error: No such file or directory
[2013-05-07 15:37:27.781904] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /brick on port 49152
[2013-05-07 15:37:27.782296] W [socket.c:515:__socket_rwv] 0-management: readv on /var/run/b504062ac1189ba0e64032b11e3a9887.socket failed (No data available)
[2013-05-07 15:37:27.782410] W [socket.c:515:__socket_rwv] 0-management: readv on /var/run/ac127e4a6f1335d530a773df4265854f.socket failed (No data available)
I checked volume create/delete and everything looks good there, with the exception of:

[2013-05-07 18:07:38.498637] I [socket.c:3482:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-05-07 18:07:38.498720] I [socket.c:3497:socket_init] 0-glusterfs: using system polling thread

Are those informational messages something we could move to debug as well? It looks like they get logged in cli.log when running most gluster commands. I even just ran gluster volume help and it logged the SSL and polling thread messages.
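To back up the "logged on most commands" observation, the remaining noise can be quantified by tallying entries per severity letter (field 3, after the split date/time stamp). A small sketch with sample lines standing in for cli.log; on a real system the input would come from /var/log/glusterfs/cli.log instead of printf:

```shell
# Count log entries per severity (I/W/E) to show the informational ratio.
printf '%s\n' \
  '[2013-05-07 18:07:38.498637] I [socket.c:3482:socket_init] 0-glusterfs: SSL support is NOT enabled' \
  '[2013-05-07 18:07:38.498720] I [socket.c:3497:socket_init] 0-glusterfs: using system polling thread' \
  '[2013-05-07 15:37:25.351662] E [run.c:190:runner_log] 0-management: Failed to execute script' \
| awk '{count[$3]++} END {for (s in count) print s, count[s]}' | sort
```

For this sample it prints "E 1" and "I 2"; run over a full cli.log it would show how dominant the I-level entries are before and after any fix.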
The code fixes submitted for this BZ have been verified on QA, hence I am marking this BZ as verified. I will open new BZs for anything mentioned above that warrants it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html