+++ This bug was initially created as a clone of Bug #844758 +++
+++ This bug was initially created as a clone of Bug #844695 +++

Created attachment 601512 [details]
log analysing script which parses the log file and prints the number of times each function has logged

Description of problem:
The self-heal daemon logs so heavily that its log file grows to several GBs, consuming a large amount of disk space. Logging should be minimized by suppressing repeated messages and by either dropping unimportant messages or demoting them to a lower log level such as DEBUG or TRACE.

A script is attached to this bug which analyses the glusterfs log file given as an argument and prints each function together with the number of times it has logged.

This is the output of the log analyzer script run on a self-heal-daemon log file which had grown to 15-20 MB:

afr_detect_self_heal_by_iatt:7353
afr_detect_self_heal_by_lookup_status:243
afr_dir_exclusive_crawl:20
afr_launch_self_heal:19993
afr_notify:15
afr_self_heal_completion_cbk:19993
afr_sh_common_lookup_resp_handler:5745
afr_sh_data_open_cbk:240
afr_sh_entry_expunge_entry_cbk:1956
afr_sh_entry_fix:1
afr_sh_metadata_post_nonblocking_inodelk_cbk:486
afr_sh_post_nonblocking_entry_cbk:3
cleanup_and_exit:2
client3_1_entrylk_cbk:3
client3_1_getxattr_cbk:3
client3_1_inodelk_cbk:243
client3_1_open_cbk:240
client3_1_unlink_cbk:4
client_ping_cbk:3
client_rpc_notify:42
client_set_lk_version_cbk:54
client_setvolume_cbk:108
_crawl_proceed:12
gf_add_cmdline_options:12
main:2
mgmt_cbk_spec:4
mgmt_getspec_cbk:5
notify:81
reincarnate:1
_remove_stale_index:9
rpc_client_ping_timer_expired:3
rpc_clnt_reconfig:24
saved_frames_unwind:6
select_server_supported_programs:54
sh_loop_driver_done:18008
socket_connect_finish:24
__socket_keepalive:2
__socket_proto_state_machine:41
__socket_rwv:18
socket_server_event_handler:2

========= Error Functions ========
afr_notify:3
afr_self_heal_completion_cbk:243
afr_sh_common_lookup_resp_handler:5745
afr_sh_data_open_cbk:240
afr_sh_metadata_post_nonblocking_inodelk_cbk:486
afr_sh_post_nonblocking_entry_cbk:3
cleanup_and_exit:2
client3_1_entrylk_cbk:3
client3_1_getxattr_cbk:3
client3_1_inodelk_cbk:243
client3_1_open_cbk:240
client3_1_unlink_cbk:4
client_ping_cbk:3
_crawl_proceed:12
_remove_stale_index:4
saved_frames_unwind:6
socket_connect_finish:24
__socket_keepalive:2
__socket_proto_state_machine:41
__socket_rwv:18
socket_server_event_handler:2

Note: the "Error Functions" section also lists functions which have logged at WARNING level.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
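The attachment itself is not reproduced in the bug text, but its behaviour can be sketched roughly as follows. This is a minimal reconstruction, assuming the standard glusterfs log line format `[timestamp] LEVEL [file.c:line:function] subvol: message`; the helper names `count_per_function` and `count_error_functions` are placeholders, not the attachment's actual code:

```shell
#!/bin/bash
# Sketch of a log analyser in the spirit of the attached script.
# A glusterfs log line looks like:
#   [2012-08-04 10:00:00.1] E [afr-self-heal.c:200:afr_sh_data_open_cbk] 0-vol: msg
# Splitting on '[' and ']' puts the log level in field 3 and the
# "file:line:function" source tag in field 4.

# count_per_function LOGFILE - print "function:count", sorted by name.
count_per_function() {
    awk -F'[][]' '{ n = split($4, src, ":"); if (n == 3) count[src[3]]++ }
                  END { for (f in count) print f ":" count[f] }' "$1" | sort
}

# count_error_functions LOGFILE - same, restricted to E (error) and
# W (warning) lines, matching the bug's note that the "Error Functions"
# section also includes WARNING-level logs.
count_error_functions() {
    awk -F'[][]' '$3 ~ /[EW]/ { n = split($4, src, ":"); if (n == 3) count[src[3]]++ }
                  END { for (f in count) print f ":" count[f] }' "$1" | sort
}
```

Pointing `count_per_function` at a glustershd.log produces listings of the same `function:count` shape as the output quoted above.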
CHANGE: http://review.gluster.com/3827 (afr: Avoid excessive logging in self-heal.) merged in release-3.3 by Vijay Bellur (vbellur)
CHANGE: http://review.gluster.org/3805 (afr: Avoid excessive logging in self-heal.) merged in master by Anand Avati (avati)
Verified the fix on glusterfs 3.3.0rhs built on Aug 17 2012 07:06:59. The bug is fixed.

Steps executed:
----------------

"create_dirs_files.sh"
---------------------
#!/bin/bash
mkdir testdir
cd testdir
for i in `seq 1 10`; do
    mkdir dir.$i
    for j in `seq 1 10`; do
        mkdir dir.$i/dir.$j
        for k in `seq 1 20`; do
            dd if=/dev/urandom of=dir.$i/dir.$j/file.$k bs=1K count=$k
        done
        dd if=/dev/urandom of=dir.$i/file.$j bs=1K count=$j
    done
done

1. Create a replicate volume with replica count 3 (brick1, brick2, brick3).
2. Start the volume.
3. Bring down 2 bricks (brick2 and brick3).
4. Create a FUSE mount.
5. Run the script "create_dirs_files.sh" from the mountpoint.
6. After the script has executed successfully, bring back bricks brick2 and brick3.
7. Perform "gluster volume heal <vol_name> full" from one of the storage nodes.
8. Once the self-heal is complete, run the log_analyzer.sh script with the glustershd log file as argument.

Note: the self-heal of files is considered complete when the directory <path_to_brick1>/.glusterfs/indices/xattrop/ on brick1 is empty.
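The numbered steps above can be sketched as gluster CLI commands. This is a non-authoritative outline, not the exact commands run during verification; the volume name, hostnames, brick paths, and mountpoint below are made-up placeholders:

```shell
#!/bin/bash
# Sketch of verification steps 1-8; all names are hypothetical.
VOL=repl3vol

# 1-2. Create and start a replica-3 volume.
gluster volume create $VOL replica 3 \
    server1:/bricks/brick1 server2:/bricks/brick2 server3:/bricks/brick3
gluster volume start $VOL

# 3. Bring down brick2 and brick3 by killing their glusterfsd processes
#    on the respective nodes, e.g.:
#      pkill -f '/bricks/brick2'   (on server2)
#      pkill -f '/bricks/brick3'   (on server3)

# 4-5. Mount the volume over FUSE and run the workload script.
mount -t glusterfs server1:/$VOL /mnt/$VOL
(cd /mnt/$VOL && ./create_dirs_files.sh)

# 6. Restart the downed bricks.
gluster volume start $VOL force

# 7. Trigger a full self-heal.
gluster volume heal $VOL full

# 8. Wait until the xattrop index on brick1 is empty (the completion
#    criterion from the note above), then analyse the glustershd log.
while [ -n "$(ls -A /bricks/brick1/.glusterfs/indices/xattrop 2>/dev/null)" ]; do
    sleep 10
done
./log_analyzer.sh /var/log/glusterfs/glustershd.log
```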
Output of running log_analyzer.sh on the glustershd log file on 3.3.0rhs-26.el6rhs:
-------------------------------------------------------------------------------

Brick1:
~~~~~~~
[root@harv ~]# ./log_analyzer.sh /var/log/glusterfs/glustershd.log
afr_notify:2
cleanup_and_exit:1
client_query_portmap_cbk:2
client_rpc_notify:4
client_set_lk_version_cbk:6
client_setvolume_cbk:12
_crawl_proceed:20
gf_add_cmdline_options:2
main:2
notify:12
rpc_clnt_reconfig:6
select_server_supported_programs:6
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2

========= Error Functions ========
cleanup_and_exit:1
client_query_portmap_cbk:2
_crawl_proceed:20
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2

Brick2:
~~~~~~~
[root@hicks ~]# ./log_analyzer.sh /var/log/glusterfs/glustershd.log
afr_notify:2
afr_start_crawl:4
cleanup_and_exit:1
client_rpc_notify:2
client_set_lk_version_cbk:6
client_setvolume_cbk:12
_crawl_proceed:19
gf_add_cmdline_options:2
main:2
notify:10
rpc_clnt_reconfig:6
select_server_supported_programs:6
socket_connect_finish:2
socket_event_handler:1
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2

========= Error Functions ========
afr_start_crawl:4
cleanup_and_exit:1
_crawl_proceed:19
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2

Brick3:
~~~~~~~
[root@luigi ~]# ./log_analyzer.sh /var/log/glusterfs/glustershd.log
afr_notify:2
afr_start_crawl:4
cleanup_and_exit:1
client_query_portmap_cbk:1
client_rpc_notify:3
client_set_lk_version_cbk:6
client_setvolume_cbk:12
_crawl_proceed:19
gf_add_cmdline_options:2
main:2
notify:11
rpc_clnt_reconfig:6
select_server_supported_programs:6
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2

========= Error Functions ========
afr_start_crawl:4
cleanup_and_exit:1
client_query_portmap_cbk:1
_crawl_proceed:19
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2

Output of running log_analyzer.sh on the glustershd log file on 3.3.0rhs-25.el6rhs:
-------------------------------------------------------------------------------

Brick1:
~~~~~~~
[root@harv ~]# ./log_analyzer.sh /var/log/glusterfs/glustershd.log
afr_detect_self_heal_by_lookup_status:1349
afr_launch_self_heal:2212
afr_notify:2
afr_self_heal_completion_cbk:2212
afr_sh_common_lookup_resp_handler:1
afr_sh_data_open_cbk:2100
afr_sh_metadata_post_nonblocking_inodelk_cbk:2698
afr_sh_post_nonblocking_entry_cbk:111
cleanup_and_exit:1
client3_1_entrylk_cbk:63
client3_1_inodelk_cbk:1349
client3_1_open_cbk:1287
client_query_portmap_cbk:3
client_rpc_notify:5
client_set_lk_version_cbk:6
client_setvolume_cbk:12
_crawl_proceed:27
gf_add_cmdline_options:2
main:2
notify:13
rpc_clnt_reconfig:6
select_server_supported_programs:6
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2

========= Error Functions ========
afr_self_heal_completion_cbk:2211
afr_sh_common_lookup_resp_handler:1
afr_sh_data_open_cbk:2100
afr_sh_metadata_post_nonblocking_inodelk_cbk:2698
afr_sh_post_nonblocking_entry_cbk:111
cleanup_and_exit:1
client3_1_entrylk_cbk:63
client3_1_inodelk_cbk:1349
client3_1_open_cbk:1287
client_query_portmap_cbk:3
_crawl_proceed:27
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2

Brick2:
~~~~~~~
[root@hicks ~]# ./log_analyzer.sh /var/log/glusterfs/glustershd.log
afr_notify:2
afr_start_crawl:2
cleanup_and_exit:1
client_rpc_notify:2
client_set_lk_version_cbk:6
client_setvolume_cbk:12
_crawl_proceed:26
gf_add_cmdline_options:2
main:2
notify:10
rpc_clnt_reconfig:6
select_server_supported_programs:6
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2

========= Error Functions ========
afr_start_crawl:2
cleanup_and_exit:1
_crawl_proceed:26
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2

Brick3:
~~~~~~~
[root@luigi ~]# ./log_analyzer.sh /var/log/glusterfs/glustershd.log
afr_detect_self_heal_by_iatt:3060
afr_launch_self_heal:2212
afr_notify:2
afr_self_heal_completion_cbk:2212
afr_sh_common_lookup_resp_handler:4421
afr_start_crawl:2
cleanup_and_exit:1
client_query_portmap_cbk:2
client_rpc_notify:4
client_set_lk_version_cbk:6
client_setvolume_cbk:12
_crawl_proceed:26
gf_add_cmdline_options:2
main:2
notify:12
rpc_clnt_reconfig:6
select_server_supported_programs:6
sh_loop_driver_done:2100
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2

========= Error Functions ========
afr_sh_common_lookup_resp_handler:4421
afr_start_crawl:2
cleanup_and_exit:1
client_query_portmap_cbk:2
_crawl_proceed:26
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2

Compare the output of log_analyzer for 3.3.0rhs-26.el6rhs with that for 3.3.0rhs-25.el6rhs: the self-heal functions that logged thousands of times on -25 no longer flood the log on -26.
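Comparing the two builds' `function:count` listings by eye is tedious; a small helper can line them up side by side. The function name `compare_counts` is a hypothetical helper introduced here for illustration, not part of the attached script:

```shell
#!/bin/bash
# compare_counts OLD NEW - join two "function:count" listings (one entry
# per line, as printed by the analyser) into "function:old:new" rows,
# printing "-" where a function appears in only one listing.
compare_counts() {
    join -t: -a1 -a2 -e '-' -o 0,1.2,2.2 <(sort "$1") <(sort "$2")
}
```

Running it on the saved -25 and -26 glustershd listings would show, for example, `afr_launch_self_heal:2212:-`, i.e. a function that logged 2212 times before the fix and not at all after.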
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-1253.html
*** Bug 837676 has been marked as a duplicate of this bug. ***