Bug 844804 - excessive logging in self-heal daemon should be reduced
Summary: excessive logging in self-heal daemon should be reduced
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: fuse
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Pranith Kumar K
QA Contact: spandura
URL:
Whiteboard:
Duplicates: 837676 (view as bug list)
Depends On: 844695 844758
Blocks:
 
Reported: 2012-07-31 20:29 UTC by Scott Haines
Modified: 2015-01-22 15:29 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 844758
Environment:
Last Closed: 2012-09-11 14:23:59 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2012:1253 0 normal SHIPPED_LIVE Red Hat Storage 2.0 enhancement and bug fix update #2 2012-09-11 18:21:30 UTC

Description Scott Haines 2012-07-31 20:29:25 UTC
+++ This bug was initially created as a clone of Bug #844758 +++

+++ This bug was initially created as a clone of Bug #844695 +++

Created attachment 601512 [details]
log analysing script which parses the log file and prints the number of times each function has logged

Description of problem:

The self-heal daemon logs very heavily, growing its log file to several GB and consuming a large amount of disk space. Logging should be reduced by not repeating the same messages, and by either dropping unimportant messages or moving them to a lower log level such as DEBUG or TRACE.

A script is attached to this bug which analyzes the glusterfs log file passed as an argument and prints each function together with the number of times it has logged.
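
(The attached script itself is not reproduced here. The sketch below illustrates the same idea; it assumes the usual glusterfs log line format "[timestamp] LEVEL [source.c:line:function] xlator: message" and is only an illustration, not the attached script.)

#!/bin/bash
# Illustrative per-function log counter (not the attached script).
# Assumed input format, one entry per line, e.g.:
#   [2012-07-31 10:15:30.123456] E [afr-self-heal-common.c:1234:some_function] 0-vol-replicate-0: message
LOGFILE=$1

count_functions() {
        # Extract the "[file.c:line:function]" token and tally per function.
        grep -o '\[[^][]*\.c:[0-9]*:[^]]*\]' \
            | awk -F: '{ gsub(/\]/, "", $3); count[$3]++ }
                       END { for (f in count) print f ":" count[f] }' \
            | sort
}

# All log levels.
count_functions < "$LOGFILE"

echo "========= Error Functions ========"
# Only lines logged at E (error) or W (warning) level.
grep -E '\] [EW] \[' "$LOGFILE" | count_functions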

This is the output of the log analyzer script run on a self-heal daemon log file which had grown to 15-20 MB.

afr_detect_self_heal_by_iatt:7353
afr_detect_self_heal_by_lookup_status:243
afr_dir_exclusive_crawl:20
afr_launch_self_heal:19993
afr_notify:15
afr_self_heal_completion_cbk:19993
afr_sh_common_lookup_resp_handler:5745
afr_sh_data_open_cbk:240
afr_sh_entry_expunge_entry_cbk:1956
afr_sh_entry_fix:1
afr_sh_metadata_post_nonblocking_inodelk_cbk:486
afr_sh_post_nonblocking_entry_cbk:3
cleanup_and_exit:2
client3_1_entrylk_cbk:3
client3_1_getxattr_cbk:3
client3_1_inodelk_cbk:243
client3_1_open_cbk:240
client3_1_unlink_cbk:4
client_ping_cbk:3
client_rpc_notify:42
client_set_lk_version_cbk:54
client_setvolume_cbk:108
_crawl_proceed:12
gf_add_cmdline_options:12
main:2
mgmt_cbk_spec:4
mgmt_getspec_cbk:5
notify:81
reincarnate:1
_remove_stale_index:9
rpc_client_ping_timer_expired:3
rpc_clnt_reconfig:24
saved_frames_unwind:6
select_server_supported_programs:54
sh_loop_driver_done:18008
socket_connect_finish:24
__socket_keepalive:2
__socket_proto_state_machine:41
__socket_rwv:18
socket_server_event_handler:2
========= Error Functions ========
afr_notify:3
afr_self_heal_completion_cbk:243
afr_sh_common_lookup_resp_handler:5745
afr_sh_data_open_cbk:240
afr_sh_metadata_post_nonblocking_inodelk_cbk:486
afr_sh_post_nonblocking_entry_cbk:3
cleanup_and_exit:2
client3_1_entrylk_cbk:3
client3_1_getxattr_cbk:3
client3_1_inodelk_cbk:243
client3_1_open_cbk:240
client3_1_unlink_cbk:4
client_ping_cbk:3
_crawl_proceed:12
_remove_stale_index:4
saved_frames_unwind:6
socket_connect_finish:24
__socket_keepalive:2
__socket_proto_state_machine:41
__socket_rwv:18
socket_server_event_handler:2


"Error function"  section lists also functions which have logged in WARNING level.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 2 Vijay Bellur 2012-08-17 09:22:11 UTC
CHANGE: http://review.gluster.com/3827 (afr: Avoid excessive logging in self-heal.) merged in release-3.3 by Vijay Bellur (vbellur)

Comment 3 Vijay Bellur 2012-08-24 04:42:29 UTC
CHANGE: http://review.gluster.org/3805 (afr: Avoid excessive logging in self-heal.) merged in master by Anand Avati (avati)

Comment 4 spandura 2012-08-29 12:28:34 UTC
Verified the fix on glusterfs 3.3.0rhs built on Aug 17 2012 07:06:59. 

The bug is fixed. 

Steps executed:-
----------------

"create_dirs_files.sh"
---------------------
#!/bin/bash

# Build a nested directory tree (10 x 10 directories) populated with files
# of varying sizes so that self-heal has plenty of entries to heal.
mkdir testdir
cd testdir || exit 1
for i in `seq 1 10`; do
        mkdir dir.$i
        for j in `seq 1 10`; do
                mkdir dir.$i/dir.$j
                for k in `seq 1 20`; do
                        dd if=/dev/urandom of=dir.$i/dir.$j/file.$k bs=1K count=$k
                done
                dd if=/dev/urandom of=dir.$i/file.$j bs=1K count=$j
        done
done

1. Create a replicate volume with replica count 3 (brick1, brick2, brick3). 
2. Start the volume.
3. Bring down 2 bricks. (brick2 and brick3)
4. Create a fuse mount.
5. Run the script "create_dirs_files.sh" from the mountpoint. 
6. After the script has successfully executed, bring back the bricks brick2 and brick3.
7. Perform "gluster volume heal <vol_name> full" from one of the storage nodes.
8. Once the self-heal is complete, run the log_analyzer.sh script with the glustershd log file as the argument.

Note:- Self-heal of the files is considered complete when the directory <path_to_brick1>/.glusterfs/indices/xattrop/ on brick1 is empty. An illustrative command sketch of steps 1-7 is given below.
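
(The exact commands were not recorded in this comment; the following is only an illustrative sketch of steps 1-7. The hostnames, brick paths and volume name -- server1..server3, /bricks/brickN, testvol -- are placeholders, not taken from the original report.)

#!/bin/bash
# Illustrative sketch only; hostnames, brick paths and the volume name
# are placeholders.

# 1-2. Create and start a 1x3 replicate volume.
gluster volume create testvol replica 3 \
        server1:/bricks/brick1 server2:/bricks/brick2 server3:/bricks/brick3
gluster volume start testvol

# 3. Bring down brick2 and brick3, e.g. by killing the glusterfsd brick
#    processes on server2 and server3.

# 4-5. Create a fuse mount and run the workload script from it.
mkdir -p /mnt/testvol
mount -t glusterfs server1:/testvol /mnt/testvol
cd /mnt/testvol && bash /root/create_dirs_files.sh

# 6. Bring brick2 and brick3 back ("start force" respawns bricks that
#    are down).
gluster volume start testvol force

# 7. Trigger a full self-heal from one of the storage nodes.
gluster volume heal testvol full

# Per the note above, self-heal is considered complete once the xattrop
# index directory on brick1 is empty.
while [ -n "$(ls -A /bricks/brick1/.glusterfs/indices/xattrop 2>/dev/null)" ]; do
        sleep 10
done
echo "self-heal appears complete"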


Output of running log_analyzer.sh on glustershd log file on 3.3.0rhs-26.el6rhs:
-------------------------------------------------------------------------------

Brick1:
~~~~~~~
[root@harv ~]# ./log_analyzer.sh /var/log/glusterfs/glustershd.log
afr_notify:2
cleanup_and_exit:1
client_query_portmap_cbk:2
client_rpc_notify:4
client_set_lk_version_cbk:6
client_setvolume_cbk:12
_crawl_proceed:20
gf_add_cmdline_options:2
main:2
notify:12
rpc_clnt_reconfig:6
select_server_supported_programs:6
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2
========= Error Functions ========
cleanup_and_exit:1
client_query_portmap_cbk:2
_crawl_proceed:20
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2


Brick2:-
~~~~~~~~~
[root@hicks ~]# ./log_analyzer.sh /var/log/glusterfs/glustershd.log
afr_notify:2
afr_start_crawl:4
cleanup_and_exit:1
client_rpc_notify:2
client_set_lk_version_cbk:6
client_setvolume_cbk:12
_crawl_proceed:19
gf_add_cmdline_options:2
main:2
notify:10
rpc_clnt_reconfig:6
select_server_supported_programs:6
socket_connect_finish:2
socket_event_handler:1
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2
========= Error Functions ========
afr_start_crawl:4
cleanup_and_exit:1
_crawl_proceed:19
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2


Brick3:-
~~~~~~~~~
[root@luigi ~]#  ./log_analyzer.sh /var/log/glusterfs/glustershd.log
afr_notify:2
afr_start_crawl:4
cleanup_and_exit:1
client_query_portmap_cbk:1
client_rpc_notify:3
client_set_lk_version_cbk:6
client_setvolume_cbk:12
_crawl_proceed:19
gf_add_cmdline_options:2
main:2
notify:11
rpc_clnt_reconfig:6
select_server_supported_programs:6
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2
========= Error Functions ========
afr_start_crawl:4
cleanup_and_exit:1
client_query_portmap_cbk:1
_crawl_proceed:19
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2


Output of running log_analyzer.sh on glustershd log file on 3.3.0rhs-25.el6rhs:
-------------------------------------------------------------------------------

Brick1:-
~~~~~~~~
[root@harv ~]# ./log_analyzer.sh /var/log/glusterfs/glustershd.log
afr_detect_self_heal_by_lookup_status:1349
afr_launch_self_heal:2212
afr_notify:2
afr_self_heal_completion_cbk:2212
afr_sh_common_lookup_resp_handler:1
afr_sh_data_open_cbk:2100
afr_sh_metadata_post_nonblocking_inodelk_cbk:2698
afr_sh_post_nonblocking_entry_cbk:111
cleanup_and_exit:1
client3_1_entrylk_cbk:63
client3_1_inodelk_cbk:1349
client3_1_open_cbk:1287
client_query_portmap_cbk:3
client_rpc_notify:5
client_set_lk_version_cbk:6
client_setvolume_cbk:12
_crawl_proceed:27
gf_add_cmdline_options:2
main:2
notify:13
rpc_clnt_reconfig:6
select_server_supported_programs:6
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2
========= Error Functions ========
afr_self_heal_completion_cbk:2211
afr_sh_common_lookup_resp_handler:1
afr_sh_data_open_cbk:2100
afr_sh_metadata_post_nonblocking_inodelk_cbk:2698
afr_sh_post_nonblocking_entry_cbk:111
cleanup_and_exit:1
client3_1_entrylk_cbk:63
client3_1_inodelk_cbk:1349
client3_1_open_cbk:1287
client_query_portmap_cbk:3
_crawl_proceed:27
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2

Brick2:-
~~~~~~~~
[root@hicks ~]# ./log_analyzer.sh /var/log/glusterfs/glustershd.log
afr_notify:2
afr_start_crawl:2
cleanup_and_exit:1
client_rpc_notify:2
client_set_lk_version_cbk:6
client_setvolume_cbk:12
_crawl_proceed:26
gf_add_cmdline_options:2
main:2
notify:10
rpc_clnt_reconfig:6
select_server_supported_programs:6
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2
========= Error Functions ========
afr_start_crawl:2
cleanup_and_exit:1
_crawl_proceed:26
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2


Brick3:-
~~~~~~~~
[root@luigi ~]# ./log_analyzer.sh /var/log/glusterfs/glustershd.log
afr_detect_self_heal_by_iatt:3060
afr_launch_self_heal:2212
afr_notify:2
afr_self_heal_completion_cbk:2212
afr_sh_common_lookup_resp_handler:4421
afr_start_crawl:2
cleanup_and_exit:1
client_query_portmap_cbk:2
client_rpc_notify:4
client_set_lk_version_cbk:6
client_setvolume_cbk:12
_crawl_proceed:26
gf_add_cmdline_options:2
main:2
notify:12
rpc_clnt_reconfig:6
select_server_supported_programs:6
sh_loop_driver_done:2100
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2
========= Error Functions ========
afr_sh_common_lookup_resp_handler:4421
afr_start_crawl:2
cleanup_and_exit:1
client_query_portmap_cbk:2
_crawl_proceed:26
socket_connect_finish:2
__socket_keepalive:2
__socket_proto_state_machine:2
socket_server_event_handler:2


Comparing the log_analyzer output for 3.3.0rhs-26.el6rhs with that for 3.3.0rhs-25.el6rhs shows the large reduction in logged messages; functions such as afr_launch_self_heal and afr_self_heal_completion_cbk, which logged thousands of times on 3.3.0rhs-25.el6rhs, no longer appear in the 3.3.0rhs-26.el6rhs output.
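
(For a quick side-by-side view, assuming the two analyzer outputs have been saved to files; the file names below are placeholders.)

# shd-25.txt / shd-26.txt are hypothetical file names for the saved
# analyzer outputs of 3.3.0rhs-25.el6rhs and 3.3.0rhs-26.el6rhs.
diff <(sort shd-25.txt) <(sort shd-26.txt)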

Comment 6 errata-xmlrpc 2012-09-11 14:23:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-1253.html

Comment 7 vsomyaju 2012-09-24 10:24:00 UTC
*** Bug 837676 has been marked as a duplicate of this bug. ***

