Bug 858197

Summary: It's not possible to change loglevels at runtime for TOTEM subsystem
Product: [Fedora] Fedora Reporter: Jan Friesse <jfriesse>
Component: libqbAssignee: Angus Salkeld <asalkeld>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: high    
Version: rawhideCC: abeekhof, asalkeld, dvossel, fdinitto, sdake
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-11-16 07:39:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jan Friesse 2012-09-18 09:59:51 UTC
Description of problem:
See "steps to reproduce"

Version-Release number of selected component (if applicable):
git master (both corosync and libqb)

How reproducible:
99.7%

Steps to Reproduce:
1. set debug:off in corosync.conf
2. corosync -f
3. corosync-cmapctl -s logging.debug str on
4. start corosync -f on other node


Actual results:

NO debugging information (messages in debug level) on stderr nor syslog from TOTEM subsystem. Other subsystems (like cpg, libqb, ...) seems to handle change.


Expected results:
It's possible to change loglevel for TOTEM.

Additional info:
It can (of course) be bug in corosync, but filling to libqb, because after 10+ hours, I was unable to find out, HOW libqb log works).

Comment 1 Fabio Massimo Di Nitto 2012-09-19 12:04:22 UTC
using both libqb and corosync master branch from git as of today.

        logger_subsys {
                subsys: TOTEM
                debug: off
        }

corosync-cmapctl -s logging.logger_subsys.TOTEM.debug str trace

It appears that there is memory corruption and list entries duplication in call site handling.

libtotem does not link with libqb nor uses libqb directly.

corosync/exec/main.c defines _logsys_log_printf as:

static void
_logsys_log_printf(int level, int subsys,
                const char *function_name,
                const char *file_name,
                int file_line,
                const char *format, ...)
{
        va_list ap;

        va_start(ap, format);
        qb_log_from_external_source_va(function_name, file_name,
                                    format, level, file_line,
                                    subsys, ap);
        va_end(ap);
}

libtotem (and all parts of totem) are passed that function as logging function.
Including other bits such as subsys.

Every call libtotem does to log_printf is an alias to totem_log_printf that jumps back to qb_log_from_external_source_va.
Also to be noted that TOTEM "filename" for filtering is defined as totem.

All other subsystems are using qb_log directly. This is the first and most important difference.

qb_log_from_external_source_va -> calls into qb_log_callsite_get that
calls into qb_log_dcs_get.

For some reasons, that I can´t really figure out yet, qb_log_dcs_get, does allocated different callsite for the same file and I suspect that there is a memory leak and memory corruption. I don´t exclude that what we do in corosync is wrong, but....

diff --git a/exec/main.c b/exec/main.c
index c21f47f..140845f 100644
--- a/exec/main.c
+++ b/exec/main.c
@@ -827,11 +827,17 @@ _logsys_log_printf(int level, int subsys,
                const char *format, ...)
 {
        va_list ap;
+       struct qb_log_callsite *cs = qb_log_callsite_get(function_name,
+                                                        file_name,
+                                                        format,
+                                                        level,
+                                                        file_line,
+                                                        0);
+
+       fprintf(stderr, "DEBUG: callsite: %p %s\n", cs, file_name);
 
        va_start(ap, format);
-       qb_log_from_external_source_va(function_name, file_name,
-                                   format, level, file_line,
-                                   subsys, ap);
+       qb_log_real_va_(cs, ap);
        va_end(ap);
 }

changing _logsys_log_printf as in this patch shows:
(short snippet)
Sep 19 13:49:26 trace   [TOTEM ] Received ringid(192.168.2.97:304) seq 1
DEBUG: callsite: 0x7fec263c3320 /home/fabbione/work/cluster/corosync/corosync/exec/totemsrp.c
Sep 19 13:49:26 trace   [YKD   ] Received ringid(192.168.2.97:304) seq 2
DEBUG: callsite: 0x7fec263be110 /home/fabbione/work/cluster/corosync/corosync/exec/totemsrp.c
DEBUG: callsite: 0x7fec263be138 /home/fabbione/work/cluster/corosync/corosync/exec/totemsrp.c

that doesn´t look correct at all. subsystem has become YKD from TOTEM (note i don´t load anything in corosync.conf) after changing debug=off to debug=trace.

and it´s odd to see that new cs are allocated (this is just a short summary of a full run).

Adding debugging log to libqb somehow confirms that new_dcs are allocated from time to time.

Comment 3 Fedora Update System 2012-10-29 02:22:12 UTC
libqb-0.14.3-2.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/libqb-0.14.3-2.fc17

Comment 4 Fedora Update System 2012-10-29 02:22:24 UTC
libqb-0.14.3-2.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/libqb-0.14.3-2.fc18

Comment 5 Fedora Update System 2012-10-29 18:16:09 UTC
Package libqb-0.14.3-2.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libqb-0.14.3-2.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-17170/libqb-0.14.3-2.fc18
then log in and leave karma (feedback).

Comment 6 Fedora Update System 2012-11-16 07:39:47 UTC
libqb-0.14.3-2.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.