Description of problem:
Customer upgraded rsyslog and rsyslog-elasticsearch to rsyslog-8.24.0-34.el7.x86_64 and rsyslog-elasticsearch-8.24.0-34.el7.x86_64, after which rsyslog segfaults on start. This appears to be due to a NULL pointer dereference in the rsyslog-elasticsearch (omelasticsearch) module that has already been fixed upstream.
Version-Release number of selected component (if applicable):
rsyslog-8.24.0-34.el7.x86_64
rsyslog-elasticsearch-8.24.0-34.el7.x86_64
How reproducible:
I do not have an Elasticsearch environment, but the customer can reproduce this every time they run rsyslog.
Steps to Reproduce:
1. Upgrade rsyslog and rsyslog-elasticsearch to version 8.24.0-34.
2. Enable the rsyslog module with 'module(load="omelasticsearch")' and forward logs to an Elasticsearch server (see the minimal config sketch after these steps).
3. Start rsyslog; it segfaults.
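For context, a minimal bulk-mode configuration along these lines should exercise the affected code path. The hostname, port, and index name below are illustrative assumptions, not the customer's actual settings:

# Minimal sketch, not the customer's real config; server and index name are made up.
module(load="omelasticsearch")
action(type="omelasticsearch"
       server="elasticsearch.example.com"
       serverport="9200"
       searchIndex="test_index"
       bulkmode="on")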
Actual results:
rsyslog segfaults on start.
Expected results:
rsyslog doesn't segfault.
Additional info:
(gdb) bt
#0 0x00007f920b1d4901 in __memcmp_ssse3 () at ../sysdeps/x86_64/multiarch/memcmp-ssse3.S:1949
#1 0x00007f920b851eb0 in fjson_tokener_parse_ex (tok=tok@entry=0x7f91f804f360, str=str@entry=0x0, len=len@entry=-1)
at json_tokener.c:250
#2 0x00007f920b853d57 in fjson_tokener_parse_verbose (str=0x0, error=error@entry=0x7f9203318b84) <<<<---str is null
at json_tokener.c:174
#3 0x00007f920b853dbe in fjson_tokener_parse (str=<optimized out>) at json_tokener.c:162
#4 0x00007f9209a1afd5 in checkResult (
reqmsg=0x7f91f80478d0 "{\"index\":{\"_index\": \"hunx_log_sys_write\",\"_type\":\"item\"}}\n{\"@timestamp\":\"2019-02-11T03:23:17+01:00\",\"timegenerated\":\"2019-02-28T10:37:37.991659+01:00\",\"message\":\" patrol : TTY=unknown ; PWD=/home/pa"..., pWrkrData=0x7f91f8004350) at omelasticsearch.c:1375
#5 curlPost (pWrkrData=pWrkrData@entry=0x7f91f8004350,
message=message@entry=0x7f91f80478d0 "{\"index\":{\"_index\": \"hunx_log_sys_write\",\"_type\":\"item\"}}\n{\"@timestamp\":\"2019-02-11T03:23:17+01:00\",\"timegenerated\":\"2019-02-28T10:37:37.991659+01:00\",\"message\":\" patrol : TTY=unknown ; PWD=/home/pa"..., msglen=2825, tpls=tpls@entry=0x0, nmsgs=8) at omelasticsearch.c:1449
#6 0x00007f9209a1b313 in submitBatch (pWrkrData=0x7f91f8004350) at omelasticsearch.c:1465
#7 0x0000562f6d426556 in actionTryCommit (pWti=0x562f6db41640, pThis=0x562f6dae8720) at ../action.c:1246
#8 actionCommit (pThis=pThis@entry=0x562f6dae8720, pWti=pWti@entry=0x562f6db41640) at ../action.c:1332
#9 0x0000562f6d42771b in processBatchMain (pVoid=0x562f6dae8720, pBatch=0x562f6db41670, pWti=0x562f6db41640)
at ../action.c:1449
#10 0x0000562f6d41be9f in ConsumerReg (pThis=0x562f6db41150, pWti=0x562f6db41640) at queue.c:2008
#11 0x0000562f6d41891e in wtiWorker (pThis=pThis@entry=0x562f6db41640) at wti.c:365
#12 0x0000562f6d417a8d in wtpWorker (arg=0x562f6db41640) at wtp.c:410
#13 0x00007f920c06fdd5 in start_thread (arg=0x7f9203319700) at pthread_create.c:308
#14 0x00007f920b163ead in __libc_ifunc_impl_list (name=<optimized out>, array=0x7f9203319700, max=<optimized out>)
at ../sysdeps/x86_64/multiarch/ifunc-impl-list.c:154
#15 0x0000000000000000 in ?? ()
(gdb) frame 5
#5 curlPost (pWrkrData=pWrkrData@entry=0x7f91f8004350,
message=message@entry=0x7f91f80478d0 "{\"index\":{\"_index\": \"hunx_log_sys_write\",\"_type\":\"item\"}}\n{\"@timestamp\":\"2019-02-11T03:23:17+01:00\",\"timegenerated\":\"2019-02-28T10:37:37.991659+01:00\",\"message\":\" patrol : TTY=unknown ; PWD=/home/pa"..., msglen=2825, tpls=tpls@entry=0x0, nmsgs=8) at omelasticsearch.c:1449
1449 CHKiRet(checkResult(pWrkrData, message));
(gdb) p pWrkrData
$1 = (wrkrInstanceData_t *) 0x7f91f8004350
(gdb) p pWrkrData.reply
$2 = 0x0 <<<<---null
1413 static rsRetVal
1414 curlPost(wrkrInstanceData_t *pWrkrData, uchar *message, int msglen, uchar **tpls, int nmsgs)
1415 {
1416         CURLcode code;
1417         CURL *curl = pWrkrData->curlPostHandle;
1418         DEFiRet;
1419
1420         pWrkrData->reply = NULL;
1421         pWrkrData->replyLen = 0;
1422
1423         CHKiRet(checkConn(pWrkrData));
1424         CHKiRet(setPostURL(pWrkrData, pWrkrData->pData, tpls));
1425
1426         curl_easy_setopt(curl, CURLOPT_WRITEDATA, pWrkrData);
1427         curl_easy_setopt(curl, CURLOPT_POSTFIELDS, (char *)message);
1428         curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, msglen);
1429         code = curl_easy_perform(curl);
1430         if ( code == CURLE_COULDNT_RESOLVE_HOST
1431           || code == CURLE_COULDNT_RESOLVE_PROXY
1432           || code == CURLE_COULDNT_CONNECT
1433           || code == CURLE_WRITE_ERROR
1434         ) {
1435                 STATSCOUNTER_INC(indexHTTPReqFail, mutIndexHTTPReqFail);
1436                 indexHTTPFail += nmsgs;
1437                 DBGPRINTF("omelasticsearch: we are suspending ourselfs due "
1438                           "to failure %lld of curl_easy_perform()\n",
1439                           (long long) code);
1440                 ABORT_FINALIZE(RS_RET_SUSPENDED);
1441         }
1442
1443         DBGPRINTF("omelasticsearch: pWrkrData replyLen = '%d'\n", pWrkrData->replyLen);
1444         if(pWrkrData->replyLen > 0) {
1445                 pWrkrData->reply[pWrkrData->replyLen] = '\0'; /* Append 0 Byte if replyLen is above 0 - byte has been reserved in malloc */
1446         }
1447         DBGPRINTF("omelasticsearch: pWrkrData reply: '%s'\n", pWrkrData->reply);
1448
1449         CHKiRet(checkResult(pWrkrData, message)); <<<<---pWrkrData->reply isn't ever checked to see if it's null
It looks like this issue may have been introduced by the patch from bz https://bugzilla.redhat.com/show_bug.cgi?id=1565214, which replaced cJSON with libfastjson. I also found an upstream issue with a similar backtrace: https://github.com/rsyslog/rsyslog/issues/1885. The fix appears to be this patch: https://github.com/rsyslog/rsyslog/commit/3d2740dd758aad86e857d9e7730bb8db2a23eeb8. I'm not 100% sure the regression came from the JSON bz mentioned earlier, but it's possible that cJSON handled the NULL pointer in a way that did not cause a segfault in previous versions.
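For illustration only, the kind of guard that would avoid handing a NULL reply to the JSON tokenizer might look roughly like the sketch below. It reuses the macros already visible in curlPost above; this is an assumption about the shape of the fix, not the verbatim upstream patch:

        /* Sketch only: bail out before checkResult() parses a NULL reply.
         * The real upstream fix may handle this case differently. */
        if(pWrkrData->reply == NULL) {
                DBGPRINTF("omelasticsearch: no reply received, suspending action\n");
                ABORT_FINALIZE(RS_RET_SUSPENDED);
        }
        CHKiRet(checkResult(pWrkrData, message));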
In trying to triage this issue, I'm wondering how it is possible for reply to be NULL. Here is the code that sets reply:
static size_t
curlResult(void *ptr, size_t size, size_t nmemb, void *userdata)
{
        char *p = (char *)ptr;
        wrkrInstanceData_t *pWrkrData = (wrkrInstanceData_t*) userdata;
        char *buf;
        size_t newlen;
        newlen = pWrkrData->replyLen + size*nmemb;
        if((buf = realloc(pWrkrData->reply, newlen + 1)) == NULL) {
                DBGPRINTF("omelasticsearch: realloc failed in curlResult\n");
                return 0; /* abort due to failure */
        }
        memcpy(buf+pWrkrData->replyLen, p, size*nmemb);
        pWrkrData->replyLen = newlen;
        pWrkrData->reply = buf;
        return size*nmemb;
}
I think an OOM condition is the only way for reply == NULL? Can you confirm that you are seeing rsyslog hit OOM? If not, can you shed some light on how it is possible for reply to be NULL?
Hmm - looking at the sosreport in the linked case, I don't see anything like an OOM kill - but the problem is that rsyslog is what writes /var/log/messages, so if rsyslog is having problems, those messages will not be written. Could you do a scan with journalctl to look for memory or OOM-kill messages (something like the command below)?
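Something along these lines should work, reading the kernel messages from the journal rather than /var/log/messages (the exact grep pattern is only a suggestion):

journalctl -k --no-pager | grep -iE 'out of memory|oom'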
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:1000