Bug 1684236

Summary: rsyslog-8.24.0-34.el7.x86_64 SIGSEGV when using rsyslog-elasticsearch-8.24.0-34
Product: Red Hat Enterprise Linux 7 Reporter: Anthony Zone <azone>
Component: rsyslogAssignee: Rich Megginson <rmeggins>
Status: CLOSED ERRATA QA Contact: Dalibor Pospíšil <dapospis>
Severity: low Docs Contact:
Priority: low    
Version: 7.6CC: azone, dapospis, jvymazal, nhosoi, pamadio, rmeggins
Target Milestone: rcKeywords: Triaged
Target Release: 7.7   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: rsyslog-8.24.0-41.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-03-31 19:10:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1594286    
Attachments:
Description Flags
patch jvymazal: review+

Description Anthony Zone 2019-02-28 19:01:49 UTC
Description of problem:
Customer upgrades rsyslog and rsyslog-elasticsearch to version rsyslog-8.24.0-34.el7.x86_64 rsyslog-elasticsearch-8.24.0-34.  This appears to be due to a null pointer reference in rsyslog-elasticsearch package that is resolved upstream.

Version-Release number of selected component (if applicable):

rsyslog-8.24.0-34.el7.x86_64               
rsyslog-elasticsearch-8.24.0-34.el7.x86_64 


How reproducible:
I do not have an elasticsearch environment, but the customer is able to reproduce this every time they run rsyslog.

Steps to Reproduce:
1. Upgrade to rsyslog and rsyslog-elasticsearch version 8.24.0-34
2. Enable the rsyslog module 'module(load="omelasticsearch")' and forward logs to elasticsearch server.
3. Start rsyslog and it segfaults.

Actual results:
rsyslog segfaults on start.

Expected results:
rsyslog doesn't segfault.

Additional info:

(gdb) bt
#0  0x00007f920b1d4901 in __memcmp_ssse3 () at ../sysdeps/x86_64/multiarch/memcmp-ssse3.S:1949
#1  0x00007f920b851eb0 in fjson_tokener_parse_ex (tok=tok@entry=0x7f91f804f360, str=str@entry=0x0, len=len@entry=-1)
    at json_tokener.c:250
#2  0x00007f920b853d57 in fjson_tokener_parse_verbose (str=0x0, error=error@entry=0x7f9203318b84) <<<<---str is null
    at json_tokener.c:174
#3  0x00007f920b853dbe in fjson_tokener_parse (str=<optimized out>) at json_tokener.c:162
#4  0x00007f9209a1afd5 in checkResult (
    reqmsg=0x7f91f80478d0 "{\"index\":{\"_index\": \"hunx_log_sys_write\",\"_type\":\"item\"}}\n{\"@timestamp\":\"2019-02-11T03:23:17+01:00\",\"timegenerated\":\"2019-02-28T10:37:37.991659+01:00\",\"message\":\"   patrol : TTY=unknown ; PWD=/home/pa"..., pWrkrData=0x7f91f8004350) at omelasticsearch.c:1375
#5  curlPost (pWrkrData=pWrkrData@entry=0x7f91f8004350, 
    message=message@entry=0x7f91f80478d0 "{\"index\":{\"_index\": \"hunx_log_sys_write\",\"_type\":\"item\"}}\n{\"@timestamp\":\"2019-02-11T03:23:17+01:00\",\"timegenerated\":\"2019-02-28T10:37:37.991659+01:00\",\"message\":\"   patrol : TTY=unknown ; PWD=/home/pa"..., msglen=2825, tpls=tpls@entry=0x0, nmsgs=8) at omelasticsearch.c:1449
#6  0x00007f9209a1b313 in submitBatch (pWrkrData=0x7f91f8004350) at omelasticsearch.c:1465
#7  0x0000562f6d426556 in actionTryCommit (pWti=0x562f6db41640, pThis=0x562f6dae8720) at ../action.c:1246
#8  actionCommit (pThis=pThis@entry=0x562f6dae8720, pWti=pWti@entry=0x562f6db41640) at ../action.c:1332
#9  0x0000562f6d42771b in processBatchMain (pVoid=0x562f6dae8720, pBatch=0x562f6db41670, pWti=0x562f6db41640)
    at ../action.c:1449
#10 0x0000562f6d41be9f in ConsumerReg (pThis=0x562f6db41150, pWti=0x562f6db41640) at queue.c:2008
#11 0x0000562f6d41891e in wtiWorker (pThis=pThis@entry=0x562f6db41640) at wti.c:365
#12 0x0000562f6d417a8d in wtpWorker (arg=0x562f6db41640) at wtp.c:410
#13 0x00007f920c06fdd5 in start_thread (arg=0x7f9203319700) at pthread_create.c:308
#14 0x00007f920b163ead in __libc_ifunc_impl_list (name=<optimized out>, array=0x7f9203319700, max=<optimized out>)
    at ../sysdeps/x86_64/multiarch/ifunc-impl-list.c:154
#15 0x0000000000000000 in ?? ()

(gdb) frame 5
#5  curlPost (pWrkrData=pWrkrData@entry=0x7f91f8004350, 
    message=message@entry=0x7f91f80478d0 "{\"index\":{\"_index\": \"hunx_log_sys_write\",\"_type\":\"item\"}}\n{\"@timestamp\":\"2019-02-11T03:23:17+01:00\",\"timegenerated\":\"2019-02-28T10:37:37.991659+01:00\",\"message\":\"   patrol : TTY=unknown ; PWD=/home/pa"..., msglen=2825, tpls=tpls@entry=0x0, nmsgs=8) at omelasticsearch.c:1449
1449		CHKiRet(checkResult(pWrkrData, message));

(gdb) p pWrkrData
$1 = (wrkrInstanceData_t *) 0x7f91f8004350

(gdb) p pWrkrData.reply
$2 = 0x0 <<<<---null

1413	static rsRetVal
1414	curlPost(wrkrInstanceData_t *pWrkrData, uchar *message, int msglen, uchar **tpls, int nmsgs)
1415	{
1416		CURLcode code;
1417		CURL *curl = pWrkrData->curlPostHandle;
1418		DEFiRet;
1419	
1420		pWrkrData->reply = NULL;
1421		pWrkrData->replyLen = 0;
1422	
1423		CHKiRet(checkConn(pWrkrData));
1424		CHKiRet(setPostURL(pWrkrData, pWrkrData->pData, tpls));
1425	
1426		curl_easy_setopt(curl, CURLOPT_WRITEDATA, pWrkrData);
1427		curl_easy_setopt(curl, CURLOPT_POSTFIELDS, (char *)message);
1428		curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, msglen);
1429		code = curl_easy_perform(curl);
1430		if (   code == CURLE_COULDNT_RESOLVE_HOST
1431		    || code == CURLE_COULDNT_RESOLVE_PROXY
1432		    || code == CURLE_COULDNT_CONNECT
1433		    || code == CURLE_WRITE_ERROR
1434		   ) {
1435			STATSCOUNTER_INC(indexHTTPReqFail, mutIndexHTTPReqFail);
1436			indexHTTPFail += nmsgs;
1437			DBGPRINTF("omelasticsearch: we are suspending ourselfs due "
1438				  "to failure %lld of curl_easy_perform()\n",
1439				  (long long) code);
1440			ABORT_FINALIZE(RS_RET_SUSPENDED);
1441		}
1442	
1443		DBGPRINTF("omelasticsearch: pWrkrData replyLen = '%d'\n", pWrkrData->replyLen);
1444		if(pWrkrData->replyLen > 0) {
1445			pWrkrData->reply[pWrkrData->replyLen] = '\0'; /* Append 0 Byte if replyLen is above 0 - byte has been reserved in malloc */
1446		}
1447		DBGPRINTF("omelasticsearch: pWrkrData reply: '%s'\n", pWrkrData->reply);
1448	
1449		CHKiRet(checkResult(pWrkrData, message)); <<<<---pWrkrData->reply isn't ever checked to see if it's null


It looks like this issue may have been introduced in the patch from this bz https://bugzilla.redhat.com/show_bug.cgi?id=1565214 which replaced cJSON with libfastjson. I also found the following upstream issue with a similar bt https://github.com/rsyslog/rsyslog/issues/1885. Looks like the fix is in this patch https://github.com/rsyslog/rsyslog/commit/3d2740dd758aad86e857d9e7730bb8db2a23eeb8. I'm not 100% sure it was introduced by the json bz mentioned earlier, but it's possible that cJSON handled the null pointer in some way that didn't cause a segfault in previous versions.

Comment 1 Rich Megginson 2019-03-01 00:14:33 UTC
In trying to triage this issue, I'm wondering how it is possible for reply to be NULL.  Here is the code that sets reply:

static size_t
curlResult(void *ptr, size_t size, size_t nmemb, void *userdata)
{
	char *p = (char *)ptr;
	wrkrInstanceData_t *pWrkrData = (wrkrInstanceData_t*) userdata;
	char *buf;
	size_t newlen;

	newlen = pWrkrData->replyLen + size*nmemb;
	if((buf = realloc(pWrkrData->reply, newlen + 1)) == NULL) {
		DBGPRINTF("omelasticsearch: realloc failed in curlResult\n");
		return 0; /* abort due to failure */
	}
	memcpy(buf+pWrkrData->replyLen, p, size*nmemb);
	pWrkrData->replyLen = newlen;
	pWrkrData->reply = buf;
	return size*nmemb;
}

I think OOM is the only way for reply == NULL?  Can you confirm that you are seeing rsyslog OOM?  If not, can you possibly shed some light into how it is possible for reply to be NULL?

Comment 2 Rich Megginson 2019-03-01 02:02:24 UTC
hmm - looking at the sosreport in the linked case - I don't see anything like an oomkill - but the problem is that it is rsyslog that writes to /var/log/messages, so if rsyslog is having problems, messages will not be written.  Is it possible you could do some sort of scan using journalctl to look for memory or oom kill messages?

Comment 3 Jiří Vymazal 2019-03-01 07:44:28 UTC
Hi Anthony,

could you please provide relevant rsyslog config snippet needed to reproduce the issue?

Thanks

Comment 13 Rich Megginson 2019-03-07 14:40:12 UTC
Created attachment 1541851 [details]
patch

This patch uses option 1

Comment 21 errata-xmlrpc 2020-03-31 19:10:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1000