Bug 1632211

Summary: sd_journal_get_cursor() failed - when time is changed
Product: Red Hat Enterprise Linux 7 Reporter: Ondrej Moriš <omoris>
Component: rsyslogAssignee: Jiří Vymazal <jvymazal>
Status: CLOSED ERRATA QA Contact: Dalibor Pospíšil <dapospis>
Severity: high Docs Contact:
Priority: urgent    
Version: 7.6CC: anrussel, apmukher, bkurt, cedrie+rhelbugzilla, dapospis, dawwu, dbodnarc, dgross, dtardon, emarquez, fredrik, gregory.young, hmatsumo, jocatala, jsynacek, jvymazal, kfujii, klaas, mezhang, mezhu, mpoole, myamazak, pasik, qguo, rhel, rmeggins, santony, sfroemer, shgao, systemd-maint-list, systemd-maint, systems, thomas.oulevey, troels, yozone
Target Milestone: rcKeywords: Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: rsyslog-8.24.0-37.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1698218 1722165 (view as bug list) Environment:
Last Closed: 2019-08-06 12:48:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1647307, 1698218, 1707454, 1722165    
Attachments:
Description Flags
simple systemtap probe none

Description Ondrej Moriš 2018-09-24 11:15:45 UTC
Description of problem:

When time is changed on a machine, rsyslogd reports the following error:

rsyslogd: sd_journal_get_cursor() failed: 'Cannot assign requested address'  [v8.24.0-34.el7]

Version-Release number of selected component (if applicable):

rsyslog-8.24.0-34.el7

How reproducible:

100%

Steps to Reproduce:

1. Make sure rsyslog is running:

# service rsyslog status

2. Change time:

# date -s "$(date -d "now +1000 hours")"

3. Check /var/log/messages

Actual results:

* time is changed correctly
* rsyslog is running and works correctly before and after time change
* /var/log/messages contains sd_journal_get_cursor() failed:

Nov  4 21:54:09 host-172-16-36-101 systemd: Time has been changed
Nov  4 21:54:09 host-172-16-36-101 rsyslogd: sd_journal_get_cursor() failed: 'Cannot assign requested address'  [v8.24.0-34.el7]
Nov  4 21:54:09 host-172-16-36-101 rsyslogd: imjournal: journal reloaded... [v8.24.0-34.el7 try http://www.rsyslog.com/e/0 ]

Expected results:

* time is changed correctly
* rsyslog is running and works correctly before and after time change
* /var/log/messages DOES NOT contain sd_journal_get_cursor() failed'

Additional info:

This issue seems no to have any negative impact on rsyslog behaviour. BUT there was no such message on RHEL-7.5.

Comment 4 Dalibor Pospíšil 2018-11-13 17:49:55 UTC
*** Bug 1647307 has been marked as a duplicate of this bug. ***

Comment 11 Klaas Demter 2019-01-07 08:22:28 UTC
This also seems to happen on every journal reload.

Jan 01 22:57:48 server.domain.tld rsyslogd[64946]: sd_journal_get_cursor() failed: 'Cannot assign requested address'  [v8.24.0-34.el7]
Jan 01 22:57:48 server.domain.tld rsyslogd[64946]: imjournal: journal reloaded... [v8.24.0-34.el7 try http://www.rsyslog.com/e/0 ]
Jan 05 09:34:16 server.domain.tld rsyslogd[64946]: sd_journal_get_cursor() failed: 'Cannot assign requested address'  [v8.24.0-34.el7]
Jan 05 09:34:16 server.domain.tld rsyslogd[64946]: imjournal: journal reloaded... [v8.24.0-34.el7 try http://www.rsyslog.com/e/0 ]

I've opened a support case about this: https://access.redhat.com/support/cases/#/case/02278579

Support engineer suggested workaround from ticket upstream ticket: https://bugzilla.redhat.com/show_bug.cgi?id=1615014#c6

Comment 14 Jan Synacek 2019-03-11 11:47:11 UTC
907 _public_ int sd_journal_get_cursor(sd_journal *j, char **cursor) {
908         Object *o;
909         int r;
910         char bid[33], sid[33];
911 
912         assert_return(j, -EINVAL);
913         assert_return(!journal_pid_changed(j), -ECHILD);
914         assert_return(cursor, -EINVAL);
915 
916         if (!j->current_file || j->current_file->current_offset <= 0) // <----------------- HERE
917                 return -EADDRNOTAVAIL;
918 
919         r = journal_file_move_to_object(j->current_file, OBJECT_ENTRY, j->current_file->current_offset, &o);
920         if (r < 0)
921                 return r;

I've marked the place where the code exists from the sd-journal library. The j->current_file is set to NULL.

I believe that this is simply a user error while handling the API. rsyslog should react to the time change and reset the cursor properly.

Comment 15 Jan Synacek 2019-03-11 11:48:16 UTC
Created attachment 1542831 [details]
simple systemtap probe

Comment 16 Jiří Vymazal 2019-03-11 12:44:36 UTC
(In reply to Jan Synacek from comment #14)
> 907 _public_ int sd_journal_get_cursor(sd_journal *j, char **cursor) {
> 908         Object *o;
> 909         int r;
> 910         char bid[33], sid[33];
> 911 
> 912         assert_return(j, -EINVAL);
> 913         assert_return(!journal_pid_changed(j), -ECHILD);
> 914         assert_return(cursor, -EINVAL);
> 915 
> 916         if (!j->current_file || j->current_file->current_offset <= 0) //
> <----------------- HERE
> 917                 return -EADDRNOTAVAIL;
> 918 
> 919         r = journal_file_move_to_object(j->current_file, OBJECT_ENTRY,
> j->current_file->current_offset, &o);
> 920         if (r < 0)
> 921                 return r;
> 
> I've marked the place where the code exists from the sd-journal library. The
> j->current_file is set to NULL.
> 
> I believe that this is simply a user error while handling the API. rsyslog
> should react to the time change and reset the cursor properly.

Hi, thanks for analysis, however how can I know that j->current_file is NULL in user code? You do not export journal object properties via API (unless there is some usable hack around it?)

Comment 17 Jan Synacek 2019-03-11 15:29:14 UTC
IMHO, you have to do it the other way around. Isn't there a piece of code that rsyslog already uses to deal with time jumps? If you detect a jump, simply reset the cursor. Or, you can simply do it once when you get -EADDRNOTAVAIL from the journal API. I don't know how rsyslog uses the API, so I can't be any more precise.

Comment 18 Jiří Vymazal 2019-03-11 16:30:32 UTC
(In reply to Jan Synacek from comment #17)
> IMHO, you have to do it the other way around. Isn't there a piece of code
> that rsyslog already uses to deal with time jumps? If you detect a jump,
> simply reset the cursor. Or, you can simply do it once when you get
> -EADDRNOTAVAIL from the journal API. I don't know how rsyslog uses the API,
> so I can't be any more precise.

Thanks for suggestion, while I could add another switch which will ignore this error resulting in quieting the flooding, I find it strange that you issue errors on otherwise finely working usage of the API; I am definitely not going to risk to change the logic now that it is finally working as it should for all use-cases we need. If it comes to silencing the error, are there any other "ignorable" failures I should be handling on rsyslog side?

Comment 20 Jan Synacek 2019-03-13 13:05:46 UTC
After debugging this in person, we've managed to find the cause and the correct place to fix this in rsyslog's imjournal. Jiri is currently working on the fix and as soon as one is available, I will test it again and confirm the fix.

Comment 29 Jiří Vymazal 2019-03-22 07:39:00 UTC
*** Bug 1647307 has been marked as a duplicate of this bug. ***

Comment 35 Chris Williams 2019-05-22 20:02:52 UTC
*** Bug 1698218 has been marked as a duplicate of this bug. ***

Comment 37 errata-xmlrpc 2019-08-06 12:48:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2110