Bug 1632211
| Summary: | sd_journal_get_cursor() failed - when time is changed | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Ondrej Moriš <omoris> | ||||
| Component: | rsyslog | Assignee: | Jiří Vymazal <jvymazal> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Dalibor Pospíšil <dapospis> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | urgent | ||||||
| Version: | 7.6 | CC: | anrussel, apmukher, bkurt, cedrie+rhelbugzilla, dapospis, dawwu, dbodnarc, dgross, dtardon, emarquez, fredrik, gregory.young, hmatsumo, jocatala, jsynacek, jvymazal, kfujii, klaas, mezhang, mezhu, mpoole, myamazak, pasik, qguo, rhel, rmeggins, santony, sfroemer, shgao, systemd-maint-list, systemd-maint, systems, thomas.oulevey, troels, yozone | ||||
| Target Milestone: | rc | Keywords: | Triaged | ||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | rsyslog-8.24.0-37.el7 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 1698218 1722165 (view as bug list) | Environment: | |||||
| Last Closed: | 2019-08-06 12:48:13 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1647307, 1698218, 1707454, 1722165 | ||||||
| Attachments: |
|
||||||
|
Description
Ondrej Moriš
2018-09-24 11:15:45 UTC
*** Bug 1647307 has been marked as a duplicate of this bug. *** This also seems to happen on every journal reload. Jan 01 22:57:48 server.domain.tld rsyslogd[64946]: sd_journal_get_cursor() failed: 'Cannot assign requested address' [v8.24.0-34.el7] Jan 01 22:57:48 server.domain.tld rsyslogd[64946]: imjournal: journal reloaded... [v8.24.0-34.el7 try http://www.rsyslog.com/e/0 ] Jan 05 09:34:16 server.domain.tld rsyslogd[64946]: sd_journal_get_cursor() failed: 'Cannot assign requested address' [v8.24.0-34.el7] Jan 05 09:34:16 server.domain.tld rsyslogd[64946]: imjournal: journal reloaded... [v8.24.0-34.el7 try http://www.rsyslog.com/e/0 ] I've opened a support case about this: https://access.redhat.com/support/cases/#/case/02278579 Support engineer suggested workaround from ticket upstream ticket: https://bugzilla.redhat.com/show_bug.cgi?id=1615014#c6 907 _public_ int sd_journal_get_cursor(sd_journal *j, char **cursor) {
908 Object *o;
909 int r;
910 char bid[33], sid[33];
911
912 assert_return(j, -EINVAL);
913 assert_return(!journal_pid_changed(j), -ECHILD);
914 assert_return(cursor, -EINVAL);
915
916 if (!j->current_file || j->current_file->current_offset <= 0) // <----------------- HERE
917 return -EADDRNOTAVAIL;
918
919 r = journal_file_move_to_object(j->current_file, OBJECT_ENTRY, j->current_file->current_offset, &o);
920 if (r < 0)
921 return r;
I've marked the place where the code exists from the sd-journal library. The j->current_file is set to NULL.
I believe that this is simply a user error while handling the API. rsyslog should react to the time change and reset the cursor properly.
Created attachment 1542831 [details]
simple systemtap probe
(In reply to Jan Synacek from comment #14) > 907 _public_ int sd_journal_get_cursor(sd_journal *j, char **cursor) { > 908 Object *o; > 909 int r; > 910 char bid[33], sid[33]; > 911 > 912 assert_return(j, -EINVAL); > 913 assert_return(!journal_pid_changed(j), -ECHILD); > 914 assert_return(cursor, -EINVAL); > 915 > 916 if (!j->current_file || j->current_file->current_offset <= 0) // > <----------------- HERE > 917 return -EADDRNOTAVAIL; > 918 > 919 r = journal_file_move_to_object(j->current_file, OBJECT_ENTRY, > j->current_file->current_offset, &o); > 920 if (r < 0) > 921 return r; > > I've marked the place where the code exists from the sd-journal library. The > j->current_file is set to NULL. > > I believe that this is simply a user error while handling the API. rsyslog > should react to the time change and reset the cursor properly. Hi, thanks for analysis, however how can I know that j->current_file is NULL in user code? You do not export journal object properties via API (unless there is some usable hack around it?) IMHO, you have to do it the other way around. Isn't there a piece of code that rsyslog already uses to deal with time jumps? If you detect a jump, simply reset the cursor. Or, you can simply do it once when you get -EADDRNOTAVAIL from the journal API. I don't know how rsyslog uses the API, so I can't be any more precise. (In reply to Jan Synacek from comment #17) > IMHO, you have to do it the other way around. Isn't there a piece of code > that rsyslog already uses to deal with time jumps? If you detect a jump, > simply reset the cursor. Or, you can simply do it once when you get > -EADDRNOTAVAIL from the journal API. I don't know how rsyslog uses the API, > so I can't be any more precise. Thanks for suggestion, while I could add another switch which will ignore this error resulting in quieting the flooding, I find it strange that you issue errors on otherwise finely working usage of the API; I am definitely not going to risk to change the logic now that it is finally working as it should for all use-cases we need. If it comes to silencing the error, are there any other "ignorable" failures I should be handling on rsyslog side? After debugging this in person, we've managed to find the cause and the correct place to fix this in rsyslog's imjournal. Jiri is currently working on the fix and as soon as one is available, I will test it again and confirm the fix. *** Bug 1647307 has been marked as a duplicate of this bug. *** *** Bug 1698218 has been marked as a duplicate of this bug. *** Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:2110 |