Description of problem:
In F18, things crash every now and then. When I try to report the crashes via abrt, I get false positives about sensitive information in backtraces and memory maps: the Artificial non-Intelligence matches the string "login" in the name of the library "libsystemd-login.so.0".

Version-Release number of selected component (if applicable):
abrt-2.0.20-1.fc18.x86_64

How reproducible:
always (if libsystemd-login.so.0 is involved in the crash)

Steps to Reproduce:
1. install Fedora 18
2. try to use it for five minutes
3. run abrt-gui to report the crashes that happened meanwhile
4. choose some crash and try to report it

Actual results:
The wizard tells you there is sensitive info in the backtrace. Examining the textarea, you see that the string "login" in "libsystemd-login.so.0" is highlighted in red.

Expected results:
"login" is not highlighted if it is part of a packaged file name.

Additional info:
btw, it would be nice if the whole line with the offending keyword were highlighted (with the keyword in bold) so that you can spot the problems more easily; even some buttons for scrolling to the next/previous problem would be handy ... but that's RFE material for a new bug, and now I'm too lazy to search whether somebody has already entered it, bugzilla responses take tens of seconds now :-(
It has to be that "not-intelligent" because in some backtraces there can be variables like "mylogin", "mypassword", "tmppassword", etc. So ignoring it just because it's part of the name of some library would be dangerous because the UI logic doesn't know what content it is highlighting...

As for the additional info part: There are arrows to jump through all the highlighted words, so highlighting the whole line is not necessary.
(In reply to comment #1)
> It has to be that "not-intelligent" because in some backtraces there can be
> variables like "mylogin", "mypassword", "tmppassword", etc. So ignoring
> it just because it's part of the name of some library would be dangerous
> because the UI logic doesn't know what content it is highlighting...

Please read carefully: I'm NOT proposing to ignore the string "login".

> As for the additional info part: There are arrows to jump through all the
> highlighted words, so highlighting the whole line is not necessary.

Pardon my blindness, but I really do not see such arrows. Could you highlight them on the attached screenshot, please?
Created attachment 685121 [details] abrt-gui screenshot
(In reply to comment #2)
> (In reply to comment #1)
> > It has to be that "not-intelligent" because in some backtraces there can be
> > variables like "mylogin", "mypassword", "tmppassword", etc. So ignoring
> > it just because it's part of the name of some library would be dangerous
> > because the UI logic doesn't know what content it is highlighting...
>
> please read carefully
>
> I'm NOT proposing to ignore the string "login"

- ok, so you're just proposing to not highlight it, but I'm saying it's not possible to determine whether the word "login" is mentioned in some security sensitive context without teaching ABRT to understand the context, which is not worth it

> > As for the additional info part: There are arrows to jump through all the
> > highlighted words, so highlighting the whole line is not necessary.
>
> pardon my blindness, but I really do not see such arrows - could you
> highlight them on the attached screenshot please?

- of course, please see the attached picture
Created attachment 685135 [details] abrt gui with highlighted search arrows
There is no generic way to say whether a string is OK or not. It is better to show all suspicious strings than to skip an important one.
(In reply to Jiri Moskovcak from comment #4)
> - ok, so you're just proposing to not highlight it, but I'm saying it's not
> possible to determine if the word "login" is mentioned in some security
> sensitive context or not without teaching ABRT to understand the context
> which is not worth it

Would it be possible to elaborate a bit on what "not worth it" means? Like, for example:

We think of two possible solutions:

a) keep a false positive list
 - easy to implement, estimated developer time 2 hours
 - hard to maintain, estimated half a day monthly

b) create an AI that checks the context of the suspicious match to see whether it is a real filename
 - harder to implement, estimated developer time one full day
 - no maintenance cost
 - a little slowdown in processing, unimportant to the user, as generating the backtraces and other tasks urge the user to go take a nap anyway

This compares with the fact that
 - we are getting half a million abrt reports monthly and the number is expected to grow in the future
 - our estimate (based on what?) is that only 0.1% suffer from such false positives, which makes 500 affected reports monthly
 - examining such a false positive takes an inexperienced user 5 minutes and an experienced user half a minute, which makes 1 minute on average in the mix
 - that sums to one workday wasted monthly by examining false positives

As we value user time 100 times less than developer time, we may discard a), and b) would mean that it'd take 100 months at the current reporting rate = over 8 years to outweigh the precious developer time, and we do not plan to maintain this code that long.

... of course the numbers are completely made up

> > > As for the additional info part: There are arrows to jump through all the
> > > highlighted words, so highlighting the whole line is not necessary.
> >
> > pardon my blindness, but I really do not see such arrows - could you
> > highlight them on the attached screenshot please?
>
> - of course, please see attached picture

Thank you ... not quite obvious to me, but once learned, it makes my life easier.

(In reply to Michal Toman from comment #6)
> There is no generic way to say whether a string is OK or not.

I'm not talking about a "generic way", I'm talking about one concrete case of a false positive => you completely change the point of this report, reopening.

> It is better to show all suspicious strings than to skip an important one.

Do you have any proof of this bold statement? Usually, reality proves otherwise ... for example, I recently read an analysis saying that too many traffic signs lead to drivers overlooking and ignoring them rather than thinking about each one carefully, and I believe this principle to be generic.
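For readers who want to check the back-of-the-envelope estimate in this comment, here is a minimal sketch of the arithmetic in Python. All inputs are the admittedly made-up numbers from the comment; the 8-hour workday and the variable names are assumptions added only for illustration.

```python
# Made-up inputs taken from the comment above.
reports_per_month = 500_000
false_positive_rate = 0.001          # 0.1 %
minutes_per_false_positive = 1       # average of novice and expert users
user_time_discount = 100             # user time valued 100x less than developer time

# Assumption (not stated in the thread): one workday is 8 hours.
workday_hours = 8

affected_reports = reports_per_month * false_positive_rate
user_hours_per_month = affected_reports * minutes_per_false_positive / 60

# Option b) costs one developer day up front and has no maintenance cost.
developer_hours_b = 1 * workday_hours
saved_developer_hours_per_month = user_hours_per_month / user_time_discount
break_even_months = developer_hours_b / saved_developer_hours_per_month

print(f"affected reports per month: {affected_reports:.0f}")
print(f"user hours lost per month:  {user_hours_per_month:.1f}")
print(f"break-even for option b):   {break_even_months:.0f} months")
```

With these inputs the break-even comes out at 96 months, which the comment rounds to 100 months, i.e. a bit over 8 years.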
I'm also often affected by this. libsecret or a similar string is highlighted in error.

You claim that it's hard to distinguish user-related and system-related info in a plaintext file. I agree. But some of the whitelists could be really simple.

For example, can any user data even end up in the "maps" tab? From what I see, those are just memory addresses and file names. Shouldn't that be excluded from the private info search completely?

As for the "backtrace" tab, can't we whitelist something like this?
(/usr)?/lib(64)?/\S*libsecret-\S*\.so\.\S*
(/usr)?/lib(64)?/\S*libpasswd-\S*\.so\.\S*

Or maybe anything in /lib?
(/usr)?/lib(64)?/\S*

Or, I agree with Karel, it would be really simple to just test whether the matched text sequence is an existing file or not.

I don't think it's that hard to implement and it would avoid so many false positives. Are there some other reasons not to do this?
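To illustrate the regexp whitelist idea from this comment, here is a minimal Python sketch. It is not how abrt or libreport works; the keyword list and the function name `suspicious_spans` are hypothetical, and only the library path pattern is taken from the comment.

```python
import re

# Hypothetical keyword list; the real list used by abrt is not shown in this thread.
SENSITIVE = re.compile(r"login|passw|secret", re.IGNORECASE)

# The broadest pattern proposed above: anything under /lib, /lib64,
# /usr/lib or /usr/lib64, up to the next whitespace.
LIB_PATH = re.compile(r"(?:/usr)?/lib(?:64)?/\S*")


def suspicious_spans(text):
    """Return (start, end) spans of keyword hits that are not inside a library path."""
    safe = [m.span() for m in LIB_PATH.finditer(text)]
    hits = []
    for m in SENSITIVE.finditer(text):
        inside = any(s <= m.start() and m.end() <= e for s, e in safe)
        if not inside:
            hits.append(m.span())
    return hits


sample = "7f00-7f10 r-xp /usr/lib64/libsystemd-login.so.0\nmylogin = 0x1"
for start, end in suspicious_spans(sample):
    print(start, end, sample[start:end])
```

In the sample, the "login" inside the library path is skipped while the one in "mylogin" is still reported. One caveat of this sketch: the pattern is not anchored to the start of a path, so something like /home/user/lib/secret.txt would be whitelisted as well.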
*** Bug 1008914 has been marked as a duplicate of this bug. ***
(In reply to Kamil Páral from comment #8)
> I'm also often affected by this. libsecret or similar string is highlighted
> in error.
>
> You claim that it's hard to distinguish user-related and system-related info
> in a plaintext file. I agree. But some of the whitelists could be really
> simple.
>
> For example, can some user data even end up in "maps" tab? From what I see,
> those are just memory addresses and file names. Shouldn't that be excluded
> from private info search completely?
>
> As for "backtrace" tab, can't we whitelist something like this?
> (/usr)?/lib(64)?/\S*libsecret-\S*\.so\.\S*
> (/usr)?/lib(64)?/\S*libpasswd-\S*\.so\.\S*
>
> Or maybe anything in /lib?
> (/usr)?/lib(64)?/\S*

- yes, except the current implementation doesn't support regexps

> Or, I agree with Karel, it would be really simple to just test whether the
> matched text sequence is an existing file or not.

- not without hardcoding some info (checking all the files from the list of loaded libraries might not be a good idea)

> I don't think it's that hard to implement and it would avoid so many false
> positives. Are there some other reasons why not to do this?

- yes. It's actually quite complicated, because libreport doesn't know what data it processes and simply searches all the text files, so without adding some additional metadata the only solution is a simple whitelist, which currently contains only:

login.so
libsecret.so

- if you have any other candidates, please add them in a comment to rhbz#1009730
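A simple whitelist without regexps, as described in this comment, could look roughly like the following Python sketch. This is only an illustration under stated assumptions, not the libreport code: the forbidden word list and the names `find_all` and `flagged` are hypothetical, and only the two whitelist entries come from the comment.

```python
# Hypothetical list of forbidden words; not taken from libreport.
FORBIDDEN = ["login", "password", "secret"]

# The two whitelist entries named in the comment above.
IGNORED = ["login.so", "libsecret.so"]


def find_all(text, word):
    """Yield (start, end) for every occurrence of word in text."""
    start = 0
    while (pos := text.find(word, start)) != -1:
        yield pos, pos + len(word)
        start = pos + 1


def flagged(text):
    """Return spans of forbidden words that are not covered by an ignored word."""
    ignored = [span for w in IGNORED for span in find_all(text, w)]
    result = []
    for w in FORBIDDEN:
        for s, e in find_all(text, w):
            if not any(a <= s and e <= b for a, b in ignored):
                result.append((s, e))
    return sorted(result)


line = "/usr/lib64/libsystemd-login.so.0 mylogin"
print(flagged(line))  # only the "login" in "mylogin" is reported
```

With plain substring matching, an entry only helps when it occurs verbatim in the text: in this sketch "libsystemd-login.so.0" is covered by "login.so", but a versioned name such as "libsecret-1.so.0" would not be covered by "libsecret.so".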
Also: SHELL=/sbin/nologin/
*** This bug has been marked as a duplicate of bug 1009730 ***