Bug 488146 - rpmlint: complains about "mixed-use-of-spaces-and-tabs" when it shouldn't
Summary: rpmlint: complains about "mixed-use-of-spaces-and-tabs" when it shouldn't
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: rpmlint
Version: 10
Hardware: All
OS: Linux
low
medium
Target Milestone: ---
Assignee: Ville Skyttä
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-03-02 21:18 UTC by Mattias Ellert
Modified: 2009-04-09 16:18 UTC (History)
3 users (show)

Fixed In Version: 0.87-1.fc10
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-04-06 20:26:09 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)

Description Mattias Ellert 2009-03-02 21:18:57 UTC
Description of problem:
rpmlint complains about "mixed-use-of-spaces-and-tabs" when it shouldn't

Version-Release number of selected component (if applicable):
rpmlint-0.85-2.fc9.noarch
rpmlint-0.85-2.fc10.noarch

How reproducible:
Always

Steps to Reproduce:
1. run rpmlint on a specfile that uses tabs for indentation
2. see it complain about indentation using spaces (unless you are extremely lucky)
  
Actual results:
Warnings about "mixed-use-of-spaces-and-tabs"

Expected results:
No warnings

Additional info:

The problem is due to badly constructed regexes in SpecCheck.py:

indent_spaces_regex = re.compile(' {3}.*\S')
indent_tabs_regex = re.compile('\t.*\S')

If these are replaced by the more carefully crafted versions:

indent_spaces_regex = re.compile('( \t|(^|\t)([^\t]{8})*[^\t]{6}[^\t]?  )')
indent_tabs_regex = re.compile('\t')

the problem is fixed.

Comment 1 Ville Skyttä 2009-03-02 21:54:03 UTC
Could you spell out in human readable form what you're after in indent_spaces_regex?  It isn't immediately obvious from reading it, and it seems quite suspicious that it apparently "allows" tabs in some positions when the primary purpose of the regex is to detect lines that use _spaces_ for indentation.  Some test cases would be nice, too (preferably as attachments).

Comment 2 Mattias Ellert 2009-03-03 23:11:08 UTC
In order to detect if spaces are used for indentation you must take tabs into account. Only if a space is used immediately before a tab or if a row of two or more consequtive spaces ends at or crosses a 8 character boundary ("tab stop") you can say that spaces are used for indentation on that line. The tabs in the regexp is there to properly find the 8 character boundaries, not to "allow" tabs. The regexp is not designed to detect lines that uses only spaces for indentation, but lines that uses spaces for indentation somewhere on the line - there may well be a tab somewhere on the line too.

The indent_spaces_regex matches any line that uses spaces to create indentation that could have been created using tabs - which is the kind of indentation that is not allowed if tabs are used for indentation. Using spaces to create indentation that can not be created using tabs is not an error in a document that uses tabs for indentation and rpmlint should not issue false warnings about it.

Comment 3 Ville Skyttä 2009-03-04 17:54:31 UTC
I think I understand how these changes would be helpful for cases where the writer wishes to use tabs (where possible) for indentation.  However, if the writer wishes to use spaces for indentation, there should be only spaces, and no tabs *anywhere* which I believe would no longer be detected.

The intent of this check is to work for both tabs "only" and spaces only indented specfiles as well as possible.  It's not optimal for files intentionally using tabs at the moment, but should AFAIK always do the right thing from "spaces only" POV.  Unless I missed something, your changes would just flip it around (tabbed files would always or at least more often be detected properly, spaces only not) which I don't think justifies making the change as is.

Feel free to reopen if I still missed something, or if I didn't and you find out a way to make both cases work correctly; prefarably with pointers to or attached test cases.

Comment 4 Mattias Ellert 2009-03-05 15:45:37 UTC
No, no, no, no, no, ....

Please let me reply before closing this bug.

If there is a tab anywhere it will match the indent_tabs_regex:

indent_tabs_regex = re.compile('\t')

This will only be allowed if there is no match anywhere for the indent_spaces_regex.

A spec using spaces for indentation that has a tab in it will fail the test, except in the rare case where all the spaces are in places where they would be allowed in a document using tabs - which would be extremely rare. And in this case the document would be a valid "uses tabs" document so rpmlint should not complain, since rpmlint can not know that the user intended to use spaces in this case.

Comment 5 Ville Skyttä 2009-03-05 21:03:58 UTC
Ok, I still can't for some reason get my brain wrapped around this but I ran some tests and the results speak for themselves and indicate that this would be an improvement.  I tested all [ab]*.spec files in current Fedora devel CVS, results:

1) False positives introduced, could be "fixed" by ignoring trailing whitespace: airsnort.spec, beecrypt.spec

2) False positives introduced, could be "fixed" by adding a special case that doesn't treat (exactly) two whitespaces after '.', '!', '?' (common writing practice) as indentation with spaces: aboot.spec, bzip2.spec, biosdevname.spec

3) False positives introduced, debatable and/or probably very hard to "fix": at.spec, bacula.spec, bibletime.spec

4) Fixed false positives: aldrin.spec, anthy.spec, apg.spec, audit.spec, bluez-utils.spec, bug-buddy.spec

5) Added good catches: abicheck.spec, ace.spec, afuse.spec, alevt.spec, anjuta-gdl.spec, asunder.spec, beagle.spec, boinc-client.spec

6) No longer caught cases: none.

So, in this test set, there's almost twice as many cases that have improved compared to ones that have changed for worse so I've applied a trivially modified version of your change upstream as http://rpmlint.zarb.org/cgi-bin/trac.cgi/changeset/1561

Thank you for your persistence :).  If you still have some energy to spare on improving this, I think the "false positives" in 2) and maybe 1) above could be eliminated with moderate effort without causing other problems.

Comment 6 Mattias Ellert 2009-03-18 03:11:24 UTC
I think this regex will fix your case 2:

'( \t|(^|\t)([^\t]{8})*[^\t]{4}[^\t]?([^\t][^\t.!?]|[^\t]?[.!?] )  )'

Comment 7 Ville Skyttä 2009-03-19 21:05:44 UTC
Looks good, applied upstream as http://rpmlint.zarb.org/cgi-bin/trac.cgi/changeset/1568.  Thanks again!

Comment 8 Fedora Update System 2009-03-19 23:29:35 UTC
rpmlint-0.87-1.fc10 has been submitted as an update for Fedora 10.
http://admin.fedoraproject.org/updates/rpmlint-0.87-1.fc10

Comment 9 Fedora Update System 2009-03-19 23:31:03 UTC
rpmlint-0.87-1.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/rpmlint-0.87-1.fc9

Comment 10 Fedora Update System 2009-03-23 15:56:32 UTC
rpmlint-0.87-1.fc10 has been pushed to the Fedora 10 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update rpmlint'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-2946

Comment 11 Fedora Update System 2009-04-06 20:25:40 UTC
rpmlint-0.87-1.fc10 has been pushed to the Fedora 10 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 12 Fedora Update System 2009-04-09 16:18:35 UTC
rpmlint-0.87-1.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.