Bug 567285
Summary: | Abusive spell-checker | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Nicolas Mailhot <nicolas.mailhot> |
Component: | rpmlint | Assignee: | Ville Skyttä <ville.skytta> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | rawhide | CC: | manuel.wolfshant, tmz, ville.skytta |
Hardware: | All | ||
OS: | Linux | ||
Fixed In Version: | rpmlint-0.95-2.fc13 | Doc Type: | Bug Fix |
Last Closed: | 2010-03-03 20:49:45 UTC | Type: | --- |
Description
Nicolas Mailhot
2010-02-22 15:07:14 UTC
(In reply to comment #0)
> Therefore, it always triggers on packages that
> include upstream name(s) in the description.

It shouldn't always trigger on upstream names, it tries to avoid warning about "components" of the name of the package being checked among other things. But sure, it will warn about things it doesn't know about and finds misspelled.

> (yes I know of en_US camelcase
> sentence conventions and no, they are not international conventions and people
> who use them should not make other packagers suffer)

http://fedoraproject.org/wiki/Packaging:Guidelines#Summary_and_description
"Please put personal preferences aside and use American English spelling in the summary and description."

> Since it is also too dumb to consolidate multiple occurrences of the same
> spelling warning, those warnings tend to drown all other rpmlint messages.

It has code to avoid warning multiple times about the same word when the word occurs multiple times in the same tag's value. Filtering across different tags would be misleading IMO.

Do you have a reproducer where these features don't work, or concrete ideas how to improve them?

You can filter out the spell checker messages altogether if you don't like them, or disable the Enchant spell checker which results in the internal (very basic, not far from useless) spell checker being used which generates much less output. See /usr/share/doc/rpmlint-*/config.example

(In reply to comment #1)
> (In reply to comment #0)
> > Therefore, it always triggers on packages that
> > include upstream name(s) in the description.
>
> It shouldn't always trigger on upstream names, it tries to avoid warning about
> "components" of the name of the package being checked among other things. But
> sure, it will warn about things it doesn't know about and finds misspelled.

rpmlint /tmp/*rpm |grep spelling |sort |uniq
gfs-goschen-fonts.src: W: spelling-error %description -l en_US Bodoni -> Bodkin, Bordon, Bodice
gfs-goschen-fonts.src: W: spelling-error %description -l en_US Didot -> Dido, Di dot, Di-dot
gfs-goschen-fonts.src: W: spelling-error %description -l en_US Georg -> George, Ge org, Ge-org
gfs-goschen-fonts.src: W: spelling-error %description -l en_US Göschen -> Gretchen, Gaucheness, Schelling
gfs-goschen-fonts.src: W: spelling-error %description -l en_US Göschensche -> Schenectady, Nonscheduled, Gaucheness
gfs-goschen-fonts.src: W: spelling-error %description -l en_US Griesbach -> Grievance, Grievous, Gorbachev
gfs-goschen-fonts.src: W: spelling-error %description -l en_US Jakob -> Jacob, Jake, Jakarta
gfs-goschen-fonts.src: W: spelling-error %description -l en_US Joachim -> Poaching, Joaquin, Machismo
gfs-goschen-fonts.src: W: spelling-error %description -l en_US Matthiopoulos -> Matthias
gfs-goschen-fonts.src: W: spelling-error %description -l en_US Prillwitz -> Priscilla, Prioritize, Primarily
gfs-goschen-fonts.src: W: spelling-error %description -l en_US Verlagsbuchhandlung
gfs-goschen-fonts.src: W: spelling-error Summary(en_US) th -> ht, Th, t
mplus-fonts.src: W: spelling-error %description -l en_US combinations -> combination, combination's, combination s
mplus-fonts.src: W: spelling-error %description -l en_US fullwidth -> full width, full-width, Fullerton
mplus-fonts.src: W: spelling-error %description -l en_US halfwidth -> half width, half-width, halfwit
mplus-fonts.src: W: spelling-error Summary(en_US) Coji -> Colic, Coir, Coin
mplus-fonts.src: W: spelling-error Summary(en_US) Morishita -> Morison, Moorish, Morita
mplus-fonts.src: W: spelling-error Summary(en_US) superfamily -> super family, super-family, superficially
paktype-nashk-basic-fonts.src: W: spelling-error %description -l en_US Lateef -> Latest, Latent, Lateral
paktype-nashk-basic-fonts.src: W: spelling-error %description -l en_US naskh -> Nash, nasal, nasty
paktype-nashk-basic-fonts.src: W: spelling-error %description -l en_US Sagar -> Saar, Agar, Sagan
paratype-pt-sans-fonts.src: W: spelling-error %description -l en_US Korolkova -> Tsiolkovsky, Tereshkova, Walkover
paratype-pt-sans-fonts.src: W: spelling-error %description -l en_US libre -> lire, lib re, lib-re
paratype-pt-sans-fonts.src: W: spelling-error %description -l en_US th -> ht, Th, t
paratype-pt-sans-fonts.src: W: spelling-error %description -l en_US Umpeleva -> Relevant, Elevator, Elevate
paratype-pt-sans-fonts.src: W: spelling-error %description -l en_US Yefimov -> Asimov, Immovable, Immovably
ubuntutitle-fonts.src: W: spelling-error %description -l en_US Fitzsimon -> Fitzroy, Fitzpatrick, Fitzgerald

(random selection, culling all the duplicates)

> > (yes I know of en_US camelcase
> > sentence conventions and no, they are not international conventions and people
> > who use them should not make other packagers suffer)
>
> http://fedoraproject.org/wiki/Packaging:Guidelines#Summary_and_description
> "Please put personal preferences aside and use American English spelling in the
> summary and description."

This is not spelling, and in fact even the over-anal spellchecker rpmlint uses now does not require camelcase sentences. American English can thankfully be written with normal casing conventions that help identify names that should not be spellchecked.

> > Since it is also too dumb to consolidate multiple occurrences of the same
> > spelling warning, those warnings tend to drown all other rpmlint messages.
>
> It has code to avoid warning multiple times about the same word when the word
> occurs multiple times in the same tag's value. Filtering across different tags
> would be misleading IMO.

What's misleading today is that there is so much noise about spelling that has no value being checked that people are missing the actual warnings they should do something about.
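
As an illustration of the consolidation being asked for here, the following is a minimal sketch that post-processes rpmlint output outside rpmlint itself. It is not an rpmlint feature: the names `SPELLING_RE` and `consolidate` are hypothetical, and the line format is assumed to match the `spelling-error` lines pasted above.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the output pasted in this report:
#   mplus-fonts.src: W: spelling-error Summary(en_US) Coji -> Colic, Coir, Coin
# The tag is either "Summary(xx)" or "%description -l xx".
SPELLING_RE = re.compile(
    r"^(?P<pkg>\S+): W: spelling-error "
    r"(?P<tag>Summary\(\S+\)|%description -l \S+) "
    r"(?P<word>\S+)"
)


def consolidate(lines):
    """Group spelling warnings by flagged word.

    Returns (by_word, others): by_word maps each flagged word to the set
    of packages it was reported for; others holds every line that is not
    a spelling warning, in its original order.
    """
    by_word = defaultdict(set)
    others = []
    for line in lines:
        match = SPELLING_RE.match(line)
        if match:
            by_word[match.group("word")].add(match.group("pkg"))
        else:
            others.append(line)
    return by_word, others
```

Feeding the lines of `rpmlint *rpm` through `consolidate` would keep the non-spelling messages together in `others`, so they are no longer buried, while each flagged word appears once in `by_word` regardless of how many tags or subpackages triggered it.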

> Do you have a reproducer where these features don't work, or concrete ideas how
> to improve them?

Do not check capitalized words not at the beginning of a sentence

> You can filter out the spell checker messages altogether if you don't like
> them, or disable the Enchant spell checker which results in the internal (very
> basic, not far from useless) spell checker being used which generates much less
> output. See /usr/share/doc/rpmlint-*/config.example

Already done that (got sick of missing warnings); it does not help when reviewing other people's packages that include problems that should have been detected at rpmlint time.

(In reply to comment #2)
> rpmlint /tmp/*rpm |grep spelling |sort |uniq
[...]

None of these seem to be reports of a "component" of the name of the package in question being flagged as a misspelling. But never mind.

> (random selection, culling all the duplicates)

It's a bug if exact duplicate messages are emitted. Could you check your random selection if there are any?

> Do not check capitalized words not at the beginning of a sentence

I don't think this can be done using the python-enchant API, but we can skip all capitalized words. I'll experiment with this.

(In reply to comment #3)
> (In reply to comment #2)
> > rpmlint /tmp/*rpm |grep spelling |sort |uniq
> [...]
>
> None of these seem to be reports of a "component" of the name of the package in
> question being flagged as a misspelling. But never mind.

It is still upstream names (mainly the author names, not the component names, though that was the case for Göschen) that seem to be caught most often.

> > (random selection, culling all the duplicates)
>
> It's a bug if exact duplicate messages are emitted. Could you check your
> random selection if there are any?

Will do as soon as the multi-hour rpmlint-using script which is running today is finished. Otherwise it will wedge results.

Without being able to test, I think that what happens most often is that packages with multiple subpackages all share parts of the same description, so rpmlint *rpm in the build dir will report the same bogus spellcheck errors many times over, and hide other messages in the mass.

(In reply to comment #3)
> (In reply to comment #2)
>
> > Do not check capitalized words not at the beginning of a sentence
>
> I don't think this can be done using the python-enchant API, but we can skip
> all capitalized words. I'll experiment with this.

I was wrong, it can be done with the python-enchant API and, as expected, it does reduce noise significantly at the expense of missing a few positives that should be flagged and were flagged before. But I think it's an improvement overall, and it is committed upstream now.

(In reply to comment #4)
> Without being able to test, I think that what happens most often is that
> packages with multiple subpackages all share parts of the same description, so
> rpmlint *rpm in the build dir will report the same bogus spellcheck errors many
> times over

The scope of dupe avoidance could in theory be extended to a common srpm when multiple packages are being checked, but I'm not quite convinced that it's necessarily a good thing or worth the trouble.

As long as the noise is reduced to reasonable levels I agree it's not worth the trouble. To reduce it more, the en_US dictionary needs to be corrected, and I have not the faintest idea where such correction requests can be sent.

"man enchant ; cat /usr/share/enchant/enchant.ordering" gives some hints. AFAIU the "myspell" they talk about means hunspell in Fedora.

Thanks a lot

rpmlint-0.95-2.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/rpmlint-0.95-2.fc13

rpmlint-0.95-2.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/rpmlint-0.95-2.fc12

rpmlint-0.95-2.fc13 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report.

rpmlint-0.95-2.fc12 has been pushed to the Fedora 12 stable repository. If problems still persist, please make note of it in this bug report.
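
For readers wondering what the heuristic requested in this report ("do not check capitalized words not at the beginning of a sentence") looks like in practice, here is a minimal sketch. It is not the code committed to rpmlint and it does not use python-enchant: a plain set stands in for the dictionary, and the function names `words_to_check` and `misspelled` are hypothetical.

```python
import re

# A token is either a word (letters, optionally joined by ' or -)
# or a sentence-ending punctuation mark.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*|[.!?]")
SENTENCE_END = {".", "!", "?"}


def words_to_check(text):
    """Yield the words that should be spell-checked.

    A capitalized word is skipped unless it starts a sentence, on the
    assumption that mid-sentence capitals are proper names.
    """
    sentence_start = True
    for token in TOKEN_RE.findall(text):
        if token in SENTENCE_END:
            sentence_start = True
            continue
        if sentence_start or not token[0].isupper():
            yield token
        sentence_start = False


def misspelled(text, known_words):
    """Return checked words whose lowercase form is not in known_words."""
    return [w for w in words_to_check(text) if w.lower() not in known_words]
```

With a description such as "The font was designed by Joachim Göschen. Teh glyphs are fullwidth.", the names in the first sentence are never looked up, while "Teh" and "fullwidth" still are. As noted in the thread, the trade-off is that a genuinely misspelled capitalized word in mid-sentence is no longer flagged.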