Bug 1496905
| Summary: | wc -l gives wrong line count for files with windows line breaks and single trailing CRNL | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | ell1e <el> | ||||
| Component: | coreutils | Assignee: | Kamil Dudka <kdudka> | ||||
| Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 27 | CC: | admiller, el, jamartis, jarodwilson, kdudka, kzak, ooprala, ovasik, p, skisela, twaugh | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2017-09-29 10:39:47 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Sorry, here is a less confusing hex dump output which shows the two \r\n sequences more clearly:

    jonas@cyberman:~$ hexdump -b test.txt
    0000000 141 142 143 015 012 144 145 146 015 012
    000000a

Those two outputs of hexdump do not match with each other. If you convert the sequence from comment #0 to octals, you get:

    142 141 015 143 144 012 146 145 012 015

The problem here is that 015 (CR) and 012 (LF) are not next to each other, so it cannot be recognized as a CR-LF sequence. If I use the input from comment #1, I get the expected result:

    % for x in 141 142 143 015 012 144 145 146 015 012; do printf "\x$(printf "%x" $((0$x)))"; done | hexdump -C
    00000000 61 62 63 0d 0a 64 65 66 0d 0a |abc..def..|
    0000000a
    % for x in 141 142 143 015 012 144 145 146 015 012; do printf "\x$(printf "%x" $((0$x)))"; done | wc -l
    2

Please attach the exact input file you are giving to 'wc -l' on input and paste the exact result that you get out of 'wc -l'.

(In reply to Kamil Dudka from comment #2)
> 142 141 015 143 144 012 146 145 012 015
>
> The problem here is that 015 (CR) and 012 (LF) are not next to each other,

Moreover, the second sequence is LF-CR instead of CR-LF.

The hexdump in the initial bug description is misleading. Please only consider the hexdump in Comment 1 with the proper byte-wise format.

The file from Comment 1 does NOT yield the expected result for me:

    jonas@cyberman:~$ hexdump -b test.txt
    0000000 141 142 143 015 012 144 145 146 015 012
    000000a
    jonas@cyberman:~$ wc -l test.txt
    2 test.txt
    jonas@cyberman:~$

Please note the expected result is 3(!) lines, because a trailing \r\n on Windows implies an additional trailing empty line (see also Notepad screenshot which clearly shows three lines).

(just to be super clear: a trailing \n on Linux doesn't imply a following empty line because on Linux \n just terminates the previous line as per POSIX: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206 However, on Windows \r\n appears to be always a true line break no matter if something follows or not.)

The correct result is just 2 lines for the sequence in comment #1 because you have only two CR-LF sequences there. It does not really matter if something follows after the last CR-LF sequence or not. 'wc -l' just counts the newline characters.

Alright, that makes sense. I just tried a file with no \n but other contents and indeed it shows 0, which is consistent with that. I did assume it counts the lines in the file, but I probably should have read the man page better.
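For anyone reproducing this, the counting rule described above can be checked directly. This is only a minimal sketch, assuming a POSIX-style shell with 'wc' and 'mktemp' available. The helper names count_newlines and editor_lines are hypothetical and not part of coreutils. editor_lines models the Notepad view by treating every line break as a separator between lines, which is an assumption about how the editor displays the file.

```shell
#!/bin/sh
# count_newlines (hypothetical helper): the value 'wc -l' reports,
# i.e. the number of LF (\n) bytes in the file.
count_newlines() {
    # Arithmetic expansion strips the padding some wc implementations add.
    echo $(( $(wc -l < "$1") ))
}

# editor_lines (hypothetical helper): assumes an editor treats each line
# break as a separator, so it shows one more line than there are breaks.
editor_lines() {
    echo $(( $(count_newlines "$1") + 1 ))
}

tmp=$(mktemp)
# Same bytes as test.txt in this report: "abc" CR LF "def" CR LF
printf 'abc\r\ndef\r\n' > "$tmp"
count_newlines "$tmp"   # prints 2, matching 'wc -l'
editor_lines "$tmp"     # prints 3, matching the Notepad screenshot
rm -f "$tmp"
```

Both numbers are consistent with the file contents. They answer different questions: how many newline characters the file holds, versus how many rows an editor draws.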
Created attachment 1332099
notepad.png

Description of problem:
wc -l gives the wrong line count for files with windows line breaks and single trailing CRNL.

These are the contents of a file with which this can be tested:

    jonas@cyberman:~$ hexdump test.txt
    0000000 6261 0d63 640a 6665 0a0d
    000000a
    jonas@cyberman:~$

Check notepad.png (attached) to see that this file shows up as 3 lines with a trailing empty line on Microsoft Windows (which I suggest should be canonical on how to interpret Windows line breaks).

This is what wc -l says on this exact file:

    jonas@cyberman:~$ wc -l test.txt
    2 test.txt
    jonas@cyberman:~$

Version-Release number of selected component (if applicable):
coreutils-8.27-16.fc27.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create above file with Windows line breaks
2. Check it out in Notepad on Windows
3. Try wc -l file.txt

Actual results:
wc -l doesn't agree with Notepad.exe about how many lines this file has

Expected results:
wc -l prints out same line count as visible in Notepad.exe on Windows

Additional info:
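A note on reading the plain 'hexdump' output in this description: without options, hexdump prints 16-bit words in host byte order, so on a little-endian machine such as x86_64 each pair of bytes appears swapped. The sketch below assumes a little-endian host and a POSIX 'sed'. The helper name words_to_bytes is hypothetical. It swaps the two bytes of every word to recover the order in which they are stored in the file.

```shell
#!/bin/sh
# words_to_bytes (hypothetical helper): takes the 16-bit words printed by
# plain 'hexdump' on a little-endian host and swaps the two bytes of each
# word, giving the bytes in file order.
words_to_bytes() {
    printf '%s\n' "$1" |
        sed 's/\([0-9a-f][0-9a-f]\)\([0-9a-f][0-9a-f]\)/\2 \1/g'
}

# The words from the dump in this report:
words_to_bytes '6261 0d63 640a 6665 0a0d'
# prints: 61 62 63 0d 0a 64 65 66 0d 0a
```

Read this way, the dump is "abc" CR LF "def" CR LF, the same bytes that 'hexdump -b' and 'hexdump -C' show one byte at a time.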