| Summary: | readpst -r incorrectly names a file according to its type: Recoverable Items/Calendar Logging/mbox | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Ivan Zakharyaschev <imz> |
| Component: | libpst | Assignee: | Carl Byington <carl> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Priority: | unspecified |
| Version: | 27 | CC: | carl, ppisar |
| Hardware: | All | OS: | All |
| Fixed In Version: | 0.6.69 | Flags: | carl: needinfo-, carl: rhel-rawhide- |
| Last Closed: | 2017-08-16 15:50:07 UTC | Type: | Bug |
I must not have explained the problem clearly enough. The types reported by the file utility are correct. I looked inside the file, and it does contain calendar entries, not mail:

/tmp/readpst/root/mailtst/Recoverable Items/Calendar Logging/mbox: vCalendar calendar file

The problem is the name that readpst -r gives the file. It is named "mbox", which does not match its actual type; it should be named "calendar".

Created attachment 1193404
fgrep -1 'I have a' ../tmp/readpst/root/mailtst.readpst.log

> Can you re-run that readpst command with
>
> -d some.log.file.txt
>
> and then
>
> grep "I have" some.log.file.txt

I've done that, although the log file became really huge (27G). I grepped for "I have a" so as not to pull in extra content from the message bodies. I checked the readpst.c source code, and this narrower pattern does not seem to miss any debugging message. I've also included one line of context.

I now believe the problem is a more general one:
Some folders contain items of several different types.
I ran readpst -e and saw what probably happens in this specific case. The first item parsed is not an email, and the folder type is undefined, so the file is created as mbox. The type then gets overridden, and calendar items are saved there:
$ ls readpst/root.extensions/mailtst/Recoverable\ Items/Calendar\ Logging/ | head
1.ics
10.ics
100.eml
101.ics
102.eml
103.ics
104.eml
105.ics
106.ics
107.eml
$
This is not nice: the file is named "mbox", but it does not contain well-formed emails.
I believe this specific issue requires a fix.
More generally, I'm thinking of a solution in which readpst -r would create several files in the same folder (mbox, calendar) in such cases. This would save everything without mixing item types.

Modes other than readpst -r are not very attractive to me, because they do not create mbox files. mbox is the only output format that dovecot fully understands, and hence the only one usable with doveadm sync, which I use to read and import whole user accounts. MH is not supported by dovecot.
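Until readpst does this itself, a similar per-folder layout can be produced by post-processing readpst -e output. In the result, a mixed folder gets one mbox and one calendar file.

The sketch below is a minimal illustration under stated assumptions, not readpst behaviour:

* The function name split_by_type is hypothetical.
* It assumes the one-file-per-item N.eml / N.ics layout shown above.
* It writes the mbox in mboxrd style. A "From " separator with a placeholder sender and date is added unless a file already starts with one, and body lines matching ^>*From get one more ">".
* It concatenates the .ics files, relying on RFC 5545 allowing several VCALENDAR objects in one iCalendar stream.

```shell
# split_by_type DIR: hypothetical post-processing of one folder of
# "readpst -e" output. Collects DIR/*.eml into DIR/mbox and DIR/*.ics into
# DIR/calendar, so that neither file holds items of the wrong type.
split_by_type() {
    dir=$1

    # Mail: append each .eml file as one mboxrd-style message.
    set -- "$dir"/*.eml
    if [ -e "$1" ]; then
        for f in "$@"; do
            awk '
                # Placeholder separator, added unless the file already has one.
                NR == 1 && !/^From / { print "From MAILER-DAEMON Thu Jan  1 00:00:00 1970" }
                # mboxrd quoting: body lines matching ^>*From get one more ">".
                NR > 1 && /^>*From / { $0 = ">" $0 }
                { print }
                # Blank line between consecutive messages.
                END { print "" }
            ' "$f"
        done > "$dir/mbox"
    fi

    # Calendar items: several VCALENDAR objects may share one file.
    set -- "$dir"/*.ics
    if [ -e "$1" ]; then
        cat "$@" > "$dir/calendar"
    fi

    return 0
}
```

For example, split_by_type "readpst/root.extensions/mailtst/Recoverable Items/Calendar Logging" would write mbox and calendar next to the per-item files. The per-item files themselves are left in place, and an existing mbox or calendar file in that folder would be overwritten.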
Here are all the folders, each followed by the types of items it contains:
$ find readpst/root.extensions/ -type d -print -exec sh -c 'ls "{}" | fgrep . | cut -d. --fields=2 | sort -u' ';'
readpst/root.extensions/
readpst/root.extensions/mailtst
readpst/root.extensions/mailtst/Заметки
readpst/root.extensions/mailtst/МСЭД
eml
readpst/root.extensions/mailtst/Входящие
eml
readpst/root.extensions/mailtst/Входящие/103
eml
readpst/root.extensions/mailtst/Входящие/Миграция ADEX
eml
readpst/root.extensions/mailtst/Входящие/Оперативка
eml
readpst/root.extensions/mailtst/Входящие/СПО - ФСТЭК
eml
readpst/root.extensions/mailtst/Входящие/СУТП
eml
readpst/root.extensions/mailtst/Входящие/Схемы ЛВС
eml
readpst/root.extensions/mailtst/Нежелательная почта
readpst/root.extensions/mailtst/ТТ на согласование
eml
readpst/root.extensions/mailtst/Junk
eml
readpst/root.extensions/mailtst/Задачи
readpst/root.extensions/mailtst/Прочитать и дать ответ!
eml
readpst/root.extensions/mailtst/Прочитать и дать ответ!/Что-то важное
eml
readpst/root.extensions/mailtst/Отправленные
eml
readpst/root.extensions/mailtst/aрхив
eml
readpst/root.extensions/mailtst/Контакты
vcf
readpst/root.extensions/mailtst/Контакты/Recipient Cache
vcf
readpst/root.extensions/mailtst/Черновики
readpst/root.extensions/mailtst/Календарь
ics
readpst/root.extensions/mailtst/Sent
eml
readpst/root.extensions/mailtst/Удаленные
eml
ics
readpst/root.extensions/mailtst/ЕКП
eml
readpst/root.extensions/mailtst/Recoverable Items
readpst/root.extensions/mailtst/Recoverable Items/Deletions
eml
ics
readpst/root.extensions/mailtst/Recoverable Items/Calendar Logging
eml
ics
readpst/root.extensions/mailtst/Предлагаемые контакты
vcf
readpst/root.extensions/mailtst/Ошибки синхронизации
readpst/root.extensions/mailtst/Ошибки синхронизации/Конфликты
eml
$
So the problematic folders are not ordinary mail folders:
* Удаленные (means "Deleted" in Russian)
* Recoverable Items/Deletions
* Recoverable Items/Calendar Logging
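The find one-liner above prints the extensions of every folder. To print only the folders that mix item types, one could filter on the number of distinct extensions. These are the folders where readpst -r wrote a misleading mbox.

This is a minimal sketch, assuming the readpst -e layout shown above (one N.ext file per item). The function name mixed_folders is hypothetical, and folder names containing newlines are not handled.

```shell
# mixed_folders ROOT: print each folder under ROOT (readpst -e layout) that
# holds more than one item type, followed by the extensions found there.
mixed_folders() {
    find "$1" -type d | while IFS= read -r dir; do
        # Distinct extensions of the files directly inside this folder.
        exts=$(find "$dir" -maxdepth 1 -type f -name '*.*' | sed 's/.*\.//' | sort -u)
        [ -n "$exts" ] || continue
        n=$(printf '%s\n' "$exts" | grep -c .)
        if [ "$n" -gt 1 ]; then
            printf '%s: %s\n' "$dir" "$(printf '%s\n' "$exts" | paste -sd ' ' -)"
        fi
    done
}
```

Run as mixed_folders readpst/root.extensions/mailtst, it should report the same three folders as the list above, each with "eml ics".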
Created attachment 1193451
libpst-no-bad-mboxes.patch
The attached patch fixes the problem with bad mboxes.
A related question occurred to me: in Thunderbird mode, is it OK that a zero is written to .type for such folders? (I have not changed this.)
I checked the result by comparing it with the previous output (a big account, around 6G):
-bash-4.3$ diff -rq --exclude='*.log' -Iboundary -Iname -IDTSTAMP root root.nobad
Files root/mailtst/Recoverable Items/Calendar Logging/mbox and root.nobad/mailtst/Recoverable Items/Calendar Logging/mbox differ
-bash-4.3$ find root.nobad/ -name mbox -print0 | xargs -0 file
root.nobad/mailtst/МСЭД/mbox: ISO-8859 mail text, with very long lines
root.nobad/mailtst/Входящие/Оперативка/mbox: UTF-8 Unicode mail text
root.nobad/mailtst/Входящие/Схемы ЛВС/mbox: Non-ISO extended-ASCII mail text, with very long lines
root.nobad/mailtst/Входящие/103/mbox: UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/Входящие/mbox: ISO-8859 mail text, with very long lines
root.nobad/mailtst/Входящие/СПО - ФСТЭК/mbox: Non-ISO extended-ASCII mail text, with very long lines, with LF, NEL line terminators
root.nobad/mailtst/Входящие/Миграция ADEX/mbox: UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/Входящие/СУТП/mbox: UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/ТТ на согласование/mbox: UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/Junk/mbox: UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/Прочитать и дать ответ!/Что-то важное/mbox: Non-ISO extended-ASCII mail text, with very long lines, with LF, NEL line terminators
root.nobad/mailtst/Прочитать и дать ответ!/mbox: Non-ISO extended-ASCII mail text, with very long lines, with LF, NEL line terminators
root.nobad/mailtst/Отправленные/mbox: UTF-8 Unicode HTML document text, with very long lines
root.nobad/mailtst/aрхив/mbox: Non-ISO extended-ASCII mail text, with very long lines
root.nobad/mailtst/Sent/mbox: UTF-8 Unicode text
root.nobad/mailtst/Удаленные/mbox: Non-ISO extended-ASCII mail text, with very long lines, with LF, NEL line terminators
root.nobad/mailtst/ЕКП/mbox: UTF-8 Unicode mail text
root.nobad/mailtst/Recoverable Items/Deletions/mbox: UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/Recoverable Items/Calendar Logging/mbox: UTF-8 Unicode mail text
root.nobad/mailtst/Ошибки синхронизации/Конфликты/mbox: UTF-8 Unicode mail text
-bash-4.3$
Everything is fine.
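The check above relies on the content heuristics of file. Those heuristics report some valid mboxes as "HTML document text" or plain "UTF-8 Unicode text".

A stricter, format-based check is to flag every non-empty mbox whose first line is not a "From " separator line. This is a minimal sketch; the function name bad_mboxes is hypothetical. It only inspects the first line, so a file that starts with an email but contains stray calendar items later would not be flagged.

```shell
# bad_mboxes ROOT: print every non-empty file named "mbox" under ROOT whose
# first line is not an mbox "From " separator line.
bad_mboxes() {
    find "$1" -type f -name mbox | while IFS= read -r f; do
        [ -s "$f" ] || continue             # an empty mbox is valid
        first=$(head -n 1 "$f")
        case $first in
            "From "*) ;;                    # starts like an mbox
            *) printf '%s\n' "$f" ;;        # e.g. a BEGIN:VCALENDAR line
        esac
    done
}
```

Run against the old output as bad_mboxes root, it would be expected to list "root/mailtst/Recoverable Items/Calendar Logging/mbox". Against root.nobad it should print nothing.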
By the way, the previous output seems to have saved to "root/mailtst/Recoverable Items/Calendar Logging/mbox" the same calendars that now appear as attachments in the new, correct mbox.
Fixed.

This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle. Changing version to '27'.

Fixed in 0.6.69.
Description of problem:
readpst -r incorrectly names a file according to its type: Recoverable Items/Calendar Logging/mbox

When converting the folders recursively with:

readpst -w -o /tmp/readpst//root -r -D /root/mailtst.pst

the result has a mismatch between the name and the type of Recoverable Items/Calendar Logging/mbox.

Version-Release number of selected component (if applicable):
0.6.67

How reproducible:
100%

Steps to Reproduce:
readpst -w -o /tmp/readpst//root -r -D /root/mailtst.pst

This particular source file was not easy to find; my previous test files didn't contain this folder. It is also huge and contains private information, so I can't send it. I can, however, perform whatever actions you need to debug and test it.

Actual results:
The result has a mismatch between the name and the type of Recoverable Items/Calendar Logging/mbox:

    # find /tmp/readpst/root/mailtst/ -type f -print0 | xargs -0 file
    ...
    /tmp/readpst/root/mailtst/Recoverable Items/Calendar Logging/mbox: vCalendar calendar file
    /tmp/readpst/root/mailtst/Recoverable Items/Deletions/mbox: UTF-8 Unicode mail text, with very long lines
    ...
    /tmp/readpst/root/mailtst/Календарь/calendar: vCalendar calendar file
    ...

Expected results:
The file should be named "calendar".

Additional info:
This prevents correct use of the output as a mail_location for dovecot, as in:

doveadm -Dv sync -u USER -1 -R mbox:/tmp/readpst//root/mailtst:UTF-8:DIRNAME=mbox:INDEX=/tmp/readpst//root/mailtst/.dovecot-index

My mail to the libpst maintainer (carl at five-ten-sg etc) gets rejected. Perhaps the mail from bugzilla will somehow reach him...