Bug 1369499 - readpst -r incorrectly names a file according to its type: Recoverable Items/Calendar Logging/mbox
Summary: readpst -r incorrectly names a file according to its type: Recoverable Items/...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: libpst
Version: 27
Hardware: All
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Carl Byington
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-23 15:00 UTC by Ivan Zakharyaschev
Modified: 2021-05-14 04:44 UTC
CC List: 2 users

Fixed In Version: 0.6.69
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-16 15:50:07 UTC
Type: Bug
Embargoed:
carl: needinfo-
carl: rhel-rawhide-


Attachments
fgrep -1 'I have a' ../tmp/readpst/root/mailtst.readpst.log (20.08 KB, text/plain)
2016-08-23 17:46 UTC, Ivan Zakharyaschev
no flags
libpst-no-bad-mboxes.patch (2.68 KB, patch)
2016-08-24 01:16 UTC, Ivan Zakharyaschev
carl: review-

Description Ivan Zakharyaschev 2016-08-23 15:00:05 UTC
Description of problem:

readpst -r incorrectly names a file according to its type: Recoverable Items/Calendar Logging/mbox

When exporting recursive folders with:

readpst -w -o /tmp/readpst//root -r -D /root/mailtst.pst

the output file Recoverable Items/Calendar Logging/mbox has a name that does not match its type.


Version-Release number of selected component (if applicable):

0.6.67

How reproducible:

100%

Steps to Reproduce:

readpst -w -o /tmp/readpst//root -r -D /root/mailtst.pst

This particular source file was hard to find (my previous test files did not contain this folder). It is also huge and contains private information, so I cannot send it, but I can run whatever debugging and testing steps you need.

Actual results:

the output file Recoverable Items/Calendar Logging/mbox has a name that does not match its type:

# find /tmp/readpst/root/mailtst/ -type f -print0 | xargs -0 file
...
/tmp/readpst/root/mailtst/Recoverable Items/Calendar Logging/mbox: vCalendar calendar file
/tmp/readpst/root/mailtst/Recoverable Items/Deletions/mbox:        UTF-8 Unicode mail text, with very long lines
...
/tmp/readpst/root/mailtst/Календарь/calendar:                      vCalendar calendar file
...


Expected results:

It is expected to be named "calendar".

Additional info:

This prevents the output from being used correctly as a dovecot mail_location, as in:

doveadm -Dv sync -u USER -1 -R mbox:/tmp/readpst//root/mailtst:UTF-8:DIRNAME=mbox:INDEX=/tmp/readpst//root/mailtst/.dovecot-index
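Before pointing doveadm at the output, a rough check like the following sketch can flag files that are named mbox but do not look like mbox. It is an illustration only: check_mboxes is a hypothetical helper (not part of libpst or dovecot), it inspects only the first line of each non-empty file, and it assumes that a well-formed mbox starts with a "From " separator line.

```sh
# Hypothetical helper (not part of libpst or dovecot): print every non-empty
# file named "mbox" under a readpst -r output tree whose first line is not an
# mbox "From " separator. Returns 1 if any such file is found, 0 otherwise.
check_mboxes() {
    bad=$(find "$1" -type f -name mbox -exec sh -c '
        for f in "$@"; do
            [ -s "$f" ] || continue            # ignore empty files
            first=$(head -n 1 "$f")
            case $first in
                "From "*) ;;                   # looks like an mbox separator
                *) printf "%s\n" "$f" ;;       # anything else, e.g. BEGIN:VCALENDAR
            esac
        done' sh {} +)
    [ -n "$bad" ] && printf '%s\n' "$bad"
    [ -z "$bad" ]
}
```

For example, check_mboxes /tmp/readpst/root/mailtst would be expected to report Recoverable Items/Calendar Logging/mbox, since file identifies it as vCalendar data rather than mail.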

My mail to the libpst maintainer (carl at five-ten-sg etc) gets rejected. Perhaps mail from Bugzilla will somehow reach him...

Comment 1 Ivan Zakharyaschev 2016-08-23 15:24:22 UTC
I may not have explained the problem clearly enough:

The types reported by file are correct. (I looked inside the file, and it does contain calendar entries, not mail.)

/tmp/readpst/root/mailtst/Recoverable Items/Calendar Logging/mbox: vCalendar calendar file

But the name chosen by readpst -r is wrong: the file is named "mbox", which does not match its actual type. It should be named "calendar".

Comment 2 Ivan Zakharyaschev 2016-08-23 17:46:37 UTC
Created attachment 1193404
fgrep -1 'I have a' ../tmp/readpst/root/mailtst.readpst.log

> Can you re-run that readpst command with
> 
>   -d some.log.file.txt
> 
> and then
> 
>   grep "I have" some.log.file.txt

I've done that, although the log file became really huge (27G).

I grepped for "I have a" to avoid pulling in extra content from the message bodies. (I checked the readpst.c source code, and this narrower pattern does not seem to miss any of the relevant debugging messages.)

I've also included one line of context.

Comment 3 Ivan Zakharyaschev 2016-08-23 19:53:08 UTC
I now believe the problem is more general:

Some folders contain items of several different types.

I ran readpst -e and see that in this specific case the first item parsed is probably not an email, while the folder type is probably undefined. The file is therefore created as mbox, but the type then gets overridden and calendar items are saved there:

$ ls readpst/root.extensions/mailtst/Recoverable\ Items/Calendar\ Logging/ | head
1.ics
10.ics
100.eml
101.ics
102.eml
103.ics
104.eml
105.ics
106.ics
107.eml
$ 

This is bad: the file is named "mbox", but it does not contain well-formed email.

I believe this specific issue requires a fix.

More generally, I'm considering a solution in which readpst -r would create several files in the same folder (mbox, calendar) in such cases. That would save everything without mixing up item types. (Modes other than readpst -r are not very attractive to me because they do not create mbox, the only output format that dovecot, and hence doveadm sync, fully understands; I use doveadm sync to read and import whole user accounts. MH is not supported by dovecot.)
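The per-folder, per-type idea above can be sketched roughly as follows. This is only an illustration that works on readpst -e output, not a description of how readpst is implemented; both function names are hypothetical, and mapping .vcf items to a file named "contacts" is an assumption (only "mbox" and "calendar" appear in this report).

```sh
# Hypothetical sketch of the proposal: one output file per item type.
# "mbox" and "calendar" are names readpst -r already uses in this report;
# "contacts" for .vcf items is an assumption.
type_file_name() {
    case $1 in
        eml) echo mbox ;;
        ics) echo calendar ;;
        vcf) echo contacts ;;
        *)   echo "unknown-$1" ;;   # hypothetical fallback for other types
    esac
}

# Print the per-type files that one readpst -e folder would need under the
# proposal (one line per distinct type, sorted).
files_needed() {
    for f in "$1"/*.*; do
        [ -f "$f" ] || continue     # skips the unexpanded pattern and directories
        type_file_name "${f##*.}"
    done | sort -u
}
```

For a folder like Recoverable Items/Calendar Logging, whose readpst -e listing mixes .ics and .eml items, files_needed prints calendar and mbox: two separate files instead of one mislabeled mbox.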

Here one can see all the folders with different types of items:

$ find readpst/root.extensions/ -type d -print -exec sh -c 'ls "{}" | fgrep . | cut -d. --fields=2 | sort -u' ';'
readpst/root.extensions/
readpst/root.extensions/mailtst
readpst/root.extensions/mailtst/Заметки
readpst/root.extensions/mailtst/МСЭД
eml
readpst/root.extensions/mailtst/Входящие
eml
readpst/root.extensions/mailtst/Входящие/103
eml
readpst/root.extensions/mailtst/Входящие/Миграция ADEX
eml
readpst/root.extensions/mailtst/Входящие/Оперативка
eml
readpst/root.extensions/mailtst/Входящие/СПО - ФСТЭК
eml
readpst/root.extensions/mailtst/Входящие/СУТП
eml
readpst/root.extensions/mailtst/Входящие/Схемы ЛВС
eml
readpst/root.extensions/mailtst/Нежелательная почта
readpst/root.extensions/mailtst/ТТ на согласование
eml
readpst/root.extensions/mailtst/Junk
eml
readpst/root.extensions/mailtst/Задачи
readpst/root.extensions/mailtst/Прочитать и дать ответ!
eml
readpst/root.extensions/mailtst/Прочитать и дать ответ!/Что-то важное
eml
readpst/root.extensions/mailtst/Отправленные
eml
readpst/root.extensions/mailtst/aрхив
eml
readpst/root.extensions/mailtst/Контакты
vcf
readpst/root.extensions/mailtst/Контакты/Recipient Cache
vcf
readpst/root.extensions/mailtst/Черновики
readpst/root.extensions/mailtst/Календарь
ics
readpst/root.extensions/mailtst/Sent
eml
readpst/root.extensions/mailtst/Удаленные
eml
ics
readpst/root.extensions/mailtst/ЕКП
eml
readpst/root.extensions/mailtst/Recoverable Items
readpst/root.extensions/mailtst/Recoverable Items/Deletions
eml
ics
readpst/root.extensions/mailtst/Recoverable Items/Calendar Logging
eml
ics
readpst/root.extensions/mailtst/Предлагаемые контакты
vcf
readpst/root.extensions/mailtst/Ошибки синхронизации
readpst/root.extensions/mailtst/Ошибки синхронизации/Конфликты
eml
$ 

So the problematic folders are not ordinary mail folders:

* Удаленные (means "Deleted" in Russian)
* Recoverable Items/Deletions
* Recoverable Items/Calendar Logging
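To print only the mixed folders instead of scanning the full find output, a small variation could be used. This is a sketch: mixed_folders is a hypothetical helper, and it assumes the readpst -e layout of one file per item named N.ext directly inside each folder.

```sh
# Hypothetical helper: print each directory under a readpst -e output tree
# whose files (named N.ext, one per item) use more than one distinct
# extension, e.g. both .eml and .ics.
mixed_folders() {
    find "$1" -type d -exec sh -c '
        for d in "$@"; do
            # count distinct extensions among regular files directly in $d
            n=$(for f in "$d"/*.*; do
                    [ -f "$f" ] && printf "%s\n" "${f##*.}"
                done | sort -u | wc -l)
            if [ "$n" -gt 1 ]; then
                printf "%s\n" "$d"
            fi
        done' sh {} +
}
```

Judging by the listing above, running it on readpst/root.extensions/ should print the same three folders.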

Comment 4 Ivan Zakharyaschev 2016-08-24 01:16:22 UTC
Created attachment 1193451
libpst-no-bad-mboxes.patch

The attached patch fixes the problem with bad mboxes.

A related question occurred to me: in Thunderbird mode, is it OK that zero is written to .type for such folders? (I have not changed this.)

I checked the result by comparing it to the previous output (a large account, around 6G):

-bash-4.3$ diff -rq --exclude='*.log' -Iboundary -Iname -IDTSTAMP root root.nobad  
Files root/mailtst/Recoverable Items/Calendar Logging/mbox and root.nobad/mailtst/Recoverable Items/Calendar Logging/mbox differ
-bash-4.3$ find root.nobad/ -name mbox -print0 | xargs -0 file
root.nobad/mailtst/МСЭД/mbox:                                  ISO-8859 mail text, with very long lines
root.nobad/mailtst/Входящие/Оперативка/mbox:                   UTF-8 Unicode mail text
root.nobad/mailtst/Входящие/Схемы ЛВС/mbox:                    Non-ISO extended-ASCII mail text, with very long lines
root.nobad/mailtst/Входящие/103/mbox:                          UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/Входящие/mbox:                              ISO-8859 mail text, with very long lines
root.nobad/mailtst/Входящие/СПО - ФСТЭК/mbox:                  Non-ISO extended-ASCII mail text, with very long lines, with LF, NEL line terminators
root.nobad/mailtst/Входящие/Миграция ADEX/mbox:                UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/Входящие/СУТП/mbox:                         UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/ТТ на согласование/mbox:                    UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/Junk/mbox:                                  UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/Прочитать и дать ответ!/Что-то важное/mbox: Non-ISO extended-ASCII mail text, with very long lines, with LF, NEL line terminators
root.nobad/mailtst/Прочитать и дать ответ!/mbox:               Non-ISO extended-ASCII mail text, with very long lines, with LF, NEL line terminators
root.nobad/mailtst/Отправленные/mbox:                          UTF-8 Unicode HTML document text, with very long lines
root.nobad/mailtst/aрхив/mbox:                                 Non-ISO extended-ASCII mail text, with very long lines
root.nobad/mailtst/Sent/mbox:                                  UTF-8 Unicode text
root.nobad/mailtst/Удаленные/mbox:                             Non-ISO extended-ASCII mail text, with very long lines, with LF, NEL line terminators
root.nobad/mailtst/ЕКП/mbox:                                   UTF-8 Unicode mail text
root.nobad/mailtst/Recoverable Items/Deletions/mbox:           UTF-8 Unicode mail text, with very long lines
root.nobad/mailtst/Recoverable Items/Calendar Logging/mbox:    UTF-8 Unicode mail text
root.nobad/mailtst/Ошибки синхронизации/Конфликты/mbox:        UTF-8 Unicode mail text
-bash-4.3$ 

Everything is fine.

BTW, the calendars that the previous output saved to "root/mailtst/Recoverable Items/Calendar Logging/mbox" seem to be the same ones that now appear as attachments in the new, correct mbox.

Comment 6 Carl Byington 2016-09-05 16:14:14 UTC
fixed.

Comment 7 Jan Kurik 2017-08-15 08:32:02 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle.
Changing version to '27'.

Comment 8 Carl Byington 2017-08-16 15:50:07 UTC
fixed in 0.6.69

