Bug 1824878 - cryptsetup luksFormat fails with "Cannot wipe header on device"
Summary: cryptsetup luksFormat fails with "Cannot wipe header on device"
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: cryptsetup
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Milan Broz
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-16 15:31 UTC by Martin Pitt
Modified: 2020-04-19 18:14 UTC (History)

Fixed In Version: cryptsetup-2.3.1-3.fc33
Last Closed: 2020-04-16 19:50:49 UTC
Type: Bug


Description Martin Pitt 2020-04-16 15:31:15 UTC
Description of problem: In the last few days, Rawhide got a regression with cryptsetup formatting. This was spotted by Cockpit's integration tests [1].


Version-Release number of selected component (if applicable):

cryptsetup-2.3.1-2.fc33.x86_64
kernel-core-5.7.0-0.rc1.20200414git8632e9b5645b.1.fc33.x86_64

How reproducible: Always


Steps to Reproduce:
1. Create a blank test device:

modprobe scsi_debug dev_size_mb=50

This should create a new /dev/sdX (check dmesg). The following steps use /dev/sda, but double check so that you don't kill your hard disk!

2. parted -s /dev/sda mktable msdos
3. parted -s /dev/sda mkpart primary ext2 1M 25
4. echo einszweidrei | cryptsetup luksFormat /dev/sda1


Actual results: Fails with

    Cannot wipe header on device /dev/sda1.


which is pretty bogus as this is just a fresh partition on a zero device:

# blkid -p /dev/sda1
/dev/sda1: PART_ENTRY_SCHEME="dos" PART_ENTRY_UUID="7d002ca3-01" PART_ENTRY_TYPE="0x83" PART_ENTRY_NUMBER="1" PART_ENTRY_OFFSET="2048" PART_ENTRY_SIZE="47104" PART_ENTRY_DISK="8:0"

Expected results: cryptsetup works.

The above is what our test does. But the partition isn't even necessary, this fails with an unpartitioned device as well:

# modprobe scsi_debug dev_size_mb=50
# echo einszweidrei | cryptsetup luksFormat /dev/sda
Cannot wipe header on device /dev/sda.

Additional info:

[1] https://bodhi.fedoraproject.org/updates/FEDORA-2020-7730bb45a4

Comment 1 Milan Broz 2020-04-16 16:23:46 UTC
Could you please run the failing command with --debug and post output here? (This will also include version for kernel / scsi_debug module.)

There were no changes in the code. Does it fail on a normal disk (I expect sda is the scsi_debug device)?
(Also, if you add "sleep 3" before luksFormat, does it still fail?)

Comment 2 Martin Pitt 2020-04-16 18:45:11 UTC
> run the failing command with --debug

# echo einszweidrei | cryptsetup --debug luksFormat /dev/sda
# cryptsetup 2.3.1 processing "cryptsetup --debug luksFormat /dev/sda"
# Running command luksFormat.
# Locking memory.
# Installing SIGINT/SIGTERM handler.
# Unblocking interruption on signal.
# Allocating context for crypt device /dev/sda.
# Trying to open and read device /dev/sda with direct-io.
# Initialising device-mapper backend library.
# STDIN descriptor passphrase entry requested.
# Checking new password using default pwquality settings.
# New password libpwquality score is 50.
# Crypto backend (OpenSSL 1.1.1f FIPS  31 Mar 2020) initialized in cryptsetup library version 2.3.1.
# Detected kernel Linux 5.7.0-0.rc1.20200414git8632e9b5645b.1.fc33.x86_64 x86_64.
# Only 3 active CPUs detected, PBKDF threads decreased from 4 to 3.
# Not enough physical memory detected, PBKDF max memory decreased from 1048576kB to 1002572kB.
# PBKDF argon2i, time_ms 2000 (iterations 0), max_memory_kb 1002572, parallel_threads 3.
# Formatting device /dev/sda as type LUKS2.
# Topology: IO (512/524288), offset = 0; Required alignment is 1048576 bytes.
# Checking if cipher aes-xts-plain64 is usable.
# Using userspace crypto wrapper to access keyslot area.
# Formatting LUKS2 with JSON metadata area 12288 bytes and keyslots area 16744448 bytes.
# Creating new digest 0 (pbkdf2).
# Setting PBKDF2 type key digest 0.
# Running pbkdf2(sha256) benchmark.
# PBKDF benchmark: memory cost = 0, iterations = 546133, threads = 0 (took 60 ms)
# PBKDF benchmark: memory cost = 0, iterations = 587108, threads = 0 (took 893 ms)
# Benchmark returns pbkdf2(sha256) 587108 iterations, 0 memory, 0 threads (for 512-bits key).
# Segment 0 assigned to digest 0.
# Segment "0" is missing "offset" (string) specification.
Cannot wipe header on device /dev/sda.
# Releasing crypt device /dev/sda context.
# Releasing device-mapper backend.
# Closing read only fd for /dev/sda.
# Unlocking memory.
Command failed with code -1 (wrong or missing parameters).

> There were no changes in the code.

Possibly due to the new util-linux? https://bodhi.fedoraproject.org/updates/FEDORA-2020-77d2be2c3a just landed one day ago.

> Does it fail on a normal disk (I expect sda is the scsi_debug device)?

Yes, I tried that with scsi_debug, as per the description. That's about as close to a "normal disk" as you can get with emulation. It fails in the same way for a loop device:

dd if=/dev/zero of=/var/tmp/disk bs=1M count=50
losetup -f --show /var/tmp/disk
echo einszweidrei | cryptsetup --debug luksFormat /dev/loop0 

I can try again with attaching another qemu disk, but honestly I don't expect that to look much different -- this part of cryptsetup deals with the block read/write level.

> if you add "sleep 3" before luksFormat, does it still fail?

Oh yes, this is not at all a race condition. You can run them all in one go, or one by one (or repeatedly) in an interactive shell.

I am trying this on the current Fedora rawhide cloud image, so should be easy enough to reproduce.

strace isn't really that insightful either, it's not really doing much aside from stating:

write(1, "# Benchmark returns pbkdf2(sha256) 424868 iterations, 0 memory, 0 threads (for 512-bits key).\n", 94) = 94
write(1, "# Segment 0 assigned to digest 0.\n", 34) = 34
openat(AT_FDCWD, "/dev/sda", O_RDONLY)  = 8
fstat(8, {st_dev=makedev(0, 0x5), st_ino=39772, st_mode=S_IFBLK|0660, st_nlink=1, st_uid=0, st_gid=6, st_blksize=4096, st_blocks=0, st_rdev=makedev(0x8, 0), st_atime=1587062613 /* 2020-04-16T18:43:33.340451901+0000 */, st_atime_nsec=340451901, st_mtime=1587062613 /* 2020-04-16T18:43:33.340451901+0000 */, st_mtime_nsec=340451901, st_ctime=1587062613 /* 2020-04-16T18:43:33.340451901+0000 */, st_ctime_nsec=340451901}) = 0
ioctl(8, BLKGETSIZE64, [52428800])      = 0
close(8)                                = 0
write(1, "# Segment \"0\" is missing \"offset\" (string) specification.\n", 58) = 58
write(2, "Cannot wipe header on device /dev/sda.\n", 39) = 39
write(1, "# Releasing crypt device /dev/sda context.\n", 43) = 43
write(1, "# Releasing device-mapper backend.\n", 35) = 35
write(1, "# Closing read only fd for /dev/sda.\n", 37) = 37
close(5)                                = 0

Comment 3 Martin Pitt 2020-04-16 18:53:52 UTC
> Possibly due to the new util-linux? https://bodhi.fedoraproject.org/updates/FEDORA-2020-77d2be2c3a just landed one day ago.

Nack, my VM still actually has util-linux-2.35.1-7.fc33.x86_64, and that already landed 23 days ago (and this was working last week still). I ran `dnf update` now which pulled in some new bits, including the latest util-linux:

Upgrading:
 glibc                                          x86_64                       2.31.9000-9.fc33                           koji-f33-build                       3.5 M
 glibc-common                                   x86_64                       2.31.9000-9.fc33                           koji-f33-build                       1.8 M
 glibc-langpack-en                              x86_64                       2.31.9000-9.fc33                           koji-f33-build                       658 k
 glusterfs                                      x86_64                       7.5-1.fc33                                 koji-f33-build                       657 k
 glusterfs-api                                  x86_64                       7.5-1.fc33                                 koji-f33-build                        95 k
 glusterfs-cli                                  x86_64                       7.5-1.fc33                                 koji-f33-build                       188 k
 glusterfs-client-xlators                       x86_64                       7.5-1.fc33                                 koji-f33-build                       840 k
 glusterfs-fuse                                 x86_64                       7.5-1.fc33                                 koji-f33-build                       145 k
 glusterfs-libs                                 x86_64                       7.5-1.fc33                                 koji-f33-build                       433 k
 libblkid                                       x86_64                       2.35.1-9.fc33                              koji-f33-build                       152 k
 libfdisk                                       x86_64                       2.35.1-9.fc33                              koji-f33-build                       204 k
 libfido2                                       x86_64                       1.4.0-1.fc33                               koji-f33-build                        67 k
 libmount                                       x86_64                       2.35.1-9.fc33                              koji-f33-build                       179 k
 libsmartcols                                   x86_64                       2.35.1-9.fc33                              koji-f33-build                       120 k
 libssh                                         x86_64                       0.9.4-2.fc33                               koji-f33-build                       215 k
 libssh-config                                  noarch                       0.9.4-2.fc33                               koji-f33-build                        11 k
 libuuid                                        x86_64                       2.35.1-9.fc33                              koji-f33-build                        28 k
 pcre2                                          x86_64                       10.35-0.1.RC1.fc33                         koji-f33-build                       233 k
 pcre2-syntax                                   noarch                       10.35-0.1.RC1.fc33                         koji-f33-build                       141 k
 pcre2-utf32                                    x86_64                       10.35-0.1.RC1.fc33                         koji-f33-build                       199 k
 perl-Scalar-List-Utils                         x86_64                       3:1.55-441.fc33                            koji-f33-build                        70 k
 python3-jinja2                                 noarch                       2.11.2-1.fc33                              koji-f33-build                       490 k
 python3-unbound                                x86_64                       1.10.0-2.fc33                              koji-f33-build                       100 k
 selinux-policy                                 noarch                       3.14.6-12.fc33                             koji-f33-build                       112 k
 selinux-policy-targeted                        noarch                       3.14.6-12.fc33                             koji-f33-build                       8.1 M
 unbound-libs                                   x86_64                       1.10.0-2.fc33                              koji-f33-build                       531 k
 util-linux                                     x86_64                       2.35.1-9.fc33                              koji-f33-build                       2.7 M

But that does not make any difference, still the exact same error.

For comparison, this is an strace from Fedora 31 (on 32 it works as well):

7781  write(1, "# Benchmark returns pbkdf2(sha256) 714288 iterations, 0 memory, 0 threads (for 512-bits key).\n", 94) = 94
7781  write(1, "# Segment 0 assigned to digest 0.\n", 34) = 34
7781  openat(AT_FDCWD, "/dev/sda", O_RDONLY) = 6
7781  fstat(6, {st_dev=makedev(0, 0x6), st_ino=39914, st_mode=S_IFBLK|0660, st_nlink=1, st_uid=0, st_gid=6, st_blksize=4096, st_blocks=0, st_rdev=makedev(0x8, 0), st_atime=1587063079 /* 2020-04-16T14:51:19.576100675-0400 */, st_atime_nsec=576100675, st_mtime=1587063079 /* 2020-04-16T14:51:19.576100675-0400 */, st_mtime_nsec=576100675, st_ctime=1587063079 /* 2020-04-16T14:51:19.576100675-0400 */, st_ctime_nsec=576100675}) = 0
7781  ioctl(6, BLKGETSIZE64, [52428800]) = 0
7781  close(6)                          = 0
7781  write(1, "# Wiping LUKS areas (0x000000 - 0x1000000) with zeroes.\n", 56) = 56
7781  openat(AT_FDCWD, "/dev/sda", O_RDWR|O_DIRECT) = 6

So from a syscall POV, it looks pretty much exactly the same, except that in rawhide it complains about this segment/missing offset thing. But that's an internal interpretation of the data, apparently not something that it even reads from the disk?

Comment 4 Milan Broz 2020-04-16 19:05:57 UTC
This is caused by the rebuild for json-c with a wrong patch. (I had NO IDEA this was going to rawhide already!)
 
I think we fixed this upstream already. The whole problem is that we store a 64-bit int as a string, while the new json-c stores it as an integer in JSON.

This must be fixed ASAP, I'll do rebuild with the proper patch.

Thanks!

Comment 5 Milan Broz 2020-04-16 19:50:49 UTC
I built cryptsetup-2.3.1-3.fc33 that should fix the issue. Unfortunately we have to wait for another rawhide compose.

If you could confirm that it fixes the issue, that would be nice (I tried one luksFormat and it works).

But this was apparent misuse of proven packager privileges when maintainers of the packages were not informed about the change.

Also, we have gating, and apparently it did nothing to stop this (many tests should fail in this config). 

Sigh. Anyway, thanks for the report!

Comment 6 Martin Pitt 2020-04-16 20:50:48 UTC
@Milan: Thanks! I grabbed the built rpms, updated my VM, and confirm that it's fixed now.

> Also, we have gating, and apparently it did nothing to stop this (many tests should fail in this config). 

I see that the tier-0.functional test in https://bodhi.fedoraproject.org/updates/FEDORA-2020-1d5c71159c failed, but clicking on it just gives me an empty Jenkins view. However, note that "Test Gating" says "no tests are required", so you have tests, but not gating.

When you enable gating, you should instead see "n tests passed" (or failed), like in e. g. https://bodhi.fedoraproject.org/updates/FEDORA-2020-35ed246514

E. g. in Cockpit we use this gating file: https://src.fedoraproject.org/rpms/cockpit/blob/master/f/gating.yaml
But you don't have a gating.yaml at all in https://src.fedoraproject.org/rpms/cryptsetup/tree/master.
The docs are here: https://docs.pagure.org/greenwave/policies.html , but of course feel free to grab and adjust cockpit's file. I enabled a few additional rpmgrill etc. tests as required as well, so that we get notified on regressions (unfortunately some tests always fail, so I can't gate on all of them.)

Comment 7 Martin Pitt 2020-04-16 20:51:07 UTC
Oh, and forgot -- thanks for the quick fix!

Comment 8 Björn 'besser82' Esser 2020-04-16 20:54:33 UTC
(In reply to Milan Broz from comment #5)
> But this was apparent misuse of proven packager privileges when maintainers
> of the packages were not informed about the change.

Just to clarify things a bit here:

I sent that particular patch upstream [1] indicating a change about json-c and it was merged by you.  By having the patch accepted upstream, it usually is safe to assume the patch is correct.  Another upstream maintainer found out the patch is not correct and committed a fix, but did not communicate that back in the merge request.
When I added the patch to the Fedora package, you as the downstream maintainer received an email from fedmsg about me having pushed a commit to the cryptsetup package; another email was sent to you by fedmsg after I had built the package.  Up to now I did not get any feedback that a problem with my patch had been identified and fixed upstream, otherwise I would have had the package fixed *immediately* after getting such information.  Communication is not a one-way road, but obviously it seems to be easier to rant at people, and accuse them of using their powers in an abusive way…


[1]  https://gitlab.com/cryptsetup/cryptsetup/-/merge_requests/88

Comment 9 Milan Broz 2020-04-16 21:20:49 UTC
(In reply to Björn 'besser82' Esser from comment #8)
> (In reply to Milan Broz from comment #5)
> > But this was apparent misuse of proven packager privileges when maintainers
> > of the packages were not informed about the change.
> 
> Just to clarify things a bit here:
> 
> I sent that particular patch upstream [1] indicating a change about json-c
> and it was merged by you.  By having the patch accepted upstream, it usually
> is safe to assume the patch is correct.  Another upstream maintainer found
> out the patch is not correct and committed a fix, but did not communicate
> that back in the merge request.
> When I added the patch to the Fedora package, you as the downstream
> maintainer received an email from fedmsg about me having pushed a commit to
> the cryptsetup package; another email was sent to you by fedmsg after I had
> built the package.  Up to now I did not get any feedback that a problem
> with my patch had been identified and fixed upstream, otherwise I would have
> had the package fixed *immediately* after getting such information.
> Communication is not a one-way road, but obviously it seems to be easier to
> rant at people, and accuse them of using their powers in an abusive way…
> 
> 
> [1]  https://gitlab.com/cryptsetup/cryptsetup/-/merge_requests/88

I did not receive any messages from fedmsg. Perhaps misconfiguration, dunno. You should send mail anyway or at least send mail to fedora-devel with the plan.
Fedmsg is too late anyway - you should inform us in advance, not commit it directly.

Ondra committed the change, but it was based on our discussion; I merged it with the plan that we need to run more tests.
My plan was to release 2.3.2 and push to rawhide (2-3 weeks, depends on translators).

You should never ever submit a patch to a critical path package without the approval of its maintainers.
(Maybe only if they did not respond in some reasonable time.)

And now you are trying to say that it is problem on my side?

Comment 10 Milan Broz 2020-04-16 21:39:16 UTC
Anyway, one technical note: an explanation of what happened here and why I think the json-c approach could be problematic in the future.

json-c now defines a new uint64 type, stored as an integer. But according to RFC 8259 (JSON):
"... numbers that are integers and are in the range [-(2**53)+1, (2**53)-1] are interoperable in the sense that implementations will agree exactly on their numeric values."

So, I am not sure this can be compatible with some other JSON parsers that have this 53-bit integer limitation (apparently it comes from the 53-bit significand of a double-precision float).
(I mean, imagine you store UINT64_MAX in json-c and some other parser expects at most a 53-bit integer... no idea what happens. I think it is undefined.)
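
The precision limit quoted from RFC 8259 can be illustrated with a small Python sketch. This is not cryptsetup or json-c code; the helper name `via_double` is made up for the example, and `json.loads(..., parse_int=float)` only simulates a parser that keeps every number in a double.

```python
import json

UINT64_MAX = 2**64 - 1
SAFE_MAX = 2**53 - 1  # upper bound of the RFC 8259 interoperable range


def via_double(n: int) -> int:
    """Round-trip an integer through an IEEE 754 double (hypothetical helper)."""
    return int(float(n))


# Inside the interoperable range the value survives unchanged.
assert via_double(SAFE_MAX) == SAFE_MAX

# Just outside the range, neighbouring integers collapse onto the same double.
assert via_double(2**53 + 1) == 2**53

# UINT64_MAX rounds up to 2**64, which no longer fits in a uint64 at all.
assert via_double(UINT64_MAX) == 2**64

# Simulate a double-based JSON parser reading a plain-number UINT64_MAX.
parsed = json.loads(str(UINT64_MAX), parse_int=float)
print(int(parsed) == UINT64_MAX)  # False
```

What a given real-world parser does with such a value is up to that parser; the sketch only shows why a double cannot hold it exactly.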

Cryptsetup needs to store unsigned 64-bit integers (the kernel stores the device size in uint64), and our approach was to use a string for the number and convert it internally (with a 64-bit range check).
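
A minimal Python sketch of the "string plus range check" idea described above. The real conversion lives in C inside libcryptsetup; the function name `parse_uint64_string` and its exact acceptance rules (ASCII decimal digits only) are assumptions made for this illustration.

```python
UINT64_MAX = 2**64 - 1


def parse_uint64_string(value):
    """Convert a decimal string to an int in [0, UINT64_MAX], or return None.

    Hypothetical helper: it mirrors the idea of storing uint64 values as
    JSON strings, not the actual libcryptsetup implementation.
    """
    # Only str is accepted: a plain JSON number is rejected by design.
    if not isinstance(value, str):
        return None
    # Require plain ASCII decimal digits (no sign, whitespace, or exponent).
    if not (value.isascii() and value.isdigit()):
        return None
    number = int(value)
    # 64-bit range check.
    return number if number <= UINT64_MAX else None


print(parse_uint64_string("52428800"))              # 52428800
print(parse_uint64_string(str(UINT64_MAX + 1)))     # None
print(parse_uint64_string(52428800))                # None
```

Because the value travels as a string, any JSON parser passes it through untouched, and the range check stays under the application's control.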

Unfortunately, we used the same name for the function as the new json-c, so it looks almost correct :-) It's just that we store "1234" but json-c stores 1234 (a plain number), and our internal validation function expects a string there.
(So, in the end, it was broken, but libcryptsetup validation correctly stopped it from storing this on disk.)
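
The type mismatch can be sketched in Python as follows. This is only an illustration of the idea: the function `validate_segment` is hypothetical, the offset value is an arbitrary example, and the message text is copied from the debug log in comment 2.

```python
import json


def validate_segment(name, segment):
    """Return an error message if "offset" is not a JSON string, else None.

    Hypothetical check modelled on the debug output in this report; the
    real validation is C code inside libcryptsetup.
    """
    offset = segment.get("offset")
    if not isinstance(offset, str):
        return f'Segment "{name}" is missing "offset" (string) specification.'
    return None


# Offset stored as a string: what the validation expects.
good = json.loads('{"offset": "16777216"}')
# Offset stored as a plain number: what the json-c uint64 type produces.
bad = json.loads('{"offset": 16777216}')

print(validate_segment("0", good))  # None
print(validate_segment("0", bad))
```

Both documents are valid JSON and carry the same number, so only a type-aware check like this catches the difference before anything is written to disk.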

So there were two reasons for the additional upstream patch: 1) we expect a string, so we cannot use the json-c API for uint64; 2) I am not sure how the full 64-bit range is handled in other JSON parsers. => We revert to our wrappers for 64-bit parsing; it is safe for now.

Comment 11 Björn 'besser82' Esser 2020-04-19 18:14:57 UTC
(In reply to Milan Broz from comment #9)
> And now you are trying to say that it is problem on my side?

First of all, sorry for the late reply.  No, that wasn't my intention at all. Sorry for that, too.

All I wanted to point out was that, if both sides are acting on implicit assumptions, things are likely to go wrong.  My mistake was to assume the upstream PR was information enough for you to see there would be some builds on Rawhide around json-c in the near future.

Anyway, I still don't believe it is the right way in situations like this (communication with implied intentions on both sides) to point the finger at someone else and accuse them of acting abusively, as there are always two (or even more) parties involved, who could have communicated better (read: expressed their intentions more precisely) with each other.

For my part: I've learned from this encounter in several ways and will strive to act better in the future.

