Bug 1824878
| Summary: | cryptsetup luksFormat fails with "Cannot wipe header on device" | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Martin Pitt <mpitt> |
| Component: | cryptsetup | Assignee: | Milan Broz <gmazyland> |
| Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rawhide | CC: | agk, besser82, gmazyland, okozina |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | cryptsetup-2.3.1-3.fc33 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-04-16 19:50:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Could you please run the failing command with --debug and post the output here? (This will also include the version of the kernel / scsi_debug module.)

There were no changes in the code.

Does it fail for a normal disk (I expect sda is a scsi_debug device)? (Also, if you add "sleep 3" before luksFormat, does it still fail?)

> run the failing command with --debug

    # echo einszweidrei | cryptsetup --debug luksFormat /dev/sda
    # cryptsetup 2.3.1 processing "cryptsetup --debug luksFormat /dev/sda"
    # Running command luksFormat.
    # Locking memory.
    # Installing SIGINT/SIGTERM handler.
    # Unblocking interruption on signal.
    # Allocating context for crypt device /dev/sda.
    # Trying to open and read device /dev/sda with direct-io.
    # Initialising device-mapper backend library.
    # STDIN descriptor passphrase entry requested.
    # Checking new password using default pwquality settings.
    # New password libpwquality score is 50.
    # Crypto backend (OpenSSL 1.1.1f FIPS 31 Mar 2020) initialized in cryptsetup library version 2.3.1.
    # Detected kernel Linux 5.7.0-0.rc1.20200414git8632e9b5645b.1.fc33.x86_64 x86_64.
    # Only 3 active CPUs detected, PBKDF threads decreased from 4 to 3.
    # Not enough physical memory detected, PBKDF max memory decreased from 1048576kB to 1002572kB.
    # PBKDF argon2i, time_ms 2000 (iterations 0), max_memory_kb 1002572, parallel_threads 3.
    # Formatting device /dev/sda as type LUKS2.
    # Topology: IO (512/524288), offset = 0; Required alignment is 1048576 bytes.
    # Checking if cipher aes-xts-plain64 is usable.
    # Using userspace crypto wrapper to access keyslot area.
    # Formatting LUKS2 with JSON metadata area 12288 bytes and keyslots area 16744448 bytes.
    # Creating new digest 0 (pbkdf2).
    # Setting PBKDF2 type key digest 0.
    # Running pbkdf2(sha256) benchmark.
    # PBKDF benchmark: memory cost = 0, iterations = 546133, threads = 0 (took 60 ms)
    # PBKDF benchmark: memory cost = 0, iterations = 587108, threads = 0 (took 893 ms)
    # Benchmark returns pbkdf2(sha256) 587108 iterations, 0 memory, 0 threads (for 512-bits key).
    # Segment 0 assigned to digest 0.
    # Segment "0" is missing "offset" (string) specification.
    Cannot wipe header on device /dev/sda.
    # Releasing crypt device /dev/sda context.
    # Releasing device-mapper backend.
    # Closing read only fd for /dev/sda.
    # Unlocking memory.
    Command failed with code -1 (wrong or missing parameters).

> There were no changes in the code.

Possibly due to the new util-linux? https://bodhi.fedoraproject.org/updates/FEDORA-2020-77d2be2c3a just landed one day ago.

> Does it fail for a normal disk (I expect sda is a scsi_debug device)?

Yes, I tried that with scsi_debug, as per the description. That's about as close to a "normal disk" as you can get with emulation. It fails in the same way for a loop device:

    dd if=/dev/zero of=/var/tmp/disk bs=1M count=50
    losetup -f --show /var/tmp/disk
    echo einszweidrei | cryptsetup --debug luksFormat /dev/loop0

I can try again with attaching another qemu disk, but honestly I don't expect that to look much different -- this part of cryptsetup deals with the block read/write level.

> if you add "sleep 3" before luksFormat, does it still fail?

Oh yes, this is not at all a race condition. You can run them all in one go, or one by one (or repeatedly) in an interactive shell.

I am trying this on the current Fedora rawhide cloud image, so it should be easy enough to reproduce.

strace isn't really that insightful either; it's not really doing much aside from stating:

    write(1, "# Benchmark returns pbkdf2(sha256) 424868 iterations, 0 memory, 0 threads (for 512-bits key).\n", 94# Benchmark returns pbkdf2(sha256) 424868 iterations, 0 memory, 0 threads (for 512-bits key).
    ) = 94
    write(1, "# Segment 0 assigned to digest 0.\n", 34# Segment 0 assigned to digest 0.
    ) = 34
    openat(AT_FDCWD, "/dev/sda", O_RDONLY) = 8
    fstat(8, {st_dev=makedev(0, 0x5), st_ino=39772, st_mode=S_IFBLK|0660, st_nlink=1, st_uid=0, st_gid=6, st_blksize=4096, st_blocks=0, st_rdev=makedev(0x8, 0), st_atime=1587062613 /* 2020-04-16T18:43:33.340451901+0000 */, st_atime_nsec=340451901, st_mtime=1587062613 /* 2020-04-16T18:43:33.340451901+0000 */, st_mtime_nsec=340451901, st_ctime=1587062613 /* 2020-04-16T18:43:33.340451901+0000 */, st_ctime_nsec=340451901}) = 0
    ioctl(8, BLKGETSIZE64, [52428800]) = 0
    close(8) = 0
    write(1, "# Segment \"0\" is missing \"offset\" (string) specification.\n", 58# Segment "0" is missing "offset" (string) specification.
    ) = 58
    write(2, "Cannot wipe header on device /dev/sda.\n", 39Cannot wipe header on device /dev/sda.
    ) = 39
    write(1, "# Releasing crypt device /dev/sda context.\n", 43# Releasing crypt device /dev/sda context.
    ) = 43
    write(1, "# Releasing device-mapper backend.\n", 35# Releasing device-mapper backend.
    ) = 35
    write(1, "# Closing read only fd for /dev/sda.\n", 37# Closing read only fd for /dev/sda.
    ) = 37
    close(5) = 0

> Possibly due to the new util-linux? https://bodhi.fedoraproject.org/updates/FEDORA-2020-77d2be2c3a just landed one day ago.
Nack, my VM still actually has util-linux-2.35.1-7.fc33.x86_64, and that already landed 23 days ago (and this was working last week still). I ran `dnf update` now which pulled in some new bits, including the latest util-linux:
Upgrading:
glibc x86_64 2.31.9000-9.fc33 koji-f33-build 3.5 M
glibc-common x86_64 2.31.9000-9.fc33 koji-f33-build 1.8 M
glibc-langpack-en x86_64 2.31.9000-9.fc33 koji-f33-build 658 k
glusterfs x86_64 7.5-1.fc33 koji-f33-build 657 k
glusterfs-api x86_64 7.5-1.fc33 koji-f33-build 95 k
glusterfs-cli x86_64 7.5-1.fc33 koji-f33-build 188 k
glusterfs-client-xlators x86_64 7.5-1.fc33 koji-f33-build 840 k
glusterfs-fuse x86_64 7.5-1.fc33 koji-f33-build 145 k
glusterfs-libs x86_64 7.5-1.fc33 koji-f33-build 433 k
libblkid x86_64 2.35.1-9.fc33 koji-f33-build 152 k
libfdisk x86_64 2.35.1-9.fc33 koji-f33-build 204 k
libfido2 x86_64 1.4.0-1.fc33 koji-f33-build 67 k
libmount x86_64 2.35.1-9.fc33 koji-f33-build 179 k
libsmartcols x86_64 2.35.1-9.fc33 koji-f33-build 120 k
libssh x86_64 0.9.4-2.fc33 koji-f33-build 215 k
libssh-config noarch 0.9.4-2.fc33 koji-f33-build 11 k
libuuid x86_64 2.35.1-9.fc33 koji-f33-build 28 k
pcre2 x86_64 10.35-0.1.RC1.fc33 koji-f33-build 233 k
pcre2-syntax noarch 10.35-0.1.RC1.fc33 koji-f33-build 141 k
pcre2-utf32 x86_64 10.35-0.1.RC1.fc33 koji-f33-build 199 k
perl-Scalar-List-Utils x86_64 3:1.55-441.fc33 koji-f33-build 70 k
python3-jinja2 noarch 2.11.2-1.fc33 koji-f33-build 490 k
python3-unbound x86_64 1.10.0-2.fc33 koji-f33-build 100 k
selinux-policy noarch 3.14.6-12.fc33 koji-f33-build 112 k
selinux-policy-targeted noarch 3.14.6-12.fc33 koji-f33-build 8.1 M
unbound-libs x86_64 1.10.0-2.fc33 koji-f33-build 531 k
util-linux x86_64 2.35.1-9.fc33 koji-f33-build 2.7 M
But that does not make any difference, still the exact same error.
For comparison, this is an strace from Fedora 31 (on 32 it works as well):
7781 write(1, "# Benchmark returns pbkdf2(sha256) 714288 iterations, 0 memory, 0 threads (for 512-bits key).\n", 94) = 94
7781 write(1, "# Segment 0 assigned to digest 0.\n", 34) = 34
7781 openat(AT_FDCWD, "/dev/sda", O_RDONLY) = 6
7781 fstat(6, {st_dev=makedev(0, 0x6), st_ino=39914, st_mode=S_IFBLK|0660, st_nlink=1, st_uid=0, st_gid=6, st_blksize=4096, st_blocks=0, st_rdev=makedev(0x8, 0), st_atime=1587063079 /* 2020-04-16T14:51:19.576100675-0400 */, st_atime_nsec=576100675, st_mtime=1587063079 /* 2020-04-16T14:51:19.576100675-0400 */, st_mtime_nsec=576100675, st_ctime=1587063079 /* 2020-04-16T14:51:19.576100675-0400 */, st_ctime_nsec=576100675}) = 0
7781 ioctl(6, BLKGETSIZE64, [52428800]) = 0
7781 close(6) = 0
7781 write(1, "# Wiping LUKS areas (0x000000 - 0x1000000) with zeroes.\n", 56) = 56
7781 openat(AT_FDCWD, "/dev/sda", O_RDWR|O_DIRECT) = 6
So from a syscall POV, it looks pretty well exactly the same, except that in rawhide it complains about this segment/missing offset thing. But that's internal interpretation of the stat data, apparently not something that it even reads from the disk?
This is caused by a rebuild for json-c and a wrong patch. (I had NO IDEA this was going to rawhide already!)

I think we fixed this upstream already. The whole problem is that we store a 64bit int as a string, while the new json-c uses an integer in JSON.

This must be fixed ASAP, I'll do a rebuild with the proper patch.

Thanks!

I built cryptsetup-2.3.1-3.fc33, which should fix the issue. Unfortunately we have to wait for another rawhide compose.

If you could confirm that it fixes the issue, it would be nice (I tried one luksFormat and it works).

But this was an apparent misuse of proven packager privileges, as the maintainers of the packages were not informed about the change.

Also, we have gating, and apparently it did nothing to stop this (many tests should fail in this config). Sigh.

Anyway, thanks for the report!

@Milan: Thanks! I grabbed the built rpms, updated my VM, and confirm that it's fixed now.

> Also, we have gating, and apparently it did nothing to stop this (many tests should fail in this config).

I see that the tier-0.functional test in https://bodhi.fedoraproject.org/updates/FEDORA-2020-1d5c71159c failed, but clicking on it just gives me an empty jenkins view. However, you will notice that "Test Gating" says "no tests are required", so you have tests, but not gating. When you enable gating, you should instead see "n tests passed" (or failed), like in e.g. https://bodhi.fedoraproject.org/updates/FEDORA-2020-35ed246514

E.g. in Cockpit we use this gating file: https://src.fedoraproject.org/rpms/cockpit/blob/master/f/gating.yaml

But you don't have a gating.yaml at all in https://src.fedoraproject.org/rpms/cryptsetup/tree/master. The docs are here: https://docs.pagure.org/greenwave/policies.html , but of course feel free to grab and adjust cockpit's file. I enabled a few additional tests (rpmgrill etc.) as required as well, so that we get notified on regressions (unfortunately some tests always fail, so I can't gate on all of them).

Oh, and forgot -- thanks for the quick fix!
(In reply to Milan Broz from comment #5)
> But this was apparent misuse of proven packager privileges when maintainers
> of the packages were not informed about the change.

Just to clarify things a bit here:

I sent that particular patch upstream [1] indicating a change about json-c and it was merged by you. By having the patch accepted upstream, it usually is safe to assume the patch is correct. Another upstream maintainer found out the patch is not correct and committed a fix, but did not communicate that back in the merge request.

When I added the patch to the Fedora package, you as the downstream maintainer received an email from fedmsg about me having pushed a commit to the cryptsetup package; another email was sent to you by fedmsg after I had built the package. Up to now I did not get any feedback that a problem with my patch has been identified and fixed upstream, otherwise I would have had the package fixed *immediately* after getting such information. Communication is not a one-way road, but obviously it seems to be easier to rant at people, and accuse them of using their powers in an abusive way…

[1] https://gitlab.com/cryptsetup/cryptsetup/-/merge_requests/88

(In reply to Björn 'besser82' Esser from comment #8)
> (In reply to Milan Broz from comment #5)
> > But this was apparent misuse of proven packager privileges when maintainers
> > of the packages were not informed about the change.
>
> Just to clarify things a bit here:
>
> I sent that particular patch upstream [1] indicating a change about json-c
> and it was merged by you. By having the patch accepted upstream, it usually
> is safe to assume the patch is correct. Another upstream maintainer found
> out the patch is not correct and committed a fix, but did not communicate
> that back in the merge request.
>
> When I added the patch to the Fedora package, you as the downstream
> maintainer received an email from fedmsg about me having pushed a commit to
> the cryptsetup package; another email was sent to you by fedmsg after I had
> built the package. Up to now I did not get any feedback that a problem
> with my patch has been identified and fixed upstream, otherwise I would have
> had the package fixed *immediately* after getting such information.
> Communication is not a one-way road, but obviously it seems to be easier to
> rant at people, and accuse them of using their powers in an abusive way…
>
>
> [1] https://gitlab.com/cryptsetup/cryptsetup/-/merge_requests/88

I did not receive any messages from fedmsg. Perhaps a misconfiguration, dunno. You should send mail anyway, or at least send mail to fedora-devel with the plan. Fedmsg is too late anyway - you should inform us in advance, not commit it directly.

Ondra committed the change, but it was based on our discussion; I merged it with the plan that we need to run more tests. My plan was to release 2.3.2 and push it to rawhide (2-3 weeks, depends on translators).

You should never ever submit a patch to a critical path package without approval of the maintainers. (Maybe only if they did not respond in some reasonable time.)

And now you are trying to say that it is a problem on my side?

Anyway, one technical note - an explanation of what happened here and why I think the json-c approach could be problematic in future.

Json-c now defines a new uint64 type, stored as an integer. But according to RFC 8259 (JSON):

"... numbers that are integers and are in the range [-(2**53)+1, (2**53)-1] are interoperable in the sense that implementations will agree exactly on their numeric values."

So, I am not sure this can be compatible with some other JSON parsers that apply this 53bit integer limitation (this is apparently based on double precision, which stores 53 bits). (I mean, imagine you store UINT64_MAX in json-c and some other parser expects a 53bit integer at most... no idea what happens. I think it is undefined.)

Cryptsetup needs to store unsigned 64bit integers (the kernel stores device size in uint64) and our approach was to use a string for the number, and convert it internally (with a 64bit range check).

Unfortunately, we used the same name for the function as the new json-c - it looks almost correct :-) Just that we store "1234" but json-c stores 1234 (plain number), and our internal validate function expects a string there. (So, in the end, it was broken, but libcryptsetup validation correctly stopped it from storing this on disk.)

So there were two reasons for the additional upstream patch:
1) we expect a string, so we cannot use the json-c api for uint64;
2) I am not sure how the full 63 bit range is handled in other json parsers.

=> we revert to our wrappers for 64bit parsing, it is safe for now.

(In reply to Milan Broz from comment #9)
> And now you are trying to say that it is problem on my side?

First of all, sorry for the late reply.

No, that wasn't my intention at all. Sorry for that, too. All I wanted to point out was the circumstance that if both sides are acting on implicit assumptions, things are likely to go wrong. My mistake was to imply the upstream PR was information enough for you to see there would be some builds on Rawhide around json-c going on in the near future.

Anyways, I still don't believe it is the right way in situations like this (communication with implied intentions on both sides) to point the finger at someone else and call them abusive, as there are always both (or even more) parties involved, who could have communicated better (read: expressed their intentions in a more precise way) with each other.

For my part: I've learned from this encounter in several ways and strive to act better in the future.
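As a rough illustration of the technical note in comment #9, here is a minimal Python sketch of the two points made there: a validator that expects a 64bit value stored as a JSON string rejects the same value stored as a plain JSON number, and integers above 2**53 cannot be told apart once a parser maps them onto doubles. This is not libcryptsetup or json-c code; the function name `parse_uint64_string` is hypothetical, and the offset value 16777216 is only an assumed example matching the 0x1000000 boundary seen in the Fedora 31 log.

```python
import json
import re

UINT64_MAX = 2**64 - 1


def parse_uint64_string(value):
    """Hypothetical validator: accept only a decimal string within uint64 range.

    Returns the parsed integer, or None when the value is not a string,
    is not purely decimal, or does not fit into an unsigned 64bit integer.
    """
    if not isinstance(value, str) or re.fullmatch(r"[0-9]+", value) is None:
        return None
    number = int(value)
    return number if number <= UINT64_MAX else None


# The form the validator expects: offset stored as a JSON string.
as_string = json.loads('{"offset": "16777216"}')
# The form produced when the serializer emits a native JSON integer instead.
as_number = json.loads('{"offset": 16777216}')

string_result = parse_uint64_string(as_string["offset"])  # accepted
number_result = parse_uint64_string(as_number["offset"])  # rejected (None)

# A parser that stores JSON numbers as IEEE 754 doubles cannot distinguish
# neighbouring integers above 2**53, and UINT64_MAX itself rounds up to 2**64.
collides = float(2**53) == float(2**53 + 1)
rounded_max = int(float(UINT64_MAX))

print(string_result, number_result, collides, rounded_max == 2**64)
```

The rejected case corresponds in spirit to the `Segment "0" is missing "offset" (string) specification.` message: the value is present, but not in the type the validation step requires.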
Description of problem:
In the last few days, Rawhide got a regression with cryptsetup formatting. This got spotted by cockpit's integration tests [1]

Version-Release number of selected component (if applicable):
cryptsetup-2.3.1-2.fc33.x86_64
kernel-core-5.7.0-0.rc1.20200414git8632e9b5645b.1.fc33.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a blank test device:

       modprobe scsi_debug dev_size_mb=50

   This should create a new /dev/sdX (check dmesg). The following steps use /dev/sda, but double check so that you don't kill your hard disk!
2. parted -s /dev/sda mktable msdos
3. parted -s /dev/sda mkpart primary ext2 1M 25
4. echo einszweidrei | cryptsetup luksFormat /dev/sda1

Actual results:
Fails with

    Cannot wipe header on device /dev/sda1.

which is pretty bogus as this is just a fresh partition on a zero device:

    # blkid -p /dev/sda1
    /dev/sda1: PART_ENTRY_SCHEME="dos" PART_ENTRY_UUID="7d002ca3-01" PART_ENTRY_TYPE="0x83" PART_ENTRY_NUMBER="1" PART_ENTRY_OFFSET="2048" PART_ENTRY_SIZE="47104" PART_ENTRY_DISK="8:0"

Expected results:
cryptsetup works.

The above is what our test does. But the partition isn't even necessary, this fails with an unpartitioned device as well:

    # modprobe scsi_debug dev_size_mb=50
    # echo einszweidrei | cryptsetup luksFormat /dev/sda
    Cannot wipe header on device /dev/sda.

Additional info:
[1] https://bodhi.fedoraproject.org/updates/FEDORA-2020-7730bb45a4