etcd failed to build from source in Fedora rawhide/f38

https://koji.fedoraproject.org/koji/taskinfo?taskID=96325022

For details on the mass rebuild see: https://fedoraproject.org/wiki/Fedora_38_Mass_Rebuild

Please fix etcd at your earliest convenience and set the bug's status to ASSIGNED when you start fixing it. If the bug remains in NEW state for 8 weeks, etcd will be orphaned. Before branching of Fedora 39, etcd will be retired, if it still fails to build.

For more details on the FTBFS policy, please visit: https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/
The rawhide RPM etcd-3.5.5-3.fc38.x86_64 builds fine on F38 on x86_64, just like in Koji. The failure only happens on other architectures: the initial failure was on i686, and then on aarch64 (scratch build).

diff -u good bad:

-ok go.etcd.io/etcd/pkg/v3/idutil 0.001s
+ok go.etcd.io/etcd/pkg/v3/idutil 0.003s
 go.etcd.io/etcd/pkg/v3/idutil
 PASS
-ok go.etcd.io/etcd/pkg/v3/idutil 0.001s
-go.etcd.io/etcd/pkg/v3/ioutil
-PASS
-ok go.etcd.io/etcd/pkg/v3/ioutil 0.002s
+ok go.etcd.io/etcd/pkg/v3/idutil 0.003s
 go.etcd.io/etcd/pkg/v3/ioutil
-PASS
-ok go.etcd.io/etcd/pkg/v3/ioutil 0.002s
-go.etcd.io/etcd/pkg/v3/netutil
-{"level":"info","msg":"resolved URL Host","url":"http://infra0.example.com:4001","host":"infra0.example.com:4001","resolved-addr":"10.0.1.10:4001"}
......................
+--- FAIL: TestPageWriterRandom (0.00s)
+    pagewriter_test.go:41: got 2385 bytes pending, expected less than 128 bytes
+FAIL
+exit status 1
+FAIL go.etcd.io/etcd/pkg/v3/ioutil 0.006s
+error: Bad exit status from /var/tmp/rpm-tmp.OCzuL0 (%check)
+RPM build errors:
+    Bad exit status from /var/tmp/rpm-tmp.OCzuL0 (%check)
OK, I see what's happening. TestPageWriterRandom has two problems.

Problem 1: it uses rand.Intn(), but never calls rand.Seed(). As a result, it passes or fails depending on how the platform seeds the math/rand generator. This is why it passes on x86_64: the generator happens to be seeded in such a way that the resulting sequence lets the buggy test pass. If I add rand.Seed(time.Now().UnixNano()), the test starts failing on my laptop in exactly the way it fails in Koji on aarch64.

Problem 2: the test itself is incorrect, even though it has existed unchanged since it was committed in 2016 (commit 2943bf908606ccbfaeda3bdf882a11a0138a0502). The condition it checks can legitimately fail. The buffer can hold several pages (especially when the pages are this small). If the very first write is several pages long but still fits under the watermark, the branch if len(p)+pw.bufferedBytes <= pw.bufWatermarkBytes { } is taken, and Write returns while keeping everything it was given in the buffer. At that point the tested condition is violated, because cw.writeBytes is still zero. I'm too lazy to construct a long scenario, but it can obviously happen on the last write too (for example, if the previous 4045 writes happen to leave the buffer empty). The test needs to check a condition that actually holds.
See issue https://github.com/etcd-io/etcd/issues/16255 and commit https://github.com/etcd-io/etcd/commit/fddd1add52b33649a99d7f756404924138344a10. I think the fix is incomplete, because it does not address the seeding of the math/rand generator. However, it should be sufficient to unblock Koji.
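A sketch of how the seeding could be handled, assuming the test draws its write sizes from a private generator instead of the global math/rand functions. newTestRand and writeSizes are hypothetical helpers, not part of etcd or of the commit above. A zero seed picks a fresh time-based one; a non-zero seed replays an earlier run, so a failing sequence can be reproduced on any architecture.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// newTestRand returns a private generator and the seed it uses.
// A non-zero seed replays a previous run; zero picks a fresh one.
func newTestRand(seed int64) (*rand.Rand, int64) {
	if seed == 0 {
		seed = time.Now().UnixNano()
	}
	return rand.New(rand.NewSource(seed)), seed
}

// writeSizes draws count sizes in [0, limit), like buf[:rand.Intn(len(buf))]
// in the test, but from an explicit generator.
func writeSizes(r *rand.Rand, count, limit int) []int {
	sizes := make([]int, count)
	for i := range sizes {
		sizes[i] = r.Intn(limit)
	}
	return sizes
}

func main() {
	r, seed := newTestRand(0)
	// 32 KiB is an illustrative upper bound, not taken from the test.
	// In a real test the seed would go through t.Logf so it appears next to a failure.
	fmt.Printf("seed=%d first sizes=%v\n", seed, writeSizes(r, 3, 32*1024))
}
```

Using rand.New(rand.NewSource(seed)) keeps the test independent of global state; rand.Seed itself is deprecated as of Go 1.20.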
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle. Changing version to 39.