Bug 2171486 - etcd: FTBFS in Fedora rawhide/f38
Summary: etcd: FTBFS in Fedora rawhide/f38
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: etcd
Version: 39
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jan Chaloupka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F38FTBFS
Reported: 2023-02-20 11:47 UTC by Fedora Release Engineering
Modified: 2023-08-16 07:07 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:



Description Fedora Release Engineering 2023-02-20 11:47:37 UTC
etcd failed to build from source in Fedora rawhide/f38

https://koji.fedoraproject.org/koji/taskinfo?taskID=96325022


For details on the mass rebuild see:

https://fedoraproject.org/wiki/Fedora_38_Mass_Rebuild
Please fix etcd at your earliest convenience and set the bug's status to
ASSIGNED when you start fixing it. If the bug remains in NEW state for 8 weeks,
etcd will be orphaned. Before branching of Fedora 39,
etcd will be retired, if it still fails to build.

For more details on the FTBFS policy, please visit:
https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/

Comment 1 Pete Zaitcev 2023-07-13 05:40:14 UTC
The rawhide RPM etcd-3.5.5-3.fc38.x86_64 builds fine on F38 on x86_64,
just like in Koji. Failure only happens on other architectures.
The initial failure was on i686, and then on aarch64 (scratch build).

diff -u good bad:

-ok  	go.etcd.io/etcd/pkg/v3/idutil	0.001s
+ok  	go.etcd.io/etcd/pkg/v3/idutil	0.003s
 go.etcd.io/etcd/pkg/v3/idutil
 PASS
-ok  	go.etcd.io/etcd/pkg/v3/idutil	0.001s
-go.etcd.io/etcd/pkg/v3/ioutil
-PASS
-ok  	go.etcd.io/etcd/pkg/v3/ioutil	0.002s
+ok  	go.etcd.io/etcd/pkg/v3/idutil	0.003s
 go.etcd.io/etcd/pkg/v3/ioutil
-PASS
-ok  	go.etcd.io/etcd/pkg/v3/ioutil	0.002s
-go.etcd.io/etcd/pkg/v3/netutil
-{"level":"info","msg":"resolved URL Host","url":"http://infra0.example.com:4001","host":"infra0.example.com:4001","resolved-addr":"10.0.1.10:4001"}
......................
+--- FAIL: TestPageWriterRandom (0.00s)
+    pagewriter_test.go:41: got 2385 bytes pending, expected less than 128 bytes
+FAIL
+exit status 1
+FAIL	go.etcd.io/etcd/pkg/v3/ioutil	0.006s
+error: Bad exit status from /var/tmp/rpm-tmp.OCzuL0 (%check)
+RPM build errors:
+    Bad exit status from /var/tmp/rpm-tmp.OCzuL0 (%check)

Comment 2 Pete Zaitcev 2023-07-20 04:11:00 UTC
OK, I see what's happening.

TestPageWriterRandom has two problems.

Problem 1: it uses rand.Intn(), but never calls rand.Seed().
As a result, it passes or fails depending on how the platform
sets up the module. This is why it passes on x86_64: the platform
happens to seed the rand module in such a way that the resulting
sequence lets the buggy test pass.

If I add rand.Seed(time.Now().UnixNano()), the test begins to
fail on my laptop in exactly the way it fails in Koji on aarch64.
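
One way to take the platform out of the picture is to give the test
its own generator with an explicit seed, so a failing run can be
replayed anywhere. The following is only a sketch of that idea, not
the etcd test: writeSizes, its parameters and the seed value 42 are
hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// writeSizes is a hypothetical helper, not part of etcd. It returns n
// pseudo-random write lengths in [0, max) drawn from a generator that
// is private to this call and seeded explicitly, so the sequence does
// not depend on how the global math/rand state was set up.
func writeSizes(seed int64, n, max int) []int {
	r := rand.New(rand.NewSource(seed))
	sizes := make([]int, n)
	for i := range sizes {
		sizes[i] = r.Intn(max)
	}
	return sizes
}

func main() {
	// The same seed always yields the same sequence, so a failure
	// seen with a given seed can be reproduced on any machine.
	a := writeSizes(42, 5, 32768)
	b := writeSizes(42, 5, 32768)
	fmt.Println(a)
	fmt.Println(b)
}
```

A test built this way would log the seed it used, so that a failure
on one architecture could be rerun with the same seed on another.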

Problem 2: The test is obviously incorrect, even though it
has existed unchanged since it was committed in 2016,
commit 2943bf908606ccbfaeda3bdf882a11a0138a0502.

The condition it tests obviously may fail. The buffer can
contain several pages (especially if they're this small).
If the very first write is several pages long,
  if len(p)+pw.bufferedBytes <= pw.bufWatermarkBytes { }
triggers, and Write returns, while retaining everything
written in the buffer. At that point, the as-tested
condition is obviously violated, as cw.writeBytes is zero.
I'm too lazy to construct a long scenario, but it obviously
can happen on the last write too (for example, if the
previous 4045 writes reset the buffer to zero by accident).

The tested condition needs to change to a valid one.
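
The first-write scenario can be illustrated with a toy model. This is
a heavily simplified sketch, not the etcd PageWriter: bufferModel and
its fields are hypothetical, only the early-return branch quoted above
is modelled, and the 8192-byte watermark is an assumed value. The
128-byte page size and the 2385-byte write are taken from the failure
message in comment 1.

```go
package main

import "fmt"

// bufferModel is a hypothetical stand-in for a page-buffering writer.
// It tracks byte counts only and models just the early-return branch.
type bufferModel struct {
	bufferedBytes     int // bytes held in memory, not yet written out
	bufWatermarkBytes int // writes that fit under this mark are only buffered
	writtenBytes      int // bytes handed to the underlying writer
}

// Write accepts n bytes. If they fit under the watermark, they are
// buffered and nothing reaches the underlying writer. Otherwise this
// sketch flushes everything at once; the page-aligned flushing of the
// real code is deliberately not reproduced here.
func (m *bufferModel) Write(n int) {
	if n+m.bufferedBytes <= m.bufWatermarkBytes {
		m.bufferedBytes += n
		return
	}
	m.writtenBytes += m.bufferedBytes + n
	m.bufferedBytes = 0
}

func main() {
	const pageBytes = 128 // page size from the test's failure message
	m := &bufferModel{bufWatermarkBytes: 8192} // assumed watermark

	m.Write(2385) // a single write spanning many pages, still under the watermark

	pending := m.bufferedBytes
	fmt.Printf("pending=%d written=%d\n", pending, m.writtenBytes)
	fmt.Println("pending exceeds one page:", pending > pageBytes)
}
```

In this model a single write that fits under the watermark leaves
every byte pending, so a check that pending bytes stay below one page
cannot hold in general.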

Comment 3 Pete Zaitcev 2023-07-20 14:58:04 UTC
See
issue https://github.com/etcd-io/etcd/issues/16255
commit https://github.com/etcd-io/etcd/commit/fddd1add52b33649a99d7f756404924138344a10

I think the fix is incomplete, because it does not
address the seeding of the rand module. However,
it should be sufficient to unblock Koji.

Comment 4 Fedora Release Engineering 2023-08-16 07:07:48 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.

