Bug 2171486

Summary: etcd: FTBFS in Fedora rawhide/f38
Product: [Fedora] Fedora
Reporter: Fedora Release Engineering <releng>
Component: etcd
Assignee: Jan Chaloupka <jchaloup>
Status: NEW
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 39
CC: go-sig, gscrivan, jcajka, jchaloup, lacypret, lemenkov, zaitcev
Hardware: Unspecified   
OS: Unspecified   
Bug Blocks: 2117176    

Description Fedora Release Engineering 2023-02-20 11:47:37 UTC
etcd failed to build from source in Fedora rawhide/f38

https://koji.fedoraproject.org/koji/taskinfo?taskID=96325022


For details on the mass rebuild see:

https://fedoraproject.org/wiki/Fedora_38_Mass_Rebuild
Please fix etcd at your earliest convenience and set the bug's status to
ASSIGNED when you start fixing it. If the bug remains in NEW state for 8 weeks,
etcd will be orphaned. If etcd still fails to build, it will be retired before
Fedora 39 branches.

For more details on the FTBFS policy, please visit:
https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/

Comment 1 Pete Zaitcev 2023-07-13 05:40:14 UTC
The rawhide RPM etcd-3.5.5-3.fc38.x86_64 builds fine on F38 on x86_64,
just like in Koji. The failure happens only on other architectures.
The initial failure was on i686; it then reproduced on aarch64 in a scratch build.

diff -u good bad:

-ok  	go.etcd.io/etcd/pkg/v3/idutil	0.001s
+ok  	go.etcd.io/etcd/pkg/v3/idutil	0.003s
 go.etcd.io/etcd/pkg/v3/idutil
 PASS
-ok  	go.etcd.io/etcd/pkg/v3/idutil	0.001s
-go.etcd.io/etcd/pkg/v3/ioutil
-PASS
-ok  	go.etcd.io/etcd/pkg/v3/ioutil	0.002s
+ok  	go.etcd.io/etcd/pkg/v3/idutil	0.003s
 go.etcd.io/etcd/pkg/v3/ioutil
-PASS
-ok  	go.etcd.io/etcd/pkg/v3/ioutil	0.002s
-go.etcd.io/etcd/pkg/v3/netutil
-{"level":"info","msg":"resolved URL Host","url":"http://infra0.example.com:4001","host":"infra0.example.com:4001","resolved-addr":"10.0.1.10:4001"}
......................
+--- FAIL: TestPageWriterRandom (0.00s)
+    pagewriter_test.go:41: got 2385 bytes pending, expected less than 128 bytes
+FAIL
+exit status 1
+FAIL	go.etcd.io/etcd/pkg/v3/ioutil	0.006s
+error: Bad exit status from /var/tmp/rpm-tmp.OCzuL0 (%check)
+RPM build errors:
+    Bad exit status from /var/tmp/rpm-tmp.OCzuL0 (%check)

Comment 2 Pete Zaitcev 2023-07-20 04:11:00 UTC
OK, I see what's happening.

The TestPageWriterRandom has 2 problems.

Problem 1: it uses rand.Intn() but never calls rand.Seed().
As a result, whether it passes or fails depends on how the platform
sets up the module. This is why it passes on x86_64: the platform
happens to seed the rand module so that the resulting sequence
lets the buggy test pass.

If I add rand.Seed(time.Now().UnixNano()), the test begins to
fail on my laptop in exactly the way it fails in Koji on aarch64.
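A minimal sketch of the reproducibility point, under stated assumptions: instead of calling the global rand.Seed() (deprecated since Go 1.20, when the global math/rand source also started seeding itself randomly at startup), a randomized test can draw from a private *rand.Rand and report the seed it used, so a failing Koji run can be replayed exactly. The helper names newTestRand and writeSizes are hypothetical, and the 32 KiB maximum write size is illustrative, not taken from etcd.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// newTestRand returns a private *rand.Rand. A seed of 0 means "pick one
// from the clock"; either way the seed actually used is returned so that
// a failing run can be replayed by passing the same value back in.
func newTestRand(seed int64) (*rand.Rand, int64) {
	if seed == 0 {
		seed = time.Now().UnixNano()
	}
	return rand.New(rand.NewSource(seed)), seed
}

// writeSizes draws count write lengths in [0, limit) from r, in the same
// spirit as the rand.Intn(len(buf)) calls in the test.
func writeSizes(r *rand.Rand, count, limit int) []int {
	sizes := make([]int, count)
	for i := range sizes {
		sizes[i] = r.Intn(limit)
	}
	return sizes
}

func main() {
	r, seed := newTestRand(0)
	// Log the seed so a failure can be replayed with newTestRand(seed).
	fmt.Printf("seed=%d sizes=%v\n", seed, writeSizes(r, 5, 32*1024))

	// The same seed always yields the same sequence, on every architecture.
	a, _ := newTestRand(42)
	b, _ := newTestRand(42)
	same := fmt.Sprint(writeSizes(a, 5, 32*1024)) == fmt.Sprint(writeSizes(b, 5, 32*1024))
	fmt.Println("same seed, same sequence:", same) // prints "same seed, same sequence: true"
}
```

Inside an actual Go test the seed would go to t.Logf, so it shows up in the failing build log next to the FAIL line.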

Problem 2: The test is obviously incorrect, even though it has
existed unchanged since it was committed in 2016
(commit 2943bf908606ccbfaeda3bdf882a11a0138a0502).

The condition it tests can obviously fail. The buffer can
hold several pages (especially when the pages are this small).
If the very first write is several pages long but still fits
under the watermark, the branch
  if len(p)+pw.bufferedBytes <= pw.bufWatermarkBytes { }
is taken, and Write returns while retaining everything
written in the buffer. At that point the tested condition
is obviously violated, since cw.writeBytes is zero.
I'm too lazy to construct a long scenario, but the same can
obviously happen on the last write too (for example, if the
previous 4045 writes happened to leave the buffer empty).

The tested condition needs to change to a valid one.
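A minimal sketch of both halves of the argument, under stated assumptions. A simplified, byte-counting model of a page-aligned buffered writer shows that a single 10-page write stays under the watermark and never reaches the underlying writer; a seeded randomized run then checks one candidate replacement condition, namely that the pending byte count stays within [0, watermark]. This is a hypothetical model, not etcd's PageWriter and not the condition adopted upstream; pageBuffer, countingWriter and checkPending are invented names. The 128-byte page matches the "expected less than 128 bytes" in the failure log; the 8 KiB watermark, 32 KiB maximum write and 4096 iterations are illustrative assumptions.

```go
package main

import (
	"fmt"
	"io"
	"math/rand"
)

// countingWriter stands in for the test's checking writer: it only counts
// the bytes that reach the underlying io.Writer.
type countingWriter struct{ written int }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.written += len(p)
	return len(p), nil
}

// pageBuffer is a hypothetical, heavily simplified model of a page-aligned
// buffered writer. It tracks byte counts only, not contents.
type pageBuffer struct {
	w         io.Writer
	pageBytes int
	watermark int
	buffered  int
}

// Write follows the branch quoted above: data that still fits under the
// watermark is only buffered, so nothing reaches w.
func (pb *pageBuffer) Write(p []byte) (int, error) {
	if len(p)+pb.buffered <= pb.watermark {
		pb.buffered += len(p)
		return len(p), nil
	}
	// Otherwise flush every whole page (contents don't matter in this
	// model) and keep the unaligned tail buffered.
	total := pb.buffered + len(p)
	aligned := total - total%pb.pageBytes
	if _, err := pb.w.Write(make([]byte, aligned)); err != nil {
		return 0, err
	}
	pb.buffered = total - aligned
	return len(p), nil
}

// checkPending is one candidate invariant (an assumption, not the upstream
// fix): pending bytes are never negative and never exceed the watermark.
func checkPending(n, flushed, watermark int) error {
	pending := n - flushed
	if pending < 0 || pending > watermark {
		return fmt.Errorf("pending=%d outside [0, %d]", pending, watermark)
	}
	return nil
}

func main() {
	cw := &countingWriter{}
	pb := &pageBuffer{w: cw, pageBytes: 128, watermark: 8 * 1024}

	// One 10-page write stays under the watermark: nothing is flushed, so a
	// "pending < pageBytes" check already fails after the first write.
	n, _ := pb.Write(make([]byte, 10*128))
	fmt.Printf("written=%d flushed=%d pending=%d\n", n, cw.written, n-cw.written)

	// Seeded randomized run, checked against the weaker invariant.
	r := rand.New(rand.NewSource(1))
	buf := make([]byte, 32*1024)
	for i := 0; i < 4096; i++ {
		c, _ := pb.Write(buf[:r.Intn(len(buf))])
		n += c
		if err := checkPending(n, cw.written, pb.watermark); err != nil {
			fmt.Println("invariant violated:", err)
			return
		}
	}
	fmt.Println("invariant held")
}
```

In this model the first line printed is "written=1280 flushed=0 pending=1280". Bounding pending bytes by the watermark rather than by one page matches the buffering behavior described above; a stricter alternative would be to flush explicitly at the end and then assert that nothing remains pending.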

Comment 3 Pete Zaitcev 2023-07-20 14:58:04 UTC
See
issue https://github.com/etcd-io/etcd/issues/16255
commit https://github.com/etcd-io/etcd/commit/fddd1add52b33649a99d7f756404924138344a10

I think the fix is incomplete, because it does not
address the seeding of the rand module. However,
it should be sufficient to unblock Koji.

Comment 4 Fedora Release Engineering 2023-08-16 07:07:48 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.