Bug 1435832

Summary:	PostgreSQL DB Restore: unexpected data beyond EOF
Product:	[Community] GlusterFS	Reporter:	javishi
Component:	nfs	Assignee:	Humble Chirammal <hchiramm>
Status:	CLOSED EOL	QA Contact:
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	3.8	CC:	bugs, fuzz, kostrzewa, rgoncalves, tcarlin
Target Milestone:	---	Keywords:	Triaged
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1512691		Environment:
Last Closed:	2017-11-07 10:40:31 UTC	Type:	Bug

Description javishi 2017-03-24 23:15:16 UTC

Description of problem:

I'm running Gluster in a Kubernetes cluster with the help of https://github.com/gluster/gluster-kubernetes. I have a PostgreSQL container where the /var/lib/postgresql/data/pgdata directory is a GlusterFS-mounted persistent volume. I then run another container to restore a PostgreSQL backup. It successfully restores all tables except one, which happens to be the largest table (>100MB). The error given for that table is:

```
ERROR: unexpected data beyond EOF in block 14917 of relation base/78620/78991
HINT: This has been seen to occur with buggy kernels; consider updating your system.
CONTEXT: COPY es_config_app_solutiondraft, line 906
```
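
For orientation, the block number in the error can be converted to a byte offset within the relation file. The following is a minimal sketch, assuming PostgreSQL's default block size of 8192 bytes and default 1 GiB segment files (neither is stated in this report); `locate_block` is a hypothetical helper name.

```python
import re

BLOCK_SIZE = 8192          # assumption: PostgreSQL's default block size
SEGMENT_SIZE = 1024 ** 3   # assumption: default 1 GiB relation segment files

ERROR_RE = re.compile(
    r"unexpected data beyond EOF in block (\d+) of relation (\S+)"
)


def locate_block(message):
    """Return (relation_path, segment_number, byte_offset_in_segment)."""
    match = ERROR_RE.search(message)
    if match is None:
        raise ValueError("not an 'unexpected data beyond EOF' message")
    block = int(match.group(1))
    relation = match.group(2)
    absolute = block * BLOCK_SIZE  # offset from the start of the relation
    return relation, absolute // SEGMENT_SIZE, absolute % SEGMENT_SIZE


msg = "ERROR: unexpected data beyond EOF in block 14917 of relation base/78620/78991"
print(locate_block(msg))  # ('base/78620/78991', 0, 122200064)
```

Under those assumptions, block 14917 sits at byte offset 122,200,064 (about 116.5 MiB) of the first segment file, which is consistent with the failure appearing only on the table larger than 100MB.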

I have tried several different containers to perform the restore from, including ubuntu:16.04, postgresql:9.6.2, and alpine:3.5. All have the same issue. Interestingly, the entire restore works, including the large table, if I run it directly on the postgresql container. That makes me think this is related to container-to-container networking and not necessarily gluster's fault, but I wanted to report it in case there are any suggestions or kernel setting tweaks to fix the issue.

Version-Release number of selected component (if applicable):

GlusterFS 3.8.5

PostgreSQL 9.6.2 Container:
uname -a
Linux develop-postgresql-3992946951-3srqg 4.4.0-65-generic #86-Ubuntu SMP Thu Feb 23 17:49:58 UTC 2017 x86_64 GNU/Linux

Other containers used for the restore are running the same 4.4.0-65-generic kernel.

Kubernetes 1.5.1
Docker 1.12.6

How reproducible:

First, get kubernetes working with gluster and heketi. See https://github.com/gluster/gluster-kubernetes

Steps to Reproduce:
1. Start a PostgreSQL "pod" with /var/lib/postgresql/data/pgdata set up as a persistent volume.
2. Start a second container that can access the postgres container.
3. Attempt to restore a backup containing a large table >100MB.

Actual results:

The restore fails on the large table with the above error.

Expected results:

Restore applies cleanly, even for large tables.

Additional info:

Volume is mounted as type fuse.glusterfs. From postgresql container:
# mount
10.163.148.196:vol_6d09a586370e26a718a74d5d280f8dfd on /var/lib/postgresql/data/pgdata type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
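
When comparing affected setups, it can help to pull the filesystem type and mount options out of such a line programmatically. A minimal sketch, assuming the usual `<source> on <mountpoint> type <fstype> (<options>)` layout of `mount` output; `parse_mount_line` is a hypothetical helper name.

```python
def parse_mount_line(line):
    """Split one line of `mount` output into its fields."""
    # Expected layout: <source> on <mountpoint> type <fstype> (<opt>,<opt>,...)
    source, rest = line.split(" on ", 1)
    mountpoint, rest = rest.split(" type ", 1)
    fstype, opts = rest.split(" (", 1)
    options = {}
    for opt in opts.rstrip().rstrip(")").split(","):
        key, _, value = opt.partition("=")
        # Flags such as "rw" carry no value; store them as True.
        options[key] = value or True
    return {
        "source": source,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options,
    }


line = (
    "10.163.148.196:vol_6d09a586370e26a718a74d5d280f8dfd on "
    "/var/lib/postgresql/data/pgdata type fuse.glusterfs "
    "(rw,relatime,user_id=0,group_id=0,default_permissions,"
    "allow_other,max_read=131072)"
)
info = parse_mount_line(line)
print(info["fstype"], info["options"]["max_read"])  # fuse.glusterfs 131072
```

The `max_read=131072` option is 128 KiB, i.e. sixteen 8 KiB blocks if PostgreSQL's default block size is in use.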

Googling this error returns some fairly old results, none of which are conclusive.

Comment 1 javishi 2017-03-28 18:19:25 UTC

My thoughts about it being a container networking issue were incorrect. I now believe this is truly a glusterfs + postgresql issue. I confirmed that I occasionally do get restore failures on the postgresql container itself, which eliminates the container networking interface (CNI). I also get occasional successful restores on separate restore containers, which further eliminates CNI. The "unexpected data beyond EOF" error occurs intermittently, with a success rate of about 30% regardless of how the restore is attempted.

Also, the table size for the failing table is actually 244MB.  All other tables that do successfully restore are under 10MB.
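
To put a number on the intermittent failures, a small harness along these lines could repeat the restore and tally the outcomes. This is only a sketch: it assumes the backup is a plain-SQL dump restored with psql (this report does not say which tool was used), and `restore_once`, `success_rate`, the dump path, and the database name are hypothetical. Each attempt would need to start from a freshly created database, and authentication is assumed to be handled outside the script.

```python
import subprocess


def restore_once(dump_path, dbname, host="localhost", user="postgres"):
    """Run one restore of a plain-SQL dump with psql; return True on success."""
    cmd = [
        "psql", "-h", host, "-U", user, "-d", dbname,
        "-v", "ON_ERROR_STOP=1",  # make psql exit non-zero on the first error
        "-f", dump_path,
    ]
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode == 0


def success_rate(run_once, attempts):
    """Call run_once() `attempts` times; return (rate, list_of_outcomes)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    outcomes = [bool(run_once()) for _ in range(attempts)]
    return sum(outcomes) / attempts, outcomes


# Example call (hypothetical paths and names, not executed here):
# rate, outcomes = success_rate(
#     lambda: restore_once("backup.sql", "restore_test"), attempts=10)
```

Passing the restore step in as a callable keeps the counting logic independent of how the restore is actually performed, so the same harness works from the postgresql container or from a separate restore container.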

Comment 2 Zbigniew Kostrzewa 2017-09-28 05:29:38 UTC

Just recently I bumped into the same error using GlusterFS 3.10.5 and 3.12.1 (from SIG repositories).
I created a cluster of 3 VMs with CentOS 7.2 (uname below) and spun up a PostgreSQL 9.6.2 container under Docker v17.06. The GlusterFS volume was bind-mounted into the container at the default location where PostgreSQL stores its data (/var/lib/postgresql/data). While filling the database with data, at some point I got this "unexpected data beyond EOF" error.

A similar issue was discussed on PostgreSQL's mailing list, but for PostgreSQL on NFS. In fact, such an issue was already reported and fixed in RHEL5 (https://bugzilla.redhat.com/show_bug.cgi?id=672981).

I tried using the latest PostgreSQL Docker image (i.e. 9.6.5), unfortunately with the same results.

uname -a:
Linux node-10-9-4-109 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Comment 3 Rui 2017-10-23 10:28:54 UTC

I'm having the same problem here.

I have installed postgresql 9.6.5 on 3.10.0-693.2.2.el7.x86_64 and executed pgbench with a scale factor of 1000, i.e. 10.000.000 accounts. 

The first run was executed on the local OS filesystem. Everything went well.

After that I stopped PostgreSQL, created a replicated GlusterFS volume (3 replicas), and copied the PostgreSQL data directory into it. The volume is mounted as type fuse.glusterfs.

10.112.76.37:gv0 on /mnt/batatas type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)


I then tried to run pgbench. With a concurrency level of one, things work fine. However, with a concurrency level > 1, this error occurs:

client 1 aborted in state 9: ERROR:  unexpected data beyond EOF in block 316 of relation base/16384/16516
HINT:  This has been seen to occur with buggy kernels; consider updating your system.
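
As far as the PostgreSQL source comments explain it, the "buggy kernels" hint refers to `lseek(SEEK_END)` returning a file size that does not account for a recent write, so a backend extending the relation picks a block that already holds data. One way to probe a mounted volume for that behaviour is sketched below. It is a simplification, not a reproduction of what PostgreSQL does: it uses two descriptors in a single process rather than concurrent backends, assumes an 8192-byte block size, and `probe_stale_eof` is a hypothetical name. A clean run does not prove the volume is unaffected.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumption: PostgreSQL's default block size


def probe_stale_eof(path, blocks=256):
    """Append blocks through one descriptor and check EOF through another.

    Returns a list of (block_number, expected_size, reported_size) for every
    append after which lseek(SEEK_END) reported a size that was too small.
    """
    stale = []
    writer = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    reader = os.open(path, os.O_RDONLY)
    try:
        payload = b"\x01" * BLOCK_SIZE
        for block in range(blocks):
            # Extend the file by exactly one block.
            os.pwrite(writer, payload, block * BLOCK_SIZE)
            expected = (block + 1) * BLOCK_SIZE
            # Ask for the file size through the second descriptor.
            reported = os.lseek(reader, 0, os.SEEK_END)
            if reported < expected:
                stale.append((block, expected, reported))
    finally:
        os.close(writer)
        os.close(reader)
    return stale


if __name__ == "__main__":
    # Replace the temporary directory with a directory on the volume under test.
    with tempfile.TemporaryDirectory() as tmp:
        print(probe_stale_eof(os.path.join(tmp, "probe.dat")))
```

An empty list means every size query saw the preceding write. Any entries in the list would point at the kind of stale end-of-file report that the error message hints at.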

I'm using glusterfs 3.12.2.

Any idea?

Comment 4 Niels de Vos 2017-11-07 10:40:31 UTC

This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.