Bug 1686611

Summary: Heketi Pod fails to deploy after storage node restart
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Bledi Agolli <bagolli>
Component: heketi
Assignee: John Mulligan <jmulligan>
Status: CLOSED DUPLICATE
QA Contact: Prasanth <pprakash>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: ocs-3.11
CC: hchiramm, jmulligan, kramdoss, lsantann, madam, pasik, rgowdapp, rhs-bugs, rtalur, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-09-23 18:15:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: heketi.db files from 3 nodes
Flags: none

Description Bledi Agolli 2019-03-07 20:17:37 UTC
Created attachment 1541969
heketi.db files from 3 nodes

Description of problem:
The heketi-storage pod fails to deploy after two of the four app-storage nodes are restarted. The nodes were rebooted without enough time between reboots for the Gluster volumes to become ready. On further investigation, the heketi database appears to be corrupted.

Version-Release number of selected component (if applicable):
v3.11

How reproducible:
Error reproduced every time.

Steps to Reproduce:
1. Redeploy the heketi-storage deployment. The heketi-storage pod fails to start.

Actual results:
Pod fails with the following error.
```
Heketi 8.0.0
[heketi] INFO 2019/03/07 20:04:12 Loaded kubernetes executor
ERROR: Unable to start application
```

Expected results:
Pod deploys without error

Additional info:

# Volume Info

sh-4.2# gluster volume info heketidbstorage

Volume Name: heketidbstorage
Type: Replicate
Volume ID: 8c931c3d-9032-4b42-a19d-5f0179c70743
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 128.160.65.213:/var/lib/heketi/mounts/vg_c15e6a953b5e3d000df2af594afd571e/brick_44ac83de22ea3441b549973120fc2c6c/brick
Brick2: 128.160.65.216:/var/lib/heketi/mounts/vg_02d7d1e8c058d7ebab61d63beb56e44b/brick_fbe52cb58fdf0ed2de53f638c77a3fc6/brick
Brick3: 128.160.65.215:/var/lib/heketi/mounts/vg_ecf17f15c801c0ef8ce32086f0445551/brick_fdbb2bcdf393995f54822ac154581a8d/brick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
server.tcp-user-timeout: 42
cluster.brick-multiplex: on

# Crash Trace
sh-4.2# ./heketi db export --dbfile=heketi.db --jsonfile=heketi.json
panic: invalid page type: 12: 10

goroutine 1 [running]:
github.com/boltdb/bolt.(*Cursor).search(0xc42066b2c8, 0xc42066b360, 0x6, 0x20, 0xc)
        /builddir/build/BUILD/heketi-8.0.0/src/github.com/boltdb/bolt/cursor.go:256 +0x40c
github.com/boltdb/bolt.(*Cursor).seek(0xc42066b2c8, 0xc42066b360, 0x6, 0x20, 0x0, 0x0, 0x9, 0xbf, 0x195554f, 0x2, ...)
        /builddir/build/BUILD/heketi-8.0.0/src/github.com/boltdb/bolt/cursor.go:159 +0xb1
github.com/boltdb/bolt.(*Bucket).Bucket(0xc4204442b8, 0xc42066b360, 0x6, 0x20, 0xc42066b360)
        /builddir/build/BUILD/heketi-8.0.0/src/github.com/boltdb/bolt/bucket.go:112 +0xfc
github.com/boltdb/bolt.(*Tx).Bucket(0xc4204442a0, 0xc42066b360, 0x6, 0x20, 0x6)
        /builddir/build/BUILD/heketi-8.0.0/src/github.com/boltdb/bolt/tx.go:101 +0x4f
github.com/heketi/heketi/apps/glusterfs.EntryKeys(0xc4204442a0, 0x18efa13, 0x6, 0x110, 0xc420194000, 0xc42066b470)
        /builddir/build/BUILD/heketi-8.0.0/src/github.com/heketi/heketi/apps/glusterfs/dbentry.go:57 +0xbb
github.com/heketi/heketi/apps/glusterfs.VolumeList(0xc4204442a0, 0x18f7c82, 0xd, 0x0, 0x0, 0x0)
        /builddir/build/BUILD/heketi-8.0.0/src/github.com/heketi/heketi/apps/glusterfs/volume_entry.go:73 +0x44
github.com/heketi/heketi/apps/glusterfs.dbDumpInternal.func1(0xc4204442a0, 0x19697a0, 0xc4204442a0)
        /builddir/build/BUILD/heketi-8.0.0/src/github.com/heketi/heketi/apps/glusterfs/db_operations.go:39 +0xe3
github.com/boltdb/bolt.(*DB).View(0xc4204d81e0, 0xc420010be0, 0x0, 0x0)
        /builddir/build/BUILD/heketi-8.0.0/src/github.com/boltdb/bolt/db.go:626 +0x9a
github.com/heketi/heketi/apps/glusterfs.dbDumpInternal(0x24d3740, 0xc4204d81e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /builddir/build/BUILD/heketi-8.0.0/src/github.com/heketi/heketi/apps/glusterfs/db_operations.go:34 +0x2c9
github.com/heketi/heketi/apps/glusterfs.DbDump(0x7ffc38aaa856, 0xb, 0x7ffc38aaa841, 0x9, 0x0, 0x0)
        /builddir/build/BUILD/heketi-8.0.0/src/github.com/heketi/heketi/apps/glusterfs/db_operations.go:232 +0x18c
main.glob..func3(0x24ba740, 0xc4201227e0, 0x0, 0x2)
        /builddir/build/BUILD/heketi-8.0.0/main.go:126 +0x92
github.com/spf13/cobra.(*Command).execute(0x24ba740, 0xc4201227c0, 0x2, 0x2, 0x24ba740, 0xc4201227c0)
        /builddir/build/BUILD/heketi-8.0.0/src/github.com/spf13/cobra/command.go:651 +0x23d
github.com/spf13/cobra.(*Command).ExecuteC(0x24b9ec0, 0x160000000024, 0x98, 0x98)
        /builddir/build/BUILD/heketi-8.0.0/src/github.com/spf13/cobra/command.go:726 +0x2fe
github.com/spf13/cobra.(*Command).Execute(0x24b9ec0, 0x18dff40, 0x254afc0)
        /builddir/build/BUILD/heketi-8.0.0/src/github.com/spf13/cobra/command.go:685 +0x2b
main.main()
        /builddir/build/BUILD/heketi-8.0.0/main.go:447 +0x42

heketi.db files are attached.
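
The first line of the panic can be read directly. As far as I can tell from the boltdb source, `Cursor.search` formats this message as the page ID in decimal followed by the page flags in hexadecimal, so `12: 10` would mean page 12 with flags 0x10. That value is the freelist flag, not the branch or leaf flag the cursor expects. The Go sketch below only illustrates that decoding. The function names are made up for this comment, and the flag constants are copied from boltdb's page.go as I understand them.

```go
package main

import (
	"fmt"
	"strings"
)

// Page flag values, assumed to match the constants in boltdb's page.go.
const (
	branchPageFlag   = 0x01
	leafPageFlag     = 0x02
	metaPageFlag     = 0x04
	freelistPageFlag = 0x10
)

// flagNames lists the known flag bits in a fixed order for stable output.
var flagNames = []struct {
	bit  uint16
	name string
}{
	{branchPageFlag, "branch"},
	{leafPageFlag, "leaf"},
	{metaPageFlag, "meta"},
	{freelistPageFlag, "freelist"},
}

// decodeInvalidPageType extracts the page ID (decimal) and the page flags
// (hexadecimal) from a line containing "invalid page type: <id>: <flags>".
func decodeInvalidPageType(line string) (id uint64, flags uint16, err error) {
	const marker = "invalid page type: "
	i := strings.Index(line, marker)
	if i < 0 {
		return 0, 0, fmt.Errorf("no %q in line", marker)
	}
	if _, err = fmt.Sscanf(line[i+len(marker):], "%d: %x", &id, &flags); err != nil {
		return 0, 0, err
	}
	return id, flags, nil
}

// describeFlags returns the names of the flag bits that are set,
// joined with "|", or "unknown" if no known bit is set.
func describeFlags(flags uint16) string {
	var names []string
	for _, f := range flagNames {
		if flags&f.bit != 0 {
			names = append(names, f.name)
		}
	}
	if len(names) == 0 {
		return "unknown"
	}
	return strings.Join(names, "|")
}

func main() {
	line := "panic: invalid page type: 12: 10"
	id, flags, err := decodeInvalidPageType(line)
	if err != nil {
		fmt.Println("could not decode:", err)
		return
	}
	fmt.Printf("page %d: flags=%#x (%s)\n", id, flags, describeFlags(flags))
}
```

If that reading is right, the volume bucket lookup landed on a page that bolt treats as part of the freelist, which would be consistent with a partially written or stale database file.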

Comment 4 Yaniv Kaul 2019-04-14 14:29:01 UTC
Status?

Comment 6 Levy Sant'Anna 2019-06-24 19:10:46 UTC
Any status update?

Comment 7 Levy Sant'Anna 2019-06-26 11:34:51 UTC
Status?