Red Hat Bugzilla – Attachment 1472264 Details for Bug 1569872: Avoid busy loop on reconfig failure
gdb logs (text/plain), 8.73 KB, created by Sweta Anandpara on 2018-08-02 06:00:58 UTC
> Num     Type           Disp Enb Address            What
> 1       breakpoint     keep y   0x00007f6ccb9629a0 in tcmu_glfs_reconfig at /usr/src/debug/tcmu-runner-1.2.0/glfs.c:670
>         breakpoint already hit 1 time
> 8       breakpoint     keep y   0x000000000040a19a in __tcmu_reopen_dev at /usr/src/debug/tcmu-runner-1.2.0/tcmur_device.c:88
> (gdb)
>
>
> 1, Create one 1K size target device:
>
> [root@dhcp35-120 ~]# gluster-block create vol/block0 ha 3 10.70.35.120,10.70.35.141,10.70.35.147 1k
> IQN: iqn.2016-12.org.gluster-block:43017430-d063-4b4f-8621-bc3ef0cd3054
> PORTAL(S): 10.70.35.120:3260 10.70.35.141:3260 10.70.35.147:3260
> RESULT: SUCCESS
>
> 2, In another terminal, attach gdb to tcmu-runner:
> [root@dhcp35-120 ~]# ps -aux |grep tcmu
> root 9620 0.0 1.2 2778340 47324 ? Ssl Jul30 0:51 /usr/bin/tcmu-runner --tcmu-log-dir=/var/log/gluster-block/
> root 14599 0.0 0.0 112704 940 pts/1 R+ 05:54 0:00 grep --color=auto tcmu
> [root@dhcp35-120 ~]# gdb attach 9620
>
> (gdb) info breakpoints
> Num     Type           Disp Enb Address            What
> 1       breakpoint     keep y   0x00007f6ccb9629a0 in tcmu_glfs_reconfig at /usr/src/debug/tcmu-runner-1.2.0/glfs.c:670
>         breakpoint already hit 1 time
> 8       breakpoint     keep y   0x000000000040a19a in __tcmu_reopen_dev at /usr/src/debug/tcmu-runner-1.2.0/tcmur_device.c:88
>
>
> 3, Set 2 breakpoints
> =================
> In the listing above there are 2 breakpoints, Num 1 and Num 8
> =================
>
> 4, In another terminal, try to modify the target's size to a larger one:
> [root@dhcp35-120 ~]# gluster-block modify vol/block0 size 2k
> IQN: iqn.2016-12.org.gluster-block:43017430-d063-4b4f-8621-bc3ef0cd3054
> SIZE: 2.0 KiB
> SUCCESSFUL ON: 10.70.35.141 10.70.35.147 10.70.35.120
> RESULT: SUCCESS
>
> This command will be stuck, so we need to execute the continue command in gdb:
> (gdb) continue
> Continuing.
> Breakpoint 1, tcmu_glfs_reconfig (dev=0xafeff0, cfg=0x7ffffe9e5490) at /usr/src/debug/tcmu-runner-1.2.0/glfs.c:670
> 670 {
> (gdb)
> and we hit breakpoint Num 1
>
> 668 static int tcmu_glfs_reconfig(struct tcmu_device *dev,
> 669                               struct tcmulib_cfg_info *cfg)
> 670 {
> 671     struct glfs_state *gfsp = tcmu_get_dev_private(dev);
> 672     struct stat st;
> 673     int ret = -EIO;
> 674
> 675     switch (cfg->type) {
> 676     case TCMULIB_CFG_DEV_SIZE:
> 677         ret = glfs_lstat(gfsp->fs, gfsp->hosts->path, &st);
> 678         if (ret) {
> 679             tcmu_dev_warn(dev, "glfs_lstat failed: %m\n");
> 680             tcmu_notify_conn_lost(dev);
> 681
> 682             /* Let the targetcli command return success */
> 683             ret = 0;
> 684         } else if (st.st_size != cfg->data.dev_size) {
> 685             tcmu_dev_err(dev,
> 686                          "device size and backing size disagree: device %lld backing %lld\n",
> 687                          cfg->data.dev_size, (long long) st.st_size);
> 688             ret = -EINVAL;
> 689         }
> 690         return ret;
> 691     case TCMULIB_CFG_DEV_CFGSTR:
> 692     case TCMULIB_CFG_WRITE_CACHE:
> 693     default:
> 694         return -EOPNOTSUPP;
> 695     }
> 696 }
> To reach the function changed by the BZ's fix, we must emulate a failure at Line 678 by setting ret to non-zero in gdb.
> Normally the code at Lines 679 ~ 683 is not run;
> it runs only when something goes wrong, so to emulate that we modify the 'ret' value at Line 678 in gdb.
>
> (gdb) c
> Continuing.
> Breakpoint 1, tcmu_glfs_reconfig (dev=0xafeff0, cfg=0x7ffffe9e5490) at /usr/src/debug/tcmu-runner-1.2.0/glfs.c:670
> 670 {
> (gdb) n
> 671     struct glfs_state *gfsp = tcmu_get_dev_private(dev);
> (gdb)
> 675     switch (cfg->type) {
> (gdb)
> 677         ret = glfs_lstat(gfsp->fs, gfsp->hosts->path, &st);
> (gdb)
> 678         if (ret) {
> (gdb) print
> The history is empty.
> (gdb) print ret
> $1 = 0
> (gdb) set ret=-1
> (gdb) n
> 679             tcmu_dev_warn(dev, "glfs_lstat failed: %m\n");
> (gdb)
> 680             tcmu_notify_conn_lost(dev);
> (gdb)
> ===========
>
> (gdb) c
> Continuing.
> [Switching to Thread 0x7f6c837fe700 (LWP 14941)]
> Breakpoint 8, __tcmu_reopen_dev (dev=0xafeff0, in_lock_thread=in_lock_thread@entry=false) at /usr/src/debug/tcmu-runner-1.2.0/tcmur_device.c:88
> 88     rdev->flags &= ~TCMUR_DEV_FLAG_IS_OPEN;
> (gdb)
>
> We have hit breakpoint Num 8
>
> (gdb) c
> Continuing.
> [Switching to Thread 0x7f6c837fe700 (LWP 14941)]
> Breakpoint 8, __tcmu_reopen_dev (dev=0xafeff0, in_lock_thread=in_lock_thread@entry=false) at /usr/src/debug/tcmu-runner-1.2.0/tcmur_device.c:88
> 88     rdev->flags &= ~TCMUR_DEV_FLAG_IS_OPEN;
> (gdb) list
> 83
> 84     tcmu_dev_dbg(dev, "Closing device.\n");
> 85     rhandler->close(dev);
> 86
> 87     pthread_mutex_lock(&rdev->state_lock);
> 88     rdev->flags &= ~TCMUR_DEV_FLAG_IS_OPEN;
> 89     ret = -EIO;
> 90     while (ret != 0 && !(rdev->flags & TCMUR_DEV_FLAG_SHUTTING_DOWN)) {
> 91         pthread_mutex_unlock(&rdev->state_lock);
> 92
> (gdb) n
> 90     while (ret != 0 && !(rdev->flags & TCMUR_DEV_FLAG_SHUTTING_DOWN)) {
> (gdb)
> 91         pthread_mutex_unlock(&rdev->state_lock);
> (gdb)
> 93         tcmu_dev_dbg(dev, "Opening device.\n");
> (gdb)
> (gdb) n
> 94         ret = rhandler->open(dev, true);
> (gdb)
> 95         if (ret) {
> (gdb) print ret
> $3 = 0
> (gdb)
> Then continue in gdb past
> 94         ret = rhandler->open(dev, true);
>
> After Line 94, at Line 95, I printed the ret value: it is 0
> ===================
> 90     while (ret != 0 && !(rdev->flags & TCMUR_DEV_FLAG_SHUTTING_DOWN)) {
> 91         pthread_mutex_unlock(&rdev->state_lock);
> 92
> 93         tcmu_dev_dbg(dev, "Opening device.\n");
> 94         ret = rhandler->open(dev, true);
> 95         if (ret) {
> 96             /* Avoid busy loop ? */
> 97             sleep(1);
> 98         }
> 99
> 100         pthread_mutex_lock(&rdev->state_lock);
> 101         if (!ret) {
> 102             rdev->flags |= TCMUR_DEV_FLAG_IS_OPEN;
> 103         }
> 104     }
> This is the source code. NOTE: the line numbers are the same as the ones shown in gdb.
> Can you see that there is one "sleep(1)" line?
> ======================
> Previously, when this line failed: 94 ret = rhandler->open(dev, true);
> the ret value would be non-zero
> and the old code is:
> 90     while (ret != 0 && !(rdev->flags & TCMUR_DEV_FLAG_SHUTTING_DOWN)) {
> 91         pthread_mutex_unlock(&rdev->state_lock);
> 92
> 93         tcmu_dev_dbg(dev, "Opening device.\n");
> 94         ret = rhandler->open(dev, true);
> 95         if (!ret) {
> 96             /* Avoid busy loop ? */
> 97             sleep(1);
> 98         }
> 99
> 100         pthread_mutex_lock(&rdev->state_lock);
> 101         if (!ret) {
> 102             rdev->flags |= TCMUR_DEV_FLAG_IS_OPEN;
> 103         }
> 104     }
> If rhandler->open fails (ret = -1, for example), the sleep(1) is not run, and execution goes straight back around the while (ret != 0 && !(rdev->flags & TCMUR_DEV_FLAG_SHUTTING_DOWN)) {} loop
>
> If rhandler->open(dev, true) does not recover in a short time, for example if it recovers only after 1 hour,
> this will be a busy loop for 1 hour without any break
>
> So Line 95 is buggy
>
> Only when rhandler->open() fails should it sleep and then try again
>
> This is where the patch comes from: https://github.com/open-iscsi/tcmu-runner/pull/399/commits/7fed83b5e7ad3258268fc26f4005e4258eef949d
> It is the fix relevant to this BZ
>
>
>(gdb) n
>91         pthread_mutex_unlock(&rdev->state_lock);
>(gdb)
>93         tcmu_dev_dbg(dev, "Opening device.\n");
>(gdb)
>94         ret = rhandler->open(dev, true);
>(gdb) print ret
>$3 = -5
>(gdb) n
>95         if (ret) {
>(gdb) print ret
>$4 = 0
>(gdb) set ret=-1
>(gdb) n
>97             sleep(1);
>(gdb)
>
>100         pthread_mutex_lock(&rdev->state_lock);
>(gdb)
>101         if (!ret) {
>(gdb)
>
>So we can see that when rhandler->open() fails, it sleeps 1 second and then goes back to the while(){} loop. Without the [1] change, there would be no sleep before going back to the while() loop, and it would be a busy loop constantly retrying rhandler->open().
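The retry behaviour described in the log can be sketched in isolation. The following is a minimal illustration, not tcmu-runner code: `reopen_with_delay`, `open_fn`, `delay_fn`, `retry_stats`, `fail_n_times` and `no_delay` are hypothetical names introduced here, and the mutex handling from `__tcmu_reopen_dev` is left out. It only shows the corrected condition, where the delay runs after a failed open and not after a successful one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for rhandler->open() and sleep(); not tcmu-runner APIs. */
typedef int (*open_fn)(void *ctx);
typedef void (*delay_fn)(unsigned int seconds);

/* Counters so a caller can observe how the loop behaved. */
struct retry_stats {
    unsigned int attempts; /* number of calls to the open callback */
    unsigned int delays;   /* number of calls to the delay callback */
};

/*
 * Retry do_open() until it returns 0 or *shutting_down becomes true.
 * The delay runs only after a failed attempt, mirroring `if (ret) sleep(1);`
 * in the fixed loop. Returns the last result of do_open().
 */
int reopen_with_delay(open_fn do_open, void *ctx, delay_fn delay,
                      const volatile bool *shutting_down,
                      struct retry_stats *stats)
{
    int ret = -1;

    while (ret != 0 && !*shutting_down) {
        ret = do_open(ctx);
        stats->attempts++;
        if (ret) {
            /* Failure: back off before trying again, to avoid a busy loop. */
            delay(1);
            stats->delays++;
        }
    }
    return ret;
}

/* Example open callback: fails until the counter pointed to by ctx reaches zero. */
int fail_n_times(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -5; /* same value as the -EIO seen in the gdb session */
    }
    return 0;
}

/* Example delay callback that does nothing, so the sketch can be exercised quickly. */
void no_delay(unsigned int seconds)
{
    (void)seconds;
}
```

With an open callback that fails three times, the loop makes four attempts and delays three times. A real caller would pass a wrapper around `sleep()` as the delay callback; with the old `if (!ret)` condition the delay count would stay at zero during failures, which is the busy loop the log describes.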