Bug 1917387 - python-pingouin: FTBFS in Fedora rawhide
Summary: python-pingouin: FTBFS in Fedora rawhide
Keywords:
Status: ASSIGNED
Alias: None
Product: Fedora
Classification: Fedora
Component: python-pingouin
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ankur Sinha (FranciscoD)
QA Contact: Fedora Extras Quality Assurance
URL: https://koschei.fedoraproject.org/pac...
Whiteboard:
Depends On:
Blocks: F34FTBFS PYTHON3.10
 
Reported: 2021-01-18 12:04 UTC by Tomáš Hrnčiar
Modified: 2021-02-09 16:24 UTC (History)
2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug


Attachments

Description Tomáš Hrnčiar 2021-01-18 12:04:37 UTC
Description of problem:
Package python-pingouin fails to build from source in Fedora rawhide.

Version-Release number of selected component (if applicable):
0.3.8-1.fc34

Steps to Reproduce:
koji build --scratch f34 python-pingouin-0.3.8-1.fc34.src.rpm

Additional info:
This package is tracked by Koschei. See:
https://koschei.fedoraproject.org/package/python-pingouin

=================================== FAILURES ===================================
____________________ TestRegression.test_linear_regression _____________________

self = <pingouin.tests.test_regression.TestRegression testMethod=test_linear_regression>

    def test_linear_regression(self):
        """Test function linear_regression.
    
        Compare against JASP and R lm() function.
        """
        # Simple regression (compare to R lm())
        lm = linear_regression(df['X'], df['Y'])  # Pingouin
        sc = linregress(df['X'], df['Y'])  # SciPy
        # When using assert_equal, we need to use .to_numpy()
        assert_equal(lm['names'].to_numpy(), ['Intercept', 'X'])
        assert_almost_equal(lm['coef'][1], sc.slope)
        assert_almost_equal(lm['coef'][0], sc.intercept)
        assert_almost_equal(lm['se'][1], sc.stderr)
        assert_almost_equal(lm['pval'][1], sc.pvalue)
        assert_almost_equal(np.sqrt(lm['r2'][0]), sc.rvalue)
        assert lm.residuals_.size == df['Y'].size
        assert_equal(lm['CI[2.5%]'].round(5).to_numpy(), [1.48155, 0.17553])
        assert_equal(lm['CI[97.5%]'].round(5).to_numpy(), [4.23286, 0.61672])
        assert round(lm['r2'].iloc[0], 4) == 0.1147
        assert round(lm['adj_r2'].iloc[0], 4) == 0.1057
        assert lm.df_model_ == 1
        assert lm.df_resid_ == 98
    
        # Multiple regression with intercept (compare to JASP)
        X = df[['X', 'M']].to_numpy()
        y = df['Y'].to_numpy()
        lm = linear_regression(X, y, as_dataframe=False)  # Pingouin
        sk = LinearRegression(fit_intercept=True).fit(X, y)  # SkLearn
        assert_equal(lm['names'], ['Intercept', 'x1', 'x2'])
        assert_almost_equal(lm['coef'][1:], sk.coef_)
        assert_almost_equal(lm['coef'][0], sk.intercept_)
        assert_almost_equal(sk.score(X, y), lm['r2'])
        assert lm['residuals'].size == y.size
        # No need for .to_numpy here because we're using a dict and not pandas
        assert_equal([.605, .110, .101], np.round(lm['se'], 3))
        assert_equal([3.145, 0.361, 6.321], np.round(lm['T'], 3))
        assert_equal([0.002, 0.719, 0.000], np.round(lm['pval'], 3))
        assert_equal([.703, -.178, .436], np.round(lm['CI[2.5%]'], 3))
        assert_equal([3.106, .257, .835], np.round(lm['CI[97.5%]'], 3))
    
        # No intercept
        lm = linear_regression(X, y, add_intercept=False, as_dataframe=False)
        sk = LinearRegression(fit_intercept=False).fit(X, y)
        assert_almost_equal(lm['coef'], sk.coef_)
        # Scikit-learn gives wrong R^2 score when no intercept present because
        # sklearn.metrics.r2_score always assumes that an intercept is present
        # https://stackoverflow.com/questions/54614157/scikit-learn-statsmodels-which-r-squared-is-correct
        # assert_almost_equal(sk.score(X, y), lm['r2'])
        # Instead, we compare to R lm() function:
        assert round(lm['r2'], 4) == 0.9096
        assert round(lm['adj_r2'], 4) == 0.9078
        assert lm['df_model'] == 2
        assert lm['df_resid'] == 98
    
        # Test other arguments
        linear_regression(df[['X', 'M']], df['Y'], coef_only=True)
        linear_regression(df[['X', 'M']], df['Y'], alpha=0.01)
        linear_regression(df[['X', 'M']], df['Y'], alpha=0.10)
    
        # With missing values
        linear_regression(df_nan[['X', 'M']], df_nan['Y'], remove_na=True)
    
        # With columns with only one unique value
        lm1 = linear_regression(df[['X', 'M', 'One']], df['Y'])
        lm2 = linear_regression(df[['X', 'M', 'One']], df['Y'],
                                add_intercept=False)
        assert lm1.shape[0] == 3
        assert lm2.shape[0] == 3
        assert np.isclose(lm1.at[0, 'r2'], lm2.at[0, 'r2'])
    
        # With zero-only column
        lm1 = linear_regression(df[['X', 'M', 'Zero', 'One']], df['Y'])
        lm2 = linear_regression(df[['X', 'M', 'Zero', 'One']],
                                df['Y'].to_numpy(), add_intercept=False)
        lm3 = linear_regression(df[['X', 'Zero', 'M', 'Zero']].to_numpy(),
                                df['Y'], add_intercept=False)
        assert_equal(lm1.loc[:, 'names'].to_numpy(), ['Intercept', 'X', 'M'])
        assert_equal(lm2.loc[:, 'names'].to_numpy(), ['X', 'M', 'One'])
        assert_equal(lm3.loc[:, 'names'].to_numpy(), ['x1', 'x3'])
    
        # With duplicate columns
        lm1 = linear_regression(df[['X', 'One', 'Zero', 'M', 'M', 'X']],
                                df['Y'])
        lm2 = linear_regression(
            df[['X', 'One', 'Zero', 'M', 'M', 'X']].to_numpy(),
            df['Y'], add_intercept=False
        )
        assert_equal(lm1.loc[:, 'names'].to_numpy(), ['Intercept', 'X', 'M'])
        assert_equal(lm2.loc[:, 'names'].to_numpy(), ['x1', 'x2', 'x4'])
    
        # Relative importance
        # Compare to R package relaimpo
        # >>> data <- read.csv('mediation.csv')
        # >>> lm1 <- lm(Y ~ X + M, data = data)
        # >>> calc.relimp(lm1, type=c("lmg"))
>       lm = linear_regression(df[['X', 'M']], df['Y'], relimp=True)

X          = array([[ 6,  5],
       [ 7,  5],
       [ 7,  7],
       [ 8,  4],
       [ 4,  3],
       [ 4,  4],
       [ 9,  7],...       [ 7,  3],
       [ 6,  7],
       [ 5,  2],
       [ 8,  4],
       [ 7,  4],
       [ 2,  2],
       [ 5,  4]])
lm         = {'CI[2.5%]': array([0.1017897 , 0.51242501]), 'CI[97.5%]': array([0.43932124, 0.91602289]), 'T': array([3.18138289, 7.... [ 7,  3],
       [ 6,  7],
       [ 5,  2],
       [ 8,  4],
       [ 7,  4],
       [ 2,  2],
       [ 5,  4]]), ...}
lm1        =        names      coef        se  ...    adj_r2  CI[2.5%]  CI[97.5%]
0  Intercept  1.904269  0.605458  ...  0.360071  ....360071 -0.178018   0.257226
2          M  0.635495  0.100534  ...  0.360071  0.435963   0.835027

[3 rows x 9 columns]
lm2        =   names      coef        se         T  ...        r2    adj_r2  CI[2.5%]  CI[97.5%]
0    x1  0.039604  0.109648  0.361...01   3.105936
2    x4  0.635495  0.100534  6.321194  ...  0.372999  0.360071  0.435963   0.835027

[3 rows x 9 columns]
lm3        =   names      coef        se         T  ...        r2    adj_r2  CI[2.5%]  CI[97.5%]
0    x1  0.270555  0.085043  3.181...90   0.439321
1    x3  0.714224  0.101689  7.023596  ...  0.909607  0.907762  0.512425   0.916023

[2 rows x 9 columns]
sc         = LinregressResult(slope=0.3961261171467491, intercept=2.8572045582909733, rvalue=0.33869891283150894, pvalue=0.0005671128490823392, stderr=0.11115978809240681, intercept_stderr=0.6932129732105715)
self       = <pingouin.tests.test_regression.TestRegression testMethod=test_linear_regression>
sk         = LinearRegression(fit_intercept=False)
y          = array([ 6,  5,  4,  8,  5,  7,  8,  4,  7,  4,  4,  3, 10,  6,  4,  3,  5,
        4,  4,  6,  7,  6,  4,  4,  7,  8, ... 3,  4, 10,  4,  6,  4,  7,  7,  4,  1,  8,  5,  6,
        3,  5,  9,  8,  8,  6,  5,  4,  6,  5,  2,  1,  5,  1,  5])

pingouin/tests/test_regression.py:130: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pingouin/regression.py:454: in linear_regression
    reli = _relimp(data.drop(columns=['Intercept']).cov())
        T          = array([3.1451684 , 0.36119001, 6.32119439])
        X          = array([[ 1.,  6.,  5.],
       [ 1.,  7.,  5.],
       [ 1.,  7.,  7.],
       [ 1.,  8.,  4.],
       [ 1.,  4.,  3.]...      [ 1.,  5.,  2.],
       [ 1.,  8.,  4.],
       [ 1.,  7.,  4.],
       [ 1.,  2.,  2.],
       [ 1.,  5.,  4.]])
        X_gd       = True
        Xw         = array([[ 1.,  6.,  5.],
       [ 1.,  7.,  5.],
       [ 1.,  7.,  7.],
       [ 1.,  8.,  4.],
       [ 1.,  4.,  3.]...      [ 1.,  5.,  2.],
       [ 1.,  8.,  4.],
       [ 1.,  7.,  4.],
       [ 1.,  2.,  2.],
       [ 1.,  5.,  4.]])
        _          = array([80.82058916, 13.03421192,  2.67239351])
        add_intercept = True
        adj_r2     = 0.36007139062480464
        alpha      = 0.05
        as_dataframe = True
        beta_se    = array([0.60545846, 0.10964844, 0.10053394])
        beta_var   = array([0.36657995, 0.01202278, 0.01010707])
        coef       = array([1.90426882, 0.03960392, 0.63549459])
        coef_only  = False
        constant   = 1
        crit       = 1.984723185927883
        data       =     y  Intercept    X    M
0   6        1.0  6.0  5.0
1   5        1.0  7.0  5.0
2   4        1.0  7.0  7.0
3   8     ... 1.0  8.0  4.0
97  5        1.0  7.0  4.0
98  1        1.0  2.0  2.0
99  5        1.0  5.0  4.0

[100 rows x 4 columns]
        df_model   = 2
        df_resid   = 97
        idx_duplicate = []
        idx_unique = array([0])
        idx_zero   = array([], dtype=int64)
        ll         = array([ 0.70260138, -0.17801788,  0.43596254])
        ll_name    = 'CI[2.5%]'
        marg_error = array([1.20166745, 0.2176218 , 0.19953204])
        mse        = 2.661262704705674
        n          = 100
        n_nonzero  = array([100,  99,  99])
        names      = ['Intercept', 'X', 'M']
        p          = 3
        pair       = (1, 2)
        pred       = array([5.31936528, 5.3589692 , 6.62995838, 4.76307854, 3.96916827,
       4.60466285, 6.70916622, 2.10228843, 6.669562...52, 5.99446379, 7.30505688, 4.08798003, 6.59035446,
       3.3732776 , 4.76307854, 4.72347461, 3.25446584, 4.64426677])
        pval       = array([2.20383086e-03, 7.18742902e-01, 7.92264206e-09])
        r2         = 0.372999241319253
        rank       = 3
        relimp     = True
        remove_na  = False
        resid      = array([ 0.68063472, -0.3589692 , -2.62995838,  3.23692146,  1.03083173,
        2.39533715,  1.29083378,  1.89771157, ...446379, -3.30505688,  1.91201997, -1.59035446,
       -1.3732776 , -3.76307854,  0.27652539, -2.25446584,  0.35573323])
        ss_res     = 258.1424823564504
        ss_tot     = 3147
        ss_wtot    = 411.71000000000004
        stats      = {'CI[2.5%]': array([ 0.70260138, -0.17801788,  0.43596254]), 'CI[97.5%]': array([3.10593627, 0.25722572, 0.83502663]), 'T': array([3.1451684 , 0.36119001, 6.32119439]), 'adj_r2': 0.36007139062480464, ...}
        ul         = array([3.10593627, 0.25722572, 0.83502663])
        ul_name    = 'CI[97.5%]'
        w          = array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., ...1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
        weights    = None
        y          = array([ 6,  5,  4,  8,  5,  7,  8,  4,  7,  4,  4,  3, 10,  6,  4,  3,  5,
        4,  4,  6,  7,  6,  4,  4,  7,  8, ... 3,  4, 10,  4,  6,  4,  7,  7,  4,  1,  8,  5,  6,
        3,  5,  9,  8,  8,  6,  5,  4,  6,  5,  2,  1,  5,  1,  5])
        y_gd       = True
        yw         = array([ 6,  5,  4,  8,  5,  7,  8,  4,  7,  4,  4,  3, 10,  6,  4,  3,  5,
        4,  4,  6,  7,  6,  4,  4,  7,  8, ... 3,  4, 10,  4,  6,  4,  7,  7,  4,  1,  8,  5,  6,
        3,  5,  9,  8,  8,  6,  5,  4,  6,  5,  2,  1,  5,  1,  5])
pingouin/regression.py:535: in _relimp
    ss_reg_without = pinv(S.iloc[p, p]) @ S_without @ S_without
        S          =           y         X         M
y  4.158687  1.204343  2.365859
X  1.204343  3.040303  1.705657
M  2.365859  1.705657  3.616566
        S_without  = Series([], Name: y, dtype: float64)
        all_preds  = []
        betas      = array([0.03960392, 0.63549459])
        cols       = ['y', 'X', 'M']
        k          = 0
        loo        = array([2])
        npred      = 2
        p          = []
        p_with     = [1]
        pred       = 1
        predictors = ['X', 'M']
        predictors_int = array([1, 2])
        r2_full    = 0.3729992413192531
        r2_seq     = []
        r2_seq_mean = []
        ss_reg_precomp = {}
        ss_tot     = 4.158686868686868
        target_int = 0
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

a = array([], shape=(0, 0), dtype=float64), cond = None, rcond = None
return_rank = False, check_finite = True

    def pinv(a, cond=None, rcond=None, return_rank=False, check_finite=True):
        """
        Compute the (Moore-Penrose) pseudo-inverse of a matrix.
    
        Calculate a generalized inverse of a matrix using a least-squares
        solver.
    
        Parameters
        ----------
        a : (M, N) array_like
            Matrix to be pseudo-inverted.
        cond, rcond : float, optional
            Cutoff factor for 'small' singular values. In `lstsq`,
            singular values less than ``cond*largest_singular_value`` will be
            considered as zero. If both are omitted, the default value
            ``max(M, N) * eps`` is passed to `lstsq` where ``eps`` is the
            corresponding machine precision value of the datatype of ``a``.
    
            .. versionchanged:: 1.3.0
                Previously the default cutoff value was just `eps` without the
                factor ``max(M, N)``.
    
        return_rank : bool, optional
            if True, return the effective rank of the matrix
        check_finite : bool, optional
            Whether to check that the input matrix contains only finite numbers.
            Disabling may give a performance gain, but may result in problems
            (crashes, non-termination) if the inputs do contain infinities or NaNs.
    
        Returns
        -------
        B : (N, M) ndarray
            The pseudo-inverse of matrix `a`.
        rank : int
            The effective rank of the matrix. Returned if return_rank == True
    
        Raises
        ------
        LinAlgError
            If computation does not converge.
    
        Examples
        --------
        >>> from scipy import linalg
        >>> a = np.random.randn(9, 6)
        >>> B = linalg.pinv(a)
        >>> np.allclose(a, np.dot(a, np.dot(B, a)))
        True
        >>> np.allclose(B, np.dot(B, np.dot(a, B)))
        True
    
        """
        a = _asarray_validated(a, check_finite=check_finite)
        # If a is sufficiently tall it is cheaper to compute using the transpose
>       trans = a.shape[0] / a.shape[1] >= 1.1
E       ZeroDivisionError: division by zero

a          = array([], shape=(0, 0), dtype=float64)
check_finite = True
cond       = None
rcond      = None
return_rank = False

/usr/lib64/python3.10/site-packages/scipy/linalg/basic.py:1290: 
ZeroDivisionError
=========================== short test summary info ============================
FAILED pingouin/tests/test_regression.py::TestRegression::test_linear_regression
============ 1 failed, 85 passed, 3006 warnings in 66.17s (0:01:06) ============
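
The crash can be reproduced outside pingouin. In `_relimp`, the covariance matrix is indexed with an empty predictor list (`p = []` in the locals above), which yields a 0x0 DataFrame; the scipy `pinv` shown in the traceback then evaluates `a.shape[0] / a.shape[1]` before guarding against empty input. A minimal sketch of that failure path (the `trans` line is copied from the traceback; newer scipy releases may handle empty input differently):

```python
import numpy as np
import pandas as pd

# Covariance-like matrix, similar to S in pingouin's _relimp
S = pd.DataFrame(np.eye(3), index=list("yXM"), columns=list("yXM"))

# Indexing with an empty list, as _relimp does with p = [],
# returns an empty 0x0 DataFrame rather than raising
p = []
empty = S.iloc[p, p]
print(empty.shape)  # (0, 0)

# The pinv fast-path check from the traceback divides the shape
# components, so a 0x0 input triggers ZeroDivisionError
a = empty.to_numpy()
try:
    trans = a.shape[0] / a.shape[1] >= 1.1
except ZeroDivisionError as exc:
    print("ZeroDivisionError:", exc)
```

This suggests the fix belongs in pingouin's `_relimp` (skip or special-case the empty-predictor subset) rather than in scipy.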

Comment 1 Fedora Release Engineering 2021-01-24 04:23:00 UTC
Dear Maintainer,

your package has an open Fails To Build From Source bug for Fedora 34.
Action is required from you.

If you can fix your package, perform a build in koji and then either create
an update in bodhi or, if an update is not appropriate [1], close this bug
without one. If you are working on a fix, set the status to ASSIGNED to
acknowledge this. If you have already fixed this issue, please close this Bugzilla report.

Following the policy for such packages [2], your package will be orphaned if
this bug remains in NEW state more than 8 weeks (not sooner than 2021-03-15).

A week before the mass branching of Fedora 35 according to the schedule [3],
any packages not successfully rebuilt at least on Fedora 33 will be
retired regardless of the status of this bug.

[1] https://docs.fedoraproject.org/en-US/fesco/Updates_Policy/
[2] https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/
[3] https://fedorapeople.org/groups/schedule/f-35/f-35-key-tasks.html

Comment 2 Ben Cotton 2021-02-09 15:41:08 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.

