Stats

Utility functions to compute simple statistics of (functional or structural) metrics across simplices and neighborhoods.

Author(s): Daniela Egas Santander. Last update: 11.2023

edge_stats_participation(participation, vals, condition=operator.eq, dims=None)

Get statistics of the values in vals across edges, filtered by edge participation.

Parameters:

participation : DataFrame, required
    DataFrame of edge participation, with index the edges of the NxN matrix to consider, columns the dimensions, and values the edge participation.
vals : Series, required
    pandas Series with index the edges of the NxN matrix for which edge participation has been computed, and values the values on each edge to be averaged.
condition : operator, default operator.eq
    Operator with which to filter the edges. The default operator.eq keeps edges whose maximal dimension of edge participation equals a given value; operator.ge keeps edges whose maximal dimension of edge participation is at least a given value.
dims : iterable, default None
    Dimensions for which to run the analysis; if None, all columns of participation are analyzed.

Returns:

DataFrame
    Indexed by the dimensions for which the analysis has been run, with columns the statistics of the values in vals, where the edges have been grouped according to the given condition.

Source code in src/connalysis/network/stats.py
def edge_stats_participation(participation, vals, condition=operator.eq, dims=None):
    """ Get statistics of the values in vals across edges filtered using edge participation

    Parameters
    ----------
    participation : DataFrame
        DataFrame of edge participation with index the edges of an NxN matrix to consider,
        columns are dimensions and values are edge participation.
    values : Series
        pandas Series with index the edges of the NxN matrix of which edge participation has been computed
        and vals the values on that edge to be averaged.
    condition : operator
        operator with which to filter the nodes. The default ``operator.eq`` filters nodes such that their maximal
        dimension of edge participation is a given value.
        Alternatively, ``operator.ge`` filters edges such that their maximal dimension of node participation is at least a given
        value.
    dims : iterable
        dimensions for which to run the analysis, if ``None`` all the columns of participation will be analyzed

    Returns
    -------
    DataFrame
        with index, the dimensions for which the analysis have been run and columns the statistics of the values in vals
        where the nodes have been grouped according to the condition given.
    """
    par_df = participation.copy()
    if dims is None:
        dims = par_df.columns
    vals = vals.rename("values")
    par_df["max_dim"] = (par_df > 0).sum(axis=1)  # maximal dimension an edge is part of. Note that edge participation in dimension 0 is 0
    stats_vals = {}
    for dim in dims:
        mask = condition(par_df.max_dim, dim)
        c = pd.DataFrame(vals.loc[par_df[mask].index], columns=["values"])
        c["weight"] = par_df[mask][dim]
        w_mean = (c["values"] * c["weight"]).sum() / c["weight"].sum()  # avoids np.product, removed in NumPy 2.0
        stats_vals[dim] = (c.shape[0],  # Number of edges fulfilling the condition
                           np.nanmean(c["values"]),
                           np.nanstd(c["values"]),
                           stats.sem(c["values"], nan_policy="omit"),
                           w_mean  # mean weighted by participation
                           )
    stats_vals = pd.DataFrame.from_dict(stats_vals, orient="index",
                                        columns=["counts", "mean", "std", "sem", "weighted_mean"])
    stats_vals.index.name = "dim"
    return stats_vals.drop(0)

node_stats_neighborhood(values, adj=None, pre=True, post=True, all_nodes=True, centers=None, include_center=True, precomputed=False, neighborhoods=None)

Get basic statistics of the property values on the neighborhood of the nodes in centers in the graph described by adj.

Parameters:

values : Series, required
    pandas Series with index the nodes of the NxN matrix and values the values on each node to be averaged.
adj : sparse matrix or 2d array, default None
    The adjacency matrix of the graph.
pre : bool, default True
    If True, include the nodes mapping to the nodes in centers (the in-neighbors of the centers).
post : bool, default True
    If True, include the nodes that the centers map to (the out-neighbors of the centers).
all_nodes : bool, default True
    If True, compute the neighbors of all nodes in adj; if False, compute only the neighbors of the nodes listed in centers.
centers : 1d-array, default None
    The indices of the nodes for which the neighbors are computed. Ignored if all_nodes is True, required if all_nodes is False.
include_center : bool, default True
    If True, include the center in the computation; otherwise ignore it.
precomputed : bool, default False
    If False, compute the neighborhoods from adj; if True, skip the computation and read the neighborhoods from the input.
neighborhoods : Series, default None
    Series of neighborhood indices, indexed by their centers. Required if precomputed is True.

Returns:

DataFrame
    Indexed by the centers considered, with columns the sum, mean, standard deviation and standard error of the mean of the values in each neighborhood.

See Also

neighborhood_indices (network_local.md#src.connalysis.network.local.neighborhood_indices): Function to precompute the neighborhood indices that can be used when precomputed is set to True. Precomputing the neighborhoods increases efficiency if multiple properties are averaged across neighborhoods.

Source code in src/connalysis/network/stats.py
def node_stats_neighborhood(values, adj=None, pre=True, post=True, all_nodes=True, centers=None,
                            include_center=True, precomputed=False, neighborhoods=None):
    """ Get basic statistics of the property values on the neighbhood of the nodes in centers in the
    graph described by adj.
    Parameters
    ----------
    values : Series
        pandas Series with index the nodes of the NxN matrix of which the simplices are listed,
        and values the values on that node to be averaged.
    adj : sparse matrix or 2d array
        The adjacency matrix of the graph
    pre : bool
        If ``True`` compute the nodes mapping to the nodes in centers (the in-neighbors of the centers)
    post : bool
        If ``True`` compute the nodes that the centers map to (the out-neighbors of the centers)
    all_nodes : bool
        If ``True`` compute the neighbors of all nodes in adj, if ``False`` compute only the neighbors of the nodes
        listed in centers
    centers : 1d-array
        The indices of the nodes for which the neighbors need to be computed.  This entry is ignored if
        all_nodes is ``True`` and required if all_nodes is ``False``
    include_center : bool
        If ``True`` it includes the center in the computation otherwise it ignores it
    precomputed : bool
        If ``False`` it precomputes the neighbhorhoods in adj,
        if ``False`` it skips the computation and reads it fromt the input
    neighborhoods : DataFrame
        DataFrame of neighbhoord indices. Required if precomputed is ``True``

    Returns
    -------
    DataFrame
        with index, centers to be considered and columns the sum, mean, standard deviation and
        standard error of the mean of the values in that neighborhood.

    See Also
    --------
    [neighborhood_indices] (network_local.md#src.connalysis.network.local.neighborhood_indices):
    Function to precompute the neighborhood_indices that can be used if precomputed is set ``True``.
    Precomputing the neighborhoods would increase efficiency if multiple properties are averaged across neighborhoods.
    """

    # Single value functions for DataFrames
    def append_center(x):
        # To include center in the computation
        return np.append(x["center"], x["neighbors"])

    def mean_nbd(nbd_indices, v):
        df = v[nbd_indices]
        return [np.nansum(df), np.nanmean(df), np.nanstd(df), stats.sem(df, nan_policy="omit")]

    # Get neighborhoods
    if precomputed:
        assert isinstance(neighborhoods,
                          pd.Series), "If precomputed a Series of neighbhoords indexed by their center must be provided"
    else:
        assert (adj is not None), "If not precomputed, an adjacency matrix must be provided"
        neighborhoods = neighborhood_indices(adj, pre=pre, post=post, all_nodes=all_nodes, centers=centers)
    centers = neighborhoods.index
    if include_center:
        neighborhoods = neighborhoods.reset_index().apply(append_center, axis=1)
    else:
        neighborhoods = neighborhoods.reset_index(drop=True)
    stat_vals = pd.DataFrame.from_records(neighborhoods.map(lambda x: mean_nbd(x, values)),
                                          columns=["sum", "mean", "std", "sem"])
    stat_vals["center"] = centers
    return stat_vals.set_index("center")
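The per-neighborhood statistics can be sketched by hand for a tiny graph; the adjacency matrix, node values, and the manual neighborhood construction below are illustrative only (the library's neighborhood_indices would normally supply the indices):

```python
import numpy as np
import pandas as pd
from scipy import sparse, stats

# Hypothetical 4-node directed graph with edges 0->1, 1->2, 2->0, 3->0
adj = sparse.csr_matrix(
    (np.ones(4), ([0, 1, 2, 3], [1, 2, 0, 0])), shape=(4, 4)
)
values = pd.Series([1.0, 2.0, 3.0, 4.0])

# Neighborhood of node 0: in-neighbors, out-neighbors, plus the center itself
center = 0
in_nbs = adj[:, center].nonzero()[0]   # nodes mapping to the center (pre)
out_nbs = adj[center, :].nonzero()[1]  # nodes the center maps to (post)
nbd = np.unique(np.append(center, np.concatenate([in_nbs, out_nbs])))

# Sum, mean, std and sem of the values on that neighborhood,
# matching the columns of the returned DataFrame
df = values[nbd]
row = [np.nansum(df), np.nanmean(df), np.nanstd(df),
       stats.sem(df, nan_policy="omit")]
```

For this toy graph the neighborhood of node 0 (with the center included) covers all four nodes, so the row collects the statistics of all values.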

node_stats_participation(participation, vals, condition=operator.eq, dims=None)

Get statistics of the values in vals across nodes, filtered by node participation.

Parameters:

participation : DataFrame, required
    DataFrame of node participation, with index the nodes of the NxN matrix to consider, columns the dimensions, and values the node participation computed with node_participation.
vals : Series, required
    pandas Series with index the nodes of the NxN matrix for which node participation has been computed, and values the values on each node to be averaged.
condition : operator, default operator.eq
    Operator with which to filter the nodes. The default operator.eq keeps nodes whose maximal dimension of node participation equals a given value; operator.ge keeps nodes whose maximal dimension of node participation is at least a given value.
dims : iterable, default None
    Dimensions for which to run the analysis; if None, all columns of participation are analyzed.

Returns:

DataFrame
    Indexed by the dimensions for which the analysis has been run, with columns the statistics of the values in vals, where the nodes have been grouped according to the given condition.

See Also

node_stats_per_position_single: A similar function where the positions of the nodes in the simplex are taken into account. Note in particular that if condition = operator.ge, the weighted_mean of this analysis is equivalent to the value given by that function for position all. However, the computation using node_participation is more efficient.

Source code in src/connalysis/network/stats.py
def node_stats_participation(participation, vals, condition=operator.eq, dims=None):
    """ Get statistics of the values in vals across nodes filtered using node participation
    Parameters
    ----------
    participation : DataFrame
        DataFrame of node participation with index the nodes in nodes of an NxN matrix to consider,
        columns are dimensions and values are node participation computed with
        [node_participation](network_topology.md#src.connalysis.network.topology.node_participation).
    values : Series
        pandas Series with index the nodes of the NxN matrix of where node participation has been computed
        and vals the values on that node to be averaged.
    condition : operator
        operator with which to filter the nodes. The default ``operator.eq`` filters nodes such that their maximal
        dimension of node participation is a given value.
        Alternatively, ``operator.ge`` filters nodes such that their maximal dimension of node participation is at least a given
        value.
    dims : iterable
        dimensions for which to run the analysis, if ``None`` all the columns of participation will be analyzed

    Returns
    -------
    DataFrame
        with index, the dimensions for which the analysis have been run and columns the statistics of the values in vals
        where the nodes have been grouped according to the condition given.

    See Also
    --------
    [node_stats_per_position_single](network_stats.md#src.connalysis.network.stats.node_stats_per_position_single):
    A similar function where the position of the nodes in the simplex are taken into account.  Note in particular that
    if condition = ``operator.ge`` the weighted_mean of this analyisis is equivalent than the value given by this function for position ``all``.
    However the computation using
    [node_participation](network_topology.md#src.connalysis.network.topology.node_participation)
    is more efficient.
    """
    par_df = participation.copy()
    if dims is None:
        dims = par_df.columns
    vals = vals.rename("values")
    par_df["max_dim"] = (par_df > 0).sum(axis=1) - 1  # maximal dimension a node is part of
    stats_vals = {}
    for dim in dims:
        mask = condition(par_df.max_dim, dim)
        c = pd.DataFrame(vals.loc[par_df[mask].index], columns=["values"])
        c["weight"] = par_df[mask][dim]
        w_mean = (c["values"] * c["weight"]).sum() / c["weight"].sum()  # avoids np.product, removed in NumPy 2.0
        stats_vals[dim] = (c.shape[0],  # Number of nodes fulfilling the condition
                           np.nanmean(c["values"]),
                           np.nanstd(c["values"]),
                           stats.sem(c["values"], nan_policy="omit"),
                           w_mean  # mean weighted by participation
                           )
    stats_vals = pd.DataFrame.from_dict(stats_vals, orient="index",
                                        columns=["counts", "mean", "std", "sem", "weighted_mean"])
    stats_vals.index.name = "dim"
    return stats_vals
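A minimal sketch of the filtering performed here, with hypothetical participation counts, showing how operator.ge selects nodes by the maximal dimension they participate in (note the `- 1` offset: every node participates in dimension 0):

```python
import operator

import numpy as np
import pandas as pd

# Hypothetical node participation for 5 nodes in dimensions 0, 1, 2
participation = pd.DataFrame(
    {0: [1, 1, 1, 1, 1], 1: [2, 2, 0, 1, 0], 2: [1, 0, 0, 1, 0]},
    index=pd.Index(range(5), name="node"),
)
vals = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=participation.index)

# Maximal dimension each node participates in
max_dim = (participation > 0).sum(axis=1) - 1

# operator.ge keeps nodes whose maximal dimension is at least 1
mask = operator.ge(max_dim, 1)
weights = participation.loc[mask, 1]

# Mean weighted by participation in dimension 1
w_mean = (vals[mask] * weights).sum() / weights.sum()
```

Nodes 0, 1 and 3 pass the filter; their values are weighted by how many 1-simplices each participates in.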

node_stats_per_position(simplex_lists, values, dims=None, with_multiplicity=True)

Get, across dimensions, the mean, standard deviation and standard error of the mean of values averaged across simplex lists and filtered per position.

Parameters:

simplex_lists : Series, required
    pandas Series indexed by dimension; for dimension k, an array of shape (no. of simplices, k+1) listing the simplices to be considered. Each row is the list of nodes of a simplex, indexed by the order of the nodes in an NxN matrix. All entries must be an index in values.
values : Series, required
    pandas Series with index the nodes of the NxN matrix of which the simplices are listed, and values the values on each node to be averaged.
with_multiplicity : bool, default True
    If True, the values are averaged with multiplicity, i.e., weighted by the number of times a node participates in a simplex in a given position; if False, repetitions of a node in a given position are ignored.
dims : iterable, default None
    Dimensions for which to run the analysis; if None, all entries of simplex_lists are analyzed.

Returns:

dict
    Keys are the dimensions analyzed; the value for key k is a DataFrame with index the possible positions of a node in a k-simplex and columns the mean, standard deviation and standard error of the mean for that position.

Source code in src/connalysis/network/stats.py
def node_stats_per_position(simplex_lists, values, dims=None, with_multiplicity=True):
    """ Get across dimensions mean, standard deviation and standard error of the mean averaged across simplex lists
    and filtered per position
    Parameters
    ----------
    simplex_lists : dict
        keys : are int values representing dimensions
        values : for key ``k`` array of dimension (no. of simplices, ``k``) listing simplices to be considered.
        Each row corresponds to a list of nodes on a simplex indexed by the order of the nodes in an NxN matrix.
        All entries must be an index in values
    values : Series
        pandas Series with index the nodes of the NxN matrix of which the simplices are listed,
        and values the values on that node to be averaged.
    with_multiplicity : bool
        if ``True`` the values are averaged with multiplicity i.e., they are weighted by the number of times a node
        participates in a simplex in a given position
        if ``False`` repetitions of a node in a given position are ignored.
    dims : iterable
        dimensions for which to run the analysis, if ``None`` all the keys of simplex lists will be analyzed

    Returns
    -------
    dict
        keys the dimensions anlayzed and values for key ``k`` a DataFrame
        with index, the possible positions of o node in a ``k``-simplex and columns the mean, standard deviation and
        standard error of the mean for that position.
    """
    if dims is None:
        dims = simplex_lists.index
    stats_dict = {}
    for dim in tqdm(dims):
        sl = simplex_lists.loc[dim]
        stats_dict[dim] = node_stats_per_position_single(sl, values, with_multiplicity=with_multiplicity)
    return stats_dict

node_stats_per_position_single(simplex_list, values, with_multiplicity=True)

Get mean, standard deviation and standard error of the mean of values averaged across a simplex list and filtered per position.

Parameters:

simplex_list : 2d-array, required
    Array of shape (no. of simplices, dimension + 1) listing the simplices to be considered. Each row is the list of nodes of a simplex, indexed by the order of the nodes in an NxN matrix. All entries must be an index in values.
values : Series, required
    pandas Series with index the nodes of the NxN matrix of which the simplices are listed, and values the values on each node to be averaged.
with_multiplicity : bool, default True
    If True, the values are averaged with multiplicity, i.e., weighted by the number of times a node participates in a simplex in a given position; if False, repetitions of a node in a given position are ignored.

Returns:

DataFrame
    With index the possible positions of a node in a k-simplex and columns the mean, standard deviation and standard error of the mean for that position.

Source code in src/connalysis/network/stats.py
def node_stats_per_position_single(simplex_list, values, with_multiplicity=True):
    """ Get mean, standard deviation and standard error of the mean averaged across simplex lists and filtered per position
    Parameters
    ----------
    simplex_list : 2d-array
        Array of dimension (no. of simplices, dimension) listing simplices to be considered.
        Each row corresponds to a list of nodes on a simplex indexed by the order of the nodes in an NxN matrix.
        All entries must be an index in values
    values : Series
        pandas Series with index the nodes of the NxN matrix of which the simplices are listed,
        and values the values on that node to be averaged.
    with_multiplicity : bool
        if ``True`` the values are averaged with multiplicity i.e., they are weighted by the number of times a node
        participates in a simplex in a given position
        if ``False`` repetitions of a node in a given position are ignored.

    Returns
    -------
    DataFrame
        with index, the possible positions of o node in a ``k``-simplex and columns the mean, standard deviation and
        standard error of the mean for that position
    """
    # Filter values
    if with_multiplicity:
        vals_sl = values.loc[simplex_list.flatten()].to_numpy().reshape(simplex_list.shape)
    else:
        vals_sl = pd.concat([values.loc[np.unique(simplex_list[:, pos])] for pos in range(simplex_list.shape[1])],
                            axis=1, keys=range(simplex_list.shape[1]))
    # Compute stats
    stats_vals = pd.DataFrame(index=pd.Index(range(simplex_list.shape[1]), name="position"))
    # Stats per position
    stats_vals["mean"] = np.nanmean(vals_sl, axis=0)
    stats_vals["std"] = np.nanstd(vals_sl, axis=0)
    stats_vals["sem"] = stats.sem(vals_sl, axis=0, nan_policy="omit")
    # Stats in any position
    stats_vals.loc["all", "mean"] = np.nanmean(vals_sl)
    stats_vals.loc["all", "std"] = np.nanstd(vals_sl)
    stats_vals.loc["all", "sem"] = stats.sem(vals_sl, axis=None, nan_policy="omit")
    return stats_vals
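The per-position bookkeeping above can be sketched on a made-up simplex list (two 2-simplices over four nodes); the arrays and values below are illustrative only:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical list of two 2-simplices (3 nodes each) on a 4-node graph
simplex_list = np.array([[0, 1, 2],
                         [0, 3, 2]])
values = pd.Series([1.0, 2.0, 3.0, 4.0])

# With multiplicity: look up each node's value, keeping one column per position
vals_sl = values.loc[simplex_list.flatten()].to_numpy().reshape(simplex_list.shape)

# Statistics per position (columns) and over all positions
mean_per_position = np.nanmean(vals_sl, axis=0)
sem_per_position = stats.sem(vals_sl, axis=0, nan_policy="omit")
overall_mean = np.nanmean(vals_sl)
```

Node 0 appears in position 0 of both simplices, so with multiplicity its value counts twice in the overall mean; with `with_multiplicity=False` that repetition would be dropped.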