deeprob.utils package

Submodules

deeprob.utils.data module

class deeprob.utils.data.DataTransform[source]

Bases: ABC

Abstract data transformation.

abstract fit(data)[source]

Fit the data transform with some data.

Parameters: data (ndarray) – The data for fitting.

abstract forward(data)[source]

Apply the data transform to some data.

Parameters: data (ndarray) – The data to transform.
Returns: The transformed data.
Return type: ndarray

abstract backward(data)[source]

Apply the backward data transform to some data.

Parameters: data (ndarray) – The data to transform.
Returns: The transformed data.
Return type: ndarray

class deeprob.utils.data.DataFlatten[source]

Bases: DataTransform

Build the data flatten transformation.

fit(data)[source]

Fit the data transform with some data.

Parameters: data (ndarray) – The data for fitting.

forward(data)[source]

Apply the data transform to some data.

Parameters: data (ndarray) – The data to transform.
Returns: The transformed data.
Return type: ndarray

backward(data)[source]

Apply the backward data transform to some data.

Parameters: data (ndarray) – The data to transform.
Returns: The transformed data.
Return type: ndarray

class deeprob.utils.data.DataNormalizer(interval=None, clip=False, dtype=<class 'numpy.float32'>)[source]

Bases: DataTransform

Build the data normalizer transformation.

Parameters

interval (Optional[Tuple[float, float]]) – The normalizing interval. If None, data is normalized to [0, 1].
clip (bool) – Whether to clip data if out of interval.
dtype – The type for type conversion.

Raises

ValueError – If the normalizing interval is out of domain.

fit(data)[source]

Fit the data transform with some data.

Parameters: data (ndarray) – The data for fitting.

forward(data)[source]

Apply the data transform to some data.

Parameters: data (ndarray) – The data to transform.
Returns: The transformed data.
Return type: ndarray

backward(data)[source]

Apply the backward data transform to some data.

Parameters: data (ndarray) – The data to transform.
Returns: The transformed data.
Return type: ndarray
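
As an illustration of the fit/forward/backward contract, the following is a minimal NumPy sketch of min-max normalization. It is not the library's implementation: the helper names (minmax_fit, minmax_forward, minmax_backward) are hypothetical, and using a single global minimum and maximum is an assumption, since this page does not say whether statistics are computed globally or per feature.

```python
import numpy as np

def minmax_fit(data):
    # Assumption: one global minimum and maximum over the whole array.
    return float(np.min(data)), float(np.max(data))

def minmax_forward(data, stats, interval=(0.0, 1.0), clip=False):
    lo, hi = stats
    a, b = interval
    # Map [lo, hi] linearly onto [a, b].
    scaled = (data - lo) / (hi - lo) * (b - a) + a
    if clip:
        # Values outside the fitted range are clamped to the interval.
        scaled = np.clip(scaled, a, b)
    return scaled.astype(np.float32)

def minmax_backward(data, stats, interval=(0.0, 1.0)):
    lo, hi = stats
    a, b = interval
    # Invert the linear map from [a, b] back to [lo, hi].
    return ((data - a) / (b - a) * (hi - lo) + lo).astype(np.float32)

train = np.array([[0.0, 5.0], [10.0, 2.5]])
stats = minmax_fit(train)
normalized = minmax_forward(train, stats)
restored = minmax_backward(normalized, stats)
clipped = minmax_forward(np.array([20.0]), stats, clip=True)
```

Note that the backward transform can only recover the original values when clipping did not alter them.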

class deeprob.utils.data.DataStandardizer(sample_wise=True, eps=1e-07, dtype=<class 'numpy.float32'>)[source]

Bases: DataTransform

Build the data standardizer transformation.

Parameters

sample_wise (bool) – Whether to apply sample wise standardization.
eps (float) – The epsilon value for standardization.
dtype – The type for type conversion.

Raises

ValueError – If the epsilon value is out of domain.

fit(data)[source]

Fit the data transform with some data.

Parameters: data (ndarray) – The data for fitting.

forward(data)[source]

Apply the data transform to some data.

Parameters: data (ndarray) – The data to transform.
Returns: The transformed data.
Return type: ndarray

backward(data)[source]

Apply the backward data transform to some data.

Parameters: data (ndarray) – The data to transform.
Returns: The transformed data.
Return type: ndarray
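
The role of eps can be seen in a small NumPy sketch of standardization. This is an illustration only: the helper names are hypothetical, and computing the mean and standard deviation across samples (axis 0) is an assumption, because the exact meaning of sample_wise is not spelled out on this page.

```python
import numpy as np

def standardize_fit(data):
    # Assumption: statistics are computed per feature, across samples.
    return data.mean(axis=0), data.std(axis=0)

def standardize_forward(data, mu, sigma, eps=1e-7):
    # eps keeps the division finite when a feature has zero variance.
    return ((data - mu) / (sigma + eps)).astype(np.float32)

def standardize_backward(data, mu, sigma, eps=1e-7):
    # Undo the scaling and the shift applied by the forward transform.
    return (data * (sigma + eps) + mu).astype(np.float32)

train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
mu, sigma = standardize_fit(train)
z = standardize_forward(train, mu, sigma)
restored = standardize_backward(z, mu, sigma)
```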

deeprob.utils.data.ohe_data(data, domain)[source]

One-Hot-Encoding function.

Parameters

data (ndarray) – The 1D data to encode.
domain (Union[List[int], ndarray]) – The domain to use.

Returns

The One Hot encoded data.

Return type

ndarray
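
For readers unfamiliar with the encoding, here is a minimal NumPy sketch of one-hot encoding 1D data against an explicit domain. The function name one_hot and the float32 output type are assumptions made for illustration, not the library's code.

```python
import numpy as np

def one_hot(data, domain):
    domain = np.asarray(domain)
    # Broadcasting compares every value with every domain entry,
    # giving one row per sample and one column per domain value.
    return (data[:, None] == domain[None, :]).astype(np.float32)

encoded = one_hot(np.array([2, 0, 1, 2]), domain=[0, 1, 2])
```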

deeprob.utils.data.mixed_ohe_data(data, domains)[source]

One-Hot-Encoding function, applied to mixed data (both continuous and non-binary discrete). Note that One-Hot-Encoding is applied only to categorical random variables having more than two values.

Parameters

data (ndarray) – The data matrix to encode.
domains (List[Union[list, tuple]]) – The domains to use.

Returns

The One Hot encoded data.

Raises

ValueError – If there are inconsistencies between the data and domains.

Return type

ndarray

deeprob.utils.data.ecdf_data(data)[source]

Empirical Cumulative Distribution Function (ECDF).

Parameters: data (ndarray) – The data.
Returns: The result of the ECDF on data.
Return type: ndarray
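
A common definition of the ECDF evaluates, for each value, the fraction of samples less than or equal to it. The sketch below follows that definition for 1D data; the function name ecdf and the handling of ties are assumptions, and the library may differ in detail.

```python
import numpy as np

def ecdf(data):
    sorted_data = np.sort(data)
    # side="right" counts how many samples are <= each value.
    return np.searchsorted(sorted_data, data, side="right") / len(data)

values = ecdf(np.array([3.0, 1.0, 2.0, 2.0]))
```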

deeprob.utils.data.check_data_dtype(data, dtype=<class 'numpy.float32'>)[source]

Check whether the data is compatible with a given dtype (defaults to np.float32). If the data dtype is not compatible, then cast it.

Parameters

data (ndarray) – The data.
dtype (Type[dtype]) – The desired dtype compatibility (defaults to np.float32).

Returns

The cast data if a cast was necessary, otherwise the data itself.

deeprob.utils.graph module

class deeprob.utils.graph.TreeNode(node_id, parent=None)[source]

Bases: object

Initialize a node of a tree structure (e.g., a binary CLT).

Parameters

node_id (int) – The ID of the node.
parent (TreeNode) – The parent node.

get_id()[source]

Get the ID of the node.

Returns: The ID of the node.
Return type: int

get_parent()[source]

Get the parent node.

Returns: The parent node, None if the node has no parent.
Return type: TreeNode

get_children()[source]

Get the children list of the node.

Returns: The children list of the node.
Return type: List[TreeNode]

set_parent(parent)[source]

Set the parent node and update its children list.

Parameters: parent (TreeNode) – The parent node.

is_leaf()[source]

Check whether the node is a leaf.

Returns: True if the node is a leaf, False otherwise.
Return type: bool

get_n_nodes()[source]

Get the number of nodes in the tree rooted at self.

Returns: The number of nodes of the tree rooted at self.
Return type: int

get_tree_scope()[source]

Return the list of predecessors and the related scope of the tree rooted at self. Note that tree[root] must be -1, as it doesn’t have a predecessor.

Return tree: List of predecessors.
Return scope: The related scope list.
Return type: Tuple[list, list]

deeprob.utils.graph.build_tree_structure(tree, scope=None)[source]

Build a Tree node recursive data structure given a tree structure encoded as a list of predecessors. Note that tree[root] must be -1, as it doesn’t have a predecessor. Optionally, a scope can be used to specify the tree node ids.

Parameters

tree (Union[List[int], ndarray]) – The tree structure, as a sequence of predecessors.
scope (Optional[List[int]]) – An optional scope, as a list of ids.

Returns

The Tree node structure’s root.

Raises

ValueError – If the tree structure is not compatible with the root node.
ValueError – If the scope contains duplicates.
ValueError – If the scope is incompatible with the tree structure.

Return type

TreeNode

deeprob.utils.graph.compute_bfs_ordering(tree)[source]

Compute the breadth-first-search variable ordering given a tree structure. Note that tree[root] must be -1, as it doesn’t have a predecessor.

Parameters: tree (Union[List[int], ndarray]) – The tree structure, as a sequence of predecessors.
Returns: The BFS variable ordering as a NumPy array.
Return type: Union[List[int], ndarray]
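
To make the predecessor encoding concrete, here is a minimal sketch of a breadth-first ordering over a tree given as a list of predecessors, with -1 marking the root. The function name bfs_ordering is hypothetical, and visiting children in increasing index order is an assumption.

```python
from collections import deque

import numpy as np

def bfs_ordering(tree):
    tree = np.asarray(tree)
    # The root is the only node whose predecessor is -1.
    root = int(np.flatnonzero(tree == -1)[0])
    # Invert the predecessor list into a children list per node.
    children = {i: [] for i in range(len(tree))}
    for node, parent in enumerate(tree):
        if parent != -1:
            children[int(parent)].append(node)
    ordering, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        ordering.append(node)
        queue.extend(children[node])
    return np.array(ordering)

# Node 2 is the root; nodes 0 and 1 are its children; 3 and 4 hang off node 0.
order = bfs_ordering([2, 2, -1, 0, 0])
```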

deeprob.utils.graph.maximum_spanning_tree(root, adj_matrix)[source]

Compute the maximum spanning tree of a graph starting from a given root node.

Parameters

root (int) – The root node index.
adj_matrix (ndarray) – The graph’s adjacency matrix.

Returns

The breadth first traversal ordering and the maximum spanning tree. The maximum spanning tree is given as a list of predecessors.

Return type

Tuple[ndarray, ndarray]
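
One way to obtain such a list of predecessors is Prim's algorithm, sketched below in NumPy. This is an assumption about a suitable technique, not a statement of how the library computes it. The function name is hypothetical, the graph is assumed to be dense (every pair of nodes has a weight) with a symmetric matrix, and only the predecessor list is returned, not the traversal ordering.

```python
import numpy as np

def prim_maximum_spanning_tree(root, adj_matrix):
    n = adj_matrix.shape[0]
    tree = np.full(n, -1, dtype=np.int64)   # tree[root] stays -1
    in_tree = np.zeros(n, dtype=bool)
    in_tree[root] = True
    # Best known edge weight connecting each node to the growing tree.
    best_weight = adj_matrix[root].astype(np.float64)
    best_parent = np.full(n, root, dtype=np.int64)
    for _ in range(n - 1):
        # Pick the heaviest edge leaving the current tree.
        candidates = np.where(in_tree, -np.inf, best_weight)
        node = int(np.argmax(candidates))
        tree[node] = best_parent[node]
        in_tree[node] = True
        # The new node may offer heavier edges to the remaining nodes.
        improved = ~in_tree & (adj_matrix[node] > best_weight)
        best_weight[improved] = adj_matrix[node][improved]
        best_parent[improved] = node
    return tree

weights = np.array([[0, 5, 1],
                    [5, 0, 3],
                    [1, 3, 0]])
predecessors = prim_maximum_spanning_tree(0, weights)
```

In this example the tree keeps the edges 0-1 (weight 5) and 1-2 (weight 3) and drops the lightest edge 0-2.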

deeprob.utils.random module

deeprob.utils.random.RandomState

A random state type is either an integer seed value or a NumPy RandomState instance.

alias of Union[int, RandomState]

deeprob.utils.random.check_random_state(random_state=None)[source]

Check an input random state and return it as a NumPy RandomState object.

Parameters: random_state (Optional[Union[int, RandomState]]) – The random state to check. If None, a new NumPy RandomState is returned. Otherwise, it can be either a seed integer or a np.random.RandomState instance. In the latter case, the instance itself is returned.
Returns: A NumPy RandomState object.
Raises: ValueError – If the random state is neither None, a seed integer, nor a NumPy RandomState object.
Return type: RandomState
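
The behavior described above can be sketched in a few lines. This is an illustrative re-implementation under a hypothetical name (to_random_state), written from the description on this page rather than from the library's source.

```python
import numpy as np

def to_random_state(random_state=None):
    if random_state is None:
        # No seed given: return a freshly initialized generator.
        return np.random.RandomState()
    if isinstance(random_state, int):
        # An integer is interpreted as a seed.
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        # An existing generator is passed through unchanged.
        return random_state
    raise ValueError("Expected None, a seed integer or a RandomState")

shared = np.random.RandomState(0)
same_object = to_random_state(shared) is shared
first_draw = to_random_state(42).rand()
second_draw = to_random_state(42).rand()
```

Seeding with the same integer gives reproducible draws, which is why the two draws above agree.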

deeprob.utils.region module

class deeprob.utils.region.RegionGraph(n_features, depth, random_state=None)[source]

Bases: object

Initialize a region graph.

A region graph is defined with respect to a set of indices of random variables in an SPN. A region R is a non-empty subset of the indices, represented as a sorted tuple with unique entries. A partition P of a region R is a collection of at least two non-empty, non-overlapping sets whose union is R, represented as a sorted tuple of regions. R is called the parent region of P, and any region C in P is called a child region of P.

A region graph is an acyclic, directed, bipartite graph over regions and partitions: any child of a region R is a partition of R, and any child of a partition is a child region of that partition. The root of the region graph is the sorted tuple of all the indices. The leaves of the region graph must also be regions; they are called input regions, or leaf regions.

Given a region graph, a corresponding SPN can be constructed as follows:

1) Associate I distributions to each input region.
2) Associate K sum nodes to each other (non-input) region.
3) For each partition P in the region graph, take all cross-products (as product nodes) of the distributions or sum nodes associated with its child regions. Connect these products as children of all sum nodes in the parent region of P.

This procedure always yields a complete and decomposable SPN.

Parameters

n_features (int) – The number of features.
depth (int) – The maximum depth.
random_state (Optional[Union[int, RandomState]]) – The random state. It can be either None, a seed integer or a NumPy RandomState.

Raises

ValueError – If a parameter is out of domain.

random_layers()[source]

Generate a list of layers randomly over a single repetition of features.

Returns: A list of layers, alternating between regions and partitions.
Return type: List[List[tuple]]

make_layers(n_repetitions=1)[source]

Generate a random graph’s layers over multiple repetitions of features.

Parameters: n_repetitions (int) – The number of repetitions.
Returns: A list of layers, alternating between regions and partitions.
Raises: ValueError – If a parameter is out of domain.
Return type: List[List[tuple]]
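
To illustrate what layers alternating between regions and partitions can look like, here is a minimal sketch that recursively splits regions into two random halves. It is not the library's algorithm: the function name, the balanced binary split and the treatment of single-index regions as leaves are all assumptions made for the example.

```python
import numpy as np

def random_region_layers(n_features, depth, seed=0):
    rng = np.random.RandomState(seed)
    root = tuple(range(n_features))
    layers = [[root]]                      # first layer: the root region
    for _ in range(depth):
        partitions, regions = [], []
        for region in layers[-1]:
            if len(region) < 2:
                continue                   # a single index cannot be split
            shuffled = rng.permutation(np.array(region))
            mid = len(shuffled) // 2
            # Regions are sorted tuples of unique indices.
            left = tuple(sorted(int(i) for i in shuffled[:mid]))
            right = tuple(sorted(int(i) for i in shuffled[mid:]))
            # A partition is a sorted tuple of non-overlapping regions.
            partitions.append(tuple(sorted((left, right))))
            regions.extend([left, right])
        if not partitions:
            break
        layers.append(partitions)          # partition layer
        layers.append(regions)             # region layer
    return layers

layers = random_region_layers(n_features=4, depth=2)
```

With four features and depth two, the result has five layers: the root region, one partition, two regions of size two, two partitions, and four single-index leaf regions.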

deeprob.utils.statistics module

deeprob.utils.statistics.compute_mean_quantiles(data, n_quantiles)[source]

Compute the mean quantiles of a dataset (Poon-Domingos).

Parameters

data (ndarray) – The data.
n_quantiles (int) – The number of quantiles.

Returns

The mean quantiles.

Raises

ValueError – If the number of quantiles is not valid.

Return type

ndarray

deeprob.utils.statistics.compute_mutual_information(priors, joints)[source]

Compute the mutual information between each pair of features, given prior and joint distributions.

Parameters

priors (ndarray) – The prior probability distributions, as a (N, D) NumPy array having priors[i, k] = P(X_i=k).
joints (ndarray) – The joint probability distributions, as a (N, N, D, D) NumPy array having joints[i, j, k, l] = P(X_i=k, X_j=l).

Returns

The mutual information between each pair of features, as a symmetric (N, N) NumPy array.

Raises

ValueError – If there are inconsistencies between priors and joints arrays.
ValueError – If joints array is not symmetric.
ValueError – If priors or joints arrays don’t encode valid probability distributions.

Return type

ndarray
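
Using the array layout documented above, the mutual information of features i and j is the sum over k and l of joints[i, j, k, l] * log(joints[i, j, k, l] / (priors[i, k] * priors[j, l])). The sketch below evaluates this with broadcasting. The function name is hypothetical, input validation is omitted, and treating zero-probability entries as contributing zero is an assumption.

```python
import numpy as np

def mutual_information(priors, joints):
    # outer[i, j, k, l] = P(X_i=k) * P(X_j=l)
    outer = priors[:, None, :, None] * priors[None, :, None, :]
    # Use a ratio of 1 (log = 0) wherever the joint probability is zero.
    ratio = np.ones_like(joints)
    mask = joints > 0
    ratio[mask] = joints[mask] / outer[mask]
    return np.sum(joints * np.log(ratio), axis=(2, 3))

# Two independent, uniform binary variables.
priors = np.full((2, 2), 0.5)
joints = np.empty((2, 2, 2, 2))
joints[0, 0] = joints[1, 1] = np.diag([0.5, 0.5])
joints[0, 1] = joints[1, 0] = 0.25
mi = mutual_information(priors, joints)
```

For independent variables the off-diagonal entries are zero, while each diagonal entry equals the entropy of the variable, here log 2 nats.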

deeprob.utils.statistics.estimate_priors_joints(data, alpha=0.1)[source]

Estimate both prior and joint probability distributions from binary data.

This function returns both the prior distributions and the joint distributions. Note that priors[i, k] = P(X_i=k) and joints[i, j, k, l] = P(X_i=k, X_j=l).

Parameters

data (ndarray) – The binary data matrix.
alpha (float) – The Laplace smoothing factor.

Returns

A pair of prior and joint distributions.

Raises

ValueError – If the Laplace smoothing factor is out of domain.

Return type

Tuple[ndarray, ndarray]
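
A smoothed estimate of this kind can be sketched with counts and pseudo-counts. The code below is an illustration under assumptions: the function name is hypothetical, and the specific pseudo-counts (alpha per joint configuration and 2 * alpha per prior value, so that priors remain the marginals of the joints) are one reasonable choice, not necessarily the one the library makes.

```python
import numpy as np

def estimate_binary_priors_joints(data, alpha=0.1):
    n, d = data.shape
    # x[s, i, k] = 1 if sample s has X_i = k, for k in {0, 1}.
    x = np.stack([1 - data, data], axis=-1).astype(np.float64)
    prior_counts = x.sum(axis=0)                      # shape (d, 2)
    joint_counts = np.einsum("nik,njl->ijkl", x, x)   # shape (d, d, 2, 2)
    # Assumed smoothing: alpha pseudo-counts per joint configuration.
    priors = (prior_counts + 2 * alpha) / (n + 4 * alpha)
    joints = (joint_counts + alpha) / (n + 4 * alpha)
    return priors, joints

data = np.array([[0, 1], [1, 1], [1, 0], [1, 1]])
priors, joints = estimate_binary_priors_joints(data)
```

One side effect of uniform smoothing is that the diagonal blocks joints[i, i] assign a small probability to impossible configurations such as X_i=0 and X_i=1 together.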

deeprob.utils.statistics.compute_gini(probs)[source]

Compute the Gini index given some probabilities.

Parameters: probs (ndarray) – The probabilities.
Returns: The Gini index.
Raises: ValueError – If the probabilities do not sum to one.
Return type: float
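
Under the usual definition, the Gini index of a discrete distribution is one minus the sum of squared probabilities. The following sketch assumes that definition and uses a hypothetical function name; it is not taken from the library's source.

```python
import numpy as np

def gini_index(probs):
    probs = np.asarray(probs, dtype=np.float64)
    if not np.isclose(probs.sum(), 1.0):
        raise ValueError("The probabilities must sum to one")
    # 0 for a deterministic distribution, larger for more uniform ones.
    return float(1.0 - np.sum(probs ** 2))

uniform = gini_index([0.5, 0.5])
deterministic = gini_index([1.0, 0.0])
```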

deeprob.utils.statistics.compute_bpp(avg_ll, shape)[source]

Compute the average number of bits per pixel (BPP).

Parameters

avg_ll (float) – The average log-likelihood, expressed in nats.
shape (Union[int, tuple, list]) – The number of dimensions or, alternatively, a sequence of dimensions.

Returns

The average number of bits per pixel.
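
The conversion from nats to bits per pixel is commonly written as the negative average log-likelihood divided by the number of dimensions and by ln 2. The sketch below assumes that formula and a hypothetical function name.

```python
import numpy as np

def bits_per_pixel(avg_ll, shape):
    # shape may be a single integer or a sequence of dimensions.
    n_dims = int(np.prod(shape))
    # Divide by ln(2) to convert nats to bits, and by n_dims to average.
    return -avg_ll / (n_dims * np.log(2.0))

# A log-likelihood of -ln(2) nats per dimension corresponds to 1 bit per pixel.
bpp_from_tuple = bits_per_pixel(-12 * np.log(2.0), (3, 2, 2))
bpp_from_int = bits_per_pixel(-12 * np.log(2.0), 12)
```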

deeprob.utils.statistics.compute_fid(mean1, cov1, mean2, cov2, blocksize=64, eps=1e-06)[source]

Compute the Fréchet Inception Distance (FID) between two multivariate Gaussian distributions. This implementation has been adapted from https://github.com/mseitzer/pytorch-fid.

Parameters

mean1 (ndarray) – The mean of the first multivariate Gaussian.
cov1 (ndarray) – The covariance of the first multivariate Gaussian.
mean2 (ndarray) – The mean of the second multivariate Gaussian.
cov2 (ndarray) – The covariance of the second multivariate Gaussian.
blocksize (int) – The block size used by the matrix square root algorithm.
eps (float) – Epsilon value used to avoid singular matrices.

Returns

The FID score.

Raises

ValueError – If there is a shape mismatch between input arrays.

Return type

float

deeprob.utils.statistics.compute_prior_counts(data)[source]

Compute the counts of the values of an RV given the data.

Parameters: data (ndarray) – The binary data matrix.
Returns: The counts.

deeprob.utils.statistics.compute_joint_counts(data)[source]

Compute the counts of the configurations of an RV and its parent given the data.

Parameters: data (ndarray) – The binary data matrix.
Returns: The counts.
