deeprob.utils package
Submodules
deeprob.utils.data module
- class deeprob.utils.data.DataTransform[source]
Bases: ABC
Abstract data transformation.
- abstract fit(data)[source]
Fit the data transform with some data.
- Parameters
data (ndarray) – The data for fitting.
- class deeprob.utils.data.DataFlatten[source]
Bases: DataTransform
Build the data flatten transformation.
- fit(data)[source]
Fit the data transform with some data.
- Parameters
data (ndarray) – The data for fitting.
- class deeprob.utils.data.DataNormalizer(interval=None, clip=False, dtype=<class 'numpy.float32'>)[source]
Bases: DataTransform
Build the data normalizer transformation.
- Parameters
- Raises
ValueError – If the normalizing interval is out of domain.
- fit(data)[source]
Fit the data transform with some data.
- Parameters
data (ndarray) – The data for fitting.
- class deeprob.utils.data.DataStandardizer(sample_wise=True, eps=1e-07, dtype=<class 'numpy.float32'>)[source]
Bases: DataTransform
Build the data standardizer transformation.
- Parameters
- Raises
ValueError – If the epsilon value is out of domain.
- fit(data)[source]
Fit the data transform with some data.
- Parameters
data (ndarray) – The data for fitting.
- deeprob.utils.data.mixed_ohe_data(data, domains)[source]
One-Hot-Encoding function, applied on mixed data (both continuous and non-binary discrete). Note that One-Hot-Encoding is applied only on categorical random variables having more than two values.
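To illustrate the rule above (only categorical variables with more than two values are expanded), here is a minimal NumPy sketch. It is not the library's implementation: the function name `mixed_ohe_sketch` is hypothetical, and the encoding of `domains` (a list of admissible values for a discrete variable, a `(min, max)` tuple for a continuous one) is an assumption made for this example.

```python
import numpy as np

def mixed_ohe_sketch(data, domains):
    """One-hot encode only the discrete columns having more than two values.

    Assumption (not taken from the library): a discrete domain is a list of
    admissible values, a continuous domain is a (min, max) tuple.
    """
    columns = []
    for i, domain in enumerate(domains):
        column = data[:, i]
        if isinstance(domain, list) and len(domain) > 2:
            # One indicator column per admissible value of the variable.
            indicators = column[:, None] == np.asarray(domain)[None, :]
            columns.append(indicators.astype(data.dtype))
        else:
            # Continuous and binary variables are copied unchanged.
            columns.append(column[:, None])
    return np.hstack(columns)

data = np.array([[0.5, 1.0, 2.0],
                 [1.5, 0.0, 0.0]])
domains = [(0.0, 2.0), [0, 1], [0, 1, 2]]
encoded = mixed_ohe_sketch(data, domains)
```

In this example only the third column is expanded, so the encoded array has five columns.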
deeprob.utils.graph module
- class deeprob.utils.graph.TreeNode(node_id, parent=None)[source]
Bases: object
Initialize a binary CLT.
- get_parent()[source]
Get the parent node.
- Returns
The parent node, None if the node has no parent.
- Return type
- set_parent(parent)[source]
Set the parent node and update its children list.
- Parameters
parent (TreeNode) – The parent node.
- is_leaf()[source]
Check whether the node is leaf.
- Returns
True if the node is leaf, False otherwise.
- Return type
bool
Build a Tree node recursive data structure given a tree structure encoded as a list of predecessors. Note that tree[root] must be -1, as it doesn’t have a predecessor. Optionally, a scope can be used to specify the tree node ids.
- Parameters
- Returns
The Tree node structure’s root.
- Raises
ValueError – If the tree structure is not compatible with the root node.
ValueError – If the scope contains duplicates.
ValueError – If the scope is incompatible with the tree structure.
- Return type
- deeprob.utils.graph.compute_bfs_ordering(tree)[source]
Compute the breadth-first-search variable ordering given a tree structure. Note that tree[root] must be -1, as it doesn’t have a predecessor.
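The predecessor-list encoding can be made concrete with a short, self-contained sketch. The helper `bfs_ordering_sketch` below is hypothetical and only illustrates the idea. The library function may return a NumPy array or order siblings differently.

```python
from collections import deque

def bfs_ordering_sketch(tree):
    """Return node ids in breadth-first order for a predecessor-list tree."""
    # The root is the only node whose predecessor is -1.
    roots = [node for node, parent in enumerate(tree) if parent == -1]
    if len(roots) != 1:
        raise ValueError("Expected exactly one root, i.e. one entry equal to -1")
    # Invert the predecessor list into a list of children per node.
    children = [[] for _ in tree]
    for node, parent in enumerate(tree):
        if parent != -1:
            children[parent].append(node)
    # Standard breadth-first traversal starting from the root.
    ordering, queue = [], deque(roots)
    while queue:
        node = queue.popleft()
        ordering.append(node)
        queue.extend(children[node])
    return ordering

# Node 0 is the root; nodes 1 and 2 are its children, and so on.
print(bfs_ordering_sketch([-1, 0, 0, 1, 1, 2]))  # [0, 1, 2, 3, 4, 5]
```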
deeprob.utils.random module
- deeprob.utils.random.RandomState
A random state type is either an integer seed value or a Numpy RandomState instance.
alias of Union[int, RandomState]
- deeprob.utils.random.check_random_state(random_state=None)[source]
Check a candidate random state and return it as a NumPy RandomState object.
- Parameters
random_state (Optional[Union[int, RandomState]]) – The random state to check. If None, a new NumPy RandomState is returned. Otherwise, it must be either an integer seed or a np.random.RandomState instance; in the latter case, the instance itself is returned.
- Returns
A NumPy RandomState object.
- Raises
ValueError – If the random state is neither None, a seed integer, nor a NumPy RandomState object.
- Return type
RandomState
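The behaviour described above can be sketched in a few lines of NumPy. This is an illustrative re-implementation under the documented contract, not the library's source. The name `check_random_state_sketch` is hypothetical.

```python
import numpy as np

def check_random_state_sketch(random_state=None):
    """Normalize None, an integer seed, or a RandomState into a RandomState."""
    if random_state is None:
        # No seed given: create a fresh, non-deterministically seeded generator.
        return np.random.RandomState()
    if isinstance(random_state, (int, np.integer)):
        # Integer seed: create a reproducible generator.
        return np.random.RandomState(int(random_state))
    if isinstance(random_state, np.random.RandomState):
        # Already a generator: return the very same instance.
        return random_state
    raise ValueError("random_state must be None, an integer seed, or a RandomState")

rng_a = check_random_state_sketch(42)
rng_b = check_random_state_sketch(42)
same_draws = rng_a.randint(0, 1000, size=5).tolist() == rng_b.randint(0, 1000, size=5).tolist()
```

Passing the same seed twice yields two independent generators that produce identical draws, which is the usual reason for accepting either a seed or an instance.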
deeprob.utils.region module
- class deeprob.utils.region.RegionGraph(n_features, depth, random_state=None)[source]
Bases: object
Initialize a region graph.
A region graph is defined with respect to a set of indices of random variables in an SPN. A region R is a non-empty subset of the indices, represented as a sorted tuple with unique entries. A partition P of a region R is a collection of non-empty, non-overlapping sets whose union is R. R is called the parent region of P, and any region C in P is called a child region of P. Accordingly, a region is represented as a sorted tuple of unique integers, and a partition as a sorted tuple of at least two non-empty, non-overlapping regions.
A region graph is a directed, acyclic, bipartite graph over regions and partitions: any child of a region R is a partition of R, and any child of a partition is a child region of that partition. The root of the region graph is the sorted tuple of all the indices. The leaves of the region graph must also be regions; they are called input regions or leaf regions.
Given a region graph, a corresponding SPN can be constructed as follows:
1) Associate I distributions with each input region.
2) Associate K sum nodes with every other (non-input) region.
3) For each partition P in the region graph, take all cross-products (as product nodes) of the distributions or sum nodes associated with its child regions, and connect these products as children of all sum nodes in the parent region of P.
This procedure always yields a complete and decomposable SPN.
- Parameters
- Raises
ValueError – If a parameter is out of domain.
- make_layers(n_repetitions=1)[source]
Generate a random graph’s layers over multiple repetitions of features.
- Parameters
n_repetitions (int) – The number of repetitions.
- Returns
A list of layers, alternating between regions and partitions.
- Raises
ValueError – If a parameter is out of domain.
- Return type
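The region and partition representation described for RegionGraph can be illustrated with a small pure-Python sketch. It assumes random balanced binary splits and a single repetition. The helpers `random_split` and `make_layers_sketch` are hypothetical, and the library may split regions or order layers differently.

```python
import random

def random_split(region, rng):
    """Split a region (sorted tuple, size >= 2) into a partition of two regions."""
    items = list(region)
    rng.shuffle(items)
    half = len(items) // 2
    left = tuple(sorted(items[:half]))
    right = tuple(sorted(items[half:]))
    # A partition is a sorted tuple of non-empty, non-overlapping regions.
    return tuple(sorted((left, right)))

def make_layers_sketch(n_features, depth, seed=0):
    """Return layers alternating between regions and partitions, root first."""
    rng = random.Random(seed)
    layers = [[tuple(range(n_features))]]  # the root region holds all indices
    for _ in range(depth):
        # Regions with a single index are leaves and cannot be split further.
        partitions = [random_split(r, rng) for r in layers[-1] if len(r) > 1]
        if not partitions:
            break
        layers.append(partitions)
        layers.append([child for p in partitions for child in p])
    return layers

layers = make_layers_sketch(n_features=8, depth=2)
```

With 8 features and depth 2, the sketch produces five layers: the root region, one partition, two regions of four indices, two partitions, and four leaf regions of two indices each.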
deeprob.utils.statistics module
- deeprob.utils.statistics.compute_mean_quantiles(data, n_quantiles)[source]
Compute the mean quantiles of a dataset (Poon-Domingos).
- Parameters
- Returns
The mean quantiles.
- Raises
ValueError – If the number of quantiles is not valid.
- Return type
- deeprob.utils.statistics.compute_mutual_information(priors, joints)[source]
Compute the mutual information between each pair of features, given prior and joint distributions.
- Parameters
- Returns
The mutual information between each pair of features, as an (N, N) symmetric NumPy matrix.
- Raises
ValueError – If there are inconsistencies between priors and joints arrays.
ValueError – If joints array is not symmetric.
ValueError – If priors or joints arrays don’t encode valid probability distributions.
- Return type
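As an illustration of the computation described for compute_mutual_information, the sketch below evaluates MI[i, j] = sum over k, l of P(X_i=k, X_j=l) * log(P(X_i=k, X_j=l) / (P(X_i=k) * P(X_j=l))) with NumPy. It assumes the array layout priors[i, k] and joints[i, j, k, l] and uses natural logarithms. The name `mutual_information_sketch` is hypothetical, and the input validation performed by the library is omitted.

```python
import numpy as np

def mutual_information_sketch(priors, joints):
    """Pairwise mutual information (in nats) from priors[i, k] and joints[i, j, k, l]."""
    # outers[i, j, k, l] = P(X_i = k) * P(X_j = l)
    outers = np.einsum("ik,jl->ijkl", priors, priors)
    # Cells with zero joint probability contribute nothing (0 * log 0 := 0).
    mask = joints > 0
    ratio = np.ones_like(joints, dtype=np.float64)
    ratio[mask] = joints[mask] / outers[mask]
    return np.sum(joints * np.log(ratio), axis=(2, 3))

# Two independent binary variables.
priors = np.array([[0.5, 0.5], [0.3, 0.7]])
joints = np.einsum("ik,jl->ijkl", priors, priors)
for i in range(2):
    joints[i, i] = np.diag(priors[i])  # a variable is fully dependent on itself
mi = mutual_information_sketch(priors, joints)
```

For independent variables the off-diagonal entries are zero, while each diagonal entry equals the entropy of the corresponding variable.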
- deeprob.utils.statistics.estimate_priors_joints(data, alpha=0.1)[source]
Estimate both prior and joint probability distributions from binary data.
This function returns both the prior distributions and the joint distributions. Note that priors[i, k] = P(X_i=k) and joints[i, j, k, l] = P(X_i=k, X_j=l).
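The array layout above can be made concrete with a NumPy sketch for binary data. It assumes that alpha is an additive (Laplace) smoothing factor applied to each of the four joint cells, which is an assumption of this example and not confirmed by the documentation. The name `estimate_priors_joints_sketch` is hypothetical.

```python
import numpy as np

def estimate_priors_joints_sketch(data, alpha=0.1):
    """Estimate priors[i, k] and joints[i, j, k, l] from binary data.

    Assumption: alpha is added to each of the four joint cells, hence
    2 * alpha is added to each marginal count so that marginals agree.
    """
    data = np.asarray(data, dtype=np.float64)
    n_samples, n_features = data.shape
    ones = data.sum(axis=0)   # ones[i]       = #{X_i = 1}
    both = data.T @ data      # both[i, j]    = #{X_i = 1, X_j = 1}

    priors = np.empty((n_features, 2))
    priors[:, 1] = (ones + 2 * alpha) / (n_samples + 4 * alpha)
    priors[:, 0] = 1.0 - priors[:, 1]

    # Joint counts for the four value combinations, by inclusion-exclusion.
    joints = np.empty((n_features, n_features, 2, 2))
    joints[:, :, 1, 1] = both
    joints[:, :, 1, 0] = ones[:, None] - both
    joints[:, :, 0, 1] = ones[None, :] - both
    joints[:, :, 0, 0] = n_samples - ones[:, None] - ones[None, :] + both
    joints = (joints + alpha) / (n_samples + 4 * alpha)

    # The joint of a variable with itself lies entirely on the diagonal.
    idx = np.arange(n_features)
    joints[idx, idx] = 0.0
    joints[idx, idx, 0, 0] = priors[:, 0]
    joints[idx, idx, 1, 1] = priors[:, 1]
    return priors, joints

data = np.array([[0, 1], [1, 1], [1, 0], [0, 0], [1, 1]])
priors, joints = estimate_priors_joints_sketch(data, alpha=0.1)
```

With this smoothing choice, marginalizing joints[i, j] over the second variable recovers priors[i], so the two arrays stay mutually consistent.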
- deeprob.utils.statistics.compute_gini(probs)[source]
Computes the Gini index given some probabilities.
- Parameters
probs (ndarray) – The probabilities.
- Returns
The Gini index.
- Raises
ValueError – If the probabilities don’t sum to one.
- Return type
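A minimal sketch of the Gini index, assuming the common definition 1 - sum(p_k^2). Whether the library uses exactly this form and tolerance is an assumption, and the name `gini_sketch` is hypothetical.

```python
def gini_sketch(probs, tol=1e-9):
    """Gini index 1 - sum(p^2) of a discrete probability distribution."""
    if abs(sum(probs) - 1.0) > tol:
        raise ValueError("The probabilities must sum to one")
    return 1.0 - sum(p * p for p in probs)

uniform = gini_sketch([0.5, 0.5])        # maximal impurity for two classes
deterministic = gini_sketch([1.0, 0.0])  # no impurity
```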
- deeprob.utils.statistics.compute_bpp(avg_ll, shape)[source]
Compute the average number of bits per pixel (BPP).
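As a worked sketch of the usual conversion, bits per pixel can be obtained from an average log-likelihood in nats as BPP = -avg_ll / (D * ln 2), where D is the product of the entries of shape. It is an assumption of this example that avg_ll is expressed in nats and that the library follows this formula. The name `bpp_sketch` is hypothetical.

```python
import math

def bpp_sketch(avg_ll, shape):
    """Convert an average log-likelihood (in nats) into bits per pixel."""
    n_dims = math.prod(shape)  # total number of pixels/channels per sample
    return -avg_ll / (n_dims * math.log(2.0))

# A model that spends exactly one bit per dimension on 3x2x2 inputs.
bpp = bpp_sketch(avg_ll=-12 * math.log(2.0), shape=(3, 2, 2))
```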
- deeprob.utils.statistics.compute_fid(mean1, cov1, mean2, cov2, blocksize=64, eps=1e-06)[source]
Computes the Fréchet Inception Distance (FID) between two multivariate Gaussian distributions. This implementation is adapted from https://github.com/mseitzer/pytorch-fid.
- Parameters
mean1 (ndarray) – The mean of the first multivariate Gaussian.
cov1 (ndarray) – The covariance of the first multivariate Gaussian.
mean2 (ndarray) – The mean of the second multivariate Gaussian.
cov2 (ndarray) – The covariance of the second multivariate Gaussian.
blocksize (int) – The block size used by the matrix square root algorithm.
eps (float) – Epsilon value used to avoid singular matrices.
- Returns
The FID score.
- Raises
ValueError – If there is a shape mismatch between input arrays.
- Return type
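For reference, the Fréchet distance between two Gaussians is ||mean1 - mean2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^(1/2)). The NumPy-only sketch below evaluates the trace of the matrix square root through the eigenvalues of cov1 @ cov2, which are real and non-negative for positive semi-definite covariances. This is a simplification for illustration: it does not use the blocksize or eps parameters documented above, and the name `fid_sketch` is hypothetical.

```python
import numpy as np

def fid_sketch(mean1, cov1, mean2, cov2):
    """Frechet distance between N(mean1, cov1) and N(mean2, cov2)."""
    mean1, mean2 = np.asarray(mean1, float), np.asarray(mean2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    if mean1.shape != mean2.shape or cov1.shape != cov2.shape:
        raise ValueError("Shape mismatch between input arrays")
    diff = mean1 - mean2
    # Tr((cov1 cov2)^(1/2)) equals the sum of square roots of the eigenvalues.
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    # Drop numerical noise: tiny imaginary parts and tiny negative values.
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)

# With identity covariances, the distance reduces to the squared mean distance.
shifted = fid_sketch([0.0, 0.0], np.eye(2), [3.0, 4.0], np.eye(2))
cov = np.array([[2.0, 0.5], [0.5, 1.0]])
identical = fid_sketch([1.0, -1.0], cov, [1.0, -1.0], cov)
```

Two identical Gaussians give a distance of zero up to floating-point error, which is a quick sanity check for any implementation.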