deeprob.spn.learning package
Subpackages
- deeprob.spn.learning.splitting package
- Submodules
- deeprob.spn.learning.splitting.cluster module
- deeprob.spn.learning.splitting.cols module
- deeprob.spn.learning.splitting.entropy module
- deeprob.spn.learning.splitting.gini module
- deeprob.spn.learning.splitting.gvs module
- deeprob.spn.learning.splitting.random module
- deeprob.spn.learning.splitting.rdc module
- deeprob.spn.learning.splitting.rows module
- Module contents
Submodules
deeprob.spn.learning.em module
- deeprob.spn.learning.em.expectation_maximization(root, data, num_iter=100, batch_perc=0.1, step_size=0.5, random_init=True, random_state=None, verbose=True)[source]
Learn the parameters of an SPN by batch Expectation-Maximization (EM). See https://arxiv.org/abs/1604.07243 and https://arxiv.org/abs/2004.06231 for details.
- Parameters
root (Node) – The spn structure.
data (ndarray) – The data to use to learn the parameters.
num_iter (int) – The number of iterations.
batch_perc (float) – The percentage of data to use for each step.
step_size (float) – The step size for batch EM.
random_init (bool) – Whether to randomly initialize the weights of the SPN.
random_state (Optional[Union[int, RandomState]]) – The random state. It can be either None, a seed integer or a Numpy RandomState.
verbose (bool) – Whether to enable verbose learning.
- Returns
The spn with learned parameters.
- Raises
ValueError – If a parameter is out of domain.
- Return type
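The roles of batch_perc and step_size can be illustrated on a much smaller problem than a full SPN. The following is a minimal sketch, assuming a single sum node over two fixed Bernoulli leaves and a convex-combination update of the weights. The helper name stepwise_em_weights and the exact update rule are illustrative assumptions, not the deeprob implementation.

```python
import random


def stepwise_em_weights(data, leaf_probs, num_iter=100, batch_perc=0.1,
                        step_size=0.5, seed=0):
    """Hypothetical helper: mini-batch EM for the weights of one sum node.

    data is a list of 0/1 values; leaf_probs[k] is P(x = 1) under leaf k.
    Only the sum-node weights are learned; the leaves stay fixed.
    """
    if not 0.0 < batch_perc <= 1.0 or not 0.0 < step_size <= 1.0:
        raise ValueError("batch_perc and step_size must be in (0, 1]")
    rng = random.Random(seed)
    k = len(leaf_probs)
    weights = [1.0 / k] * k
    batch_size = max(1, int(batch_perc * len(data)))
    for _ in range(num_iter):
        batch = rng.sample(data, batch_size)
        # E-step: accumulate the posterior responsibility of each child.
        counts = [0.0] * k
        for x in batch:
            lik = [w * (p if x == 1 else 1.0 - p)
                   for w, p in zip(weights, leaf_probs)]
            total = sum(lik)
            for j in range(k):
                counts[j] += lik[j] / total
        # M-step on the batch, blended with the previous weights.
        new = [c / batch_size for c in counts]
        weights = [(1.0 - step_size) * w + step_size * n
                   for w, n in zip(weights, new)]
    return weights


rng = random.Random(1)
# About 80% of the samples come from a leaf with P(x=1)=0.9, the rest from 0.1.
data = [int(rng.random() < (0.9 if rng.random() < 0.8 else 0.1))
        for _ in range(2000)]
weights = stepwise_em_weights(data, leaf_probs=[0.9, 0.1])
```

In this sketch a smaller step_size smooths out the noise of each mini-batch, while a step_size of 1 discards the previous weights at every step.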
deeprob.spn.learning.leaf module
- deeprob.spn.learning.leaf.LearnLeafFunc
The signature of a function that learns an SPN leaf.
alias of
Callable[[ndarray,List[Type[Leaf]],List[Union[list,tuple]],List[int],Any],Node]
- deeprob.spn.learning.leaf.learn_mle(data, distributions, domains, scope, alpha=0.1, random_state=None)[source]
Learn a leaf using Maximum Likelihood Estimate (MLE). If the data is multivariate, a naive factorized model is learned.
- Parameters
data (ndarray) – The data, where each column corresponds to a random variable.
distributions (List[Type[Leaf]]) – The distributions of the random variables.
domains (List[Union[list, tuple]]) – The domains of the random variables.
alpha (float) – Laplace smoothing factor.
random_state (Optional[Union[int, RandomState]]) – The random state. It can be None.
- Returns
A leaf distribution.
- Raises
ValueError – If there are inconsistencies between the data, distributions and domains.
- Return type
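The effect of the Laplace smoothing factor alpha on a maximum likelihood estimate can be sketched for a single categorical variable. This is a minimal illustration in plain Python; the helper smoothed_categorical_mle and the choice of adding alpha to every count in the domain are assumptions, and the library may apply smoothing differently for each leaf type.

```python
from collections import Counter


def smoothed_categorical_mle(column, domain, alpha=0.1):
    """Hypothetical helper: Laplace-smoothed MLE of a categorical leaf."""
    if alpha < 0.0:
        raise ValueError("alpha must be non-negative")
    counts = Counter(column)
    unknown = set(counts) - set(domain)
    if unknown:
        raise ValueError(f"values outside the domain: {sorted(unknown)}")
    # Every value in the domain receives alpha pseudo-counts.
    total = len(column) + alpha * len(domain)
    return {v: (counts[v] + alpha) / total for v in domain}


probs = smoothed_categorical_mle([0, 0, 1, 0, 2, 0], domain=[0, 1, 2, 3])
```

With alpha greater than zero, the value 3 keeps a small non-zero probability even though it never occurs in the column.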
- deeprob.spn.learning.leaf.learn_isotonic(data, distributions, domains, scope, alpha=0.1, random_state=None)[source]
Learn a leaf using the Isotonic method. If the data is multivariate, a naive factorized model is learned.
- Parameters
data (ndarray) – The data.
distributions (List[Type[Leaf]]) – The distribution of the random variables.
domains (List[Union[list, tuple]]) – The domain of the random variables.
alpha (float) – Laplace smoothing factor.
random_state (Optional[Union[int, RandomState]]) – The random state. It can be None.
- Returns
A leaf distribution.
- Raises
ValueError – If there are inconsistencies between the data, distributions and domains.
- Return type
- deeprob.spn.learning.leaf.learn_binary_clt(data, distributions, domains, scope, to_pc=False, alpha=0.1, random_state=None)[source]
Learn a leaf using a Binary Chow-Liu Tree (CLT). If the data is univariate, a Maximum Likelihood Estimate (MLE) leaf is returned.
- Parameters
data (ndarray) – The data.
distributions (List[Type[Leaf]]) – The distributions of the random variables.
domains (List[Union[list, tuple]]) – The domains of the random variables.
to_pc (bool) – Whether to convert the CLT into an equivalent PC.
alpha (float) – Laplace smoothing factor.
random_state (Optional[Union[int, RandomState]]) – The random state. It can be None.
- Returns
A leaf distribution.
- Raises
ValueError – If there are inconsistencies between the data, distributions and domains.
ValueError – If the data doesn’t follow a Bernoulli distribution.
- Return type
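The structure step of a Chow-Liu tree over binary variables can be sketched as a maximum spanning tree over pairwise mutual information. The code below is a minimal, standard-library illustration: the function names, the smoothing of the joint counts, and the choice of variable 0 as root are assumptions, and it does not reproduce what learn_binary_clt returns.

```python
import math
from itertools import combinations


def mutual_information(data, i, j, alpha=0.1):
    """Smoothed mutual information between binary columns i and j."""
    joint = {(a, b): alpha for a in (0, 1) for b in (0, 1)}
    for row in data:
        if (row[i], row[j]) not in joint:
            raise ValueError("the data must be binary")
        joint[(row[i], row[j])] += 1.0
    total = len(data) + 4.0 * alpha
    joint = {ab: c / total for ab, c in joint.items()}
    p_i = {a: joint[(a, 0)] + joint[(a, 1)] for a in (0, 1)}
    p_j = {b: joint[(0, b)] + joint[(1, b)] for b in (0, 1)}
    return sum(p * math.log(p / (p_i[a] * p_j[b]))
               for (a, b), p in joint.items() if p > 0.0)


def chow_liu_edges(data, alpha=0.1):
    """Maximum spanning tree over mutual information (Prim, rooted at 0)."""
    n_vars = len(data[0])
    mi = {frozenset(e): mutual_information(data, *e, alpha)
          for e in combinations(range(n_vars), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < n_vars:
        # Pick the strongest edge leaving the current tree.
        u, v = max(((u, v) for u in in_tree
                    for v in range(n_vars) if v not in in_tree),
                   key=lambda e: mi[frozenset(e)])
        edges.append((u, v))  # (parent, child)
        in_tree.add(v)
    return edges


# Columns 0 and 1 are identical; column 2 is independent of both.
data = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]] * 5
edges = chow_liu_edges(data)
```

The first edge selected links the two identical columns, since their mutual information is the largest.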
- deeprob.spn.learning.leaf.learn_naive_factorization(data, distributions, domains, scope, learn_leaf_func, **learn_leaf_kwargs)[source]
Learn a leaf as a naive factorized model.
- Parameters
data (ndarray) – The data.
distributions (List[Type[Leaf]]) – The distribution of the random variables.
domains (List[Union[list, tuple]]) – The domain of the random variables.
learn_leaf_func (Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], List[int], Any], Node]) – The function to use to learn the sub-distributions parameters.
learn_leaf_kwargs – Additional parameters for learn_leaf_func.
- Returns
A naive factorized model.
- Raises
ValueError – If there are inconsistencies between the data, distributions and domains.
- Return type
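A naive factorization treats every variable in the scope as independent, so the joint log-likelihood is a sum of univariate terms. The sketch below assumes Bernoulli variables and plain Python lists; learn_bernoulli, learn_factorized and log_likelihood are hypothetical names that only mirror the role of learn_leaf_func and learn_leaf_kwargs.

```python
import math


def learn_bernoulli(column, alpha=0.1):
    """Hypothetical univariate learner: smoothed P(x = 1)."""
    return (sum(column) + alpha) / (len(column) + 2.0 * alpha)


def learn_factorized(data, scope, learn_leaf_func, **learn_leaf_kwargs):
    """Learn one univariate leaf per column, keyed by its variable id."""
    if any(len(row) != len(scope) for row in data):
        raise ValueError("each row must have one value per scope variable")
    columns = list(zip(*data))
    return {var: learn_leaf_func(list(col), **learn_leaf_kwargs)
            for var, col in zip(scope, columns)}


def log_likelihood(model, scope, row):
    """Log-probability of a row: a sum of independent per-variable terms."""
    return sum(math.log(model[var] if x == 1 else 1.0 - model[var])
               for var, x in zip(scope, row))


data = [[1, 0], [1, 1], [1, 0], [0, 0]]
model = learn_factorized(data, scope=[3, 7],
                         learn_leaf_func=learn_bernoulli, alpha=1.0)
ll = log_likelihood(model, [3, 7], [1, 0])
```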
deeprob.spn.learning.learnspn module
- class deeprob.spn.learning.learnspn.OperationKind(value)[source]
Bases: Enum
Operation kind used by LearnSPN algorithm.
- REM_FEATURES = 1
- CREATE_LEAF = 2
- SPLIT_NAIVE = 3
- SPLIT_ROWS = 4
- SPLIT_COLS = 5
- class deeprob.spn.learning.learnspn.Task(parent, data, scope, no_cols_split=False, no_rows_split=False, is_first=False)[source]
Bases: tuple
Create new instance of Task(parent, data, scope, no_cols_split, no_rows_split, is_first)
- Parameters
- deeprob.spn.learning.learnspn.learn_spn(data, distributions, domains, learn_leaf='mle', split_rows='kmeans', split_cols='rdc', learn_leaf_kwargs=None, split_rows_kwargs=None, split_cols_kwargs=None, min_rows_slice=256, min_cols_slice=2, random_state=None, verbose=True)[source]
Learn the structure and parameters of an SPN given some training data and several hyperparameters.
- Parameters
data (ndarray) – The training data.
distributions (List[Type[Leaf]]) – A list of distribution classes (one for each feature).
domains (List[Union[list, tuple]]) – A list of domains (one for each feature). Each domain is either a list of values, for discrete distributions, or a tuple (consisting of min value and max value), for continuous distributions.
learn_leaf (Union[str, Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], List[int], Any], Node]]) – The method to use to learn a distribution leaf node. It can be either ‘mle’, ‘isotonic’, ‘binary-clt’ or a custom LearnLeafFunc.
split_rows (Union[str, Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], RandomState, Any], ndarray]]) – The rows splitting method. It can be either ‘kmeans’, ‘gmm’, ‘rdc’, ‘random’ or a custom SplitRowsFunc function.
split_cols (Union[str, Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], RandomState, Any], ndarray]]) – The columns splitting method. It can be either ‘gvs’, ‘rgvs’, ‘wrgvs’, ‘ebvs’, ‘ebvs_ae’, ‘gbvs’, ‘gbvs_ag’, ‘rdc’, ‘random’ or a custom SplitColsFunc function.
learn_leaf_kwargs (Optional[dict]) – The parameters of the learn leaf method.
split_rows_kwargs (Optional[dict]) – The parameters of the rows splitting method.
split_cols_kwargs (Optional[dict]) – The parameters of the cols splitting method.
min_rows_slice (int) – The minimum number of samples required to split horizontally.
min_cols_slice (int) – The minimum number of features required to split vertically.
random_state (Optional[Union[int, RandomState]]) – The random state. It can be either None, a seed integer or a Numpy RandomState.
verbose (bool) – Whether to enable verbose mode.
- Returns
A learned valid SPN.
- Raises
ValueError – If a parameter is out of domain.
- Return type
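How min_rows_slice and min_cols_slice could interact with the operation kinds is sketched below. This is one plausible reading of the parameter descriptions, following the usual LearnSPN recursion; the Op enum is a local stand-in, choose_operation is a hypothetical helper, and the library's actual control flow (including its use of no_rows_split and is_first) may differ.

```python
from enum import Enum


class Op(Enum):
    """Local stand-in mirroring four OperationKind members."""
    CREATE_LEAF = 2
    SPLIT_NAIVE = 3
    SPLIT_ROWS = 4
    SPLIT_COLS = 5


def choose_operation(n_rows, n_cols, min_rows_slice=256, min_cols_slice=2,
                     no_cols_split=False):
    """Assumed decision rule for one slice of the training data."""
    if min_rows_slice <= 1 or min_cols_slice <= 1:
        raise ValueError("the minimum slice sizes must be greater than 1")
    if n_cols < min_cols_slice:
        return Op.CREATE_LEAF      # too few features to split vertically
    if n_rows < min_rows_slice:
        return Op.SPLIT_NAIVE      # too few samples: fully factorize
    if no_cols_split:
        return Op.SPLIT_ROWS       # a column split already failed here
    return Op.SPLIT_COLS


ops = [choose_operation(1000, 1), choose_operation(100, 5),
       choose_operation(1000, 5), choose_operation(1000, 5, no_cols_split=True)]
```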
deeprob.spn.learning.wrappers module
- deeprob.spn.learning.wrappers.learn_estimator(data, distributions, domains=None, method='learnspn', **kwargs)[source]
Learn an SPN density estimator given some training data, the feature distributions and domains.
- Parameters
data (ndarray) – The training data.
distributions (List[Type[Leaf]]) – A list of distribution classes (one for each feature).
domains (Optional[List[Union[list, tuple]]]) – A list of domains (one for each feature). Each domain is either a list of values, for discrete distributions, or a tuple (consisting of min value and max value), for continuous distributions. If None, domains are determined automatically.
method (str) – The method used for structure learning. It can be either ‘learnspn’, ‘xpc’ or ‘ensemble-xpc’.
kwargs – Additional parameters for structure learning.
- Returns
A learned valid and optimized SPN.
- Raises
ValueError – If the method used for structure learning is not known.
ValueError – If the method is ‘xpc’ or ‘ensemble-xpc’ but the variable domains are not binary.
- Return type
- deeprob.spn.learning.wrappers.learn_classifier(data, distributions, domains=None, class_idx=-1, verbose=True, **kwargs)[source]
Learn an SPN classifier given some training data, the feature distributions and domains, and the class index in the training data.
- Parameters
data (ndarray) – The training data.
distributions (List[Type[Leaf]]) – A list of distribution classes (one for each feature).
domains (Optional[List[Union[list, tuple]]]) – A list of domains (one for each feature). Each domain is either a list of values, for discrete distributions, or a tuple (consisting of min value and max value), for continuous distributions. If None, domains are determined automatically.
class_idx (int) – The index of the class feature in the training data.
verbose (bool) – Whether to enable verbose mode.
kwargs – Other parameters for structure learning.
- Returns
A learned valid and optimized SPN.
- Return type
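The role of class_idx can be illustrated by how a joint density model is typically used for classification: the class column is filled with each candidate value and the most likely completion wins. The sketch below is an assumption about usage, not part of this module; predict_class is a hypothetical helper and toy_log_joint is a toy stand-in for the log-likelihood of a learned SPN.

```python
import math


def predict_class(log_joint, row, class_idx, class_values):
    """Return the class value that maximizes the joint log-likelihood."""
    idx = class_idx % len(row)  # supports negative indices such as -1
    best_value, best_score = None, -math.inf
    for value in class_values:
        candidate = list(row)
        candidate[idx] = value
        score = log_joint(candidate)
        if score > best_score:
            best_value, best_score = value, score
    return best_value


def toy_log_joint(row):
    """Toy model: the class equals feature 0 with probability 0.9."""
    feature, label = row
    return math.log(0.5) + math.log(0.9 if feature == label else 0.1)


prediction = predict_class(toy_log_joint, [1, None], class_idx=-1,
                           class_values=[0, 1])
```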
- deeprob.spn.learning.wrappers.compute_data_domains(data, distributions)[source]
Compute the domains based on the training data and the feature distributions.
- Parameters
- Returns
A list of domains. Each domain is either a list of values, for discrete distributions, or a tuple (consisting of min value and max value), for continuous distributions.
- Raises
ValueError – If an unknown distribution type is found.
- Return type
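The domain format described above (a list of values for discrete variables, a (min, max) tuple for continuous ones) can be sketched in plain Python. The helper infer_domains and its is_discrete flags are assumptions made for illustration: the library derives this information from the distribution classes, and it may enumerate discrete values differently from the observed-values rule used here.

```python
def infer_domains(data, is_discrete):
    """Hypothetical helper: one domain per column of the data."""
    columns = list(zip(*data))
    if len(columns) != len(is_discrete):
        raise ValueError("one flag per column is required")
    domains = []
    for col, discrete in zip(columns, is_discrete):
        if discrete:
            domains.append(sorted(set(col)))       # list of observed values
        else:
            domains.append((min(col), max(col)))   # (min, max) tuple
    return domains


data = [[0, 1.5], [2, -0.3], [1, 4.2], [2, 0.0]]
domains = infer_domains(data, is_discrete=[True, False])
```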
deeprob.spn.learning.xpc module
- deeprob.spn.learning.xpc.build_disjunction(data, scope, assignments=None, alpha=0.01)[source]
Build a disjunction (sum node) of conjunctions (product nodes). If assignments are given, every conjunction is associated to a specific assignment (the number of conjunctions is the same as the given assignments); otherwise, every conjunction will be associated to a specific assignment occurring in the input data (the number of conjunctions is the same as the unique assignments occurring in the data).
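The two cases above can be sketched by computing only the sum-node weights, one per conjunction. This is a minimal illustration under the assumption that alpha is added to the count of every conjunction; disjunction_weights is a hypothetical helper and builds no circuit nodes.

```python
from collections import Counter


def disjunction_weights(data, assignments=None, alpha=0.01):
    """Hypothetical helper: smoothed weight of each conjunction."""
    counts = Counter(tuple(row) for row in data)
    if assignments is None:
        # One conjunction per distinct row, in first-seen order.
        assignments = list(counts)
    else:
        assignments = [tuple(a) for a in assignments]
    total = len(data) + alpha * len(assignments)
    return {a: (counts[a] + alpha) / total for a in assignments}


data = [[0, 1], [0, 1], [1, 1], [0, 1]]
weights = disjunction_weights(data)
```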
- deeprob.spn.learning.xpc.build_leaf(data, part, use_clt, trees_dict, det, alpha)[source]
Build a multivariate leaf distribution for an XPC.
- Parameters
data (ndarray) – The input data matrix.
part (Partition) – The partition associated to the leaf to build.
use_clt (bool) – True if it is possible to use CLTrees as leaf nodes, False otherwise.
trees_dict (dict) – A dictionary of trees (see the function build_trees_dict).
det (bool) – True to force determinism, False otherwise.
alpha (float) – Laplace smoothing factor.
- Return type
- deeprob.spn.learning.xpc.greedy_vars_ordering(data, conj_len, alpha=0.01)[source]
Return the ordering of the random variables according to the implemented heuristic.
- deeprob.spn.learning.xpc.build_trees_dict(data, cl_parts_l, conj_vars_l, alpha, random_state)[source]
Return a dictionary where:
- a key refers to a scope length
- a value is a list of two lists: the first is a list of predecessors, the second its scope.
- Parameters
data (ndarray) – The input data matrix.
cl_parts_l (list) – List of lists. Every sublist is associated to a specific XPC and contains the leaf partitions over which a CLTree will be learnt.
conj_vars_l (list) – List of lists. Every sublist contains the variables of a conjunction (e.g. [[3, 5]]). If a sublist occurs before another, then the former has been used first. There are no duplicates.
alpha (float) – Laplace smoothing factor.
random_state (RandomState) – The random state.
- Returns
The dictionary.
- Return type
- deeprob.spn.learning.xpc.build_xpc(data, part_root, trees_dict, det, use_clt, alpha)[source]
Build the XPC induced by the partitions tree in a bottom up way. The building process is based on the post-order traversal exploration of the partitions tree.
- Parameters
data (ndarray) – The input data matrix.
part_root (Partition) – The root partition of the tree.
trees_dict (dict) – None if no dependency tree has to be respected, a dictionary of trees otherwise.
det (bool) – True to force determinism, False otherwise.
use_clt (bool) – True to use CLTrees as leaf nodes, False otherwise.
alpha (float) – Laplace smoothing factor.
- Returns
The XPC induced by the partitions tree.
- Return type
- deeprob.spn.learning.xpc.learn_xpc(data, det, sd, min_part_inst, conj_len, arity, n_max_parts=200, use_clt=True, use_greedy_ordering=False, alpha=0.01, random_seed=42)[source]
Learn an eXtremely randomized Probabilistic Circuit (XPC).
- Parameters
data (ndarray) – The input data matrix.
det (bool) – True to force determinism, False otherwise.
sd (bool) – True to force structured decomposability, False otherwise.
min_part_inst (int) – The minimum number of instances allowed per partition.
conj_len (int) – The conjunction length.
arity (int) – The maximum number of children for a sum node.
n_max_parts (int) – The maximum number of partitions for the partitions tree.
use_clt (bool) – True to use CLTrees as multivariate leaves, False otherwise.
use_greedy_ordering (Optional[bool]) – True to use a greedy ordering, False otherwise.
alpha (float) – Laplace smoothing factor.
random_seed (int) – A random seed.
- Return type
- deeprob.spn.learning.xpc.learn_expc(data, ensemble_dim, det, sd_level, min_part_inst, conj_len, arity, n_max_parts=200, use_clt=True, alpha=0.01, random_seed=42)[source]
Learn an Ensemble (i.e. a mixture) of eXtremely randomized Probabilistic Circuits (EXPC).
- Parameters
data (ndarray) – The input data matrix.
ensemble_dim (int) – The number of circuits in the ensemble/mixture.
det (bool) – True to force determinism, False otherwise.
sd_level (int) – 0 for a non-SD ensemble of non-SD PCs, 1 for a non-SD ensemble of SD PCs and 2 for an SD ensemble.
min_part_inst (int) – The minimum number of instances allowed per partition.
conj_len (int) – The conjunction length.
arity (int) – The maximum number of children for a Sum node.
n_max_parts (int) – The maximum number of partitions for the partitions tree.
use_clt (bool) – True to use CLTrees as multivariate leaves, False otherwise.
alpha (float) – Laplace smoothing factor.
random_seed (int) – A random seed.
- Return type