deeprob.spn.learning package

Subpackages

deeprob.spn.learning.splitting package

Submodules

deeprob.spn.learning.em module

deeprob.spn.learning.em.expectation_maximization(root, data, num_iter=100, batch_perc=0.1, step_size=0.5, random_init=True, random_state=None, verbose=True)[source]

Learn the parameters of a SPN by batch Expectation-Maximization (EM). See https://arxiv.org/abs/1604.07243 and https://arxiv.org/abs/2004.06231 for details.

Parameters

root (Node) – The SPN structure.
data (ndarray) – The data to use to learn the parameters.
num_iter (int) – The number of iterations.
batch_perc (float) – The percentage of data to use for each step.
step_size (float) – The step size for batch EM.
random_init (bool) – Whether to randomly initialize the weights of the SPN.
random_state (Optional[Union[int, RandomState]]) – The random state. It can be either None, a seed integer or a Numpy RandomState.
verbose (bool) – Whether to enable verbose learning.

Returns

The SPN with learned parameters.

Raises

ValueError – If a parameter is out of domain.

Return type

Node
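As a rough illustration of how step_size can enter a batch EM update, the sketch below blends the current sum-node weights with the estimate obtained from one mini-batch. The helper name em_step and the interpolation rule are assumptions made for illustration (a common stochastic EM rule), not deeprob's actual code.

```python
def em_step(weights, expected_counts, step_size=0.5):
    """Blend the old weights with the EM estimate from one mini-batch.

    Hypothetical helper: `expected_counts` stands for the expected
    sufficient statistics of the children of a single sum node.
    """
    total = sum(expected_counts)
    # Plain EM estimate: normalized expected counts.
    em_weights = [c / total for c in expected_counts]
    # Assumed update rule: convex combination controlled by step_size.
    return [(1.0 - step_size) * w + step_size * e
            for w, e in zip(weights, em_weights)]


new_weights = em_step([0.5, 0.5], [3.0, 1.0], step_size=0.5)
```

With step_size=1 this reduces to the plain mini-batch estimate; smaller values keep more of the previous weights.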

deeprob.spn.learning.leaf module

deeprob.spn.learning.leaf.LearnLeafFunc

The signature of a function that learns an SPN leaf.

alias of Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], List[int], Any], Node]

deeprob.spn.learning.leaf.get_learn_leaf_method(learn_leaf)[source]

Get the learn leaf method.

Parameters: learn_leaf (str) – The learn leaf method string to use.
Returns: A learn leaf function.
Raises: ValueError – If the leaf learning method is unknown.
Return type: Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], List[int], Any], Node]
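A string-to-function lookup of this kind is typically a small dispatch table. The sketch below is an assumption about the general pattern, not the library's source: the stub functions and the name get_method are hypothetical, and only the method strings 'mle', 'isotonic' and 'binary-clt' come from this page.

```python
def learn_mle_stub(*args, **kwargs):
    return "mle leaf"  # placeholder for a real leaf learner


def learn_isotonic_stub(*args, **kwargs):
    return "isotonic leaf"  # placeholder


def learn_binary_clt_stub(*args, **kwargs):
    return "binary-clt leaf"  # placeholder


# Hypothetical dispatch table keyed by the method string.
_METHODS = {
    "mle": learn_mle_stub,
    "isotonic": learn_isotonic_stub,
    "binary-clt": learn_binary_clt_stub,
}


def get_method(name):
    """Return the function registered under `name`, or raise ValueError."""
    try:
        return _METHODS[name]
    except KeyError:
        raise ValueError(f"Unknown learn leaf method: {name!r}") from None
```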

deeprob.spn.learning.leaf.learn_mle(data, distributions, domains, scope, alpha=0.1, random_state=None)[source]

Learn a leaf using Maximum Likelihood Estimate (MLE). If the data is multivariate, a naive factorized model is learned.

Parameters

data (ndarray) – The data, where each column corresponds to a random variable.
distributions (List[Type[Leaf]]) – The distributions of the random variables.
domains (List[Union[list, tuple]]) – The domains of the random variables.
scope (List[int]) – The scope of the leaf.
alpha (float) – Laplace smoothing factor.
random_state (Optional[Union[int, RandomState]]) – The random state. It can be None.

Returns

A leaf distribution.

Raises

ValueError – If there are inconsistencies between the data, distributions and domains.

Return type

Node
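To make the role of alpha concrete, the following sketch computes a Laplace-smoothed maximum likelihood estimate for a single categorical variable. It assumes additive smoothing over the values of the domain; the function name is hypothetical and plain Python lists stand in for the ndarray column.

```python
from collections import Counter


def smoothed_categorical_mle(column, domain, alpha=0.1):
    """Estimate P(X = v) for each v in `domain` with additive smoothing."""
    counts = Counter(column)
    # Every value of the domain receives `alpha` pseudo-counts.
    denom = len(column) + alpha * len(domain)
    return {v: (counts[v] + alpha) / denom for v in domain}


probs = smoothed_categorical_mle([0, 0, 1, 0], domain=[0, 1, 2], alpha=0.1)
```

The value 2 never occurs in the column, yet it still receives a small non-zero probability, which is the point of the smoothing factor.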

deeprob.spn.learning.leaf.learn_isotonic(data, distributions, domains, scope, alpha=0.1, random_state=None)[source]

Learn a leaf using the Isotonic method. If the data is multivariate, a naive factorized model is learned.

Parameters

data (ndarray) – The data.
distributions (List[Type[Leaf]]) – The distributions of the random variables.
domains (List[Union[list, tuple]]) – The domains of the random variables.
scope (List[int]) – The scope of the leaf.
alpha (float) – Laplace smoothing factor.
random_state (Optional[Union[int, RandomState]]) – The random state. It can be None.

Returns

A leaf distribution.

Raises

ValueError – If there are inconsistencies between the data, distributions and domains.

Return type

Node

deeprob.spn.learning.leaf.learn_binary_clt(data, distributions, domains, scope, to_pc=False, alpha=0.1, random_state=None)[source]

Learn a leaf using a Binary Chow-Liu Tree (CLT). If the data is univariate, a Maximum Likelihood Estimate (MLE) leaf is returned.

Parameters

data (ndarray) – The data.
distributions (List[Type[Leaf]]) – The distributions of the random variables.
domains (List[Union[list, tuple]]) – The domains of the random variables.
scope (List[int]) – The scope of the leaf.
to_pc (bool) – Whether to convert the CLT into an equivalent PC.
alpha (float) – Laplace smoothing factor.
random_state (Optional[Union[int, RandomState]]) – The random state. It can be None.

Returns

A leaf distribution.

Raises

ValueError – If there are inconsistencies between the data, distributions and domains.
ValueError – If the data doesn’t follow a Bernoulli distribution.

Return type

Node
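For readers unfamiliar with Chow-Liu trees, the sketch below shows the general technique on binary data: estimate smoothed pairwise mutual information, then grow a maximum spanning tree over the variables. It is an illustration under stated assumptions (variable 0 is the root, Prim's algorithm, alpha > 0, columns given as plain lists) and not the library's implementation.

```python
import math
from itertools import combinations


def mutual_information(xs, ys, alpha=0.1):
    """Smoothed mutual information between two binary columns (alpha > 0)."""
    joint = {(a, b): alpha for a in (0, 1) for b in (0, 1)}
    for a, b in zip(xs, ys):
        joint[(a, b)] += 1.0
    total = len(xs) + 4 * alpha
    joint = {k: v / total for k, v in joint.items()}
    px = {a: joint[(a, 0)] + joint[(a, 1)] for a in (0, 1)}
    py = {b: joint[(0, b)] + joint[(1, b)] for b in (0, 1)}
    return sum(p * math.log(p / (px[a] * py[b]))
               for (a, b), p in joint.items())


def chow_liu_parents(columns, alpha=0.1):
    """Return parents[i] for each variable; the root has parent -1."""
    d = len(columns)
    mi = {(i, j): mutual_information(columns[i], columns[j], alpha)
          for i, j in combinations(range(d), 2)}
    parents = [-1] * d
    in_tree = {0}  # assumption: variable 0 is the root
    while len(in_tree) < d:
        # Prim's step: strongest edge leaving the current tree.
        candidates = ((i, j) for i in sorted(in_tree)
                      for j in range(d) if j not in in_tree)
        i, j = max(candidates, key=lambda e: mi[(min(e), max(e))])
        parents[j] = i
        in_tree.add(j)
    return parents


x0 = [0, 1, 0, 1, 0, 1, 1, 0]   # independent of x1 in this sample
x1 = [0, 0, 1, 1, 0, 1, 0, 1]
x2 = list(x1)                   # exact copy of x1
parents = chow_liu_parents([x0, x1, x2])
```

In this toy data x2 duplicates x1, so the tree attaches x2 to x1 rather than to the root.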

deeprob.spn.learning.leaf.learn_naive_factorization(data, distributions, domains, scope, learn_leaf_func, **learn_leaf_kwargs)[source]

Learn a leaf as a naive factorized model.

Parameters

data (ndarray) – The data.
distributions (List[Type[Leaf]]) – The distributions of the random variables.
domains (List[Union[list, tuple]]) – The domains of the random variables.
scope (List[int]) – The scope of the leaf.
learn_leaf_func (Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], List[int], Any], Node]) – The function to use to learn the parameters of the sub-distributions.
learn_leaf_kwargs – Additional parameters for learn_leaf_func.

Returns

A naive factorized model.

Raises

ValueError – If there are inconsistencies between the data, distributions and domains.

Return type

Node
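A naive factorization treats the variables as independent, so one univariate leaf is learned per column and the log-likelihood of a row is the sum of the per-variable terms. The sketch below illustrates this with Bernoulli leaves; all names are hypothetical, and forwarding the keyword arguments mirrors how learn_leaf_kwargs is described above.

```python
import math


def learn_bernoulli(column, alpha=0.1):
    """Smoothed estimate of P(X = 1) for one binary column."""
    return (sum(column) + alpha) / (len(column) + 2 * alpha)


def learn_factorized(columns, learn_leaf, **learn_leaf_kwargs):
    """Learn one independent leaf per column, forwarding extra kwargs."""
    return [learn_leaf(col, **learn_leaf_kwargs) for col in columns]


def log_likelihood(params, row):
    """Independence assumption: sum the log-probabilities per variable."""
    return sum(math.log(p if x == 1 else 1.0 - p)
               for p, x in zip(params, row))


params = learn_factorized([[1, 1, 0, 1], [0, 0, 0, 1]],
                          learn_bernoulli, alpha=0.1)
ll = log_likelihood(params, [1, 0])
```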

deeprob.spn.learning.learnspn module

class deeprob.spn.learning.learnspn.OperationKind(value)[source]

Bases: Enum

Operation kind used by LearnSPN algorithm.

REM_FEATURES = 1

CREATE_LEAF = 2

SPLIT_NAIVE = 3

SPLIT_ROWS = 4

SPLIT_COLS = 5

class deeprob.spn.learning.learnspn.Task(parent, data, scope, no_cols_split=False, no_rows_split=False, is_first=False)[source]

Bases: tuple

Create new instance of Task(parent, data, scope, no_cols_split, no_rows_split, is_first)

Parameters

parent (Node) –
data (ndarray) –
scope (List[int]) –
no_cols_split (bool) –
no_rows_split (bool) –
is_first (bool) –

parent: Node: Alias for field number 0

data: ndarray: Alias for field number 1

scope: List[int]: Alias for field number 2

no_cols_split: bool: Alias for field number 3

no_rows_split: bool: Alias for field number 4

is_first: bool: Alias for field number 5
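The field aliases listed above are what a named tuple provides: each field can be read by name or by position. The stand-in below reproduces the documented field order and defaults with typing.NamedTuple; Any replaces the Node and ndarray types so that the sketch runs without the library.

```python
from typing import Any, List, NamedTuple


class Task(NamedTuple):
    """Illustrative stand-in with the same field order as documented."""
    parent: Any
    data: Any
    scope: List[int]
    no_cols_split: bool = False
    no_rows_split: bool = False
    is_first: bool = False


task = Task(parent=None, data=[[0, 1], [1, 1]], scope=[0, 1], is_first=True)
# Access by name and by position refer to the same field.
same_scope = task.scope is task[2]
```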

deeprob.spn.learning.learnspn.learn_spn(data, distributions, domains, learn_leaf='mle', split_rows='kmeans', split_cols='rdc', learn_leaf_kwargs=None, split_rows_kwargs=None, split_cols_kwargs=None, min_rows_slice=256, min_cols_slice=2, random_state=None, verbose=True)[source]

Learn the structure and parameters of a SPN given some training data and several hyperparameters.

Parameters

data (ndarray) – The training data.
distributions (List[Type[Leaf]]) – A list of distribution classes (one for each feature).
domains (List[Union[list, tuple]]) – A list of domains (one for each feature). Each domain is either a list of values, for discrete distributions, or a tuple (consisting of min value and max value), for continuous distributions.
learn_leaf (Union[str, Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], List[int], Any], Node]]) – The method to use to learn a distribution leaf node. It can be either ‘mle’, ‘isotonic’, ‘binary-clt’ or a custom LearnLeafFunc.
split_rows (Union[str, Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], RandomState, Any], ndarray]]) – The rows splitting method. It can be either ‘kmeans’, ‘gmm’, ‘rdc’, ‘random’ or a custom SplitRowsFunc function.
split_cols (Union[str, Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], RandomState, Any], ndarray]]) – The columns splitting method. It can be either ‘gvs’, ‘rgvs’, ‘wrgvs’, ‘ebvs’, ‘ebvs_ae’, ‘gbvs’, ‘gbvs_ag’, ‘rdc’, ‘random’ or a custom SplitColsFunc function.
learn_leaf_kwargs (Optional[dict]) – The parameters of the learn leaf method.
split_rows_kwargs (Optional[dict]) – The parameters of the rows splitting method.
split_cols_kwargs (Optional[dict]) – The parameters of the cols splitting method.
min_rows_slice (int) – The minimum number of samples required to split horizontally.
min_cols_slice (int) – The minimum number of features required to split vertically.
random_state (Optional[Union[int, RandomState]]) – The random state. It can be either None, a seed integer or a Numpy RandomState.
verbose (bool) – Whether to enable verbose mode.

Returns

A learned valid SPN.

Raises

ValueError – If a parameter is out of domain.

Return type

Node
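The interplay between min_rows_slice, min_cols_slice and the Task flags can be pictured as a small decision rule that picks the next operation for a data slice. The rule below is a simplified guess for illustration only; the function name is hypothetical, and the REM_FEATURES case is left out because this page does not say when it applies.

```python
from enum import Enum


class OperationKind(Enum):
    REM_FEATURES = 1
    CREATE_LEAF = 2
    SPLIT_NAIVE = 3
    SPLIT_ROWS = 4
    SPLIT_COLS = 5


def choose_operation(n_rows, n_cols, no_cols_split=False,
                     no_rows_split=False, is_first=False,
                     min_rows_slice=256, min_cols_slice=2):
    """Assumed, simplified rule for choosing the next LearnSPN step."""
    if n_cols < min_cols_slice:
        # Too few features to split vertically: learn a leaf.
        return OperationKind.CREATE_LEAF
    if n_rows < min_rows_slice or (no_cols_split and no_rows_split):
        # Too few samples, or both splits already failed: factorize.
        return OperationKind.SPLIT_NAIVE
    if is_first or no_cols_split:
        return OperationKind.SPLIT_ROWS
    return OperationKind.SPLIT_COLS
```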

deeprob.spn.learning.wrappers module

deeprob.spn.learning.wrappers.learn_estimator(data, distributions, domains=None, method='learnspn', **kwargs)[source]

Learn a SPN density estimator given some training data and the distributions and domains of the features.

Parameters

data (ndarray) – The training data.
distributions (List[Type[Leaf]]) – A list of distribution classes (one for each feature).
domains (Optional[List[Union[list, tuple]]]) – A list of domains (one for each feature). Each domain is either a list of values, for discrete distributions, or a tuple (consisting of min value and max value), for continuous distributions. If None, domains are determined automatically.
method (str) – The method used for structure learning. It can be either ‘learnspn’, ‘xpc’ or ‘ensemble-xpc’.
kwargs – Additional parameters for structure learning.

Returns

A learned valid and optimized SPN.

Raises

ValueError – If the method used for structure learning is not known.
ValueError – If the method is ‘xpc’ or ‘ensemble-xpc’ but the variable domains are not binary.

Return type

Node
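The two ValueError conditions above amount to a check on the method string and, for the XPC-based methods, on the domains. A minimal sketch of such a check follows; the function name is hypothetical, and treating "binary" as a domain equal to [0, 1] is an assumption.

```python
KNOWN_METHODS = ("learnspn", "xpc", "ensemble-xpc")


def check_method(method, domains):
    """Raise ValueError for an unknown method or non-binary XPC domains."""
    if method not in KNOWN_METHODS:
        raise ValueError(f"Unknown structure learning method: {method!r}")
    if method in ("xpc", "ensemble-xpc"):
        # Assumption: a binary domain is exactly the two values 0 and 1.
        if any(list(d) != [0, 1] for d in domains):
            raise ValueError("XPC methods require binary domains")
    return method
```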

deeprob.spn.learning.wrappers.learn_classifier(data, distributions, domains=None, class_idx=-1, verbose=True, **kwargs)[source]

Learn a SPN classifier given some training data, the distributions and domains of the features, and the index of the class feature in the training data.

Parameters

data (ndarray) – The training data.
distributions (List[Type[Leaf]]) – A list of distribution classes (one for each feature).
domains (Optional[List[Union[list, tuple]]]) – A list of domains (one for each feature). Each domain is either a list of values, for discrete distributions, or a tuple (consisting of min value and max value), for continuous distributions. If None, domains are determined automatically.
class_idx (int) – The index of the class feature in the training data.
verbose (bool) – Whether to enable verbose mode.
kwargs – Other parameters for structure learning.

Returns

A learned valid and optimized SPN.

Return type

Node
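A joint density model over features and class can act as a classifier by scoring every class value and keeping the most probable one. The sketch below shows that idea and how a negative class_idx such as -1 maps to the last column. The names are hypothetical and a lookup table stands in for the learned SPN; whether the returned model is queried this way in deeprob is an assumption.

```python
def predict_class(joint_prob, features, class_values, class_idx=-1):
    """Return the class value that maximizes the joint probability."""
    n_total = len(features) + 1
    pos = class_idx % n_total  # -1 resolves to the last column
    best_value, best_prob = None, -1.0
    for c in class_values:
        row = list(features)
        row.insert(pos, c)  # place the candidate class in its column
        p = joint_prob(row)
        if p > best_prob:
            best_value, best_prob = c, p
    return best_value


# Toy joint distribution over (x, y), with the class y in the last column.
table = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
prediction = predict_class(lambda row: table[tuple(row)], [1], [0, 1])
```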

deeprob.spn.learning.wrappers.compute_data_domains(data, distributions)[source]

Compute the domains based on the training data and the distributions of the features.

Parameters

data (ndarray) – The training data.
distributions (List[Type[Leaf]]) – A list of distribution classes.

Returns

A list of domains. Each domain is either a list of values, for discrete distributions, or a tuple (consisting of min value and max value), for continuous distributions.

Raises

ValueError – If an unknown distribution type is found.

Return type

List[Union[list, tuple]]
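The domain format described above (a list of values for discrete features, a (min, max) tuple for continuous ones) can be illustrated as follows. The sketch assumes that a discrete domain is the sorted set of observed values and uses a boolean flag per column in place of the Leaf classes; both choices are illustrative.

```python
def compute_domains(columns, is_discrete):
    """Build one domain per column from the observed data."""
    domains = []
    for col, discrete in zip(columns, is_discrete):
        if discrete:
            # Discrete feature: list of the distinct observed values.
            domains.append(sorted(set(col)))
        else:
            # Continuous feature: (min value, max value) tuple.
            domains.append((min(col), max(col)))
    return domains


domains = compute_domains(
    [[0, 2, 1, 2], [0.5, -1.0, 3.25, 2.0]],
    is_discrete=[True, False],
)
```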

deeprob.spn.learning.xpc module

deeprob.spn.learning.xpc.build_disjunction(data, scope, assignments=None, alpha=0.01)[source]

Build a disjunction (sum node) of conjunctions (product nodes). If assignments are given, every conjunction is associated to a specific assignment (the number of conjunctions is the same as the given assignments); otherwise, every conjunction will be associated to a specific assignment occurring in the input data (the number of conjunctions is the same as the unique assignments occurring in the data).

Parameters

data (ndarray) – The input data matrix.
scope (list) – The scope.
assignments (Optional[ndarray]) – The optional assignments.
alpha (float) – Laplace smoothing factor.

Return type

Node
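When no assignments are given, the disjunction has one conjunction per distinct row of the data. The sketch below counts those distinct assignments and derives a smoothed weight for each; the function name and the exact smoothing formula are assumptions, and rows are plain lists rather than an ndarray.

```python
from collections import Counter


def disjunction_weights(rows, alpha=0.01):
    """Map each unique assignment to a Laplace-smoothed mixture weight."""
    counts = Counter(tuple(r) for r in rows)
    assignments = sorted(counts)
    denom = len(rows) + alpha * len(assignments)
    return {a: (counts[a] + alpha) / denom for a in assignments}


weights = disjunction_weights([[0, 1], [0, 1], [1, 1], [0, 0]])
```

Four rows with three distinct assignments yield three conjunctions, and the weights sum to one.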

deeprob.spn.learning.xpc.build_leaf(data, part, use_clt, trees_dict, det, alpha)[source]

Build a multivariate leaf distribution for an XPC.

Parameters

data (ndarray) – The input data matrix.
part (Partition) – The partition associated to the leaf to build.
use_clt (bool) – True if it is possible to use CLTrees as leaf nodes, False otherwise.
trees_dict (dict) – A dictionary of trees (see the function build_trees_dict).
det (bool) – True to force determinism, False otherwise.
alpha (float) – Laplace smoothing factor.

Return type

Node

deeprob.spn.learning.xpc.greedy_vars_ordering(data, conj_len, alpha=0.01)[source]

Return the ordering of the random variables according to the implemented heuristic.

Parameters

data (ndarray) – The input data matrix.
conj_len (int) – The conjunction length.
alpha (float) – Laplace smoothing factor.

Returns

The ordering.

Return type

list

deeprob.spn.learning.xpc.build_trees_dict(data, cl_parts_l, conj_vars_l, alpha, random_state)[source]

Return a dictionary where:

a key refers to a scope length
a value is a list of two lists: the first is a list of predecessors, the second its scope.

Parameters

data (ndarray) – The input data matrix.
cl_parts_l (list) – List of lists. Every sublist is associated to a specific XPC and contains the leaf partitions over which a CLTree will be learnt.
conj_vars_l (list) – List of lists. Every sublist contains the variables of a conjunction (e.g. [[3, 5]]). If a sublist occurs before another, then the former has been used first. There are no duplicates.
alpha (float) – Laplace smoothing factor.
random_state (RandomState) – The random state.

Returns

The dictionary.

Return type

dict

deeprob.spn.learning.xpc.build_xpc(data, part_root, trees_dict, det, use_clt, alpha)[source]

Build the XPC induced by the partitions tree in a bottom up way. The building process is based on the post-order traversal exploration of the partitions tree.

Parameters

data (ndarray) – The input data matrix.
part_root (Partition) – The root partition of the tree.
trees_dict (dict) – None if no dependency tree has to be respected, a dictionary of trees otherwise.
det (bool) – True to force determinism, False otherwise.
use_clt (bool) – True to use CLTrees as leaf nodes, False otherwise.
alpha (float) – Laplace smoothing factor.

Returns

The XPC induced by the partitions tree.

Return type

Node
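Building bottom up via post-order traversal means every child partition is processed before its parent, so the sub-circuits are ready when the parent is built. The sketch below shows an iterative post-order walk over a toy tree; the Part class is a hypothetical stand-in for Partition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Part:
    """Hypothetical stand-in for a node of the partitions tree."""
    name: str
    children: List["Part"] = field(default_factory=list)


def post_order(root):
    """Return node names so that children always precede their parent."""
    order = []
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node.name)  # all children already emitted
        else:
            stack.append((node, True))
            for child in reversed(node.children):
                stack.append((child, False))
    return order


tree = Part("root", [Part("a", [Part("a1"), Part("a2")]), Part("b")])
order = post_order(tree)
```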

deeprob.spn.learning.xpc.learn_xpc(data, det, sd, min_part_inst, conj_len, arity, n_max_parts=200, use_clt=True, use_greedy_ordering=False, alpha=0.01, random_seed=42)[source]

Learn an eXtremely randomized Probabilistic Circuit (XPC).

Parameters

data (ndarray) – The input data matrix.
det (bool) – True to force determinism, False otherwise.
sd (bool) – True to force structured decomposability, False otherwise.
min_part_inst (int) – The minimum number of instances allowed per partition.
conj_len (int) – The conjunction length.
arity (int) – The maximum number of children for a sum node.
n_max_parts (int) – The maximum number of partitions for the partitions tree.
use_clt (bool) – True to use CLTrees as multivariate leaves, False otherwise.
use_greedy_ordering (Optional[bool]) – True to use a greedy ordering, False otherwise.
alpha (int) – Laplace smoothing factor.
random_seed (int) – The random seed.

Return type

Tuple[Node, dict]

deeprob.spn.learning.xpc.learn_expc(data, ensemble_dim, det, sd_level, min_part_inst, conj_len, arity, n_max_parts=200, use_clt=True, alpha=0.01, random_seed=42)[source]

Learn an Ensemble (i.e. a mixture) of eXtremely randomized Probabilistic Circuits (EXPC).

Parameters

data (ndarray) – The input data matrix.
ensemble_dim (int) – The number of circuits in the ensemble/mixture.
det (bool) – True to force determinism, False otherwise.
sd_level (int) – 0 for a non-SD ensemble of non-SD PCs, 1 for a non-SD ensemble of SD PCs and 2 for an SD ensemble.
min_part_inst (int) – The minimum number of instances allowed per partition.
conj_len (int) – The conjunction length.
arity (int) – The maximum number of children for a Sum node.
n_max_parts (int) – The maximum number of partitions for the partitions tree.
use_clt (bool) – True to use CLTrees as multivariate leaves, False otherwise.
alpha (int) – Laplace smoothing factor.
random_seed (int) – A random seed.

Return type

Tuple[Node, list]
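Since the ensemble is a mixture, its log-likelihood for a sample is the log of a weighted sum of the component likelihoods, which is best computed with the log-sum-exp trick. The sketch below assumes uniform mixture weights unless others are given; the function name is hypothetical and the component log-likelihoods are supplied directly instead of being computed by circuits.

```python
import math


def mixture_log_likelihood(component_lls, weights=None):
    """Log-likelihood of a mixture from per-component log-likelihoods."""
    k = len(component_lls)
    if weights is None:
        weights = [1.0 / k] * k  # assumption: uniform mixture weights
    terms = [math.log(w) + ll for w, ll in zip(weights, component_lls)]
    # Log-sum-exp: subtract the maximum for numerical stability.
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


ll = mixture_log_likelihood([math.log(0.2), math.log(0.4)])
```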
