deeprob.spn.learning package

Subpackages

Submodules

deeprob.spn.learning.em module

deeprob.spn.learning.em.expectation_maximization(root, data, num_iter=100, batch_perc=0.1, step_size=0.5, random_init=True, random_state=None, verbose=True)[source]

Learn the parameters of an SPN by batch Expectation-Maximization (EM). See https://arxiv.org/abs/1604.07243 and https://arxiv.org/abs/2004.06231 for details.

Parameters
  • root (Node) – The SPN structure.

  • data (ndarray) – The data to use to learn the parameters.

  • num_iter (int) – The number of iterations.

  • batch_perc (float) – The share of the data to use for each step, expressed as a fraction (the default 0.1 corresponds to 10%).

  • step_size (float) – The step size for batch EM.

  • random_init (bool) – Whether to randomly initialize the weights of the SPN.

  • random_state (Optional[Union[int, RandomState]]) – The random state. It can be either None, a seed integer or a Numpy RandomState.

  • verbose (bool) – Whether to enable verbose learning.

Returns

The SPN with learned parameters.

Raises

ValueError – If a parameter is out of domain.

Return type

Node
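
The interplay of batch_perc and step_size can be illustrated on the smallest possible case: a single sum node mixing two fixed Bernoulli leaves. This is a minimal sketch in plain Python, not deeprob's implementation; the name em_step, the interpolation rule used for the damped update, and the toy data are all assumptions made for illustration.

```python
import random

def em_step(weights, leaf_probs, batch, step_size):
    """One damped batch-EM update of the weights of a single sum node.

    weights    -- current mixture weights (they sum to 1)
    leaf_probs -- p(x = 1) of each fixed Bernoulli leaf (hypothetical leaves)
    batch      -- list of 0/1 observations
    """
    counts = [0.0] * len(weights)
    for x in batch:
        # E-step: posterior responsibility of each child for observation x
        likes = [w * (p if x == 1 else 1.0 - p)
                 for w, p in zip(weights, leaf_probs)]
        total = sum(likes)
        for k, like in enumerate(likes):
            counts[k] += like / total
    # M-step target: normalised expected counts on this batch
    target = [c / len(batch) for c in counts]
    # Damped update (assumed rule): move a fraction step_size towards the target
    return [(1.0 - step_size) * w + step_size * t
            for w, t in zip(weights, target)]

rng = random.Random(0)
data = [1 if rng.random() < 0.8 else 0 for _ in range(1000)]

weights = [0.5, 0.5]
leaf_probs = [0.9, 0.1]
batch_size = int(0.1 * len(data))   # plays the role of batch_perc = 0.1
for _ in range(100):                # plays the role of num_iter = 100
    batch = rng.sample(data, batch_size)
    weights = em_step(weights, leaf_probs, batch, step_size=0.5)
```

Since most observations are 1, the weight of the first leaf (the one favouring 1) ends up larger than the weight of the second.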

deeprob.spn.learning.leaf module

deeprob.spn.learning.leaf.LearnLeafFunc

The signature of a function that learns an SPN leaf.

alias of Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], List[int], Any], Node]

deeprob.spn.learning.leaf.get_learn_leaf_method(learn_leaf)[source]

Get the learn leaf method.

Parameters

learn_leaf (str) – The learn leaf method string to use.

Returns

A learn leaf function.

Raises

ValueError – If the leaf learning method is unknown.

Return type

Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], List[int], Any], Node]
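
Resolving a method string to a callable is typically a dictionary lookup that raises ValueError on an unknown key. The sketch below shows that pattern only; the registry, the stub functions and the error message are hypothetical, and only the three method names are taken from the learn_spn documentation.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the real leaf learning functions.
def learn_mle_stub(*args, **kwargs):
    return "mle leaf"

def learn_isotonic_stub(*args, **kwargs):
    return "isotonic leaf"

def learn_binary_clt_stub(*args, **kwargs):
    return "binary-clt leaf"

# Assumed registry mapping method strings to functions.
LEARN_LEAF_REGISTRY: Dict[str, Callable] = {
    "mle": learn_mle_stub,
    "isotonic": learn_isotonic_stub,
    "binary-clt": learn_binary_clt_stub,
}

def get_learn_leaf_method_sketch(learn_leaf: str) -> Callable:
    """Return the function registered under learn_leaf, or raise ValueError."""
    try:
        return LEARN_LEAF_REGISTRY[learn_leaf]
    except KeyError:
        raise ValueError(f"Unknown learn leaf method: {learn_leaf!r}") from None
```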

deeprob.spn.learning.leaf.learn_mle(data, distributions, domains, scope, alpha=0.1, random_state=None)[source]

Learn a leaf using Maximum Likelihood Estimate (MLE). If the data is multivariate, a naive factorized model is learned.

Parameters
  • data (ndarray) – The data, where each column corresponds to a random variable.

  • distributions (List[Type[Leaf]]) – The distributions of the random variables.

  • domains (List[Union[list, tuple]]) – The domains of the random variables.

  • scope (List[int]) – The scope of the leaf.

  • alpha (float) – Laplace smoothing factor.

  • random_state (Optional[Union[int, RandomState]]) – The random state. It can be None.

Returns

A leaf distribution.

Raises

ValueError – If there are inconsistencies between the data, distributions and domains.

Return type

Node
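
The effect of the alpha parameter can be seen on a single categorical variable. The sketch below assumes standard additive (Laplace) smoothing; the exact formula used by learn_mle is not stated here, and the function name is hypothetical.

```python
from collections import Counter

def smoothed_categorical_mle(column, domain, alpha=0.1):
    """Estimate P(X = v) for each v in domain with additive smoothing.

    Assumed formula: (count(v) + alpha) / (n + alpha * |domain|).
    """
    counts = Counter(column)
    n, k = len(column), len(domain)
    return {v: (counts[v] + alpha) / (n + alpha * k) for v in domain}

# The value 2 never occurs in the data, but it belongs to the domain.
probs = smoothed_categorical_mle([0, 0, 1, 0], domain=[0, 1, 2], alpha=0.1)
```

With alpha > 0 every value of the domain receives non-zero probability, including values that do not occur in the data.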

deeprob.spn.learning.leaf.learn_isotonic(data, distributions, domains, scope, alpha=0.1, random_state=None)[source]

Learn a leaf using the Isotonic method. If the data is multivariate, a naive factorized model is learned.

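Parameters
  • data (ndarray) – The data.

  • distributions (List[Type[Leaf]]) – The distributions of the random variables.

  • domains (List[Union[list, tuple]]) – The domains of the random variables.

  • scope (List[int]) – The scope of the leaf.

  • alpha (float) – Laplace smoothing factor.

  • random_state (Optional[Union[int, RandomState]]) – The random state. It can be None.

Returns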

A leaf distribution.

Raises

ValueError – If there are inconsistencies between the data, distributions and domains.

Return type

Node

deeprob.spn.learning.leaf.learn_binary_clt(data, distributions, domains, scope, to_pc=False, alpha=0.1, random_state=None)[source]

Learn a leaf using a Binary Chow-Liu Tree (CLT). If the data is univariate, a Maximum Likelihood Estimate (MLE) leaf is returned.

Parameters
  • data (ndarray) – The data.

  • distributions (List[Type[Leaf]]) – The distributions of the random variables.

  • domains (List[Union[list, tuple]]) – The domains of the random variables.

  • scope (List[int]) – The scope of the leaf.

  • to_pc (bool) – Whether to convert the CLT into an equivalent PC.

  • alpha (float) – Laplace smoothing factor.

  • random_state (Optional[Union[int, RandomState]]) – The random state. It can be None.

Returns

A leaf distribution.

Raises
  • ValueError – If there are inconsistencies between the data, distributions and domains.

  • ValueError – If the data doesn’t follow a Bernoulli distribution.

Return type

Node
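
A Chow-Liu tree is the maximum spanning tree of the pairwise mutual information graph. The sketch below is the textbook algorithm for binary data in plain Python, not deeprob's implementation; the function names, the smoothing of the joint counts and the toy data are assumptions.

```python
import math
from itertools import combinations

def mutual_information(data, i, j, alpha=0.1):
    """Smoothed mutual information between binary columns i and j."""
    joint = {(a, b): alpha for a in (0, 1) for b in (0, 1)}
    for row in data:
        joint[(row[i], row[j])] += 1.0
    total = len(data) + 4 * alpha
    joint = {key: value / total for key, value in joint.items()}
    p_i = {a: joint[(a, 0)] + joint[(a, 1)] for a in (0, 1)}
    p_j = {b: joint[(0, b)] + joint[(1, b)] for b in (0, 1)}
    return sum(p * math.log(p / (p_i[a] * p_j[b]))
               for (a, b), p in joint.items())

def chow_liu_tree(data, n_vars, root=0, alpha=0.1):
    """Return the parent of each variable (-1 for the root), using Prim's algorithm."""
    mi = {frozenset(pair): mutual_information(data, pair[0], pair[1], alpha)
          for pair in combinations(range(n_vars), 2)}
    parent = [-1] * n_vars
    in_tree = {root}
    while len(in_tree) < n_vars:
        # Pick the highest-MI edge that connects the tree to a new variable
        u, v = max(((u, v) for u in in_tree
                    for v in range(n_vars) if v not in in_tree),
                   key=lambda edge: mi[frozenset(edge)])
        parent[v] = u
        in_tree.add(v)
    return parent

# Toy data: column 1 copies column 0, column 2 is independent of both.
data = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1],
        [0, 0, 0], [1, 1, 1], [0, 0, 1], [1, 1, 0]]
parent = chow_liu_tree(data, n_vars=3)
```

In the toy data, variable 1 is attached to variable 0 because the two are perfectly correlated, while variable 2 carries no information about either and is attached by tie-breaking.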

deeprob.spn.learning.leaf.learn_naive_factorization(data, distributions, domains, scope, learn_leaf_func, **learn_leaf_kwargs)[source]

Learn a leaf as a naive factorized model.

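Parameters
  • data (ndarray) – The data.

  • distributions (List[Type[Leaf]]) – The distributions of the random variables.

  • domains (List[Union[list, tuple]]) – The domains of the random variables.

  • scope (List[int]) – The scope of the leaf.

  • learn_leaf_func (LearnLeafFunc) – The function used to learn each univariate leaf.

  • learn_leaf_kwargs – Additional parameters passed to learn_leaf_func.

Returns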

A naive factorized model.

Raises

ValueError – If there are inconsistencies between the data, distributions and domains.

Return type

Node
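
A naive factorization treats the variables as independent: one univariate leaf is learned per column, and the joint density is the product of the leaves. The sketch below illustrates this with Bernoulli leaves in plain Python; the function names and the dictionary representation of the model are hypothetical and do not reflect deeprob's node classes.

```python
import math

def learn_bernoulli(column, alpha=0.1):
    """Smoothed estimate of p(x = 1) for one binary column (assumed leaf learner)."""
    return (sum(column) + alpha) / (len(column) + 2 * alpha)

def learn_naive_factorization_sketch(data, scope, learn_leaf_func, **learn_leaf_kwargs):
    """Learn one univariate leaf per column; keys are the variables in scope."""
    columns = list(zip(*data))
    return {var: learn_leaf_func(list(col), **learn_leaf_kwargs)
            for var, col in zip(scope, columns)}

def log_likelihood(model, scope, row):
    """Log-density of a row: the sum of the univariate log-densities."""
    return sum(math.log(model[var] if x == 1 else 1.0 - model[var])
               for var, x in zip(scope, row))

data = [[1, 0], [1, 1], [0, 0], [1, 0]]
model = learn_naive_factorization_sketch(
    data, scope=[3, 7], learn_leaf_func=learn_bernoulli, alpha=0.1)
ll = log_likelihood(model, [3, 7], [1, 0])
```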

deeprob.spn.learning.learnspn module

class deeprob.spn.learning.learnspn.OperationKind(value)[source]

Bases: Enum

Operation kind used by LearnSPN algorithm.

REM_FEATURES = 1
CREATE_LEAF = 2
SPLIT_NAIVE = 3
SPLIT_ROWS = 4
SPLIT_COLS = 5

class deeprob.spn.learning.learnspn.Task(parent, data, scope, no_cols_split=False, no_rows_split=False, is_first=False)[source]

Bases: tuple

Create new instance of Task(parent, data, scope, no_cols_split, no_rows_split, is_first)

Parameters
parent: Node

Alias for field number 0

data: ndarray

Alias for field number 1

scope: List[int]

Alias for field number 2

no_cols_split: bool

Alias for field number 3

no_rows_split: bool

Alias for field number 4

is_first: bool

Alias for field number 5
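
The enum and the named tuple documented above can be mirrored with the standard library. The definitions below are illustrative stand-ins that copy the documented member values, field order and defaults; the field types are loosened to Any so the sketch runs without NumPy or deeprob.

```python
from enum import Enum
from typing import Any, List, NamedTuple

class OperationKind(Enum):
    """Stand-in with the member values listed in this page."""
    REM_FEATURES = 1
    CREATE_LEAF = 2
    SPLIT_NAIVE = 3
    SPLIT_ROWS = 4
    SPLIT_COLS = 5

class Task(NamedTuple):
    """Stand-in with the documented field order and defaults."""
    parent: Any                  # field 0 (a Node in the library)
    data: Any                    # field 1 (an ndarray in the library)
    scope: List[int]             # field 2
    no_cols_split: bool = False  # field 3
    no_rows_split: bool = False  # field 4
    is_first: bool = False       # field 5

task = Task(parent=None, data=[[0, 1], [1, 1]], scope=[0, 1], is_first=True)
```

Fields are reachable both by name (task.scope) and by position (task[2]), which is what "Alias for field number" refers to.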

deeprob.spn.learning.learnspn.learn_spn(data, distributions, domains, learn_leaf='mle', split_rows='kmeans', split_cols='rdc', learn_leaf_kwargs=None, split_rows_kwargs=None, split_cols_kwargs=None, min_rows_slice=256, min_cols_slice=2, random_state=None, verbose=True)[source]

Learn the structure and parameters of an SPN given some training data and several hyperparameters.

Parameters
  • data (ndarray) – The training data.

  • distributions (List[Type[Leaf]]) – A list of distribution classes (one for each feature).

  • domains (List[Union[list, tuple]]) – A list of domains (one for each feature). Each domain is either a list of values, for discrete distributions, or a tuple (consisting of min value and max value), for continuous distributions.

  • learn_leaf (Union[str, Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], List[int], Any], Node]]) – The method to use to learn a distribution leaf node. It can be either ‘mle’, ‘isotonic’, ‘binary-clt’ or a custom LearnLeafFunc.

  • split_rows (Union[str, Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], RandomState, Any], ndarray]]) – The rows splitting method. It can be either ‘kmeans’, ‘gmm’, ‘rdc’, ‘random’ or a custom SplitRowsFunc function.

  • split_cols (Union[str, Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], RandomState, Any], ndarray]]) – The columns splitting method. It can be either ‘gvs’, ‘rgvs’, ‘wrgvs’, ‘ebvs’, ‘ebvs_ae’, ‘gbvs’, ‘gbvs_ag’, ‘rdc’, ‘random’ or a custom SplitColsFunc function.

  • learn_leaf_kwargs (Optional[dict]) – The parameters of the learn leaf method.

  • split_rows_kwargs (Optional[dict]) – The parameters of the rows splitting method.

  • split_cols_kwargs (Optional[dict]) – The parameters of the cols splitting method.

  • min_rows_slice (int) – The minimum number of samples required to split horizontally.

  • min_cols_slice (int) – The minimum number of features required to split vertically.

  • random_state (Optional[Union[int, RandomState]]) – The random state. It can be either None, a seed integer or a Numpy RandomState.

  • verbose (bool) – Whether to enable verbose mode.

Returns

A learned valid SPN.

Raises

ValueError – If a parameter is out of domain.

Return type

Node

deeprob.spn.learning.wrappers module

deeprob.spn.learning.wrappers.learn_estimator(data, distributions, domains=None, method='learnspn', **kwargs)[source]

Learn an SPN density estimator given some training data and the feature distributions and domains.

Parameters
  • data (ndarray) – The training data.

  • distributions (List[Type[Leaf]]) – A list of distribution classes (one for each feature).

  • domains (Optional[List[Union[list, tuple]]]) – A list of domains (one for each feature). Each domain is either a list of values, for discrete distributions, or a tuple (consisting of min value and max value), for continuous distributions. If None, domains are determined automatically.

  • method (str) – The method used for structure learning. It can be either ‘learnspn’, ‘xpc’ or ‘ensemble-xpc’.

  • kwargs – Additional parameters for structure learning.

Returns

A learned valid and optimized SPN.

Raises
  • ValueError – If the method used for structure learning is not known.

  • ValueError – If the method is ‘xpc’ or ‘ensemble-xpc’ but the variable domains are not binary.

Return type

Node

deeprob.spn.learning.wrappers.learn_classifier(data, distributions, domains=None, class_idx=-1, verbose=True, **kwargs)[source]

Learn an SPN classifier given some training data, the feature distributions and domains, and the index of the class variable in the training data.

Parameters
  • data (ndarray) – The training data.

  • distributions (List[Type[Leaf]]) – A list of distribution classes (one for each feature).

  • domains (Optional[List[Union[list, tuple]]]) – A list of domains (one for each feature). Each domain is either a list of values, for discrete distributions, or a tuple (consisting of min value and max value), for continuous distributions. If None, domains are determined automatically.

  • class_idx (int) – The index of the class feature in the training data.

  • verbose (bool) – Whether to enable verbose mode.

  • kwargs – Other parameters for structure learning.

Returns

A learned valid and optimized SPN.

Return type

Node

deeprob.spn.learning.wrappers.compute_data_domains(data, distributions)[source]

Compute the domains based on the training data and the feature distributions.

Parameters
  • data (ndarray) – The training data.

  • distributions (List[Type[Leaf]]) – A list of distribution classes.

Returns

A list of domains. Each domain is either a list of values, for discrete distributions, or a tuple (consisting of min value and max value), for continuous distributions.

Raises

ValueError – If an unknown distribution type is found.

Return type

List[Union[list, tuple]]
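
The domain format described above (a list of values for discrete features, a (min, max) tuple for continuous ones) can be sketched as follows. This is an illustration in plain Python, not the library's code: the is_discrete flags replace the distribution classes, and taking the observed unique values as the discrete domain is an assumption.

```python
def compute_domains_sketch(data, is_discrete):
    """Compute one domain per column.

    data        -- list of rows
    is_discrete -- one flag per column (hypothetical stand-in for the
                   distribution classes taken by the real function)
    """
    domains = []
    for column, discrete in zip(zip(*data), is_discrete):
        if discrete:
            # Discrete feature: sorted list of the observed values
            domains.append(sorted(set(column)))
        else:
            # Continuous feature: (min value, max value) tuple
            domains.append((min(column), max(column)))
    return domains

data = [[0, 1.5], [2, -0.5], [1, 3.0], [2, 0.0]]
domains = compute_domains_sketch(data, is_discrete=[True, False])
```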

deeprob.spn.learning.xpc module

deeprob.spn.learning.xpc.build_disjunction(data, scope, assignments=None, alpha=0.01)[source]

Build a disjunction (sum node) of conjunctions (product nodes). If assignments are given, every conjunction is associated to a specific assignment (the number of conjunctions is the same as the given assignments); otherwise, every conjunction will be associated to a specific assignment occurring in the input data (the number of conjunctions is the same as the unique assignments occurring in the data).

Parameters
  • data (ndarray) – The input data matrix.

  • scope (list) – The scope.

  • assignments (Optional[ndarray]) – The optional assignments.

  • alpha (float) – Laplace smoothing factor.

Return type

Node
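
When no assignments are given, the disjunction has one conjunction per unique row of the data. The sketch below counts the unique assignments and derives smoothed weights for the sum node; the weight formula and the function name are assumptions, since the weighting scheme is not specified in this page.

```python
from collections import Counter

def disjunction_weights(data, alpha=0.01):
    """Map each unique assignment (row) to a smoothed sum-node weight.

    Assumed formula: (count + alpha) / (n + alpha * number_of_assignments).
    """
    counts = Counter(tuple(row) for row in data)
    n, k = len(data), len(counts)
    return {assignment: (count + alpha) / (n + alpha * k)
            for assignment, count in counts.items()}

# Four rows, three unique assignments: three conjunctions under the sum node.
data = [[1, 0], [1, 0], [0, 1], [1, 1]]
weights = disjunction_weights(data)
```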

deeprob.spn.learning.xpc.build_leaf(data, part, use_clt, trees_dict, det, alpha)[source]

Build a multivariate leaf distribution for an XPC.

Parameters
  • data (ndarray) – The input data matrix.

  • part (Partition) – The partition associated to the leaf to build.

  • use_clt (bool) – True if it is possible to use CLTrees as leaf nodes, False otherwise.

  • trees_dict (dict) – A dictionary of trees (see the function build_trees_dict).

  • det (bool) – True to force determinism, False otherwise.

  • alpha (float) – Laplace smoothing factor.

Return type

Node

deeprob.spn.learning.xpc.greedy_vars_ordering(data, conj_len, alpha=0.01)[source]

Return the ordering of the random variables according to the implemented heuristic.

Parameters
  • data (ndarray) – The input data matrix.

  • conj_len (int) – The conjunction length.

  • alpha (float) – Laplace smoothing factor.

Returns

The ordering.

Return type

list

deeprob.spn.learning.xpc.build_trees_dict(data, cl_parts_l, conj_vars_l, alpha, random_state)[source]

Return a dictionary where:
  • a key refers to a scope length

  • a value is a list of two lists: the first is a list of predecessors, the second its scope.

Parameters
  • data (ndarray) – The input data matrix.

  • cl_parts_l (list) – List of lists. Every sublist is associated to a specific XPC and contains the leaf partitions over which a CLTree will be learnt.

  • conj_vars_l (list) – List of lists. Every sublist contains the variables of a conjunction (e.g. [[3, 5]]). If a sublist occurs before another, then the former has been used first. There are no duplicates.

  • alpha (float) – Laplace smoothing factor.

  • random_state (RandomState) – The random state.

Returns

The dictionary.

Return type

dict

deeprob.spn.learning.xpc.build_xpc(data, part_root, trees_dict, det, use_clt, alpha)[source]

Build the XPC induced by the partitions tree in a bottom up way. The building process is based on the post-order traversal exploration of the partitions tree.

Parameters
  • data (ndarray) – The input data matrix.

  • part_root (Partition) – The root partition of the tree.

  • trees_dict (dict) – None if no dependency tree has to be respected, a dictionary of trees otherwise.

  • det (bool) – True to force determinism, False otherwise.

  • use_clt (bool) – True to use CLTrees as leaf nodes, False otherwise.

  • alpha (float) – Laplace smoothing factor.

Returns

The XPC induced by the partitions tree.

Return type

Node
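
Building bottom up by post-order traversal means that every child of a partition is built before the partition itself. The sketch below shows only that traversal order; the Part class is a hypothetical stand-in for Partition, and strings stand in for circuit nodes.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Part:
    """Hypothetical stand-in for a partition in the partitions tree."""
    name: str
    children: List["Part"] = field(default_factory=list)

def build(part, order):
    """Build the children first, then the node itself (post-order)."""
    built_children = [build(child, order) for child in part.children]
    order.append(part.name)  # record when this partition is built
    if not built_children:
        return part.name     # leaf partition
    return f"{part.name}({', '.join(built_children)})"

tree = Part("root", [Part("a", [Part("a1"), Part("a2")]), Part("b")])
order = []
circuit = build(tree, order)
```

The recorded order is a1, a2, a, b, root: leaves first, root last.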

deeprob.spn.learning.xpc.learn_xpc(data, det, sd, min_part_inst, conj_len, arity, n_max_parts=200, use_clt=True, use_greedy_ordering=False, alpha=0.01, random_seed=42)[source]

Learn an eXtremely randomized Probabilistic Circuit (XPC).

Parameters
  • data (ndarray) – The input data matrix.

  • det (bool) – True to force determinism, False otherwise.

  • sd (bool) – True to force structured decomposability, False otherwise.

  • min_part_inst (int) – The minimum number of instances allowed per partition.

  • conj_len (int) – The conjunction length.

  • arity (int) – The maximum number of children for a sum node.

  • n_max_parts (int) – The maximum number of partitions for the partitions tree.

  • use_clt (bool) – True to use CLTrees as multivariate leaves, False otherwise.

  • use_greedy_ordering (Optional[bool]) – True to use a greedy ordering, False otherwise.

  • alpha (float) – Laplace smoothing factor.

  • random_seed (int) – The random seed.

Return type

Tuple[Node, dict]

deeprob.spn.learning.xpc.learn_expc(data, ensemble_dim, det, sd_level, min_part_inst, conj_len, arity, n_max_parts=200, use_clt=True, alpha=0.01, random_seed=42)[source]

Learn an Ensemble (i.e. a mixture) of eXtremely randomized Probabilistic Circuits (EXPC).

Parameters
  • data (ndarray) – The input data matrix.

  • ensemble_dim (int) – The number of circuits in the ensemble/mixture.

  • det (bool) – True to force determinism, False otherwise.

  • sd_level (int) – 0 for a non-SD ensemble of non-SD PCs, 1 for a non-SD ensemble of SD PCs and 2 for an SD ensemble.

  • min_part_inst (int) – The minimum number of instances allowed per partition.

  • conj_len (int) – The conjunction length.

  • arity (int) – The maximum number of children for a Sum node.

  • n_max_parts (int) – The maximum number of partitions for the partitions tree.

  • use_clt (bool) – True to use CLTrees as multivariate leaves, False otherwise.

  • alpha (float) – Laplace smoothing factor.

  • random_seed (int) – A random seed.

Return type

Tuple[Node, list]
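
An ensemble used as a mixture assigns to each sample the weighted sum of the component densities, which is computed in log-space with the log-sum-exp trick. The sketch below shows this in plain Python; the function name is hypothetical and the uniform default weights are an assumption, since the mixture weights are not specified in this page.

```python
import math

def mixture_log_likelihood(component_lls, weights=None):
    """Log-density of a mixture, given the log-density of each component.

    component_lls -- log p_k(x) for each circuit k in the ensemble
    weights       -- mixture weights; uniform if omitted (assumption)
    """
    k = len(component_lls)
    if weights is None:
        weights = [1.0 / k] * k
    terms = [math.log(w) + ll for w, ll in zip(weights, component_lls)]
    m = max(terms)  # subtract the maximum for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))

# Two circuits assigning densities 0.2 and 0.4 to the same sample.
ll = mixture_log_likelihood([math.log(0.2), math.log(0.4)])
```

With uniform weights, the mixture density in this example is the average of 0.2 and 0.4.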

Module contents