deeprob.spn.learning.splitting package
Submodules
deeprob.spn.learning.splitting.cluster module
- deeprob.spn.learning.splitting.cluster.gmm(data, distributions, domains, random_state, n=2)[source]
Execute GMM clustering on some data.
- Parameters
- Returns
An array in which each element is the cluster assigned to the corresponding sample.
- Return type
- deeprob.spn.learning.splitting.cluster.kmeans(data, distributions, domains, random_state, n=2)[source]
Execute K-Means clustering on some data.
- Parameters
- Returns
An array in which each element is the cluster assigned to the corresponding sample.
- Return type
- deeprob.spn.learning.splitting.cluster.kmeans_mb(data, distributions, domains, random_state, n=2)[source]
Execute MiniBatch K-Means clustering on some data.
- Parameters
- Returns
An array in which each element is the cluster assigned to the corresponding sample.
- Return type
- deeprob.spn.learning.splitting.cluster.dbscan(data, distributions, domains, random_state, n=2)[source]
Execute DBSCAN clustering on some data (only on discrete data).
- Parameters
- Returns
An array in which each element is the cluster assigned to the corresponding sample.
- Raises
ValueError – If the leaf distributions are NOT discrete.
- Return type
- deeprob.spn.learning.splitting.cluster.wald(data, distributions, domains, random_state, n=2)[source]
Execute Ward (hierarchical) clustering on some data (only on discrete data).
- Parameters
- Returns
An array in which each element is the cluster assigned to the corresponding sample.
- Raises
ValueError – If the leaf distributions are NOT discrete.
- Return type
deeprob.spn.learning.splitting.cols module
- deeprob.spn.learning.splitting.cols.SplitColsFunc
A signature for a column splitting function.
alias of
Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], RandomState, Any], ndarray]
- deeprob.spn.learning.splitting.cols.split_cols_clusters(data, clusters, scope)[source]
Split the data vertically given the clusters.
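To illustrate what a vertical split by clusters looks like, here is a minimal sketch rather than the library's implementation. It uses plain Python lists instead of ndarray to stay dependency-free. The helper name `split_columns_by_cluster` and its return format (a list of pairs holding the column subset and its sub-scope) are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Row = List[float]


def split_columns_by_cluster(
    data: Sequence[Row], clusters: Sequence[int], scope: Sequence[int]
) -> List[Tuple[List[Row], List[int]]]:
    """Group the columns of `data` by cluster label (hypothetical helper).

    clusters[j] is the label of column j; scope[j] is the variable id of column j.
    """
    # Collect column positions per label, keeping first-seen label order.
    columns_of: Dict[int, List[int]] = {}
    for j, label in enumerate(clusters):
        columns_of.setdefault(label, []).append(j)

    slices = []
    for cols in columns_of.values():
        sub_data = [[row[j] for j in cols] for row in data]
        sub_scope = [scope[j] for j in cols]
        slices.append((sub_data, sub_scope))
    return slices


data = [[1, 2, 3], [4, 5, 6]]
parts = split_columns_by_cluster(data, clusters=[0, 1, 0], scope=[7, 8, 9])
```

Each returned pair keeps the original variable ids, so the sub-problems can be handled separately without losing track of which features they cover.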
deeprob.spn.learning.splitting.entropy module
- deeprob.spn.learning.splitting.entropy.entropy_cols(data, distributions, domains, random_state, e=0.3, alpha=0.1)[source]
Entropy based column splitting method.
- Parameters
data (ndarray) – The data.
distributions (List[Type[Leaf]]) – Distributions of the features.
domains (List[Union[list, tuple]]) – Range of values of the features.
e (float) – Threshold at which the entropy is considered significant.
alpha (float) – Laplace smoothing alpha applied to the frequencies.
random_state (RandomState) – The random state.
- Returns
A partitioning of features.
- Return type
- deeprob.spn.learning.splitting.entropy.entropy_adaptive_cols(data, distributions, domains, random_state, e=0.3, alpha=0.1, size=None)[source]
Adaptive Entropy based column splitting method.
- Parameters
data (ndarray) – The data.
distributions (List[Type[Leaf]]) – Distributions of the features.
domains (List[Union[list, tuple]]) – Range of values of the features.
e (float) – Threshold at which the entropy is considered significant.
alpha (float) – Laplace smoothing alpha applied to the frequencies.
random_state (RandomState) – The random state.
- Returns
A partitioning of features.
- Raises
ValueError – If the size of the data is missing.
- Return type
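The entropy-based methods above rely on Laplace-smoothed frequencies and a threshold `e`. The sketch below shows one plausible reading of those two ingredients for discrete columns, using only the standard library. The function names and the rule "flag a column when its smoothed entropy is below `e`" are assumptions for illustration, not the documented behaviour of `entropy_cols`.

```python
import math
from collections import Counter
from typing import List, Sequence


def smoothed_entropy(
    column: Sequence[int], domain: Sequence[int], alpha: float = 0.1
) -> float:
    """Entropy in bits of a discrete column with Laplace-smoothed frequencies."""
    counts = Counter(column)
    total = len(column) + alpha * len(domain)
    entropy = 0.0
    for value in domain:
        # alpha > 0 keeps every probability positive, so log2 is defined.
        p = (counts[value] + alpha) / total
        entropy -= p * math.log2(p)
    return entropy


def low_entropy_labels(
    columns: Sequence[Sequence[int]],
    domains: Sequence[Sequence[int]],
    e: float = 0.3,
    alpha: float = 0.1,
) -> List[int]:
    """Label 1 for columns whose smoothed entropy is below e (assumed rule)."""
    return [int(smoothed_entropy(c, d, alpha) < e) for c, d in zip(columns, domains)]


# One nearly constant column and one perfectly balanced column.
columns = [[0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 1, 0, 1, 0, 1]]
labels = low_entropy_labels(columns, domains=[[0, 1], [0, 1]])
```

Here the constant column falls below the default threshold while the balanced column, with one bit of entropy, does not.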
deeprob.spn.learning.splitting.gini module
- deeprob.spn.learning.splitting.gini.gini_cols(data, distributions, domains, random_state, e=0.3, alpha=0.1)[source]
Gini index column splitting method.
- Parameters
data (ndarray) – The data.
distributions (List[Type[Leaf]]) – Distributions of the features.
domains (List[Union[list, tuple]]) – Range of values of the features.
e (float) – Threshold at which the entropy is considered significant.
alpha (float) – Laplace smoothing alpha applied to the frequencies.
random_state (RandomState) – The random state.
- Returns
A partitioning of features.
- Return type
- deeprob.spn.learning.splitting.gini.gini_adaptive_cols(data, distributions, domains, random_state, e=0.3, alpha=0.1, size=None)[source]
Adaptive Gini index column splitting method.
- Parameters
data (ndarray) – The data.
distributions (List[Type[Leaf]]) – Distributions of the features.
domains (List[Union[list, tuple]]) – Range of values of the features.
e (float) – Threshold at which the entropy is considered significant.
alpha (float) – Laplace smoothing alpha applied to the frequencies.
random_state (RandomState) – The random state.
- Returns
A partitioning of features.
- Raises
ValueError – If the size of the data is missing.
- Return type
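For reference, the Gini index of a discrete column is one minus the sum of squared value probabilities. The following is a small sketch of that quantity with Laplace-smoothed frequencies. The function name `smoothed_gini` is hypothetical, and how `gini_cols` turns the index into a partition is not shown here.

```python
from collections import Counter
from typing import Sequence


def smoothed_gini(
    column: Sequence[int], domain: Sequence[int], alpha: float = 0.1
) -> float:
    """Gini index 1 - sum(p_v ** 2) with Laplace-smoothed frequencies."""
    counts = Counter(column)
    total = len(column) + alpha * len(domain)
    return 1.0 - sum(((counts[v] + alpha) / total) ** 2 for v in domain)


constant = smoothed_gini([0] * 8, domain=[0, 1])    # close to 0
balanced = smoothed_gini([0, 1] * 4, domain=[0, 1])  # maximum for a binary feature
```

A nearly constant column yields an index near zero, while a balanced binary column reaches the maximum of 0.5.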
deeprob.spn.learning.splitting.gvs module
- deeprob.spn.learning.splitting.gvs.gvs_cols(data, distributions, domains, random_state, p=5.0)[source]
Greedy Variable Splitting (GVS) independence test.
- Parameters
- Returns
A partitioning of features.
- Raises
ValueError – If the leaf distributions are a mix of discrete and continuous.
- Return type
- deeprob.spn.learning.splitting.gvs.rgvs_cols(data, distributions, domains, random_state, p=5.0)[source]
Random Greedy Variable Splitting (RGVS) independence test.
- Parameters
- Returns
A partitioning of features.
- Raises
ValueError – If the leaf distributions are a mix of discrete and continuous.
- Return type
- deeprob.spn.learning.splitting.gvs.wrgvs_cols(data, distributions, domains, random_state, p=5.0)[source]
Wiser Random Greedy Variable Splitting (WRGVS) independence test.
- Parameters
- Returns
A partitioning of features.
- Raises
ValueError – If the leaf distributions are a mix of discrete and continuous.
- Return type
- deeprob.spn.learning.splitting.gvs.gtest(data, i, j, distributions, domains, p=5.0, test=True)[source]
The G-Test independence test between two features.
- Parameters
- Returns
False if the features are assumed to be dependent, True otherwise.
- Raises
ValueError – If the leaf distributions are a mix of discrete and continuous.
- Return type
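The G-test compares observed joint counts of two features with the counts expected under independence, using G = 2 * sum(O * ln(O / E)). The sketch below computes that statistic and its degrees of freedom for two discrete features with the standard library only. The function name `g_statistic` is hypothetical, and since this page does not state how the `p` parameter maps the statistic to a decision, the sketch stops before that step.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def g_statistic(x: Sequence[int], y: Sequence[int]) -> Tuple[float, int]:
    """Return the G statistic and degrees of freedom for two discrete features."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x = Counter(x)
    count_y = Counter(y)

    g = 0.0
    # Only observed cells contribute; empty cells add 0 to the sum.
    for (a, b), observed in joint.items():
        expected = count_x[a] * count_y[b] / n
        g += observed * math.log(observed / expected)

    dof = (len(count_x) - 1) * (len(count_y) - 1)
    return 2.0 * g, dof


g_indep, dof_indep = g_statistic([0, 0, 1, 1], [0, 1, 0, 1])  # independent pattern
g_dep, dof_dep = g_statistic([0, 0, 1, 1], [0, 0, 1, 1])      # identical features
```

A statistic near zero is consistent with independence; larger values indicate stronger dependence relative to the degrees of freedom.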
deeprob.spn.learning.splitting.random module
- deeprob.spn.learning.splitting.random.random_rows(data, distributions, domains, random_state, a=2.0, b=2.0)[source]
Randomly choose a binary horizontal partition (over the rows). The proportion of the split is sampled from a beta distribution.
- Parameters
data (ndarray) – The data.
distributions (List[Type[Leaf]]) – The data distributions (not used).
domains (List[Union[list, tuple]]) – The data domains (not used).
random_state (RandomState) – The random state.
a (float) – The alpha parameter of the beta distribution.
b (float) – The beta parameter of the beta distribution.
- Returns
A binary partition.
- Return type
- deeprob.spn.learning.splitting.random.random_cols(data, distributions, domains, random_state, a=2.0, b=2.0)[source]
Randomly choose a binary vertical partition (over the columns). The proportion of the split is sampled from a beta distribution.
- Parameters
data (ndarray) – The data.
distributions (List[Type[Leaf]]) – The data distributions (not used).
domains (List[Union[list, tuple]]) – The data domains (not used).
random_state (RandomState) – The random state.
a (float) – The alpha parameter of the beta distribution.
b (float) – The beta parameter of the beta distribution.
- Returns
A binary partition.
- Return type
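The random splitting idea can be sketched in a few lines: draw a proportion from Beta(a, b), then assign each row or column to one of two sides accordingly. This sketch swaps in the standard library's `random.Random` for the `RandomState` used by the functions above, and the choice to draw each label independently is an assumption. The function name `random_binary_partition` is hypothetical.

```python
import random
from typing import List


def random_binary_partition(
    n_items: int, rng: random.Random, a: float = 2.0, b: float = 2.0
) -> List[int]:
    """Return 0/1 labels for n_items, with the split proportion drawn from Beta(a, b)."""
    proportion = rng.betavariate(a, b)
    # Each item lands on side 1 with probability `proportion` (assumed scheme).
    return [int(rng.random() < proportion) for _ in range(n_items)]


labels = random_binary_partition(10, random.Random(42))
```

With independent draws one side can end up empty, so a caller would need to handle or resample that case.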
deeprob.spn.learning.splitting.rdc module
- deeprob.spn.learning.splitting.rdc.rdc_cols(data, distributions, domains, random_state, d=0.3, k=20, s=0.16666666666666666, nl=<ufunc 'sin'>)[source]
Split the features using the RDC (Randomized Dependency Coefficient) method.
- Parameters
data (ndarray) – The data.
random_state (RandomState) – The random state.
d (float) – The threshold value that regulates the independence tests among the features.
k (int) – The size of the latent space.
s (float) – The standard deviation of the gaussian distribution.
nl (Callable[[ndarray], ndarray]) – The non linear function to use.
- Returns
A partitioning of features.
- Return type
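A common way to turn pairwise dependency scores into a feature partition is to link every pair whose score exceeds a threshold and take the connected components of the resulting graph. The sketch below shows that step with a small union-find. Whether `rdc_cols` uses exactly this procedure with `d` is an assumption, and the function name `threshold_components` is hypothetical.

```python
from typing import Dict, List, Sequence


def threshold_components(scores: Sequence[Sequence[float]], d: float = 0.3) -> List[int]:
    """Label features by connected component of the graph with edges score > d."""
    n = len(scores)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if scores[i][j] > d:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive integers in order of first appearance.
    label_of: Dict[int, int] = {}
    return [label_of.setdefault(find(i), len(label_of)) for i in range(n)]


scores = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
partition = threshold_components(scores, d=0.3)
```

In this example the first two features are strongly dependent and end up in the same group, while the third forms its own group.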
- deeprob.spn.learning.splitting.rdc.rdc_rows(data, distributions, domains, random_state, n=2, k=20, s=0.16666666666666666, nl=<ufunc 'sin'>)[source]
Split the samples using the RDC (Randomized Dependency Coefficient) method.
- Parameters
- Returns
A partitioning of samples.
- Return type
- deeprob.spn.learning.splitting.rdc.rdc_scores(data, distributions, domains, random_state, k=20, s=0.16666666666666666, nl=<ufunc 'sin'>)[source]
Compute the RDC (Randomized Dependency Coefficient) score for each pair of features.
- deeprob.spn.learning.splitting.rdc.rdc_cca(i, j, features)[source]
Compute the RDC (Randomized Dependency Coefficient) using CCA (Canonical Correlation Analysis).
- deeprob.spn.learning.splitting.rdc.rdc_transform(data, distributions, domains, random_state, k=20, s=0.16666666666666666, nl=<ufunc 'sin'>)[source]
Execute the RDC (Randomized Dependency Coefficient) pipeline on some data.
- Parameters
- Returns
The transformed data.
- Raises
ValueError – If an unknown distribution type is found.
- Return type
deeprob.spn.learning.splitting.rows module
- deeprob.spn.learning.splitting.rows.SplitRowsFunc
A signature for a row splitting function.
alias of
Callable[[ndarray, List[Type[Leaf]], List[Union[list, tuple]], RandomState, Any], ndarray]
- deeprob.spn.learning.splitting.rows.split_rows_clusters(data, clusters)[source]
Split the data horizontally given the clusters.
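As an illustration of a horizontal split, the following sketch groups rows by their cluster label. It is not the library's implementation: it uses plain lists instead of ndarray, and the helper name `split_rows_by_cluster` and its list-of-groups return format are assumptions.

```python
from typing import Dict, List, Sequence

Row = List[float]


def split_rows_by_cluster(
    data: Sequence[Row], clusters: Sequence[int]
) -> List[List[Row]]:
    """Group the rows of `data` by cluster label (hypothetical helper)."""
    groups: Dict[int, List[Row]] = {}
    for row, label in zip(data, clusters):
        # Groups are returned in order of first appearance of each label.
        groups.setdefault(label, []).append(row)
    return list(groups.values())


data = [[1, 2], [3, 4], [5, 6]]
slices = split_rows_by_cluster(data, clusters=[1, 0, 1])
```

The labels would typically come from one of the row splitting functions, such as the clustering methods listed earlier on this page.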