Title: | Quantile Classifier |
---|---|
Description: | Code for centroid, median and quantile classifiers. |
Authors: | Marco Berrettini, Christian Hennig, Cinzia Viroli |
Maintainer: | Cinzia Viroli <[email protected]> |
License: | GPL-3 |
Version: | 1.2 |
Built: | 2025-03-15 03:25:19 UTC |
Source: | https://github.com/cran/quantileDA |
Data on 102 male and 100 female athletes collected at the Australian Institute of Sport, courtesy of Richard Telford and Ross Cunningham.
data(ais)
A data frame with 202 observations on the following 13 variables.
sex
A factor with levels female, male
sport
A factor with levels B_Ball, Field, Gym, Netball, Row, Swim, T_400m, T_Sprnt, Tennis, W_Polo
rcc
A numeric vector: red cell count
wcc
A numeric vector: white cell count
Hc
A numeric vector: Hematocrit
Hg
A numeric vector: Hemoglobin
Fe
A numeric vector: plasma ferritin concentration
bmi
A numeric vector: body mass index
ssf
A numeric vector: sum of skin folds
Bfat
A numeric vector: body fat percentage
lbm
A numeric vector: lean body mass
Ht
A numeric vector: height (cm)
Wt
A numeric vector: weight (kg)
Cook and Weisberg (1994), An Introduction to Regression Graphics. John Wiley & Sons, New York.
data(ais)
attach(ais)
pairs(ais[, c(3:4, 10:13)], main = "AIS data")
plot(Wt ~ sport)
Internal function used in the cross-validation of the quantile classifier
Given a training and a test set, the function applies the centroid classifier and returns the classification labels of the observations in the training set and in the test set. It also gives the training misclassification rate and, if the true class labels of the test set are provided as input, the test misclassification rate.
centroidcl(train, test, cl, cl.test = NULL)
train |
A matrix of data (the training set) with observations in rows and variables in columns. It can be a matrix or a dataframe. |
test |
A matrix of data (the test set) with observations in rows and variables in columns. It can be a matrix or a dataframe. |
cl |
A vector of class labels for each sample of the training set. It can be factor or numerical. |
cl.test |
A vector of class labels for each sample of the test set (optional) |
centroidcl carries out the centroid classifier and predicts the classification.
A list with components
cl.train |
Predicted classification in the training set |
cl.test |
Predicted classification in the test set |
me.train |
Misclassification error in the training set |
me.test |
Misclassification error in the test set (only if the true class labels of the test set are provided) |
Christian Hennig, Cinzia Viroli
See Also theta.cl
data(ais)
x <- ais[, 3:13]
cl <- as.double(ais[, 1])
set.seed(22)
index <- sample(1:202, 152, replace = FALSE)
train <- x[index, ]
test <- x[-index, ]
cl.train <- cl[index]
cl.test <- cl[-index]
out.c <- centroidcl(train, test, cl.train, cl.test)
out.c$me.test
misc(out.c$cl.test, cl.test)
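To make the idea of a centroid classifier concrete, the following base R sketch assigns each observation to the class whose vector of variable means is closest. It is not the package's implementation: the helper name `centroid_sketch`, the use of squared Euclidean distance, and the toy data are assumptions made for illustration only.

```r
# Hypothetical helper, not the package's centroidcl(): nearest-centroid rule
# with squared Euclidean distance (the choice of distance is an assumption).
centroid_sketch <- function(train, test, cl) {
  train <- as.matrix(train)
  test <- as.matrix(test)
  classes <- sort(unique(as.vector(cl)))
  # One row of variable means per class (K x p matrix)
  centers <- do.call(rbind, lapply(classes, function(k)
    colMeans(train[cl == k, , drop = FALSE])))
  predict_rows <- function(m) {
    # Squared distance from every row of m to every class centroid
    d <- sapply(seq_along(classes), function(k)
      rowSums(sweep(m, 2, centers[k, ])^2))
    d <- matrix(d, nrow = nrow(m))
    classes[max.col(-d, ties.method = "first")]  # index of the smallest distance
  }
  list(cl.train = predict_rows(train), cl.test = predict_rows(test))
}

# Toy data: two well-separated classes in two variables
train_toy <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
cl_toy <- c(1, 1, 2, 2)
test_toy <- rbind(c(1, 1), c(9, 9))
centroid_out <- centroid_sketch(train_toy, test_toy, cl_toy)
centroid_out$cl.test
```

The sketch returns only predicted labels; error rates can be obtained by comparing them with the true labels.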
Internal function for the quantile classifier with variable-wise thetas
The function computes Galton's skewness index on a set of observations.
galtonskew(x)
x |
A vector of observations. |
A scalar measuring Galton's skewness
Christian Hennig, Cinzia Viroli
See Also kelleyskew
data(ais)
galtonskew(ais[, 4])
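Galton's skewness index is usually defined from the quartiles as (Q3 + Q1 - 2 Q2) / (Q3 - Q1). The sketch below computes that quantity in base R. The helper name `galton_sketch` is hypothetical, and the default quantile definition of `quantile()` is an assumption; the package function may estimate the quartiles differently.

```r
# Hypothetical helper illustrating the quartile-based (Galton) skewness index.
galton_sketch <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  (q[3] + q[1] - 2 * q[2]) / (q[3] - q[1])
}

galton_sym <- galton_sketch(1:5)                  # symmetric sample, index is 0
galton_right <- galton_sketch(c(1, 2, 4, 8, 16))  # long right tail, index is positive
```

The index lies between -1 and 1 and depends only on the central half of the data, so it is insensitive to extreme values.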
Internal function for the quantile classifier
The function computes Kelley's skewness index on a set of observations.
kelleyskew(x)
x |
A vector of observations. |
A scalar measuring Kelley's skewness
Christian Hennig, Cinzia Viroli
See Also galtonskew
data(ais)
kelleyskew(ais[, 4])
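Kelley's index follows the same pattern as Galton's but replaces the quartiles with the 10th and 90th percentiles: (P90 + P10 - 2 P50) / (P90 - P10). A minimal base R sketch, with the hypothetical helper name `kelley_sketch` and the default `quantile()` definition as assumptions:

```r
# Hypothetical helper illustrating the percentile-based (Kelley) skewness index.
kelley_sketch <- function(x) {
  q <- quantile(x, probs = c(0.1, 0.5, 0.9), names = FALSE)
  (q[3] + q[1] - 2 * q[2]) / (q[3] - q[1])
}

kelley_sym <- kelley_sketch(1:11)         # symmetric sample, index is 0
kelley_right <- kelley_sketch(2^(0:10))   # long right tail, index is positive
```

Because it looks further into the tails than the quartile-based index, it reacts to asymmetry that the central half of the data may not show.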
Internal function for the quantile classifier
An internal function which computes the misclassification error between two partitions
misc(classification, truth)
classification |
A numeric or character vector of class labels. |
truth |
A numeric or character vector of true class labels. The length of truth should be the same as that of classification. |
The misclassification error (a scalar).
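As a rough illustration, a misclassification error can be computed as the proportion of positions where the two label vectors disagree. The helper `misc_sketch` below is hypothetical and assumes that both vectors use the same label coding; whether the package's misc matches labels across differently coded partitions is not stated here.

```r
# Hypothetical helper: share of observations whose predicted label differs
# from the true label (assumes both vectors use the same label coding).
misc_sketch <- function(classification, truth) {
  stopifnot(length(classification) == length(truth))
  mean(as.character(classification) != as.character(truth))
}

misc_toy <- misc_sketch(c(1, 1, 2, 2), c(1, 2, 2, 2))  # one of four labels differs: 0.25
```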
Internal function used by the quantile classifier
Internal function for plotting the results of the quantile classifier
Internal function for printing the results of the quantile classifier
The function applies the quantile classifier for a set of quantile probabilities and selects the optimal probability that minimizes the misclassification rate in the training set.
quantilecl(train, test, cl, theta = NULL, cl.test = NULL, skew.correct="Galton")
train |
A matrix of data (the training set) with observations in rows and variables in columns. It can be a matrix or a dataframe. |
test |
A matrix of data (the test set) with observations in rows and variables in columns. It can be a matrix or a dataframe. |
cl |
A vector of class labels for each sample of the training set. It can be factor or numerical. |
theta |
A vector of quantile probabilities (optional) |
cl.test |
If available, a vector of class labels for each sample of the test set (optional) |
skew.correct |
Skewness measure applied to correct the skewness direction of the variables. The possible choices are: Galton's skewness (default), Kelley's skewness and the conventional skewness index based on the third standardized moment |
quantilecl carries out the quantile classifier for a set of quantile probabilities and selects the optimal probability that minimizes the misclassification rate in the training set. The values of the quantile probabilities can be given as input or are automatically selected from an equispaced grid of 49 values between 0 and 1. The data in the training and test samples are preprocessed so that the variables used by the quantile estimator all have the same (positive) direction of skewness, according to the chosen skewness measure: Galton's skewness, Kelley's skewness or the conventional skewness index.
A list with components
train.rates |
Misclassification errors for each quantile probability in the training set |
test.rates |
Misclassification errors for each quantile probability in the test set |
thetas |
The list of optimal quantile probabilities for each variable |
theta.choice |
The quantile probability that gives the lowest misclassification error in the training set |
me.train |
Misclassification error in the training set |
me.test |
Misclassification error in the test set (only if the class labels of the test set are provided) |
train |
The matrix of data (training set) with observations in rows and variables in columns |
test |
The matrix of data (test set) with observations in rows and variables in columns |
cl.train |
Predicted classification in the training set |
cl.test |
Predicted classification in the test set |
cl.train.0 |
The true classification labels in the training set |
cl.test.0 |
The true classification labels in the test set (if available) |
Christian Hennig, Cinzia Viroli
See Also quantilecl.vw
data(ais)
x <- ais[, 3:13]
cl <- as.double(ais[, 1])
set.seed(22)
index <- sample(1:202, 152, replace = FALSE)
train <- x[index, ]
test <- x[-index, ]
cl.train <- cl[index]
cl.test <- cl[-index]
out.q <- quantilecl(train, test, cl.train, cl.test = cl.test)
out.q$me.test
print(out.q)
plot(out.q)
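The selection step described above (try a grid of quantile probabilities, keep the one with the lowest training error) can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helper `predict_theta`, the asymmetric check-loss distance to the class-wise quantiles, the grid `(1:49) / 50`, and the simulated data are all assumptions, and the skewness correction is omitted.

```r
# Hypothetical helper: component-wise quantile classifier for a single theta.
predict_theta <- function(train, newdata, cl, theta) {
  train <- as.matrix(train)
  newdata <- as.matrix(newdata)
  classes <- sort(unique(as.vector(cl)))
  d <- sapply(classes, function(k) {
    # Class-wise theta-quantile of every variable
    q <- apply(train[cl == k, , drop = FALSE], 2, quantile,
               probs = theta, names = FALSE)
    u <- sweep(newdata, 2, q)          # deviations from the class quantiles
    rowSums(u * (theta - (u < 0)))     # asymmetric (check) loss summed over variables
  })
  d <- matrix(d, nrow = nrow(newdata))
  classes[max.col(-d, ties.method = "first")]
}

# Simulated right-skewed data; class 2 is shifted upwards (illustrative only)
set.seed(1)
n <- 40
cl_sim <- rep(c(1, 2), each = n)
x_sim <- cbind(rexp(2 * n), rexp(2 * n)) + (cl_sim - 1) * 0.8

# Assumed grid of 49 equispaced probabilities strictly between 0 and 1
thetas <- (1:49) / 50
train.rates <- sapply(thetas, function(th)
  mean(predict_theta(x_sim, x_sim, cl_sim, th) != cl_sim))
theta.choice <- thetas[which.min(train.rates)]
```

Selecting theta on the training error alone can be optimistic, which is why a test set or cross-validation is useful for assessing the chosen value.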
A function to apply the quantile classifier that uses a different optimal quantile probability for each variable
quantilecl.vw(train, test, cl, theta = NULL, cl.test = NULL)
train |
A matrix of data (the training set) with observations in rows and variables in columns. It can be a matrix or a dataframe. |
test |
A matrix of data (the test set) with observations in rows and variables in columns. It can be a matrix or a dataframe. |
cl |
A vector of class labels for each sample of the training set. It can be factor or numerical. |
theta |
Given $p$ variables, a vector of length $p$ of quantile probabilities (optional) |
cl.test |
If available, a vector of class labels for each sample of the test set (optional) |
quantilecl.vw carries out the quantile classifier by using a different optimal quantile probability for each variable, selected in the training set.
A list with components
Vseq |
The value of the objective function at each iteration |
thetas |
The vector of quantile probabilities |
me.train |
Misclassification error for the best quantile probability in the training set |
me.test |
Misclassification error for the best quantile probability in the test set (only if the class labels of the test set are provided) |
cl.train |
Predicted classification in the training set |
cl.test |
Predicted classification in the test set |
lambda |
The vector of estimated scale parameters |
Marco Berrettini, Christian Hennig, Cinzia Viroli
See Also quantilecl
data(ais)
x <- ais[, 3:7]
cl <- as.double(ais[, 1])
set.seed(22)
index <- sample(1:202, 152, replace = FALSE)
train <- x[index, ]
test <- x[-index, ]
cl.train <- cl[index]
cl.test <- cl[-index]
out.q <- quantilecl.vw(train, test, cl.train, cl.test = cl.test)
out.q$me.test
Balanced cross-validation for the quantile classifier
quantileCV(x, cl, nfold = min(table(cl)), folds = balanced.folds(cl, nfold), theta=NULL, seed = 1, varying = FALSE)
x |
A matrix of data (the training set) with observations in rows and variables in columns (it can be a matrix or a dataframe) |
cl |
A vector of class labels for each sample (factor or numerical) |
nfold |
Number of cross-validation folds. Default is the smallest class size. Allowed values range from 1 to the smallest class size. |
folds |
A list with nfold components, each component a vector of indices of the samples in that fold. By default a (random) balanced cross-validation is used |
theta |
A vector of quantile probabilities (optional) |
seed |
Seed of the random number generator, fixed for reproducibility. Default is 1 |
varying |
If TRUE, a different quantile for each variable is selected in the training set. If FALSE (default), a single quantile is used for all variables. |
quantileCV carries out cross-validation for a quantile classifier.
A list with components
test.rates |
Mean of misclassification errors in the cross-validation test sets for each quantile probability (available if |
train.rates |
Mean of misclassification errors in the cross-validation train sets for each quantile probability (available if |
thetas |
The fitted quantile probabilities |
theta.choice |
Value of the chosen quantile probability in the training set |
me.test |
Misclassification errors in the cross validation test sets for the best quantile probability |
me.train |
Misclassification errors in the cross validation training sets for the best quantile probability |
me.median |
Misclassification errors in the cross validation test sets of the median classifier |
me.centroid |
Misclassification errors in the cross validation test sets of the centroid classifier |
folds |
The cross-validation folds used |
Christian Hennig, Cinzia Viroli
data(ais)
x <- ais[, 3:13]
cl <- as.double(ais[, 1])
out <- quantileCV(x, cl, nfold = 2)
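The folds argument expects a list of index vectors, one per fold, and the default is a random balanced split. One way such a list could be built is sketched below: indices are shuffled within each class and dealt out across the folds so that every fold receives a similar share of each class. The helper `balanced_folds_sketch` is hypothetical; how the package's own balanced.folds assigns observations is not described here.

```r
# Hypothetical helper: class-balanced fold assignment (not the package's
# balanced.folds). Returns a list with nfold vectors of row indices.
balanced_folds_sketch <- function(cl, nfold) {
  fold_id <- integer(length(cl))
  for (k in unique(as.vector(cl))) {
    idx <- which(cl == k)
    idx <- idx[sample.int(length(idx))]                     # shuffle within the class
    fold_id[idx] <- rep_len(seq_len(nfold), length(idx))    # deal across the folds
  }
  split(seq_along(cl), factor(fold_id, levels = seq_len(nfold)))
}

set.seed(1)
cl_cv <- rep(c(1, 2), times = c(6, 4))   # 6 observations of class 1, 4 of class 2
folds_cv <- balanced_folds_sketch(cl_cv, nfold = 2)
```

A list built this way has the shape described for the folds argument, with each observation appearing in exactly one fold.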
A function that computes the conventional skewness measure based on the third standardized moment of x
skewness(x)
x |
A vector of observations. |
A scalar which measures the skewness
Christian Hennig, Cinzia Viroli
See Also galtonskew
data(ais)
skewness(ais[, 4])
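The third standardized moment is the mean cubed deviation from the mean divided by the 1.5th power of the mean squared deviation. A minimal base R sketch follows; the helper name `skewness_sketch` is hypothetical, and dividing by n rather than n - 1 in both moments is an assumption about the exact estimator.

```r
# Hypothetical helper: moment-based skewness, m3 / m2^(3/2), using
# population-style moments (divisor n); the package may scale differently.
skewness_sketch <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skew_sym <- skewness_sketch(c(-1, 0, 1))       # symmetric sample, skewness is 0
skew_right <- skewness_sketch(c(0, 0, 0, 10))  # one large value, skewness is positive
```

Unlike the quantile-based indices, this measure is unbounded and sensitive to outliers.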
Given a quantile probability, the function fits the quantile classifier on the training set and returns the predicted class labels in the training and test sets. It also computes the training misclassification rate and, when the true labels of the test set are available, the test misclassification rate. When the quantile probability is 0.5, the function computes the median classifier.
theta.cl(train, test, cl, theta, cl.test = NULL)
train |
A matrix of data (the training set) with observations in rows and variables in columns. It can be a matrix or a dataframe. |
test |
A matrix of data (the test set) with observations in rows and variables in columns. It can be a matrix or a dataframe. |
cl |
A vector of class labels for each sample of the training set. It can be factor or numerical. |
theta |
The quantile probability. If 0.5 the median classifier is applied |
cl.test |
If available, a vector of class labels for each sample of the test set (optional) |
theta.cl carries out the quantile classifier for a given quantile probability.
A list with components
cl.train |
Predicted classification in the training set |
cl.test |
Predicted classification in the test set |
me.train |
Misclassification error in the training set |
me.test |
Misclassification error in the test set (only if the true labels of the test set are available) |
Christian Hennig, Cinzia Viroli
See Also centroidcl
data(ais)
x <- ais[, 3:13]
cl <- as.double(ais[, 1])
set.seed(22)
index <- sample(1:202, 152, replace = FALSE)
train <- x[index, ]
test <- x[-index, ]
cl.train <- cl[index]
cl.test <- cl[-index]
out.m <- theta.cl(train, test, cl.train, 0.5, cl.test)
out.m$me.test
misc(out.m$cl.test, cl.test)
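To illustrate how a classifier based on a single quantile probability can work, the sketch below measures the distance of an observation from the class-wise theta-quantiles with an asymmetric check loss and assigns the class with the smallest total distance. With theta = 0.5 the loss is proportional to the absolute deviation from the class medians, which gives a median classifier. The helper names `check_dist` and `theta_sketch`, the loss, and the toy data are assumptions for illustration; the package's internals may differ.

```r
# Hypothetical helper: check-loss distance of each row of m from the
# vector q of class quantiles, rho_theta(u) = u * (theta - 1{u < 0}).
check_dist <- function(m, q, theta) {
  u <- sweep(m, 2, q)
  rowSums(u * (theta - (u < 0)))
}

# Hypothetical helper, not the package's theta.cl().
theta_sketch <- function(train, test, cl, theta) {
  train <- as.matrix(train)
  test <- as.matrix(test)
  classes <- sort(unique(as.vector(cl)))
  # K x p matrix of class-wise theta-quantiles
  q <- do.call(rbind, lapply(classes, function(k)
    apply(train[cl == k, , drop = FALSE], 2, quantile,
          probs = theta, names = FALSE)))
  predict_rows <- function(m) {
    d <- sapply(seq_along(classes), function(k) check_dist(m, q[k, ], theta))
    d <- matrix(d, nrow = nrow(m))
    classes[max.col(-d, ties.method = "first")]  # smallest total distance
  }
  list(cl.train = predict_rows(train), cl.test = predict_rows(test))
}

# Toy data: two well-separated classes, median classifier (theta = 0.5)
train_q <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
cl_q <- c(1, 1, 2, 2)
test_q <- rbind(c(1, 1), c(9, 9))
median_out <- theta_sketch(train_q, test_q, cl_q, theta = 0.5)
median_out$cl.test
```

Values of theta away from 0.5 weight deviations above and below the class quantile differently, which is what lets the rule adapt to skewed variables.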