R/kNN.confusionmatrix.R
kNN.confusionmatrix.Rd
Computes confusion matrices (one for each value of \(k\)) using \(k\)-NN classification on the results of two parametric bootstraps: one of them is treated as a holdout set and classified against the other.
kNN.confusionmatrix(df, df.holdout, k, ties = "model2", print_genargs = TRUE, verbose = TRUE)
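Argument | Description |
---|---|
df | Results of a parametric bootstrap, as generated by pbcm.di or pbcm.du |
df.holdout | Results of a second parametric bootstrap, as generated by pbcm.di or pbcm.du, used as the holdout set |
k | Number of neighbours to consider in k-NN classification; may be a vector of integers |
ties | Which way to break ties in k-NN classification (see |
print_genargs | Should the generator arguments of the holdout distribution be included in the output? (See Details) |
verbose | If |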
A data frame with the following columns:
k
Number of nearest neighbours
P
Number of positives
N
Number of negatives
TP
Number of true positives
FP
Number of false positives
TN
Number of true negatives
FN
Number of false negatives
alpha
Type I error (false positive) rate; equal to FP divided by N
beta
Type II error (false negative) rate; equal to FN divided by P
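As a worked check of the two rate definitions, the \(k = 1\) row of the example output on this page (P = 20, N = 20, FP = 4, FN = 6) gives the values below. The counts in that row also satisfy TP + FN = P and FP + TN = N.

```latex
\[
\alpha = \frac{FP}{N} = \frac{4}{20} = 0.20,
\qquad
\beta = \frac{FN}{P} = \frac{6}{20} = 0.30
\]
```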
In addition to these columns, if print_genargs == TRUE, each argument that was passed via genargs1 and genargs2 to pbcm.di or pbcm.du to generate df.holdout is included as a column of its own.
The function takes each DeltaGoF value from df.holdout, compares it against the DeltaGoF distributions in df, and assigns it to model 1 or model 2 by \(k\)-NN classification. By convention, we take model 2 as the null hypothesis and model 1 as the alternative. Hence a false positive, for instance, is the case where model 2 generated the data but the decision was in favour of model 1.
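The decision step can be illustrated with a minimal sketch. This is not the package's implementation: knn_decide is a hypothetical helper, the DeltaGoF values are made up, and "model1" as an alternative value for ties is an assumption (only the default "model2" appears on this page). It classifies a single value by majority vote among its k nearest reference values.

```r
# Hypothetical helper (not part of the package): classify one DeltaGoF value
# by majority vote among its k nearest neighbours in two reference samples.
knn_decide <- function(x, delta_model1, delta_model2, k, ties = "model2") {
  ref <- data.frame(
    value = c(delta_model1, delta_model2),
    label = c(rep("model1", length(delta_model1)),
              rep("model2", length(delta_model2)))
  )
  # Order reference points by distance to x and keep the labels of the k closest
  nearest <- ref$label[order(abs(ref$value - x))][seq_len(k)]
  votes1 <- sum(nearest == "model1")
  votes2 <- sum(nearest == "model2")
  if (votes1 > votes2) {
    "model1"
  } else if (votes2 > votes1) {
    "model2"
  } else {
    ties  # equal votes: fall back to the tie-breaking rule
  }
}

# Made-up reference DeltaGoF values for data generated by each model
delta1 <- c(-3.0, -2.5, -2.0, -1.5)
delta2 <- c(1.0, 1.5, 2.0, 2.5)

res_a <- knn_decide(-1.8, delta1, delta2, k = 3)
res_b <- knn_decide(0.2, delta1, delta2, k = 2)
res_tie <- knn_decide(-0.25, delta1, delta2, k = 2)  # one vote each
```

With an even k, the two models can receive equal votes, which is where the ties argument matters; counting such decisions over a whole holdout set is what fills the TP, FP, TN and FN columns.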
x <- seq(from=0, to=1, length.out=100)
mockdata <- data.frame(x=x, y=x + rnorm(100, 0, 0.5))

myfitfun <- function(data, p) {
  res <- nls(y~a*x^p, data, start=list(a=1.1))
  list(a=coef(res), GoF=deviance(res))
}

mygenfun <- function(model, p) {
  x <- seq(from=0, to=1, length.out=100)
  y <- model$a*x^p + rnorm(100, 0, 0.5)
  data.frame(x=x, y=y)
}

pb1 <- pbcm.di(data=mockdata, fun1=myfitfun, fun2=myfitfun,
               genfun1=mygenfun, genfun2=mygenfun, reps=20,
               args1=list(p=1), args2=list(p=2),
               genargs1=list(p=1), genargs2=list(p=2))
#> Initializing output data frame...
#> Bootstrapping...
#>   |======================================================================| 100%

pb2 <- pbcm.di(data=mockdata, fun1=myfitfun, fun2=myfitfun,
               genfun1=mygenfun, genfun2=mygenfun, reps=20,
               args1=list(p=1), args2=list(p=2),
               genargs1=list(p=1), genargs2=list(p=2))
#> Initializing output data frame...
#> Bootstrapping...
#>   |======================================================================| 100%

kNN.confusionmatrix(df=pb1, df.holdout=pb2, k=1:10)
#>    genargs1_p genargs2_p  k  P  N TP FP TN FN alpha beta
#> 1           1          2  1 20 20 14  4 16  6  0.20  0.3
#> 2           1          2  2 20 20 14  4 16  6  0.20  0.3
#> 3           1          2  3 20 20 12  3 17  8  0.15  0.4
#> 4           1          2  4 20 20 12  3 17  8  0.15  0.4
#> 5           1          2  5 20 20 14  4 16  6  0.20  0.3
#> 6           1          2  6 20 20 14  4 16  6  0.20  0.3
#> 7           1          2  7 20 20 14  4 16  6  0.20  0.3
#> 8           1          2  8 20 20 14  4 16  6  0.20  0.3
#> 9           1          2  9 20 20 14  4 16  6  0.20  0.3
#> 10          1          2 10 20 20 14  4 16  6  0.20  0.3