Carry out \(k\) Nearest Neighbours (\(k\)-NN) classification on the results of a parametric bootstrap.

kNN.classification(df, DeltaGoF.emp, k, ties = "model2", verbose = TRUE)

Arguments

df

Results of the bootstrap; the output of pbcm.di or pbcm.du

DeltaGoF.emp

Empirical value of goodness of fit (e.g. from empirical.GoF)

k

Number of neighbours to employ in classification; may be a vector of integers

ties

How should ties (cases in which the distances to the two distributions are equal) be broken? By default, we break ties in favour of model 2, taking this to be the null model in the comparison.

verbose

If TRUE, warnings are issued to the console

Value

A data frame containing the computed distances and decisions, one row for each value of k

Details

Calculates the cumulative distance (sum of squared differences) of DeltaGoF.emp to both DeltaGoF distributions found in df (i.e. one with model 1 as generator and one with model 2 as generator), taking into account the k nearest neighbours only. Decides in favour of model 1 if this cumulative distance to the model 1 distribution is smaller than the distance to model 2, and vice versa. If the distances are equal, the decision is made according to the ties argument.
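
As a minimal sketch only (not the package implementation), the decision rule above could be computed by hand as follows. The column names DeltaGoF.gen1 and DeltaGoF.gen2 for the two bootstrap distributions in df, and the restriction to a single scalar k, are assumptions made for this illustration.

## Illustrative sketch of the k-NN decision rule described above.
## Assumes df holds the two bootstrap DeltaGoF distributions in columns
## DeltaGoF.gen1 and DeltaGoF.gen2 (an assumption made for this example).
knn_decision_sketch <- function(df, DeltaGoF.emp, k, ties = "model2") {
  # squared differences between the empirical DeltaGoF and each bootstrap draw
  d1 <- (df$DeltaGoF.gen1 - DeltaGoF.emp)^2
  d2 <- (df$DeltaGoF.gen2 - DeltaGoF.emp)^2
  # cumulative distance over the k nearest neighbours of each distribution
  dist1 <- sum(sort(d1)[seq_len(k)])
  dist2 <- sum(sort(d2)[seq_len(k)])
  # smaller cumulative distance wins; equal distances fall back on `ties`
  decision <- if (dist1 < dist2) "model1" else if (dist2 < dist1) "model2" else ties
  data.frame(k = k, dist_model1 = dist1, dist_model2 = dist2, decision = decision)
}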

References

Schultheis, H. & Singhaniya, A. (2015) Decision criteria for model comparison using the parametric bootstrap cross-fitting method. Cognitive Systems Research, 33, 100–121. https://doi.org/10.1016/j.cogsys.2014.09.003

See also

pbcm.di, pbcm.du, empirical.GoF

Examples

# simulated data and a fitting function of the form y = a*x^p
x <- seq(from=0, to=1, length.out=100)
mockdata <- data.frame(x=x, y=x + rnorm(100, 0, 0.5))
myfitfun <- function(data, p) {
  res <- nls(y~a*x^p, data, start=list(a=1.1))
  list(a=coef(res), GoF=deviance(res))
}
# generator function used to produce synthetic data from a fitted model
mygenfun <- function(model, p) {
  x <- seq(from=0, to=1, length.out=100)
  y <- model$a*x^p + rnorm(100, 0, 0.5)
  data.frame(x=x, y=y)
}
# parametric bootstrap comparing the model with p=1 against the model with p=2
pb <- pbcm.di(data=mockdata, fun1=myfitfun, fun2=myfitfun, genfun1=mygenfun,
              genfun2=mygenfun, reps=20, args1=list(p=1), args2=list(p=2),
              genargs1=list(p=1), genargs2=list(p=2))
#> Initializing output data frame...
#> Bootstrapping...
emp <- empirical.GoF(mockdata, fun1=myfitfun, fun2=myfitfun,
                     args1=list(p=1), args2=list(p=2))
kNN.classification(df=pb, DeltaGoF.emp=emp$DeltaGoF, k=c(10, 20))
#>    k dist_model1 dist_model2 decision
#> 1 10     19.5711     73.4745   model1
#> 2 20    112.9116    342.9465   model1