Carry out \(k\) Nearest Neighbours (\(k\)-NN) classification on the results of a parametric bootstrap.
kNN.classification(df, DeltaGoF.emp, k, ties = "model2", verbose = TRUE)
Argument | Description
---|---
df | Data frame containing the bootstrapped DeltaGoF distributions, e.g. the output of pbcm.di
DeltaGoF.emp | Empirical value of goodness of fit (e.g. from empirical.GoF)
k | Number of neighbours to employ in classification; may be a vector of integers
ties | Which way should ties (when distance to the two distributions is equal) be broken? By default, we break in favour of model 2, taking this to be the null model in the comparison.
verbose | If TRUE, print progress information during the computation
A data frame containing the computed distances and decisions, one row for each value of k.
Calculates the cumulative distance (sum of squared differences) of DeltaGoF.emp to both DeltaGoF distributions found in df (i.e. one with model 1 as generator and one with model 2 as generator), taking into account the k nearest neighbours only. Decides in favour of model 1 if this cumulative distance to the model 1 distribution is smaller than the distance to model 2, and vice versa. If the distances are equal, the decision is made according to the ties argument.
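The decision rule described above can be sketched as a few lines of R. This is an illustrative stand-alone helper, not the package's implementation; the function name knn_decide and its argument names are hypothetical.

```r
# Minimal sketch of the kNN decision rule (hypothetical helper, not part of pbcm).
# emp: empirical DeltaGoF; boot1, boot2: bootstrapped DeltaGoF distributions
# under model 1 and model 2 as generator; k: number of nearest neighbours.
knn_decide <- function(emp, boot1, boot2, k, ties = "model2") {
  # cumulative distance = sum of squared differences to the k nearest values
  d1 <- sum(sort((boot1 - emp)^2)[1:k])
  d2 <- sum(sort((boot2 - emp)^2)[1:k])
  if (d1 < d2) "model1" else if (d2 < d1) "model2" else ties
}

# Toy example: the empirical value lies closer to the first distribution.
knn_decide(emp = 0, boot1 = c(0.1, 0.2, 5), boot2 = c(1, 2, 3), k = 2)
#> [1] "model1"
```

Note that ties (exactly equal cumulative distances) are resolved by the ties argument, defaulting to the null model, model 2.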
Schultheis, H. & Singhaniya, A. (2015) Decision criteria for model comparison using the parametric bootstrap cross-fitting method. Cognitive Systems Research, 33, 100–121. https://doi.org/10.1016/j.cogsys.2014.09.003
x <- seq(from=0, to=1, length.out=100)
mockdata <- data.frame(x=x, y=x + rnorm(100, 0, 0.5))
myfitfun <- function(data, p) {
  res <- nls(y~a*x^p, data, start=list(a=1.1))
  list(a=coef(res), GoF=deviance(res))
}
mygenfun <- function(model, p) {
  x <- seq(from=0, to=1, length.out=100)
  y <- model$a*x^p + rnorm(100, 0, 0.5)
  data.frame(x=x, y=y)
}
pb <- pbcm.di(data=mockdata, fun1=myfitfun, fun2=myfitfun,
              genfun1=mygenfun, genfun2=mygenfun, reps=20,
              args1=list(p=1), args2=list(p=2),
              genargs1=list(p=1), genargs2=list(p=2))
#> Initializing output data frame...
#> Bootstrapping...
emp <- empirical.GoF(mockdata, fun1=myfitfun, fun2=myfitfun,
                     args1=list(p=1), args2=list(p=2))
kNN.classification(df=pb, DeltaGoF.emp=emp$DeltaGoF, k=c(10, 20))
#>    k dist_model1 dist_model2 decision
#> 1 10     19.5711     73.4745   model1
#> 2 20    112.9116    342.9465   model1