Carry out \(k\) Nearest Neighbours (\(k\)-NN) classification on the results of a parametric bootstrap.

kNN.classification(df, DeltaGoF.emp, k, ties = "model2", verbose = TRUE)

Arguments

df

Results of the bootstrap; the output of pbcm.di or pbcm.du

DeltaGoF.emp

Empirical value of goodness of fit (e.g. from empirical.GoF)

k

Number of neighbours to employ in classification; may be a vector of integers

ties

How should ties (cases in which the distances to the two distributions are equal) be broken? By default, we break ties in favour of model 2, taking this to be the null model in the comparison.

verbose

If TRUE, warnings are issued to the console

Value

A data frame containing the computed distances and decisions, one row for each value of k

Details

Calculates the cumulative distance (sum of squared differences) of DeltaGoF.emp to both DeltaGoF distributions found in df (i.e. one with model 1 as generator and one with model 2 as generator), taking into account the k nearest neighbours only. Decides in favour of model 1 if this cumulative distance to the model 1 distribution is smaller than the distance to model 2, and vice versa. If the distances are equal, the decision is made according to the ties argument.
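
As a minimal sketch only (not the package implementation), the decision rule above could be computed by hand as follows. The column names DeltaGoF.gen1 and DeltaGoF.gen2 for the two bootstrap distributions in df, and the restriction to a single scalar k, are assumptions made for this illustration.

## Illustrative sketch of the k-NN decision rule described above.
## Assumes df holds the two bootstrap DeltaGoF distributions in columns
## DeltaGoF.gen1 and DeltaGoF.gen2 (an assumption made for this example).
knn_decision_sketch <- function(df, DeltaGoF.emp, k, ties = "model2") {
  # squared differences between the empirical DeltaGoF and each bootstrap draw
  d1 <- (df$DeltaGoF.gen1 - DeltaGoF.emp)^2
  d2 <- (df$DeltaGoF.gen2 - DeltaGoF.emp)^2
  # cumulative distance over the k nearest neighbours of each distribution
  dist1 <- sum(sort(d1)[seq_len(k)])
  dist2 <- sum(sort(d2)[seq_len(k)])
  # smaller cumulative distance wins; equal distances fall back on `ties`
  decision <- if (dist1 < dist2) "model1" else if (dist2 < dist1) "model2" else ties
  data.frame(k = k, dist_model1 = dist1, dist_model2 = dist2, decision = decision)
}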

References

Schultheis, H. & Singhaniya, A. (2015) Decision criteria for model comparison using the parametric bootstrap cross-fitting method. Cognitive Systems Research, 33, 100–121. https://doi.org/10.1016/j.cogsys.2014.09.003

See also

pbcm.di, pbcm.du, empirical.GoF

Examples

# simulated data and a fitting function of the form y = a*x^p
x <- seq(from=0, to=1, length.out=100)
mockdata <- data.frame(x=x, y=x + rnorm(100, 0, 0.5))
myfitfun <- function(data, p) {
  res <- nls(y~a*x^p, data, start=list(a=1.1))
  list(a=coef(res), GoF=deviance(res))
}
# generator function used to produce synthetic data from a fitted model
mygenfun <- function(model, p) {
  x <- seq(from=0, to=1, length.out=100)
  y <- model$a*x^p + rnorm(100, 0, 0.5)
  data.frame(x=x, y=y)
}
# parametric bootstrap comparing the model with p=1 against the model with p=2
pb <- pbcm.di(data=mockdata, fun1=myfitfun, fun2=myfitfun, genfun1=mygenfun,
              genfun2=mygenfun, reps=20, args1=list(p=1), args2=list(p=2),
              genargs1=list(p=1), genargs2=list(p=2))
#> Initializing output data frame...
#> Bootstrapping...
emp <- empirical.GoF(mockdata, fun1=myfitfun, fun2=myfitfun,
                     args1=list(p=1), args2=list(p=2))
kNN.classification(df=pb, DeltaGoF.emp=emp$DeltaGoF, k=c(10, 20))
#>    k dist_model1 dist_model2 decision
#> 1 10     19.5711     73.4745   model1
#> 2 20    112.9116    342.9465   model1