Skip to contents

Introduction

The AICcPermanova package provides an R implementation of model selection for Permanovas from the vegan package. This package enables users to generate all possible models for a given set of variables, calculate AICc values for those models, and perform model selection based on a given threshold.

Installation

To install this package, you can use the remotes package to install its development version:

remotes::install_github("Sustainscapes/AICcPerm")

Alternatively, you can install the stable version from CRAN:

install.packages("AICcPermanova")

Generating all possible models

To generate all possible models, you can use the make_models function. For example:

library(AICcPermanova)
AllModels <- make_models(vars = c("pH", "Sand", "Clay"), ncores = 1)

This function will create a table of all possible models for the specified variables, which you can see in the following table:

form
Distance ~ pH
Distance ~ Sand
Distance ~ Clay
Distance ~ pH + Sand
Distance ~ pH + Clay
Distance ~ Sand + Clay
Distance ~ pH + Sand + Clay
Distance ~ 1

Where you can see all possible models for those 3 variables.

Calculating AICc

To calculate AICc, you can use the AICc_permanova2 function. Here’s an example of how to use it:

library(vegan)
#> Loading required package: permute
#> Loading required package: lattice
#> This is vegan 2.6-4
data(dune)
data(dune.env)

# Run PERMANOVA using adonis2

Model <- adonis2(dune ~ Management * A1, data = dune.env)

# Calculate AICc
Table_permanova <- AICc_permanova2(Model)

The results of this calculation are displayed in the following table:

knitr::kable(Table_permanova)
AICc k N
-19.06395 8 20

In this example, we used the adonis2 function to run a PERMANOVA analysis on the dune dataset with the Management and A1 variables. We then calculated AICc using the AICc_permanova2 function

Full example

In this section, we’ll provide a complete example of the AICcPerm package workflow. First, we need to load datasets from the vegan package:

library(vegan)
data(dune)
data(dune.env)

Next, we’ll generate all possible first-order models for this dataset:

AllModels <- make_models(vars = c("A1", "Moisture", "Management", "Use", "Manure"), ncores = 1)
#> 1 of 5 ready 2024-01-22 11:58:38.793035
#> 2 of 5 ready 2024-01-22 11:58:40.301002
#> 3 of 5 ready 2024-01-22 11:58:41.860968
#> 4 of 5 ready 2024-01-22 11:58:43.405187
#> 5 of 5 ready 2024-01-22 11:58:44.980857

This results in 32 possible models, which are shown in the following table:

form
Distance ~ A1
Distance ~ Moisture
Distance ~ Management
Distance ~ Use
Distance ~ Manure
Distance ~ A1 + Moisture
Distance ~ A1 + Management
Distance ~ A1 + Use
Distance ~ A1 + Manure
Distance ~ Moisture + Management
Distance ~ Moisture + Use
Distance ~ Moisture + Manure
Distance ~ Management + Use
Distance ~ Management + Manure
Distance ~ Use + Manure
Distance ~ A1 + Moisture + Management
Distance ~ A1 + Moisture + Use
Distance ~ A1 + Moisture + Manure
Distance ~ A1 + Management + Use
Distance ~ A1 + Management + Manure
Distance ~ A1 + Use + Manure
Distance ~ Moisture + Management + Use
Distance ~ Moisture + Management + Manure
Distance ~ Moisture + Use + Manure
Distance ~ Management + Use + Manure
Distance ~ A1 + Moisture + Management + Use
Distance ~ A1 + Moisture + Management + Manure
Distance ~ A1 + Moisture + Use + Manure
Distance ~ A1 + Management + Use + Manure
Distance ~ Moisture + Management + Use + Manure
Distance ~ A1 + Moisture + Management + Use + Manure
Distance ~ 1

Avoiding multicollinearity

After generating all the models, it’s important to check for multicollinearity. We can use the filter_vif function to filter out models that have a high degree of collinearity (defined as having a maximum value of VIF of 5 or more):

NonColinear <- filter_vif(all_forms = AllModels, env_data = dune.env, ncores = 1)

This reduces the number of models to 21

Fittng the models

After filtering out collinear models, we can fit all the remaining non-collinear models by using the fit_models function:

Fitted <- fit_models(
  all_forms = NonColinear,
  com_data = dune,
  env_data = dune.env,
  ncores = 1,
  method = "bray"
)
#> Warning in e$fun(obj, substitute(ex), parent.frame(), e$data): already
#> exporting variable(s): env_data

This results in a table of fitted models ordered by AICc, which is displayed below:

form max_vif AICc k N A1 Moisture Management Use Manure Model
Distance ~ Moisture 0.000000 -30.363195 4 20 NA 0.4019903 NA NA NA NA
Distance ~ A1 0.000000 -29.723474 2 20 0.1681666 NA NA NA NA NA
Distance ~ 1 0.000000 -28.524673 1 20 NA NA NA NA NA 0
Distance ~ Management 0.000000 -28.439405 4 20 NA NA 0.3416107 NA NA NA
Distance ~ A1 + Moisture 3.000000 -28.211489 5 20 0.0423034 0.2761272 NA NA NA NA
Distance ~ A1 + Management 3.000000 -28.206906 5 20 0.1025557 NA 0.2759998 NA NA NA
Distance ~ A1 + Use 2.000000 -27.243308 4 20 0.1723656 NA NA 0.1328680 NA NA
Distance ~ Moisture + Use 3.000000 -26.706730 6 20 NA 0.3850987 NA 0.1117773 NA NA
Distance ~ Moisture + Management 3.000000 -26.219635 7 20 NA 0.2678801 0.2075005 NA NA NA
Distance ~ Use 0.000000 -26.001561 3 20 NA NA NA 0.1286690 NA NA
Distance ~ Manure 0.000000 -25.214897 5 20 NA NA NA NA 0.3544714 NA
Distance ~ A1 + Manure 4.000000 -25.187448 6 20 0.1209209 NA NA NA 0.3072258 NA
Distance ~ A1 + Moisture + Use 3.000000 -24.004036 7 20 0.0499753 0.2627084 NA 0.1194492 NA NA
Distance ~ Management + Use 3.000000 -23.493146 6 20 NA NA 0.3003444 0.0874027 NA NA
Distance ~ Moisture + Manure 4.000000 -22.939843 8 20 NA 0.3005223 NA NA 0.2530035 NA
Distance ~ A1 + Moisture + Management 3.000000 -22.910428 8 20 0.0449952 0.2103196 0.2101923 NA NA NA
Distance ~ A1 + Management + Use 3.000000 -21.475776 7 20 0.0759437 NA 0.2039225 0.0607907 NA NA
Distance ~ Use + Manure 4.000000 -19.071272 7 20 NA NA NA 0.0872434 0.3130459 NA
Distance ~ A1 + Moisture + Manure 4.508169 -18.590778 9 20 0.0414517 0.2210532 NA NA 0.2521518 NA
Distance ~ Moisture + Management + Use 3.648562 -15.655183 9 20 NA 0.2194405 0.1346862 0.0389631 NA NA
Distance ~ A1 + Moisture + Management + Use 4.710040 -8.837299 10 20 0.0274588 0.1709557 0.1121697 0.0214267 NA NA

If there is a block variable to be used (such as Use in the dune.env object), you can specify it using the strata argument:

Fitted2 <- fit_models(
  all_forms = NonColinear,
  com_data = dune,
  env_data = dune.env,
  ncores = 1,
  method = "bray",
  strata = "Use"
)
#> Warning in e$fun(obj, substitute(ex), parent.frame(), e$data): already
#> exporting variable(s): env_data

This results in a table of fitted models that takes into account the block variable, which is displayed below:

form max_vif AICc k N A1 Moisture Management Use Manure Model
Distance ~ Moisture 0.000000 -30.363195 4 20 NA 0.4019903 NA NA NA NA
Distance ~ A1 0.000000 -29.723474 2 20 0.1681666 NA NA NA NA NA
Distance ~ 1 0.000000 -28.524673 1 20 NA NA NA NA NA 0
Distance ~ Management 0.000000 -28.439405 4 20 NA NA 0.3416107 NA NA NA
Distance ~ A1 + Moisture 3.000000 -28.211489 5 20 0.0423034 0.2761272 NA NA NA NA
Distance ~ A1 + Management 3.000000 -28.206906 5 20 0.1025557 NA 0.2759998 NA NA NA
Distance ~ A1 + Use 2.000000 -27.243308 4 20 0.1723656 NA NA 0.1328680 NA NA
Distance ~ Moisture + Use 3.000000 -26.706730 6 20 NA 0.3850987 NA 0.1117773 NA NA
Distance ~ Moisture + Management 3.000000 -26.219635 7 20 NA 0.2678801 0.2075005 NA NA NA
Distance ~ Use 0.000000 -26.001561 3 20 NA NA NA 0.1286690 NA NA
Distance ~ Manure 0.000000 -25.214897 5 20 NA NA NA NA 0.3544714 NA
Distance ~ A1 + Manure 4.000000 -25.187448 6 20 0.1209209 NA NA NA 0.3072258 NA
Distance ~ A1 + Moisture + Use 3.000000 -24.004036 7 20 0.0499753 0.2627084 NA 0.1194492 NA NA
Distance ~ Management + Use 3.000000 -23.493146 6 20 NA NA 0.3003444 0.0874027 NA NA
Distance ~ Moisture + Manure 4.000000 -22.939843 8 20 NA 0.3005223 NA NA 0.2530035 NA
Distance ~ A1 + Moisture + Management 3.000000 -22.910428 8 20 0.0449952 0.2103196 0.2101923 NA NA NA
Distance ~ A1 + Management + Use 3.000000 -21.475776 7 20 0.0759437 NA 0.2039225 0.0607907 NA NA
Distance ~ Use + Manure 4.000000 -19.071272 7 20 NA NA NA 0.0872434 0.3130459 NA
Distance ~ A1 + Moisture + Manure 4.508169 -18.590778 9 20 0.0414517 0.2210532 NA NA 0.2521518 NA
Distance ~ Moisture + Management + Use 3.648562 -15.655183 9 20 NA 0.2194405 0.1346862 0.0389631 NA NA
Distance ~ A1 + Moisture + Management + Use 4.710040 -8.837299 10 20 0.0274588 0.1709557 0.1121697 0.0214267 NA NA

Model Selection

To select models, we use the select_models function, which chooses models with a maximum VIF of less than 5 and a delta AICc less than a specified threshold, delta_aicc. By default, delta_aicc is set to 2.

We can use the Fitted2 object generated in the previous section to select models:

Selected <- select_models(Fitted2)

The resulting table displays the selected models:

form max_vif AICc k N A1 Moisture Management Model DeltaAICc AICWeight
Distance ~ Moisture 0 -30.36319 4 20 NA 0.4019903 NA NA 0.0000000 0.3988462
Distance ~ A1 0 -29.72347 2 20 0.1681666 NA NA NA 0.6397207 0.2896622
Distance ~ 1 0 -28.52467 1 20 NA NA NA 0 1.8385219 0.1590653
Distance ~ Management 0 -28.43941 4 20 NA NA 0.3416107 NA 1.9237896 0.1524263

Note that the models in the table satisfy the criteria for maximum VIF and delta AICc as specified in the select_models function.

Summary weighted by AICc

finally you can do a summarized r squared weighted by AICc using the akaike_adjusted_rsq function as seen bellow:

Summary <- akaike_adjusted_rsq(Selected)

which results in the following table:

Variable Full_Akaike_Adjusted_RSq Number_of_models
Moisture 0.1603323 1
Management 0.0520704 1
A1 0.0487115 1

Conclusion

The AICcPerm package provides an easy-to-use implementation of model selection based on AICc, which is a useful tool for selecting the best model from a set of candidate models. By calculating AICc values for each model, we can compare them and select the model with the lowest AICc value. This method takes into account both model complexity and goodness-of-fit, making it a valuable approach for selecting models that balance these two factors.

Furthermore, the AICcPermanova package offers additional tools for model selection, such as model averaging and model comparison based on likelihood ratio tests. These can be used to further refine model selection and improve the accuracy of model predictions.

Overall, the AICcPermanova package is a powerful tool for model selection in a variety of contexts, including ecology, biology, and other fields that use statistical modeling to analyze complex data. By selecting the best model from a set of candidate models, researchers can make more accurate predictions and draw more reliable conclusions from their data.