Spatiotemporal resampling methods for mlr3

Last update: Nov 21, 2022

Related tags

Deep Learning r cross-validation spatial resampling r-package temporal resampling-methods mlr3

Overview

mlr3spatiotempcv

Package website: release | dev

Spatiotemporal resampling methods for mlr3.

This package extends the mlr3 package framework with spatiotemporal resampling and visualization methods.

If you prefer the tidymodels ecosystem, have a look at the {spatialsample} package for spatial sampling methods.

Installation

CRAN version

install.packages("mlr3spatiotempcv")

Development version

remotes::install_github("mlr-org/mlr3spatiotempcv")

# R Universe Repo
install.packages('mlr3spatiotempcv', repos = c(mlrorg = 'https://mlr-org.r-universe.dev'))

Get Started

See the "Get Started" vignette for a quick introduction.

For more detailed information, including a usage example, see the "Spatiotemporal Analysis" chapter in the mlr3book.

The article "Spatiotemporal Visualization" shows how 3D subplot grids can be created.

Citation

To cite the package in publications, use the output of citation("mlr3spatiotempcv").

Other spatiotemporal resampling packages

This list does not claim to be comprehensive.

| Name | Language | Resources |
| --- | --- | --- |
| blockCV | R | Paper, CRAN |
| CAST | R | Paper, CRAN |
| ENMeval | R | Paper, CRAN |
| spatialsample | R | CRAN |
| sperrorest | R | Paper, CRAN |
| Pyspatialml | Python | GitHub |
| spacv | Python | GitHub |

FAQ

Which resampling method should I use?

There is no single best resampling method. The choice depends on the characteristics of your dataset and on what your model is meant to predict. The resampling scheme should reflect the final purpose of the model; this concept is called "target-oriented" resampling. For example, if the model was trained on multiple forest plots and its purpose is to predict something on unknown forest stands, the resampling structure should reflect this.
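As a minimal sketch of what a spatially aware scheme looks like in practice, the following instantiates the coordinate-based clustering method "spcv_coords" on the built-in `ecuador` task and evaluates a learner with it. The learner (`classif.rpart`), the measure (`classif.ce`), the number of folds and the seed are illustrative assumptions, not recommendations.

```r
library(mlr3)
library(mlr3spatiotempcv)

set.seed(42)
task = tsk("ecuador")

# Coordinate-based clustering places nearby observations in the same fold.
rsmp_spatial = rsmp("spcv_coords", folds = 5)
rsmp_spatial$instantiate(task)

# Train and test set of a fold do not share any observation.
shared = intersect(rsmp_spatial$train_set(1), rsmp_spatial$test_set(1))
length(shared)

# Evaluate a simple learner with the spatial partitioning.
rr = resample(task, lrn("classif.rpart"), rsmp_spatial)
rr$aggregate(msr("classif.ce"))
```

Swapping `"spcv_coords"` for another method key changes only the partitioning; the rest of the workflow stays the same.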

Are there more resampling methods than the one {mlr3spatiotempcv} offers?

{mlr3spatiotempcv} aims to offer all resampling methods that exist in R. This does not mean that it covers every resampling method, though. If you are missing one, feel free to open an issue.

How can I use the "blocking" concept of the old {mlr}?

This concept is now supported via the "column roles" concept available in {mlr3} [Task](https://mlr3.mlr-org.com/reference/Task.html) objects. See [this documentation](https://mlr3.mlr-org.com/reference/Resampling.html#grouping-blocking) for more information.
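A minimal sketch of the column-role approach, assuming a made-up data set in which the hypothetical column `plot_id` identifies the block each observation belongs to. Assigning the `group` role keeps all rows of a block together in either the train or the test set.

```r
library(mlr3)

set.seed(1)
# Hypothetical data: 6 plots with 10 observations each.
df = data.frame(
  y = factor(sample(c("a", "b"), 60, replace = TRUE)),
  x1 = rnorm(60),
  plot_id = rep(1:6, each = 10)
)

task = as_task_classif(df, target = "y", id = "blocked")
# Use plot_id for grouping only; it is removed from the feature set.
task$set_col_roles("plot_id", roles = "group")

cv = rsmp("cv", folds = 3)
cv$instantiate(task)

# No plot appears in both the train and the test set of a fold.
train_plots = unique(df$plot_id[cv$train_set(1)])
test_plots = unique(df$plot_id[cv$test_set(1)])
intersect(train_plots, test_plots)
```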

For the methods that offer buffering, how can an appropriate value be chosen?

There is no easy answer to this question. Buffering train and test sets reduces the similarity between the two. The degree of this reduction depends on the dataset itself, and there is no general approach for choosing an appropriate buffer size. Some studies used the distance at which the autocorrelation levels off. This buffer distance often removes quite a lot of observations and needs to be calculated first.
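One way to look for the distance at which autocorrelation levels off is an empirical semivariogram. The sketch below uses base R only and simulated data; the coordinates, the response `z` and the 10-unit distance bins are all assumptions for illustration. Dedicated geostatistics packages offer more complete tools for this.

```r
set.seed(42)
n = 200
# Simulated coordinates and a spatially structured response (hypothetical).
coords = cbind(x = runif(n, 0, 100), y = runif(n, 0, 100))
z = sin(coords[, "x"] / 15) + rnorm(n, sd = 0.3)

# Pairwise distances and semivariances; both vectors use the same pair order.
pair_dist = as.vector(dist(coords))
pair_semivar = as.vector(dist(z))^2 / 2

# Average semivariance per distance bin (pairs beyond 100 units are dropped).
bins = cut(pair_dist, breaks = seq(0, 100, by = 10))
semivariogram = tapply(pair_semivar, bins, mean)
print(round(semivariogram, 2))
```

The bin at which the averaged semivariance stops increasing gives a rough candidate for the buffer distance; judging that point remains a manual decision.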

Comments

Support resampling method based on predefined spatiotemporal groups

Just as CAST::CreateSpacetimeFolds() does.

I am not sure if this approach can work with all currently implemented spatial sampling methods. Even if not, we should support exactly this way of creating resamplings since some people already asked me exactly for this. @HannaMeyer Is there a dedicated name for your method? If not, do you want to make a proposal? :) You can have a look at the current names of the other methods in the README.

It seems that @jannes-m has added temporal extension support for spcv-coords already. Let's have a look how this works in detail.

opened by pat-s 15
Support presence-background option in "Spatial Buffer CV"

If the target has a binary outcome, a presence-background approach (see blockCV::buffering) would be possible. Target needs to be transformed to 0/1 before sampling.

opened by be-marc 11
CRS-related warning in autoplot seems incoherent

autoplot throws the following warning when some of the resampling methods are applied to the ecuador task (see application of methods "spcv_disc" and "spcv_block" in the manuscript):

CRS not set, transforming to WGS84 (EPSG: 4326).

This doesn't seem to make sense, as a transformation to WGS84 is not possible when the source CRS is unknown. As far as I can see, the ecuador dataset and task contain only UTM coordinates, not even lat/lon, so it is also not possible to just assume that WGS84 is present or to guess which UTM zone is applicable. Just a minor issue, but potentially confusing...
Priority: Low

opened by alexanderbrenning 9
Instantiate spcv_coords for AutoTuner
Dear mlr3 team,

first of all, thanks for your efforts in developing this extension package, it is very much appreciated.

I am trying to apply spatial CV using "spcv_coords" to an AutoTuner in order to retrieve nested resampling following the process described in the mlr3 book

RT.at_sp <- AutoTuner$new(
  learner = reg.tree,
  resampling = spatial_CV,
  measure = opt.mse,
  search_space = param_set_RT,
  terminator = trm.evals,
  tuner = tnr.GridSearch)

However, I end up with the error message:

"Error: Resampling 'spcv_coords' may not be instantiated".

The same error message remains, even if I try to instantiate the task manually beforehand using the command

spatial_CV$instantiate(sp_task)

as described in 2.5.2.

As I am not an expert, am I doing something wrong, or is spatial CV not yet implemented for use with AutoTuner?

Thank you very much! BR, Jürgen
Type: Question
opened by jue-d 9
Planar versus great circle distance

When using spatial object with unprojected CRS (i.e. lat/lon), does mlr3spatiotempcv use great circle distance (on the ellipsoid) or Euclidean distances based on lat/lon values? Is this handled consistently across resampling tools, e.g. buffering and clustering? This should be clarified in the paper, but perhaps it should be turned into a feature request...
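To illustrate why the distinction matters, here is a small base R sketch comparing Euclidean distance computed directly on lon/lat degrees with a great circle distance. The helper `haversine_km()` is hypothetical, assumes a spherical earth with a radius of 6371 km (not an ellipsoid), and is not part of {mlr3spatiotempcv}.

```r
# Great circle distance on a sphere (haversine formula), in kilometres.
haversine_km = function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad = pi / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(a))
}

# One degree of longitude at the equator and at 60 degrees north.
d_equator = haversine_km(0, 0, 1, 0)
d_north = haversine_km(0, 60, 1, 60)

# Euclidean distance on the raw degree values is 1 in both cases.
e_equator = sqrt((1 - 0)^2 + (0 - 0)^2)
e_north = sqrt((1 - 0)^2 + (60 - 60)^2)

c(d_equator = d_equator, d_north = d_north)
```

The great circle distance at 60 degrees north is roughly half of that at the equator, while the degree-based Euclidean distance is identical, so buffers or clusters defined in degrees would cover different ground distances depending on latitude.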
Priority: Low Type: Question

opened by alexanderbrenning 8
Visualization
#6

plot.ResamplingSpCVBlock, plot.ResamplingSpCVEnv, plot.ResamplingSpCVKmeans are the same. We could create a super class to just have one plotting function. plot.ResamplingSpCVBuffer is different because it is a leave-one-out cross-validation, which cannot be visualized in the same way.

[x] Single fold plot

[x] Single train-test plot

[x] Multi train-test plot

[x] Unify redundant code

[x] Update documentation

[x] add tests

Examples are in inst/mlr3spatiotemporal_test.R at the end of the file.
opened by be-marc 8

Checkerboard pattern with spcv_block?

Dear mlr3spatiotempcv team,

First, many thanks for your hard work on this excellent resource.

I am having issues producing a checkerboard sampling pattern using spcv_block. Instead of getting a checkerboard spatial partitioning, I always get something that looks more like a random sampling pattern. I have been successful creating a checkerboard pattern using the blockCV functions directly.

Here is a reproducible example that fails to produce a checkerboard sampling pattern:

library(blockCV)
library(mlr3)
library(mlr3spatiotempcv)

x <- runif(5000, -80.5, -75)
y <- runif(5000, 39.7, 42)

data <- data.frame(spp="test", 
                   label=factor(round(runif(length(x), 0, 1))),
                   x=x,
                   y=y)

testTask <- TaskClassifST$new(id = "test", 
                              backend = data, 
                              target = "label",
                              positive="1",
                              extra_args = list(coordinate_names=c("x", "y"),
                                                crs="EPSG: 4326"))

blockSamp <- rsmp("spcv_block",
                  folds=2,
                  range=50000,
                  selection="checkerboard")
blockSamp$instantiate(testTask)
autoplot(blockSamp, testTask)


Priority: High Status: In Progress Type: Bug

opened by fitzLab-AL 7

Code Review
Be reasonable with dependencies. E.g., we do not need stringr for str_detect, just use grepl() instead.

Some examples are in DONTRUN. Can we put them in if (requireNamespace(...)) blocks instead?

ResamplingSpCVBuffer looks similar to LOO. If this is the case, the instance should be stored more efficiently.

autoplot tests are not working for me or are way too slow for unit tests (I'm stuck in there with 100% CPU)

blockCV::spatialBlock() seems to call print(). It is convention (maybe even a CRAN policy?) to use message(), which can be suppressed. This should be reported upstream.

blockCV is terribly slow (and has a stupid long dependency chain). Is there an alternative?

Suggested packages should be explicitly attached with require_namespaces.
opened by mllg 7
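Two of the review points above can be sketched in base R. The function `noisy_message()` is a hypothetical stand-in for a function that reports progress; the package name in `requireNamespace()` is only an example.

```r
# Run package-dependent example code only if the package is installed,
# instead of wrapping the example in \dontrun{}.
if (requireNamespace("blockCV", quietly = TRUE)) {
  message("blockCV is available, the example would run here")
}

# Output emitted via message() can be silenced by the caller;
# output emitted via print() cannot be silenced this way.
noisy_message = function() {
  message("progress information")
  invisible(42)
}
result = suppressMessages(noisy_message())
```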

`.$folds()` of all Repeated* classes returns wrong fold number

library(mlr3spatiotempcv)
library(mlr3)
rsp <- rsmp("repeated_spcv_coords", folds = 3, repeats = 5)
rsp$instantiate(tsk("ecuador"))

# should return 3
rsp$folds(6)
#> [1] 1

Created on 2021-04-17 by the reprex package (v2.0.0)

This is because in https://github.com/mlr-org/mlr3spatiotempcv/blob/b9ded4ac098655dc00c300b48426bd6d4cd0a97a/R/ResamplingRepeatedSpCVCoords.R#L54 %% is used whereas it should be %/%.

But there is more to it - I think the method should look like

    folds = function(iters) {
      iters = assert_integerish(iters, any.missing = FALSE, coerce = TRUE)
      n_folds = ((self$iters - 1L) %/% as.integer(self$param_set$values$repeats)) + 1L

      if (all(iters <= n_folds)) {
        return(iters)
      } else {
        # modify all entries which are > n_folds
        iters[which(iters > n_folds)] = iters[which(iters > n_folds)] - n_folds
        return(iters)
      }
    }
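
For comparison, a modulo-based mapping from iteration number to fold and repeat can be sketched as follows. This assumes iterations are ordered repeat by repeat (all folds of repeat 1 first, then repeat 2, and so on); the helper names `iter_to_fold()` and `iter_to_repeat()` are hypothetical and not the package's actual method.

```r
# Fold number within a repeat for each iteration.
iter_to_fold = function(iters, folds) {
  ((as.integer(iters) - 1L) %% as.integer(folds)) + 1L
}

# Repeat number for each iteration.
iter_to_repeat = function(iters, folds) {
  ((as.integer(iters) - 1L) %/% as.integer(folds)) + 1L
}

# With 3 folds, iteration 6 is fold 3 of repeat 2.
iter_to_fold(6, folds = 3)
iter_to_repeat(6, folds = 3)
```

Under this assumption the mapping depends only on the number of folds, so it stays correct for any iteration number, including those in the third and later repeats.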

Priority: High Status: Accepted Type: Bug

opened by pat-s 6

Expand Table listing all resampling methods
[x] add some use cases for each method

[x] list more implementations for each method (eventually also in other languages?)

Also consider moving methods that operate in the feature space into a distinct group, e.g.:

spatial

spatiotemporal

feature space
opened by pat-s 6
New `Task*ST` API, consolidate `autoplot()`
arguments crs, coordinate_names and coords_as_features are now passed directly in the constructor instead of list extra_args

added argument label

improved as_task_* converters

Column role coordinates was renamed to coordinate to cope with the singular naming of column roles

Task printer only returns the first 10 coordinate rows

Consolidated autoplot() code internally

Improved CLUTO test setup

fixes #116
opened by pat-s 5
Please remove dependencies on **rgdal**, **rgeos**, and/or **maptools**

This package depends on (depends, imports or suggests) raster and one or more of the retiring packages rgdal, rgeos or maptools (https://r-spatial.org/r/2022/04/12/evolution.html, https://r-spatial.org/r/2022/12/14/evolution2.html). Since raster 3.6.3, all use of external FOSS library functionality has been transferred to terra, making the retiring packages very likely redundant. It would help greatly if you could remove dependencies on the retiring packages as soon as possible.

opened by rsbivand 0
`as_task_*_st` and friends could allow setting column roles directly
We could support this via the ellipsis. Otherwise, setting the respective column roles could easily be forgotten. On the other hand, the behaviour would differ from as_task_*() in mlr3, as such custom conversions would not be supported there.

@be-marc what do you think?

Example:

data("cookfarm_sample", package = "mlr3spatiotempcv")
# data.frame
as_task_regr_st(cookfarm_sample,
  target = "PHIHOX", coords_as_features = FALSE,
  crs = 26911, coordinate_names = c("x", "y"),
  column_role_space = "foo", column_role_time = "time"
)
Priority: Low Status: Review Needed Type: Optimization
opened by pat-s 2
Longterm play of Task*ST and DataBackends
With the addition of spatial DataBackends (DataBackendVector and DataBackendRaster) from {mlr3spatial} multiple combinations of Tasks and Backends are possible:

Task + spatial backends

Task*ST + non-spatial backend

Task*ST + spatial backend

Moving forward and to simplify both usage and development, we should pick one combination as the "recommended" one and potentially issue warnings for others.

cc @be-marc
Priority: Medium Status: In Progress Type: Optimization
opened by pat-s 1
New SpCV method Zalazar et al.

Unfortunately the GH repo leads to a 404. Contacted the author, he wants to fix it.

https://www.sciencedirect.com/science/article/abs/pii/S0920410521015023

opened by pat-s 0
Temporal CV
I currently have a task with a column that is a date. As the task is basically to predict values in the future, a cross-validation strategy that can take this into account would be required, similar to RollingWindowCV. As this is a very common use case, we should perhaps think about implementing this.

This is implemented in mlr3forecasting, but for forecasting tasks instead of regular Classif|Regr Tasks.

Where should such a method live? mlr3spatiotempcv ?

How would we go about implementing this?

Priority: Medium Status: Accepted Type: Enhancement
opened by pfistfl 13

Releases(v2.0.3)

v2.0.3(Nov 19, 2022)
add label support for built-in tasks

adhere to CRAN "noSuggests" policy

wrap some long running examples in donttest{}

v2.0.2(Aug 9, 2022)
Add error message when trying to create a TaskClassifST or TaskRegrST from an sf object

Synchronize TaskClassifST or TaskRegrST with {mlr3spatial}

Add support for mlr_reflections changes in {mlr3} > 0.13.4

Adjust "Getting Started" vignette to recent API changes

autoplot.ResamplingSptCVCstf(): Add missing support for argument axis_label_fontsize for x and y axes

v2.0.1(Jun 23, 2022)
Bugfixes

autoplot.ResamplingSptCVCstf: when multiple folds are requested, the subplots are now returned again (before, the return was empty)

autoplot.ResamplingSptCVCstf: the legend item for the "omitted" observations now displays the correct color and label again

v2.0.0(Jun 15, 2022)
Breaking

Rename task cookfarm to cookfarm_mlr3. This was done to distinguish the cookfarm task implementation in {mlr3} better from the original cookfarm dataset. cookfarm_mlr3 also now comes with all rows of the upstream cookfarm task and not with a random subset as before.

Rewrite mlr_resamplings_sptcv_cstf implementation. The method will produce different fold results compared to {mlr3spatiotempcv} <= 1.0.1. This is because of a change/fix in the sampling behavior: before, an (unwanted) stratified sampling was done on time and space variables. While this matched the upstream implementation in {CAST}, it did not match the actual theoretical underpinning described in the literature.

Features

Add support for DataBackendRaster (@be-marc, #191).

mlr_resamplings_sptcv_cstf: a log message returns the column roles from the Task which are used for partitioning

The help pages for all methods now describe the methods manually rather than importing the upstream documentation of the respective method.

Task*ST classes now print column roles space and time (if set) (#198)

autoplot() gains plot_time_var argument for 3D visualizations of mlr_resamplings_sptcv_cstf resamplings with only 'space' used for partitioning (#197)

Vignette updates

Bugfixes

All {mlr3spatiotempcv} methods now comply with the {mlr3} man file declaration logic.

Misc

Escape all examples and tests for non-installed packages.

The cookfarm_mlr3 task now sets column roles "space" and "time" for variables SOURCEID and Date, respectively.

Harden CLUTO tests (#182)

Large update for the "spatiotemporal" section in the mlr3book

v1.0.1(Mar 3, 2022)
Fixed an issue which caused coordinates to appear in the feature set when a data.frame was supplied (#166, @be-marc)

Add autoplot() support for "groups" column role in rsmp("cv")

v1.0.0(Aug 19, 2021)
Breaking

autoplot(): removed argument crs. The CRS is now inferred from the supplied Task. Setting a different CRS than the task might lead to spurious issues and the initial idea of changing the CRS for plotting to have proper axes labeling does not apply (anymore) (#144)

Features

Added autoplot() support for ResamplingCustomCV (#140)

Bug fixes

"spcv_block": Assert error if folds > 2 when selection = "checkerboard" (#150)

Fixed row duplication when creating TaskRegrST tasks from sf objects (#152)

Miscellaneous

Upgrade tests to {vdiffr} 1.0.0

Add {rgdal} to suggests and required it in "spcv_block" since it is required in {blockCV} >= 2.1.4 and {sf} >= 1.0

v0.4.1(Jun 24, 2021)
Upgrade tests to {vdiffr} 1.0.0

Add {rgdal} to suggests and required it in "spcv_block" since it is required in {blockCV} >= 2.1.4 and {sf} >= 1.0

v0.4.0(Jun 3, 2021)
Features

Support clustering coords only for "sptcv_cluto"

Add as_task_* S3 generics: as_task_classif_st.data.frame(), as_task_classif_st.DataBackend(), as_task_classif_st.sf(), as_task_regr_st.data.frame(), as_task_regr_st.DataBackend(), as_task_regr_st.sf(), as_task_classif.TaskClassifST(), as_task_regr.TaskRegrST() (#99)

Add "spcv_tiles" and "repeated_spcv_tiles" (#121)

Add "spcv_disc" (#115)

Bug Fixes

Fixed train set issues for sptcv_cstf() with space and time var (#135)

Fixed $folds() active binding returning wrong fold number (#120)

Add missing man IDs (#122)

Misc

Add example 2D spatial plots to spatiotemp-viz vignette

Add {caret} to Suggests

"Cstf" methods: remove arguments in favor of param set to align with other methods (#122)

Inherit documentation from upstream functions (#117)

Vignette: Update and categorize table listing all implemented methods

v0.3.0(Apr 13, 2021)
New Features

autoplot.ResamplingSptCVCstf(): add 2D plotting method (#106)

autoplot.ResamplingSptCVCstf(): add arguments show_omitted and static_image (#100)

autoplot() (all methods): allow adjusting point size via ... (#98)

Maintenance

Remove {GSIF} package due to CRAN archival and host the cookfarm dataset standalone

Use Cstf method for spatiotemporal viz vignette

Fix help page content of ResamplingRepeatedSptCVCstf (beforehand the Cluto method was referenced accidentally)

Fix segfault in autoplot.ResamplingSpcvBlock example when rendering pkgdown site (unclear why this happens when show_labels = TRUE)

Update autoplot() examples and related documentation

Remove duplicate resources in Tasks "see also" fields

Skip a test on Solaris and macOS 3.6

Optimize "Spatiotemporal Visualization" vignette

v0.2.1(Mar 20, 2021)
Add support for rasterLayer argument in blockCV::spatialBlock() (#94)

Ensure that blockCV::spatialBlock() actually returns the same result when invoked via {mlr3spatiotempcv} (#93). Among other issues, blockCV::spatialBlock(selection = "checkerboard") was ignored.

Get coordinates names from {sf} objects dynamically. Before some functions would have errored if the coordinate names were not named "x" and "y".

v0.2.0(Mar 8, 2021)
Add support for {sf} objects for Task*ST creation (#90)

"Getting Started" vignette: add example how to create a spatial task

v0.1.1(Jan 6, 2021)
CRAN-related changes

Support ordered factors in TaskClassifST creation (#84)

v0.1.0(Dec 26, 2020)
Initial CRAN release

v0.0.0.9006(Oct 27, 2020)

