Source code and Dataset creation for the paper "Neural Symbolic Regression That Scales"

Overview

NeuralSymbolicRegressionThatScales

PyTorch implementation and pretrained models for the paper "Neural Symbolic Regression That Scales", presented at ICML 2021. Our deep-learning-based approach is the first symbolic regression method that leverages large-scale pre-training. We procedurally generate an unbounded set of equations and simultaneously pre-train a Transformer to predict the symbolic equation from a corresponding set of input-output pairs.

For details, see Neural Symbolic Regression That Scales. [arXiv]

Installation

Please clone and install this repository via

git clone https://github.com/SymposiumOrganization/NeuralSymbolicRegressionThatScales.git
cd NeuralSymbolicRegressionThatScales/
pip3 install -e src/

This library requires Python > 3.7.
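
As a quick sanity check that the editable install succeeded, you can import the package (the package installed from src/ is named nesymres):

# Minimal check that the editable install from src/ is importable.
import nesymres
print("nesymres imported from:", nesymres.__file__)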

Pretrained models

We offer two models, "10M" and "100M". Both are trained with the parameter configuration shown in dataset_configuration.json (which contains details about how the datasets are created) and scripts/config.yaml (which contains details about how the models are trained). The "10M" model is trained on 10 million equations and the "100M" model on 100 million equations.

  • Link to 100M: [Link]
  • Link to 10M: [Link]

If you want to try the models out, look at jupyter/fit_func.ipynb. Before running the notebook, make sure to first create a folder named "weights" and to download the provided checkpoints there.
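
For orientation, the sketch below shows the rough shape of the workflow demonstrated in jupyter/fit_func.ipynb: build a set of input-output pairs for a target expression and ask the pretrained model to recover a symbolic equation. The class name Model, the checkpoint filename, the loading arguments, and the fitfunc entry point are assumptions here; the notebook contains the authoritative code.

# Rough sketch only; names marked ASSUMPTION may differ from the actual notebook.
import torch
from nesymres.architectures.model import Model  # module path as in src/nesymres/architectures/model.py

# Input-output pairs for a known target, e.g. y = x_1*sin(x_1) + x_2 with 3 variables.
X = torch.rand(500, 3) * 10 - 5
y = X[:, 0] * torch.sin(X[:, 0]) + X[:, 1]

# ASSUMPTION: checkpoint filename and constructor arguments; see the notebook for the real call.
model = Model.load_from_checkpoint("weights/100M.ckpt")
model.eval()

# ASSUMPTION: the fitting entry point exposed by the model; the notebook shows the exact API.
with torch.no_grad():
    prediction = model.fitfunc(X, y)
print(prediction)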

Dataset Generation

Before training, you need a dataset of equations. Here are the steps to follow.

Raw training dataset generation

The equation generator scripts are based on [SymbolicMathematics]. First, if you want to change the default values, configure the dataset_configuration.json file:

{
    "max_len": 20, #Maximum length of an equation
    "operators": "add:10,mul:10,sub:5,div:5,sqrt:4,pow2:4,pow3:2,pow4:1,pow5:1,ln:4,exp:4,sin:4,cos:4,tan:4,asin:2", #Operator unnormalized probability
    "max_ops": 5, #Maximum number of operations
    "rewrite_functions": "", #Not used, leave it empty
    "variables": ["x_1","x_2","x_3"], #Variable names, if you want to add more add follow the convention i.e. x_4, x_5,... and so on
    "eos_index": 1,
    "pad_index": 0
}
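
Note that the # annotations above are for this README only; the actual dataset_configuration.json must be plain JSON without comments. The operators string encodes unnormalized sampling weights; a small illustrative parser (not part of the repository) shows how to read them:

import json

# Load the generation settings (the real file contains no comments).
with open("dataset_configuration.json") as f:
    cfg = json.load(f)

# "add:10,mul:10,..." -> {"add": 10.0, "mul": 10.0, ...}
weights = {name: float(w) for name, w in
           (item.split(":") for item in cfg["operators"].split(","))}

# Normalize to obtain the sampling probabilities implied by the weights.
total = sum(weights.values())
print({op: round(w / total, 3) for op, w in weights.items()})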

There are two ways to generate this dataset:

  • If you are running on Linux, you can use the makefile from a terminal as follows:
export NUM=${NumberOfEquations} #Export the number of equations
make data/raw_datasets/${NUM} #Launch the makefile command

NumberOfEquations can be given with a K or M suffix. For instance, 100K equals 100,000 equations while 10M equals 10,000,000. For example, to create a 10M dataset simply run:

export NUM=10M #Export num variable
make data/raw_datasets/10M #Launch the makefile command
  • Run this script:
python3 scripts/data_creation/dataset_creation.py --number_of_equations NumberOfEquations --no-debug #Replace NumberOfEquations with the number of equations you want to generate

After this command you will have a folder named data/raw_datasets/NumberOfEquations containing .h5 files. By default, each of these .h5 files contains a maximum of 5e4 equations.
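
The K/M shorthand used for NumberOfEquations is only a naming convenience; the helper below (illustrative, not part of the repository) expands it into an integer equation count:

def parse_equation_count(spec: str) -> int:
    """Expand '100K' -> 100_000 and '10M' -> 10_000_000; plain integers pass through."""
    suffixes = {"K": 10**3, "M": 10**6}
    spec = spec.strip().upper()
    if spec and spec[-1] in suffixes:
        return int(float(spec[:-1]) * suffixes[spec[-1]])
    return int(spec)

assert parse_equation_count("100K") == 100_000
assert parse_equation_count("10M") == 10_000_000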

Raw test dataset generation

This step is optional. You can skip it if you want to use the test set from the paper (located in test_set/nc.csv). Use the same commands as before to generate a validation dataset. All equations in this dataset will be removed from the training dataset in the next stage, hence this validation dataset should be small. For our paper it consisted of 200 equations.

#Code for generating a 150 equation dataset 
python3 scripts/data_creation/dataset_creation.py --number_of_equations 150 --no-debug #This code creates a new folder data/raw_datasets/150

If you want, you can convert the newly created validation dataset into CSV format. To do so, run:

python3 scripts/csv_handling/dataload_format_to_csv.py raw_test_path=data/raw_datasets/150

This command will create two csv files named test_nc.csv (equations without constants) and test_wc.csv (equations with constants) in the test_set folder.
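
Once created, the CSVs can be inspected with pandas. The equation strings are expected in an "eq" column (as suggested by the filtering script's use of row["eq"] further down this page); verify against the actual header:

import pandas as pd

# Peek at the constant-free validation equations produced by dataload_format_to_csv.py.
df = pd.read_csv("test_set/test_nc.csv")
print(df.shape, df.columns.tolist())
if "eq" in df.columns:  # column name is an assumption; check the generated header
    print(df["eq"].head())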

Remove test and numerically problematic equations from the training dataset

The following steps will remove the validation equations from the training set and drop equations that always evaluate to nan, inf, etc. In the commands below, substitute:

  • path_to_data_folder=data/raw_datasets/100000 if you have created a 100K dataset
  • path_to_csv=test_set/test_nc.csv if you have created 150 equations for validation. If you want to use the test set from the paper, replace it with test_set/nc.csv
python3 scripts/data_creation/filter_from_already_existing.py --data_path path_to_data_folder --csv_path path_to_csv #You can leave csv_path empty if you do not want to create a validation set
python3 scripts/data_creation/apply_filtering.py --data_path path_to_data_folder 

You should now have a folder named data/datasets/100000. This will be the training folder.
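
Conceptually, this filtering stage evaluates every candidate equation on a numeric support and drops it if it coincides with a validation equation or never yields finite values. The snippet below is a simplified illustration of the numerical check using sympy and numpy, not the code in scripts/data_creation/:

import numpy as np
import sympy as sp

x_1, x_2, x_3 = sp.symbols("x_1 x_2 x_3")
support = np.random.uniform(-10, 10, size=(3, 400))  # sample points for each variable

def is_numerically_valid(eq_str: str) -> bool:
    """Return False for equations that are nan/inf everywhere on the sampled support."""
    f = sp.lambdify((x_1, x_2, x_3), sp.sympify(eq_str), modules="numpy")
    with np.errstate(all="ignore"):
        vals = np.asarray(f(*support), dtype=np.complex128)
    finite = np.isfinite(vals) & (np.abs(vals.imag) < 1e-12)
    return bool(np.any(finite))

print(is_numerically_valid("sqrt(x_1 - 20)"))  # False: always nan on (-10, 10)
print(is_numerically_valid("sin(x_1) + x_2"))  # True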

Training

Once you have created your training and validation datasets run

python3 scripts/train.py

You can configure config.yaml with the necessary options. Most importantly, make sure you have set train_path and val_path correctly. If you have followed the 100K example, these should be set as:

train_path:  data/datasets/100000
val_path: data/raw_datasets/150
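
Before launching a long run, it can be worth checking that both paths exist. A small sketch using OmegaConf (already a project dependency); the assumption that train_path and val_path are top-level keys may not match the actual config layout:

from pathlib import Path
from omegaconf import OmegaConf

cfg = OmegaConf.load("scripts/config.yaml")
for key in ("train_path", "val_path"):  # assumed to be top-level keys
    value = OmegaConf.select(cfg, key)
    status = "exists" if value and Path(str(value)).exists() else "MISSING"
    print(f"{key}: {value} -> {status}")
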
Comments
  • filter_from_already_existing.py Errors; 'asin' is not supported ?

    Great project, just trying to replicate your results and training, however running your instructions I get;

    python scripts/data_creation/filter_from_already_existing.py --data_path data/raw_datasets/100000 --csv_path test_set/test_nc.csv
    Loading metadata
    Creating image for validation set
    Traceback (most recent call last):
      File "/home/sam/code/discovery/NeuralSymbolicRegressionThatScales/scripts/data_creation/filter_from_already_existing.py", line 130, in <module>
        main()
      File "/home/sam/anaconda3/envs/nsrts/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
      File "/home/sam/anaconda3/envs/nsrts/lib/python3.9/site-packages/click/core.py", line 1053, in main
        rv = self.invoke(ctx)
      File "/home/sam/anaconda3/envs/nsrts/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/sam/anaconda3/envs/nsrts/lib/python3.9/site-packages/click/core.py", line 754, in invoke
        return __callback(*args, **kwargs)
      File "/home/sam/code/discovery/NeuralSymbolicRegressionThatScales/scripts/data_creation/filter_from_already_existing.py", line 112, in main
        target_image = evaluate_validation_set(validation,support)
      File "/home/sam/code/discovery/NeuralSymbolicRegressionThatScales/scripts/data_creation/filter_from_already_existing.py", line 28, in evaluate_validation_set
        curr = lambdify(variables,row["eq"])(*support).numpy().astype('float16')
      File "<lambdifygenerated-16>", line 2, in _lambdifygenerated
    NameError: name 'asin' is not defined
    

    OS: Ubuntu 20.04.4 LTS (Focal Fossa) Python: Python 3.9.7

    Packages - from a clean venv, installed the same ones with the repo, and had to update one or two to make the previous scripts work - see previous closed issue;

    absl-py==1.0.0
    aiohttp==3.8.1
    aiosignal==1.2.0
    altair==4.2.0
    antlr4-python3-runtime==4.8
    argon2-cffi==21.3.0
    argon2-cffi-bindings==21.2.0
    astor==0.8.1
    asttokens==2.0.5
    async-timeout==4.0.2
    attrs==21.4.0
    backcall==0.2.0
    base58==2.1.1
    beautifulsoup4==4.10.0
    bleach==4.1.0
    blinker==1.4
    bokeh==2.4.2
    brotlipy==0.7.0
    bs4==0.0.1
    cachetools==5.0.0
    certifi==2021.10.8
    cffi @ file:///opt/conda/conda-bld/cffi_1642701102775/work
    charset-normalizer==2.0.12
    click==8.0.4
    compress-pickle==2.1.0
    cryptography @ file:///tmp/build/80754af9/cryptography_1639414572950/work
    dataclass-dict-convert==1.6.3
    debugpy==1.5.1
    decorator==5.1.1
    defusedxml==0.7.1
    docker-pycreds==0.4.0
    entrypoints==0.4
    executing==0.8.3
    frozenlist==1.3.0
    fsspec==2022.2.0
    future==0.18.2
    gitdb==4.0.9
    GitPython==3.1.27
    google-auth==2.6.2
    google-auth-oauthlib==0.4.6
    grpcio==1.44.0
    h5py==3.6.0
    hydra-core==1.0.0
    hydralit==1.0.12
    hydralit-components==1.0.9
    idna @ file:///tmp/build/80754af9/idna_1637925883363/work
    importlib-metadata==4.11.3
    importlib-resources==5.4.0
    ipykernel==6.9.2
    ipython==8.1.1
    ipython-genutils==0.2.0
    ipywidgets==7.7.0
    jedi==0.18.1
    Jinja2==3.0.3
    jsons==1.6.1
    jsonschema==4.4.0
    jupyter-client==7.1.2
    jupyter-core==4.9.2
    jupyterlab-pygments==0.1.2
    jupyterlab-widgets==1.1.0
    lxml==4.8.0
    Markdown==3.3.6
    MarkupSafe==2.1.1
    matplotlib-inline==0.1.3
    mistune==0.8.4
    mkl-fft==1.3.1
    mkl-random @ file:///tmp/build/80754af9/mkl_random_1626186066731/work
    mkl-service==2.4.0
    mpmath==1.2.1
    multidict==6.0.2
    nbclient==0.5.13
    nbconvert==6.4.4
    nbformat==5.2.0
    nest-asyncio==1.5.4
    -e git+ssh://[email protected]/SymposiumOrganization/NeuralSymbolicRegressionThatScales.git@92d7c46c0417aeb76ecebcac982b8ccf1a3f8860#egg=nesymres&subdirectory=src
    notebook==6.4.10
    numexpr==2.8.1
    numpy==1.22.3
    oauthlib==3.2.0
    omegaconf==2.1.1
    ordered-set==4.1.0
    packaging==21.3
    pandas==1.4.1
    pandocfilters==1.5.0
    parso==0.8.3
    pathtools==0.1.2
    pexpect==4.8.0
    pickleshare==0.7.5
    Pillow==9.0.1
    prometheus-client==0.13.1
    promise==2.3
    prompt-toolkit==3.0.28
    protobuf==3.19.4
    psutil==5.9.0
    ptyprocess==0.7.0
    pure-eval==0.2.2
    pyarrow==7.0.0
    pyasn1==0.4.8
    pyasn1-modules==0.2.8
    pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
    pydeck==0.7.1
    pyDeprecate==0.3.1
    Pygments==2.11.2
    Pympler==1.0.1
    pyOpenSSL @ file:///opt/conda/conda-bld/pyopenssl_1643788558760/work
    pyparsing==3.0.7
    pyrsistent==0.18.1
    PySocks @ file:///tmp/build/80754af9/pysocks_1605305812635/work
    python-dateutil==2.8.2
    pytorch-lightning==1.5.10
    pytz==2021.3
    pytz-deprecation-shim==0.1.0.post0
    PyYAML==6.0
    pyzmq==22.3.0
    requests @ file:///opt/conda/conda-bld/requests_1641824580448/work
    requests-oauthlib==1.3.1
    rsa==4.8
    scipy==1.8.0
    semver==2.13.0
    Send2Trash==1.8.0
    sentry-sdk==1.5.8
    setproctitle==1.2.2
    shortuuid==1.0.8
    six @ file:///tmp/build/80754af9/six_1644875935023/work
    smmap==5.0.0
    soupsieve==2.3.1
    stack-data==0.2.0
    streamlit==1.7.0
    stringcase==1.2.0
    sympy==1.10
    tensorboard==2.8.0
    tensorboard-data-server==0.6.1
    tensorboard-plugin-wit==1.8.1
    termcolor==1.1.0
    terminado==0.13.3
    testpath==0.6.0
    toml==0.10.2
    toolz==0.11.2
    torch==1.11.0
    torchaudio==0.11.0
    torchmetrics==0.7.2
    torchvision==0.12.0
    tornado==6.1
    tqdm==4.63.0
    traitlets==5.1.1
    typing-extensions @ file:///tmp/build/80754af9/typing_extensions_1631814937681/work
    typish==1.9.3
    tzdata==2022.1
    tzlocal==4.1
    urllib3==1.26.9
    validators==0.18.2
    wandb==0.12.11
    watchdog==2.1.6
    wcwidth==0.2.5
    webencodings==0.5.1
    Werkzeug==2.0.3
    widgetsnbextension==3.6.0
    yarl==1.7.2
    yaspin==2.1.0
    zipp==3.7.0
    

    Any help to point to the right direction is greatly appreciated, thank you so much,

    Best, Sam

    opened by samholt 17
  • 'RuntimeError: CUDA error: device-side assert triggered' when dataset config is changed

    Hi, when I change the parameters: max_len -> 24, max_ops -> 10 and the number of variables -> 6 I get a runtime error for the embedding layer:

    [...]
    /tmp/pip-req-build-h953rg2q/aten/src/ATen/native/cuda/Indexing.cu:662: indexSelectLargeIndex: block: [200,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
    [...]
    File "code/NeuralSymbolicRegressionThatScales/src/nesymres/architectures/model.py", line 101, in forward
        pos = self.pos_embedding(
      File "miniconda3/envs/torchenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "miniconda3/envs/torchenv/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 145, in forward
        return F.embedding(
      File "miniconda3/envs/torchenv/lib/python3.9/site-packages/torch/nn/functional.py", line 1913, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: CUDA error: device-side assert triggered
    

    I tried changing the variable length_eq in config.yaml but then I get a tensor size mismatch error. Should have something to do with the network sampling equations that are larger than it expected, as sometimes I reach 5 or 6 epochs before encountering the error.

    opened by alessandrosimon 7
  • error when try to add more variables

    Thanks for making it public and really love your paper.

    At first I generated data and trained the model with the default settings, and everything went well. However, when I try to add two more variables to this model, some errors occur. What I did was add x_4 and x_5 in dataset_configuration.json, regenerate the training and validation data, and change the train_path as well as val_path in config.yaml. Could you please tell me if there is some operation I missed to solve the problem? It is really important because I want to reproduce it as one of the baselines. Thanks a lot. [screenshot of the error attached in the original issue, 2022-05-30 23:44:36]

    opened by chenyuxin1999 2
  • dataload_format_to_csv script errors

    Thanks for making the repo public! I'm trying to use the dataload_format_to_csv script within the scripts directory as in your README. I've followed the instructions exactly so far and when I am running python3 scripts/csv_handling/dataload_format_to_csv.py raw_test_path=data/raw_datasets/150

    I get the following error:

    scripts/csv_handling/dataload_format_to_csv.py:50: UserWarning: 
    config_path is not specified in @hydra.main().
    See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/changes_to_hydra_main_config_path for more information.
      @hydra.main(config_name="../config")
    Could not override 'raw_test_path'.
    To append to your config use +raw_test_path=data/raw_datasets/150
    Key 'raw_test_path' is not in struct
        full_key: raw_test_path
        object_type=dict
    
    Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
    

    I haven't changed anything within the repo, and am just running it as is. Is there a fix for this? Is it possible to adapt this script to not just the validation set but the training data as well?

    Thanks!

    opened by bowenyou 2
  • Unknown SymPy operator

    Hi, during training I sometimes get the warning: Unknown SymPy operator: zoo (ComplexInfinity in SymPy). Other unknown operators that came up are nan, oo, and asinh(c*sqrt(-x_1)). The relevant method seems to be Generator.sympy_to_prefix() in src.nesymres.dataset.generator. The warning doesn't seem to interrupt training, but maybe it's better to handle those cases as well.

    opened by alessandrosimon 2
  • dataload_format_to_csv.py error ?

    Running : python scripts/csv_handling/dataload_format_to_csv.py raw_test_path=data/raw_datasets/200

    Errors with

    (nsrts) sam@lm-adastra:~/code/discovery/NeuralSymbolicRegressionThatScales$ python scripts/csv_handling/dataload_format_to_csv.py raw_test_path=data/raw_datasets/200
    /home/sam/anaconda3/envs/nsrts/lib/python3.9/site-packages/hydra/core/plugins.py:202: UserWarning: Error importing 'hydra._internal.core_plugins.importlib_resources_config_source'.
    Plugin is incompatible with this Hydra version or buggy. Recommended to uninstall or upgrade plugin.
    ModuleNotFoundError: No module named 'importlib_resources'
      warnings.warn(
    Traceback (most recent call last):
      File "/home/sam/anaconda3/envs/nsrts/lib/python3.9/site-packages/hydra/_internal/utils.py", line 207, in run_and_report
        return func()
      File "/home/sam/anaconda3/envs/nsrts/lib/python3.9/site-packages/hydra/_internal/utils.py", line 329, in <lambda>
        lambda: Hydra.create_main_hydra2(
      File "/home/sam/anaconda3/envs/nsrts/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 73, in create_main_hydra2
        config_loader: ConfigLoader = ConfigLoaderImpl(
      File "/home/sam/anaconda3/envs/nsrts/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 78, in __init__
        self.repository: ConfigRepository = ConfigRepository(
      File "/home/sam/anaconda3/envs/nsrts/lib/python3.9/site-packages/hydra/_internal/config_repository.py", line 22, in __init__
        source_type = SourcesRegistry.instance().resolve(scheme)
      File "/home/sam/anaconda3/envs/nsrts/lib/python3.9/site-packages/hydra/_internal/sources_registry.py", line 29, in resolve
        raise ValueError(
    ValueError: No config source registered for schema pkg, supported types : [file, structured]

    Python: 3.9.7 hydra-core==1.0.0 hydralit==1.0.12 hydralit-components==1.0.9

    OS: Ubuntu 20.04.4 LTS (Focal Fossa) (https://releases.ubuntu.com/20.04/)

    Previously created the test 200 equation dataset in ./data/raw_datasets/200

    Could you help point me in the right direction to run your repo ? Thank you so much !

    Best, Sam

    opened by samholt 1
  • question about assertion in model.py

    Hi guys, sorry for asking so many questions lately XD

    Just wondering about this assertion in line 88 of model.py:

    assert not torch.isnan(enc_src).any()
    

    It seems when I am running the training script, this assertion fails consistently which halts training. Have you guys run into a similar issue before?

    opened by bowenyou 1
  • dataload_format_to_csv script

    When I run this script to try and convert my training sets to csv format, I intermittently run into the issue of out of memory (128GBs). Also, for some equations, the script appears to freeze (hours at a time before I have to cancel it). Is there something in the backend that is causing this? Could some generated equations "break" the script?

    I've split the data into 1000 equations at a time and still run into this issue. Any help would be appreciated!

    opened by bowenyou 1
  • add custom operation

    Hello again! It is amazing to experiment with this wonderful code. What can I do if I want to add a custom operation like relu or sign to the operator set? Any suggestion is appreciated :).

    opened by chenyuxin1999 0