Pipelines de datos, 2021.

Rodolfo Ferro

Last update: May 19, 2022

Related tags

Text Data & NLP luigi-pipeline-talk

Overview

Este repo ilustra un proceso sencillo de automatización de transformación y modelado de datos, a través de un pipeline utilizando Luigi.

Stack principal

Python 3.7+
Streamlit
Scikit-learn
Pandas
Luigi

Idea

El proceso completo es descrito en una app interactiva que encuentras en el script app.py. Checa los detalles de cómo levantar la app en la sección de cómo ejecutar los scripts.

Setup

Crea un entorno virtual (te recomiendo usar conda):
```
conda create --name data-pipes python=3.7
```
Activate the virtual environment:
```
conda activate data-pipes
```
Install requirements:
```
pip install -r requirements.txt
```

Ejecuta los scripts

App interactiva

Para ejecutar la app interactiva, simplemente ejecuta el comando de Streamlit con el entorno virtual activado:

(data-pipes) streamlit run app.py

Esto abrirá un servidor local en: http://localhost:8501.

Pipeline de datos

Si deseas ejecutar una tarea en específico ,supongamos la TareaX que se encuentra en el script tareas.py, entonces ejecuta el comando:

PYTHONPATH=. luigi --module tareas TareaX --local-scheduler

¡Puedes extender el código y agregar las tareas que tú desees!

You might also like...

This repository will contain the code for the CVPR 2021 paper "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields"

1.1k Dec 27, 2022

Findings of ACL 2021

Assessing Dialogue Systems with Distribution Distances [arXiv][code] We propose to measure the performance of a dialogue system by computing the distr

16 Feb 24, 2022

Code for ACL 2021 main conference paper "Conversations are not Flat: Modeling the Intrinsic Information Flow between Dialogue Utterances".

Conversations are not Flat: Modeling the Intrinsic Information Flow between Dialogue Utterances This repository contains the code and pre-trained mode

90 Dec 27, 2022

Pipelines de datos, 2021.

Related tags

Overview

Stack principal

Idea

Setup

Ejecuta los scripts

App interactiva

Pipeline de datos

You might also like...

This repository will contain the code for the CVPR 2021 paper "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields"

Findings of ACL 2021

Code for ACL 2021 main conference paper "Conversations are not Flat: Modeling the Intrinsic Information Flow between Dialogue Utterances".

Code for our paper "Mask-Align: Self-Supervised Neural Word Alignment" in ACL 2021

A PyTorch implementation of paper "Learning Shared Semantic Space for Speech-to-Text Translation", ACL (Findings) 2021

Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

Code for our ACL 2021 (Findings) Paper - Fingerprinting Fine-tuned Language Models in the wild .

Code for our paper "Transfer Learning for Sequence Generation: from Single-source to Multi-source" in ACL 2021.

Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources (NAACL-2021).

Owner

Rodolfo Ferro

:hot_pepper: R²SQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)

Official code for "Parser-Free Virtual Try-on via Distilling Appearance Flows", CVPR 2021

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

Negative sampling for solving the unlabeled entity problem in NER. ICLR-2021 paper: Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition.

Repository to hold code for the cap-bot varient that is being presented at the SIIC Defence Hackathon 2021.

Code for CVPR 2021 paper: Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

[WWW 2021 GLB] New Benchmarks for Learning on Non-Homophilous Graphs

[Preprint] Escaping the Big Data Paradigm with Compact Transformers, 2021

🏆 The 1st Place Submission to AICity Challenge 2021 Natural Language-Based Vehicle Retrieval Track (Alibaba-UTS submission)