Enterprise Scale NLP with Hugging Face & SageMaker Workshop series


Workshop: Enterprise-Scale NLP with Hugging Face & Amazon SageMaker

Earlier this year we announced a strategic collaboration with Amazon to make it easier for companies to use Hugging Face Transformers in Amazon SageMaker, and ship cutting-edge Machine Learning features faster. We introduced new Hugging Face Deep Learning Containers (DLCs) to train and deploy Hugging Face Transformers in Amazon SageMaker.

In addition to the Hugging Face Inference DLCs, we created a Hugging Face Inference Toolkit for SageMaker. This Inference Toolkit leverages the pipelines from the transformers library to allow zero-code deployments of models, without requiring any code for pre- or post-processing.
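As a rough sketch of what such a zero-code deployment can look like with the SageMaker Python SDK: the toolkit reads the model and task from the `HF_MODEL_ID` and `HF_TASK` environment variables. The model id, task, framework versions, and instance type below are illustrative assumptions rather than values from the workshop, and `build_hub_env` and `deploy_from_hub` are hypothetical helper names.

```python
def build_hub_env(model_id: str, task: str) -> dict:
    """Environment variables the Inference Toolkit reads to load a Hub model."""
    return {"HF_MODEL_ID": model_id, "HF_TASK": task}


def deploy_from_hub(role: str, env: dict, instance_type: str = "ml.m5.xlarge"):
    """Create a real-time endpoint. Needs AWS credentials and the sagemaker SDK."""
    # Imported lazily so build_hub_env can be used without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        env=env,
        role=role,
        transformers_version="4.12",  # assumption: use a version your DLC supports
        pytorch_version="1.9",        # assumption
        py_version="py38",            # assumption
    )
    return model.deploy(initial_instance_count=1, instance_type=instance_type)


# Assumed example model and task; no inference script is written anywhere.
env = build_hub_env(
    "distilbert-base-uncased-finetuned-sst-2-english", "text-classification"
)
print(env)
```

Calling `deploy_from_hub(role, env)` with a valid IAM role would return a predictor whose `predict({"inputs": "..."})` method sends requests to the endpoint.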

In October and November, we held a workshop series on “Enterprise-Scale NLP with Hugging Face & Amazon SageMaker”. The series consisted of three parts and covered:

  • Getting Started with Amazon SageMaker: Training your first NLP Transformer model with Hugging Face and deploying it
  • Going Production: Deploying, Scaling & Monitoring Hugging Face Transformer models with Amazon SageMaker
  • MLOps: End-to-End Hugging Face Transformers with the Hub & SageMaker Pipelines

We recorded all of them so you are now able to do the whole workshop series on your own to enhance your Hugging Face Transformers skills with Amazon SageMaker or vice-versa.

Below you can find all the details of each workshop and how to get started.

🧑🏻‍💻 GitHub Repository: https://github.com/philschmid/huggingface-sagemaker-workshop-series

📺   YouTube Playlist: https://www.youtube.com/playlist?list=PLo2EIpI_JMQtPhGR5Eo2Ab0_Vb89XfhDJ

Note: The repository contains instructions on how to access a temporary AWS account, which was available only during the workshops. To do the workshop now, you need to use your own or your company's AWS account.

In addition to the workshop, we created dedicated documentation for Hugging Face and Amazon SageMaker, which includes all the necessary information. If the workshop is not enough for you, we also have a GitHub repository with 15 additional sample notebooks, which cover topics like distributed training or leveraging Spot Instances.

Workshop 1: Getting Started with Amazon SageMaker: Training your first NLP Transformer model with Hugging Face and deploying it

In Workshop 1 you will learn how to use Amazon SageMaker to train a Hugging Face Transformer model and deploy it afterwards.

  • Prepare and upload a test dataset to S3
  • Prepare a fine-tuning script to be used with Amazon SageMaker Training jobs
  • Launch a training job and store the trained model into S3
  • Deploy the model after successful training
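The steps above can be sketched roughly as follows with the SageMaker Python SDK's `HuggingFace` estimator. The script name, source directory, instance type, framework versions, hyperparameter values, and role ARN are assumptions for illustration; `build_estimator_kwargs` and `launch_training` are hypothetical helpers, and the workshop notebooks remain the reference.

```python
def build_estimator_kwargs(role: str) -> dict:
    """Collect the arguments for a Hugging Face training job (illustrative values)."""
    hyperparameters = {
        "epochs": 1,
        "train_batch_size": 32,
        "eval_batch_size": 64,
        "learning_rate": 3e-5,
        "model_id": "distilbert-base-uncased",
        "fp16": True,
    }
    return {
        "entry_point": "train.py",         # assumption: name of the fine-tuning script
        "source_dir": "./scripts",         # assumption: folder containing the script
        "instance_type": "ml.p3.2xlarge",  # assumption: a single-GPU instance
        "instance_count": 1,
        "role": role,
        "transformers_version": "4.12",    # assumption
        "pytorch_version": "1.9",          # assumption
        "py_version": "py38",              # assumption
        "hyperparameters": hyperparameters,
    }


def launch_training(estimator_kwargs: dict, train_s3_uri: str, test_s3_uri: str):
    """Start the training job. Needs AWS credentials and the sagemaker SDK."""
    from sagemaker.huggingface import HuggingFace  # lazy import

    estimator = HuggingFace(**estimator_kwargs)
    # Each key becomes an input channel that the training script can read from.
    estimator.fit({"train": train_s3_uri, "test": test_s3_uri})
    return estimator  # estimator.model_data points at the model archive in S3


# Placeholder role ARN, not a real account.
estimator_kwargs = build_estimator_kwargs(
    role="arn:aws:iam::111122223333:role/example-sagemaker-role"
)
print(estimator_kwargs["hyperparameters"])
```

After a successful job, `estimator.deploy(initial_instance_count=1, instance_type=...)` would create an endpoint from the trained model.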

🧑🏻‍💻 Code Assets: https://github.com/philschmid/huggingface-sagemaker-workshop-series/tree/main/workshop_1_getting_started_with_amazon_sagemaker

📺  YouTube: https://www.youtube.com/watch?v=pYqjCzoyWyo&list=PLo2EIpI_JMQtPhGR5Eo2Ab0_Vb89XfhDJ&index=6&t=5s&ab_channel=HuggingFace

Workshop 2: Going Production: Deploying, Scaling & Monitoring Hugging Face Transformer models with Amazon SageMaker

In Workshop 2 you will learn how to use Amazon SageMaker to deploy, scale & monitor your Hugging Face Transformer models for production workloads.

  • Run Batch Prediction on JSON files using a Batch Transform
  • Deploy a model from hf.co/models to Amazon SageMaker and run predictions
  • Configure autoscaling for the deployed model
  • Monitor the model to see avg. request time and set up alarms
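For the autoscaling step, one possible approach is a target-tracking policy registered through the Application Auto Scaling API. The sketch below only builds the two request payloads; the endpoint name, capacity limits, cooldowns, and target value are assumptions, and `build_autoscaling_requests` and `apply_autoscaling` are hypothetical helpers rather than code from the workshop.

```python
def build_autoscaling_requests(
    endpoint_name: str,
    variant_name: str = "AllTraffic",
    min_capacity: int = 1,
    max_capacity: int = 4,
    target_invocations_per_instance: float = 10.0,
):
    """Build payloads for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = {**common, "MinCapacity": min_capacity, "MaxCapacity": max_capacity}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-scaling",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,  # seconds, assumed value
            "ScaleOutCooldown": 60,  # seconds, assumed value
        },
    }
    return target, policy


def apply_autoscaling(target: dict, policy: dict) -> None:
    """Send both requests. Needs AWS credentials and boto3."""
    import boto3  # lazy import so the builder works without boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)


# "my-hf-endpoint" is a placeholder endpoint name.
target, policy = build_autoscaling_requests("my-hf-endpoint")
print(target["ResourceId"])
```

Keeping the payload construction separate from the API calls makes it possible to review or unit-test the scaling configuration before touching a live endpoint.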

🧑🏻‍💻 Code Assets: https://github.com/philschmid/huggingface-sagemaker-workshop-series/tree/main/workshop_2_going_production

📺  YouTube: https://www.youtube.com/watch?v=whwlIEITXoY&list=PLo2EIpI_JMQtPhGR5Eo2Ab0_Vb89XfhDJ&index=6&t=61s

Workshop 3: MLOps: End-to-End Hugging Face Transformers with the Hub & SageMaker Pipelines

In Workshop 3 you will learn how to build an end-to-end MLOps pipeline for Hugging Face Transformers, from training to production, using Amazon SageMaker.

We are going to create an automated SageMaker Pipeline which:

  • processes a dataset and uploads it to S3
  • fine-tunes a Hugging Face Transformer model with the processed dataset
  • evaluates the model against an evaluation set
  • deploys the model if it performed better than a certain threshold
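The final gate in that list boils down to comparing an evaluation metric against a threshold. In SageMaker Pipelines this would typically be expressed as a condition step; the plain-Python sketch below only illustrates the comparison itself. The metric name `eval_accuracy`, the report layout, the 0.8 threshold, and the use of "greater than or equal" are all assumptions.

```python
import json


def should_deploy(
    evaluation_report: dict,
    metric: str = "eval_accuracy",
    threshold: float = 0.8,
) -> bool:
    """Return True when the evaluated metric reaches the deployment threshold."""
    return float(evaluation_report[metric]) >= threshold


# Hypothetical evaluation output, as an evaluation step might write it to JSON.
report = json.loads('{"eval_accuracy": 0.91, "eval_loss": 0.27}')
print(should_deploy(report))                  # True: 0.91 >= 0.8
print(should_deploy(report, threshold=0.95))  # False: 0.91 < 0.95
```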

🧑🏻‍💻 Code Assets: https://github.com/philschmid/huggingface-sagemaker-workshop-series/tree/main/workshop_3_mlops

📺  YouTube: https://www.youtube.com/watch?v=XGyt8gGwbY0&list=PLo2EIpI_JMQtPhGR5Eo2Ab0_Vb89XfhDJ&index=7

Access Workshop AWS Account

For this workshop you’ll get access to a temporary AWS account that is pre-configured with Amazon SageMaker Notebook Instances. Follow the steps in this section to log in to your AWS account and download the workshop material.

1. To get started, navigate to https://dashboard.eventengine.run/login


Click on Accept Terms & Login

2. Click on Email One-Time OTP (Allow for up to 2 mins to receive the passcode)


3. Provide your email address


4. Enter your OTP code


5. Click on AWS Console


6. Click on Open AWS Console


7. In the AWS Console click on Amazon SageMaker


8. Click on Notebook and then on Notebook instances


9. Create a new Notebook instance


10. Configure the Notebook instance

  • Make sure to increase the volume size of the Notebook instance if you want to work with big models and datasets
  • Add your IAM role with permissions to run your SageMaker training and inference jobs
  • Add the workshop GitHub repository to the Notebook instance to preload the notebooks: https://github.com/philschmid/huggingface-sagemaker-workshop-series.git


11. Open the Lab, select the right kernel, and have fun!

Open the workshop you want to do (for example workshop_1_getting_started_with_amazon_sagemaker/) and select the pytorch kernel
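The notebook configuration described in step 10 (volume size, IAM role, and preloaded repository) could also be scripted instead of clicked through in the console. This is a minimal sketch assuming boto3's SageMaker `create_notebook_instance` call; the instance name, instance type, volume size, and role ARN are placeholders, and both helper functions are hypothetical.

```python
WORKSHOP_REPO = "https://github.com/philschmid/huggingface-sagemaker-workshop-series.git"


def build_notebook_instance_request(
    name: str,
    role_arn: str,
    volume_size_gb: int = 50,             # assumed size, larger than the default
    instance_type: str = "ml.t3.medium",  # assumed instance type
) -> dict:
    """Build the request payload for a notebook instance with the workshop repo."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
        "VolumeSizeInGB": volume_size_gb,
        "DefaultCodeRepository": WORKSHOP_REPO,
    }


def create_notebook_instance(request: dict) -> dict:
    """Send the request. Needs AWS credentials and boto3."""
    import boto3  # lazy import so the builder works without boto3

    return boto3.client("sagemaker").create_notebook_instance(**request)


# Placeholder name and role ARN, not real resources.
request = build_notebook_instance_request(
    name="hf-sagemaker-workshop",
    role_arn="arn:aws:iam::111122223333:role/example-sagemaker-role",
)
print(request["VolumeSizeInGB"])
```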


  • Error occurred when I modify real-time inference to batch transform

    Hello, thanks for your amazing work. I followed your code to build end-to-end Hugging Face Transformers with SageMaker Pipelines. This code uses real-time inference and creates an endpoint, but I want to run batch transform inference instead.

    My code looks like below.

        from sagemaker.huggingface.model import HuggingFaceModel
        from sagemaker.s3 import S3Uploader, s3_path_join
        from sagemaker.inputs import TransformInput
        from sagemaker.workflow.steps import TransformStep

        huggingfacemodel = HuggingFaceModel(
            model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,  # path to your trained sagemaker mod
            role=role,
            transformers_version=_transformers_version,
            pytorch_version=_pytorch_version,
            py_version=_py_version,
            sagemaker_session=sagemaker_session,
        )

        output_s3_path = s3_path_join("s3://", 'sagemaker-us-east-1-7348***', "**y-datalab-modela/output")
        transformer_instance_type = ParameterString(name="Fp16", default_value="False")
        transformer = huggingfacemodel.transformer(
            instance_count=1,
            instance_type="ml.m5.4xlarge",
            output_path=output_s3_path,
            strategy='SingleRecord',
        )

        batch_data = ParameterString(
            name="BatchData",
            default_value='s3://sagemaker-us-east-1-73483***/**y-datalab-modela/test/stary_data.jsonl',
        )
        transform_step = TransformStep(
            name="Batch",
            transformer=transformer,
            inputs=TransformInput(
                data=batch_data,
                content_type='application/json',
                split_type="Line",
            ),
        )


    But when I run the code, it returns the error: ‘AttributeError: 'Properties' object has no attribute 'decode'’.

    Could you please help with this issue, or provide some sample code showing how to use batch transform in a pipeline? Thank you!

    opened by isoyaoya 11
  • KeyError Length during training following workshop MLOps

    AlgorithmError: ExecuteUserScriptError:
    Command "/opt/conda/bin/python3.8 train.py --epochs 1 --eval_batch_size 64 --fp16 True --learning_rate 3e-5 --model_id distilbert-base-uncased --train_batch_size 32"
    Traceback (most recent call last):
      File "train.py", line 46, in <module>
        train_dataset = load_from_disk(args.training_dir)

    opened by MrRobotV8 9
  • AttributeError: 'ParameterString' object has no attribute 'startswith'

    I am trying to run this tutorial as it is and always run into AttributeError: 'ParameterString' object has no attribute 'startswith' when json.loads(pipeline.definition()) is executed.

    AttributeError                            Traceback (most recent call last)
    <ipython-input-32-d1b69f80bc6f> in <module>
          1 import json
    ----> 3 json.loads(pipeline.definition())
    /opt/conda/lib/python3.7/site-packages/sagemaker/workflow/pipeline.py in definition(self)
        319     def definition(self) -> str:
        320         """Converts a request structure to string representation for workflow service calls."""
    --> 321         request_dict = self.to_request()
        322         self._interpolate_step_collection_name_in_depends_on(request_dict["Steps"])
        323         request_dict["PipelineExperimentConfig"] = interpolate(
    /opt/conda/lib/python3.7/site-packages/sagemaker/workflow/pipeline.py in to_request(self)
        103             if self.pipeline_experiment_config is not None
        104             else None,
    --> 105             "Steps": list_to_request(self.steps),
        106         }
    /opt/conda/lib/python3.7/site-packages/sagemaker/workflow/utilities.py in list_to_request(entities)
         51     for entity in entities:
         52         if isinstance(entity, Entity):
    ---> 53             request_dicts.append(entity.to_request())
         54         elif isinstance(entity, StepCollection):
         55             request_dicts.extend(entity.request_dicts())
    /opt/conda/lib/python3.7/site-packages/sagemaker/workflow/steps.py in to_request(self)
        497     def to_request(self) -> RequestType:
        498         """Updates the request dictionary with cache configuration."""
    --> 499         request_dict = super().to_request()
        500         if self.cache_config:
        501             request_dict.update(self.cache_config.config)
    /opt/conda/lib/python3.7/site-packages/sagemaker/workflow/steps.py in to_request(self)
        349     def to_request(self) -> RequestType:
        350         """Gets the request structure for `ConfigurableRetryStep`."""
    --> 351         step_dict = super().to_request()
        352         if self.retry_policies:
        353             step_dict["RetryPolicies"] = self._resolve_retry_policy(self.retry_policies)
    /opt/conda/lib/python3.7/site-packages/sagemaker/workflow/steps.py in to_request(self)
        118             "Name": self.name,
        119             "Type": self.step_type.value,
    --> 120             "Arguments": self.arguments,
        121         }
        122         if self.depends_on:
    /opt/conda/lib/python3.7/site-packages/sagemaker/workflow/steps.py in arguments(self)
        478             self.estimator._prepare_for_training(self.job_name)
        479             train_args = _TrainingJob._get_train_args(
    --> 480                 self.estimator, self.inputs, experiment_config=dict()
        481             )
        482             request_dict = self.estimator.sagemaker_session._get_train_request(**train_args)
    /opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in _get_train_args(cls, estimator, inputs, experiment_config)
       2038         train_args = config.copy()
       2039         train_args["input_mode"] = estimator.input_mode
    -> 2040         train_args["job_name"] = estimator._current_job_name
       2041         train_args["hyperparameters"] = hyperparameters
       2042         train_args["tags"] = estimator.tags
    /opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in training_image_uri(self, region)
       3034         trains the model, calls this method to find the hyperparameters.
    -> 3036         Returns:
       3037             dict[str, str]: The hyperparameters.
       3038         """
    /opt/conda/lib/python3.7/site-packages/sagemaker/image_uris.py in get_training_image_uri(region, framework, framework_version, py_version, image_uri, distribution, compiler_config, tensorflow_version, pytorch_version, instance_type)
        498     if tensorflow_version is not None or pytorch_version is not None:
    --> 499         processor = _processor(instance_type, ["cpu", "gpu"])
        500         is_native_huggingface_gpu = processor == "gpu" and not compiler_config
        501         container_version = "cu110-ubuntu18.04" if is_native_huggingface_gpu else None
    /opt/conda/lib/python3.7/site-packages/sagemaker/image_uris.py in _processor(instance_type, available_processors, serverless_inference_config)
        366         )
    --> 368     if instance_type.startswith("local"):
        369         processor = "cpu" if instance_type == "local" else "gpu"
        370     elif instance_type.startswith("neuron"):
    AttributeError: 'ParameterString' object has no attribute 'startswith'

    Converting ParameterString to str like,

    pipeline = Pipeline(
        steps=[step_process, step_train, step_eval, step_cond],

    results in TypeError: Pipeline variables do not support __str__ operation. Please use .to_string() to convert it to string type in execution time or use .expr to translate it to Json for display purpose in Python SDK.

    Is there a way to solve this?

    opened by dilky-ascentic 2
  • Conflicting AWS and HF docs for AWS inf instances, confused about which one to follow

    opened by sidyakinian 2
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.
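A check of the kind described above can be sketched with the standard library alone. This is an illustrative reconstruction, not the patch from the pull request: `is_within_directory` and `safe_extract` are hypothetical names, and the sketch only guards against path traversal through member names (it does not inspect symlink or hardlink targets).

```python
import io
import os
import tarfile
import tempfile


def is_within_directory(directory: str, target: str) -> bool:
    """Return True if target resolves to a path inside directory."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar: tarfile.TarFile, path: str = ".") -> None:
    """Extract all members only if none of them would land outside path."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"Attempted path traversal in tar file: {member.name}")
    tar.extractall(path)


# Demo: an archive containing "../evil.txt" is rejected before anything is written.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    payload = b"x"
    info = tarfile.TarInfo(name="../evil.txt")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))
buffer.seek(0)

with tempfile.TemporaryDirectory() as workdir, tarfile.open(fileobj=buffer) as archive:
    try:
        safe_extract(archive, workdir)
        rejected = False
    except ValueError:
        rejected = True
print(rejected)
```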

    opened by TrellixVulnTeam 0
Philipp Schmid
Machine Learning Engineer & Tech Lead at Hugging Face👨🏻‍💻 🤗 Cloud enthusiast ☁️ AWS ML HERO 🦸🏻‍♂️ Nuremberg 🇩🇪