SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker.

Amazon Web Services

Last update: Jan 1, 2023

Overview

SageMaker Python SDK

With the SDK, you can train and deploy models using the popular deep learning frameworks Apache MXNet and TensorFlow. You can also train and deploy models with Amazon algorithms, which are scalable implementations of core machine learning algorithms that are optimized for SageMaker and GPU training. If you have your own algorithms built into SageMaker-compatible Docker containers, you can train and host models using these as well.

For detailed documentation, including the API reference, see Read the Docs.

Installing the SageMaker Python SDK

The SageMaker Python SDK is published to PyPI and can be installed with pip as follows:

pip install sagemaker

You can install from source by cloning this repository and running a pip install command in the root directory of the repository:

git clone https://github.com/aws/sagemaker-python-sdk.git
cd sagemaker-python-sdk
pip install .
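
After either install path, one way to confirm which version ended up in the environment is to query the packaging metadata. The following is a minimal sketch that uses only the standard library and assumes Python 3.8 or later for importlib.metadata; the helper name installed_version is illustrative and not part of the SDK.

```python
from importlib import metadata


def installed_version(distribution="sagemaker"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version if version else "sagemaker is not installed in this environment")
```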

Supported Operating Systems

SageMaker Python SDK supports Unix/Linux and Mac.

Supported Python Versions

SageMaker Python SDK is tested on:

Python 3.6
Python 3.7
Python 3.8
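
A script that depends on the SDK can check at startup whether the running interpreter is one of the tested versions. This is a small sketch under the assumption that the list above is the complete tested set; the names TESTED_VERSIONS and is_tested_python are made up for this example.

```python
import sys

# Versions listed as tested in this README.
TESTED_VERSIONS = {(3, 6), (3, 7), (3, 8)}


def is_tested_python(version_info=sys.version_info):
    """Return True if the (major, minor) pair is one of the tested versions."""
    return tuple(version_info[:2]) in TESTED_VERSIONS


if not is_tested_python():
    print("Python %d.%d is not in the tested list" % sys.version_info[:2])
```

An untested version is not necessarily unsupported, so the sketch only prints a notice instead of exiting.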

AWS Permissions

As a managed service, Amazon SageMaker performs operations on your behalf on the AWS hardware that is managed by Amazon SageMaker. Amazon SageMaker can perform only operations that the user permits. You can read more about which permissions are necessary in the AWS Documentation.

The SageMaker Python SDK should not require any additional permissions aside from what is required for using SageMaker. However, if you are using an IAM role with a path in it, you should grant permission for iam:GetRole.
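
For a role with a path, the extra permission can be expressed as a small IAM policy document. The sketch below only builds and prints that document; the role ARN and the helper name get_role_policy are hypothetical, and how the policy is attached is left to your usual IAM tooling.

```python
import json


def get_role_policy(role_arn):
    """Build an IAM policy document allowing iam:GetRole on one role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:GetRole",
                "Resource": role_arn,
            }
        ],
    }


# Hypothetical role ARN containing a path ("/service-role/").
example_arn = "arn:aws:iam::123456789012:role/service-role/MySageMakerRole"
print(json.dumps(get_role_policy(example_arn), indent=2))
```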

Licensing

Running tests

SageMaker Python SDK has unit tests and integration tests.

You can install the libraries needed to run the tests by running pip install --upgrade .[test] or, for Zsh users: pip install --upgrade .\[test\]

Unit tests

We run unit tests with tox, which is a program that lets you run unit tests for multiple Python versions, and also make sure the code fits our style guidelines. We run tox with all of our supported Python versions, so to run unit tests with the same configuration we do, you need to have interpreters for those Python versions installed.

To run the unit tests with tox, run:

tox tests/unit

Integration tests

To run the integration tests, the following prerequisites must be met:

AWS account credentials are available in the environment for the boto3 client to use.
The AWS account has an IAM role named SageMakerRole. It should have the AmazonSageMakerFullAccess policy attached as well as a policy with the necessary permissions to use Elastic Inference.
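
Both prerequisites can be checked up front before starting a long test run. The sketch below assumes boto3-style STS and IAM clients are passed in; the function name check_integ_prerequisites is hypothetical. It does not check the Elastic Inference policy, whose name is account-specific, and it reads only the first page of attached policies.

```python
REQUIRED_POLICY = "AmazonSageMakerFullAccess"


def check_integ_prerequisites(sts_client, iam_client, role_name="SageMakerRole"):
    """Return a list of human-readable problems; an empty list means OK.

    Both clients are passed in so the check can be exercised with stubs.
    """
    problems = []
    try:
        sts_client.get_caller_identity()
    except Exception as error:  # e.g. missing or expired credentials
        return ["credentials not usable: %s" % error]

    try:
        iam_client.get_role(RoleName=role_name)
    except Exception as error:  # e.g. the role does not exist
        return ["role %s not found: %s" % (role_name, error)]

    attached = iam_client.list_attached_role_policies(RoleName=role_name)
    names = {policy["PolicyName"] for policy in attached["AttachedPolicies"]}
    if REQUIRED_POLICY not in names:
        problems.append("%s is not attached to %s" % (REQUIRED_POLICY, role_name))
    return problems


# With boto3 installed and credentials configured, this could be called as:
#   import boto3
#   print(check_integ_prerequisites(boto3.client("sts"), boto3.client("iam")))
```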

We recommend selectively running just those integration tests you'd like to run. You can filter by individual test function names with:

tox -- -k 'test_i_care_about'

You can also run all of the integration tests in sequence with the following command, which may take a while:

tox -- tests/integ

You can also run them in parallel:

tox -- -n auto tests/integ

Git Hooks

To enable all git hooks in the .githooks directory, run these commands in the repository directory:

find .git/hooks -type l -exec rm {} \;
find .githooks -type f -exec ln -sf ../../{} .git/hooks/ \;

To enable an individual git hook, simply move it from the .githooks/ directory to the .git/hooks/ directory.
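
For environments where the find and ln commands are inconvenient, roughly the same linking can be done from Python. This is a sketch, not part of the repository: enable_git_hooks is a hypothetical helper, and unlike find it looks only at the top level of each directory.

```python
from pathlib import Path


def enable_git_hooks(repo_root):
    """Symlink every regular file in .githooks into .git/hooks.

    Existing symlinks in .git/hooks are removed first, as the first
    find command does. Returns the names of the hooks that were linked.
    """
    repo = Path(repo_root)
    hooks_dir = repo / ".git" / "hooks"
    source_dir = repo / ".githooks"
    hooks_dir.mkdir(parents=True, exist_ok=True)

    for entry in hooks_dir.iterdir():
        if entry.is_symlink():
            entry.unlink()

    linked = []
    for hook in sorted(source_dir.iterdir()):
        if hook.is_file() and not hook.is_symlink():
            target = hooks_dir / hook.name
            if target.exists():
                target.unlink()  # replace a plain file, like ln -sf
            # Relative link, resolved from .git/hooks back to .githooks.
            target.symlink_to(Path("..") / ".." / ".githooks" / hook.name)
            linked.append(hook.name)
    return linked


# Example call from the repository root:
#   enable_git_hooks(".")
```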

Building Sphinx docs

Set up a Python environment, and install the dependencies listed in doc/requirements.txt:

# conda
conda create -n sagemaker python=3.7
conda activate sagemaker
conda install sphinx=3.1.1 sphinx_rtd_theme=0.5.0

# pip
pip install -r doc/requirements.txt

Clone/fork the repo, and install your local version:

pip install --upgrade .

Then cd into the sagemaker-python-sdk/doc directory and run:

make html

You can edit the templates for any of the pages in the docs by editing the .rst files in the doc directory and then running make html again.

Preview the site with a Python web server:

cd _build/html
python -m http.server 8000

View the website by visiting http://localhost:8000

SageMaker SparkML Serving

With SageMaker SparkML Serving, you can now perform predictions against a SparkML model in SageMaker. To host a SparkML model in SageMaker, it should be serialized with the MLeap library.

For more information on MLeap, see https://github.com/combust/mleap .

Supported major version of Spark: 2.4 (MLeap version - 0.9.6)

Here is an example of how to create an instance of the SparkMLModel class and use its deploy() method to create an endpoint that can perform predictions against your trained SparkML model.

from sagemaker.sparkml.model import SparkMLModel

# `schema` is a JSON string describing the model's input and output columns.
sparkml_model = SparkMLModel(model_data='s3://path/to/model.tar.gz', env={'SAGEMAKER_SPARKML_SCHEMA': schema})
model_name = 'sparkml-model'
endpoint_name = 'sparkml-endpoint'
predictor = sparkml_model.deploy(initial_instance_count=1, instance_type='ml.c4.xlarge', endpoint_name=endpoint_name)
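
The schema value passed in SAGEMAKER_SPARKML_SCHEMA is a JSON string. The sketch below shows one way to build it, assuming the input/output layout described in the SageMaker SparkML Serving Container documentation; the column names and types are invented for illustration, and that documentation remains the authority on the exact structure.

```python
import json

# Hypothetical column names and types for a model with three input features.
schema = json.dumps(
    {
        "input": [
            {"name": "age", "type": "int"},
            {"name": "country", "type": "string"},
            {"name": "income", "type": "double"},
        ],
        "output": {"name": "prediction", "type": "double"},
    }
)

print(schema)
```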

Once the model is deployed, we can invoke the endpoint with a CSV payload like this:

payload = 'field_1,field_2,field_3,field_4,field_5'
predictor.predict(payload)
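
When the field values come from Python objects instead of a literal string, the CSV line can be produced with the standard csv module so that values containing commas are quoted correctly. This is a sketch; to_csv_payload is a hypothetical helper and the sample values are made up.

```python
import csv
import io


def to_csv_payload(values):
    """Serialize one record as a single CSV line without a trailing newline."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="").writerow(values)
    return buffer.getvalue()


payload = to_csv_payload([42, "US", 55000.0])
print(payload)
```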

For more information about the different content-type and Accept formats as well as the structure of the schema that SageMaker SparkML Serving recognizes, please see SageMaker SparkML Serving Container.

Comments

feature: Adding serial inference pipeline support to RegisterModel Step
Issue #, if available: #2291, #2014, #2485

Description of changes: This change enables multiple models to be packed together and registered as a single model package version, so that the models could be run as an inference pipeline.

Updated the inputs for RegisterModelStep to also take a model object or a PipelineModel for Serial Inference pipeline

Performs repack in a loop for containers in a serial inference pipeline, if required.

Ability to invoke RegisterModelStep without an Estimator.

The environment variables are auto-updated by the framework classes into the model package

Testing done: test_register_model_sip - Able to add more than 1 model during registration of model package.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

[X] I have read the CONTRIBUTING doc

[X] I used the commit message format described in CONTRIBUTING

[ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.

[ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

[X] I have added tests that prove my fix is effective or that my feature works (if appropriate)

[X] I have checked that my tests are not configured for a specific region or account (if appropriate)

[ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
opened by sreedes 198
feature: processors that support multiple Python files, requirements.txt, and dependencies.
Issue #, if available: #1248, #2117

Description of changes: Propose processing classes that have feature parity with estimators. These classes allow SDK users to run a Python job that consists of multiple Python scripts, requirements.txt, and additional dependencies.

Documentation provided as docstrings.

Testing done: On my own AWS account, ran processing jobs using the proposed classes (FrameworkProcessor and its subclasses). The testing scripts are located here, and usage is as outlined in #1248 (this comment).

Merge Checklist

General

[x] I have read the CONTRIBUTING doc

[x] I used the commit message format described in CONTRIBUTING

[ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.

[x] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

[x] I have added tests that prove my fix is effective or that my feature works (if appropriate)

[x] I have checked that my tests are not configured for a specific region or account (if appropriate)

[ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
opened by verdimrc 175

feature: Inferentia Neuron support for HuggingFace

Issue #, if available:

Description of changes:

Added the necessary changes to incorporate the inf-neuron/hf image into the huggingface framework

Testing done:

Locally tested the feature to extract image string with the following code, where the output matched with the released DLC image

import sagemaker, boto3
sagemaker.image_uris.retrieve(framework="huggingface",
    region=boto3.Session().region_name,
    version="4.12.3",py_version="py37",
    base_framework_version="pytorch1.9.1",
    inference_tool="neuron",
    image_scope="inference")
'763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference-neuron:1.9.1-transformers4.12.3-neuron-py37-sdk1.17.1-ubuntu18.04'
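
The returned string follows the usual ECR image URI layout (account, region, repository, tag), which makes it easy to compare against a released image piece by piece. The sketch below is only an illustration of that comparison; parse_ecr_image_uri is a hypothetical helper, not something the SDK provides.

```python
def parse_ecr_image_uri(uri):
    """Split '<account>.dkr.ecr.<region>.<domain>/<repository>:<tag>' into parts."""
    registry, _, remainder = uri.partition("/")
    repository, _, tag = remainder.partition(":")
    labels = registry.split(".")
    if len(labels) < 5 or labels[1:3] != ["dkr", "ecr"]:
        raise ValueError("not an ECR image URI: %r" % uri)
    return {
        "account": labels[0],
        "region": labels[3],
        "repository": repository,
        "tag": tag,
    }


uri = (
    "763104351884.dkr.ecr.us-west-2.amazonaws.com/"
    "huggingface-pytorch-inference-neuron:"
    "1.9.1-transformers4.12.3-neuron-py37-sdk1.17.1-ubuntu18.04"
)
print(parse_ecr_image_uri(uri))
```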

from sagemaker.huggingface import HuggingFaceModel
import sagemaker
import boto3
sess = sagemaker.Session()

role = sagemaker.get_execution_role()

huggingface_model = HuggingFaceModel(
    model_data='s3://sagemaker-us-west-2-267274314323/hf-sagemaker-inf/model.tar.gz',
    transformers_version='4.12',
    pytorch_version='1.9',
    py_version='py37',
    role=role,
    sagemaker_session=sess
)

huggingface_model._is_compiled_model = True
huggingface_model.prepare_container_def("ml.inf.xlarge")

{'Image': '763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference-neuron:1.9.1-transformers4.12.3-neuron-py37-sdk1.17.1-ubuntu18.04', 'Environment': {'SAGEMAKER_PROGRAM': '', 'SAGEMAKER_SUBMIT_DIRECTORY': '', 'SAGEMAKER_CONTAINER_LOG_LEVEL': '20', 'SAGEMAKER_REGION': 'us-west-2'}, 'ModelDataUrl': 's3://hf-sagemaker-inference/inferentia/model.tar.gz'}

import sagemaker, boto3
from sagemaker.huggingface import HuggingFaceModel
sess = sagemaker.Session()
role = sagemaker.get_execution_role()
huggingface_model = HuggingFaceModel(
    model_data='s3://sagemaker-us-west-2-267274314323/hf-sagemaker-inf/model.tar.gz',                
    transformers_version='4.12.3',
    pytorch_version='1.9.1',
    py_version='py37',
    role=role, 
    sagemaker_session=sess)
huggingface_model._is_compiled_model = True
predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="ml.inf1.xlarge")
huggingface_model.image_uri


'763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference-neuron:1.9.1-transformers4.12.3-neuron-py37-sdk1.17.1-ubuntu18.04'

Tested the changes in this PR locally, where it passed all the tests (attached the local output below)

JT:sagemaker-python-sdk jeniyat$ ./.githooks/pre-push
GLOB sdist-make: /Users/jeniyat/Desktop/HuggingFace/source_repo/sagemaker-python-sdk/setup.py
✔ OK black-check in 10.19 seconds
✔ OK twine in 12.616 seconds
✔ OK pylint in 22.618 seconds
✔ OK docstyle in 24.769 seconds
✔ OK flake8 in 41.238 seconds
__________________________________________________________________________________________________________________________________________________________ summary ___________________________________________________________________________________________________________________________________________________________
  flake8: commands succeeded
  pylint: commands succeeded
  docstyle: commands succeeded
  black-check: commands succeeded
  twine: commands succeeded
  congratulations :)
=================== flake8,pylint,docstyle,black-check,twine execution time ===================
44 seconds

GLOB sdist-make: /Users/jeniyat/Desktop/HuggingFace/source_repo/sagemaker-python-sdk/setup.py
✔ OK doc8 in 7.752 seconds
✔ OK sphinx in 4 minutes, 57.083 seconds
__________________________________________________________________________________________________________________________________________________________ summary ___________________________________________________________________________________________________________________________________________________________
  sphinx: commands succeeded
  doc8: commands succeeded
  congratulations :)
=================== sphinx,doc8 execution time ===================
4 minutes and 59 seconds

Merge Checklist

General

[x] I have read the CONTRIBUTING doc
[x] I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
[x] I used the commit message format described in CONTRIBUTING
[x] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
[x] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

[x] I have added tests that prove my fix is effective or that my feature works (if appropriate)
[x] I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
[x] I have checked that my tests are not configured for a specific region or account (if appropriate)
[x] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

HuggingFace

opened by jeniyat 136

feature: Add ModelStep for SageMaker Model Building Pipeline
Description of changes: feature: Add ModelStep for SageMaker Model Building Pipeline

Testing done: unit tests and integ tests

Merge Checklist

General

[X] I have read the CONTRIBUTING doc

[X] I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team

[X] I used the commit message format described in CONTRIBUTING

[X] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.

[ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

[X] I have added tests that prove my fix is effective or that my feature works (if appropriate)

[X] I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes

[X] I have checked that my tests are not configured for a specific region or account (if appropriate)

[X] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
opened by qidewenwhen 113
change: add type annotations for Lineage
Issue #, if available: https://sim.amazon.com/issues/AML-96242

Description of changes: Added Type Annotations for .py files under lineage directory

Testing done:

tox -e py39 -- tests/unit/sagemaker/lineage

collected 45 items

tests/unit/sagemaker/lineage/test_action.py ........... [ 24%]
tests/unit/sagemaker/lineage/test_artifact.py ............ [ 51%]
tests/unit/sagemaker/lineage/test_association.py ....... [ 66%]
tests/unit/sagemaker/lineage/test_context.py ........... [ 91%]
tests/unit/sagemaker/lineage/test_dataset_artifact.py . [ 93%]
tests/unit/sagemaker/lineage/test_endpoint_context.py . [ 95%]
tests/unit/sagemaker/lineage/test_model_artifact.py . [ 97%]
tests/unit/sagemaker/lineage/test_visualizer.py . [100%]

-> tox -e py39 -- tests/integ/sagemaker/lineage

collected 28 items

tests/integ/sagemaker/lineage/test_action.py ....... [ 25%]
tests/integ/sagemaker/lineage/test_artifact.py ........ [ 53%]
tests/integ/sagemaker/lineage/test_association.py sss [ 64%]
tests/integ/sagemaker/lineage/test_context.py ....... [ 89%]
tests/integ/sagemaker/lineage/test_dataset_artifact.py . [ 92%]
tests/integ/sagemaker/lineage/test_endpoint_context.py . [ 96%]
tests/integ/sagemaker/lineage/test_model_artifact.py . [100%]

Merge Checklist

General

[ ] I have read the CONTRIBUTING doc

[ ] I used the commit message format described in CONTRIBUTING

[ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.

[ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

[ ] I have added tests that prove my fix is effective or that my feature works (if appropriate)

[ ] I have checked that my tests are not configured for a specific region or account (if appropriate)

[ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
opened by stisac 106
change: add data wrangler image uri
Issue #, if available: N/A

Description of changes:

Adding SageMaker Data Wrangler image URL configs.

Testing done: tox

Merge Checklist

General

[x] I have read the CONTRIBUTING doc

[x] I used the commit message format described in CONTRIBUTING

[ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.

[ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

[x] I have added tests that prove my fix is effective or that my feature works (if appropriate)

[ ] I have checked that my tests are not configured for a specific region or account (if appropriate)

[ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
opened by chenliu0831 103
feature: add HuggingFace model and predictor
Issue #, if available:

Description of changes: Created model.py for HF. Updated image URIs and integ test.

Testing done: unit tests

Merge Checklist

General

[x] I have read the CONTRIBUTING doc

[x] I used the commit message format described in CONTRIBUTING

[ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.

[ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

[x] I have added tests that prove my fix is effective or that my feature works (if appropriate)

[x] I have checked that my tests are not configured for a specific region or account (if appropriate)

[ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
opened by ahsan-z-khan 102
fix: jumpstart amt tracking
Issue #, if available:

Description of changes: Hyperparameter Tuning jobs launched with JumpStart artifacts (scripts, model) will be tagged with the artifact uris and the base name will also be modified to "sagemaker-jumpstart".

Testing done:

Merge Checklist

General

[ ] I have read the CONTRIBUTING doc

[ ] I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team

[ ] I used the commit message format described in CONTRIBUTING

[ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.

[ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

[ ] I have added tests that prove my fix is effective or that my feature works (if appropriate)

[ ] I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes

[ ] I have checked that my tests are not configured for a specific region or account (if appropriate)

[ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
opened by evakravi 100
feature: Support for remote docker host
Issue #, if available:

Description of changes: This PR adds support for a remote Docker host when using the SageMaker Python SDK in local mode by replacing the hardcoded value "localhost" with sagemaker.utils.local.get_docker_host(). This allows the use of a remote Docker server.
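
To illustrate the idea of resolving the host instead of hardcoding it, here is a minimal sketch that is not the SDK's get_docker_host() implementation. It assumes the remote daemon is identified through the DOCKER_HOST environment variable in tcp:// form; docker_host_from_env is a hypothetical name.

```python
import os
from urllib.parse import urlparse


def docker_host_from_env(environ=None):
    """Return the hostname of a tcp:// DOCKER_HOST, or 'localhost' otherwise."""
    environ = os.environ if environ is None else environ
    parsed = urlparse(environ.get("DOCKER_HOST", ""))
    if parsed.scheme == "tcp" and parsed.hostname:
        return parsed.hostname
    return "localhost"


print(docker_host_from_env({"DOCKER_HOST": "tcp://10.0.0.5:2376"}))
```

Falling back to "localhost" for unset or non-TCP values (such as a Unix socket) keeps the previous behaviour for purely local setups.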

Testing done:

Merge Checklist

General

[x] I have read the CONTRIBUTING doc

[x] I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team

[x] I used the commit message format described in CONTRIBUTING

[x] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.

[x] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

[x] I have added tests that prove my fix is effective or that my feature works (if appropriate)

[x] I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes

[x] I have checked that my tests are not configured for a specific region or account (if appropriate)

[x] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
opened by awssamdwar 100
Lzao/debugger rule removal
Issue #, if available: This is a draft.

Description of changes:

This change adds a new field DisableProfiler in ProfilerConfig. It also disables the default profiler rules. The changes can be merged after the service package changes are deployed.

Testing done: All the relevant unit/integration tests are updated accordingly to reflect the changes.

Merge Checklist

General

[x] I have read the CONTRIBUTING doc

[ ] I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team

[x] I used the commit message format described in CONTRIBUTING

[ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.

[ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

[x] I have added tests that prove my fix is effective or that my feature works (if appropriate)

[ ] I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes

[ ] I have checked that my tests are not configured for a specific region or account (if appropriate)

[ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
do-not-merge
opened by zaoliu-aws 99
fix: update `sagemaker.serverless` integration test
Issue #, if available:

Description of changes: The condition "CODEBUILD_BUILD_ID" not in os.environ was evaluating to True during builds, so the test was getting skipped.

Made two fixes:

Stored cat image in S3 (it was previously hosted at some third-party website)

Changed the condition for skipping the test

Note that this branch is based on the branch bveeramani:rename-delete-endpoint. See #2529.

Testing done: Integration test.

Merge Checklist

General

[x] I have read the CONTRIBUTING doc

[x] I used the commit message format described in CONTRIBUTING

[ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.

[ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

[x] I have added tests that prove my fix is effective or that my feature works (if appropriate)

[ ] I have checked that my tests are not configured for a specific region or account (if appropriate)

[x] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
bug serverless
opened by bveeramani 92
ParameterString should be able to be an empty string

Describe the feature you'd like: sagemaker.workflow.parameters.ParameterString should accept an empty string as its default value and as an input. In SageMaker local mode we can pass an empty string as a parameter to the script processor, so it is unclear why this works in local mode but raises an error in the cloud.

Describe alternatives you've considered: We pass the string "(empty)" to represent empty values, which seems redundant.
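
The workaround amounts to a pair of conversions, one before the value is handed to the pipeline and one inside the processing script. A minimal sketch, with hypothetical helper names:

```python
EMPTY_SENTINEL = "(empty)"


def encode_parameter(value):
    """Replace an empty string with the sentinel before passing it on."""
    return EMPTY_SENTINEL if value == "" else value


def decode_parameter(value):
    """Turn the sentinel back into an empty string inside the script."""
    return "" if value == EMPTY_SENTINEL else value
```

The obvious cost is that the literal string "(empty)" can no longer be passed as a real value, which is part of why native support for empty strings would be preferable.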

opened by idanmoradarthas 0
Lambda boto3 wrapper doesn't have all the options that we need to create a lambda
Describe the feature you'd like: We need sagemaker.lambda_helper.Lambda to support more of the options offered by the aws lambda create-function CLI command. The missing options that we need are:

environment

vpc-config

architectures
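
For reference, these three options correspond to the Environment, VpcConfig, and Architectures parameters of boto3's Lambda create_function call. The sketch below only assembles the keyword arguments; build_create_function_kwargs is a hypothetical helper, and nothing here reflects how sagemaker.lambda_helper.Lambda is implemented.

```python
def build_create_function_kwargs(
    function_name,
    role_arn,
    zip_bytes,
    handler,
    runtime,
    environment=None,
    subnet_ids=None,
    security_group_ids=None,
    architectures=None,
):
    """Assemble keyword arguments for boto3's Lambda create_function call."""
    kwargs = {
        "FunctionName": function_name,
        "Role": role_arn,
        "Code": {"ZipFile": zip_bytes},
        "Handler": handler,
        "Runtime": runtime,
    }
    if environment:
        kwargs["Environment"] = {"Variables": dict(environment)}
    if subnet_ids or security_group_ids:
        kwargs["VpcConfig"] = {
            "SubnetIds": list(subnet_ids or []),
            "SecurityGroupIds": list(security_group_ids or []),
        }
    if architectures:
        kwargs["Architectures"] = list(architectures)
    return kwargs


# With boto3 installed and credentials configured, the result could be used as:
#   import boto3
#   boto3.client("lambda").create_function(**kwargs)
```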

How would this feature be used? Please describe. The class sagemaker.lambda_helper.Lambda will have the missing attributes.

Describe alternatives you've considered: What we do today is create the Lambda as part of the automation that upserts our pipeline object in CodePipeline, and the pipeline has a step for that same Lambda so that it is executed as part of the pipeline.

Additional context https://docs.aws.amazon.com/cli/latest/reference/lambda/create-function.html https://sagemaker.readthedocs.io/en/stable/api/utility/lambda_helper.html
opened by idanmoradarthas 0
I have lost all my jobs / runs / trials since last week

Hello,

I am not sure if this is related to a recent update, but as of last week there was an update in the UI and now I can't find the tab with "Experiments / trial names / jobs". There's only an "Experiment" tab and it says "no Jobs" even though there were a lot of jobs before.

Also, when I try to read the docs there is nothing that looks like what I was using before. Here are images of the script I was using to launch my jobs.

Can anyone explain what is happening?

bug

opened by arminvburren 0
ScriptProcessor does not check local_code config before uploading code to S3
Describe the bug: When a LocalSession or LocalPipelineSession is configured to use local code, as follows

session.config = {'local': {'local_code': True}}

the code passed to a pipeline ProcessingStep or directly to the run method of a processor (ScriptProcessor, FrameworkProcessor, ...) should not be uploaded to S3.

However, ScriptProcessor does not honor this. Its _include_code_in_inputs method (which is called unconditionally by the _normalize_args of the base class Processor, which in turn is called both when running directly and through a pipeline) unconditionally tries to upload the code to S3. https://github.com/aws/sagemaker-python-sdk/blob/554952eac259979dc714a1a9002653ced342b876/src/sagemaker/processing.py#L625

Compare this to the Model class, used for example in the TrainingStep. Its _upload_code method checks the session configuration and does not upload to S3 when local code is enabled. https://github.com/aws/sagemaker-python-sdk/blob/554952eac259979dc714a1a9002653ced342b876/src/sagemaker/model.py#L532
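
The kind of guard being asked for can be sketched in a few lines. This is an illustration under assumptions, not the SDK's code: get_config_value here is a local stand-in written for this example, and should_upload_code is a hypothetical name for the decision ScriptProcessor would need to make.

```python
def get_config_value(key_path, config):
    """Look up a dotted key such as 'local.local_code' in a nested dict."""
    current = config
    for key in key_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def should_upload_code(session_config):
    """Return False when the session is configured to use local code."""
    return not bool(get_config_value("local.local_code", session_config))


print(should_upload_code({"local": {"local_code": True}}))  # prints False
print(should_upload_code(None))  # prints True
```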

To reproduce: In the absence of any AWS credentials (which should not be needed when running completely locally), the following code will fail to upload the processing.py script to S3 (botocore.exceptions.NoCredentialsError). Note that, in addition to the following code, a processing.py file must exist in the working directory (but its contents don't matter).

Code

import boto3
import sagemaker
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import LocalPipelineSession
from sagemaker.processing import ProcessingInput, ProcessingOutput, ScriptProcessor
from sagemaker.workflow.steps import ProcessingStep

role = 'arn:aws:iam::123456789012:role/MyRole'

local_pipeline_session = LocalPipelineSession(boto_session=boto3.Session(region_name='eu-west-1'))
local_pipeline_session.config = {'local': {'local_code': True}}

script_processor = ScriptProcessor(
    image_uri='docker.io/library/python:3.8',
    command=['python'],
    instance_type='local',
    instance_count=1,
    sagemaker_session=local_pipeline_session,
    role=role,
)

processing_step = ProcessingStep(
    name='Processing Step',
    processor=script_processor,
    code='processing.py',
    inputs=[
        ProcessingInput(
            source='./input-data',
            destination='/opt/ml/processing/input',
        )
    ],
    outputs=[
        ProcessingOutput(
            source='/opt/ml/processing/output',
            destination='./output-data',
        )
    ],
)

pipeline = Pipeline(
    name='MyPipeline',
    steps=[processing_step],
    sagemaker_session=local_pipeline_session
)

pipeline.upsert(role_arn=role)

pipeline_run = pipeline.start()

System information A description of your system. Please provide:

SageMaker Python SDK version: 2.126.0

bug
opened by lodo1995 0
Fixed hashing problem for frameworkprocessors with identical source d…
Describe the bug: When a pipeline has more than one FrameworkProcessor with an identical source directory and identical dependencies but different entry points, some of these FrameworkProcessors use the wrong entry point. They all use the same entry point: that of the processing job (which uses a FrameworkProcessor) created last during pipeline creation.

To reproduce: Create a pipeline with multiple steps that each use a FrameworkProcessor. Make sure these processors use the same source directory and the same dependencies (at least those that feed the hash used for the S3 URI; they are specified in the get_code_hash method in src/workflow/utilities.py) but different entry points. All of the processing jobs that use FrameworkProcessors with these nearly identical arguments then generate the same S3 URI for uploading the source directory (sourcedir.tar.gz) and the run script (runproc). Because each upload overwrites the previous one, all but the last of these processing jobs use the wrong entry point. The source directory is overwritten as well, but since the copies are identical this causes no problem.

Expected behavior: Each processing job executes the entry point specified in its processor.

System information A description of your system. Please provide:

SageMaker Python SDK version: 2.125.0

Framework name (eg. PyTorch) or algorithm (eg. KMeans): TensorFlow

Framework version: 2.9

Python version: 3.9

CPU or GPU: CPU

Custom Docker image (Y/N): N

Additional context I have written a fix for my problem which involves including the code in the hash that generates the s3 uri to ensure that the uris are different. After testing my pipeline with the fix, the problem no longer occurred.

Description of changes: I added an if statement that checks whether code is defined; if it is, it is used to generate the hash, alongside the already existing inputs.

Testing done: Tested using a pipeline that could not execute entirely as a result of the described issue; after the fix it was able to execute.
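
The collision and the fix described here can be illustrated with a minimal, standalone sketch. It does not reproduce the SDK's get_code_hash: the helpers code_hash and s3_uri_for, the bucket name, and the choice to hash plain names instead of file contents are all assumptions made for illustration only.

```python
import hashlib
from typing import Optional, Sequence


def code_hash(source_dir: str, dependencies: Sequence[str], code: Optional[str] = None) -> str:
    """Hypothetical stand-in for a code hash; it hashes names, not file contents."""
    digest = hashlib.sha256()
    digest.update(source_dir.encode("utf-8"))
    for dep in sorted(dependencies):
        digest.update(b"\0" + dep.encode("utf-8"))
    # The idea of the fix: fold the entry point into the hash when it is defined.
    if code is not None:
        digest.update(b"\0code:" + code.encode("utf-8"))
    return digest.hexdigest()


def s3_uri_for(hash_value: str) -> str:
    """Build an illustrative upload location from the hash (bucket name is made up)."""
    return f"s3://example-bucket/pipeline/code/{hash_value}/sourcedir.tar.gz"


# Two processors: same source directory and dependencies, different entry points.
shared = dict(source_dir="src", dependencies=["utils.py"])

without_a = s3_uri_for(code_hash(**shared))
without_b = s3_uri_for(code_hash(**shared))
with_a = s3_uri_for(code_hash(**shared, code="train.py"))
with_b = s3_uri_for(code_hash(**shared, code="evaluate.py"))

print(without_a == without_b)  # True: both jobs would upload to the same location
print(with_a == with_b)        # False: the entry point now separates the locations
```

In this sketch the entry point is only mixed in when it is provided, so callers that pass no code keep the hash they had before.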

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

[ ] I have read the CONTRIBUTING doc

[ ] I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team

[ ] I used the commit message format described in CONTRIBUTING

[ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.

[ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

[ ] I have added tests that prove my fix is effective or that my feature works (if appropriate)

[ ] I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes

[ ] I have checked that my tests are not configured for a specific region or account (if appropriate)

[ ] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
opened by SeppeHannen 0

TransformStep transforms files it should not

Describe the bug When using the output of a processing step as input to a transformation step (step_process.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri), the transformation step transforms files from directories at the same level as the input directory (test_small_sample_with_info). It should not.

To reproduce

[...]
step_process_args = pyspark_processor.run(
    submit_app="source/preprocess.py",
    submit_py_files=["source/preprocess_utils.py",
                     "source/spark_utils.py"],
    outputs=[
        ProcessingOutput(
            output_name="train",
            source="/opt/ml/processing/output/train",
            destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "train"]),
        ),
        ProcessingOutput(
            output_name="validation",
            source="/opt/ml/processing/output/validation",
            destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "validation"]),
        ),
        ProcessingOutput(
            output_name="test",
            source="/opt/ml/processing/output/test",
            destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "test"]),
        ),
        ProcessingOutput(
            output_name="test_small_sample",
            source="/opt/ml/processing/output/test_small_sample",
            destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "test_small_sample"]),
        ),
        ProcessingOutput(
            output_name="train_with_info",
            source="/opt/ml/processing/output/train_with_info",
            destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "train_with_info"]),
        ),
        ProcessingOutput(
            output_name="validation_with_info",
            source="/opt/ml/processing/output/validation_with_info",
            destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "validation_with_info"]),
        ),
        ProcessingOutput(
            output_name="test_with_info",
            source="/opt/ml/processing/output/test_with_info",
            destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "test_with_info"]),
        ),
        ProcessingOutput(
            output_name="test_small_sample_with_info",
            source="/opt/ml/processing/output/test_small_sample_with_info",
            destination=Join(on="/", values=[s3_training_pipeline_preprocess_output_path, "test_small_sample_with_info"]),
        ),
    ],
    arguments=[
        "--aws_account",
        aws_account,
        "--aws_env",
        aws_env,
        "--project_name",
        project_name,
        "--mode",
        "training",
    ],
)

step_process = ProcessingStep(
    name="PySparkPreprocessing", step_args=step_process_args,

[...]

transformer = Transformer(
    model_name=model_step.properties.ModelName,
    instance_count=transformer_instance_count,
    instance_type=transformer_instance_type,
    strategy="MultiRecord",
    assemble_with="Line",
    output_path=s3_training_pipeline_transform_output_path,
    accept="text/csv",
    max_concurrent_transforms=max_concurrent_transforms,
    max_payload=max_payload,
    sagemaker_session=pipeline_session,
    base_transform_job_name=[MASKED],
)

step_transform = TransformStep(
    name=[MASKED],
    transformer=transformer,
    inputs=TransformInput(
        data=step_process.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
        content_type="text/csv",
        split_type="Line",
        input_filter="$[1:]",
    ),
    depends_on=[model_step],
    cache_config=cache_config,
)

Expected behavior Transform only the csv file in the test dir.

Screenshots or logs

test_small_sample_with_info should not be transformed.

System information A description of your system. Please provide:

SageMaker Python SDK version: 2.125.0
Framework name (eg. PyTorch) or algorithm (eg. KMeans):
Framework version:
Python version: 3.9.12
CPU or GPU: CPU
Custom Docker image (Y/N): Y

Additional context A workaround is to specify the URI of the csv file explicitly, like this:

inputs=TransformInput(
    data=Join(
        on="/",
        values=[step_process.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri, "data.csv"],
    ),
    [...]

But this is not the approach shown in this example: sagemaker-pipeline-model-monitor-clarify-steps
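
One possible explanation, which the report does not confirm, is that the input URI is treated as a key-name prefix, so sibling locations whose names merely start with the same string also match. The sketch below shows that effect with plain strings and no AWS calls; the keys and the helper keys_matching_prefix are hypothetical and only mirror the output names used in the report.

```python
from typing import List

# Hypothetical object keys, mirroring the output names used in the report.
KEYS = [
    "preprocess/test/data.csv",
    "preprocess/test_small_sample/data.csv",
    "preprocess/test_with_info/data.csv",
    "preprocess/test_small_sample_with_info/data.csv",
    "preprocess/train/data.csv",
]


def keys_matching_prefix(keys: List[str], prefix: str) -> List[str]:
    """Return the keys that start with the given prefix (plain string matching)."""
    return [key for key in keys if key.startswith(prefix)]


# A prefix without a trailing separator also matches sibling "directories".
broad = keys_matching_prefix(KEYS, "preprocess/test")
# Pointing at the file itself, as in the workaround, matches a single key.
narrow = keys_matching_prefix(KEYS, "preprocess/test/data.csv")

print(len(broad))  # 4
print(narrow)      # ['preprocess/test/data.csv']
```

If prefix matching is indeed the cause, appending the file name as in the workaround narrows the match to the intended object.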

bug

opened by HarryPommier 0

Releases(v2.126.0)

v2.126.0(Dec 22, 2022)
Features

AutoGluon 0.6.1 image_uris

Bug Fixes and Other Changes

Fix broken link in doc

Do not specify S3 path for disabled profiler

Documentation Changes

fix the incorrect property reference

v2.125.0(Dec 19, 2022)
Features

add RandomSeed to support reproducible HPO

Bug Fixes and Other Changes

Correct SageMaker Clarify API docstrings by changing JSONPath to JMESPath

v2.124.0(Dec 16, 2022)
Features

Doc update for TableFormatEnum

Add p4de to smddp supported instance types

Add disable_profiler field in config and propagate changes

Added doc update for dataset builder

Bug Fixes and Other Changes

Use Async Inference Config when available for endpoint update

Documentation Changes

smdistributed libraries release notes

v2.123.0(Dec 15, 2022)
Features

Add support for TF2.9.2 training images

Add SageMaker Experiment

v2.122.0(Dec 14, 2022)
Features

Feature Store dataset builder, delete_record, get_record, list_feature_group

Add OSU region to frameworks for DLC

Bug Fixes and Other Changes

the Hyperband support fix for the HPO

unpin packaging version

Remove content type image/jpg from analysis configuration schema

v2.121.2(Dec 12, 2022)
Bug Fixes and Other Changes

Update for Tensorflow Serving 2.11 inference DLCs

Revert "fix: type hint of PySparkProcessor init"

Skip Bad Transform Test

v2.121.1(Dec 9, 2022)
Bug Fixes and Other Changes

Pop out ModelPackageName from pipeline definition

Fix failing jumpstart cache unit tests

v2.121.0(Dec 8, 2022)
Features

Algorithms Region Expansion OSU/DXB

Bug Fixes and Other Changes

FrameworkProcessor S3 uploads

Add constraints file for apache-airflow

v2.120.0(Dec 7, 2022)
Features

Add Neo image uri config for Pytorch 1.12

Adding support for SageMaker Training Compiler in PyTorch estimator starting 1.12

Update registries with new region account number mappings.

Add DXB region to frameworks by DLC

Bug Fixes and Other Changes

support idempotency for framework and spark processors

v2.119.0(Dec 3, 2022)
Features

Add Code Owners file

Added transform with monitoring pipeline step in transformer

Update TF 2.9 and TF 2.10 inference DLCs

make estimator accept json file as modelparallel config

SageMaker Training Compiler does not support p4de instances

Add support for SparkML v3.3

Bug Fixes and Other Changes

Fix bug forcing uploaded tar to be named sourcedir

Update local_requirements.txt PyYAML version

refactoring : using with statement

Allow Py 3.7 for MMS Test Docker env

fix PySparkProcessor init params type

type hint of PySparkProcessor init

Return ARM XGB/SKLearn tags if image_scope is inference_graviton

Update scipy to 1.7.3 to support M1 development envs

Fixing type hints for Spark processor that has instance type/count params in reverse order

Add DeepAR ap-northeast-3 repository.

Fix AsyncInferenceConfig documentation typo

fix ml_inf to ml_inf1 in Neo multi-version support

Fix type annotations

add neo mvp region accounts

v2.118.0(Dec 1, 2022)
Features

Update boto3 version to 1.26.20

support table format option for create feature group.

Support Amazon SageMaker Model Cards

support monitoring alerts api

Support Amazon SageMaker AutoMLStep

Bug Fixes and Other Changes

integration test in anticipation of ProfilerConfig API changes

Add more integ test logic for AutoMLStep

update get_execution_role_arn to use role from DefaultSpaceSettings

bug on AutoMLInput to allow PipelineVariable

FinalMetricDataList is missing from the training job search resu…

add integration tests for Model Card

update AutoMLStep with cache improvement

Documentation Changes

automlstep doc update

v2.117.0(Nov 15, 2022)
Features

add support for PT1.12.1

v2.116.0(Oct 28, 2022)
Features

support customized timeout for model data download and inference container startup health check for Hosting Endpoints

Trainium Neuron support for PyTorch

Pipelines cache keys update

Caching Improvements for SM Pipeline Workflows

v2.115.0(Oct 27, 2022)
Features

Add support for TF 2.10 training

Disable profiler for Trainium instance type

support the Hyperband strategy with the StrategyConfig

support the GridSearch strategy for hyperparameter optimization

Bug Fixes and Other Changes

Update Graviton supported instance families

v2.114.0(Oct 26, 2022)
Features

Graviton support for XGB and SKLearn frameworks

Graviton support for PyTorch and Tensorflow frameworks

do not expand estimator role when it is pipeline parameter

added support for batch transform with model monitoring

Bug Fixes and Other Changes

regex in tuning integs

remove debugger environment var set up

adjacent slash in s3 key

Fix Repack step auto install behavior

Add retry for airflow ParsingError

Documentation Changes

doc fix

v2.113.0(Oct 21, 2022)
Features

support torch_distributed distribution for Trainium instances

Bug Fixes and Other Changes

bump apache-airflow from 2.4.0 to 2.4.1 in /requirements/extras

Documentation Changes

fix kwargs and descriptions of the smdmp checkpoint function

add the doc for the MonitorBatchTransformStep

v2.112.2(Oct 11, 2022)
Bug Fixes and Other Changes

Update Neo-TF2.x versions to TF2.9(.2)

Documentation Changes

fix typo in PR template

v2.112.1(Oct 10, 2022)
Bug Fixes and Other Changes

fix(local-mode): loosen docker requirement to allow 6.0.0

CreateModelPackage API error for Scikit-learn and XGBoost frameworks

v2.112.0(Oct 9, 2022)
Features

added monitor batch transform step (pipeline)

Bug Fixes and Other Changes

Add PipelineVariable annotation to framework estimators

v2.111.0(Oct 5, 2022)
Features

Edit test file for supporting TF 2.10 training

Bug Fixes and Other Changes

support kms key in processor pack local code

security issue by bumping apache-airflow from 2.3.4 to 2.4.0

instance count retrieval logic

Add regex for short-form sagemaker-xgboost tags

Upgrade attrs>=20.3.0,<23

Add PipelineVariable annotation to Amazon estimators

Documentation Changes

add context for pytorch

v2.110.0(Sep 27, 2022)
Features

Support KeepAlivePeriodInSeconds for Training APIs

added ANALYSIS_CONFIG_SCHEMA_V1_0 in clarify

add model monitor image accounts for ap-southeast-3

Bug Fixes and Other Changes

huggingface release test

Fixing the logic to return instanceCount for heterogeneousClusters

Disable type hints in doc signature and add PipelineVariable annotations in docstring

estimator hyperparameters in script mode

Documentation Changes

Added link to example notebook for Pipelines local mode

v2.109.0(Sep 9, 2022)
Features

add search filters

Bug Fixes and Other Changes

local pipeline step argument parsing bug

support fail_on_violation flag for check steps

fix links per app security scan

Add PipelineVariable annotation for all processor subclasses

Documentation Changes

the SageMaker model parallel library 1.11.0 release

v2.108.0(Sep 2, 2022)
Features

Adding support in HuggingFace estimator for Training Compiler enhanced PyTorch 1.11

Bug Fixes and Other Changes

add sagemaker clarify image account for cgk region

set PYTHONHASHSEED env variable to fixed value to fix intermittent failures in release pipeline

trcomp fixtures to override default fixtures for integ tests

Documentation Changes

add more info about volume_size

v2.107.0(Aug 29, 2022)
Features

support python 3.10, update airflow dependency

Bug Fixes and Other Changes

Add retry in session.py to check if training is finished

Documentation Changes

remove Other tab in Built-in algorithms section and mi…

v2.106.0(Aug 24, 2022)
Features

Implement Kendra Search in RTD website

Bug Fixes and Other Changes

Add primitive_or_expr() back to conditions

remove specifying env-vars when creating model from model package

Add CGK in config for Spark Image

v2.105.0(Aug 19, 2022)
Features

Added endpoint_name to clarify.ModelConfig

adding workgroup functionality to athena query

Bug Fixes and Other Changes

disable debugger/profiler in cgk region

using unique name for lineage test to unblock PR checks

Documentation Changes

update first-party algorithms and structural updates

v2.104.0(Aug 17, 2022)
Features

local mode executor implementation

Pipelines local mode setup

Add PT 1.12 support

added _AnalysisConfigGenerator for clarify

Bug Fixes and Other Changes

yaml safe_load sagemaker config

pipelines local mode minor bug fixes

add local mode integ tests

implement local JsonGet function

Add Pipeline annotation in model base class and tensorflow estimator

Allow users to customize trial component display names for pipeline launched jobs

Update localmode code to decode urllib response as UTF8

Documentation Changes

New content for Pipelines local mode

Correct documentation error

v2.103.0(Aug 5, 2022)
Features

AutoGluon 0.4.3 and 0.5.2 image_uris

Bug Fixes and Other Changes

Revert "change: add a check to prevent launching a modelparallel job on CPU only instances"

Add gpu capability to local

Link PyTorch 1.11 to 1.11.0

v2.102.0(Aug 4, 2022)
Features

add warnings for xgboost specific rules in debugger rules

Add PyTorch DDP distribution support

Add test for profiler enablement with debugger_hook false

Bug Fixes and Other Changes

Two letter language code must be supported

add a check to prevent launching a modelparallel job on CPU only instances

Allow StepCollection added in ConditionStep to be depended on

Add PipelineVariable annotation in framework models

skip managed spot training mxnet nb

Documentation Changes

smdistributed libraries currency updates

v2.101.1(Jul 28, 2022)
Bug Fixes and Other Changes

added more ml frameworks supported by SageMaker Workflows

test: Vspecinteg2

Add PipelineVariable annotation in amazon models
