Build an Amazon SageMaker Pipeline to Transform Raw Texts to A Knowledge Graph
This repository provides a pipeline to create a knowledge graph from raw texts. The pipeline chains together the following major steps:
- Data processing: transform labeled text data to the Subject-Predicate-Object (SPO) format
- Training: use an RNN-based algorithm to train an AI model to predict SPOs from given texts
- Create a Neptune database: if the training metric (F1-Score) passes the threshold, create a Neptune database
- Batch Transform: use the model trained in the Training step to run inference on the test data
- Bulk load: transform the inference results into the format recognized by the bulkload function of Neptune, and load the transformed data into the Neptune database.
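The last steps of the pipeline can be illustrated with a minimal Python sketch. It is not the repository's implementation: the function names `spo_f1` and `spo_to_gremlin_csv`, the `entity` vertex label, the `F1_THRESHOLD` value and the toy triples are all assumptions made for illustration. The sketch computes an F1 score over predicted SPO triples, checks it against a threshold, and writes the triples as vertex and edge CSV text using the column headers of Neptune's Gremlin bulk-load format (`~id`, `~label`, `~from`, `~to`).

```python
import csv
import io


def spo_f1(predicted, gold):
    """F1 score over sets of (subject, predicate, object) triples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def spo_to_gremlin_csv(triples):
    """Return (vertices_csv, edges_csv) text with Gremlin bulk-load headers."""
    # Every distinct subject or object becomes one vertex.
    entities = sorted({t[0] for t in triples} | {t[2] for t in triples})
    vertex_ids = {name: f"v{i}" for i, name in enumerate(entities)}

    vertices = io.StringIO()
    writer = csv.writer(vertices, lineterminator="\n")
    writer.writerow(["~id", "~label", "name:String"])
    for name in entities:
        # "entity" is a hypothetical vertex label.
        writer.writerow([vertex_ids[name], "entity", name])

    # Every triple becomes one edge labeled with its predicate.
    edges = io.StringIO()
    writer = csv.writer(edges, lineterminator="\n")
    writer.writerow(["~id", "~from", "~to", "~label"])
    for i, (subj, pred, obj) in enumerate(triples):
        writer.writerow([f"e{i}", vertex_ids[subj], vertex_ids[obj], pred])

    return vertices.getvalue(), edges.getvalue()


# Toy data for illustration only.
gold = [("Marie Curie", "born_in", "Warsaw"), ("Marie Curie", "field", "physics")]
predicted = [("Marie Curie", "born_in", "Warsaw"), ("Marie Curie", "field", "chemistry")]

F1_THRESHOLD = 0.5  # hypothetical threshold, not the repository's value
score = spo_f1(predicted, gold)
if score >= F1_THRESHOLD:
    vertices_csv, edges_csv = spo_to_gremlin_csv(predicted)
    print(vertices_csv)
    print(edges_csv)
```

In a real run, the two CSV texts would be written to S3 and passed to the Neptune loader; that upload step is omitted here.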
Prerequisites
- Create an AWS account or use an existing AWS account.
- Create a SageMaker Notebook instance. When you set up the notebook instance, pay attention to the following configurations:
  - IAM role: you should attach the policies AmazonSageMakerFullAccess, IAMFullAccess, AmazonS3FullAccess, AmazonSNSFullAccess and NeptuneFullAccess to the IAM role.
  - Network: in order to access the Neptune database created in the pipeline, a VPC is required to run the notebook.
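If you prefer to attach the policies with a script rather than in the console, something along these lines may work. It is a sketch, not part of this repository: the helper names `policy_arn` and `attach_policies` are hypothetical, and it assumes each policy is an AWS managed policy available under the standard `arn:aws:iam::aws:policy/` prefix. The IAM client is passed in so the helper does not create AWS sessions itself.

```python
# AWS managed policies listed in the prerequisites above.
MANAGED_POLICIES = [
    "AmazonSageMakerFullAccess",
    "IAMFullAccess",
    "AmazonS3FullAccess",
    "AmazonSNSFullAccess",
    "NeptuneFullAccess",
]


def policy_arn(name):
    """Build the ARN of an AWS managed policy (assumes the default path)."""
    return f"arn:aws:iam::aws:policy/{name}"


def attach_policies(iam_client, role_name):
    """Attach every policy in MANAGED_POLICIES to the given IAM role."""
    attached = []
    for name in MANAGED_POLICIES:
        arn = policy_arn(name)
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
        attached.append(arn)
    return attached


# Example usage (requires boto3 and credentials allowed to modify IAM roles;
# "my-notebook-role" is a placeholder role name):
#
#   import boto3
#   attach_policies(boto3.client("iam"), "my-notebook-role")
```

These are broad full-access policies, so you may want to narrow them once the pipeline runs end to end.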
Security
See CONTRIBUTING for more information.
License
This library is licensed under the MIT-0 License. See the LICENSE file.