Pachyderm: The Data Foundation for Machine Learning
Pachyderm provides the data layer that allows machine learning teams to productionize and scale their machine learning lifecycle. With Pachyderm’s industry-leading data versioning, pipelines and lineage, teams gain data-driven automation, petabyte scalability and end-to-end reproducibility. Teams using Pachyderm get their ML projects to market faster, lower their data processing and storage costs, and can more easily meet regulatory compliance requirements.
Features
- Automated Data Versioning: Pachyderm’s Data Versioning gives teams an automated and performant way to keep track of all data changes.
- Data-Driven Pipelines: Pachyderm’s Containerized Pipelines speed data processing while lowering compute costs.
- Immutable Data Lineage: Pachyderm’s data lineage provides an immutable record for all activities and assets in the ML lifecycle.
- Console: The Pachyderm Console provides an intuitive visualization of your DAG (directed acyclic graph), and aids in reproducibility.
- Notebooks: Pachyderm Notebooks provide an easy way to interact with Pachyderm data versioning and pipelines via Jupyter notebooks.
Getting Started
To start deploying your end-to-end, version-controlled data pipelines, try us for free on Hub with little to no setup, or run Pachyderm locally. You can also deploy on AWS/GCE/Azure in about 5 minutes.
You can also refer to our complete documentation to see tutorials, check out example projects, and learn about advanced features of Pachyderm.
If you'd like to see some examples and learn about core use cases for Pachyderm:
Documentation
Community
Keep up to date and get Pachyderm support via:
- Follow us on Twitter.
- Join our community Slack Channel to get help from the Pachyderm team and other users.
Contributing
To get started, sign the Contributor License Agreement.
You should also check out our contributing guide.
Send us PRs; we would love to see what you do! You can also check our GitHub issues labeled "help-wanted", which are a good place to start. We're sometimes bad about keeping that label up to date, so if you don't see any, just let us know.
Join Us
WE'RE HIRING! Love Docker, Go and distributed systems? Learn more about our open positions.
Usage Metrics
Pachyderm automatically reports anonymized usage metrics. These metrics help us understand how people are using Pachyderm and make it better. They can be disabled by setting the environment variable METRICS to false in the pachd container.
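The opt-out above is a single environment-variable toggle. The sketch below illustrates that convention. It assumes reporting stays on when METRICS is unset and turns off only when the value is "false". The helper name `metrics_enabled` is hypothetical, and so is treating the value case-insensitively with surrounding whitespace ignored. This is not pachd's actual parsing logic.

```python
import os
from typing import Mapping


def metrics_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: report usage metrics unless METRICS is 'false'.

    Assumption: the comparison ignores case and surrounding whitespace;
    pachd's real parsing may be stricter.
    """
    value = env.get("METRICS")
    if value is None:
        # Unset: keep the default of reporting automatically.
        return True
    # Any value other than "false" leaves reporting on in this sketch.
    return value.strip().lower() != "false"


if __name__ == "__main__":
    print("usage metrics enabled:", metrics_enabled())
```

Passing a plain dict instead of `os.environ` makes the toggle easy to check without changing the real process environment.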
License Information
Pachyderm has moved some components of Pachyderm Platform to a source-available limited license.
We remain committed to the culture of open source, developing our product transparently and collaboratively with our community, and giving our community and customers source code access and the ability to study and change the software to suit their needs.
Under the Pachyderm Community License, you can access the source code and modify or redistribute it; there is only one thing you cannot do, and that is use it to make a competing offering.
Check out our License FAQ Page for more information.