Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Trung-Duy Nguyen

Last update: Nov 1, 2022


Overview

Data Scientist Learning Plan

Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials.

This learning path consists of several series of self-paced e-learning courses and paid instructor-led training (ILT) courses. If you are interested in ILT, search the course catalog for more information.

Learning Plan Structure

What is the Databricks Lakehouse Platform?

This course (formerly Fundamentals of the Databricks Lakehouse Platform) is designed for everyone who is brand new to the Platform and wants to learn more about what it is, why it was developed, what it does, and the components that make it up.

Our goal is that by the time you finish this course, you’ll have a better understanding of the Platform in general and be able to answer questions like: What is Databricks? Where does Databricks fit into my workflow? How have other customers been successful with Databricks?

Learning objectives
- Describe what the Databricks Lakehouse Platform is.
- Explain the origins of the Lakehouse data management paradigm.
- Outline fundamental problems that cause most enterprises to struggle with managing and making use of their data.
- Identify the most popular components of the Databricks Lakehouse Platform used by data practitioners, depending on their unique role.
- Give examples of organizations that have used the Databricks Lakehouse Platform to streamline big data processing and analytics.

What is Delta Lake?

Today, many organizations struggle with achieving successful big data and artificial intelligence (AI) projects. One of the biggest challenges they face is ensuring that quality, reliable data is available to data practitioners running these projects. After all, an organization that does not have reliable data will not succeed with AI. To help organizations bring structure, reliability, and performance to their data lakes, Databricks created Delta Lake.

Delta Lake is an open format storage layer that sits on top of your organization’s data lake. It is the foundation of a cost-effective, highly scalable Lakehouse and is an integral part of the Databricks Lakehouse Platform.

In this course (formerly Fundamentals of Delta Lake), we’ll break down the basics behind Delta Lake: what it does, how it works, and why it is valuable, from a business perspective, to any organization with big data and AI projects.

Learning objectives
- Describe how Delta Lake fits into the Databricks Lakehouse Platform.
- Explain the four elements encompassed by Delta Lake.
- Summarize high-level Delta Lake functionality that helps organizations solve common challenges related to enterprise-scale data analytics.
- Articulate examples of how organizations have employed Delta Lake on Databricks to improve business outcomes.

What is Databricks SQL?

Databricks SQL offers SQL users a platform for querying, analyzing, and visualizing data. This course (formerly Fundamentals of Databricks SQL) guides users through the interface and demonstrates many of the tools and features available in the Databricks SQL interface.

Learning objectives
- Describe the basics of the Databricks SQL service.
- Describe the benefits of using Databricks SQL to perform data analyses.
- Describe how to complete a basic query, visualization, and dashboard workflow using Databricks SQL.

What is Databricks Machine Learning?

Databricks Machine Learning offers data scientists and other machine learning practitioners a platform for completing and managing the end-to-end machine learning lifecycle. This course (formerly Fundamentals of Databricks Machine Learning) guides business leaders and practitioners through a basic overview of Databricks Machine Learning, the benefits of using Databricks Machine Learning, its fundamental components and functionalities, and examples of successful customer use.

Learning objectives
- Describe the basic overview of Databricks Machine Learning.
- Identify how using Databricks Machine Learning benefits data science and machine learning teams.
- Summarize the fundamental components and functionalities of Databricks Machine Learning.
- Give examples of successful uses of Databricks Machine Learning by real Databricks customers.

Fundamentals of the Databricks Lakehouse Platform Accreditation

Apache Spark Programming with Databricks

Certification Overview Course for the Databricks Certified Associate Developer for Apache Spark Exam

Getting Started with Databricks Machine Learning

Scaling Machine Learning Pipelines
