This repository contains answers of the Shopify Summer 2022 Data Science Intern Challenge.

Overview

Data-Science-Intern-Challenge

This repository contains answers of the Shopify Summer 2022 Data Science Intern Challenge.

Summer 2022 Data Science Intern Challenge

Please complete the following questions, and provide your thought process/work. You can attach your work in a text file, link, etc. on the application page. Please ensure answers are easily visible for reviewers!

Question 1: Given some sample data, write a program to answer the following: click here to access the required data set

On Shopify, we have exactly 100 sneaker shops, and each of these shops sells only one model of shoe. We want to do some analysis of the average order value (AOV). When we look at orders data over a 30 day window, we naively calculate an AOV of $3145.13. Given that we know these shops are selling sneakers, a relatively affordable item, something seems wrong with our analysis.

  1. Think about what could be going wrong with our calculation. Think about a better way to evaluate this data.

Answer: The wrong average was calculated using this method: total of all order values/ number of order_values. This is wrong because the formula didn't consider the fact that an order can have multiple items. I have tried to explain the problem with code. Click Here to view it.

  1. What metric would you report for this dataset?

Answer: The correct approach would be to divide the total of all order_values by the sum of total_items. By following this method, we would consider the fact that an order can have multiple items.

  1. What is its value?

Answer: $357.92

Question 2: For this question you’ll need to use SQL. Follow this link to access the data set required for the challenge. Please use queries to answer the following questions. Paste your queries along with your final numerical answers below.

  1. How many orders were shipped by Speedy Express in total?

Answer: 54

  1. What is the last name of the employee with the most orders?

Answer: Peacock

  1. What product was ordered the most by customers in Germany?

Answer: Boston Crab Meat. This product was ordered 160 times in total.

Click here to check the sql queries.

You might also like...
Providing the solutions for high-frequency trading (HFT) strategies using data science approaches (Machine Learning) on Full Orderbook Tick Data.
Providing the solutions for high-frequency trading (HFT) strategies using data science approaches (Machine Learning) on Full Orderbook Tick Data.

Modeling High-Frequency Limit Order Book Dynamics Using Machine Learning Framework to capture the dynamics of high-frequency limit order books. Overvi

An image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testingAn image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testing
An image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testingAn image base contains 490 images for learning (400 cars and 90 boats), and another 21 images for testing

SVM Données Une base d’images contient 490 images pour l’apprentissage (400 voitures et 90 bateaux), et encore 21 images pour fait des tests. Prétrait

This repository contains the source code and data for reproducing results of Deep Continuous Clustering paper
This repository contains the source code and data for reproducing results of Deep Continuous Clustering paper

Deep Continuous Clustering Introduction This is a Pytorch implementation of the DCC algorithms presented in the following paper (paper): Sohil Atul Sh

This repository contains the data and code for the paper
This repository contains the data and code for the paper "Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Process Priors" (SPNLP@ACL2022)

GP-VAE This repository provides datasets and code for preprocessing, training and testing models for the paper: Diverse Text Generation via Variationa

The 7th edition of NTIRE: New Trends in Image Restoration and Enhancement workshop will be held on June 2022 in conjunction with CVPR 2022.
The 7th edition of NTIRE: New Trends in Image Restoration and Enhancement workshop will be held on June 2022 in conjunction with CVPR 2022.

NTIRE 2022 - Image Inpainting Challenge Important dates 2022.02.01: Release of train data (input and output images) and validation data (only input) 2

[CVPR 2022] CoTTA Code for our CVPR 2022 paper Continual Test-Time Domain Adaptation

CoTTA Code for our CVPR 2022 paper Continual Test-Time Domain Adaptation Prerequisite Please create and activate the following conda envrionment. To r

UIUCTF 2021 Public Challenge Repository

UIUCTF-2021-Public UIUCTF 2021 Public Challenge Repository Notes: every challenge folder contains a challenge.yml file in the format for ctfcli, CTFd'

A Peer-to-peer Platform for Secure, Privacy-preserving, Decentralized Data Science
A Peer-to-peer Platform for Secure, Privacy-preserving, Decentralized Data Science

PyGrid is a peer-to-peer network of data owners and data scientists who can collectively train AI models using PySyft. PyGrid is also the central serv

Owner
null
"MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction" (CVPRW 2022) & (Winner of NTIRE 2022 Challenge on Spectral Reconstruction from RGB)

MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction (CVPRW 2022) Yuanhao Cai, Jing Lin, Zudi Lin, Haoqian Wang, Yulun Z

Yuanhao Cai 274 Jan 5, 2023
Imposter-detector-2022 - HackED 2022 Team 3IQ - 2022 Imposter Detector

HackED 2022 Team 3IQ - 2022 Imposter Detector By Aneeljyot Alagh, Curtis Kan, Jo

Joshua Ji 3 Aug 20, 2022
Datasets accompanying the paper ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers.

ConditionalQA Datasets accompanying the paper ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers. Disclaimer This dataset

null 2 Oct 14, 2021
ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge (ManiSkill Challenge), a large-scale learning-from-demonstrations benchmark for object manipulation.

ManiSkill-Learn ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge, a large-scale learning-from-dem

Hao Su's Lab, UCSD 48 Dec 30, 2022
This repository contains a pytorch implementation of "HeadNeRF: A Real-time NeRF-based Parametric Head Model (CVPR 2022)".

HeadNeRF: A Real-time NeRF-based Parametric Head Model This repository contains a pytorch implementation of "HeadNeRF: A Real-time NeRF-based Parametr

null 294 Jan 1, 2023
This repo contains the official code of our work SAM-SLR which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.

Skeleton Aware Multi-modal Sign Language Recognition By Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li and Yun Fu. Smile Lab @ Northeastern

Isen (Songyao Jiang) 128 Dec 8, 2022
Simulator for FRC 2022 challenge: Rapid React

rrsim Simulator for FRC 2022 challenge: Rapid React out-1.mp4 Usage In order to run the simulator use the following: python3 rrsim.py [config_path] wh

null 1 Jan 18, 2022
Nightmare-Writeup - Writeup for the Nightmare CTF Challenge from 2022 DiceCTF

Nightmare: One Byte to ROP // Alternate Solution TLDR: One byte write, no leak.

null 1 Feb 17, 2022
4st place solution for the PBVS 2022 Multi-modal Aerial View Object Classification Challenge - Track 1 (SAR) at PBVS2022

A Two-Stage Shake-Shake Network for Long-tailed Recognition of SAR Aerial View Objects 4st place solution for the PBVS 2022 Multi-modal Aerial View Ob

LinpengPan 5 Nov 9, 2022
Lowest memory consumption and second shortest runtime in NTIRE 2022 challenge on Efficient Super-Resolution

FMEN Lowest memory consumption and second shortest runtime in NTIRE 2022 on Efficient Super-Resolution. Our paper: Fast and Memory-Efficient Network T

null 33 Dec 1, 2022