- Add reference documentation links to classes like transforms and trainers (e.g. LightGBM).
- Include parameter names in method calls (e.g. `mlContext.Data.TrainTestSplit(data, testFraction: 0.2)`).
- Use real data for examples. It makes the problem being solved easier to understand than randomly generated data does.
- Watch for code comments. Instead of embedding them in the code, promote them to text in a Markdown cell.
- Put related code together. Break up cells containing large chunks of code and add Markdown cells explaining what each of the cells is doing.
**Example**

**Original**

```csharp
var context = new MLContext(seed: 1);
var pipeline = context.Transforms.Concatenate("Features", "X")
    .Append(context.Auto().Regression("y", useLbfgs: false, useSdca: false, useFastForest: false));
var monitor = new NotebookMonitor();
var experiment = context.Auto().CreateExperiment();
experiment.SetPipeline(pipeline)
    .SetEvaluateMetric(RegressionMetric.RootMeanSquaredError, "y")
    .SetTrainingTimeInSeconds(30)
    .SetDataset(trainTestSplit.TrainSet, trainTestSplit.TestSet)
    .SetMonitor(monitor);
// Configure Visualizer
monitor.SetUpdate(monitor.Display());
var res = await experiment.RunAsync();
```
**Update**

**Initialize MLContext**

`MLContext` is the starting point for all ML.NET applications.

```csharp
var context = new MLContext(seed: 1);
```
**Define training pipeline**

- `Concatenate`: Takes the input column `X` and creates a feature vector in the `Features` column.
- `Regression`: Defines the task AutoML needs to find the best algorithm and hyperparameters for. In this case, the Lbfgs, Sdca, and FastForest algorithms won't be explored since their respective parameters are set to `false`.

```csharp
var pipeline = context.Transforms.Concatenate("Features", "X")
    .Append(context.Auto().Regression("y", useLbfgs: false, useSdca: false, useFastForest: false));
```
**Initialize Monitor**

The notebook monitor provides visualizations of the training progress as AutoML tries to find the best model for your data.

```csharp
var monitor = new NotebookMonitor();
```
**Initialize AutoML Experiment**

An AutoML experiment is a collection of trials in which algorithms are explored.

```csharp
var experiment = context.Auto().CreateExperiment();
```
**Configure AutoML Experiment**

The AutoML experiment searches for the best algorithm according to an evaluation metric; in this case, Root Mean Squared Error. The goal is to find the model with the best evaluation metric within the provided training time, which is set to 30 seconds. The longer you train, the more algorithms and hyperparameters AutoML can explore. AutoML uses the training set to train candidate models and the test set to calculate the evaluation metric, which shows how well a particular model performs.

```csharp
experiment.SetPipeline(pipeline)
    .SetEvaluateMetric(RegressionMetric.RootMeanSquaredError, "y")
    .SetTrainingTimeInSeconds(30)
    .SetDataset(trainTestSplit.TrainSet, trainTestSplit.TestSet)
    .SetMonitor(monitor);
```
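Note that `trainTestSplit` is assumed to have been created earlier in the notebook; a minimal sketch, using the `TrainTestSplit` call shown at the top of this list:

```csharp
// Assumption: `data` is an IDataView loaded earlier in the notebook.
var trainTestSplit = context.Data.TrainTestSplit(data, testFraction: 0.2);
```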
**Set monitor to display**

```csharp
monitor.SetUpdate(monitor.Display());
```
**Run AutoML experiment**

```csharp
var res = await experiment.RunAsync();
```
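After the run, the result can be inspected; a minimal sketch, assuming the trial result returned by `RunAsync` exposes the best model and its metric:

```csharp
// Assumption: the trial result exposes `Metric` (the evaluation metric of the best trial)
// and `Model` (the trained transformer).
Console.WriteLine($"Best RMSE: {res.Metric}");
var bestModel = res.Model;
```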
- NotebookMonitor: Display the evaluation metric for the best trial and the active trial, and use it as the y-axis label on the graph.
- When adding feeds, add a link to documentation on how to reference them in Visual Studio / the dotnet CLI.
- When installing NuGet packages that are not part of the BCL, list them in a Markdown cell where the packages are installed, and add a link to NuGet (e.g. Microsoft.ML).
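  For example, the install cell might look like the sketch below, with the accompanying Markdown cell listing the package and linking to it on NuGet (the unpinned version is illustrative; pin to whatever version the notebook targets):

  ```csharp
  // Install ML.NET from NuGet: https://www.nuget.org/packages/Microsoft.ML
  #r "nuget: Microsoft.ML"
  ```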