Greppin' Logs: Leveling Up Log Analysis
Overview
This repo contains sample code and example datasets from Jon Stewart and Noah Rubin's presentation at the 2021 SANS DFIR Summit titled Greppin' Logs. The talk was centered around the idea that Forensics is Data Engineering and Data Science, and should be approached as such. Jon and Noah focused on the core (Unix) command line tools useful to anyone analyzing datasets from a terminal, purpose-built tools for handling structured tabular and JSON data, Stroz Friedberg's open source multipattern search tool Lightgrep, and scaling with AWS.
Repository Contents
Command Line Examples
The command-line directory contains shell scripts (.sh files) with the commands from each CLI tool example from the presentation, as well as a Dockerfile containing the tools used in the presentation (including Lightgrep). To build the Docker image with the tag greppin-logs:latest, make sure Docker is installed and run the following command from the root of the repo:
docker build -f command-line/Dockerfile -t greppin-logs:latest .
The Docker image also includes a Python virtual environment containing the foundational Python data science libraries (numpy, scipy, pandas, etc.), an installation of R with the Tidyverse packages, and the command line plotting tool Rush. Links to the documentation for each tool are in comments in the Dockerfile. To run the Docker container and try the tools on the sample datasets, run the following from the root of the repo after building the image above:
docker run --rm --name greppin-logs-playground -v "$(pwd)/datasets":/workspaces/examples/datasets/ -it --entrypoint bash greppin-logs:latest
Datasets
The datasets directory contains some of the example datasets used in the presentation:
- employees.csv: Fake employee names, email addresses, and employment status, keyed by id.
- salaries.csv: Fake employee salaries, keyed by id.
- cloudtrail-log.gz: A sample AWS CloudTrail log record.
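Because employees.csv and salaries.csv share an id key, they can be combined with the core Unix tools alone. The following is a minimal sketch, not a command taken from the presentation: it builds two tiny stand-in files in a temporary directory rather than reading the real datasets, and the column names (name, email, status, salary) are assumptions about the layout, not the actual headers. It also assumes simple CSV with no quoted fields containing commas.

```shell
# Hypothetical stand-ins for datasets/employees.csv and datasets/salaries.csv.
# The column names below are assumptions, not the real headers.
workdir="$(mktemp -d)"

cat > "$workdir/employees.csv" <<'EOF'
id,name,email,status
2,Bob Example,bob@example.com,inactive
1,Alice Example,alice@example.com,active
EOF

cat > "$workdir/salaries.csv" <<'EOF'
id,salary
1,100000
2,85000
EOF

# join(1) requires both inputs sorted on the join field, so strip each
# header with tail and sort on the first comma-separated column.
tail -n +2 "$workdir/employees.csv" | sort -t, -k1,1 > "$workdir/employees.sorted"
tail -n +2 "$workdir/salaries.csv"  | sort -t, -k1,1 > "$workdir/salaries.sorted"

# Join on column 1 (id) of both files, using a comma as the field separator.
joined="$(join -t, -1 1 -2 1 "$workdir/employees.sorted" "$workdir/salaries.sorted")"
printf '%s\n' "$joined"

# Clean up the temporary files.
rm -rf "$workdir"
```

Each output line carries the id, the employee fields, and the matching salary. For CSV files with quoted or embedded commas, a purpose-built tabular tool is a safer choice than join.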
Template AWS CDK App
The aws-lambda directory contains a template AWS CDK app and Lambda function for processing files uploaded to an S3 bucket. See the README in that directory for details on how to modify the Lambda code and deploy the stack to AWS.