A Pythonic framework for threat modeling

Izar Tarandach

Last update: Dec 20, 2022

Overview

pytm: A Pythonic framework for threat modeling

Introduction

Traditional threat modeling too often comes late to the party, or sometimes not at all. In addition, creating manual data flows and reports can be extremely time-consuming. The goal of pytm is to shift threat modeling to the left, making threat modeling more automated and developer-centric.

Features

Based on your input and definition of the architectural design, pytm can automatically generate the following items:

Data Flow Diagram (DFD)
Sequence Diagram
Relevant threats to your system

Requirements

Linux/macOS
Python 3.x
Graphviz package
Java (OpenJDK 10 or 11)
plantuml.jar

Getting Started

tm.py is an example model. Run it to generate the report and the diagram image files that the report references:

mkdir -p tm
./tm.py --report docs/basic_template.md | pandoc -f markdown -t html > tm/report.html
./tm.py --dfd | dot -Tpng -o tm/dfd.png
./tm.py --seq | java -Djava.awt.headless=true -jar $PLANTUML_PATH -tpng -pipe > tm/seq.png

There's also an example Makefile that wraps all of these commands into targets that can easily be shared across multiple models. If you have GNU make installed (available by default on Linux distros but not on macOS), simply run:

make

To avoid installing all the dependencies, like pandoc or Java, the script can be run inside a container:

# do this only once
export USE_DOCKER=true
make image

# call this after every change in your model
make

Usage

All available arguments:

usage: tm.py [-h] [--sqldump SQLDUMP] [--debug] [--dfd] [--report REPORT]
             [--exclude EXCLUDE] [--seq] [--list] [--describe DESCRIBE]
             [--list-elements] [--json JSON] [--levels LEVELS [LEVELS ...]]
             [--stale_days STALE_DAYS]

optional arguments:
  -h, --help            show this help message and exit
  --sqldump SQLDUMP     dumps all threat model elements and findings into the
                        named sqlite file (erased if exists)
  --debug               print debug messages
  --dfd                 output DFD
  --report REPORT       output report using the named template file (sample
                        template file is under docs/template.md)
  --exclude EXCLUDE     specify threat IDs to be ignored
  --seq                 output sequential diagram
  --list                list all available threats
  --describe DESCRIBE   describe the properties available for a given element
  --list-elements       list all elements which can be part of a threat model
  --json JSON           output a JSON file
  --levels LEVELS [LEVELS ...]
                        Select levels to be drawn in the threat model (int
                        separated by comma).
  --stale_days STALE_DAYS
                        checks if the delta between the TM script and the code
                        described by it is bigger than the specified value in
                        days

The --stale_days argument estimates how many days separate the model script (which you are writing) from the code that implements the system being modeled. For an actively developed system, the two should usually be close. Run it periodically to measure the pulse of your project and the 'freshness' of your threat model.
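As a rough illustration of the idea behind --stale_days, the sketch below compares file modification times. It is not pytm's implementation: the helper names staleness_in_days and is_stale, and the choice of modification times as the measure, are assumptions made for this example.

```python
import os

SECONDS_PER_DAY = 86400


def staleness_in_days(model_path, source_paths):
    """Return how many days the newest source file is ahead of the model.

    Hypothetical helper: it compares modification times only, which is an
    assumption and may differ from how pytm measures staleness.
    """
    model_mtime = os.path.getmtime(model_path)
    newest_source = max(os.path.getmtime(path) for path in source_paths)
    delta_seconds = newest_source - model_mtime
    # A negative delta means the model was touched after the code,
    # so it is reported as zero days stale.
    return max(delta_seconds, 0) / SECONDS_PER_DAY


def is_stale(model_path, source_paths, stale_days):
    """True when the code is more than `stale_days` newer than the model."""
    return staleness_in_days(model_path, source_paths) > stale_days
```

For example, is_stale("tm.py", ["server/web.cc", "model/schema.sql"], 30) would flag a model that lags its code by more than a month, assuming those files exist.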

Currently available elements are: TM, Element, Server, ExternalEntity, Datastore, Actor, Process, SetOfProcesses, Dataflow, Boundary and Lambda.

The available properties of an element can be listed by using --describe followed by the name of an element:


(pytm) ➜  pytm git:(master) ✗ ./tm.py --describe Element
Element class attributes:
  OS
  definesConnectionTimeout        default: False
  description
  handlesResources                default: False
  implementsAuthenticationScheme  default: False
  implementsNonce                 default: False
  inBoundary
  inScope                         Is the element in scope of the threat model, default: True
  isAdmin                         default: False
  isHardened                      default: False
  name                            required
  onAWS                           default: False

Creating a Threat Model

The following is a sample tm.py file that describes a simple application where a User logs into the application and posts comments on the app. The app server stores those comments into the database. There is an AWS Lambda that periodically cleans the Database.

#!/usr/bin/env python3

from pytm.pytm import TM, Server, Datastore, Dataflow, Boundary, Actor, Lambda, Data, Classification

tm = TM("my test tm")
tm.description = "another test tm"
tm.isOrdered = True

User_Web = Boundary("User/Web")
Web_DB = Boundary("Web/DB")

user = Actor("User")
user.inBoundary = User_Web

web = Server("Web Server")
web.OS = "CloudOS"
web.isHardened = True
web.sourceCode = "server/web.cc"

db = Datastore("SQL Database (*)")
db.OS = "CentOS"
db.isHardened = False
db.inBoundary = Web_DB
db.isSQL = True
db.inScope = False
db.sourceCode = "model/schema.sql"

my_lambda = Lambda("cleanDBevery6hours")
my_lambda.hasAccessControl = True
my_lambda.inBoundary = Web_DB

my_lambda_to_db = Dataflow(my_lambda, db, "(λ)Periodically cleans DB")
my_lambda_to_db.protocol = "SQL"
my_lambda_to_db.dstPort = 3306

user_to_web = Dataflow(user, web, "User enters comments (*)")
user_to_web.protocol = "HTTP"
user_to_web.dstPort = 80
user_to_web.data = Data('Comments in HTML or Markdown', classification=Classification.PUBLIC)

web_to_user = Dataflow(web, user, "Comments saved (*)")
web_to_user.protocol = "HTTP"

web_to_db = Dataflow(web, db, "Insert query with comments")
web_to_db.protocol = "MySQL"
web_to_db.dstPort = 3306

db_to_web = Dataflow(db, web, "Comments contents")
db_to_web.protocol = "MySQL"
# this is a BAD way of defining a data object, here for a demo on how it
# will appear on the sample report. Use Data objects.
db_to_web.data = 'Results of insert op'


tm.process()

Generating Diagrams

Diagrams are output as Dot and PlantUML.

When the --dfd argument is passed to the tm.py file above, it writes the diagram in Dot format to stdout, which is piped to Graphviz's dot to render the Data Flow Diagram:

tm.py --dfd | dot -Tpng -o sample.png

Generates this diagram:

Adding a levels attribute (for example, ".levels = [1,2]") to an element causes it, and any associated Dataflow whose two endpoints are in the same DFD level, to be rendered or hidden depending on the --levels command argument (for example, "--levels 1 2").
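The level rule can be pictured with a small, self-contained sketch. The Node and Flow classes below are hypothetical stand-ins rather than pytm classes, and the default level of [0] is an assumption. The sketch only shows the selection logic described here, not how pytm renders diagrams.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a pytm element."""
    name: str
    levels: list = field(default_factory=lambda: [0])  # assumed default


@dataclass
class Flow:
    """Hypothetical stand-in for a pytm Dataflow."""
    source: Node
    sink: Node
    name: str


def visible(nodes, flows, requested_levels):
    """Return the nodes and flows to draw for the requested levels."""
    requested = set(requested_levels)
    # A node is drawn when it belongs to at least one requested level.
    shown_nodes = [n for n in nodes if requested & set(n.levels)]
    # A flow is drawn only when both ends share a requested level.
    shown_flows = [
        f for f in flows
        if requested & set(f.source.levels) & set(f.sink.levels)
    ]
    return shown_nodes, shown_flows
```

With a web node on levels [1, 2] and a db node on level [2], requesting levels 1 and 2 keeps both nodes and the flow between them, while requesting only level 1 keeps the web node alone.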

The following command generates a Sequence diagram.

tm.py --seq | java -Djava.awt.headless=true -jar plantuml.jar -tpng -pipe > seq.png

Generates this diagram:

Creating a Report

The diagrams and findings can be included in the template to create a final report:

tm.py --report docs/basic_template.md | pandoc -f markdown -t html > report.html

The templating format used in the report template is very simple:


# Threat Model Sample
***

## System Description

{tm.description}

## Dataflow Diagram

![Level 0 DFD](dfd.png)

## Dataflows

Name|From|To |Data|Protocol|Port
----|----|---|----|--------|----
{dataflows:repeat:{{item.name}}|{{item.source.name}}|{{item.sink.name}}|{{item.data}}|{{item.protocol}}|{{item.dstPort}}
}

## Findings

{findings:repeat:* {{item.description}} on element "{{item.target}}"
}

To group findings by elements, use a more advanced, nested loop:

## Findings

{elements:repeat:{{item.findings:if:
### {{item.name}}

{{item.findings:repeat:
**Threat**: {{{{item.id}}}} - {{{{item.description}}}}

**Severity**: {{{{item.severity}}}}

**Mitigations**: {{{{item.mitigations}}}}

**References**: {{{{item.references}}}}

}}}}}

All items inside a loop must be escaped, doubling the braces, so {item.name} becomes {{item.name}}. The example above uses two nested loops, so items in the inner loop must be escaped twice, that's why they're using four braces.
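The doubling rule resembles the way Python's built-in str.format treats braces. The sketch below assumes the report engine consumes one layer of escaping per loop level, as str.format does. It demonstrates only brace collapsing with plain strings, not the engine's own repeat or if handling.

```python
from types import SimpleNamespace

# An item reference as it would appear inside two nested loops.
inner = "{{{{item.id}}}}"

# First pass (outer loop): every doubled brace collapses to a single one.
after_outer = inner.format()
assert after_outer == "{{item.id}}"

# Second pass (inner loop): one more layer of escaping is consumed.
after_inner = after_outer.format()
assert after_inner == "{item.id}"

# Final pass: the placeholder is replaced with the attribute value.
finding = SimpleNamespace(id="INP01")  # hypothetical finding object
rendered = after_inner.format(item=finding)
assert rendered == "INP01"
```

Each nesting level therefore needs its own pair of extra braces. A reference written with too few braces would be substituted one pass too early, against the wrong item.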

Overrides

You can override attributes of findings (threats matching the model assets and/or dataflows), for example to set a custom CVSS score and/or response text:

user_to_web = Dataflow(user, web, "User enters comments (*)", protocol="HTTP", dstPort="80")
user_to_web.overrides = [
    Finding(
        # Overflow Buffers
        id="INP02",
        CVSS="9.3",
        response="""**To Mitigate**: run a memory sanitizer to validate the binary""",
    )
]

Threats database

Security practitioners may supply their own threats file by setting TM.threatsFile. It should contain entries like:

{
   "SID":"INP01",
   "target": ["Lambda","Process"],
   "description": "Buffer Overflow via Environment Variables",
   "details": "This attack pattern involves causing a buffer overflow through manipulation of environment variables. Once the attacker finds that they can modify an environment variable, they may try to overflow associated buffers. This attack leverages implicit trust often placed in environment variables.",
   "Likelihood Of Attack": "High",
   "severity": "High",
   "condition": "target.usesEnvironmentVariables is True and target.controls.sanitizesInput is False and target.controls.checksInputBounds is False",
   "prerequisites": "The application uses environment variables.An environment variable exposed to the user is vulnerable to a buffer overflow.The vulnerable environment variable uses untrusted data.Tainted data used in the environment variables is not properly validated. For instance boundary checking is not done before copying the input data to a buffer.",
   "mitigations": "Do not expose environment variable to the user.Do not use untrusted data in your environment variables. Use a language or compiler that performs automatic bounds checking. There are tools such as Sharefuzz [R.10.3] which is an environment variable fuzzer for Unix that support loading a shared library. You can use Sharefuzz to determine if you are exposing an environment variable vulnerable to buffer overflow.",
   "example": "Attack Example: Buffer Overflow in $HOME A buffer overflow in sccw allows local users to gain root access via the $HOME environmental variable. Attack Example: Buffer Overflow in TERM A buffer overflow in the rlogin program involves its consumption of the TERM environmental variable.",
   "references": "https://capec.mitre.org/data/definitions/10.html, CVE-1999-0906, CVE-1999-0046, http://cwe.mitre.org/data/definitions/120.html, http://cwe.mitre.org/data/definitions/119.html, http://cwe.mitre.org/data/definitions/680.html"
 }

The target field lists classes of model elements to match this threat against. Those can be assets, like: Actor, Datastore, Server, Process, SetOfProcesses, ExternalEntity, Lambda or Element, which is the base class and matches any. It can also be a Dataflow that connects two assets.

All other fields (except condition) are available for display and can be used in the template to list findings in the final report.

WARNING

The threats.json file contains strings that run through eval(). Make sure the file has correct permissions or risk having an attacker change the strings and cause you to run code on their behalf.

The logic lives in the condition, where members of target can be logically evaluated. If the condition evaluates to true, the rule generates a finding; otherwise it does not. A condition may compare attributes of target and/or control attributes under target.controls, and may also call one of these methods:

target.oneOf(class, ...) where class is one or more: Actor, Datastore, Server, Process, SetOfProcesses, ExternalEntity, Lambda or Dataflow,
target.crosses(Boundary),
target.enters(Boundary),
target.exits(Boundary),
target.inside(Boundary).

If target is a Dataflow, remember you can access target.source and/or target.sink along with other attributes.

Conditions on assets can analyze all incoming and outgoing Dataflows by inspecting the target.inputs and target.outputs attributes. For example, to match a threat only against servers with incoming traffic, use any(target.inputs). A more advanced example, matching elements connecting to SQL datastores, would be any(f.sink.oneOf(Datastore) and f.sink.isSQL for f in target.outputs).
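To make the matching step concrete, here is a simplified sketch of how a threat entry could be checked against one element. The Element, Lambda and Server classes are hypothetical stubs, not pytm's classes, and rule_matches is an invented helper. The sketch assumes the target field is compared with class names and the condition string is passed to eval(), which is also why the threats file must be trusted.

```python
from types import SimpleNamespace


class Element:
    """Hypothetical stub with only the attributes the condition reads."""

    def __init__(self, name, **attrs):
        self.name = name
        self.usesEnvironmentVariables = False
        self.controls = SimpleNamespace(
            sanitizesInput=False, checksInputBounds=False
        )
        for key, value in attrs.items():
            setattr(self, key, value)


class Lambda(Element):
    pass


class Server(Element):
    pass


def rule_matches(threat, target):
    """Return True when the threat applies to the given element."""
    # Listing the base class name "Element" matches every subclass,
    # because the whole class hierarchy of the target is considered.
    class_names = {cls.__name__ for cls in type(target).__mro__}
    if not class_names & set(threat["target"]):
        return False
    # eval() executes whatever the string contains: trusted input only.
    return bool(eval(threat["condition"], {}, {"target": target}))


threat = {
    "SID": "INP01",
    "target": ["Lambda", "Process"],
    "condition": (
        "target.usesEnvironmentVariables is True"
        " and target.controls.sanitizesInput is False"
        " and target.controls.checksInputBounds is False"
    ),
}
```

Here rule_matches(threat, Lambda("cleaner", usesEnvironmentVariables=True)) is true, while a Server with the same attributes does not match because Server is not listed in target.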

Currently supported threats

INP01 - Buffer Overflow via Environment Variables
INP02 - Overflow Buffers
INP03 - Server Side Include (SSI) Injection
CR01 - Session Sidejacking
INP04 - HTTP Request Splitting
CR02 - Cross Site Tracing
INP05 - Command Line Execution through SQL Injection
INP06 - SQL Injection through SOAP Parameter Tampering
SC01 - JSON Hijacking (aka JavaScript Hijacking)
LB01 - API Manipulation
AA01 - Authentication Abuse/ByPass
DS01 - Excavation
DE01 - Interception
DE02 - Double Encoding
API01 - Exploit Test APIs
AC01 - Privilege Abuse
INP07 - Buffer Manipulation
AC02 - Shared Data Manipulation
DO01 - Flooding
HA01 - Path Traversal
AC03 - Subverting Environment Variable Values
DO02 - Excessive Allocation
DS02 - Try All Common Switches
INP08 - Format String Injection
INP09 - LDAP Injection
INP10 - Parameter Injection
INP11 - Relative Path Traversal
INP12 - Client-side Injection-induced Buffer Overflow
AC04 - XML Schema Poisoning
DO03 - XML Ping of the Death
AC05 - Content Spoofing
INP13 - Command Delimiters
INP14 - Input Data Manipulation
DE03 - Sniffing Attacks
CR03 - Dictionary-based Password Attack
API02 - Exploit Script-Based APIs
HA02 - White Box Reverse Engineering
DS03 - Footprinting
AC06 - Using Malicious Files
HA03 - Web Application Fingerprinting
SC02 - XSS Targeting Non-Script Elements
AC07 - Exploiting Incorrectly Configured Access Control Security Levels
INP15 - IMAP/SMTP Command Injection
HA04 - Reverse Engineering
SC03 - Embedding Scripts within Scripts
INP16 - PHP Remote File Inclusion
AA02 - Principal Spoof
CR04 - Session Credential Falsification through Forging
DO04 - XML Entity Expansion
DS04 - XSS Targeting Error Pages
SC04 - XSS Using Alternate Syntax
CR05 - Encryption Brute Forcing
AC08 - Manipulate Registry Information
DS05 - Lifting Sensitive Data Embedded in Cache
SC05 - Removing Important Client Functionality
INP17 - XSS Using MIME Type Mismatch
AA03 - Exploitation of Trusted Credentials
AC09 - Functionality Misuse
INP18 - Fuzzing and observing application log data/errors for application mapping
CR06 - Communication Channel Manipulation
AC10 - Exploiting Incorrectly Configured SSL
CR07 - XML Routing Detour Attacks
AA04 - Exploiting Trust in Client
CR08 - Client-Server Protocol Manipulation
INP19 - XML External Entities Blowup
INP20 - iFrame Overlay
AC11 - Session Credential Falsification through Manipulation
INP21 - DTD Injection
INP22 - XML Attribute Blowup
INP23 - File Content Injection
DO05 - XML Nested Payloads
AC12 - Privilege Escalation
AC13 - Hijacking a privileged process
AC14 - Catching exception throw/signal from privileged block
INP24 - Filter Failure through Buffer Overflow
INP25 - Resource Injection
INP26 - Code Injection
INP27 - XSS Targeting HTML Attributes
INP28 - XSS Targeting URI Placeholders
INP29 - XSS Using Doubled Characters
INP30 - XSS Using Invalid Characters
INP31 - Command Injection
INP32 - XML Injection
INP33 - Remote Code Inclusion
INP34 - SOAP Array Overflow
INP35 - Leverage Alternate Encoding
DE04 - Audit Log Manipulation
AC15 - Schema Poisoning
INP36 - HTTP Response Smuggling
INP37 - HTTP Request Smuggling
INP38 - DOM-Based XSS
AC16 - Session Credential Falsification through Prediction
INP39 - Reflected XSS
INP40 - Stored XSS
AC17 - Session Hijacking - ServerSide
AC18 - Session Hijacking - ClientSide
INP41 - Argument Injection
AC19 - Reusing Session IDs (aka Session Replay) - ServerSide
AC20 - Reusing Session IDs (aka Session Replay) - ClientSide
AC21 - Cross Site Request Forgery

Comments

Remove usage of global attributes

Using global attributes when building models makes it really hard to share assets between different models. Multiple models have to be stored in separate sources, cannot be processed together and unused assets are still drawn in diagrams.

This MR removes all usages of global/static attributes and requires elements to be passed to a model explicitly. Note that only Dataflows have to be passed. See the newly added tests for the difference in usage.

Also note that previously, the sequence diagram relied on the order in which assets were defined (not dataflows). Now dataflows are used to order assets. It's still not consistent with being able to manually order dataflows but this can be improved in future MRs.

opened by nineinchnick 18

add categories to threats

Add a categories field to threats so custom threat libraries can use any threat taxonomy.

Formatted the threats.json file using jq so it has consistent indentation.

opened by nineinchnick 17

delete unused attributes

The following attributes are not used in any threat. Using them in models can create a false sense of security where no new findings would be reported.

authenticatedWith

authenticationScheme

codeType

handlesCrashes

handlesInterruptions

hasWriteAccess

implementsCommunicationProtocol

isAdmin

isSQL

onAWS

onRDS

OS

providesConfidentiality

storesLogData

storesPII

storesSensitiveData

tracksExecutionFlow

It's pretty easy for anyone to extend pytm classes and add these back when referencing them in custom threats. I believe that extending pytm this way should be encouraged.

This is controversial, but it's easier to remove unused attributes than to figure out correct descriptions for them. Less is more.

opened by nineinchnick 16

TemplateEngine improvements, updated template.md

I updated the template to showcase #150 and #154. Also expanded template_engine to support the call operator with methods that return a list and support to call methods within a report utility class. Updated template.md with examples of each.

Updated tests and ensured tests passed. Also updated the report test to write output_current.md. At some point tests could use this, but for now it can be used to quickly commit the new expected value when the report changes. Added output_current.md to .gitignore.

I also added an example in template.md that gets the parents of each Boundary. To show this, I updated tm.py with a hierarchy of Boundaries. If we think this is too much for the intro tm.py, I can back those changes out.

opened by nozmore 14

Adding uniqueId to get static references for findings

In my use case, I want to synchronize the findings with an external risk system. To be able to do so, I added these features:

Each element can be provided with a uniqueId. The idea is that these are specified when writing the model and thus static.

Findings will get a uniqueId which is a combination of the element's uniqueId and the finding's ID. In this way, these are also static and can be used as the primary key to the external risk system.

To make communication easier, the Element's uniqueId can be added to the name and thus become visible on the diagrams. This enables smoother dialog when talking about the output as talking about ABC becomes very precise.

I apologize upfront for these things:

I have only changed the pytm.py file for now. For any changes to README.md etc., I'll await your first feedback.

I spent considerable time trying to get the value of the TM varStrings in the Element class. I could not find a way, so I added a global variable hack.

I did not add any tests. Any advice on what tests I should add would be welcome.

opened by per-oestergaard 13

Moved the model context from the TM class into the TM object

To make it possible to import other models into the current model without adding all elements, the elements of the model are now stored in the TM object and not the class.

In addition, I added some test models to check how they differ from each other and from the current master branch. The results of the models have to be compared manually, though, because the UUIDs and the order often differ even when the resulting report and the diagrams are similar. Maybe this process can be automated further.

Since I can only do manual checking it would be good if someone else checked if this PR is not changing the original behavior.

opened by raphaelahrens 12
Added isSQL and storesSensitiveData to Datastore, corrected typo on i…

…nAWS and updated logging threat to use storesSensitiveData

- Added missing getters and setters for Datastore
- Corrected typo on Element, onAWS to inAWS
- Changed default value for storesSensitiveData to True
- Updated a logging threat to use storesSensitiveData

opened by nozmore 12

Error when executing tm.py

Hi all,

thanks for sharing this nice tool. I just wanted to explore the sample but getting an error

➜  pytm git:(master) ✗ ./tm.py --dfd | dot -Tpng -o sample1.png
2019-03-31 07:24:45.271 dot[27759:12821403] +[__NSCFConstantString length]: unrecognized selector sent to class 0x7fff95b4a8c0
2019-03-31 07:24:45.272 dot[27759:12821403] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '+[__NSCFConstantString length]: unrecognized selector sent to class 0x7fff95b4a8c0'
*** First throw call stack:
(
	0   CoreFoundation                      0x00007fff3df1743d __exceptionPreprocess + 256
	1   libobjc.A.dylib                     0x00007fff69e25720 objc_exception_throw + 48
	2   CoreFoundation                      0x00007fff3df941a5 __CFExceptionProem + 0
	3   CoreFoundation                      0x00007fff3deb6ad0 ___forwarding___ + 1486
	4   CoreFoundation                      0x00007fff3deb6478 _CF_forwarding_prep_0 + 120
	5   CoreFoundation                      0x00007fff3de47f54 CFStringCompareWithOptionsAndLocale + 72
	6   ImageIO                             0x00007fff409b5367 _ZN17IIO_ReaderHandler15readerForUTTypeEPK10__CFString + 53
	7   ImageIO                             0x00007fff4098d527 _ZN14IIOImageSource14extractOptionsEP13IIODictionary + 183
	8   ImageIO                             0x00007fff409ba2e6 _ZN14IIOImageSourceC2EP14CGDataProviderP13IIODictionary + 72
	9   ImageIO                             0x00007fff409ba1bb CGImageSourceCreateWithDataProvider + 172
	10  libgvplugin_quartz.6.dylib          0x0000000107cfcc54 quartz_loadimage_quartz + 224
	11  libgvc.6.dylib                      0x0000000107c59781 gvloadimage + 269
	12  libgvc.6.dylib                      0x0000000107c587e0 gvrender_usershape + 955
	13  libgvc.6.dylib                      0x0000000107c8662e poly_gencode + 2129
	14  libgvc.6.dylib                      0x0000000107c92b7b emit_node + 1030
	15  libgvc.6.dylib                      0x0000000107c91805 emit_graph + 4769
	16  libgvc.6.dylib                      0x0000000107c96d0d gvRenderJobs + 4911
	17  dot                                 0x0000000107c4fd62 main + 697
	18  libdyld.dylib                       0x00007fff6aef3085 start + 1
)
libc++abi.dylib: terminating with uncaught exception of type NSException
[1]    27758 done       ./tm.py --dfd |
       27759 abort      dot -Tpng -o sample1.png

I installed graphviz via brew, using macOS 10.14 and Python 3.7.3.

opened by sushi2k 11

Added the list-element command

model.py --list-elements shows a list of all elements which can be used in a threat model with pytm.

Why? I often find myself looking up the exact names of the elements and their docstrings.

Currently the output looks like this.

Elements:
Actor          -- An entity usually initiating actions
Asset          -- An asset with outgoing or incoming dataflows
Boundary       -- Trust boundary groups elements and data with the same trust level.
Dataflow       -- A data flow from a source to a sink
Datastore      -- An entity storing data
ExternalEntity --
Lambda         -- A lambda function running in a Function-as-a-Service (FaaS) environment
Process        -- An entity processing data
Server         -- An entity processing data
SetOfProcesses --

Atributes:
Action         -- Action taken when validating a threat model.
Classification -- An enumeration.
Data           -- Represents a single piece of data that traverses the system
Lifetime       -- An enumeration.
TLSVersion     -- An enumeration.

opened by raphaelahrens 9

Adding uniqueId and includeOrder

I want to have stable references so I can synchronize the findings with a risk management tool and make the model a living document. To do so, I allow includeOrder to be set on any component (Actor etc.). When it is set and the order is specified (not -1), the name is changed to contain the order, and the findings contain the order as well.

In this version, there is no validation of whether the order is unique. That is up to the person writing the Python to ensure that.

On the finding object's UniqueId: when order is present and includeOrder is true on the object, this will be formatted as findingId:order. E.g. if the finding is INP01 and order is 123, the value becomes INP01:123.

On object's includeOrder: If True and Order is set (not -1), the displayed name will be formatted as 'order:name'. If you make Order unique, this will give you a stable reference you can use for synchronization etc.
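The two formatting rules described in this PR can be summarized in a short sketch. The function names finding_unique_id and display_name are hypothetical, and falling back to the plain ID or name when no order is set is an assumption, since the description does not state the fallback.

```python
UNSET_ORDER = -1  # value the description uses for "order not specified"


def finding_unique_id(finding_id, order, include_order=True):
    """Format a finding reference as findingId:order when an order is set."""
    if include_order and order != UNSET_ORDER:
        return f"{finding_id}:{order}"
    # Assumed fallback: the description does not cover this case.
    return finding_id


def display_name(name, order, include_order=True):
    """Format an element's displayed name as order:name when an order is set."""
    if include_order and order != UNSET_ORDER:
        return f"{order}:{name}"
    return name
```

For the example given above, finding_unique_id("INP01", 123) yields "INP01:123".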

opened by per-oestergaard 8

Pandoc error when producing example report

Cloned from master today and running the example reports with: ./tm.py --report docs/template.md | pandoc -f markdown -t html > report.html

Produces the following error: pandoc: (TagClose "script") is not a TagOpen CallStack (from HasCallStack): error, called at src/Text/HTML/TagSoup/Type.hs:128:19 in tgsp-0.14.8-2271f385:Text.HTML.TagSoup.Type

I think the error is with pandocs formatting of the following threat: { "SID": "SC04", "target": ["Server"], "description": "XSS Using Alternate Syntax", "details": "An adversary uses alternate forms of keywords or commands that result in the same action as the primary form but which may not be caught by filters. For example, many keywords are processed in a case insensitive manner. If the site's web filtering algorithm does not convert all tags into a consistent case before the comparison with forbidden keywords it is possible to bypass filters (e.g., incomplete black lists) by using an alternate case structure. For example, the script tag using the alternate forms of Script or ScRiPt may bypass filters where script is the only form tested. Other variants using different syntax representations are also possible as well as using pollution meta-characters or entities that are eventually ignored by the rendering engine. The attack can result in the execution of otherwise prohibited functionality.", "Likelihood Of Attack": "High", "severity": "High", "condition": "target.sanitizesInput is False or target.validatesInput is False or target.encodesOutput is False", "prerequisites": "Target client software must allow scripting such as JavaScript.", "mitigations": "Design: Use browser technologies that do not allow client side scripting.Design: Utilize strict type, character, and encoding enforcementImplementation: Ensure all content that is delivered to client is sanitized against an acceptable content specification.Implementation: Ensure all content coming from the client is using the same encoding; if not, the server-side application must canonicalize the data before applying any filtering.Implementation: Perform input validation for all remote content, including remote and user-generated contentImplementation: Perform output validation for all remote content.Implementation: Disable scripting languages such as JavaScript in browserImplementation: Patching software. 
There are many attack vectors for XSS on the client side and the server side. Many vulnerabilities are fixed in service packs for browser, web servers, and plug in technologies, staying current on patch release that deal with XSS countermeasures mitigates this.", "example": "In this example, the attacker tries to get a script executed by the victim's browser. The target application employs regular expressions to make sure no script is being passed through the application to the web page; such a regular expression could be ((?i)script), and the application would replace all matches by this regex by the empty string. An attacker will then create a special payload to bypass this filter: <scriscriptpt>alert(1)</scscriptript> when the applications gets this input string, it will replace all script (case insensitive) by the empty string and the resulting input will be the desired vector by the attacker. In this example, we assume that the application needs to write a particular string in a client-side JavaScript context (e.g., <script>HERE</script>). For the attacker to execute the same payload as in the previous example, he would need to send alert(1) if there was no filtering. The application makes use of the following regular expression as filter ((w+)s*(.*)|alert|eval|function|document) and replaces all matches by the empty string. For example each occurrence of alert(), eval(), foo() or even the string alert would be stripped. An attacker will then create a special payload to bypass this filter: this['al' + 'ert'](1) when the applications gets this input string, it won't replace anything and this piece of JavaScript has exactly the same runtime meaning as alert(1). 
The attacker could also have used non-alphanumeric XSS vectors to bypass the filter; for example, ($=[$=[]][(__=!$+$)[_=-~-~-~$]+({}+$)[_/_]+($$=($_=!''+$)[_/_]+$_[+$])])()[__[_/_]+__[_+~$]+$_[_]+$$](_/_) would be executed by the JavaScript engine like alert(1) is.", "references": "https://capec.mitre.org/data/definitions/199.html, http://cwe.mitre.org/data/definitions/87.html" },

Specifically I think it's having issues with the * characters following a script element.

As a file with just this content run through pandoc has the same error: <p>, <script>HERE</script> (w+)s\*(.\*)|</p>

Running on MacOS with pandoc version: pandoc 2.13 Compiled with pandoc-types 1.22, texmath 0.12.2, skylighting 0.10.5, citeproc 0.3.0.9, ipynb 0.1.0.1

I appreciate this might be an issue with pandoc but perhaps there is something we can do to escape characters in the description of the threat.
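One possible way to escape such text before it reaches the report is Python's standard html.escape. This is only a sketch of the suggestion above: the helper name escape_for_report is hypothetical, it is not something pytm is known to do, and whether it resolves this particular pandoc error is untested.

```python
import html


def escape_for_report(text):
    """Replace &, <, > and quote characters with HTML entities.

    Hypothetical helper: escaped tags such as <script> can no longer be
    interpreted as raw HTML by a downstream converter.
    """
    return html.escape(text)


sample = "<script>HERE</script>"
escaped = escape_for_report(sample)
# escaped == "&lt;script&gt;HERE&lt;/script&gt;"
```

A trade-off of this approach is that any markup intentionally placed in a threat description would also be shown literally in the report.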

opened by mikehoyio 8

How are "target" and its relevant "condition" extracted from a particular threat?

I've been looking at the source code and trying to automate the logic extraction from the threat information. I've got a couple of questions:

How was the initial threats file produced? In other words, for "Flooding" in threats.json, how were the attributes for the "target" and "condition" extracted? Can you please point me in the right direction? Is this a personal interpretation of the CAPEC information?

So, for example, the "Flooding" attack is a parent of "TCP Flood," "UDP Flood," and so on. I wonder if one can use the same logic and prefill the "target" and, to some extent fill the "condition" field? I feel they pretty much inherit the same logic/condition, but of course, someone's expert opinion is really appreciated.

There are some attacks whose fields such as "severity", "likelihood of attack", "example", or "reference" are missing in CAPEC. If you were filling them in, how would you do so cautiously and correctly?

Thank you.

opened by amrmp 5

How are threats named, e.g., INPXX or AAXX?

I wonder how the acronyms for the threats are made. I can't find a logical relationship between the threat's name and its acronym. Let's take a look at the following example:

DO01 - Flooding

Is there any specific standard for such acronyms?

opened by amrmp 3

Error with data field in input JSON

When using the JSON model format as input to create a report I am getting an error "expecting a list of pytm.Data, item number 0 is a <class 'str'>" (line 213 in the code snippet). https://github.com/izar/pytm/blob/679ea0df19b7b92e7d8359891d53f7ed794d54a3/pytm/pytm.py#L194-L218 My input JSON for data and flows looks like this:

"flows": [
  {
    "name": "Actor 1 to Actor 2",
    "source": "Actor 1",
    "sink": "Actor 2",
    "order": 1,
    "data": [ "Data" ]
  },
  {
    "name": "Actor 2 to Actor 3",
    "source": "Actor 2",
    "sink": "Actor 3",
    "description": "Another data flow",
    "data": [ ]
  }
],
"data": [
  {
    "name": "Data",
    "format": "Text",
    "isPII": true
  }
]

I believe this matches the JSON format produced by the JSON output of the CLI tool. Below is the function that leads to calling varData when it creates the Dataflow object. It seems that varData doesn't handle a list of data name strings, and input.json in the tests folder doesn't have a data field in it. Since data objects are not in the boundaries, elements, or flows sections of the JSON, should they be handled by their own function, e.g. decode_data?

https://github.com/izar/pytm/blob/679ea0df19b7b92e7d8359891d53f7ed794d54a3/pytm/json.py#L92-L107

opened by jmrenshaw 3
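
The suggestion in this issue can be sketched as a separate decoding step that builds the data objects first and then swaps the name strings in each flow for those objects. This is only an illustration under assumptions: Data here is a stand-in dataclass rather than pytm.Data, and decode_data and resolve_flow_data are hypothetical helpers, not functions from pytm's JSON module.

```python
import json
from dataclasses import dataclass


@dataclass
class Data:
    """Stand-in for a data object; not the pytm class."""
    name: str
    format: str = ""
    isPII: bool = False


def decode_data(items):
    """Build a name -> Data lookup from the top-level 'data' section."""
    return {item["name"]: Data(**item) for item in items}


def resolve_flow_data(flow, data_by_name):
    """Return a copy of the flow whose 'data' names are replaced by Data objects."""
    resolved = dict(flow)
    resolved["data"] = [data_by_name[name] for name in flow.get("data", [])]
    return resolved


model = json.loads("""
{
  "flows": [
    {"name": "Actor 1 to Actor 2", "source": "Actor 1", "sink": "Actor 2",
     "order": 1, "data": ["Data"]},
    {"name": "Actor 2 to Actor 3", "source": "Actor 2", "sink": "Actor 3",
     "description": "Another data flow", "data": []}
  ],
  "data": [
    {"name": "Data", "format": "Text", "isPII": true}
  ]
}
""")

# Decode the data section first so flows can refer to it by name.
data_by_name = decode_data(model["data"])
flows = [resolve_flow_data(flow, data_by_name) for flow in model["flows"]]
print(flows[0]["data"])
```

In this sketch an unknown data name raises a KeyError, which surfaces a typo in the input early instead of passing a bare string further down.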
Documentation missing Controls class

Hi all,

I just noticed that there is no Controls class in the documentation under the docs folder. I'm not sure whether you were aware of it, so I wanted to bring it to your attention.

opened by jharnois4512 3
How does versioning work?

After getting a PR merged, I still cannot see the updated version after pip install pytm. This makes me wonder how versioning is done and how packages are released. I get v1.2.1 from pip install, but I find 1.2.0 in pyproject.toml and 1.2.0 in setup.py. I can also see a tag called v1.2.0.

I realize that I did not add anything to CHANGELOG.md as part of my changes.

opened by per-oestergaard 1
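
When comparing what pip installed with the version recorded in the repository, it can help to ask the environment directly. The sketch below uses importlib.metadata from the Python standard library (available since Python 3.8); installed_version is a hypothetical helper name, and the output depends on the local environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed pytm version, or None when pytm is not installed.
print(installed_version("pytm"))
```

The value reported here comes from the metadata of the installed distribution, so it reflects the published package rather than the current state of the repository's main branch.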

Releases(v1.2.0)

v1.2.0(Apr 30, 2021)
In this release, we are aiming at clearer reports and some more data-oriented facilities.

Breaking changes

Replace usesLatestTLSversion with minTLSVersion in assets and tlsVersion in data flows #123

When the data attribute of elements is initialized with a string, convert it to a Data object with undefined as name and the string as description; change the default classification from PUBLIC to UNKNOWN #148

New features

Separate actors and assets from elements when dumping the model to JSON #150

Add unique Finding ids #154

Allow associating the threat model script with source code files and checking their age difference #145

Adapt the DFD3 notation #143

Allow overriding attributes of findings (threats) #137

Allow marking data as PII or credentials and checking whether it is protected #127

Added '--levels' - every element now has a 'levels' attribute, a list of integers denoting different DFD levels for rendering

Added HTML docs using pdoc #110

Added checksDestinationRevocation attribute to account for certificate revocation checks #109

Bug fixes

Escape HTML entities in Threat attributes #149

Fix generating reports for models with a Datastore that has isEncryptedAtRest set and a Data that has isStored set #141

Fix condition on the Data Leak threat so it does not always match #139

Fixed printing the data attribute in reports #123

Added a markdown file with threats #126

Fixed drawing nested boundaries #117

Add missing provideIntegrity attribute in Actor and Asset classes #116

v1.1.2(Sep 24, 2020)
Added Poetry #108

Fix drawing DFDs for nested Boundaries #107

v1.1.0(Sep 17, 2020)
Breaking changes

Removed HandlesResources attribute from the Process class, which duplicates handlesResources

Change default Dataflow.dstPort attribute value from 10000 to -1

New features

Add dump of elements and findings to sqlite database using "--sqldump " (with result in ./sqldump/) #103

Add Data element and DataLeak finding to support creation of a data dictionary separate from the model #104

Add JSON input #105

Add JSON output #102

Use numbered dataflow labels in sequence diagram #94

Move authenticateDestination to base Element #88

Assign inputs and outputs to all elements #89

Allow detecting and/or hiding duplicate dataflows by setting TM.onDuplicates #100

Ignore unused elements if TM.ignoreUnused is True #84

Assign findings to elements #86

Add description to class attributes #91

New Element methods to be used in threat conditions #82

Provide a Docker image and allow running make targets in a container #87

Dataflow inherits source and/or sink attribute values #79

Merge edges in DFD when TM.mergeResponses is True; allow marking Dataflow as responses #76

Automatic ordering of dataflows when TM.isOrdered is True #66

Loading a custom threats file by setting TM.threatsFile #68

Setting properties on init #67

Wrap long labels in DFDs #65

Bug fixes

Ensure all items have correct color, based on scope #93

Add missing server isResilient property #63

Advanced templates in repeat blocks #81

Produce stable diagrams #79

Allow overriding classes #64


Owner

Izar Tarandach

Just that guy.


Uplift modeling and causal inference with machine learning algorithms

Disclaimer This project is stable and being incubated for long-term support. It may contain new experimental code, for which APIs are subject to chang

3.7k Jan 7, 2023

Probabilistic time series modeling in Python

GluonTS - Probabilistic Time Series Modeling in Python GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (

3.3k Jan 3, 2023

A python library for Bayesian time series modeling

PyDLM Welcome to pydlm, a flexible time series modeling library for python. This library is based on the Bayesian dynamic linear model (Harrison and W

438 Dec 17, 2022

Pyomo is an object-oriented algebraic modeling language in Python for structured optimization problems.

Pyomo is a Python-based open-source software package that supports a diverse set of optimization capabilities for formulating and analyzing optimization models. Pyomo can be used to define symbolic problems, create concrete problem instances, and solve these instances with standard solvers.

1.4k Dec 28, 2022

UpliftML: A Python Package for Scalable Uplift Modeling

UpliftML is a Python package for scalable unconstrained and constrained uplift modeling from experimental data. To accommodate working with big data, the package uses PySpark and H2O models as base learners for the uplift models. Evaluation functions expect a PySpark dataframe as input.

254 Dec 31, 2022

MICOM is a Python package for metabolic modeling of microbial communities

Welcome MICOM is a Python package for metabolic modeling of microbial communities currently developed in the Gibbons Lab at the Institute for Systems

57 Dec 21, 2022

A modular active learning framework for Python

Modular Active Learning framework for Python3 Page contents Introduction Active learning from bird's-eye view modAL in action From zero to one in a fe

1.9k Dec 31, 2022

Simple structured learning framework for python

PyStruct PyStruct aims at being an easy-to-use structured learning and prediction library. Currently it implements only max-margin methods and a perce

666 Jan 3, 2023

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext

1.8k Jan 3, 2023

Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.

Time series analysis today is an important cornerstone of quantitative science in many disciplines, including natural and life sciences as well as eco

129 Dec 24, 2022

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

14.5k Jan 7, 2023

A unified framework for machine learning with time series

Welcome to sktime A unified framework for machine learning with time series We provide specialized time series algorithms and scikit-learn compatible

6k Jan 6, 2023

An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

Ray provides a simple, universal API for building distributed applications. Ray is packaged with the following libraries for accelerating machine lear

23.3k Dec 31, 2022

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Horovod Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make dis

12.9k Jan 7, 2023

BigDL: Distributed Deep Learning Framework for Apache Spark

BigDL: Distributed Deep Learning on Apache Spark What is BigDL? BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can w

4.1k Jan 9, 2023

A high performance and generic framework for distributed DNN training

BytePS BytePS is a high performance and general distributed training framework. It supports TensorFlow, Keras, PyTorch, and MXNet, and can run on eith

3.3k Dec 28, 2022

LiuAlgoTrader is a scalable, multi-process ML-ready framework for effective algorithmic trading

LiuAlgoTrader is a scalable, multi-process ML-ready framework for effective algorithmic trading. The framework simplify development, testing, deployment, analysis and training algo trading strategies. The framework automatically analyzes trading sessions, and the analysis may be used to train predictive models.

458 Dec 24, 2022

Python Research Framework

106 Dec 13, 2022

A Lucid Framework for Transparent and Interpretable Machine Learning Models.

Currently a Beta-Version lucidmode is an open-source, low-code and lightweight Python framework for transparent and interpretable machine learning mod

15 Aug 12, 2022
