Picka: A Python module for data generation and randomization.

Anthony

Last update: Nov 30, 2021

Overview

Author:	Anthony Long
Version:	1.0.1 - Fixed the broken image stuff. Whoops

What is Picka?

Picka generates randomized data for testing.

Data is generated either from an included database of known-good data or by building realistic, valid values with string formatting behind the scenes.
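As a rough illustration of those two strategies, here is a minimal sketch. It does not use Picka's own code or data: the name list and both helper functions are hypothetical stand-ins chosen for this example.

```python
import random
import string

# Stand-in for a bundled list of known-good values (hypothetical data,
# not Picka's actual database).
KNOWN_FIRST_NAMES = ["Jack", "Maria", "Chen", "Aisha"]


def name_from_known_data():
    """Strategy 1: pick a value from a list of known-good data."""
    return random.choice(KNOWN_FIRST_NAMES)


def email_from_format(length=10, extension="example.org"):
    """Strategy 2: build a realistic value with string formatting."""
    local_part = "".join(random.choice(string.ascii_lowercase) for _ in range(length))
    return "{0}@{1}".format(local_part, extension)


print(name_from_known_data())
print(email_from_format(10, extension="example.org"))
```

The first helper can only return values that are already known to be valid, while the second produces values that merely have a valid shape.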

Picka has a function for most fields you would need filled in. With Selenium, something like the following would populate the "field-name-here" box 100 times with random names.

for x in range(100):
        self.selenium.type('field-name-here', picka.male_name())

But this is just the beginning. Another way to use Picka is to build dicts:

user_information = {
        "first_name": picka.male_name(),
        "last_name": picka.last_name(),
        "email_address": picka.email(10, extension='example.org'),
        "password": picka.password_numerical(6),
}

This would produce something like:

{
        "first_name": "Jack",
        "last_name": "Logan",
        "email_address": "uragnscsah@example.org",
        "password": "485444"
}

Don't forget: since all of the data is considered "clean" (valid), you can also use it to fill selects and other form fields that only accept predefined values. For example, picka.state() might return "Alabama", and you can use that result directly to select a state in an address drop-down box.

Examples:

Selenium

def search_for_garbage():
        selenium.open('http://yahoo.com')
        selenium.type('id=search_box', picka.random_string(10))
        selenium.submit()

def test_search_for_garbage_results():
        search_for_garbage()
        selenium.wait_for_page_to_load('30000')
        # get_xpath_count expects an XPath expression rather than an id= locator.
        assert int(selenium.get_xpath_count("//*[@id='results']")) == 0

Webdriver

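from selenium import webdriver
from selenium.webdriver.common.by import By

import picka

driver = webdriver.Firefox()
driver.get("http://somesite.com")
# Map each field to its CSS selector and the generated value to type into it.
x = {
        "name": [
                "#name",
                picka.name()
        ]
}
selector, value = x["name"]
driver.find_element(By.CSS_SELECTOR, selector).send_keys(value)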

Funcargs / pytest

import picka

def pytest_generate_tests(metafunc):
        # Parametrize any test that asks for "test_string" with 10 random strings.
        if "test_string" in metafunc.fixturenames:
                metafunc.parametrize(
                        "test_string",
                        [picka.random_string(20) for i in range(10)]
                )

def test_func(test_string):
        assert test_string.isalpha()
        assert len(test_string) == 20

MySQL / SQLite

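first, last, age = picka.first_name(), picka.last_name(), picka.age()
# "?" is the sqlite3 placeholder style; MySQL drivers such as MySQLdb
# and PyMySQL expect "%s" instead.
cursor.execute(
   "insert into user_data (first_name, last_name, age) VALUES (?, ?, ?)",
   (first, last, age)
)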
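For a version that can be run end to end, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The three fake_* functions are hypothetical stand-ins so the example runs without Picka installed; in real use they would be replaced by picka.first_name(), picka.last_name() and picka.age().

```python
import random
import sqlite3


# Hypothetical stand-ins for picka.first_name(), picka.last_name(), picka.age().
def fake_first_name():
    return random.choice(["Jack", "Maria", "Chen"])


def fake_last_name():
    return random.choice(["Logan", "Silva", "Okafor"])


def fake_age():
    return random.randint(18, 90)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_data (first_name TEXT, last_name TEXT, age INTEGER)"
)

# Generate five rows of test data and insert them with bound parameters.
rows = [(fake_first_name(), fake_last_name(), fake_age()) for _ in range(5)]
conn.executemany(
    "INSERT INTO user_data (first_name, last_name, age) VALUES (?, ?, ?)",
    rows,
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM user_data").fetchone()[0]
print(count)
```

Binding the generated values as parameters, rather than formatting them into the SQL string, keeps the insert safe even when a random string happens to contain quotes.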

HTTP

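import http.client  # named httplib on Python 2

import picka

def post(host, path, data):
        conn = http.client.HTTPConnection(host)
        conn.request("POST", path, body=data)
        return conn.getresponse()

def test_post_result():
        post("www.spam.egg", "/bacon.htm", picka.random_string(10))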

Comments

No test suite

Slightly ironic: a test data generation toolkit that doesn't have a test suite.

Also, setup.py doesn't declare Python 3 support, hence the need for a test suite to validate that it works correctly.

opened by jayvdb 1
Additional Functionality for Testers to Add Their Own Data

Picka provides general-purpose data for testing. Building on that work, testers could also supply their own custom test data, so test data would not be limited to the preconfigured values. The data could be accessed sequentially, randomly, or all at once.

opened by bkuehlhorn 1
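One way the request above could look is sketched here. This is not an existing Picka feature: the CustomData class and its method names are hypothetical, chosen only to show sequential, random, and complete access to tester-supplied values.

```python
import itertools
import random


class CustomData:
    """Hypothetical container for tester-supplied values (not part of Picka)."""

    def __init__(self, values):
        self._values = list(values)
        if not self._values:
            raise ValueError("values must not be empty")
        # cycle() lets sequential access wrap around after the last value.
        self._cycle = itertools.cycle(self._values)

    def sequential(self):
        """Return the next value in order, wrapping around at the end."""
        return next(self._cycle)

    def random_value(self):
        """Return one value chosen at random."""
        return random.choice(self._values)

    def everything(self):
        """Return a copy of all values."""
        return list(self._values)


cities = CustomData(["Oslo", "Lima", "Pune"])
print(cities.sequential(), cities.sequential())
print(cities.random_value())
print(cities.everything())
```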
Fixed test file, added alternative sentence maker
Fixed usage of number in tests (it takes one arg, not two)

Added sentence_actual, which returns an actual sentence from the Sherlock text.

Added _picka._Book class to hold the text and split sentences read from Sherlock. Users can call sentence() without reading the entire file again and again.

Added test of sentence_actual to picka.tests

The sentence_actual function has some nice features:

You're much less likely to get a sentence fragment

You can specify a minimum and maximum number of words

It should be relatively efficient, because the split sentences are cached by the _Book class.

The sentences aren't always perfect, but I think that has to do with the source. A book other than Sherlock Holmes, preferably one with less dialog, would give more "normal" sentences.
opened by TadLeonard 1
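The caching idea described in this change can be sketched as follows. The Book class here is a hypothetical stand-in, not the actual _picka._Book implementation, and the regular expression is a deliberately naive sentence splitter.

```python
import random
import re


class Book:
    """Hypothetical stand-in for the _Book class described above."""

    def __init__(self, text):
        self._text = text
        self._sentences = None  # filled on first use, then reused

    @property
    def sentences(self):
        if self._sentences is None:
            # Naive split: break after ".", "!" or "?" followed by whitespace.
            parts = re.split(r"(?<=[.!?])\s+", self._text.strip())
            self._sentences = [p for p in parts if p]
        return self._sentences

    def sentence(self, min_words=1, max_words=None):
        """Return a random cached sentence within the word-count bounds."""
        candidates = [
            s for s in self.sentences
            if len(s.split()) >= min_words
            and (max_words is None or len(s.split()) <= max_words)
        ]
        if not candidates:
            raise ValueError("no sentence matches the requested word count")
        return random.choice(candidates)


book = Book(
    "It was a cold morning. Holmes said nothing! "
    "Why had the visitor come so early?"
)
print(book.sentence(min_words=4, max_words=5))
```

Because the split result is stored on the instance, repeated calls to sentence() only filter the cached list instead of re-reading and re-splitting the text.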
Library does not take locale into account
The library assumes an English locale is used (e.g., English-language hardcoded month names). Ideally the library would use locale-dependent constants so that computations are done correctly (e.g., the duration of a month in month_and_day):

>>> locale.setlocale(locale.LC_ALL, 'it_IT')
'it_IT'
>>> picka.month()
'Marzo'
>>> picka.month_and_day()
'Maggio 2'
opened by svisser 0
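A possible direction for this issue, sketched with the standard library only: pick the month by number, take its length from calendar.monthrange, and look up the display name through calendar.month_name, which follows the current locale. The function names and the fixed default year are assumptions for this example, not Picka's API.

```python
import calendar
import random


def pick_month_and_day(year=2021):
    """Pick a month number and a day that is valid for that month."""
    month = random.randint(1, 12)
    # monthrange returns (weekday of first day, number of days in month).
    days_in_month = calendar.monthrange(year, month)[1]
    day = random.randint(1, days_in_month)
    return month, day


def month_and_day(year=2021):
    """Format the pick using the month name from the current locale."""
    month, day = pick_month_and_day(year)
    return "{0} {1}".format(calendar.month_name[month], day)


print(month_and_day())
```

Since the day range is derived from the month number rather than from an English month name, the computation no longer depends on which language the name is displayed in.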
picka.age will return ages outside of the bounds

If I call picka.age(1, 1) repeatedly I get 1 and 2 as results. I would have expected it to always return 1. Note that this situation can occur when passing variables to picka.age, I don't expect people to write this in their code themselves.

I can also get ages outside of the bounds when I call picka.age(0, 1) which resorts to using the default values and can therefore return any age within the default values.

opened by svisser 0
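The behavior the reporter expects can be sketched as below. This is not Picka's implementation: bounded_age and its default bounds are hypothetical. The key points are that both bounds are inclusive and that the bounds are compared explicitly, so a minimum of 0 is honored instead of being treated as "not provided".

```python
import random


def bounded_age(min_age=1, max_age=99):
    """Return a random age with min_age <= age <= max_age (hypothetical helper)."""
    if min_age > max_age:
        raise ValueError("min_age must not exceed max_age")
    # randint includes both endpoints, so bounded_age(1, 1) is always 1.
    return random.randint(min_age, max_age)


print(bounded_age(1, 1))
print(bounded_age(0, 1))
```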
Module name means "cunt"

I'm not sure if this is a real issue, but when I look at this module I cannot do so with a straight face. "Picka" is "cunt" in Serbian, Macedonian, Bosnian, Croatian, and I'm unsure as to whether there are other languages where this holds.

While not grounds for any specific action, I find this largely amusing and just wanted to share.

opened by geomaster 2

Releases(v0.96)

v0.96(Jan 17, 2014)

hex, rbg, image and more.
Source code(tar.gz)
Source code(zip)
picka-0.9.6.tar.gz(8.13 MB)
picka-0.9.6.zip(8.18 MB)

Owner

Anthony

GitHub http://antlong.com
