Brownant is a web data extracting framework.

Douban Inc.

Last update: Jan 6, 2022


Overview

Brownant

Brownant is a lightweight web data extracting framework.

Who uses it?

At the moment, dongxi.douban.com (a.k.a. Douban Dongxi) uses Brownant in its production environment.

Installation

$ pip install brownant

Issues

If you want to report a bug or request a feature, please open an issue on GitHub Issues.

Contributing

You can send a pull request on GitHub.


Comments

Provide some example code?

Right now, in order to understand the purpose of the framework ("data extracting framework" is accurate enough, but it is a vague term without examples), we have to click through "Documents" and then go to this page. The usage and purpose of this framework should be displayed explicitly in README.md, and the demo scripts should be put in the repository. My two cents.
enhancement

opened by CNBorn 3

change the lib desc: crawling -> extracting.

At the suggestion of @CNBorn, this branch changes the library description to use "extracting" instead of "crawling".

@CNBorn @VeryCB Please review my changeset. Thank you.

opened by tonyseek 3

fix using lxml and requests gets unicode error.

When I run the declarative demo, I get this error: "Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration."

Using content (the raw bytes of the response) instead of text makes it work :-)

opened by Caratpine 1

Known Bugs
[x] A UnicodeDecodeError is raised when path_info includes non-ASCII characters. Fix: convert it into an ASCII string with urllib.quote.

[x] url.hostname may be None for invalid URLs. Fix: reject such invalid input.

bug
opened by tonyseek 1
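
The two fixes listed under Known Bugs can be sketched with the standard library alone. This is a minimal illustration, not Brownant's actual code: the helper name normalize_url is hypothetical, and it uses Python 3's urllib.parse.quote where the issue refers to Python 2's urllib.quote.

```python
from urllib.parse import quote, urlparse


def normalize_url(raw_url):
    """Return (hostname, ascii_path) for raw_url, or raise ValueError.

    Hypothetical helper for illustration; not part of Brownant's API.
    """
    parsed = urlparse(raw_url)
    # urlparse does not raise on malformed input; hostname is simply None
    # when the string has no network location (e.g. "not-a-url").
    if parsed.hostname is None:
        raise ValueError("URL has no hostname: %r" % raw_url)
    # Percent-encode non-ASCII characters as UTF-8 bytes. "/" is left alone,
    # and "%" is kept so an already-quoted path is not quoted twice.
    ascii_path = quote(parsed.path, safe="/%")
    return parsed.hostname, ascii_path


if __name__ == "__main__":
    print(normalize_url("http://www.example.com/item/东西"))
    try:
        normalize_url("not-a-url")
    except ValueError as error:
        print("rejected:", error)
```

Rejecting the URL before any routing happens keeps the None check in one place, and quoting the path up front means later string handling only ever sees ASCII.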

Releases(0.1.6)

0.1.6(Jul 9, 2015)

Now you can parse a JSON response with JSONResponseProperty.

Owner

Douban Inc.

GitHub https://pypi.python.org/pypi/brownant/

