Automatically download arXiv daily papers and crop out their key information (CPU version).

HeoLis

Last update: Jul 30, 2022

FocusAX

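Filter the latest daily arXiv papers by keyword, or search arXiv directly.

Automatically downloads papers, extracts their abstracts, and crops out the tables and figures in the text.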

Install the required environment

Install PaddlePaddle

# GPU installation
python3 -m pip install paddlepaddle-gpu==2.1.1 -i https://mirror.baidu.com/pypi/simple

# CPU installation
python3 -m pip install paddlepaddle==2.1.1 -i https://mirror.baidu.com/pypi/simple

Install Layout-Parser

pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
pip install "paddleocr>=2.2"

Install the other required packages

pip3 install -r requirements.txt

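Download the model weights
Download and extract the PubLayNet model, then place it in the paperparse directory. The directory structure should look like this: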

FocusAX
    - paperparse
        - ppyolov2_r50vd_dcn_365e_publaynet
            - inference.pdiparams
            - inference.pdiparams.info
            - inference.pdmodel
        - ...
    - downloader
        - ...
    - utils
        - ...
    - configs.py
    - focus_daily.py
    - focus_search.py
    - README.py
    - ...
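To confirm the weights are in the right place before running anything, a small check like the one below can help. It is a sketch and not part of FocusAX: the function name `missing_model_files` is hypothetical, and it assumes only the directory layout shown above.

```python
from pathlib import Path
from typing import List

# File names taken from the directory tree above.
REQUIRED_FILES = (
    "inference.pdiparams",
    "inference.pdiparams.info",
    "inference.pdmodel",
)


def missing_model_files(project_root: str = ".") -> List[str]:
    """Return the expected model files that are absent (an empty list means all present)."""
    model_dir = Path(project_root) / "paperparse" / "ppyolov2_r50vd_dcn_365e_publaynet"
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("PubLayNet model files found.")
```

Run it from the FocusAX root; it only inspects the file system and does not load the model.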

Usage

configs.py: configuration file for the program's parameters

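# =============== Network proxy ================
# proxy = None  # do not use a proxy
proxy = {"http": "socks5://127.0.0.1:8080", "https": "socks5://127.0.0.1:8080"}
# =============== Root directory for saved files ================
root_path = "./arxiv"
# =============== DNN model inference settings ================
threshold = 0.5        # minimum detection confidence
enable_mkldnn = True   # use MKL-DNN acceleration on CPU
enforce_cpu = True     # run inference on the CPU
thread_num = 4         # number of CPU threads for inference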
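The `threshold` setting is presumably the minimum confidence a layout detection needs before its region is cropped. The sketch below illustrates that idea under stated assumptions: the `Detection` record and the `crop_boxes` helper are hypothetical (FocusAX's actual detector output format may differ), the padding value is arbitrary, and only the class names come from the PubLayNet dataset (Text, Title, List, Table, Figure).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """Hypothetical layout-detector output for one region of a rendered PDF page."""
    label: str                              # PubLayNet class: Text, Title, List, Table or Figure
    score: float                            # confidence in [0, 1]
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in page-image pixels


def crop_boxes(detections: List[Detection], page_w: int, page_h: int,
               threshold: float = 0.5, keep=("Table", "Figure"), pad: int = 5):
    """Return padded integer crop boxes, clamped to the page, for confident tables/figures."""
    boxes = []
    for det in detections:
        # Drop unwanted classes and anything below the confidence threshold.
        if det.label not in keep or det.score < threshold:
            continue
        x1, y1, x2, y2 = det.box
        boxes.append((
            max(0, int(x1) - pad),
            max(0, int(y1) - pad),
            min(page_w, int(x2) + pad),
            min(page_h, int(y2) + pad),
        ))
    return boxes


dets = [
    Detection("Figure", 0.90, (10.2, 20.7, 300.0, 400.0)),
    Detection("Table", 0.30, (0, 0, 50, 50)),   # below threshold, dropped
    Detection("Text", 0.99, (0, 0, 10, 10)),    # not a table or figure, dropped
    Detection("Table", 0.80, (2, 500, 598, 790)),
]
print(crop_boxes(dets, page_w=600, page_h=800))
# → [(5, 15, 305, 405), (0, 495, 600, 795)]
```

Raising `threshold` trades missed figures for fewer spurious crops, which matters most on papers with unusual layouts.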

focus_daily.py: filter the papers on arXiv daily by keyword (current day only)

if __name__ == '__main__':
    key_words = ['GAN']  # keywords to include
    subject_words = ['ML', 'CV', 'AI']  # subject categories to include
    start_parse(key_words, subject_words, needPDF=True, needZip=False)
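One plausible way the two lists combine is that a paper is kept when its title or abstract mentions any keyword and one of its arXiv categories matches any subject. The following is a hypothetical sketch of that rule, not FocusAX's actual code; the `matches` function and the paper fields (`title`, `abstract`, `categories`) are assumptions.

```python
import re


def matches(paper: dict, key_words, subject_words) -> bool:
    """Hypothetical daily filter: any keyword in title/abstract AND any subject in categories."""
    text = f"{paper['title']} {paper['abstract']}"
    # Leading word boundary only: "GAN" matches "GANs" but not "organ" or "began".
    has_keyword = any(
        re.search(r"\b" + re.escape(k), text, flags=re.IGNORECASE) for k in key_words
    )
    # arXiv categories look like "cs.CV" or "stat.ML"; compare the part after the dot.
    tags = {c.split(".")[-1] for c in paper["categories"]}
    has_subject = any(s in tags for s in subject_words)
    return has_keyword and has_subject


papers = [
    {"title": "Training GANs faster", "abstract": "", "categories": ["cs.CV", "cs.LG"]},
    {"title": "Organ segmentation", "abstract": "We began with CT scans.", "categories": ["cs.CV"]},
    {"title": "A GAN for speech", "abstract": "", "categories": ["eess.AS"]},
]
selected = [p["title"] for p in papers if matches(p, ["GAN"], ["ML", "CV", "AI"])]
print(selected)  # → ['Training GANs faster']
```

Plain substring matching would also select the second paper ("Organ", "began"), which is why the sketch anchors each keyword at a word boundary.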

focus_search.py: search arXiv by keyword

start_parse('Keyword')
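For reference, a keyword search can be expressed against the public arXiv API as shown below. Whether `focus_search.py` uses this API or the arXiv website's search page is not stated here, so treat `build_search_url` as a hypothetical illustration; the query parameters themselves (`search_query`, `start`, `max_results`, `sortBy`, `sortOrder`) are those documented for the arXiv API.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_search_url(keyword: str, start: int = 0, max_results: int = 50) -> str:
    """Build an arXiv API query URL for `keyword`, newest submissions first."""
    # Quote multi-word keywords so they are searched as a phrase.
    term = f'"{keyword}"' if " " in keyword else keyword
    params = {
        "search_query": f"all:{term}",
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


print(build_search_url("GAN", max_results=10))
# → http://export.arxiv.org/api/query?search_query=all%3AGAN&start=0&max_results=10&sortBy=submittedDate&sortOrder=descending
```

The API returns an Atom feed, so the response would still need to be parsed before filtering or downloading.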

New folders are created under the root_path directory to store the results.
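The exact folder naming scheme is not documented. As an assumption, a per-day folder holding one sub-folder per paper could be created as follows; `safe_name` and `make_paper_dir` are hypothetical helpers that replace characters that are not allowed in file names.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Characters that are not allowed in Windows file names (plus "/" on Unix).
_ILLEGAL = re.compile(r'[\\/:*?"<>|]+')


def safe_name(title: str, max_len: int = 80) -> str:
    """Turn a paper title into a usable folder name."""
    name = _ILLEGAL.sub("_", title)
    name = re.sub(r"\s+", " ", name).strip()  # collapse newlines and runs of spaces
    return name[:max_len].rstrip() or "untitled"


def make_paper_dir(root_path: str, title: str, day: Optional[date] = None) -> Path:
    """Create <root_path>/<YYYY-MM-DD>/<safe title>/ (hypothetical layout) and return it."""
    day = day or date.today()
    folder = Path(root_path) / day.isoformat() / safe_name(title)
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

For example, `make_paper_dir("./arxiv", "Pix2Pix: Image-to-Image Translation")` would create a folder named `Pix2Pix_ Image-to-Image Translation` under today's date.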

Example results

The abs.md file in each folder contains the summary of the corresponding PDF; open it with a Markdown editor such as Typora.
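A summary file of this kind can be produced with a few lines of Python. The layout below (title, authors, abstract, then the embedded crops) and the helper `write_abs_md` are assumptions for illustration, not the format FocusAX actually writes.

```python
from pathlib import Path


def write_abs_md(folder, title, authors, abstract, images):
    """Write <folder>/abs.md; `images` are crop file names stored next to it (hypothetical)."""
    lines = [
        f"# {title}",
        "",
        f"**Authors:** {', '.join(authors)}",
        "",
        "## Abstract",
        "",
        abstract.strip(),
        "",
    ]
    if images:
        lines += ["## Figures and tables", ""]
        # Relative links let Typora render the crops; names with spaces would need <...>.
        lines += [f"![{name}]({name})" for name in images]
    path = Path(folder) / "abs.md"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Using relative image paths keeps each paper folder self-contained, so it can be moved or shared without breaking the embedded figures.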

PS: papers with non-standard layouts can produce messy crops.

Other

Server-side inference version (separate frontend and backend): https://github.com/wmpscc/ArxivDailyOverview
