📰 Newsemble 📰
An API for fetching the current news.
🔖 About 🔖
Newsemble is an API that provides easy access to current news for programmatic analysis. It is built with Python, BeautifulSoup, and MongoDB.
Articles are scraped from the supported news websites every hour and stored in a cloud-hosted MongoDB database. When a request comes in, the most recent articles are served.
Developers can use this API to fetch current news. Each article has the following fields:
Headline, Content, Source, Link, and Time.
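To illustrate the "store, then serve the most recent articles" step, here is a minimal sketch that swaps MongoDB for an in-memory list. The `ArticleStore` class and its methods are hypothetical and are not taken from db.py. The sketch assumes each article's `time` is a `datetime`, so newer articles sort later.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

Article = Dict[str, Any]  # keys: title, content, source, link, time


class ArticleStore:
    """Hypothetical in-memory stand-in for the MongoDB article collection."""

    def __init__(self) -> None:
        self._articles: List[Article] = []

    def insert_many(self, articles: List[Article]) -> None:
        """Add a batch of freshly scraped articles."""
        self._articles.extend(articles)

    def latest(self, source: Optional[str] = None, limit: int = 10) -> List[Article]:
        """Return up to `limit` articles, newest first, optionally from one source.

        With PyMongo a comparable query would be roughly
        collection.find(query).sort("time", pymongo.DESCENDING).limit(limit).
        """
        matches = [a for a in self._articles if source is None or a["source"] == source]
        return sorted(matches, key=lambda a: a["time"], reverse=True)[:limit]


# Example: two scraping runs (articles trimmed to the fields used here),
# then a request for the newest articles.
store = ArticleStore()
store.insert_many([
    {"title": "Older story", "source": "The Hindu", "time": datetime(2021, 5, 1, 9)},
    {"title": "Newest story", "source": "India Today", "time": datetime(2021, 5, 1, 11)},
])
store.insert_many([
    {"title": "Middle story", "source": "The Hindu", "time": datetime(2021, 5, 1, 10)},
])
print([a["title"] for a in store.latest()])
# ['Newest story', 'Middle story', 'Older story']
```

Sorting at read time keeps the sketch short; a real database would sort with an index on `time` instead.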
💻 Technologies
Newsemble is created with:
- Python 3
- Flask
- PyMongo
- BeautifulSoup
📂 File Structure and Description
- app.py - Flask application that serves the API
- scraper.py - Collection of scrapers for the supported news sites
- db.py - Connecting to and working with MongoDB
- utils.py - Utility functions
- scheduler.py - Scheduler for the hourly scraping runs
- Procfile - Process declaration used for deployment
- requirements.txt - Python requirements
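Each scraper has to pull headlines and links out of a site's HTML. The project uses BeautifulSoup for this (there, something like `soup.select("h2 a")` would do the matching). The sketch below uses the standard library's `html.parser` instead, so it runs without extra packages. The `<h2><a href=...>` layout, `HeadlineLinkParser`, and `extract_headlines` are hypothetical; every real site needs its own rules, which is why the project keeps a separate scraper per site.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class HeadlineLinkParser(HTMLParser):
    """Collect (headline, href) pairs from <a> tags nested inside <h2> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.results: List[Tuple[str, str]] = []
        self._in_heading = False
        self._href: Optional[str] = None  # set while inside a headline link
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_heading = True
        elif tag == "a" and self._in_heading:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())  # collapse whitespace
            if title:
                self.results.append((title, self._href))
            self._href = None
        elif tag == "h2":
            self._in_heading = False


def extract_headlines(html: str, base_url: str) -> List[Tuple[str, str]]:
    """Return (headline, absolute link) pairs found in `html`."""
    parser = HeadlineLinkParser()
    parser.feed(html)
    parser.close()
    return [(title, urljoin(base_url, href)) for title, href in parser.results]


SAMPLE_HTML = (
    '<h2><a href="/story/1">  Monsoon arrives  early </a></h2>'
    '<p><a href="/about">About</a></p>'
)
print(extract_headlines(SAMPLE_HTML, "https://example.com/news/"))
# [('Monsoon arrives early', 'https://example.com/story/1')]
```

Links outside headline elements (such as the "About" link) are ignored, and relative links are resolved against the page URL.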
🛠️ Pipeline
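The pipeline scrapes the news sites every hour, stores the results, and serves them on request. The scheduling part could look roughly like the loop below. This is a minimal standard-library sketch, not the contents of scheduler.py, and `run_periodically` and its `max_runs` parameter are hypothetical names. Scheduling against `time.monotonic()` keeps runs from drifting later when a scrape is slow, and catching exceptions keeps one failed scrape from stopping the schedule.

```python
import time
from typing import Callable, Optional


def run_periodically(
    job: Callable[[], None],
    interval_seconds: float = 3600.0,  # one hour, matching the scraping frequency
    max_runs: Optional[int] = None,  # None = run forever; a number is handy for tests
) -> int:
    """Call `job` every `interval_seconds` and return how many runs completed."""
    runs = 0
    next_run = time.monotonic()
    while max_runs is None or runs < max_runs:
        if runs:
            # Wait for the next slot, measured from the schedule rather than from
            # when the previous job finished, so slow scrapes do not cause drift.
            next_run += interval_seconds
            time.sleep(max(0.0, next_run - time.monotonic()))
        try:
            job()
        except Exception as exc:  # log and keep the schedule alive
            print(f"scheduled job failed: {exc!r}")
        runs += 1
    return runs
```

In a deployment, `run_periodically(scrape_and_store)` would run forever, where `scrape_and_store` is a hypothetical function that runs every scraper and writes the results to MongoDB.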
🚀 Getting started
The API can be accessed through the following endpoints.
Links

| Endpoint | Description |
|---|---|
| www.newsemble.ml/news | Fetch articles from all sources |
| www.newsemble.ml/news/toi | Fetch articles from Times of India |
| www.newsemble.ml/news/th | Fetch articles from The Hindu |
| www.newsemble.ml/news/tie | Fetch articles from The Indian Express |
| www.newsemble.ml/news/ndtv | Fetch articles from NDTV |
| www.newsemble.ml/news/it | Fetch articles from India Today |
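When working with a single source, it can help to build endpoint URLs from the source codes in the table rather than typing them by hand. The helper below is a hypothetical convenience, not part of the API, and it assumes the `http://` scheme used in the request example.

```python
from typing import Optional

BASE_URL = "http://www.newsemble.ml/news"

# Source codes and names taken from the endpoint table.
SOURCES = {
    "toi": "Times of India",
    "th": "The Hindu",
    "tie": "The Indian Express",
    "ndtv": "NDTV",
    "it": "India Today",
}


def endpoint_for(source: Optional[str] = None) -> str:
    """Return the endpoint for one source code, or for all sources if None."""
    if source is None:
        return BASE_URL
    if source not in SOURCES:
        raise ValueError(f"unknown source {source!r}; expected one of {sorted(SOURCES)}")
    return f"{BASE_URL}/{source}"


print(endpoint_for("th"))  # http://www.newsemble.ml/news/th
```

Validating the code locally turns a typo such as `"hindu"` into an immediate error instead of a failed request.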
Request format
import requests

url = "http://www.newsemble.ml/news/"
articles = requests.get(url).json()  # latest articles from all sources
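The one-liner above sets no timeout, so a request to an unresponsive server can hang indefinitely, and an error page that is not JSON would make `.json()` raise. A slightly more defensive sketch follows. It uses the standard library's `urllib.request` so it has no extra dependencies (with requests, `requests.get(url, timeout=10)` followed by `response.raise_for_status()` plays the same role). `fetch_json` and `try_fetch_json` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Optional


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """GET `url` and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read())


def try_fetch_json(url: str, timeout: float = 10.0) -> Optional[Any]:
    """Like fetch_json, but return None instead of raising on failure."""
    try:
        return fetch_json(url, timeout)
    except OSError as exc:  # URLError, HTTPError and timeouts all subclass OSError
        print(f"request to {url} failed: {exc}")
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        print(f"{url} did not return valid JSON: {exc}")
    return None
```

Usage: `articles = try_fetch_json("http://www.newsemble.ml/news/")`, then check `if articles is not None:` before processing the result.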
Response format
Each article is returned as a JSON object of the form:
{
  "link": <source_link>,
  "content": <content_text>,
  "source": <news_source>,
  "title": <headline>,
  "time": <date_time_of_article>
}
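To work with these records in Python, it can be convenient to convert each one into a typed object and fail early if a field is missing. The `Article` dataclass and `parse_articles` function below are a hypothetical sketch. `time` is kept as a plain string because its exact format is not specified here, and `parse_articles` assumes the response body is a list of such objects.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List


@dataclass(frozen=True)
class Article:
    title: str
    content: str
    source: str
    link: str
    time: str  # raw value from the API; its format is not documented here

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Article":
        """Build an Article from one response object, rejecting incomplete ones."""
        names = [f.name for f in fields(cls)]
        missing = [name for name in names if name not in data]
        if missing:
            raise KeyError(f"article is missing fields: {missing}")
        return cls(**{name: str(data[name]) for name in names})


def parse_articles(payload: List[Dict[str, Any]]) -> List[Article]:
    """Convert a list of response objects into Article instances."""
    return [Article.from_dict(item) for item in payload]


sample = [{
    "link": "https://example.com/story",
    "content": "Full text of the story",
    "source": "The Hindu",
    "title": "Sample headline",
    "time": "2021-05-01 10:00",
}]
articles = parse_articles(sample)
print(articles[0].title)  # Sample headline
```

Making the dataclass frozen keeps parsed articles immutable, which suits data that is only read after it has been fetched.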
⚙️ Currently Supported Sites
- Times of India
- The Hindu
- The Indian Express
- NDTV
- India Today
🙏 Thanks!
All contributions are welcome and appreciated.
If you liked this project or found it useful in any way, please drop a ⭐.
✍️ Authors ✍️