A tool for exporting Telegram group chats into static websites, preserving chat history like mailing list archives.

Overview

tg-archive is a tool for exporting Telegram group chats into static websites, preserving chat history like mailing list archives.

Preview

The @fossunited Telegram group archive.

How it works

tg-archive uses the Telethon Telegram API client to periodically sync messages from a group to a local SQLite database file, downloading only new messages since the last sync. It can then generate a static archive website of messages which can be published anywhere.
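
The incremental part of this flow can be sketched with the standard `sqlite3` module. This is only an illustration of "store what is newer than the last stored message"; the `messages` table, its columns, and the `fetch_after` callable are assumptions made for the example and are not tg-archive's actual schema or API.

```python
import sqlite3

def last_synced_id(conn):
    """Return the highest stored message id, or 0 if the table is empty."""
    row = conn.execute("SELECT COALESCE(MAX(id), 0) FROM messages").fetchone()
    return row[0]

def sync_new(conn, fetch_after):
    """Store only messages newer than what is already in the database.

    fetch_after is a hypothetical callable standing in for the Telegram
    API client: given a message id, it returns (id, text) tuples with
    larger ids.
    """
    start = last_synced_id(conn)
    new = list(fetch_after(start))
    conn.executemany("INSERT INTO messages (id, text) VALUES (?, ?)", new)
    conn.commit()
    return len(new)

# Demo with an in-memory database and a fake message source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, text TEXT)")
remote = [(1, "hello"), (2, "world"), (3, "again")]
fake_fetch = lambda after: [m for m in remote if m[0] > after]

first = sync_new(conn, fake_fetch)   # stores all three messages
second = sync_new(conn, fake_fetch)  # nothing new to store
```

Because the starting point is read back from the database on every run, an interrupted sync can simply be started again.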

Features

  • Periodically sync Telegram group messages to a local DB.
  • Download user avatars locally.
  • Download and embed media (files, documents, photos).
  • Renders poll results.
  • Use emoji alternatives in place of stickers.
  • Single file Jinja HTML template for generating the static site.
  • Year / Month / Day indexes with deep linking across pages.
  • "In reply to" on replies with links to parent messages across pages.
  • RSS / Atom feed of recent messages.

Install

  • Get Telegram API credentials. These must be for the normal user account API, not the Bot API.
  • Install with pip3 install tg-archive (tested with Python 3.8.6).

Usage

  1. tg-archive --new --path=mysite (creates a new site. cd into mysite and edit config.yaml).
  2. tg-archive --sync (syncs data into data.sqlite). Note: The first connection will prompt for your phone number and a Telegram auth code sent to the app. On successful auth, a session.session file is created. DO NOT SHARE this session file publicly as it contains the API authorization for your account.
  3. tg-archive --build (builds the static site into the site directory, which can be published)

Customization

Edit the generated template.html and static assets in the ./static directory to customize the site.

Note

  • The sync can be stopped (Ctrl+C) any time to be resumed later.
  • Set up a cron job to periodically sync messages and re-publish the archive.
  • Downloading large media files and long message history from large groups continuously may run into Telegram API's rate limits. Watch the debug output.
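
For the cron job mentioned above, a small wrapper can run the two documented commands in order. The sketch below assumes it is pointed at the site directory created with `--new` (the one holding config.yaml); the function name and the injectable `run` parameter are illustrative and not part of tg-archive.

```python
import subprocess

def sync_and_build(site_dir, run=subprocess.run):
    """Run `tg-archive --sync` then `tg-archive --build` inside site_dir.

    `run` is injectable so the sequence can be exercised without
    tg-archive installed; it defaults to subprocess.run.
    """
    commands = [["tg-archive", "--sync"], ["tg-archive", "--build"]]
    for cmd in commands:
        # check=True stops the sequence if a step exits non-zero,
        # so a failed sync is never followed by a build.
        run(cmd, cwd=site_dir, check=True)
    return commands

# Example call from a script scheduled by cron:
#     sync_and_build("/path/to/mysite")
```

Cron jobs often run with a different PATH and interpreter than an interactive shell, so it can help to log which `tg-archive` binary the job actually picks up.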

Licensed under the MIT license.

Comments
  • Why does running with crontab show sqlite3.OperationalError: near "(": syntax error?

    The script works well from the command line, but when run from cron it shows the following error.

    2022-05-08 19:40:02,089: cryptg detected, it will be used for encryption
    2022-05-08 19:40:02,596: starting Telegram sync (batch_size=4000, limit=0, wait=5, mode=standard)
    2022-05-08 19:40:02,601: Connecting to 91.108.56.108:443/TcpFull...
    2022-05-08 19:40:02,606: Connection to 91.108.56.108:443/TcpFull complete!
    2022-05-08 19:40:02,688: fetching from last message id=2102 (2022-05-07 00:00:00)
    2022-05-08 19:40:02,875: finished. fetched 0 messages. last message = 2022-05-07 00:00:00
    2022-05-08 19:40:28,689: building site
    Traceback (most recent call last):
      File "/usr/local/bin/tg-archive", line 11, in <module>
        load_entry_point('tg-archive==0.5.4', 'console_scripts', 'tg-archive')()
      File "/usr/local/lib/python3.6/site-packages/tgarchive/__init__.py", line 150, in main
        b.build()
      File "/usr/local/lib/python3.6/site-packages/tgarchive/build.py", line 58, in build
        for d in self.db.get_dayline(month.date.year, month.date.month, self.config["per_page"]):
      File "/usr/local/lib/python3.6/site-packages/tgarchive/db.py", line 128, in get_dayline
        """, (limit, "{}{:02d}".format(year, month)))
    sqlite3.OperationalError: near "(": syntax error

    opened by babynew 14
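
One possible explanation, offered as an assumption rather than a confirmed diagnosis, is that cron launches a different Python interpreter that is linked against an older SQLite library. A `near "(": syntax error` is a typical symptom when a query uses syntax the library predates, for example window functions, which were added in SQLite 3.25.0. Whether the failing query relies on them is not shown in the report. The sketch below prints what the running interpreter is using, so the output from cron and from the shell can be compared.

```python
import sqlite3
import sys

def sqlite_report():
    """Describe the interpreter and the SQLite library it is linked against."""
    return {
        "python": sys.executable,
        "python_version": sys.version.split()[0],
        "sqlite_version": sqlite3.sqlite_version,
    }

def supports_window_functions(version=sqlite3.sqlite_version):
    """Window functions were added in SQLite 3.25.0."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts >= (3, 25, 0)

print(sqlite_report())
print("window functions supported:", supports_window_functions())
```

Running the same script once from an interactive shell and once from the cron entry shows whether the two environments differ.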
  • Can I use this repo to export public channel chats?

    Hello, I have joined about 30 public channels and want to export the chats of 2 of them every day. I am using Windows 10 and Telegram Desktop 4.0.1. I can export channel chats by hand from Telegram Desktop by setting 6 different parameters, but doing this manually is tedious. I want to know whether I can use this repo to run the export from a Python script that I can call later, so I don't have to do the job by hand every day. Please advise how to provide the 6 parameters so I can use the repo to export one public channel's chats. Thanks,

    opened by zydjohnHotmail 9
  • Messages from foreign channel

    While archiving messages from a public channel, messages from another channel, @DebugSchool, were included in the built website. Is this behaviour expected? And how can these messages be removed?

    With the introduction of sponsors, I'm afraid that these might be sponsor messages. If that is the case, I believe an option to exclude them when building the website, or at least the RSS feed, would be a good idea.

    enhancement 
    opened by Farzat07 7
  • skipping admin posts

    Hello,

    My tg-archive seems to have been skipping admin posts ever since a group I follow got multiple admins. The admins post under NAMEOFGROUP (Admin1) or NAMEOFGROUP (Admin2).

    Tg-archive doesn't seem to raise an exception, it just says fetched 0 messages when I point to the post in particular.

    Any ideas?

    Thanks in advance

    opened by caribmorn 6
  • The phone number is invalid (caused by SendCodeRequest) and no data found to publish site

    Running into this error when running tg-archive --sync

    Traceback (most recent call last):
    File "/home/lode/.local/bin/tg-archive", line 11, in <module>
    sys.exit(main())
    File "/home/lode/.local/lib/python3.6/site-packages/tgarchive/__init__.py", line 119, in main
    Sync(cfg, args.session, db.DB(args.data)).sync(args.id)
    File "/home/lode/.local/lib/python3.6/site-packages/tgarchive/sync.py", line 33, in __init__
    self.client.start()
    File "/home/lode/.local/lib/python3.6/site-packages/telethon/client/auth.py", line 133, in start
    else self.loop.run_until_complete(coro)
    File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
    return future.result()
    File "/home/lode/.local/lib/python3.6/site-packages/telethon/client/auth.py", line 189, in _start
    await self.send_code_request(phone, force_sms=force_sms)
    File "/home/lode/.local/lib/python3.6/site-packages/telethon/client/auth.py", line 515, in send_code_request
    phone, self.api_id, self.api_hash, types.CodeSettings()))
    File "/home/lode/.local/lib/python3.6/site-packages/telethon/client/users.py", line 30, in __call__
    return await self._call(self._sender, request, ordered=ordered)
    File "/home/lode/.local/lib/python3.6/site-packages/telethon/client/users.py", line 79, in _call
    result = await future
    telethon.errors.rpcerrorlist.PhoneNumberInvalidError: The phone number is invalid (caused by SendCodeRequest)
    

    The session.session file is created though. Running tg-archive --build gives:

    2021-05-25 17:02:31,213: building site
    2021-05-25 17:02:31,236: no data found to publish site

    Do I need to paste the group ID with "@" in the yaml file? This is my yaml file, do I need to edit something?:

    # Telegram API ID and hash from the Telegram dev portal.
    # Signup for it here: https://my.telegram.org/auth?to=apps
    api_id: "redacted"
    api_hash: "redacted"
    # Telegram channel / group name to import. Your user account
    # that was used to create the API ID should be a member of this group.
    group: "redacted"
    # Avatars and media will be downloaded into media_dir.
    download_media: True
    download_avatars: True
    avatar_size: [64, 64] # Width, Height.
    media_dir: "media"
    # These should be configured carefully to not get rate limited by Telegram.
    # Number of messages to fetch in one batch.
    fetch_batch_size: 2000
    # Seconds to wait after fetching one full batch and moving on to the next one.
    fetch_wait: 5
    # Max number of messages to fetch across all batches before stopping.
    # This should be greater than fetch_batch_size.
    # Set to 0 to never stop until all messages have been fetched.
    fetch_limit: 0
    publish_dir: "site"
    static_dir: "static"
    per_page: 500
    show_day_index: True
    # URL to link Telegram group names and usernames.
    telegram_url: "https://t.me/{id}"
    # IMPORTANT: Telegram shows the full name on your (API key holder's)
    # phonebook for users who are in your phonebook.
    show_sender_fullname: False
    publish_rss_feed: True
    rss_feed_entries: 100 # Show Latest N messages in the RSS feed.
    # Root URL where the site will be hosted. No trailing slash.
    site_url: "https://mysite.com"
    site_name: "@{group} - Telegram group archive"
    site_description: "Public archive of Telegram messages."
    meta_description: "@{group} {date} - Telegram message archive."
    page_title: "Page {page} - {date} @{group} Telegram message archive."
                                                                                                                                                                                                                              
    
    question 
    opened by Lod3 6
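
The comments in the config above state a constraint between the batch settings: fetch_limit should be either 0 (no limit) or greater than fetch_batch_size. A minimal check of that rule could look like the sketch below. It works on a plain dict with the same keys, the function name is hypothetical, and it is not something tg-archive itself is known to run.

```python
def check_fetch_settings(cfg):
    """Return a list of warnings for the batch-related settings."""
    problems = []
    batch = cfg.get("fetch_batch_size", 0)
    limit = cfg.get("fetch_limit", 0)
    wait = cfg.get("fetch_wait", 0)
    if batch <= 0:
        problems.append("fetch_batch_size must be positive")
    # 0 means "no limit"; any other value should exceed the batch size.
    if limit != 0 and limit <= batch:
        problems.append("fetch_limit should be 0 or greater than fetch_batch_size")
    if wait < 0:
        problems.append("fetch_wait must not be negative")
    return problems

ok = check_fetch_settings({"fetch_batch_size": 2000, "fetch_wait": 5, "fetch_limit": 0})
bad = check_fetch_settings({"fetch_batch_size": 2000, "fetch_wait": 5, "fetch_limit": 100})
```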
  • AttributeError: 'Channel' object has no attribute 'bot'

    tg-archive --sync

    This was the second run. On the first run I entered my phone number and the auth code.

    2021-03-24 12:04:10,638: starting Telegram sync (batch_size=2000, limit=0, wait=5)
    2021-03-24 12:04:10,655: Connecting to 149.154.167.51:443/TcpFull...
    2021-03-24 12:04:10,703: Connection to 149.154.167.51:443/TcpFull complete!
    Traceback (most recent call last):
      File "/usr/local/bin/tg-archive", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.8/site-packages/tgarchive/__init__.py", line 112, in main
        Sync(cfg, args.session, db.DB(args.data)).sync(args.id)
      File "/usr/local/lib/python3.8/site-packages/tgarchive/sync.py", line 57, in sync
        for m in self._get_messages(self.config["group"],
      File "/usr/local/lib/python3.8/site-packages/tgarchive/sync.py", line 137, in _get_messages
        user=self._get_user(m.sender),
      File "/usr/local/lib/python3.8/site-packages/tgarchive/sync.py", line 143, in _get_user
        if u.bot:
    AttributeError: 'Channel' object has no attribute 'bot'
    
    bug 
    opened by volyx 6
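
The traceback shows `u.bot` being read on a sender that turned out to be a channel rather than a user. One defensive pattern, shown here only as a sketch with stand-in classes and not as the fix applied in tg-archive, is to read the attribute with a default so that senders without a `bot` field are treated as non-bots.

```python
from dataclasses import dataclass

@dataclass
class FakeUser:      # stand-in for a user sender
    id: int
    bot: bool = False

@dataclass
class FakeChannel:   # stand-in for a channel sender; note: no `bot` field
    id: int
    title: str = ""

def is_bot(sender):
    """True only when the sender exists and has a truthy `bot` attribute."""
    return bool(getattr(sender, "bot", False))
```

The same helper also tolerates a missing sender (`None`), which can be convenient when a message has no resolvable author.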
  • Add takeout mode

    Fixes #6

    Takeout mode

    Takeout mode allows for faster export because requests are made with a takeout session. The flood limits are presumably more generous (lower) for such requests, allowing the user to experiment with a bigger fetch_batch_size. The user can opt in to or out of the new takeout mode through the use_takeout option in the config file.

    I believe that the takeout session was designed to export data, so I took the liberty of setting use_takeout to True in the example config, making it the default behavior for every new site. I have also set fetch_batch_size to 4000. I did some testing and it seems to be good. We can discuss and make changes to these configs as needed.

    No avatar/profile photo handling

    download_profile_photo returns None when a user does not have a profile photo, and tg-archive currently logs an unnecessary error ('cannot identify image file') when that happens. This has also been fixed.

    enhancement 
    opened by faraazb 5
  • Issues with sqlite3

    I have upgraded and have the latest sqlite, pip and python - keep getting this error for some reason. What could be the issue?

    2022-09-25 15:37:30,792: fetching from last message id=0 (None)
    Traceback (most recent call last):
      File "/home/ec2-user/.local/bin/tg-archive", line 33, in <module>
        sys.exit(load_entry_point('tg-archive==0.5.5', 'console_scripts', 'tg-archive')())
      File "/home/ec2-user/.local/lib/python3.7/site-packages/tgarchive/__init__.py", line 130, in main
        s.sync(args.id, args.from_id)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/tgarchive/sync.py", line 65, in sync
        self.db.insert_user(m.user)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/tgarchive/db.py", line 174, in insert_user
        """, (u.id, u.username, u.first_name, u.last_nam

    question 
    opened by 4khil 4
  • Error: duplicate column name: takeout_id

    Hello, could someone please help me with this issue? I am doing a fresh install but getting the error below:

    Thanks so much for your help and the great tool. Marcus

    root@xxx# tg-archive --sync
    2022-07-20 18:29:51,555: cryptg detected, it will be used for encryption
    2022-07-20 18:29:51,952: starting Telegram sync (batch_size=2000, limit=100, wait=5, mode=standard)
    Traceback (most recent call last):
      File "/usr/local/bin/tg-archive", line 33, in <module>
        sys.exit(load_entry_point('tg-archive==0.5.5', 'console_scripts', 'tg-archive')())
      File "/usr/local/lib/python3.10/dist-packages/tg_archive-0.5.5-py3.10.egg/tgarchive/__init__.py", line 140, in main
        s = Sync(cfg, args.session, DB(args.data))
      File "/usr/local/lib/python3.10/dist-packages/tg_archive-0.5.5-py3.10.egg/tgarchive/sync.py", line 29, in __init__
        self.client = self.new_client(session_file, config)
      File "/usr/local/lib/python3.10/dist-packages/tg_archive-0.5.5-py3.10.egg/tgarchive/sync.py", line 102, in new_client
        client = TelegramClient(session, config["api_id"], config["api_hash"])
      File "/usr/local/lib/python3.10/dist-packages/Telethon-1.24.0-py3.10.egg/telethon/client/telegrambaseclient.py", line 273, in __init__
        session = SQLiteSession(session)
      File "/usr/local/lib/python3.10/dist-packages/Telethon-1.24.0-py3.10.egg/telethon/sessions/sqlite.py", line 55, in __init__
        self._upgrade_database(old=version)
      File "/usr/local/lib/python3.10/dist-packages/Telethon-1.24.0-py3.10.egg/telethon/sessions/sqlite.py", line 147, in _upgrade_database
        c.execute("alter table sessions add column takeout_id integer")
    sqlite3.OperationalError: duplicate column name: takeout_id

    question 
    opened by icepaule 4
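
The error comes from an `ALTER TABLE ... ADD COLUMN` statement running against a table that already has the column. As a general illustration of guarding such a migration (not Telethon's own upgrade code, and not a confirmed fix for this report), the column list can be checked first with `PRAGMA table_info`. The helper name is hypothetical, and the table and column names simply mirror the traceback.

```python
import sqlite3

def add_column_if_missing(conn, table, column, decl):
    """Add `column` to `table` unless it already exists. Returns True if added.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (dc_id INTEGER PRIMARY KEY)")
first = add_column_if_missing(conn, "sessions", "takeout_id", "integer")
second = add_column_if_missing(conn, "sessions", "takeout_id", "integer")
```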
  • No Images/Movies Downloaded since Oct 2020

    I have setup tg-archive to download the channel MARKmobil just to have a backup of all his work.

    The website can be found here: https://markmobil.borg.ch/telegram/2020-10.html#2020-10-06

    The web pages generated from the downloaded data look fine until September 2020. The last picture I can see is from October 2020, and after that all movies and pictures are missing.

    I can see many error messages like the following, each with a different media number:

    2022-02-24 11:13:23,855: downloading media #2690
    2022-02-24 11:13:23,856: Starting direct file download in chunks of 131072 at 0, stride 131072
    2022-02-24 11:13:24,030: error downloading media: #2690: The file reference has expired and is no longer valid or it belongs to self-destructing media and cannot be resent (caused by GetFileRequest)

    If I look into the channel using telegram-desktop I can still see the latest pictures.

    question 
    opened by Martin-Furter 4
  • Incorrect sort order for years in sidebar

    I'm using tg-archive 0.3.9 with a Telegram group that has existed for a few years already.

    When creating the static website with tg-archive --build, the generated sidebar sorts the years incorrectly (see screenshot).

    Sort order in sidebar
    opened by joschi 4
  • Traceback (most recent call last): File os.chdir(group_path + "/../") FileNotFoundError: [Errno 2] No such file or directory: '/www/wwwroot/telegrama/telegramtest/messagename/tgarchive_store//me//../'

    Traceback (most recent call last):
      File "main.py", line 71, in <module>
        os.chdir(group_path + "/../")
    FileNotFoundError: [Errno 2] No such file or directory: '/www/wwwroot/telegrama/telegramtest/messagename/tgarchive_store//me//../'

    opened by v835845728 0
  • [Security] Stored cross-site scripting

    Analysis

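    build.py renders chat contents with unsafe template options. This vulnerability can execute malicious JavaScript in a user's browser when they view an archive exported from an untrusted channel or group.

    def load_template(self, fname):
        with open(fname, "r", encoding="utf-8") as f:
            # Template() is created without autoescape here, so message
            # content is rendered into the page as raw HTML.
            self.template = Template(f.read())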
    

    POC

    Real-world exploit (two screenshots, not reproduced here).

    Fix

    self.template = Template(f.read(), autoescape=True)
    

    Please assign a CVE to this, thanks.

    opened by djerryz 0
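
To illustrate what escaping changes for a payload like the one described in this report, the sketch below uses the standard library's `html.escape` instead of Jinja, so that it runs without extra dependencies. It is a stand-in for the effect of autoescaping on interpolated values, not the project's rendering code. The `render_message` helper and the `text` class name are made up for the example.

```python
from html import escape

def render_message(text):
    """Wrap untrusted message text in a div, escaping HTML metacharacters."""
    return '<div class="text">' + escape(text, quote=True) + "</div>"

payload = '<img src=x onerror="alert(1)">'
rendered = render_message(payload)
print(rendered)
```

After escaping, the browser displays the payload as literal text instead of parsing it as an element with an event handler.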
  • Temporary file was not removed after failed download

    After encountering a download error, e.g. "The file reference has expired and is no longer valid or it belongs to self-destructing media and cannot be resent (caused by GetFileRequest)", the partially downloaded temporary file is not deleted.

    opened by qk-li 2
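
A common way to avoid leftover partial files is to write to a temporary path and remove it whenever the download raises. The sketch below shows that pattern in isolation. `download_to` and the `write_chunks` callable are hypothetical names, and this is not how tg-archive or Telethon currently handle downloads.

```python
import os
import tempfile

def download_to(dest, write_chunks):
    """Write via a temporary file; remove it if the download fails.

    write_chunks is a hypothetical callable that writes the payload to
    the open file object and may raise partway through.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dest) or ".")
    try:
        with os.fdopen(fd, "wb") as f:
            write_chunks(f)
        os.replace(tmp, dest)  # move into place only on success
    except BaseException:
        # Remove the partial file, then let the error propagate.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

Creating the temporary file in the destination directory keeps the final `os.replace` on the same filesystem.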
  • timeless build

    Add an option for a "timeless build": on build, don't update ...

    site/index.xml <lastBuildDate>Wed, 08 Jun 2022 08:57:59 +0000</lastBuildDate>

    site/index.atom <updated>2022-06-08T08:57:59.732044+00:00</updated>

    these updates produce "diff noise" when the build is stored in git

    workaround: remove the tags from site/index.xml and site/index.atom

    sed -i -E 's|<lastBuildDate>[^<]+</lastBuildDate>||g' site/index.xml
    sed -i -E 's|<updated>[^<]+</updated>||g' site/index.atom
    
    enhancement 
    opened by milahu 0
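
The same workaround can be expressed in Python for setups where `sed` is not available. This sketch mirrors the two patterns used above and operates on a string. The function name is illustrative, and like the `sed` version it removes every matching element, including any per-entry `<updated>` elements in an Atom feed.

```python
import re

def strip_build_timestamps(xml_text):
    """Remove <lastBuildDate> and <updated> elements from feed text."""
    xml_text = re.sub(r"<lastBuildDate>[^<]+</lastBuildDate>", "", xml_text)
    return re.sub(r"<updated>[^<]+</updated>", "", xml_text)

rss = ("<channel><lastBuildDate>Wed, 08 Jun 2022 08:57:59 +0000"
       "</lastBuildDate><title>t</title></channel>")
cleaned = strip_build_timestamps(rss)
```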
  • Ability to embed video and generate thumbnails

    Hello, this script is great, but compared to the built-in Telegram export feature (which is slow and inefficient), I really miss embedded mp4 videos in the page, by which I mean the thumbnails. I have little knowledge of HTML, but I'm almost sure that editing template.html would not help because the thumbnails are missing :/

    It would be cool to see this option added to the script :)

    enhancement 
    opened by Bartixxx32 0
Owner
Kailash Nadh
CTO @zerodhatech | Volunteer @fossunited