Data, model training, and evaluation code for "PubTables-1M: Towards a universal dataset and metrics for training and evaluating table extraction models".

Overview

PubTables-1M

This repository contains training and evaluation code for the paper "PubTables-1M: Towards a universal dataset and metrics for training and evaluating table extraction models".

The goal of PubTables-1M is to create a large, detailed, high-quality dataset for training and evaluating a wide variety of models for the tasks of table detection, table structure recognition, and functional analysis. It contains:

  • 460,589 annotated document pages containing tables for table detection.
  • 947,642 fully annotated tables including text content and complete location (bounding box) information for table structure recognition and functional analysis.
  • Full bounding boxes in both image and PDF coordinates for all table rows, columns, and cells (including blank cells), as well as other annotated structures such as column headers and projected row headers.
  • Rendered images of all tables and pages.
  • Bounding boxes and text for all words appearing in each table and page image.
  • Additional cell properties not used in the current model training.

Additionally, cells in the headers are canonicalized and we implement multiple quality control steps to ensure the annotations are as free of noise as possible. For more details, please see our paper.

News

10/21/2021: The full PubTables-1M dataset has been officially released on Microsoft Research Open Data.

Getting the Data

PubTables-1M is available for download from Microsoft Research Open Data.

It comes in 5 tar.gz files:

  • PubTables-1M-Image_Page_Detection_PASCAL_VOC.tar.gz
  • PubTables-1M-Image_Page_Words_JSON.tar.gz
  • PubTables-1M-Image_Table_Structure_PASCAL_VOC.tar.gz
  • PubTables-1M-Image_Table_Words_JSON.tar.gz
  • PubTables-1M-PDF_Annotations_JSON.tar.gz
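
The detection and structure archives contain PASCAL VOC-style XML annotations. As a quick orientation, here is a minimal sketch of reading one annotation with the Python standard library; it assumes the usual VOC field layout (object/name and object/bndbox), and the file name is a placeholder:

import xml.etree.ElementTree as ET

tree = ET.parse("PMC123456_table_0.xml")  # placeholder annotation file
for obj in tree.getroot().findall("object"):
    label = obj.find("name").text
    bbox = obj.find("bndbox")
    xmin, ymin, xmax, ymax = (float(bbox.find(t).text) for t in ("xmin", "ymin", "xmax", "ymax"))
    print(label, xmin, ymin, xmax, ymax)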

To download from the command line:

  1. Visit the dataset home page with a web browser and click Download in the top left corner. This creates a link to download the dataset from Azure with a unique access token, which looks like https://msropendataset01.blob.core.windows.net/pubtables1m?[SAS_TOKEN_HERE].
  2. You can then use the command-line tool azcopy to download all of the files with the following command:
azcopy copy "https://msropendataset01.blob.core.windows.net/pubtables1m?[SAS_TOKEN_HERE]" "/path/to/your/download/folder/" --recursive

Then extract each of the archives from the command line using:

tar -xzvf yourfile.tar.gz
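
Alternatively, a minimal Python sketch for extracting all five archives at once (the download folder path is a placeholder; tarfile and glob are in the standard library):

import glob
import tarfile

download_dir = "/path/to/your/download/folder"  # placeholder
for archive in glob.glob(f"{download_dir}/PubTables-1M-*.tar.gz"):
    print("Extracting", archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(download_dir)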

Code Installation

Create a conda environment from the yml file and activate it as follows:

conda env create -f environment.yml
conda activate tables-detr

Model Training

The code trains models for 2 different sets of table extraction tasks:

  1. Table Detection
  2. Table Structure Recognition + Functional Analysis

For a detailed description of these tasks and the models, please refer to the paper.

Sample training commands:

cd src
python main.py --data_root_dir /path/to/detection --data_type detection
python main.py --data_root_dir /path/to/structure --data_type structure

GriTS metric evaluation

The GriTS metrics proposed in the paper can be evaluated once you have trained a model; here we use the model trained in the previous step. The script computes all four variations presented in the paper, and you can choose which variation to report depending on your model. The table words directory is not required for all variations, but we use it here because PubTables-1M includes this information.

python main.py --data_root_dir /path/to/structure --model_load_path /path/to/model --table_words_dir /path/to/table/words --mode grits

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Comments
  • Adding Table Transformer models to HuggingFace Transformers

    Hi Table Transformer team :)

    As I implemented DETR in 🤗 HuggingFace Transformers a few months ago, it was relatively straightforward to port the 2 checkpoints you released. Here's a notebook that illustrates inference with DETR for table detection and table structure recognition: https://colab.research.google.com/drive/1lLRyBr7WraGdUJm-urUm_utArw6SkoCJ?usp=sharing

    As you may or may not know, any model on the HuggingFace hub has its own git repository. E.g. the DETR-table-detection checkpoint can be found here: https://huggingface.co/nielsr/detr-table-detection. If you check the "files and versions" tab, it includes the weights. The model hub uses git-LFS (large file storage) to handle large files such as model weights. This means that any model has its own Git commit history!

    A model card can also be added to the repo, which is just a README.

    Are you interested in joining the Microsoft organization on the hub, such that we can store all model checkpoints there (rather than under my user name)?

    Also, it would be great to add PubTables-1M (and potentially other datasets, useful for improving AI on unstructured documents) to the 🤗 hub. Would you be up for that?

    Let me know!

    Kind regards,

    Niels
    ML Engineer @ HuggingFace

    opened by NielsRogge 28
  • Visualize model predictions

    I ran the pre-trained model in eval mode and got this output:

    python main.py --mode eval --data_type structure --config_file structure_config.json --data_root_dir data/ --model_load_path data/model/structure.pth --debug --device cpu
    {'lr': 5e-05, 'lr_backbone': 1e-05, 'batch_size': 2, 'weight_decay': 0.0001, 'epochs': 20, 'lr_drop': 1, 'lr_gamma': 0.9, 'clip_max_norm': 0.1, 'backbone': 'resnet18', 'num_classes': 6, 'dilation': False, 'position_embedding': 'sine', 'emphasized_weights': {}, 'enc_layers': 6, 'dec_layers': 6, 'dim_feedforward': 2048, 'hidden_dim': 256, 'dropout': 0.1, 'nheads': 8, 'num_queries': 125, 'pre_norm': True, 'masks': False, 'aux_loss': False, 'mask_loss_coef': 1, 'dice_loss_coef': 1, 'ce_loss_coef': 1, 'bbox_loss_coef': 5, 'giou_loss_coef': 2, 'eos_coef': 0.4, 'set_cost_class': 1, 'set_cost_bbox': 5, 'set_cost_giou': 2, 'device': 'cpu', 'seed': 42, 'start_epoch': 0, 'num_workers': 2, 'data_root_dir': 'data/', 'config_file': 'structure_config.json', 'data_type': 'structure', 'model_load_path': 'data/model/structure.pth', 'metrics_save_filepath': '', 'table_words_dir': None, 'mode': 'eval', 'debug': True, 'checkpoint_freq': 1, '__module__': '__main__', '__dict__': <attribute '__dict__' of 'Args' objects>, '__weakref__': <attribute '__weakref__' of 'Args' objects>, '__doc__': None}
    ----------------------------------------------------------------------------------------------------
    loading model
    loading model from checkpoint
    loading data
    creating index...
    index created!
    Test:  [0/1]  eta: 0:00:00  class_error: 0.00  loss: 0.3392 (0.3392)  loss_ce: 0.0231 (0.0231)  loss_bbox: 0.0250 (0.0250)  loss_giou: 0.2912 (0.2912)  loss_ce_unscaled: 0.0231 (0.0231)  class_error_unscaled: 0.0000 (0.0000)  loss_bbox_unscaled: 0.0050 (0.0050)  loss_giou_unscaled: 0.1456 (0.1456)  cardinality_error_unscaled: 0.0000 (0.0000)  time: 0.3716  data: 0.0614  max mem: 0
    Test: Total time: 0:00:00 (0.3762 s / it)
    Averaged stats: class_error: 0.00  loss: 0.3392 (0.3392)  loss_ce: 0.0231 (0.0231)  loss_bbox: 0.0250 (0.0250)  loss_giou: 0.2912 (0.2912)  loss_ce_unscaled: 0.0231 (0.0231)  class_error_unscaled: 0.0000 (0.0000)  loss_bbox_unscaled: 0.0050 (0.0050)  loss_giou_unscaled: 0.1456 (0.1456)  cardinality_error_unscaled: 0.0000 (0.0000)
    Accumulating evaluation results...
    DONE (t=0.01s).
    IoU metric: bbox
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.619
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.750
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.629
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.619
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.281
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.506
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.638
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.638
    pubmed: AP50: 0.750, AP75: 0.629, AP: 0.619, AR: 0.638
    

    How can I visualize the model predictions on input images, like this?
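
    For reference, a minimal sketch of one way to draw the predictions onto the source image, assuming DETR-style outputs as produced by this repo's model (normalized cx, cy, w, h boxes and class logits whose last entry is "no object"); the function name and threshold are placeholders:

    from PIL import ImageDraw

    def draw_predictions(image, outputs, threshold=0.9):
        # Keep queries whose best real-class probability exceeds the threshold
        probas = outputs['pred_logits'].softmax(-1)[0, :, :-1]
        keep = probas.max(-1).values > threshold
        draw = ImageDraw.Draw(image)
        w, h = image.size
        for cx, cy, bw, bh in outputs['pred_boxes'][0, keep].tolist():
            # Convert normalized center format to pixel corner coordinates
            box = (w * (cx - bw / 2), h * (cy - bh / 2),
                   w * (cx + bw / 2), h * (cy + bh / 2))
            draw.rectangle(box, outline="red", width=3)
        return image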

    opened by mzhadigerov 9
  • Homemade evaluation script not working properly + Eval dataset not available

    Hi all,

    I am very interested in your table detection model and wanted to check it by myself. I encountered several difficulties trying to do so and wanted to get some help.

    1 - Eval dataset not available

    I used an Azure VM to load the dataset and explore it. In your README.md, it is explicitly stated that the detection dataset is in PubTables-1M-Image_Page_Detection_PASCAL_VOC.tar.gz, and there should be 4 folders inside: images, train, test and val. However, when I opened the archive, there were only 2 folders: images and train, and three textfiles: train_filelist.txt, test_filelist.txt and val_filelist.txt containing the path to the XML annotation files.

    test_filelist.txt and val_filelist.txt clearly reference files in /test/ and /val/, even though those folders don't exist. I checked whether the test and val annotations were simply included in the train folder, and they are not.

    I don't know where to find the test and val annotations; the dataset has probably changed since the README was written, and it would be nice to update it.

    2 - Homemade inference script not working

    Because I didn't have the eval dataset, I evaluated the detection model on some samples from the train dataset (I know this is a big caveat since the model saw them during training, but I just wanted to confirm I could get good results, because I'm struggling to use the detection model).

    Here is my code. First, I instantiate the model and load the weights (which I downloaded through the link in the README.md):

    import os
    import xml.etree.ElementTree as ET
    from PIL import Image, ImageDraw
    
    from torchvision import transforms
    import torchvision.transforms.functional as F
    
    import torch
    from detr.models.position_encoding import PositionEmbeddingSine
    from detr.models.detr import DETR
    from detr.models.transformer import Transformer
    from detr.models.backbone import Backbone, Joiner
    
    position_embedding = PositionEmbeddingSine(128)
    backbone = Backbone("resnet18", False, False, False)
    backbone_model = Joiner(backbone, position_embedding)
    backbone_model.num_channels = backbone.num_channels
    backbone = backbone_model
    
    transformer = Transformer(
        d_model=256,
        dropout=0.1,
        nhead=8,
        dim_feedforward=2048,
        num_encoder_layers=6,
        num_decoder_layers=6,
        normalize_before=True,
        return_intermediate_dec=True,
    )
    
    model = DETR(
        backbone,
        transformer,
        num_classes=2,
        num_queries=15,
        aux_loss=False,
    )
    
    weights = torch.load("~/Projects/table-parsing/models/pubtables1m_detection_detr_r18.pth", map_location=torch.device('cpu'))
    model.load_state_dict(weights)
    

    I consider this part successful because I am greeted by an <All keys matched successfully> message. If I had instantiated the model incorrectly, I would have seen the usual Missing key(s) or Unexpected key(s) warnings from PyTorch.

    Secondly, I created a simple pipeline to reproduce the image preprocessing done in the repo:

    convert_tensor = transforms.ToTensor()
    mean = torch.tensor([0.485, 0.456, 0.406])
    std = torch.tensor([0.229, 0.224, 0.225])
    final_size = 800
    max_size = 1333
    
    def detr_pipeline(image):
    
        # Resizing image
        w, h = image.size
        min_original_size = float(min((w, h)))
        max_original_size = float(max((w, h)))
        if max_original_size / min_original_size * final_size > max_size:
            size = int(round(max_size * min_original_size / max_original_size))
        else:
            size = final_size
    
        if (w <= h and w == size) or (h <= w and h == size):
            new_h, new_w = h, w
        elif w < h:
            new_w = size
            new_h = int(size * h / w)
        else:
            new_h = size
            new_w = int(size * w / h)
    
        rescaled_image = F.resize(image, (new_h, new_w))
        image_tensor = convert_tensor(rescaled_image)
    
        # Normalizing image
        image_tensor = image_tensor - torch.broadcast_to(mean.unsqueeze(-1).unsqueeze(-1), image_tensor.shape)
        image_tensor = image_tensor / torch.broadcast_to(std.unsqueeze(-1).unsqueeze(-1), image_tensor.shape)
    
        # Inference
        output = model([image_tensor])
        return output
    

    The hardcoded means and stds come from detr.datasets.coco.make_coco_transforms

    Finally, I used this pipeline to evaluate 20 examples from the training set

    dataset_path = "~/Data/PubTables1M-Detection-PASCAL-VOC"
    annotation_folder = "train"
    
    train_annotations = []
    with open(os.path.join(dataset_path, "train_filelist.txt")) as file:
        for line in file:
            train_annotations.append(line[:-1])
    
    found_examples = 0
    current = 0
    
    while found_examples < 20:
        ann = train_annotations[current]
        current += 1
        xml_path = os.path.join(dataset_path, ann)
        assert os.path.isfile(xml_path), 'Annotation not found'
        data = ET.parse(xml_path)
        root = data.getroot()
        image_path = os.path.join(dataset_path, "images", root[1].text)
        if not os.path.isfile(image_path):
            print(f"Skipping {root[1].text}, as file doesn't exist")
            continue
        else:
            print(image_path)
        found_examples += 1
        with Image.open(image_path) as im:
            outputs = detr_pipeline(im)
            bboxes, logits = outputs['pred_boxes'], outputs['pred_logits']
            probas_per_class = logits.softmax(-1)[:, :, :-1]
            objects_to_keep = probas_per_class.max(-1).values > 0.5
            pred_boxes = bboxes[objects_to_keep]
    
            draw = ImageDraw.Draw(im)
            for elem in root:
                if elem.tag == "object":
                    x0, y0, xmax, ymax = [float(i.text) for i in elem.getchildren()[-1].getchildren()]
                    draw.rectangle(
                        (x0, y0, xmax, ymax),
                        outline="blue",
                        width=3,
                    )
            for box in pred_boxes:
                    centre_x, centre_y, width, height = box
                    x0 = int(im.size[0] * (centre_x - width / 2))
                    y0 = int(im.size[1] * (centre_y - height / 2))
                    x1 = int(im.size[0] * (centre_x + width / 2))
                    y1 = int(im.size[1] * (centre_y + height / 2))
                    draw.rectangle(
                        [x0, y0, x1, y1],
                        outline="red",
                        width=3
                    )
            im.save(os.path.join("~/Desktop/output/table", root[1].text))
    

    Note that here I used a confidence threshold of 0.5, which is very low compared to other DETR models, where a 0.9 confidence level is usually used, so I expect some false positives. Also, I want to point out that many annotation files reference an image that is not in the images folder (that's why I used a while loop and not a for loop). But when I look at the results, none of them are correct. Here are a few samples (annotations in blue, predictions in red):

    [attached sample outputs: PMC6062540_3, PMC6620314_8, PMC6589332_11]

    This is very strange, considering the model saw these samples during training. I tried removing the preprocessing, but it doesn't change the results much; they still look completely random. Could you please help me with this inference script? What am I doing wrong here?

    opened by BenoitdeKersabiec 8
  • Problem of inference of Table Structure when tables very close to image corners

    Hello, I have trained the table structure recognition model for 14 epochs and managed to obtain acceptable results on your test images. However, when I use it to perform inference on some table images of my own, I observe problems like the one below. This is an image similar to the ones produced by your grits.py code, where all classes are plotted together: [attached image: PMC5730189_table_0_no_white_w_box_cropped]

    I believe the problem is related to the distance from the table itself to the image borders. If I perform inference on the same table but keep a larger distance between the table and the image borders, these are the results:

    [attached image: PMC5730189_table_0_w_box_cropped]

    The table border and all rows and columns are much better predicted. The image used for the examples is PMC5730189_table_0 from your dataset.

    The same happens for many other tables. Moreover, I looked at the XML files with the class labels and bounding box data, and a large percentage of the tables used for training (more than 95%) have a distance from the table border to the image border of almost 40 pixels, for all borders (top, bottom, left & right).

    So I was wondering how the algorithm could be made more robust for these cases, where I need to predict the table structure and the table border is really close to the image border (less than 5-10 pixels). Should I change something in the training, or something else?
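
    One possible mitigation, sketched below purely as an assumption (it is not something proposed in this repo), would be to pad the table crop with extra white margin before inference so the table border sits roughly 40 pixels from every image edge, matching the training distribution, and then shift the predicted boxes back by the same margin:

    from PIL import Image

    def pad_table_image(image, margin=40, fill="white"):
        # Paste the crop onto a larger white canvas so the table border is
        # at least `margin` pixels away from every edge; subtract `margin`
        # from the predicted box coordinates afterwards.
        w, h = image.size
        padded = Image.new("RGB", (w + 2 * margin, h + 2 * margin), fill)
        padded.paste(image, (margin, margin))
        return padded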

    Thanks in advance,

    question 
    opened by RobbyJS 4
  • Dependency versions are not available via pip now

    I was trying to install the dependencies via pip and got errors. Please update the code and dependencies.

    pip3 install pytorch==1.5.0
    ERROR: Could not find a version that satisfies the requirement pytorch==1.5.0 (from versions: 0.1.2, 1.0.2)

    pip3 install torchvision==0.6.0
    ERROR: Could not find a version that satisfies the requirement torchvision==0.6.0 (from versions: 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.2.0, 0.2.1, 0.2.2, 0.2.2.post2, 0.2.2.post3, 0.8.2, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.11.1, 0.11.2, 0.11.3, 0.12.0, 0.13.0)
    ERROR: No matching distribution found for torchvision==0.6.0

    opened by neeleshkshukla 3
  • Bugfix/grits

    What

    Minor bugfixes when evaluating GriTS

    Why

    • Was getting KeyError when evaluating GriTS only on a single table_type, i.e., either simple or complex
    • Return 0s if computing any of the GriTS scores fails so that the execution of the program is not halted

    How

    • Used the dict.get() method to return a default value of 0 if the key is not found, i.e., if the table_type is not present
    • Used exception handling when computing GriTS
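
    Roughly, the pattern described above looks like this (hypothetical names, not the actual diff):

    def grits_or_default(compute_fn, *args, default=0):
        # Return a default score instead of halting the whole evaluation run
        try:
            return compute_fn(*args)
        except Exception:
            return default

    # A missing table_type ("simple" or "complex") no longer raises a KeyError:
    # score = per_type_results.get(table_type, 0)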
    opened by suyogdahal 3
  • why use PubTables1M-Table-Words-JSON to refine columns and rows when evaluating TSR

    Hi, I read the postprocessing code and found that the PubTables1M-Table-Words-JSON information is used to refine the columns and rows when evaluating TSR model performance, but I think this word information is not part of the model outputs. Is it reasonable to use this information to evaluate the model?

    question 
    opened by buptxiaofeng 3
  • More details about the post-processing operations

    Hi @bsmock @rohithpv ,

    This section in the paper explains the post-processing steps used at inference time.

    Could you please provide some more details regarding the conflict-resolution technique used? Where can I find it in the code base?

    Thanks!

    question 
    opened by MrinalJain17 3
  • TypeError: 'numpy.float64' object cannot be interpreted as an integer during evaluation

    Hi,

    I encountered this numpy type error during the evaluation phase. Any idea how to fix this?


    How to reproduce the error

    (env)$ python main.py \
      --data_type detection \
      --config_file detection_config.json \
      --data_root_dir ~/../pubtables/PubTables1M-Detection-PASCAL-VOC/
    

    Error Message

    {'lr': 5e-05, 'lr_backbone': 1e-05, 'batch_size': 2, 'weight_decay': 0.0001, 'epochs': 20, 'lr_drop': 1, 'lr_gamma': 0.9, 'clip_max_norm': 0.1, 'backbone': 'resnet18', 'num_classes': 2, 'dilation': False, 'position_
    embedding': 'sine', 'emphasized_weights': {}, 'enc_layers': 6, 'dec_layers': 6, 'dim_feedforward': 2048, 'hidden_dim': 256, 'dropout': 0.1, 'nheads': 8, 'num_queries': 15, 'pre_norm': True, 'masks': False, 'aux_loss
    ': False, 'mask_loss_coef': 1, 'dice_loss_coef': 1, 'ce_loss_coef': 1, 'bbox_loss_coef': 5, 'giou_loss_coef': 2, 'eos_coef': 0.4, 'set_cost_class': 1, 'set_cost_bbox': 5, 'set_cost_giou': 2, 'device': 'cuda', 'seed'
    : 42, 'start_epoch': 0, 'num_workers': 1, 'data_root_dir': '/home/lxyuan/../pubtables/PubTables1M-Detection-PASCAL-VOC/', 'config_file': 'detection_config.json', 'data_type': 'detection', 'model_load_path': None, 'l
    oad_weights_only': False, 'model_save_dir': None, 'metrics_save_filepath': '', 'debug_save_dir': 'debug', 'table_words_dir': None, 'mode': 'train', 'debug': False, 'checkpoint_freq': 1, 'train_max_size': None, 'val_
    max_size': None, 'test_max_size': None, 'eval_pool_size': 1, 'eval_step': 1, '__module__': '__main__', '__dict__': <attribute '__dict__' of 'Args' objects>, '__weakref__': <attribute '__weakref__' of 'Args' objects>
    , '__doc__': None}
    ----------------------------------------------------------------------------------------------------
    loading model
    loading data
    loading data
    creating index...
    index created!
    finished loading data in : 0:00:04.291752
    Max batches per epoch: 230294
    Output directory:  /home/lxyuan/../pubtables/PubTables1M-Detection-PASCAL-VOC/output/20220815202559
    Output model path:  /home/lxyuan/../pubtables/PubTables1M-Detection-PASCAL-VOC/output/20220815202559/model.pth
    Start training
    ----------------------------------------------------------------------------------------------------
    Epoch: [0]  [     0/230294]  eta: 21:14:22  lr: 0.000050  class_error: 33.33  loss: 7.6202 (7.6202)  loss_ce: 1.3217 (1.3217)  loss_bbox: 4.0440 (4.0440)  loss_giou: 2.2545 (2.2545)  loss_ce_unscaled: 1.3217 (1.3217
    )  class_error_unscaled: 33.3333 (33.3333)  loss_bbox_unscaled: 0.8088 (0.8088)  loss_giou_unscaled: 1.1273 (1.1273)  cardinality_error_unscaled: 12.5000 (12.5000)  time: 0.3320  data: 0.1073  max mem: 796
    Epoch: [0]  [  1000/230294]  eta: 5:37:18  lr: 0.000050  class_error: 100.00  loss: 2.3491 (3.9134)  loss_ce: 0.4271 (0.4534)  loss_bbox: 1.0936 (2.2053)  loss_giou: 0.8212 (1.2548)  loss_ce_unscaled: 0.4271 (0.4534
    )  class_error_unscaled: 100.0000 (96.9789)  loss_bbox_unscaled: 0.2187 (0.4411)  loss_giou_unscaled: 0.4106 (0.6274)  cardinality_error_unscaled: 1.0000 (1.0919)  time: 0.0870  data: 0.0046  max mem: 1393
    Epoch: [0]  [  2000/230294]  eta: 5:37:05  lr: 0.000050  class_error: 100.00  loss: 3.2153 (3.2962)  loss_ce: 0.3987 (0.4452)  loss_bbox: 1.6650 (1.7887)  loss_giou: 1.0126 (1.0623)  loss_ce_unscaled: 0.3987 (0.4452
    )  class_error_unscaled: 100.0000 (94.8372)  loss_bbox_unscaled: 0.3330 (0.3577)  loss_giou_unscaled: 0.5063 (0.5312)  cardinality_error_unscaled: 1.0000 (1.0160)  time: 0.0845  data: 0.0045  max mem: 1393
    Epoch: [0]  [  3000/230294]  eta: 5:35:23  lr: 0.000050  class_error: 100.00  loss: 2.4226 (2.9530)  loss_ce: 0.3809 (0.4328)  loss_bbox: 1.1276 (1.5422)  loss_giou: 0.8819 (0.9780)  loss_ce_unscaled: 0.3809 (0.4328
    )  class_error_unscaled: 100.0000 (92.2951)  loss_bbox_unscaled: 0.2255 (0.3084)  loss_giou_unscaled: 0.4409 (0.4890)  cardinality_error_unscaled: 1.0000 (0.9888)  time: 0.0883  data: 0.0045  max mem: 1393
    Epoch: [0]  [  4000/230294]  eta: 5:35:24  lr: 0.000050  class_error: 0.00  loss: 1.8408 (2.7103)  loss_ce: 0.3210 (0.4222)  loss_bbox: 0.7209 (1.3707)  loss_giou: 0.6109 (0.9174)  loss_ce_unscaled: 0.3210 (0.4222)
     class_error_unscaled: 50.0000 (89.8711)  loss_bbox_unscaled: 0.1442 (0.2741)  loss_giou_unscaled: 0.3055 (0.4587)  cardinality_error_unscaled: 0.5000 (0.9609)  time: 0.0906  data: 0.0049  max mem: 1393
    Epoch: [0]  [  5000/230294]  eta: 5:34:34  lr: 0.000050  class_error: 0.00  loss: 2.0806 (2.5365)  loss_ce: 0.3440 (0.4120)  loss_bbox: 0.7721 (1.2546)  loss_giou: 0.7145 (0.8699)  loss_ce_unscaled: 0.3440 (0.4120)
     class_error_unscaled: 75.0000 (86.8418)  loss_bbox_unscaled: 0.1544 (0.2509)  loss_giou_unscaled: 0.3572 (0.4349)  cardinality_error_unscaled: 0.5000 (0.9352)  time: 0.0928  data: 0.0048  max mem: 1393
    Epoch: [0]  [  6000/230294]  eta: 5:33:42  lr: 0.000050  class_error: 50.00  loss: 1.5561 (2.4004)  loss_ce: 0.3442 (0.4008)  loss_bbox: 0.5955 (1.1669)  loss_giou: 0.5303 (0.8327)  loss_ce_unscaled: 0.3442 (0.4008)
      class_error_unscaled: 66.6667 (82.8963)  loss_bbox_unscaled: 0.1191 (0.2334)  loss_giou_unscaled: 0.2652 (0.4163)  cardinality_error_unscaled: 0.5000 (0.8982)  time: 0.0910  data: 0.0048  max mem: 1393
    Epoch: [0]  [  7000/230294]  eta: 5:32:53  lr: 0.000050  class_error: 100.00  loss: 1.9024 (2.2844)  loss_ce: 0.2432 (0.3884)  loss_bbox: 0.6760 (1.0965)  loss_giou: 0.6833 (0.7995)  loss_ce_unscaled: 0.2432 (0.3884
    )  class_error_unscaled: 50.0000 (79.1719)  loss_bbox_unscaled: 0.1352 (0.2193)  loss_giou_unscaled: 0.3416 (0.3998)  cardinality_error_unscaled: 0.5000 (0.8579)  time: 0.0856  data: 0.0047  max mem: 1393
    Epoch: [0]  [  8000/230294]  eta: 5:31:30  lr: 0.000050  class_error: 50.00  loss: 1.3197 (2.1904)  loss_ce: 0.2045 (0.3753)  loss_bbox: 0.5773 (1.0416)  loss_giou: 0.6363 (0.7734)  loss_ce_unscaled: 0.2045 (0.3753)
      class_error_unscaled: 33.3333 (75.1935)  loss_bbox_unscaled: 0.1155 (0.2083)  loss_giou_unscaled: 0.3182 (0.3867)  cardinality_error_unscaled: 0.0000 (0.8116)  time: 0.0903  data: 0.0047  max mem: 1393
    Epoch: [0]  [  9000/230294]  eta: 5:30:25  lr: 0.000050  class_error: 100.00  loss: 1.2540 (2.1009)  loss_ce: 0.2317 (0.3612)  loss_bbox: 0.4740 (0.9915)  loss_giou: 0.5079 (0.7482)  loss_ce_unscaled: 0.2317 (0.3612
    )  class_error_unscaled: 50.0000 (71.3004)  loss_bbox_unscaled: 0.0948 (0.1983)  loss_giou_unscaled: 0.2539 (0.3741)  cardinality_error_unscaled: 0.5000 (0.7655)  time: 0.0909  data: 0.0048  max mem: 1393
    
    <truncated>
    
    Epoch: [0]  [230293/230294]  eta: 0:00:00  lr: 0.000050  class_error: 0.00  loss: 0.2878 (0.4740)  loss_ce: 0.0005 (0.0355)  loss_bbox: 0.1188 (0.2152)  loss_giou: 0.1408 (0.2233)  loss_ce_unscaled: 0.0005 (0.0355)
     class_error_unscaled: 0.0000 (4.7529)  loss_bbox_unscaled: 0.0238 (0.0430)  loss_giou_unscaled: 0.0704 (0.1116)  cardinality_error_unscaled: 0.0000 (0.0790)  time: 0.0888  data: 0.0057  max mem: 1393
    Epoch: [0] Total time: 5:45:45 (0.0901 s / it)
    Averaged stats: lr: 0.000050  class_error: 0.00  loss: 0.2878 (0.4740)  loss_ce: 0.0005 (0.0355)  loss_bbox: 0.1188 (0.2152)  loss_giou: 0.1408 (0.2233)  loss_ce_unscaled: 0.0005 (0.0355)  class_error_unscaled: 0.00
    00 (4.7529)  loss_bbox_unscaled: 0.0238 (0.0430)  loss_giou_unscaled: 0.0704 (0.1116)  cardinality_error_unscaled: 0.0000 (0.0790)
    Epoch completed in  5:45:45.451181
        main()
      File "/home/lxyuan/playground/table-transformer/src/main.py", line 368, in main
        train(args, model, criterion, postprocessors, device)
      File "/home/lxyuan/playground/table-transformer/src/main.py", line 317, in train
        pubmed_stats, coco_evaluator = evaluate(model, criterion,
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/lxyuan/playground/table-transformer/src/../detr/engine.py", line 81, in evaluate
        coco_evaluator = CocoEvaluator(base_ds, iou_types)
      File "/home/lxyuan/playground/table-transformer/src/../detr/datasets/coco_eval.py", line 31, in __init__
        self.coco_eval[iou_type] = COCOeval(coco_gt, iouType=iou_type)
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/pycocotools/cocoeval.py", line 76, in __init__
        self.params = Params(iouType=iouType) # parameters
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/pycocotools/cocoeval.py", line 527, in __init__
        self.setDetParams()
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/pycocotools/cocoeval.py", line 507, in setDetParams
        self.iouThrs = np.linspace(.5, 0.95, np.round((0.95 - .5) / .05) + 1, endpoint=True)
      File "<__array_function__ internals>", line 180, in linspace
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/numpy/core/function_base.py", line 120, in linspace
        num = operator.index(num)
    TypeError: 'numpy.float64' object cannot be interpreted as an integer
    

    It seems like I was able to complete one training epoch but got the numpy error message when the code tried to evaluate model performance on the validation set (i.e., src/main.py:L317).


    I got a similar error when I tried to use main.py to evaluate model performance directly.

    How to reproduce the error

    (env)$ python main.py \
      --mode eval \
      --data_type detection \
      --config_file detection_config.json \
      --data_root_dir ~/../pubtables/PubTables1M-Detection-PASCAL-VOC/ \
      --model_load_path ../pretrained_models/pubtables1m_detection_detr_r18.pth
    

    Error Message

    {'lr': 5e-05, 'lr_backbone': 1e-05, 'batch_size': 2, 'weight_decay': 0.0001, 'epochs': 20, 'lr_drop': 1, 'lr_gamma': 0.9, 'clip_max_norm': 0.1, 'backbone': 'resnet18', 'num_classes': 2, 'dilation': False, 'pos
    ition_embedding': 'sine', 'emphasized_weights': {}, 'enc_layers': 6, 'dec_layers': 6, 'dim_feedforward': 2048, 'hidden_dim': 256, 'dropout': 0.1, 'nheads': 8, 'num_queries': 15, 'pre_norm': True, 'masks': Fals
    e, 'aux_loss': False, 'mask_loss_coef': 1, 'dice_loss_coef': 1, 'ce_loss_coef': 1, 'bbox_loss_coef': 5, 'giou_loss_coef': 2, 'eos_coef': 0.4, 'set_cost_class': 1, 'set_cost_bbox': 5, 'set_cost_giou': 2, 'devic
    e': 'cuda', 'seed': 42, 'start_epoch': 0, 'num_workers': 1, 'data_root_dir': '/home/lxyuan/mini-pubtables/PubTables1M-Dectection-PASCAL-VOC/', 'config_file': 'detection_config.json', 'data_type': 'detection',
    'model_load_path': '../pretrained_models/pubtables1m_detection_detr_r18.pth', 'load_weights_only': False, 'model_save_dir': None, 'metrics_save_filepath': '', 'debug_save_dir': 'debug', 'table_words_dir': None
    , 'mode': 'eval', 'debug': False, 'checkpoint_freq': 1, 'train_max_size': None, 'val_max_size': None, 'test_max_size': None, 'eval_pool_size': 1, 'eval_step': 1, '__module__': '__main__', '__dict__': <attribut
    e '__dict__' of 'Args' objects>, '__weakref__': <attribute '__weakref__' of 'Args' objects>, '__doc__': None}
    ----------------------------------------------------------------------------------------------------
    loading model
    loading model from checkpoint
    loading data
    creating index...
    index created!
    Traceback (most recent call last):
      File "/home/lxyuan/playground/table-transformer/src/main.py", line 375, in <module>
        main()
      File "/home/lxyuan/playground/table-transformer/src/main.py", line 371, in main
        eval_coco(args, model, criterion, postprocessors, data_loader_test, dataset_test, device)
      File "/home/lxyuan/playground/table-transformer/src/eval.py", line 693, in eval_coco
        pubmed_stats, coco_evaluator = evaluate(args, model, criterion, postprocessors,
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/lxyuan/playground/table-transformer/src/eval.py", line 586, in evaluate
        coco_evaluator = CocoEvaluator(base_ds, iou_types)
      File "/home/lxyuan/playground/table-transformer/src/../detr/datasets/coco_eval.py", line 31, in __init__
        self.coco_eval[iou_type] = COCOeval(coco_gt, iouType=iou_type)
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/pycocotools/cocoeval.py", line 76, in __init__
        self.params = Params(iouType=iouType) # parameters
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/pycocotools/cocoeval.py", line 527, in __init__
        self.setDetParams()
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/pycocotools/cocoeval.py", line 507, in setDetParams
        self.iouThrs = np.linspace(.5, 0.95, np.round((0.95 - .5) / .05) + 1, endpoint=True)
      File "<__array_function__ internals>", line 180, in linspace
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/numpy/core/function_base.py", line 120, in linspace
        num = operator.index(num)
    TypeError: 'numpy.float64' object cannot be interpreted as an integer
    

    NOTE: I am using numpy==1.23.2 and python3.9
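
    For reference, the traceback points at pycocotools passing a float count to np.linspace, which recent numpy versions reject. A common workaround (not from this thread) is to use a numpy/pycocotools combination where the count is an integer, e.g. a patched call of the form:

    import numpy as np

    # Old pycocotools (fails on recent numpy because num is a float):
    # np.linspace(.5, 0.95, np.round((0.95 - .5) / .05) + 1, endpoint=True)

    # An integer count works on all numpy versions:
    iou_thrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)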

    opened by LxYuan-Handshakes 2
  • The dataset home page is not working

    The website kept saying "Loading Dataset Details..." and didn't return anything even after waiting for a long time. Another odd problem is that I can't see any dataset or use the search function on msropendata.com. I tried to file an issue on msropendata.com, but it cannot be submitted. Is the backend server of msropendata down, or is it something else? Thanks for your help!

    opened by narthchin 2
  • Error when running in debug mode: Runtime Error: Expected all tensors to be on the same device, but found at least 2 devices, cuda:0 and cpu!

    Dear authors, I've just run the command below in debug mode in order to visualize the reconstruction result on a PDF file:

    !python main.py --data_root_dir path/to/structure --model_load_path path/to/model --table_words_dir path/to/words --mode grits --metrics_save_filepath path/to/metrics_save_file --debug

    And I ran into this bug. It says "RuntimeError: Expected all tensors to be on the same device, but found at least 2 devices, cuda:0 and cpu!"

    I ran this on a GPU runtime in Colab and this error occurred. When I tried to run in CPU-only mode, it said there is no GPU device. I couldn't figure out what causes this error. Could you help me identify where the problem is? Thanks for your consideration.

    opened by suonbo 2
  • Why did the row/column dilation get removed?

    The paper talks about dilating the row/column bounding boxes to align the rows and columns and remove gaps. I see in postprocessing.py that this code has been commented out and removed:

        # Dilate rows and columns before final extraction
        #dilated_columns = fill_column_gaps(columns, table_bbox)
        dilated_columns = columns
        #dilated_rows = fill_row_gaps(rows, table_bbox)
        dilated_rows = rows
    

    Is there a reason for this? Or is the bounding box dilation happening elsewhere in the code and I've missed it?

    opened by wandering-walrus 0
  • Duplicated classes in table_datasets.py

    @bsmock Hi, thank you for sharing your work.

    I have a question about your code.

    Why are there duplicated classes in your code? For example, class RandomCrop appears in table_datasets.py at L137 and L267, and class RandomResize appears in table_datasets.py at L185 and L395.

    Is this just a typo? I'm confused; please check this issue.

    opened by yellowjs0304 0
  • Annotation Tool

    Hi, we are trying to use this model for custom training. We have a set of images we would like to fine-tune on. We were able to generate the XML files using LabelImg, but the words.json file is a little tricky. Can you please share the annotation tool used or suggest an alternative?

    opened by abhayhk2001 0
  • Colab Notebook TSR: functional analysis and obtain final dataframe

    Hi!

    I was working with the TD and TSR notebooks at https://github.com/NielsRogge/Transformers-Tutorials/tree/master/Table%20Transformer, and they work properly for me, but the last step of the TSR pipeline, obtaining a data frame, is not implemented in those notebooks (I think this process is called functional analysis in this repo). The TSR postprocessing steps go from the structure to grid cells.

    Has anyone managed to obtain the final data frame for TSR in Colab, taking spanning cells and titles into account?
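
    For what it's worth, once you have cells with row/column index lists and text, turning them into a data frame can be as simple as the sketch below (the cell field names here are hypothetical, not the exact schema used by this repo):

    import pandas as pd

    def cells_to_dataframe(cells, num_rows, num_columns):
        grid = [["" for _ in range(num_columns)] for _ in range(num_rows)]
        for cell in cells:
            for r in cell["row_nums"]:         # hypothetical field names
                for c in cell["column_nums"]:
                    grid[r][c] = cell["text"]  # spanning cells repeat their text
        return pd.DataFrame(grid)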

    Regards

    opened by emigomez 2
  • Single Inference of TSR

    Hi!

    I want to run the whole TSR pipeline for a single input image and obtain the final data frame as output. I think the step that produces this final data frame is implemented here as 'functional analysis', but the problem is that I don't know how to run single-image inference with this repo.

    Do you know how to run single-image inference of the entire TSR + functional analysis pipeline? (I read similar issues but didn't find a solution.)

    Regards

    opened by emigomez 0