Data, model training, and evaluation code for "PubTables-1M: Towards a universal dataset and metrics for training and evaluating table extraction models".

Overview

PubTables-1M

This repository contains training and evaluation code for the paper "PubTables-1M: Towards a universal dataset and metrics for training and evaluating table extraction models".

The goal of PubTables-1M is to create a large, detailed, high-quality dataset for training and evaluating a wide variety of models for the tasks of table detection, table structure recognition, and functional analysis. It contains:

  • 460,589 annotated document pages containing tables for table detection.
  • 947,642 fully annotated tables including text content and complete location (bounding box) information for table structure recognition and functional analysis.
  • Full bounding boxes in both image and PDF coordinates for all table rows, columns, and cells (including blank cells), as well as other annotated structures such as column headers and projected row headers.
  • Rendered images of all tables and pages.
  • Bounding boxes and text for all words appearing in each table and page image.
  • Additional cell properties not used in the current model training.

Additionally, cells in the headers are canonicalized and we implement multiple quality control steps to ensure the annotations are as free of noise as possible. For more details, please see our paper.

News

10/21/2021: The full PubTables-1M dataset has been officially released on Microsoft Research Open Data.

Getting the Data

PubTables-1M is available for download from Microsoft Research Open Data.

It comes in 5 tar.gz files:

  • PubTables-1M-Image_Page_Detection_PASCAL_VOC.tar.gz
  • PubTables-1M-Image_Page_Words_JSON.tar.gz
  • PubTables-1M-Image_Table_Structure_PASCAL_VOC.tar.gz
  • PubTables-1M-Image_Table_Words_JSON.tar.gz
  • PubTables-1M-PDF_Annotations_JSON.tar.gz
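
The detection and structure archives contain PASCAL VOC-style XML annotations. As a quick orientation, here is a minimal sketch of reading one annotation with the Python standard library; it assumes the usual VOC field layout (object/name and object/bndbox), and the file name is a placeholder:

import xml.etree.ElementTree as ET

tree = ET.parse("PMC123456_table_0.xml")  # placeholder annotation file
for obj in tree.getroot().findall("object"):
    label = obj.find("name").text
    bbox = obj.find("bndbox")
    xmin, ymin, xmax, ymax = (float(bbox.find(t).text) for t in ("xmin", "ymin", "xmax", "ymax"))
    print(label, xmin, ymin, xmax, ymax)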

To download from the command line:

  1. Visit the dataset home page with a web browser and click Download in the top left corner. This creates a link to download the dataset from Azure with a unique access token, which looks like https://msropendataset01.blob.core.windows.net/pubtables1m?[SAS_TOKEN_HERE].
  2. You can then use the command-line tool azcopy to download all of the files with the following command:
azcopy copy "https://msropendataset01.blob.core.windows.net/pubtables1m?[SAS_TOKEN_HERE]" "/path/to/your/download/folder/" --recursive

Then extract each of the archives from the command line using:

tar -xzvf yourfile.tar.gz
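
Alternatively, a minimal Python sketch for extracting all five archives at once (the download folder path is a placeholder; tarfile and glob are in the standard library):

import glob
import tarfile

download_dir = "/path/to/your/download/folder"  # placeholder
for archive in glob.glob(f"{download_dir}/PubTables-1M-*.tar.gz"):
    print("Extracting", archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(download_dir)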

Code Installation

Create a conda environment from the yml file and activate it as follows:

conda env create -f environment.yml
conda activate tables-detr

Model Training

The code trains models for 2 different sets of table extraction tasks:

  1. Table Detection
  2. Table Structure Recognition + Functional Analysis

For a detailed description of these tasks and the models, please refer to the paper.

Sample training commands:

cd src
python main.py --data_root_dir /path/to/detection --data_type detection
python main.py --data_root_dir /path/to/structure --data_type structure

GriTS metric evaluation

The GriTS metrics proposed in the paper can be evaluated once you have trained a model; here we use the model trained in the previous step. The script computes all four variations presented in the paper, and you can choose which variation to report depending on your model. The table words directory is not required for all variations, but we use it here because PubTables-1M includes this information.

python main.py --data_root_dir /path/to/structure --model_load_path /path/to/model --table_words_dir /path/to/table/words --mode grits

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Comments
  • Adding Table Transformer models to HuggingFace Transformers

    Hi Table Transformer team :)

    As I implemented DETR in 🤗 HuggingFace Transformers a few months ago, it was relatively straightforward to port the 2 checkpoints you released. Here's a notebook that illustrates inference with DETR for table detection and table structure recognition: https://colab.research.google.com/drive/1lLRyBr7WraGdUJm-urUm_utArw6SkoCJ?usp=sharing

    As you may or may not know, any model on the HuggingFace hub has its own git repository. E.g. the DETR-table-detection checkpoint can be found here: https://huggingface.co/nielsr/detr-table-detection. If you check the "files and versions" tab, it includes the weights. The model hub uses git-LFS (large file storage) to handle large files such as model weights. This means that any model has its own Git commit history!

    A model card can also be added to the repo, which is just a README.

    Are you interested in joining the Microsoft organization on the hub, such that we can store all model checkpoints there (rather than under my user name)?

    Also, it would be great to add PubTables-1M (and potentially other datasets, useful for improving AI on unstructured documents) to the 🤗 hub. Would you be up for that?

    Let me know!

    Kind regards,

    Niels
    ML Engineer @ HuggingFace

    opened by NielsRogge 28
  • Visualize model predictions

    I ran the pre-trained model in eval mode and got this output:

    python main.py --mode eval --data_type structure --config_file structure_config.json --data_root_dir data/ --model_load_path data/model/structure.pth --debug --device cpu
    {'lr': 5e-05, 'lr_backbone': 1e-05, 'batch_size': 2, 'weight_decay': 0.0001, 'epochs': 20, 'lr_drop': 1, 'lr_gamma': 0.9, 'clip_max_norm': 0.1, 'backbone': 'resnet18', 'num_classes': 6, 'dilation': False, 'position_embedding': 'sine', 'emphasized_weights': {}, 'enc_layers': 6, 'dec_layers': 6, 'dim_feedforward': 2048, 'hidden_dim': 256, 'dropout': 0.1, 'nheads': 8, 'num_queries': 125, 'pre_norm': True, 'masks': False, 'aux_loss': False, 'mask_loss_coef': 1, 'dice_loss_coef': 1, 'ce_loss_coef': 1, 'bbox_loss_coef': 5, 'giou_loss_coef': 2, 'eos_coef': 0.4, 'set_cost_class': 1, 'set_cost_bbox': 5, 'set_cost_giou': 2, 'device': 'cpu', 'seed': 42, 'start_epoch': 0, 'num_workers': 2, 'data_root_dir': 'data/', 'config_file': 'structure_config.json', 'data_type': 'structure', 'model_load_path': 'data/model/structure.pth', 'metrics_save_filepath': '', 'table_words_dir': None, 'mode': 'eval', 'debug': True, 'checkpoint_freq': 1, '__module__': '__main__', '__dict__': <attribute '__dict__' of 'Args' objects>, '__weakref__': <attribute '__weakref__' of 'Args' objects>, '__doc__': None}
    ----------------------------------------------------------------------------------------------------
    loading model
    loading model from checkpoint
    loading data
    creating index...
    index created!
    Test:  [0/1]  eta: 0:00:00  class_error: 0.00  loss: 0.3392 (0.3392)  loss_ce: 0.0231 (0.0231)  loss_bbox: 0.0250 (0.0250)  loss_giou: 0.2912 (0.2912)  loss_ce_unscaled: 0.0231 (0.0231)  class_error_unscaled: 0.0000 (0.0000)  loss_bbox_unscaled: 0.0050 (0.0050)  loss_giou_unscaled: 0.1456 (0.1456)  cardinality_error_unscaled: 0.0000 (0.0000)  time: 0.3716  data: 0.0614  max mem: 0
    Test: Total time: 0:00:00 (0.3762 s / it)
    Averaged stats: class_error: 0.00  loss: 0.3392 (0.3392)  loss_ce: 0.0231 (0.0231)  loss_bbox: 0.0250 (0.0250)  loss_giou: 0.2912 (0.2912)  loss_ce_unscaled: 0.0231 (0.0231)  class_error_unscaled: 0.0000 (0.0000)  loss_bbox_unscaled: 0.0050 (0.0050)  loss_giou_unscaled: 0.1456 (0.1456)  cardinality_error_unscaled: 0.0000 (0.0000)
    Accumulating evaluation results...
    DONE (t=0.01s).
    IoU metric: bbox
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.619
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.750
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.629
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.619
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.281
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.506
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.638
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.638
    pubmed: AP50: 0.750, AP75: 0.629, AP: 0.619, AR: 0.638
    

    How can I visualize the model predictions on input images, like this?
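
    For reference, a minimal sketch of one way to draw the predictions onto the source image, assuming DETR-style outputs as produced by this repo's model (normalized cx, cy, w, h boxes and class logits whose last entry is "no object"); the function name and threshold are placeholders:

    from PIL import ImageDraw

    def draw_predictions(image, outputs, threshold=0.9):
        # Keep queries whose best real-class probability exceeds the threshold
        probas = outputs['pred_logits'].softmax(-1)[0, :, :-1]
        keep = probas.max(-1).values > threshold
        draw = ImageDraw.Draw(image)
        w, h = image.size
        for cx, cy, bw, bh in outputs['pred_boxes'][0, keep].tolist():
            # Convert normalized center format to pixel corner coordinates
            box = (w * (cx - bw / 2), h * (cy - bh / 2),
                   w * (cx + bw / 2), h * (cy + bh / 2))
            draw.rectangle(box, outline="red", width=3)
        return image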

    opened by mzhadigerov 9
  • Homemade evaluation script not working properly + Eval dataset not available

    Hi all,

    I am very interested in your table detection model and wanted to check it by myself. I encountered several difficulties trying to do so and wanted to get some help.

    1 - Eval dataset not available

    I used an Azure VM to load the dataset and explore it. In your README.md, it is explicitly stated that the detection dataset is in PubTables-1M-Image_Page_Detection_PASCAL_VOC.tar.gz, and there should be 4 folders inside: images, train, test and val. However, when I opened the archive, there were only 2 folders: images and train, and three textfiles: train_filelist.txt, test_filelist.txt and val_filelist.txt containing the path to the XML annotation files.

    test_filelist.txt and val_filelist.txt clearly reference files in /test/ and /val/, even though those folders don't exist. I checked whether the test and val annotations were simply included in the train folder, and they are not.

    I don't know where to find the test and val annotations; the dataset has probably changed since the README was written, and it would be nice to update it.

    2 - Homemade inference script not working

    Because I didn't have the eval dataset, I evaluated the detection model on some samples from the train dataset (I know this is a big caveat since the model saw them during training, but I just wanted to confirm I could get good results, because I'm struggling to use the detection model).

    Here is my code. First, I instantiate the model and load the weights (which I downloaded through the link in the README.md):

    import os
    import xml.etree.ElementTree as ET
    from PIL import Image, ImageDraw
    
    from torchvision import transforms
    import torchvision.transforms.functional as F
    
    import torch
    from detr.models.position_encoding import PositionEmbeddingSine
    from detr.models.detr import DETR
    from detr.models.transformer import Transformer
    from detr.models.backbone import Backbone, Joiner
    
    position_embedding = PositionEmbeddingSine(128)
    backbone = Backbone("resnet18", False, False, False)
    backbone_model = Joiner(backbone, position_embedding)
    backbone_model.num_channels = backbone.num_channels
    backbone = backbone_model
    
    transformer = Transformer(
        d_model=256,
        dropout=0.1,
        nhead=8,
        dim_feedforward=2048,
        num_encoder_layers=6,
        num_decoder_layers=6,
        normalize_before=True,
        return_intermediate_dec=True,
    )
    
    model = DETR(
        backbone,
        transformer,
        num_classes=2,
        num_queries=15,
        aux_loss=False,
    )
    
    weights = torch.load("~/Projects/table-parsing/models/pubtables1m_detection_detr_r18.pth", map_location=torch.device('cpu'))
    model.load_state_dict(weights)
    

    I consider this part successful because I am greeted by an <All keys matched successfully> message. If I had instantiated the model incorrectly, I would have seen the usual Missing key(s) or Unexpected key(s) warnings from PyTorch.

    Secondly, I created a simple pipeline to reproduce the image preprocessing done in the repo:

    convert_tensor = transforms.ToTensor()
    mean = torch.tensor([0.485, 0.456, 0.406])
    std = torch.tensor([0.229, 0.224, 0.225])
    final_size = 800
    max_size = 1333
    
    def detr_pipeline(image):
    
        # Resizing image
        w, h = image.size
        min_original_size = float(min((w, h)))
        max_original_size = float(max((w, h)))
        if max_original_size / min_original_size * final_size > max_size:
            size = int(round(max_size * min_original_size / max_original_size))
        else:
            size = final_size
    
        if (w <= h and w == size) or (h <= w and h == size):
            new_h, new_w = h, w
        elif w < h:
            new_w = size
            new_h = int(size * h / w)
        else:
            new_h = size
            new_w = int(size * w / h)
    
        rescaled_image = F.resize(image, (new_h, new_w))
        image_tensor = convert_tensor(rescaled_image)
    
        # Normalizing image
        image_tensor = image_tensor - torch.broadcast_to(mean.unsqueeze(-1).unsqueeze(-1), image_tensor.shape)
        image_tensor = image_tensor / torch.broadcast_to(std.unsqueeze(-1).unsqueeze(-1), image_tensor.shape)
    
        # Inference
        output = model([image_tensor])
        return output
    

    The hardcoded means and stds come from detr.datasets.coco.make_coco_transforms

    Finally, I used this pipeline to evaluate 20 examples from the training set

    dataset_path = "~/Data/PubTables1M-Detection-PASCAL-VOC"
    annotation_folder = "train"
    
    train_annotations = []
    with open(os.path.join(dataset_path, "train_filelist.txt")) as file:
        for line in file:
            train_annotations.append(line[:-1])
    
    found_examples = 0
    current = 0
    
    while found_examples < 20:
        ann = train_annotations[current]
        current += 1
        xml_path = os.path.join(dataset_path, ann)
        assert os.path.isfile(xml_path), 'Annotation not found'
        data = ET.parse(xml_path)
        root = data.getroot()
        image_path = os.path.join(dataset_path, "images", root[1].text)
        if not os.path.isfile(image_path):
            print(f"Skipping {root[1].text}, as file doesn't exist")
            continue
        else:
            print(image_path)
        found_examples += 1
        with Image.open(image_path) as im:
            outputs = detr_pipeline(im)
            bboxes, logits = outputs['pred_boxes'], outputs['pred_logits']
            probas_per_class = logits.softmax(-1)[:, :, :-1]
            objects_to_keep = probas_per_class.max(-1).values > 0.5
            pred_boxes = bboxes[objects_to_keep]
    
            draw = ImageDraw.Draw(im)
            for elem in root:
                if elem.tag == "object":
                    x0, y0, xmax, ymax = [float(i.text) for i in elem.getchildren()[-1].getchildren()]
                    draw.rectangle(
                        (x0, y0, xmax, ymax),
                        outline="blue",
                        width=3,
                    )
            for box in pred_boxes:
                    centre_x, centre_y, width, height = box
                    x0 = int(im.size[0] * (centre_x - width / 2))
                    y0 = int(im.size[1] * (centre_y - height / 2))
                    x1 = int(im.size[0] * (centre_x + width / 2))
                    y1 = int(im.size[1] * (centre_y + height / 2))
                    draw.rectangle(
                        [x0, y0, x1, y1],
                        outline="red",
                        width=3
                    )
            im.save(os.path.join("~/Desktop/output/table", root[1].text))
    

    Note that here I used a confidence threshold of 0.5, which is very low compared to other DETR models, where a 0.9 confidence level is usually used, so I expect some false positives. Also, I want to point out that many annotation files reference an image that is not in the images folder (that's why I used a while loop and not a for loop). But when I look at the results, none of them are correct. Here are a few samples (annotations in blue, predictions in red):

    [attached sample outputs: PMC6062540_3, PMC6620314_8, PMC6589332_11]

    This is very strange, considering the model saw these samples during training. I tried removing the preprocessing, but it doesn't change the results much; they still look completely random. Could you please help me with this inference script? What am I doing wrong here?

    opened by BenoitdeKersabiec 8
  • Problem of inference of Table Structure when tables very close to image corners

    Hello, I have trained the table structure recognition model for 14 epochs and managed to obtain acceptable results on your test images. However, when I use it to perform inference on some table images of my own, I observe problems like the one below. This is an image similar to the ones produced by your grits.py code, where all classes are plotted together: [attached image: PMC5730189_table_0_no_white_w_box_cropped]

    I believe the problem is related to the distance from the table itself to the image borders. If I perform inference on the same table but keep a larger distance between the table and the image borders, these are the results:

    [attached image: PMC5730189_table_0_w_box_cropped]

    The table border and all rows and columns are much better predicted. The image used for the examples is PMC5730189_table_0 from your dataset.

    The same happens for many other tables. Moreover, I looked at the XML files with the class labels and bounding box data, and a large percentage of the tables used for training (more than 95%) have a distance from the table border to the image border of almost 40 pixels, for all borders (top, bottom, left & right).

    So I was wondering how the algorithm could be made more robust for these cases, where I need to predict the table structure and the table border is really close to the image border (less than 5-10 pixels). Should I change something in the training, or something else?
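
    One possible mitigation, sketched below purely as an assumption (it is not something proposed in this repo), would be to pad the table crop with extra white margin before inference so the table border sits roughly 40 pixels from every image edge, matching the training distribution, and then shift the predicted boxes back by the same margin:

    from PIL import Image

    def pad_table_image(image, margin=40, fill="white"):
        # Paste the crop onto a larger white canvas so the table border is
        # at least `margin` pixels away from every edge; subtract `margin`
        # from the predicted box coordinates afterwards.
        w, h = image.size
        padded = Image.new("RGB", (w + 2 * margin, h + 2 * margin), fill)
        padded.paste(image, (margin, margin))
        return padded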

    Thanks in advance,

    question 
    opened by RobbyJS 4
  • Dependency versions are not available via pip now

    I was trying to install the dependencies via pip and got errors. Please update the code and dependencies.

    pip3 install pytorch==1.5.0
    ERROR: Could not find a version that satisfies the requirement pytorch==1.5.0 (from versions: 0.1.2, 1.0.2)

    pip3 install torchvision==0.6.0
    ERROR: Could not find a version that satisfies the requirement torchvision==0.6.0 (from versions: 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.2.0, 0.2.1, 0.2.2, 0.2.2.post2, 0.2.2.post3, 0.8.2, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.11.1, 0.11.2, 0.11.3, 0.12.0, 0.13.0)
    ERROR: No matching distribution found for torchvision==0.6.0

    opened by neeleshkshukla 3
  • Bugfix/grits

    What

    Minor bugfixes when evaluating GriTS

    Why

    • Was getting KeyError when evaluating GriTS only on a single table_type, i.e., either simple or complex
    • Return 0s if computing any of the GriTS scores fails so that the execution of the program is not halted

    How

    • Used the dict.get() method to return a default value of 0 if the key is not found, i.e., if the table_type is not present
    • Used exception handling when computing GriTS
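
    Roughly, the pattern described above looks like this (hypothetical names, not the actual diff):

    def grits_or_default(compute_fn, *args, default=0):
        # Return a default score instead of halting the whole evaluation run
        try:
            return compute_fn(*args)
        except Exception:
            return default

    # A missing table_type ("simple" or "complex") no longer raises a KeyError:
    # score = per_type_results.get(table_type, 0)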
    opened by suyogdahal 3
  • why use PubTables1M-Table-Words-JSON to refine columns and rows when evaluating TSR

    Hi, I read the postprocessing code and found that the PubTables1M-Table-Words-JSON information is used to refine the columns and rows when evaluating TSR model performance, but I think this word information is not part of the model outputs. Is it reasonable to use this information to evaluate the model?

    question 
    opened by buptxiaofeng 3
  • More details about the post-processing operations

    Hi @bsmock @rohithpv ,

    This section in the paper explains the post-processing steps used at inference time.

    Could you please provide some more details regarding the conflict-resolution technique used? Where can I find it in the code base?

    Thanks!

    question 
    opened by MrinalJain17 3
  • TypeError: 'numpy.float64' object cannot be interpreted as an integer during evaluation

    Hi,

    I encountered this numpy type error during the evaluation phase. Any idea how to fix this?


    How to reproduce the error

    (env)$ python main.py \
      --data_type detection \
      --config_file detection_config.json \
      --data_root_dir ~/../pubtables/PubTables1M-Detection-PASCAL-VOC/
    

    Error Message

    {'lr': 5e-05, 'lr_backbone': 1e-05, 'batch_size': 2, 'weight_decay': 0.0001, 'epochs': 20, 'lr_drop': 1, 'lr_gamma': 0.9, 'clip_max_norm': 0.1, 'backbone': 'resnet18', 'num_classes': 2, 'dilation': False, 'position_
    embedding': 'sine', 'emphasized_weights': {}, 'enc_layers': 6, 'dec_layers': 6, 'dim_feedforward': 2048, 'hidden_dim': 256, 'dropout': 0.1, 'nheads': 8, 'num_queries': 15, 'pre_norm': True, 'masks': False, 'aux_loss
    ': False, 'mask_loss_coef': 1, 'dice_loss_coef': 1, 'ce_loss_coef': 1, 'bbox_loss_coef': 5, 'giou_loss_coef': 2, 'eos_coef': 0.4, 'set_cost_class': 1, 'set_cost_bbox': 5, 'set_cost_giou': 2, 'device': 'cuda', 'seed'
    : 42, 'start_epoch': 0, 'num_workers': 1, 'data_root_dir': '/home/lxyuan/../pubtables/PubTables1M-Detection-PASCAL-VOC/', 'config_file': 'detection_config.json', 'data_type': 'detection', 'model_load_path': None, 'l
    oad_weights_only': False, 'model_save_dir': None, 'metrics_save_filepath': '', 'debug_save_dir': 'debug', 'table_words_dir': None, 'mode': 'train', 'debug': False, 'checkpoint_freq': 1, 'train_max_size': None, 'val_
    max_size': None, 'test_max_size': None, 'eval_pool_size': 1, 'eval_step': 1, '__module__': '__main__', '__dict__': <attribute '__dict__' of 'Args' objects>, '__weakref__': <attribute '__weakref__' of 'Args' objects>
    , '__doc__': None}
    ----------------------------------------------------------------------------------------------------
    loading model
    loading data
    loading data
    creating index...
    index created!
    finished loading data in : 0:00:04.291752
    Max batches per epoch: 230294
    Output directory:  /home/lxyuan/../pubtables/PubTables1M-Detection-PASCAL-VOC/output/20220815202559
    Output model path:  /home/lxyuan/../pubtables/PubTables1M-Detection-PASCAL-VOC/output/20220815202559/model.pth
    Start training
    ----------------------------------------------------------------------------------------------------
    Epoch: [0]  [     0/230294]  eta: 21:14:22  lr: 0.000050  class_error: 33.33  loss: 7.6202 (7.6202)  loss_ce: 1.3217 (1.3217)  loss_bbox: 4.0440 (4.0440)  loss_giou: 2.2545 (2.2545)  loss_ce_unscaled: 1.3217 (1.3217
    )  class_error_unscaled: 33.3333 (33.3333)  loss_bbox_unscaled: 0.8088 (0.8088)  loss_giou_unscaled: 1.1273 (1.1273)  cardinality_error_unscaled: 12.5000 (12.5000)  time: 0.3320  data: 0.1073  max mem: 796
    Epoch: [0]  [  1000/230294]  eta: 5:37:18  lr: 0.000050  class_error: 100.00  loss: 2.3491 (3.9134)  loss_ce: 0.4271 (0.4534)  loss_bbox: 1.0936 (2.2053)  loss_giou: 0.8212 (1.2548)  loss_ce_unscaled: 0.4271 (0.4534
    )  class_error_unscaled: 100.0000 (96.9789)  loss_bbox_unscaled: 0.2187 (0.4411)  loss_giou_unscaled: 0.4106 (0.6274)  cardinality_error_unscaled: 1.0000 (1.0919)  time: 0.0870  data: 0.0046  max mem: 1393
    Epoch: [0]  [  2000/230294]  eta: 5:37:05  lr: 0.000050  class_error: 100.00  loss: 3.2153 (3.2962)  loss_ce: 0.3987 (0.4452)  loss_bbox: 1.6650 (1.7887)  loss_giou: 1.0126 (1.0623)  loss_ce_unscaled: 0.3987 (0.4452
    )  class_error_unscaled: 100.0000 (94.8372)  loss_bbox_unscaled: 0.3330 (0.3577)  loss_giou_unscaled: 0.5063 (0.5312)  cardinality_error_unscaled: 1.0000 (1.0160)  time: 0.0845  data: 0.0045  max mem: 1393
    Epoch: [0]  [  3000/230294]  eta: 5:35:23  lr: 0.000050  class_error: 100.00  loss: 2.4226 (2.9530)  loss_ce: 0.3809 (0.4328)  loss_bbox: 1.1276 (1.5422)  loss_giou: 0.8819 (0.9780)  loss_ce_unscaled: 0.3809 (0.4328
    )  class_error_unscaled: 100.0000 (92.2951)  loss_bbox_unscaled: 0.2255 (0.3084)  loss_giou_unscaled: 0.4409 (0.4890)  cardinality_error_unscaled: 1.0000 (0.9888)  time: 0.0883  data: 0.0045  max mem: 1393
    Epoch: [0]  [  4000/230294]  eta: 5:35:24  lr: 0.000050  class_error: 0.00  loss: 1.8408 (2.7103)  loss_ce: 0.3210 (0.4222)  loss_bbox: 0.7209 (1.3707)  loss_giou: 0.6109 (0.9174)  loss_ce_unscaled: 0.3210 (0.4222)
     class_error_unscaled: 50.0000 (89.8711)  loss_bbox_unscaled: 0.1442 (0.2741)  loss_giou_unscaled: 0.3055 (0.4587)  cardinality_error_unscaled: 0.5000 (0.9609)  time: 0.0906  data: 0.0049  max mem: 1393
    Epoch: [0]  [  5000/230294]  eta: 5:34:34  lr: 0.000050  class_error: 0.00  loss: 2.0806 (2.5365)  loss_ce: 0.3440 (0.4120)  loss_bbox: 0.7721 (1.2546)  loss_giou: 0.7145 (0.8699)  loss_ce_unscaled: 0.3440 (0.4120)
     class_error_unscaled: 75.0000 (86.8418)  loss_bbox_unscaled: 0.1544 (0.2509)  loss_giou_unscaled: 0.3572 (0.4349)  cardinality_error_unscaled: 0.5000 (0.9352)  time: 0.0928  data: 0.0048  max mem: 1393
    Epoch: [0]  [  6000/230294]  eta: 5:33:42  lr: 0.000050  class_error: 50.00  loss: 1.5561 (2.4004)  loss_ce: 0.3442 (0.4008)  loss_bbox: 0.5955 (1.1669)  loss_giou: 0.5303 (0.8327)  loss_ce_unscaled: 0.3442 (0.4008)
      class_error_unscaled: 66.6667 (82.8963)  loss_bbox_unscaled: 0.1191 (0.2334)  loss_giou_unscaled: 0.2652 (0.4163)  cardinality_error_unscaled: 0.5000 (0.8982)  time: 0.0910  data: 0.0048  max mem: 1393
    Epoch: [0]  [  7000/230294]  eta: 5:32:53  lr: 0.000050  class_error: 100.00  loss: 1.9024 (2.2844)  loss_ce: 0.2432 (0.3884)  loss_bbox: 0.6760 (1.0965)  loss_giou: 0.6833 (0.7995)  loss_ce_unscaled: 0.2432 (0.3884
    )  class_error_unscaled: 50.0000 (79.1719)  loss_bbox_unscaled: 0.1352 (0.2193)  loss_giou_unscaled: 0.3416 (0.3998)  cardinality_error_unscaled: 0.5000 (0.8579)  time: 0.0856  data: 0.0047  max mem: 1393
    Epoch: [0]  [  8000/230294]  eta: 5:31:30  lr: 0.000050  class_error: 50.00  loss: 1.3197 (2.1904)  loss_ce: 0.2045 (0.3753)  loss_bbox: 0.5773 (1.0416)  loss_giou: 0.6363 (0.7734)  loss_ce_unscaled: 0.2045 (0.3753)
      class_error_unscaled: 33.3333 (75.1935)  loss_bbox_unscaled: 0.1155 (0.2083)  loss_giou_unscaled: 0.3182 (0.3867)  cardinality_error_unscaled: 0.0000 (0.8116)  time: 0.0903  data: 0.0047  max mem: 1393
    Epoch: [0]  [  9000/230294]  eta: 5:30:25  lr: 0.000050  class_error: 100.00  loss: 1.2540 (2.1009)  loss_ce: 0.2317 (0.3612)  loss_bbox: 0.4740 (0.9915)  loss_giou: 0.5079 (0.7482)  loss_ce_unscaled: 0.2317 (0.3612
    )  class_error_unscaled: 50.0000 (71.3004)  loss_bbox_unscaled: 0.0948 (0.1983)  loss_giou_unscaled: 0.2539 (0.3741)  cardinality_error_unscaled: 0.5000 (0.7655)  time: 0.0909  data: 0.0048  max mem: 1393
    
    <truncated>
    
    Epoch: [0]  [230293/230294]  eta: 0:00:00  lr: 0.000050  class_error: 0.00  loss: 0.2878 (0.4740)  loss_ce: 0.0005 (0.0355)  loss_bbox: 0.1188 (0.2152)  loss_giou: 0.1408 (0.2233)  loss_ce_unscaled: 0.0005 (0.0355)
     class_error_unscaled: 0.0000 (4.7529)  loss_bbox_unscaled: 0.0238 (0.0430)  loss_giou_unscaled: 0.0704 (0.1116)  cardinality_error_unscaled: 0.0000 (0.0790)  time: 0.0888  data: 0.0057  max mem: 1393
    Epoch: [0] Total time: 5:45:45 (0.0901 s / it)
    Averaged stats: lr: 0.000050  class_error: 0.00  loss: 0.2878 (0.4740)  loss_ce: 0.0005 (0.0355)  loss_bbox: 0.1188 (0.2152)  loss_giou: 0.1408 (0.2233)  loss_ce_unscaled: 0.0005 (0.0355)  class_error_unscaled: 0.00
    00 (4.7529)  loss_bbox_unscaled: 0.0238 (0.0430)  loss_giou_unscaled: 0.0704 (0.1116)  cardinality_error_unscaled: 0.0000 (0.0790)
    Epoch completed in  5:45:45.451181
        main()
      File "/home/lxyuan/playground/table-transformer/src/main.py", line 368, in main
        train(args, model, criterion, postprocessors, device)
      File "/home/lxyuan/playground/table-transformer/src/main.py", line 317, in train
        pubmed_stats, coco_evaluator = evaluate(model, criterion,
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/lxyuan/playground/table-transformer/src/../detr/engine.py", line 81, in evaluate
        coco_evaluator = CocoEvaluator(base_ds, iou_types)
      File "/home/lxyuan/playground/table-transformer/src/../detr/datasets/coco_eval.py", line 31, in __init__
        self.coco_eval[iou_type] = COCOeval(coco_gt, iouType=iou_type)
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/pycocotools/cocoeval.py", line 76, in __init__
        self.params = Params(iouType=iouType) # parameters
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/pycocotools/cocoeval.py", line 527, in __init__
        self.setDetParams()
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/pycocotools/cocoeval.py", line 507, in setDetParams
        self.iouThrs = np.linspace(.5, 0.95, np.round((0.95 - .5) / .05) + 1, endpoint=True)
      File "<__array_function__ internals>", line 180, in linspace
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/numpy/core/function_base.py", line 120, in linspace
        num = operator.index(num)
    TypeError: 'numpy.float64' object cannot be interpreted as an integer
    

    It seems like I was able to complete one training epoch but got the numpy error message when the code tried to evaluate model performance on the validation set (i.e., src/main.py:L317).


    I got a similar error when I tried to use main.py to evaluate model performance directly.

    How to reproduce the error

    (env)$ python main.py \
      --mode eval \
      --data_type detection \
      --config_file detection_config.json \
      --data_root_dir ~/../pubtables/PubTables1M-Detection-PASCAL-VOC/ \
      --model_load_path ../pretrained_models/pubtables1m_detection_detr_r18.pth
    

    Error Message

    {'lr': 5e-05, 'lr_backbone': 1e-05, 'batch_size': 2, 'weight_decay': 0.0001, 'epochs': 20, 'lr_drop': 1, 'lr_gamma': 0.9, 'clip_max_norm': 0.1, 'backbone': 'resnet18', 'num_classes': 2, 'dilation': False, 'pos
    ition_embedding': 'sine', 'emphasized_weights': {}, 'enc_layers': 6, 'dec_layers': 6, 'dim_feedforward': 2048, 'hidden_dim': 256, 'dropout': 0.1, 'nheads': 8, 'num_queries': 15, 'pre_norm': True, 'masks': Fals
    e, 'aux_loss': False, 'mask_loss_coef': 1, 'dice_loss_coef': 1, 'ce_loss_coef': 1, 'bbox_loss_coef': 5, 'giou_loss_coef': 2, 'eos_coef': 0.4, 'set_cost_class': 1, 'set_cost_bbox': 5, 'set_cost_giou': 2, 'devic
    e': 'cuda', 'seed': 42, 'start_epoch': 0, 'num_workers': 1, 'data_root_dir': '/home/lxyuan/mini-pubtables/PubTables1M-Dectection-PASCAL-VOC/', 'config_file': 'detection_config.json', 'data_type': 'detection',
    'model_load_path': '../pretrained_models/pubtables1m_detection_detr_r18.pth', 'load_weights_only': False, 'model_save_dir': None, 'metrics_save_filepath': '', 'debug_save_dir': 'debug', 'table_words_dir': None
    , 'mode': 'eval', 'debug': False, 'checkpoint_freq': 1, 'train_max_size': None, 'val_max_size': None, 'test_max_size': None, 'eval_pool_size': 1, 'eval_step': 1, '__module__': '__main__', '__dict__': <attribut
    e '__dict__' of 'Args' objects>, '__weakref__': <attribute '__weakref__' of 'Args' objects>, '__doc__': None}
    ----------------------------------------------------------------------------------------------------
    loading model
    loading model from checkpoint
    loading data
    creating index...
    index created!
    Traceback (most recent call last):
      File "/home/lxyuan/playground/table-transformer/src/main.py", line 375, in <module>
        main()
      File "/home/lxyuan/playground/table-transformer/src/main.py", line 371, in main
        eval_coco(args, model, criterion, postprocessors, data_loader_test, dataset_test, device)
      File "/home/lxyuan/playground/table-transformer/src/eval.py", line 693, in eval_coco
        pubmed_stats, coco_evaluator = evaluate(args, model, criterion, postprocessors,
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/lxyuan/playground/table-transformer/src/eval.py", line 586, in evaluate
        coco_evaluator = CocoEvaluator(base_ds, iou_types)
      File "/home/lxyuan/playground/table-transformer/src/../detr/datasets/coco_eval.py", line 31, in __init__
        self.coco_eval[iou_type] = COCOeval(coco_gt, iouType=iou_type)
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/pycocotools/cocoeval.py", line 76, in __init__
        self.params = Params(iouType=iouType) # parameters
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/pycocotools/cocoeval.py", line 527, in __init__
        self.setDetParams()
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/pycocotools/cocoeval.py", line 507, in setDetParams
        self.iouThrs = np.linspace(.5, 0.95, np.round((0.95 - .5) / .05) + 1, endpoint=True)
      File "<__array_function__ internals>", line 180, in linspace
      File "/home/lxyuan/playground/table-transformer/env/lib64/python3.9/site-packages/numpy/core/function_base.py", line 120, in linspace
        num = operator.index(num)
    TypeError: 'numpy.float64' object cannot be interpreted as an integer
    

    NOTE: I am using numpy==1.23.2 and python3.9
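
    For reference, the traceback points at pycocotools passing a float count to np.linspace, which recent numpy versions reject. A common workaround (not from this thread) is to use a numpy/pycocotools combination where the count is an integer, e.g. a patched call of the form:

    import numpy as np

    # Old pycocotools (fails on recent numpy because num is a float):
    # np.linspace(.5, 0.95, np.round((0.95 - .5) / .05) + 1, endpoint=True)

    # An integer count works on all numpy versions:
    iou_thrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)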

    opened by LxYuan-Handshakes 2
  • The dataset home page is not working

    The website kept saying "Loading Dataset Details..." and didn't return anything even after waiting for a long time. Another odd problem is that I can't see any dataset or use the search function on msropendata.com. I tried to file an issue on msropendata.com, but it cannot be submitted. Is the backend server of msropendata down, or is it something else? Thanks for your help!

    opened by narthchin 2
  • Error when running in debug mode: Runtime Error: Expected all tensors to be on the same device, but found at least 2 devices, cuda:0 and cpu!

    Dear authors, I've just run the command below in debug mode in order to visualize the reconstruction result on a PDF file:

    !python main.py --data_root_dir path/to/structure --model_load_path path/to/model --table_words_dir path/to/words --mode grits --metrics_save_filepath path/to/metrics_save_file --debug

    And I ran into this bug. It says "RuntimeError: Expected all tensors to be on the same device, but found at least 2 devices, cuda:0 and cpu!"

    I ran this on a GPU runtime in Colab and this error occurred. When I tried to run in CPU-only mode, it said there is no GPU device. I couldn't figure out what causes this error. Could you help me identify where the problem is? Thanks for your consideration.

    opened by suonbo 2
  • Why did the row/column dilation get removed?

    The paper talks about dilating the row/column bounding boxes to align the rows and columns and remove gaps. I see in postprocessing.py that this code has been commented out and removed:

        # Dilate rows and columns before final extraction
        #dilated_columns = fill_column_gaps(columns, table_bbox)
        dilated_columns = columns
        #dilated_rows = fill_row_gaps(rows, table_bbox)
        dilated_rows = rows
    

    Is there a reason for this? Or is the bounding box dilation happening elsewhere in the code and I've missed it?

    opened by wandering-walrus 0
  • Duplicated classes in table_datasets.py

    @bsmock Hi, thank you for sharing your work.

    I have a question about your code.

    Why are there duplicated classes in your code? For example, class RandomCrop appears in table_datasets.py at L137 and L267, and class RandomResize appears in table_datasets.py at L185 and L395.

    Is this just a typo? I'm confused; please check this issue.

    opened by yellowjs0304 0
  • Annotation Tool

    Hi, we are trying to use this model for custom training. We have a set of images we would like to fine-tune on. We were able to generate the XML files using LabelImg, but the words.json file is a little tricky. Can you please share the annotation tool used or suggest an alternative?

    opened by abhayhk2001 0
  • Colab Notebook TSR: functional analysis and obtain final dataframe

    Hi!

    I was working with the TD and TSR notebooks at https://github.com/NielsRogge/Transformers-Tutorials/tree/master/Table%20Transformer, and they work properly for me, but the last step of the TSR pipeline, obtaining a data frame, is not implemented in those notebooks (I think this process is called functional analysis in this repo). The TSR postprocessing steps go from the structure to grid cells.

    Has anyone managed to obtain the final data frame for TSR in Colab, taking spanning cells and titles into account?
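
    For what it's worth, once you have cells with row/column index lists and text, turning them into a data frame can be as simple as the sketch below (the cell field names here are hypothetical, not the exact schema used by this repo):

    import pandas as pd

    def cells_to_dataframe(cells, num_rows, num_columns):
        grid = [["" for _ in range(num_columns)] for _ in range(num_rows)]
        for cell in cells:
            for r in cell["row_nums"]:         # hypothetical field names
                for c in cell["column_nums"]:
                    grid[r][c] = cell["text"]  # spanning cells repeat their text
        return pd.DataFrame(grid)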

    Regards

    opened by emigomez 2
  • Single Inference of TSR

    Hi!

    I want to run the whole TSR pipeline for a single input image and obtain the final data frame as output. I think the step that produces this final data frame is implemented here as 'functional analysis', but the problem is that I don't know how to run single-image inference with this repo.

    Do you know how to run single-image inference of the entire TSR + functional analysis pipeline? (I read similar issues but didn't find a solution.)

    Regards

    opened by emigomez 0