SurvTRACE: Transformers for Survival Analysis with Competing Events

This repo provides the implementation of SurvTRACE for survival analysis. It is easy to use; the following code is all you need:

from survtrace.dataset import load_data
from survtrace.model import SurvTraceSingle
from survtrace import Evaluator
from survtrace import Trainer
from survtrace import STConfig

# use METABRIC dataset
STConfig['data'] = 'metabric'
df, df_train, df_y_train, df_test, df_y_test, df_val, df_y_val = load_data(STConfig)

# initialize model
model = SurvTraceSingle(STConfig)

# execute training
trainer = Trainer(model)
trainer.fit((df_train, df_y_train), (df_val, df_y_val))

# evaluating
evaluator = Evaluator(df, df_train.index)
evaluator.eval(model, (df_test, df_y_test))

print("done!")
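
Judging by the variable names, `load_data` returns the full table plus separate feature (`df_*`) and label (`df_y_*`) frames for the training, test and validation splits. If you have your own data (one row per patient, with a follow-up time, an event indicator and feature columns), producing a split in the same seven-item order can be sketched with pandas as below. This is a hypothetical helper, not part of SurvTRACE: the column names `duration` and `event`, the split fractions and the function name `split_survival_frame` are assumptions, and it skips any feature encoding or time discretization that SurvTRACE's own loader may apply.

```python
import numpy as np
import pandas as pd


def split_survival_frame(df, test_frac=0.3, val_frac=0.1, seed=0):
    """Hypothetical helper (not a SurvTRACE API).

    Assumes `df` has a `duration` column (follow-up time), an `event` column
    (0 = censored, 1 = event) and feature columns. `val_frac` is taken from
    the rows left over after the test split. Returns the same seven-item
    order as the quick-start example.
    """
    label_cols = ["duration", "event"]
    df_test = df.sample(frac=test_frac, random_state=seed)
    df_rest = df.drop(df_test.index)
    df_val = df_rest.sample(frac=val_frac, random_state=seed)
    df_train = df_rest.drop(df_val.index)

    def xy(part):
        # separate feature columns from the label columns
        return part.drop(columns=label_cols), part[label_cols]

    x_tr, y_tr = xy(df_train)
    x_te, y_te = xy(df_test)
    x_va, y_va = xy(df_val)
    return df, x_tr, y_tr, x_te, y_te, x_va, y_va


# toy table: 10 patients, 2 features (made-up values for illustration only)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "age": rng.integers(30, 80, size=10),
    "tumor_size": rng.normal(20.0, 5.0, size=10),
    "duration": rng.integers(1, 120, size=10),
    "event": rng.integers(0, 2, size=10),
})
df, df_train, df_y_train, df_test, df_y_test, df_val, df_y_val = split_survival_frame(toy)
print(len(df_train), len(df_val), len(df_test))
```

Keeping features and labels on the same index makes it easy to check that no patient appears in more than one split.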

🔥 See the demo

Please refer to experiment_metabric.ipynb and experiment_support.ipynb!

🔥 How to configure the environment

Use our pre-saved conda environment!

conda env create --name survtrace --file=survtrace.yml
conda activate survtrace

Or install the dependencies from requirements.txt:

pip3 install -r requirements.txt

🔥 How to get SEER data

  1. Go to https://seer.cancer.gov/data/ and submit a data request to SEER, following the guide there.

  2. After completing step 1, you will have the SEER*Stat software for data access. Open it and sign in with the username and password sent by SEER.

  3. Use SEER*Stat to open the ./data/seer.sl file.

     Click the 'Execute' icon to submit the request to the SEER database. You will obtain a CSV file.

  4. Move the CSV file to ./data/seer_raw.csv, then run the Python script process_seer.py:

    python process_seer.py

    This produces the processed SEER data, named seer_processed.csv.

📝 Functions

  • single event survival analysis
  • competing events survival analysis
  • multi-task learning
  • automatic hyperparameter grid-search
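
For the competing-events case, a common discrete-time formulation (shown here for illustration; it is not confirmed to be SurvTRACE's exact parameterization) has the model output a cause-specific hazard h_k(t) for each event type k and time interval t. Overall survival is then S(t) = prod_{s<=t} (1 - sum_k h_k(s)), and the cumulative incidence of event k is CIF_k(t) = sum_{s<=t} h_k(s) * S(s-1), with S(-1) = 1. The NumPy sketch below assumes the hazards are given as an array of shape (n_events, n_times); the function name `cumulative_incidence` is hypothetical.

```python
import numpy as np


def cumulative_incidence(hazards):
    """Hypothetical helper: discrete cause-specific hazards -> survival and CIFs.

    hazards[k, t] is the probability that event k occurs in interval t, given
    the subject is still event-free at the start of t. Each column must sum
    to at most 1. Generic discrete-time competing-risks formulation, not
    necessarily SurvTRACE's own.
    """
    hazards = np.asarray(hazards, dtype=float)
    total = hazards.sum(axis=0)                       # all-cause hazard per interval
    surv = np.cumprod(1.0 - total)                    # S(t): event-free through interval t
    surv_before = np.concatenate(([1.0], surv[:-1]))  # S(t-1), with S(-1) = 1
    cif = np.cumsum(hazards * surv_before, axis=1)    # CIF_k(t) for every event type k
    return surv, cif


h = np.array([[0.10, 0.2, 0.1],   # event type 1
              [0.05, 0.1, 0.2]])  # event type 2
surv, cif = cumulative_incidence(h)
print(surv)
print(cif)
```

A quick sanity check on any such output: at every time point, survival plus the cumulative incidences of all event types sums to 1.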

😄 If you find this result interesting, please consider citing this paper:

@article{wang2021survtrace,
      title={Surv{TRACE}: Transformers for Survival Analysis with Competing Events}, 
      author={Zifeng Wang and Jimeng Sun},
      year={2021},
      eprint={2110.00855},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
Comments
  • How to prepare model input from my own data?

    Hi Dr. Wang, I'm a surgeon in China. I'm really interested in SurvTRACE and I'd like to apply it in my research to predict the prognosis of cancer patients. However, I only started learning Python recently. Could you show me how to prepare the model input from local files? For example, an m×n matrix where each row is a patient (by ID) and the columns contain overall survival time, event status, and the features used for modeling.

    opened by Jwenyi 13
  • Failed to install the environment

    Hi Zifeng, your work is very good and I really want to use this method. But I ran into a problem at the first step:

    conda env create --name survtrace --file=survtrace.yml

    Here is the error:

        Collecting package metadata (repodata.json): done
        Solving environment: failed
        ResolvePackageNotFound:
          - vs2015_runtime==14.27.29016=h5e58377_2
          - m2w64-gmp==6.1.0=2
          - cvxopt==1.2.5=py36h542453d_0
          - glpk==4.65=hdc00fd2_2
          - multiprocess==0.70.11.1=py36hf4a77e7_0
          - mkl_fft==1.3.0=py36h46781fe_0
          - icc_rt==2019.0.0=h0cc432a_1
          - setuptools==58.0.4=py36haa95532_0
          - libcblas==3.9.0=5_hd5c7e75_netlib
          - fastcache==1.1.0=py36he774522_0
          - sqlite==3.36.0=h2bbff1b_0
          - wincertstore==0.2=py36h7fe50ca_0
          - certifi==2021.5.30=py36ha15d459_0
          - vc==14.2=h21ff451_1
          - python==3.6.13=h3758d61_0
          - scikit-learn==0.22.1=py36h7208079_1
          - numexpr==2.7.3=py36hcbcaa1e_0
          - scikit-survival==0.14.0=py36he350917_0
          - scs==2.1.2=py36haa4650d_0
          - ecos==2.0.7.post1=py36haa4650d_3
          - msys2-conda-epoch==20160418=1
          - scipy==1.5.2=py36h9439919_0
          - mkl_random==1.1.1=py36h47e9c7a_0
          - numpy-base==1.19.2=py36ha3acd2a_0
          - m2w64-gcc-libs==5.3.0=7
          - cvxpy-base==1.0.31=py36h6538335_0
          - intel-openmp==2021.3.0=haa95532_3372
          - m2w64-libwinpthread-git==5.0.0.4634.697f757=2
          - libblas==3.9.0=1_h8933c1f_netlib
          - mkl-service==2.3.0=py36h196d8e1_0
          - pandas==1.1.5=py36hd77b12b_0
          - osqp==0.5.0=py36haa4650d_3
          - m2w64-gcc-libgfortran==5.3.0=6
          - pip==21.0.1=py36haa95532_0
          - m2w64-gcc-libs-core==5.3.0=7

    And here is my computer's configuration:

        Architecture: x86_64
        CPU op-mode(s): 32-bit, 64-bit
        Byte Order: Little Endian
        Address sizes: 46 bits physical, 48 bits virtual
        CPU(s): 32
        On-line CPU(s) list: 0-31
        Thread(s) per core: 2
        Core(s) per socket: 16
        Socket(s): 1
        NUMA node(s): 1
        Vendor ID: GenuineIntel
        CPU family: 6
        Model: 85
        Model name: Intel(R) Xeon(R) Gold 6246R CPU @ 3.40GHz
        Stepping: 7
        CPU MHz: 3400.000
        CPU max MHz: 4100.0000
        CPU min MHz: 1200.0000
        BogoMIPS: 6800.00
        Virtualization: VT-x
        L1d cache: 512 KiB
        L1i cache: 512 KiB
        L2 cache: 16 MiB
        L3 cache: 35.8 MiB
        NUMA node0 CPU(s): 0-31
        Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
        Vulnerability L1tf: Not affected
        Vulnerability Mds: Not affected
        Vulnerability Meltdown: Not affected
        Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
        Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
        Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
        Vulnerability Srbds: Not affected
        Vulnerability Tsx async abort: Mitigation; TSX disabled
        Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities

    My conda version is 4.10.3. Thanks a lot; I would appreciate it if you could reply to this simple question.

    opened by dandata123-tech 4
  • Question about inverse propensity score loss

    Hi Zifeng,

    I read your paper on arXiv and became interested in the inverse propensity score (IPS) loss that you implemented for debiasing the competing events. However, I still have some questions about it and hope you can help me with them.

    From the paper, I can see that the IPS weighting is trained to estimate the true distribution of the competing events. Based on your Equation 20, this IPS weighting seems to be obtained from scratch using a separate model, rather than a downstream model on top of the latent representation.

    However, I couldn't find this implementation in this repo. Could you let me know where the IPS loss is implemented? Sorry if the questions are naive or due to my carelessness. I'm looking forward to hearing from you.

    Best, Shiang

    opened by shi-ang 2
  • Inquiry about how to visualize an attention map

    Could you please share how to visualize an attention map? Your paper includes an attention map; how did you make that figure? I am trying to produce a similar figure by trial and error. How do you use the last attention layer? I would appreciate it if you could share your method. Best regards.

    opened by kirohirahanoshi 0
Owner: Zifeng, PhD student of Computer Science