SurvTRACE: Transformers for Survival Analysis with Competing Events

Zifeng

Last update: Oct 6, 2022

Overview

⭐ SurvTRACE: Transformers for Survival Analysis with Competing Events

This repo provides the implementation of SurvTRACE for survival analysis. Training and evaluating a model takes only a few lines of code:

from survtrace.dataset import load_data
from survtrace.model import SurvTraceSingle
from survtrace import Evaluator
from survtrace import Trainer
from survtrace import STConfig

# use METABRIC dataset
STConfig['data'] = 'metabric'
df, df_train, df_y_train, df_test, df_y_test, df_val, df_y_val = load_data(STConfig)

# initialize model
model = SurvTraceSingle(STConfig)

# execute training
trainer = Trainer(model)
trainer.fit((df_train, df_y_train), (df_val, df_y_val))

# evaluating
evaluator = Evaluator(df, df_train.index)
evaluator.eval(model, (df_test, df_y_test))

print("done!")
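
The `load_data` call above returns the full frame plus feature and label frames for the train, test, and validation splits. For a local table with one row per patient, a comparable split could be sketched as below. This is only a sketch under assumptions: the helper `split_survival_frame`, the column names `duration` and `event`, and the toy values are all hypothetical, and whatever preprocessing `load_data` applies internally is not reproduced here.

```python
import numpy as np
import pandas as pd


def split_survival_frame(df, duration_col="duration", event_col="event",
                         val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle df and split it into train/test/val feature and label frames."""
    label_cols = [duration_col, event_col]
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(df.index.to_numpy())

    n_test = int(len(df) * test_frac)
    n_val = int(len(df) * val_frac)
    test_idx = shuffled[:n_test]
    val_idx = shuffled[n_test:n_test + n_val]
    train_idx = shuffled[n_test + n_val:]

    features = df.drop(columns=label_cols)  # covariates only
    labels = df[label_cols]                 # survival time and event indicator
    # Same ordering as the tuple unpacked from load_data in the snippet above.
    return (
        df,
        features.loc[train_idx], labels.loc[train_idx],
        features.loc[test_idx], labels.loc[test_idx],
        features.loc[val_idx], labels.loc[val_idx],
    )


# Toy table (made-up values), indexed by a hypothetical patient ID.
toy = pd.DataFrame(
    {
        "age": [61, 45, 70, 52, 66, 58, 49, 73, 55, 64],
        "stage": ["II", "I", "III", "II", "I", "III", "II", "I", "III", "II"],
        "duration": [12.0, 40.5, 7.2, 30.1, 55.0, 9.8, 22.4, 60.3, 15.6, 33.3],
        "event": [1, 0, 1, 0, 0, 1, 1, 0, 1, 0],
    },
    index=[f"P{i:03d}" for i in range(10)],
)

df, df_train, df_y_train, df_test, df_y_test, df_val, df_y_val = split_survival_frame(
    toy, val_frac=0.2, test_frac=0.2
)
print(len(df_train), len(df_val), len(df_test))
```

In practice the table would come from something like `pd.read_csv` on a local file, with the patient ID as the index.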

🔥 See the demo

Please refer to experiment_metabric.ipynb and experiment_support.ipynb.

🔥 How to configure the environment

Use our pre-saved conda environment!

conda env create --name survtrace --file=survtrace.yml
conda activate survtrace

Or install the dependencies from requirements.txt:

pip3 install -r requirements.txt
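
A conda environment file that pins exact builds may not resolve on a different operating system, in which case conda reports `ResolvePackageNotFound`. One possible workaround is to relax the pins to `name=version` and drop platform-specific packages before creating the environment. The sketch below is an assumption-laden illustration, not part of this repo: `strip_build`, `relax_pins`, and the list of Windows-only names are hypothetical, and the relaxed environment may still fail to resolve.

```python
# Assumption: these package names and prefixes only exist on Windows.
WINDOWS_ONLY_NAMES = {"vs2015_runtime", "vc", "wincertstore"}
WINDOWS_ONLY_PREFIXES = ("m2w64-", "msys2-")


def strip_build(spec):
    """Drop the build string from a conda pin, keeping name and version."""
    # Error output shows "name==version=build"; env files usually use "name=version=build".
    normalized = spec.strip().replace("==", "=", 1)
    parts = normalized.split("=")
    return "=".join(parts[:2])


def relax_pins(specs):
    """Strip build strings and skip packages assumed to be Windows-only."""
    relaxed = []
    for spec in specs:
        pin = strip_build(spec)
        name = pin.split("=")[0]
        if name in WINDOWS_ONLY_NAMES or name.startswith(WINDOWS_ONLY_PREFIXES):
            continue
        relaxed.append(pin)
    return relaxed


pins = [
    "python==3.6.13=h3758d61_0",
    "scikit-survival==0.14.0=py36he350917_0",
    "vs2015_runtime==14.27.29016=h5e58377_2",
    "m2w64-gmp==6.1.0=2",
]
print(relax_pins(pins))
```

The relaxed pins can then be written back into the `dependencies` list of a copy of the environment file.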

🔥 How to get SEER data

Go to https://seer.cancer.gov/data/ and submit a data request to SEER, following the guide there.
After completing that step, we should have the SEER*Stat software for data access. Open it and sign in with the username and password sent by SEER.

Use SEER*Stat to open the ./data/seer.sl file.

Click the 'Execute' icon to query the SEER database. This produces a CSV file.

Move the CSV file to ./data/seer_raw.csv, then run the Python script process_seer.py:
```
python process_seer.py
```
This produces the processed SEER data, named seer_processed.csv.

📝 Functions

single event survival analysis
competing events survival analysis
multi-task learning
automatic hyperparameter grid-search
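
As a rough illustration of what a hyperparameter grid search involves, the sketch below enumerates every combination of a small search space on top of a base config. It is not the repo's own search routine: `iter_grid` is a hypothetical helper, and the keys `learning_rate`, `batch_size`, and `hidden_size` are illustrative names, not confirmed `STConfig` fields.

```python
from itertools import product

# Hypothetical search space; the keys are illustrative only.
search_space = {
    "learning_rate": [1e-3, 1e-4],
    "batch_size": [64, 128],
    "hidden_size": [16, 32],
}


def iter_grid(base_config, space):
    """Yield one config dict per combination of values in `space`."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        config = dict(base_config)  # copy so the base config is left untouched
        config.update(zip(keys, values))
        yield config


base = {"data": "metabric"}
configs = list(iter_grid(base, search_space))
print(len(configs))  # 2 * 2 * 2 = 8 combinations
```

Each generated config would then be used to train a model, with the best one selected by its validation score.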

😄 If you find this result interesting, please consider citing this paper:

@article{wang2021survtrace,
      title={Surv{TRACE}: Transformers for Survival Analysis with Competing Events}, 
      author={Zifeng Wang and Jimeng Sun},
      year={2021},
      eprint={2110.00855},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Comments

How to prepare model input from my own data?

Hi Dr. Wang, I'm a surgeon in China. I'm really interested in SurvTRACE and I'd like to apply it in my research to predict the prognosis of cancer patients. However, I only learned Python recently. Could you show me how to prepare the model input from local files? For example, an m×n matrix in which each row is a patient ID and the columns contain overall survival time, events, and the features for modeling.

opened by Jwenyi 13

Fail to install the environment

Hi Zifeng, your work is very good and I really want to use this method, but I ran into a problem at the first step:

conda env create --name survtrace --file=survtrace.yml

Here is the error:

Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
- vs2015_runtime==14.27.29016=h5e58377_2
- m2w64-gmp==6.1.0=2
- cvxopt==1.2.5=py36h542453d_0
- glpk==4.65=hdc00fd2_2
- multiprocess==0.70.11.1=py36hf4a77e7_0
- mkl_fft==1.3.0=py36h46781fe_0
- icc_rt==2019.0.0=h0cc432a_1
- setuptools==58.0.4=py36haa95532_0
- libcblas==3.9.0=5_hd5c7e75_netlib
- fastcache==1.1.0=py36he774522_0
- sqlite==3.36.0=h2bbff1b_0
- wincertstore==0.2=py36h7fe50ca_0
- certifi==2021.5.30=py36ha15d459_0
- vc==14.2=h21ff451_1
- python==3.6.13=h3758d61_0
- scikit-learn==0.22.1=py36h7208079_1
- numexpr==2.7.3=py36hcbcaa1e_0
- scikit-survival==0.14.0=py36he350917_0
- scs==2.1.2=py36haa4650d_0
- ecos==2.0.7.post1=py36haa4650d_3
- msys2-conda-epoch==20160418=1
- scipy==1.5.2=py36h9439919_0
- mkl_random==1.1.1=py36h47e9c7a_0
- numpy-base==1.19.2=py36ha3acd2a_0
- m2w64-gcc-libs==5.3.0=7
- cvxpy-base==1.0.31=py36h6538335_0
- intel-openmp==2021.3.0=haa95532_3372
- m2w64-libwinpthread-git==5.0.0.4634.697f757=2
- libblas==3.9.0=1_h8933c1f_netlib
- mkl-service==2.3.0=py36h196d8e1_0
- pandas==1.1.5=py36hd77b12b_0
- osqp==0.5.0=py36haa4650d_3
- m2w64-gcc-libgfortran==5.3.0=6
- pip==21.0.1=py36haa95532_0
- m2w64-gcc-libs-core==5.3.0=7

And my computer is:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6246R CPU @ 3.40GHz
Stepping: 7
CPU MHz: 3400.000
CPU max MHz: 4100.0000
CPU min MHz: 1200.0000
BogoMIPS: 6800.00
Virtualization: VT-x
L1d cache: 512 KiB
L1i cache: 512 KiB
L2 cache: 16 MiB
L3 cache: 35.8 MiB
NUMA node0 CPU(s): 0-31
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; TSX disabled
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities

My conda version is 4.10.3. Thanks a lot. I would appreciate it if you could reply to this stupid question.
opened by dandata123-tech 4
Question about inverse propensity score loss

Hi Zifeng,

I read your paper on arXiv and got interested in the inverse propensity score (IPS) loss that you implemented for debiasing the competing events. However, I still have some questions about it and hope you can help me with them.

I can see from the paper that the IPS weighting, $\pi_{ik} = Pr(e_i = k | \phi, \boldsymbol{x})$, is trained to estimate the true distribution of the competing events. Based on your equation 20, $\pi_{ik}\triangleq \sigma(\boldsymbol{w}^T\boldsymbol{x}_i+\beta)$, this IPS weighting seems to be obtained from scratch using a separate model, not from a downstream model applied to the latent representation $t_{SR}$.

However, I didn't find this implementation in the repo. Can you let me know which part of the code implements the IPS loss? Sorry if the questions are naive or due to my carelessness. I'm looking forward to hearing from you.

Best, Shiang

opened by shi-ang 2
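
For readers unfamiliar with the logistic form quoted in the question above, $\pi_{ik} = \sigma(\boldsymbol{w}^T\boldsymbol{x}_i+\beta)$, the following sketch evaluates it for one sample and turns it into an inverse propensity weight. It illustrates the formula only and is not taken from the repo: the function names, the made-up values of `w`, `beta`, and `x_i`, and the clipping constant `eps` are all assumptions.

```python
import math


def propensity(x, w, beta):
    """pi = sigma(w^T x + beta), the logistic form quoted in the question."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) + beta
    return 1.0 / (1.0 + math.exp(-score))


def ips_weight(pi, eps=1e-6):
    """Inverse propensity weight; clipping at eps is an assumed stabilization."""
    return 1.0 / max(pi, eps)


# Made-up feature vector and parameters for a single sample i and event k.
x_i = [0.5, -1.0, 2.0]
w = [0.2, 0.4, 0.1]
beta = 0.1

pi_ik = propensity(x_i, w, beta)
print(pi_ik, ips_weight(pi_ik))
```

A sample's loss term would be multiplied by this weight, so events that the propensity model considers unlikely count for more.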
Inquiry about how to visualize an attention map

Would you please share with me how to visualize an attention map? You have provided an Attention Map within your paper. How did you make this figure? I am trying to visualize a similar figure by trial and error. How do you use the last layer of attention? I would appreciate it if you could share your method if possible. Best Regards.

opened by kirohirahanoshi 0

Owner

Zifeng

PhD student of Computer Science
