ADGC: Awesome Deep Graph Clustering
ADGC is a collection of state-of-the-art (SOTA) and novel deep graph clustering methods (papers, code, and datasets). Contributions of other interesting papers and code are welcome. If you have any problems, please contact [email protected].
What's Deep Graph Clustering?
Deep graph clustering, which aims to reveal the underlying graph structure and divide the nodes into different groups, has attracted intensive attention in recent years.
Important Survey Papers
Papers
- K-Means: "Algorithm AS 136: A k-means clustering algorithm" [pdf|code]
- DCN (ICML17): "Towards k-means-friendly spaces: Simultaneous deep learning and clustering" [pdf|code]
- DEC (ICML16): "Unsupervised Deep Embedding for Clustering Analysis" [pdf|code]
- IDEC (IJCAI17): "Improved Deep Embedded Clustering with Local Structure Preservation" [pdf|code]
- GAE/VGAE: "Variational Graph Auto-Encoders" [pdf|code]
- DAEGC (IJCAI19): "Attributed Graph Clustering: A Deep Attentional Embedding Approach" [pdf|code]
- ARGA/ARVGA (TCYB19): "Learning Graph Embedding with Adversarial Training Methods" [pdf|code]
- SDCN/SDCN_Q (WWW20): "Structural Deep Clustering Network" [pdf|code]
- DFCN (AAAI21): "Deep Fusion Clustering Network" [pdf|code]
- MVGRL (ICML20): "Contrastive Multi-View Representation Learning on Graphs" [pdf|code]
Benchmark Datasets
We divide the datasets into two categories: graph datasets and non-graph datasets. Graph datasets are real-world graphs, such as citation networks and social networks. Non-graph datasets have no graph structure. However, if necessary, an "adjacency matrix" can be constructed for them with the K-Nearest Neighbors (KNN) algorithm.
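The KNN construction mentioned above can be sketched as follows. This is a minimal illustration, not the code used in this repository: the function name `knn_adjacency`, the Euclidean distance, the default `k=5`, and the "either endpoint" symmetrisation rule are all assumptions made for the example.

```python
import numpy as np


def knn_adjacency(features, k=5):
    """Build a symmetric, binary KNN adjacency matrix (hypothetical helper).

    features: array-like of shape (n_samples, n_dims), one sample per row.
    k: number of nearest neighbours each sample connects to (assumed default).
    """
    x = np.asarray(features, dtype=np.float64)
    n = x.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of samples")

    # Pairwise squared Euclidean distances: ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(x * x, axis=1)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour

    # Column indices of the k nearest neighbours of every sample
    neighbours = np.argsort(dist, axis=1)[:, :k]

    adj = np.zeros((n, n), dtype=np.int64)
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbours.ravel()] = 1

    # Symmetrise: keep an edge if either endpoint selected the other
    return np.maximum(adj, adj.T)


# Toy input: two well-separated pairs of points
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
adj = knn_adjacency(points, k=1)
print(adj)
```

The dense distance matrix needs O(n^2) memory, which is acceptable for datasets of around 10,000 samples but may call for a batched or approximate neighbour search on larger ones.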
Quick Start
- Step1: Download all datasets from [Google Drive|Baidu Netdisk]. Alternatively, download individual datasets from the URLs in the tables below (Google Drive).
- Step2: Unzip them to ./dataset/
- Step3: Run ./dataset/utils.py
Two functions load_graph_data and load_data are provided in ./dataset/utils.py to load graph datasets and non-graph datasets, respectively.
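Step2 can also be scripted. The sketch below is only an illustration under stated assumptions: the helper name `unzip_all` and the idea of a separate download folder are hypothetical and are not part of ./dataset/utils.py. Whether an archive unpacks into its own subfolder depends on how it was packed.

```python
import zipfile
from pathlib import Path


def unzip_all(download_dir, target_dir="./dataset"):
    """Extract every .zip archive found in download_dir into target_dir.

    Hypothetical helper; returns the names of the archives it extracted.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(Path(download_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)  # keeps the archive's internal folder layout
        extracted.append(archive.name)
    return extracted
```

For example, if the archives were saved to a folder named ./downloads (an assumed location), `unzip_all("./downloads")` would unpack them into ./dataset/.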
Datasets Details
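- Graph Datasets

| Dataset | Samples | Dimension | Edges | Classes | URL |
| --- | --- | --- | --- | --- | --- |
| DBLP | 4057 | 334 | 3528 | 4 | dblp.zip |
| CITE | 3327 | 3703 | 4552 | 6 | cite.zip |
| ACM | 3025 | 1870 | 13128 | 3 | acm.zip |
| AMAP | 7650 | 745 | 119081 | 8 | amap.zip |
| AMAC | 13752 | 767 | 245861 | 10 | amac.zip |
| PUBMED | 19717 | 500 | 44325 | 3 | pubmed.zip |
| CORAFULL | 19793 | 8710 | 63421 | 70 | corafull.zip |
| CORA | 2708 | 1433 | 6632 | 7 | cora.zip |
| CITESEER | 3327 | 3703 | 6215 | 6 | citeseer.zip |

- Non-graph Datasets

| Dataset | Samples | Dimension | Type | Classes | URL |
| --- | --- | --- | --- | --- | --- |
| USPS | 9298 | 256 | Image | 10 | usps.zip |
| HHAR | 10299 | 561 | Record | 6 | hhar.zip |
| REUT | 10000 | 2000 | Text | 4 | reut.zip |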
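After loading a dataset, it can be useful to confirm that the arrays match the statistics listed above. The following is a small sketch of such a check; `EXPECTED` and `check_dataset` are hypothetical names introduced here, and the function assumes features are given as a sequence of rows (for example a nested list or a NumPy array) with one integer label per sample.

```python
# (samples, dimension, classes), copied from the dataset tables above
EXPECTED = {
    "DBLP": (4057, 334, 4),
    "CITE": (3327, 3703, 6),
    "ACM": (3025, 1870, 3),
    "AMAP": (7650, 745, 8),
    "AMAC": (13752, 767, 10),
    "PUBMED": (19717, 500, 3),
    "CORAFULL": (19793, 8710, 70),
    "CORA": (2708, 1433, 7),
    "CITESEER": (3327, 3703, 6),
    "USPS": (9298, 256, 10),
    "HHAR": (10299, 561, 6),
    "REUT": (10000, 2000, 4),
}


def check_dataset(name, features, labels):
    """Return a list of mismatches against the table (empty if all match)."""
    samples, dimension, classes = EXPECTED[name.upper()]
    problems = []

    if len(features) != samples:
        problems.append(f"samples: expected {samples}, got {len(features)}")
    if len(features) > 0 and len(features[0]) != dimension:
        problems.append(
            f"dimension: expected {dimension}, got {len(features[0])}"
        )
    if len(labels) != len(features):
        problems.append("labels and features differ in length")

    # Count distinct label values; int() also accepts NumPy integer scalars
    found = len({int(y) for y in labels})
    if found != classes:
        problems.append(f"classes: expected {classes}, got {found}")
    return problems
```

An empty return value means the sample count, feature dimension, and number of classes agree with the table. Edge counts are not checked, because the result would depend on whether edges are counted once or in both directions.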
If you find this repository useful for your research or work, we would really appreciate it if you starred it.