Nebula LiveJournal: Import LiveJournal Dataset into Nebula Graph and Run Nebula Algorithm
![Import LiveJournal Dataset into Nebula Graph and Run Nebula Algorithm](/en/nebula-livejournal/featured-image.webp)
A walkthrough of importing the LiveJournal dataset into the Nebula Graph database and running graph algorithms on it with Nebula Algorithm.
Related GitHub Repo: https://github.com/wey-gu/nebula-LiveJournal
nebula-LiveJournal
The LiveJournal dataset is a social-network dataset shipped as a single file with two columns (FromNodeId, ToNodeId).
It can be accessed at https://snap.stanford.edu/data/soc-LiveJournal1.html.
Dataset statistics | Value |
---|---|
Nodes | 4847571 |
Edges | 68993773 |
Nodes in largest WCC | 4843953 (0.999) |
Edges in largest WCC | 68983820 (1.000) |
Nodes in largest SCC | 3828682 (0.790) |
Edges in largest SCC | 65825429 (0.954) |
Average clustering coefficient | 0.2742 |
Number of triangles | 285730264 |
Fraction of closed triangles | 0.04266 |
Diameter (longest shortest path) | 16 |
90-percentile effective diameter | 6.5 |
1 Dataset Download and Preprocessing
1.1 Download
It is accessible from the official web page:
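A download sketch, assuming the gzipped file name linked from the SNAP dataset page:

```shell
# Fetch and decompress the edge list from SNAP (the file name is an
# assumption based on the dataset page).
wget https://snap.stanford.edu/data/soc-LiveJournal1.txt.gz
gunzip soc-LiveJournal1.txt.gz
```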
Comment lines in the data file should be removed so that the data import tool can parse it.
1.2 Preprocessing
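A minimal preprocessing sketch: SNAP files start with `#` comment lines, and the columns are tab-separated. One way to prepare the file is to drop the comments and convert tabs to commas (adjust to whatever delimiter your importer config expects). Shown on a tiny inline sample; run the same pipeline on soc-LiveJournal1.txt.

```shell
# Build a tiny sample in SNAP's format: comment header + tab-separated edges.
printf '# FromNodeId\tToNodeId\n0\t1\n0\t2\n1\t0\n' > sample.txt

# Drop '#' comment lines and turn tabs into commas.
sed '/^#/d' sample.txt | tr '\t' ',' > sample.csv

cat sample.csv
# 0,1
# 0,2
# 1,0
```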
2 Import dataset to Nebula Graph
2.1 With Nebula Importer
Nebula-Importer is a headless import tool for Nebula Graph, written in Golang.
You may need to edit the config file under nebula-importer/importer.yaml to set your Nebula Graph address and credentials.
Then, Nebula-Importer can be called in Docker as follows:
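A sketch of the Docker invocation; the image tag, mounted path, and network name are assumptions to adjust to your environment:

```shell
# Run Nebula-Importer in the same Docker network as the Nebula cluster.
# Image tag and paths are assumptions; match them to your setup.
docker run --rm -ti \
    --network=nebula-docker-compose_nebula-net \
    -v ${PWD}/nebula-importer:/root \
    vesoft/nebula-importer:v2 \
    --config /root/importer.yaml
```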
Or, if you have the nebula-importer binary locally:
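For example (assuming the binary is on your $PATH and the config path matches this repo's layout):

```shell
# Run the importer directly; it reads the cluster address and credentials
# from the YAML config edited above.
nebula-importer --config nebula-importer/importer.yaml
```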
2.2 With Nebula Exchange
Nebula-Exchange is a Spark application that enables batch and streaming data import from multiple data sources into Nebula Graph.
To be done. (You can refer to https://siwei.io/nebula-exchange-sst-2.x/)
3 Run Algorithms with Nebula Graph
Nebula-Algorithm is a Spark/GraphX application that runs graph algorithms on data consumed from files or from a Nebula Graph cluster.
Algorithms supported so far:
Name | Use Case |
---|---|
PageRank | page ranking, important node mining |
Louvain | community mining, hierarchical clustering |
KCore | community detection, financial risk control |
LabelPropagation | community detection, information propagation, advertising recommendation |
ConnectedComponent | community detection, isolated-island detection |
StronglyConnectedComponent | community detection |
ShortestPath | path planning, network planning |
TriangleCount | network structure analysis |
BetweennessCentrality | important node mining, node influence calculation |
DegreeStatic | graph structure analysis |
3.1 Ad-hoc Spark Environment Setup
Here I assume Nebula Graph was bootstrapped with Nebula-Up, so Nebula is running in a Docker network named nebula-docker-compose_nebula-net.
Then let's start a single-node Spark server:
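One way to do this is with a public Spark 2.4 image (nebula-algorithm targets GraphX-era Spark); the image name, container name, and network below are assumptions:

```shell
# Start a one-off Spark container attached to Nebula's Docker network,
# mounting the working directory so jars and conf files are visible inside.
docker run --name spark-master --rm -ti \
    --network nebula-docker-compose_nebula-net \
    -v ${PWD}:/root \
    -w /root \
    bde2020/spark-master:2.4.5-hadoop2.7 \
    bash
```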
This lets us submit Spark applications from inside that container:
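From another terminal, you can check that spark-submit is reachable inside the container (the /spark/bin path is an assumption about the image layout):

```shell
# Sanity check: print the Spark version from inside the running container.
docker exec -it spark-master /spark/bin/spark-submit --version
```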
3.2 Run Algorithms
Nebula-Algorithm supports many algorithms; example configuration files for some of them are provided under nebula-algorithm.
Before using them, first edit the Nebula Graph cluster addresses and credentials in those files.
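The fields that usually need editing look roughly like this hypothetical excerpt (key names follow the nebula-algorithm 2.x application.conf layout; the space and edge names are placeholders):

```
nebula: {
  read: {
    # Nebula metad address(es) reachable from the Spark container
    metaAddress: "metad0:9559"
    space: "livejournal"
    labels: ["follow"]
  }
}
```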
Then we can enter the Spark container and call the corresponding algorithm as follows. Please adjust --driver-memory accordingly. For example, for the PageRank algorithm:
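A submission sketch; the jar file name and conf path are assumptions to match against your nebula-algorithm build:

```shell
# Submit the PageRank job to a local-mode Spark master.
/spark/bin/spark-submit \
    --master "local" \
    --driver-memory 16g \
    --class com.vesoft.nebula.algorithm.Main \
    nebula-algorithm-2.0.0.jar \
    -p pagerank.conf
```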
After the algorithm finishes, the output will be under the path inside the container defined in the conf file:
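For instance, assuming the conf set the output path to /output, the result part files can be listed from the host:

```shell
# List the algorithm's CSV part files written inside the Spark container.
docker exec spark-master ls /output
```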
Featured image credit: @sigmund