Best way to get (millions of rows of) data into JanusGraph via TinkerPop, with a specific model
I've just started out with TinkerPop and JanusGraph, and I'm trying to figure this out based on the documentation.
- I have three datasets, each containing about 20 million rows (CSV files).
- There is a specific model in which the variables and rows need to be connected, i.e. what are vertices, what are labels, what are edges, etc.
- Once everything is in a graph, I'd of course like to use some basic Gremlin to see how well the model works.
But first I need a way to get the data into JanusGraph.
Possibly there exist scripts for this.
Otherwise, is it something to be written in Python, i.e. open a CSV file, get each row of a variable X, and add it as a vertex/edge/etc.?
Or am I completely misinterpreting JanusGraph/TinkerPop?
Thanks in advance for any help.
EDIT:
Say I have a few files, each containing a few million rows representing people, and several variables representing different metrics. A first example could look like this:
         metric_1 metric_2 metric_3 ..
person_1 a        e        i
person_2 b        f        j
person_3 c        g        k
person_4 d        h        l
..
Should I translate this into files with nodes that are at first made up of just the values [a, ..., l] (and later perhaps more elaborate sets of properties)?
And are [a, ..., l] then indexed?
The 'Modern' graph here seems to have an index (numbers 1, ..., 12 for all the nodes and edges, independent of their overlapping label/category); e.g. should each measurement be indexed separately and then linked to the person_x to which it belongs?
Apologies for these probably straightforward questions, but I'm fairly new to this.
python gremlin tinkerpop tinkerpop3 janusgraph
Does each dataset map to a different graph? Have you already configured a storage backend?
– Benoit Guigal
Nov 14 '18 at 14:24
In this case there are several datasets (CSV files) that should become one graph. (In another case I will use only one dataset.) For the storage backend: I've downloaded ScyllaDB and performed steps 1 & 2 of scylladb.com/download/debian9 -> since I only want to use this on my desktop, not in a cluster (yet), I have not done step 3. Should I?
– nikolai
Nov 14 '18 at 16:13
Ok great. For testing purposes, though, I would recommend using the script bin/janusgraph.sh, which will start Cassandra, Elasticsearch and a Gremlin Server. You will then be free in the future to tune which storage backend you want to use.
– Benoit Guigal
Nov 14 '18 at 16:56
Thanks, I'll download Cassandra and do as stated here: docs.janusgraph.org/latest/cassandra.html. But do I need to use/download Elasticsearch as well? Also, does this not interfere with ScyllaDB?
– nikolai
Nov 14 '18 at 17:25
If you use the script janusgraph.sh you do not have to download anything; Cassandra and Elasticsearch are packaged with JanusGraph. You do, however, have to stop ScyllaDB to avoid conflicting port bindings.
– Benoit Guigal
Nov 14 '18 at 18:01
asked Nov 13 '18 at 20:02 by nikolai, edited Nov 14 '18 at 18:35
2 Answers
JanusGraph uses pluggable storage backends and indexes. For testing purposes, a script called bin/janusgraph.sh is packaged with the distribution. It lets you get up and running quickly by starting Cassandra and Elasticsearch (it also starts a Gremlin Server, but we won't use it):
cd /path/to/janus
bin/janusgraph.sh start
Then I would recommend loading your data using a Groovy script. Groovy scripts can be executed with the Gremlin Console:
bin/gremlin.sh -e scripts/load_data.script
An efficient way to load the data is to split it into two files:
- nodes.csv: one line per node, with all of its attributes
- links.csv: one line per link, with source_id, target_id and all of the link's attributes
This might require some data preparation steps.
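As a rough illustration of that preparation step, here is a minimal Python sketch that splits a person/metric file like the one in the question into nodes.csv and links.csv. The file layout, the column names, and the choice to model persons and measurements as separate nodes linked by a has_measurement edge are assumptions for the example, not something prescribed by JanusGraph:

```python
import csv

def prepare(input_path, nodes_path, links_path):
    """Split one person/metric CSV into nodes.csv and links.csv.

    Assumes the input has a header row with a 'person' column followed
    by one column per metric (an assumed layout for this sketch).
    """
    with open(input_path, newline="") as src, \
         open(nodes_path, "w", newline="") as nodes, \
         open(links_path, "w", newline="") as links:
        reader = csv.DictReader(src)
        node_writer = csv.writer(nodes)
        link_writer = csv.writer(links)
        node_writer.writerow(["id", "label", "value"])
        link_writer.writerow(["source_id", "target_id", "label"])
        for row in reader:
            person_id = row.pop("person")
            # One node per person, with no value of its own.
            node_writer.writerow([person_id, "person", ""])
            for metric, value in row.items():
                # One node per measurement, linked back to its person.
                m_id = f"{person_id}_{metric}"
                node_writer.writerow([m_id, "measurement", value])
                link_writer.writerow([person_id, m_id, "has_measurement"])
```

If you instead decide to model each person as a single vertex with the metrics as properties, nodes.csv simply keeps one row per person and links.csv may not be needed at all.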
Here is an example script
The trick to speed up the process is to keep a mapping between your own ids and the ids created by JanusGraph during the creation of the nodes.
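That bookkeeping can be sketched as follows. This is plain Python for illustration; create_vertex and create_edge are hypothetical stand-ins for the real Gremlin calls, and in a Groovy load script the mapping would be an ordinary Map kept the same way:

```python
def load(nodes, links, create_vertex, create_edge):
    """Load nodes, remembering each graph-assigned id, then load links.

    nodes: iterable of (own_id, attributes)
    links: iterable of (source_id, target_id), using our own ids
    """
    id_map = {}  # our id -> id assigned by the graph at creation time
    for own_id, attrs in nodes:
        id_map[own_id] = create_vertex(attrs)
    for source_id, target_id in links:
        # One in-memory dictionary lookup per endpoint, instead of an
        # index query against the graph for every edge.
        create_edge(id_map[source_id], id_map[target_id])
```

The point is that resolving link endpoints becomes a dictionary lookup rather than a graph lookup, which matters a great deal when you have millions of edges.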
Even if it is not mandatory, I strongly recommend creating an explicit schema for your graph before loading any data. Here is an example script
answered Nov 14 '18 at 17:27 by Benoit Guigal
Thanks, this is very helpful! I've updated my question to specify one bit, as I'm also not sure what exactly is meant by 'id'.
– nikolai
Nov 14 '18 at 18:32
You're welcome. I am not sure I understand your graph structure. Are the persons in your files the vertices of the graph? What are the links between the persons? I recommend reading this presentation (slides 5 to 10) slideshare.net/ptgoetz/… about JanusGraph and graph structure.
– Benoit Guigal
Nov 15 '18 at 8:40
Well, the truth is that bulk loading real user data into JanusGraph is a real pain. I've been using JanusGraph since its very first version, about 2 years ago, and it's still a pain to bulk load data. A lot of that is not necessarily down to JanusGraph itself: different users have very different data, different formats, and different graph models (some mostly need one vertex with one edge, e.g. child-mother; others deal with one vertex with many edges, e.g. user-followers). And last but definitely not least, the very nature of the tool means dealing with large data sets, not to mention that the underlying storage and index databases mostly come preconfigured to replicate heavily (i.e. you might be thinking 20m rows, but you actually end up inserting 60m or 80m entries).
All said, I've had moderate success bulk loading some tens of millions of rows in decent timeframes (again, it will be painful, but here are the general steps):
- Provide IDs when creating graph elements. If importing from e.g. MySQL, think of perhaps combining the table name with the id value to create unique IDs, e.g. users1, tweets2.
- Don't specify a schema up front. Otherwise JanusGraph will need to ensure the data conforms on each insert.
- Don't specify indexes up front. This is related to the above but really deserves its own entry: bulk insert first, index later.
- Please, please, please be aware of the underlying database's bulk insert features and activate them, i.e. read up on the Cassandra, ScyllaDB and Big Table docs, especially on replication and indexing.
- After all of the above, configure JanusGraph for bulk loading, ensure your data integrity is correct (i.e. no duplicate ids), and consider some form of parallelizing the insert requests, e.g. some kind of map-reduce system.
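The first step above, deterministic unique ids from the table name plus the row's primary key, can be sketched in a few lines of Python (the table names and row ids here are made up for illustration):

```python
def element_id(table, row_id):
    """Combine the source table name with the row's primary key so that
    numerically colliding ids from different tables stay distinct."""
    return f"{table}{row_id}"

# users row 1 and tweets row 1 no longer clash:
ids = {element_id(t, i) for t in ("users", "tweets") for i in (1, 2)}
```

The same idea works with any separator or hashing scheme, as long as the result is stable across runs so that re-running the import maps each source row to the same graph element.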
I think I've covered the major points. Again, there's no silver bullet here, and the process normally involves quite a bit of trial and error, for example with the bulk insert rate: too low is bad (e.g. 10 per second) while too high is equally bad (e.g. 10k per second), and it almost always depends on your data, so it's case by case; I can't recommend where you should start.
All said and done, give it a real go. Bulk loading is the hardest part in my opinion, and the struggles are well worth the new dimension it gives your application.
All the best!
1
Thanks, this is very helpful! I've updated my question to specify one bit, as I'm also not sure what is exactly meant by 'Id'.
– nikolai
Nov 14 '18 at 18:32
You're welcome, I am not sure to understand your graph structure. Are the persons in your files the vertices of the graph ? What are the links between the persons ? I recommend reading this presentation (Slides 5 to 10) slideshare.net/ptgoetz/… about JanusGraph and graph structure.
– Benoit Guigal
Nov 15 '18 at 8:40
add a comment |
1
Thanks, this is very helpful! I've updated my question to specify one bit, as I'm also not sure what is exactly meant by 'Id'.
– nikolai
Nov 14 '18 at 18:32
You're welcome, I am not sure to understand your graph structure. Are the persons in your files the vertices of the graph ? What are the links between the persons ? I recommend reading this presentation (Slides 5 to 10) slideshare.net/ptgoetz/… about JanusGraph and graph structure.
– Benoit Guigal
Nov 15 '18 at 8:40
1
1
Thanks, this is very helpful! I've updated my question to specify one bit, as I'm also not sure what is exactly meant by 'Id'.
– nikolai
Nov 14 '18 at 18:32
Thanks, this is very helpful! I've updated my question to specify one bit, as I'm also not sure what is exactly meant by 'Id'.
– nikolai
Nov 14 '18 at 18:32
You're welcome, I am not sure to understand your graph structure. Are the persons in your files the vertices of the graph ? What are the links between the persons ? I recommend reading this presentation (Slides 5 to 10) slideshare.net/ptgoetz/… about JanusGraph and graph structure.
– Benoit Guigal
Nov 15 '18 at 8:40
You're welcome, I am not sure to understand your graph structure. Are the persons in your files the vertices of the graph ? What are the links between the persons ? I recommend reading this presentation (Slides 5 to 10) slideshare.net/ptgoetz/… about JanusGraph and graph structure.
– Benoit Guigal
Nov 15 '18 at 8:40
add a comment |
Well, the truth is bulk loading of real user data into JanusGraph is a real pain. I've been using JanuGraph since it's very first version about 2 years ago and its still a pain to bulk load data. A lot of it is not necessarily down to JanusGraph because different users have very different data, different formats, different graph models (ie some mostly need one vertex with one edge ( ex. child-mother ) others deal with one vertex with many edges ( ex user followers ) ) and last but definitely not least, the very nature of the tool deals with large data sets, not to mention the underlying storage and index databases mostly come preconfigured to replicate massively (i.e you might be thinking 20m rows but you actually end up inserting 60m or 80m entries)
All said, I've had moderate success in bulk loading a some tens of millions in decent timeframes (again it will be painful but here are the general steps).
- Provide IDs when creating graph elements. If importing from eg MySQL think of perhaps combining the tablename with the id value to create unique IDs eg users1, tweets2
- Don't specify schema up front. This is because JanusGraph will need to ensure the data conforms on each inserting
- Don't specify index up front. Just related to above but really deserves its own entry. Bulk insert first index later
- Please, please, please, be aware of the underlying database features for bulk inserts and activate them i.e read up on Cassandra, ScyllaDB, Big Table, docs especially on replication and indexing
- After all the above, configure JanusGraph for bulk loading, ensure your data integrity is correct (i.e no duplicate ids) and consider some form of parallelizing insert request e.g some kind of map reduce system
I think I've covered the major points, again, there's no silver bullet here and the process normally involves quite some trial and error for example the bulk insert rates, too low is bad e.g 10 per second while too high is equally bad eg 10k per second and it almost always depends on your data so its a case by case basis, can't recommend where you should start.
All said and done, give it a real go, bulk load is the hardest part in my opinion and the struggles are well worth the new dimension it gives your application.
All the best!
add a comment |
Well, the truth is bulk loading of real user data into JanusGraph is a real pain. I've been using JanuGraph since it's very first version about 2 years ago and its still a pain to bulk load data. A lot of it is not necessarily down to JanusGraph because different users have very different data, different formats, different graph models (ie some mostly need one vertex with one edge ( ex. child-mother ) others deal with one vertex with many edges ( ex user followers ) ) and last but definitely not least, the very nature of the tool deals with large data sets, not to mention the underlying storage and index databases mostly come preconfigured to replicate massively (i.e you might be thinking 20m rows but you actually end up inserting 60m or 80m entries)
All said, I've had moderate success in bulk loading a some tens of millions in decent timeframes (again it will be painful but here are the general steps).
- Provide IDs when creating graph elements. If importing from eg MySQL think of perhaps combining the tablename with the id value to create unique IDs eg users1, tweets2
- Don't specify schema up front. This is because JanusGraph will need to ensure the data conforms on each inserting
- Don't specify index up front. Just related to above but really deserves its own entry. Bulk insert first index later
- Please, please, please, be aware of the underlying database features for bulk inserts and activate them i.e read up on Cassandra, ScyllaDB, Big Table, docs especially on replication and indexing
- After all the above, configure JanusGraph for bulk loading, ensure your data integrity is correct (i.e no duplicate ids) and consider some form of parallelizing insert request e.g some kind of map reduce system
I think I've covered the major points, again, there's no silver bullet here and the process normally involves quite some trial and error for example the bulk insert rates, too low is bad e.g 10 per second while too high is equally bad eg 10k per second and it almost always depends on your data so its a case by case basis, can't recommend where you should start.
All said and done, give it a real go, bulk load is the hardest part in my opinion and the struggles are well worth the new dimension it gives your application.
All the best!
add a comment |
Well, the truth is bulk loading of real user data into JanusGraph is a real pain. I've been using JanuGraph since it's very first version about 2 years ago and its still a pain to bulk load data. A lot of it is not necessarily down to JanusGraph because different users have very different data, different formats, different graph models (ie some mostly need one vertex with one edge ( ex. child-mother ) others deal with one vertex with many edges ( ex user followers ) ) and last but definitely not least, the very nature of the tool deals with large data sets, not to mention the underlying storage and index databases mostly come preconfigured to replicate massively (i.e you might be thinking 20m rows but you actually end up inserting 60m or 80m entries)
All said, I've had moderate success in bulk loading a some tens of millions in decent timeframes (again it will be painful but here are the general steps).
- Provide IDs when creating graph elements. If importing from e.g. MySQL, think of perhaps combining the table name with the id value to create unique IDs, e.g. users1, tweets2.
- Don't specify the schema up front, because otherwise JanusGraph will need to ensure the data conforms on every insert.
- Don't create indexes up front. This is related to the point above but really deserves its own entry: bulk insert first, index later.
- Please, please, please be aware of the underlying database's features for bulk inserts and activate them, i.e. read up on the Cassandra, ScyllaDB, or Bigtable docs, especially on replication and indexing.
- After all of the above, configure JanusGraph for bulk loading, ensure your data integrity is correct (i.e. no duplicate IDs), and consider some form of parallelizing the insert requests, e.g. some kind of map-reduce system.
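As a rough illustration of the first and last points above, here is a minimal Python sketch (the table name, batch size, and CSV layout are hypothetical) that derives unique element IDs from the source table name plus the row id, and groups CSV rows into fixed-size batches so each insert transaction stays small:

```python
import csv
import io
from itertools import islice

def make_id(table, row_id):
    """Combine the source table name with the row id, e.g. ('users', 1) -> 'users1'."""
    return f"{table}{row_id}"

def batches(rows, size):
    """Yield fixed-size lists of rows so each insert/commit stays small."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Hypothetical in-memory CSV of people; in practice you would open your real file.
data = io.StringIO("id,metric_1,metric_2\n1,a,e\n2,b,f\n3,c,g\n")
rows = list(csv.DictReader(data))

for batch in batches(rows, 2):
    for row in batch:
        vid = make_id("person", row["id"])  # 'person1', 'person2', ...
        # Here you would issue the actual write, e.g. via gremlinpython:
        # g.addV('person').property('uid', vid).property('metric_1', row['metric_1']),
        # committing once per batch rather than once per row.
        print(vid)
```

The actual graph writes (commented out above) depend on your driver and graph model; the point is only the ID scheme and the batching.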
I think I've covered the major points. Again, there's no silver bullet here, and the process normally involves quite a bit of trial and error. For example, the bulk insert rate matters: too low is bad (e.g. 10 per second), while too high is equally bad (e.g. 10k per second), and the right rate almost always depends on your data, so it's a case-by-case basis and I can't recommend where you should start.
All said and done, give it a real go. Bulk loading is the hardest part in my opinion, and the struggle is well worth the new dimension it gives your application.
All the best!
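For completeness, the "configure JanusGraph for bulk loading" step above typically means setting a few properties in the graph configuration file. The values below are illustrative only; check them against your JanusGraph version's configuration reference before use:

```
# janusgraph-bulk.properties (illustrative sketch)
storage.backend=cql
storage.hostname=127.0.0.1
storage.batch-loading=true     # disables consistency checks and locking during load
schema.default=none            # optional: fail fast instead of auto-creating schema
ids.block-size=1000000         # reserve larger ID blocks for write-heavy workloads
```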
answered Nov 16 '18 at 2:42
Don Omondi
736813
Does each dataset map to a different graph? Have you already configured a storage backend?
– Benoit Guigal
Nov 14 '18 at 14:24
In this case there are several datasets (csv files) that should become one graph. (In another case I will use only one dataset.) For the storage backend: I've downloaded ScyllaDB and performed steps 1 & 2 from scylladb.com/download/debian9. Since I only want to use this on my desktop, not in a cluster (yet), I have not done step 3. Should I?
– nikolai
Nov 14 '18 at 16:13
Ok great. For testing purposes though, I would recommend using the script bin/janusgraph.sh, which will start Cassandra, Elasticsearch, and a Gremlin Server. You will then be free in the future to tune which storage backend you want to use.
– Benoit Guigal
Nov 14 '18 at 16:56
Thanks, I'll download Cassandra and do as stated at docs.janusgraph.org/latest/cassandra.html, but do I need to download Elasticsearch as well? Also, does this not interfere with ScyllaDB?
– nikolai
Nov 14 '18 at 17:25
If you use the script janusgraph.sh you do not have to download anything; Cassandra and Elasticsearch are packaged with JanusGraph. You do, however, have to stop ScyllaDB to avoid a conflicting port binding.
– Benoit Guigal
Nov 14 '18 at 18:01
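For reference, the start/stop workflow discussed in the comments above looks roughly like this (paths assume the stock JanusGraph distribution layout; this is a usage sketch, not a full setup guide):

```shell
# From the root of the unpacked JanusGraph distribution:
bin/janusgraph.sh start    # starts packaged Cassandra, Elasticsearch, and Gremlin Server
bin/janusgraph.sh status   # check that all three components are up
bin/janusgraph.sh stop     # shut everything down again
```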