Query writing performance on neo4j with py2neo

I'm currently struggling to find a performant way to run multiple queries with py2neo. My problem is that I have a big list of write queries in Python that need to be written to Neo4j.

I have tried several ways to solve the issue. The best working approach so far was the following:

from py2neo import Graph

queries = ["create (n) return id(n)", "create (n) return id(n)", ...]  # list of queries
graph = Graph()
t = graph.begin(autocommit=False)
for idx, q in enumerate(queries, start=1):
    t.run(q)
    # Commit every 100 queries and start a fresh transaction.
    if idx % 100 == 0:
        t.commit()
        t = graph.begin(autocommit=False)
t.commit()  # commit whatever is left in the last batch

It still takes too long to write the queries. I also tried the run many procedure from APOC without success; the query never finished. I also tried the same writing method with auto-commit. Is there a better way to do this? Are there any tricks, like dropping indexes first and then adding them back after inserting the data?

-- Edit: Additional information:

I'm using Neo4j 3.4, Py2neo v4 and Python 3.7

edited Nov 15 at 9:35

asked Nov 12 at 11:54

Bierbarbar


Are you creating a lot of the same thing, i.e. CREATE (n:Person {name: 'Joe'}) RETURN id(n), repeated for 1000 different people? If you are not using indexes during creation, do you need them later for querying? If so, you might as well leave them in; if not, do you even need them? Are you initializing a graph from scratch? If so, there might be other tools available to start it up more quickly.
– jacob.mccrumb
Nov 13 at 19:32

I'm sending a big variety of query types to Neo4j, including CREATE for nodes and MERGE and CREATE for relationships. The queries can differ a lot. I need the indexes for querying later. No, unfortunately I'm not building the graph from scratch.
– Bierbarbar
Nov 14 at 6:45

Fair -- what version of py2neo? More specifically, are you using bolt to connect (available/default as of v4) or http? Are you using the id(n) returns or are they just there for debugging/etc? Can you track run times of all queries and see if any in particular are going slow? Sorry for all the questions, query tuning a ton of queries is a complex thing :)
– jacob.mccrumb
Nov 14 at 14:38

Oh yeah, sorry, I forgot. I'm using py2neo v4 and the newest version of Neo4j from Docker Hub. There are no returns; the queries I gave are basically placeholders, and I only have write queries where I don't care about the return value. I tracked run times: none of the queries is slow in particular; it is only the sheer number of queries being sent that causes problems.
– Bierbarbar
Nov 15 at 9:33

Sounds good -- see @InverseFalcon's answer below. For creating a lot of similar things, use UNWIND + query parameters: UNWIND $list AS item MERGE (n:Node {foo: item.bar}), and call it with a parameter list holding the props for each node you want to create. And make sure you are using bolt to connect, not http.
– jacob.mccrumb
Nov 15 at 22:04

python neo4j graph-databases py2neo

edited Nov 15 at 9:35

asked Nov 12 at 11:54

Bierbarbar

751420

edited Nov 15 at 9:35

asked Nov 12 at 11:54

Bierbarbar

751420

edited Nov 15 at 9:35

asked Nov 12 at 11:54

Bierbarbar

751420

asked Nov 12 at 11:54

Bierbarbar

751420

asked Nov 12 at 11:54

Bierbarbar

751420

Are you creating a lot of the same thing? i.e. CREATE (n:Person name: 'Joe') RETURN id(n) * 1000 different people? If you are not using indexes to create do you need them later for querying? If so might as well leave them in, if not then do you even need them? Are you initializing a graph from scratch, if so there might be some other tools available to more quickly start it up.
– jacob.mccrumb
Nov 13 at 19:32

I sending a big variety of queries to neo4j types including create for nodes, merge and create for relations. The queries can differ a lot. I need the indexes for querying later. No unfortunately I didn't build up the graph from scratch.
– Bierbarbar
Nov 14 at 6:45

Fair -- what version of py2neo? More specifically, are you using bolt to connect (available/default as of v4) or http? Are you using the id(n) returns or are they just there for debugging/etc? Can you track run times of all queries and see if any in particular are going slow? Sorry for all the questions, query tuning a ton of queries is a complex thing :)
– jacob.mccrumb
Nov 14 at 14:38

1

Oh yeah sorry i forgot. I'm using py2neo v4 and the newest version of neo4j from docker hub. There a no returns, this queries that i gave a basically placeholders but i just have write queries, where i don't care about the return. I tracked run times - one of the queries is slow in particular it is only about sending so much queries that causes problems.
– Bierbarbar
Nov 15 at 9:33

1

Sounds good -- see @InverseFalcon 's answer below. For creating a lot of similar things use UNWIND + query parameters: UNWIND $list AS item MERGE (n:Node foo: item.bar) and call it with a paremeter list with the props for each node you want to create. And make sure you are using the bolt to connect, not http.
– jacob.mccrumb
Nov 15 at 22:04

add a comment |

Are you creating a lot of the same thing? i.e. CREATE (n:Person name: 'Joe') RETURN id(n) * 1000 different people? If you are not using indexes to create do you need them later for querying? If so might as well leave them in, if not then do you even need them? Are you initializing a graph from scratch, if so there might be some other tools available to more quickly start it up.
– jacob.mccrumb
Nov 13 at 19:32

I sending a big variety of queries to neo4j types including create for nodes, merge and create for relations. The queries can differ a lot. I need the indexes for querying later. No unfortunately I didn't build up the graph from scratch.
– Bierbarbar
Nov 14 at 6:45

Fair -- what version of py2neo? More specifically, are you using bolt to connect (available/default as of v4) or http? Are you using the id(n) returns or are they just there for debugging/etc? Can you track run times of all queries and see if any in particular are going slow? Sorry for all the questions, query tuning a ton of queries is a complex thing :)
– jacob.mccrumb
Nov 14 at 14:38

1

Oh yeah sorry i forgot. I'm using py2neo v4 and the newest version of neo4j from docker hub. There a no returns, this queries that i gave a basically placeholders but i just have write queries, where i don't care about the return. I tracked run times - one of the queries is slow in particular it is only about sending so much queries that causes problems.
– Bierbarbar
Nov 15 at 9:33

1

Sounds good -- see @InverseFalcon 's answer below. For creating a lot of similar things use UNWIND + query parameters: UNWIND $list AS item MERGE (n:Node foo: item.bar) and call it with a paremeter list with the props for each node you want to create. And make sure you are using the bolt to connect, not http.
– jacob.mccrumb
Nov 15 at 22:04

Are you creating a lot of the same thing? i.e. CREATE (n:Person name: 'Joe') RETURN id(n) * 1000 different people? If you are not using indexes to create do you need them later for querying? If so might as well leave them in, if not then do you even need them? Are you initializing a graph from scratch, if so there might be some other tools available to more quickly start it up.
– jacob.mccrumb
Nov 13 at 19:32

I sending a big variety of queries to neo4j types including create for nodes, merge and create for relations. The queries can differ a lot. I need the indexes for querying later. No unfortunately I didn't build up the graph from scratch.
– Bierbarbar
Nov 14 at 6:45

Fair -- what version of py2neo? More specifically, are you using bolt to connect (available/default as of v4) or http? Are you using the id(n) returns or are they just there for debugging/etc? Can you track run times of all queries and see if any in particular are going slow? Sorry for all the questions, query tuning a ton of queries is a complex thing :)
– jacob.mccrumb
Nov 14 at 14:38

Oh yeah sorry i forgot. I'm using py2neo v4 and the newest version of neo4j from docker hub. There a no returns, this queries that i gave a basically placeholders but i just have write queries, where i don't care about the return. I tracked run times - one of the queries is slow in particular it is only about sending so much queries that causes problems.
– Bierbarbar
Nov 15 at 9:33

Sounds good -- see @InverseFalcon 's answer below. For creating a lot of similar things use UNWIND + query parameters: UNWIND $list AS item MERGE (n:Node foo: item.bar) and call it with a paremeter list with the props for each node you want to create. And make sure you are using the bolt to connect, not http.
– jacob.mccrumb
Nov 15 at 22:04

add a comment |

1 Answer
You may want to read up on Michael Hunger's tips and tricks for fast batched updates.

The key trick is using UNWIND to transform list elements into rows, and then subsequent operations are performed per row.

There are supporting functions that can easily create lists for you, like range().

As an example, if you wanted to create 10k nodes and add a name property, then return the node name and its graph id, you could do something like this:

UNWIND range(1, 10000) AS index
CREATE (n:Node {name: 'Node ' + index})
RETURN n.name AS name, id(n) AS id

Likewise, if you have a good amount of data to import, you can build a list of parameter maps, pass it to the query as a parameter, and then UNWIND the list to operate on all entries in a single call, similar to how we process CSV files with LOAD CSV.
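
From the Python side, the parameter-map approach could look roughly like the sketch below. It is only an illustration under some assumptions: the label Node, the property name, the parameter name rows, the helper names chunked and write_in_batches, and the batch size of 1000 are all made up for this example and are not taken from the question. The graph argument is expected to be anything with a run(cypher, parameters) method, which is how py2neo's Graph.run is called.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical label and property names, chosen only for illustration.
CREATE_NODES = """
UNWIND $rows AS row
CREATE (n:Node {name: row.name})
"""


def write_in_batches(graph, cypher, rows, batch_size=1000):
    """Run `cypher` once per batch, passing the batch as the $rows parameter.

    `graph` is anything with a run(cypher, parameters) method, such as a
    py2neo Graph. Returns the number of calls (round trips) made.
    """
    calls = 0
    for batch in chunked(rows, batch_size):
        # One statement per batch instead of one statement per node.
        graph.run(cypher, {"rows": batch})
        calls += 1
    return calls
```

With py2neo this would be called as write_in_batches(Graph(), CREATE_NODES, rows), where rows is a list of dicts such as {"name": "Node 1"}. When the write queries differ a lot, one option is to group the rows by query shape and keep one parameterised statement per shape. The batch size is a starting point to tune, not a recommendation.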

answered Nov 15 at 10:18

InverseFalcon


Thanks for your answer. I will try it next week, and if it works I will mark it as accepted.
– Bierbarbar
Nov 16 at 7:46
