Merge Dataframes on basis of coordinates having no common columns

INPUT :

df1

Pg x0 y0 x1 y1 Text
1 521.3 745.92 537.348 754.097 word1
1 538.982 745.92 580.247 754.097 word2
1 527.978 735.253 572.996 747.727 word3
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5

df2

Pg x0 y0 x1 y1 Text T R C
1 507.6 730.8 593.76 754.8 word1 word2 word3 1 1 2
2 334.56 732.36 401.34 746.636 word5 2 3 1

Expected OUTPUT :

Pg x0 y0 x1 y1 Text T R C
1 521.3 745.92 537.348 754.097 word1 1 1 2
1 538.982 745.92 580.247 754.097 word2 1 1 2
1 527.978 735.253 572.996 747.727 word3 1 1 2
2 268.985 732.36 341.59 746.636 word4 
2 344.443 732.36 390.175 746.636 word5 2 3 1

I need to find which all words in df1 are present in df2 on basis of coordinates(overlap) and not Text based approach. After this I need to copy the values of columns [T, R, C] from df2 to df1.

For eg : First row of df2 has coordinates that overlap the coordinates of the word1, word2, word3 of df1. Overlap here means the bbox(x0, y0, x1, y1) of a row in df1 should lie inside the bbox(x0, y0, x1, y1) of a specific row of df2.

My Approach :

I am iterating each row in df2 and then comparing each row coordinate from df1 to find any overlaps and then merging the dataframes.

for i, r in df2.iterrows():
 df1.loc[
 (df1.x0 >= r.x0) &
 (df1.y0 >= r.y0) &
 (df1.x1 <= r.x1) &
 (df1.y1 <= r.y1) , 'flag'] = 1

 df1.loc[df.flag == 1, ['T', 'R', 'C']] = r.T, r.R, r.C

Problem is the whole process is working properly as expected but takes a lot of time to run. It takes around 90 seconds to run df1 = 20,000 rows and df2 = 3500 rows.

edited Nov 13 '18 at 10:08

asked Nov 12 '18 at 10:40

Mahendra Singh

405

Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?

– Michael
Nov 12 '18 at 15:03

add a comment |

INPUT :

df1

Pg x0 y0 x1 y1 Text
1 521.3 745.92 537.348 754.097 word1
1 538.982 745.92 580.247 754.097 word2
1 527.978 735.253 572.996 747.727 word3
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5

df2

Pg x0 y0 x1 y1 Text T R C
1 507.6 730.8 593.76 754.8 word1 word2 word3 1 1 2
2 334.56 732.36 401.34 746.636 word5 2 3 1

Expected OUTPUT :

Pg x0 y0 x1 y1 Text T R C
1 521.3 745.92 537.348 754.097 word1 1 1 2
1 538.982 745.92 580.247 754.097 word2 1 1 2
1 527.978 735.253 572.996 747.727 word3 1 1 2
2 268.985 732.36 341.59 746.636 word4 
2 344.443 732.36 390.175 746.636 word5 2 3 1

I need to find which all words in df1 are present in df2 on basis of coordinates(overlap) and not Text based approach. After this I need to copy the values of columns [T, R, C] from df2 to df1.

My Approach :

I am iterating each row in df2 and then comparing each row coordinate from df1 to find any overlaps and then merging the dataframes.

for i, r in df2.iterrows():
 df1.loc[
 (df1.x0 >= r.x0) &
 (df1.y0 >= r.y0) &
 (df1.x1 <= r.x1) &
 (df1.y1 <= r.y1) , 'flag'] = 1

 df1.loc[df.flag == 1, ['T', 'R', 'C']] = r.T, r.R, r.C

Problem is the whole process is working properly as expected but takes a lot of time to run. It takes around 90 seconds to run df1 = 20,000 rows and df2 = 3500 rows.

edited Nov 13 '18 at 10:08

asked Nov 12 '18 at 10:40

Mahendra Singh

405

Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?

– Michael
Nov 12 '18 at 15:03

add a comment |

INPUT :

df1

Pg x0 y0 x1 y1 Text
1 521.3 745.92 537.348 754.097 word1
1 538.982 745.92 580.247 754.097 word2
1 527.978 735.253 572.996 747.727 word3
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5

df2

Pg x0 y0 x1 y1 Text T R C
1 507.6 730.8 593.76 754.8 word1 word2 word3 1 1 2
2 334.56 732.36 401.34 746.636 word5 2 3 1

Expected OUTPUT :

Pg x0 y0 x1 y1 Text T R C
1 521.3 745.92 537.348 754.097 word1 1 1 2
1 538.982 745.92 580.247 754.097 word2 1 1 2
1 527.978 735.253 572.996 747.727 word3 1 1 2
2 268.985 732.36 341.59 746.636 word4 
2 344.443 732.36 390.175 746.636 word5 2 3 1

I need to find which all words in df1 are present in df2 on basis of coordinates(overlap) and not Text based approach. After this I need to copy the values of columns [T, R, C] from df2 to df1.

My Approach :

I am iterating each row in df2 and then comparing each row coordinate from df1 to find any overlaps and then merging the dataframes.

for i, r in df2.iterrows():
 df1.loc[
 (df1.x0 >= r.x0) &
 (df1.y0 >= r.y0) &
 (df1.x1 <= r.x1) &
 (df1.y1 <= r.y1) , 'flag'] = 1

 df1.loc[df.flag == 1, ['T', 'R', 'C']] = r.T, r.R, r.C

Problem is the whole process is working properly as expected but takes a lot of time to run. It takes around 90 seconds to run df1 = 20,000 rows and df2 = 3500 rows.

edited Nov 13 '18 at 10:08

asked Nov 12 '18 at 10:40

Mahendra Singh

405

INPUT :

df1

Pg x0 y0 x1 y1 Text
1 521.3 745.92 537.348 754.097 word1
1 538.982 745.92 580.247 754.097 word2
1 527.978 735.253 572.996 747.727 word3
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5

df2

Pg x0 y0 x1 y1 Text T R C
1 507.6 730.8 593.76 754.8 word1 word2 word3 1 1 2
2 334.56 732.36 401.34 746.636 word5 2 3 1

Expected OUTPUT :

Pg x0 y0 x1 y1 Text T R C
1 521.3 745.92 537.348 754.097 word1 1 1 2
1 538.982 745.92 580.247 754.097 word2 1 1 2
1 527.978 735.253 572.996 747.727 word3 1 1 2
2 268.985 732.36 341.59 746.636 word4 
2 344.443 732.36 390.175 746.636 word5 2 3 1

I need to find which all words in df1 are present in df2 on basis of coordinates(overlap) and not Text based approach. After this I need to copy the values of columns [T, R, C] from df2 to df1.

My Approach :

I am iterating each row in df2 and then comparing each row coordinate from df1 to find any overlaps and then merging the dataframes.

for i, r in df2.iterrows():
 df1.loc[
 (df1.x0 >= r.x0) &
 (df1.y0 >= r.y0) &
 (df1.x1 <= r.x1) &
 (df1.y1 <= r.y1) , 'flag'] = 1

 df1.loc[df.flag == 1, ['T', 'R', 'C']] = r.T, r.R, r.C

Problem is the whole process is working properly as expected but takes a lot of time to run. It takes around 90 seconds to run df1 = 20,000 rows and df2 = 3500 rows.

python pandas dataframe geometry

edited Nov 13 '18 at 10:08

asked Nov 12 '18 at 10:40

Mahendra Singh

405

edited Nov 13 '18 at 10:08

asked Nov 12 '18 at 10:40

Mahendra Singh

405

edited Nov 13 '18 at 10:08

asked Nov 12 '18 at 10:40

Mahendra Singh

405

asked Nov 12 '18 at 10:40

Mahendra Singh

405

asked Nov 12 '18 at 10:40

Mahendra Singh

405

Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?

– Michael
Nov 12 '18 at 15:03

add a comment |

Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?

– Michael
Nov 12 '18 at 15:03

Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?

– Michael
Nov 12 '18 at 15:03

add a comment |

1 Answer
1

active

oldest

votes

You can use apply and masking. Example:

def compare(row):
 mask = df2[
 (df2['x0'] <= row['x0']) &
 (df2['x1'] >= row['x1']) &
 (df2['y0'] <= row['y0']) &
 (df2['y1'] >= row['y1'])
 ]
 if mask.empty:
 return row
 row['T'] = mask['T'].tolist()[0]
 row['R'] = mask['R'].tolist()[0]
 row['C'] = mask['C'].tolist()[0]

return row

result = df1.apply(compare, axis=1)

answered Nov 12 '18 at 15:12

ievbu

364

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53260436%2fmerge-dataframes-on-basis-of-coordinates-having-no-common-columns%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

You can use apply and masking. Example:

def compare(row):
 mask = df2[
 (df2['x0'] <= row['x0']) &
 (df2['x1'] >= row['x1']) &
 (df2['y0'] <= row['y0']) &
 (df2['y1'] >= row['y1'])
 ]
 if mask.empty:
 return row
 row['T'] = mask['T'].tolist()[0]
 row['R'] = mask['R'].tolist()[0]
 row['C'] = mask['C'].tolist()[0]

return row

result = df1.apply(compare, axis=1)

answered Nov 12 '18 at 15:12

ievbu

364

add a comment |

You can use apply and masking. Example:

def compare(row):
 mask = df2[
 (df2['x0'] <= row['x0']) &
 (df2['x1'] >= row['x1']) &
 (df2['y0'] <= row['y0']) &
 (df2['y1'] >= row['y1'])
 ]
 if mask.empty:
 return row
 row['T'] = mask['T'].tolist()[0]
 row['R'] = mask['R'].tolist()[0]
 row['C'] = mask['C'].tolist()[0]

return row

result = df1.apply(compare, axis=1)

answered Nov 12 '18 at 15:12

ievbu

364

add a comment |

You can use apply and masking. Example:

def compare(row):
 mask = df2[
 (df2['x0'] <= row['x0']) &
 (df2['x1'] >= row['x1']) &
 (df2['y0'] <= row['y0']) &
 (df2['y1'] >= row['y1'])
 ]
 if mask.empty:
 return row
 row['T'] = mask['T'].tolist()[0]
 row['R'] = mask['R'].tolist()[0]
 row['C'] = mask['C'].tolist()[0]

return row

result = df1.apply(compare, axis=1)

answered Nov 12 '18 at 15:12

ievbu

364

You can use apply and masking. Example:

def compare(row):
 mask = df2[
 (df2['x0'] <= row['x0']) &
 (df2['x1'] >= row['x1']) &
 (df2['y0'] <= row['y0']) &
 (df2['y1'] >= row['y1'])
 ]
 if mask.empty:
 return row
 row['T'] = mask['T'].tolist()[0]
 row['R'] = mask['R'].tolist()[0]
 row['C'] = mask['C'].tolist()[0]

return row

result = df1.apply(compare, axis=1)

answered Nov 12 '18 at 15:12

ievbu

364

answered Nov 12 '18 at 15:12

ievbu

364

answered Nov 12 '18 at 15:12

ievbu

364

answered Nov 12 '18 at 15:12

ievbu

364

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Odtnhj