Merge Dataframes on basis of coordinates having no common columns










1















INPUT :



df1



Pg x0 y0 x1 y1 Text
1 521.3 745.92 537.348 754.097 word1
1 538.982 745.92 580.247 754.097 word2
1 527.978 735.253 572.996 747.727 word3
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5


df2



Pg x0 y0 x1 y1 Text T R C
1 507.6 730.8 593.76 754.8 word1 word2 word3 1 1 2
2 334.56 732.36 401.34 746.636 word5 2 3 1


Expected OUTPUT :



Pg x0 y0 x1 y1 Text T R C
1 521.3 745.92 537.348 754.097 word1 1 1 2
1 538.982 745.92 580.247 754.097 word2 1 1 2
1 527.978 735.253 572.996 747.727 word3 1 1 2
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5 2 3 1


I need to find which all words in df1 are present in df2 on basis of coordinates(overlap) and not Text based approach. After this I need to copy the values of columns [T, R, C] from df2 to df1.



For eg : First row of df2 has coordinates that overlap the coordinates of the word1, word2, word3 of df1. Overlap here means the bbox(x0, y0, x1, y1) of a row in df1 should lie inside the bbox(x0, y0, x1, y1) of a specific row of df2.



My Approach :



I am iterating each row in df2 and then comparing each row coordinate from df1 to find any overlaps and then merging the dataframes.



for i, r in df2.iterrows():
df1.loc[
(df1.x0 >= r.x0) &
(df1.y0 >= r.y0) &
(df1.x1 <= r.x1) &
(df1.y1 <= r.y1) , 'flag'] = 1

df1.loc[df.flag == 1, ['T', 'R', 'C']] = r.T, r.R, r.C


Problem is the whole process is working properly as expected but takes a lot of time to run. It takes around 90 seconds to run df1 = 20,000 rows and df2 = 3500 rows.










share|improve this question
























  • Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?

    – Michael
    Nov 12 '18 at 15:03















1















INPUT :



df1



Pg x0 y0 x1 y1 Text
1 521.3 745.92 537.348 754.097 word1
1 538.982 745.92 580.247 754.097 word2
1 527.978 735.253 572.996 747.727 word3
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5


df2



Pg x0 y0 x1 y1 Text T R C
1 507.6 730.8 593.76 754.8 word1 word2 word3 1 1 2
2 334.56 732.36 401.34 746.636 word5 2 3 1


Expected OUTPUT :



Pg x0 y0 x1 y1 Text T R C
1 521.3 745.92 537.348 754.097 word1 1 1 2
1 538.982 745.92 580.247 754.097 word2 1 1 2
1 527.978 735.253 572.996 747.727 word3 1 1 2
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5 2 3 1


I need to find which all words in df1 are present in df2 on basis of coordinates(overlap) and not Text based approach. After this I need to copy the values of columns [T, R, C] from df2 to df1.



For eg : First row of df2 has coordinates that overlap the coordinates of the word1, word2, word3 of df1. Overlap here means the bbox(x0, y0, x1, y1) of a row in df1 should lie inside the bbox(x0, y0, x1, y1) of a specific row of df2.



My Approach :



I am iterating each row in df2 and then comparing each row coordinate from df1 to find any overlaps and then merging the dataframes.



for i, r in df2.iterrows():
df1.loc[
(df1.x0 >= r.x0) &
(df1.y0 >= r.y0) &
(df1.x1 <= r.x1) &
(df1.y1 <= r.y1) , 'flag'] = 1

df1.loc[df.flag == 1, ['T', 'R', 'C']] = r.T, r.R, r.C


Problem is the whole process is working properly as expected but takes a lot of time to run. It takes around 90 seconds to run df1 = 20,000 rows and df2 = 3500 rows.










share|improve this question
























  • Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?

    – Michael
    Nov 12 '18 at 15:03













1












1








1








INPUT :



df1



Pg x0 y0 x1 y1 Text
1 521.3 745.92 537.348 754.097 word1
1 538.982 745.92 580.247 754.097 word2
1 527.978 735.253 572.996 747.727 word3
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5


df2



Pg x0 y0 x1 y1 Text T R C
1 507.6 730.8 593.76 754.8 word1 word2 word3 1 1 2
2 334.56 732.36 401.34 746.636 word5 2 3 1


Expected OUTPUT :



Pg x0 y0 x1 y1 Text T R C
1 521.3 745.92 537.348 754.097 word1 1 1 2
1 538.982 745.92 580.247 754.097 word2 1 1 2
1 527.978 735.253 572.996 747.727 word3 1 1 2
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5 2 3 1


I need to find which all words in df1 are present in df2 on basis of coordinates(overlap) and not Text based approach. After this I need to copy the values of columns [T, R, C] from df2 to df1.



For eg : First row of df2 has coordinates that overlap the coordinates of the word1, word2, word3 of df1. Overlap here means the bbox(x0, y0, x1, y1) of a row in df1 should lie inside the bbox(x0, y0, x1, y1) of a specific row of df2.



My Approach :



I am iterating each row in df2 and then comparing each row coordinate from df1 to find any overlaps and then merging the dataframes.



for i, r in df2.iterrows():
df1.loc[
(df1.x0 >= r.x0) &
(df1.y0 >= r.y0) &
(df1.x1 <= r.x1) &
(df1.y1 <= r.y1) , 'flag'] = 1

df1.loc[df.flag == 1, ['T', 'R', 'C']] = r.T, r.R, r.C


Problem is the whole process is working properly as expected but takes a lot of time to run. It takes around 90 seconds to run df1 = 20,000 rows and df2 = 3500 rows.










share|improve this question
















INPUT :



df1



Pg x0 y0 x1 y1 Text
1 521.3 745.92 537.348 754.097 word1
1 538.982 745.92 580.247 754.097 word2
1 527.978 735.253 572.996 747.727 word3
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5


df2



Pg x0 y0 x1 y1 Text T R C
1 507.6 730.8 593.76 754.8 word1 word2 word3 1 1 2
2 334.56 732.36 401.34 746.636 word5 2 3 1


Expected OUTPUT :



Pg x0 y0 x1 y1 Text T R C
1 521.3 745.92 537.348 754.097 word1 1 1 2
1 538.982 745.92 580.247 754.097 word2 1 1 2
1 527.978 735.253 572.996 747.727 word3 1 1 2
2 268.985 732.36 341.59 746.636 word4
2 344.443 732.36 390.175 746.636 word5 2 3 1


I need to find which all words in df1 are present in df2 on basis of coordinates(overlap) and not Text based approach. After this I need to copy the values of columns [T, R, C] from df2 to df1.



For eg : First row of df2 has coordinates that overlap the coordinates of the word1, word2, word3 of df1. Overlap here means the bbox(x0, y0, x1, y1) of a row in df1 should lie inside the bbox(x0, y0, x1, y1) of a specific row of df2.



My Approach :



I am iterating each row in df2 and then comparing each row coordinate from df1 to find any overlaps and then merging the dataframes.



for i, r in df2.iterrows():
df1.loc[
(df1.x0 >= r.x0) &
(df1.y0 >= r.y0) &
(df1.x1 <= r.x1) &
(df1.y1 <= r.y1) , 'flag'] = 1

df1.loc[df.flag == 1, ['T', 'R', 'C']] = r.T, r.R, r.C


Problem is the whole process is working properly as expected but takes a lot of time to run. It takes around 90 seconds to run df1 = 20,000 rows and df2 = 3500 rows.







python pandas dataframe geometry






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 13 '18 at 10:08







Mahendra Singh

















asked Nov 12 '18 at 10:40









Mahendra SinghMahendra Singh

405




405












  • Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?

    – Michael
    Nov 12 '18 at 15:03

















  • Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?

    – Michael
    Nov 12 '18 at 15:03
















Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?

– Michael
Nov 12 '18 at 15:03





Can you post your working code that merges the dataframes so we can make specific recommendations on how to improve it?

– Michael
Nov 12 '18 at 15:03












1 Answer
1






active

oldest

votes


















0














You can use apply and masking. Example:



def compare(row):
mask = df2[
(df2['x0'] <= row['x0']) &
(df2['x1'] >= row['x1']) &
(df2['y0'] <= row['y0']) &
(df2['y1'] >= row['y1'])
]
if mask.empty:
return row
row['T'] = mask['T'].tolist()[0]
row['R'] = mask['R'].tolist()[0]
row['C'] = mask['C'].tolist()[0]

return row

result = df1.apply(compare, axis=1)





share|improve this answer






















    Your Answer






    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "1"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53260436%2fmerge-dataframes-on-basis-of-coordinates-having-no-common-columns%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    You can use apply and masking. Example:



    def compare(row):
    mask = df2[
    (df2['x0'] <= row['x0']) &
    (df2['x1'] >= row['x1']) &
    (df2['y0'] <= row['y0']) &
    (df2['y1'] >= row['y1'])
    ]
    if mask.empty:
    return row
    row['T'] = mask['T'].tolist()[0]
    row['R'] = mask['R'].tolist()[0]
    row['C'] = mask['C'].tolist()[0]

    return row

    result = df1.apply(compare, axis=1)





    share|improve this answer



























      0














      You can use apply and masking. Example:



      def compare(row):
      mask = df2[
      (df2['x0'] <= row['x0']) &
      (df2['x1'] >= row['x1']) &
      (df2['y0'] <= row['y0']) &
      (df2['y1'] >= row['y1'])
      ]
      if mask.empty:
      return row
      row['T'] = mask['T'].tolist()[0]
      row['R'] = mask['R'].tolist()[0]
      row['C'] = mask['C'].tolist()[0]

      return row

      result = df1.apply(compare, axis=1)





      share|improve this answer

























        0












        0








        0







        You can use apply and masking. Example:



        def compare(row):
        mask = df2[
        (df2['x0'] <= row['x0']) &
        (df2['x1'] >= row['x1']) &
        (df2['y0'] <= row['y0']) &
        (df2['y1'] >= row['y1'])
        ]
        if mask.empty:
        return row
        row['T'] = mask['T'].tolist()[0]
        row['R'] = mask['R'].tolist()[0]
        row['C'] = mask['C'].tolist()[0]

        return row

        result = df1.apply(compare, axis=1)





        share|improve this answer













        You can use apply and masking. Example:



        def compare(row):
        mask = df2[
        (df2['x0'] <= row['x0']) &
        (df2['x1'] >= row['x1']) &
        (df2['y0'] <= row['y0']) &
        (df2['y1'] >= row['y1'])
        ]
        if mask.empty:
        return row
        row['T'] = mask['T'].tolist()[0]
        row['R'] = mask['R'].tolist()[0]
        row['C'] = mask['C'].tolist()[0]

        return row

        result = df1.apply(compare, axis=1)






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 12 '18 at 15:12









        ievbuievbu

        364




        364



























            draft saved

            draft discarded
















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53260436%2fmerge-dataframes-on-basis-of-coordinates-having-no-common-columns%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            這個網誌中的熱門文章

            Barbados

            How to read a connectionString WITH PROVIDER in .NET Core?

            Node.js Script on GitHub Pages or Amazon S3