Join a dataframe with a column from another, based on a common column

I have two PySpark DataFrames:

| A  | B   | C    |
| 21 | 999 | 1000 |
| 22 | 786 | 1978 |
| 23 | 345 | 1563 |

and

| A  | D   | E   |
| 21 | aaa | a12 |
| 22 | bbb | b43 |
| 23 | ccc | h67 |

Desired result:

| A  | B   | C    | E   |
| 21 | 999 | 1000 | a12 |
| 22 | 786 | 1978 | b43 |
| 23 | 345 | 1563 | h67 |

I tried using join, including df1.join(df2.E, df1.A == df2.A), but to no avail.

python apache-spark pyspark pyspark-sql

edited Nov 14 '18 at 14:31
asked Nov 14 '18 at 14:11
Qubix


  • Possible duplicate of pandas: merge (join) two data frames on multiple columns

    – Sotos
    Nov 14 '18 at 14:33

  • df1.join(df2, "A").select("A", "B", "C", "E")

    – shay__
    Nov 14 '18 at 14:36

2 Answers

I think this code does what you want:

joinedDF = df1.join(df2.select('A', 'E'), ['A'])

answered Nov 14 '18 at 14:36
Ali AzG
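
For readers who want to try this end to end, here is a minimal sketch of the select-then-join approach from the answer above, next to the join-then-select variant suggested in the comments. The helper names (`select_then_join`, `join_then_select`, `demo`) and the local session settings are illustrative assumptions, not part of the original posts. Both helpers pass the key by name, so the joined result should carry a single `A` column.

```python
def select_then_join(left, right, key, column):
    """Trim `right` to the key and the wanted column, then join on the key."""
    return left.join(right.select(key, column), on=[key], how="inner")


def join_then_select(left, right, key, keep):
    """Join on the key first, then keep only the listed output columns."""
    return left.join(right, on=key, how="inner").select(*keep)


def demo():
    """Rebuild the question's sample data on a local Spark session and show both results."""
    # Imported lazily so the two helpers above can be read and reused on their own.
    from pyspark.sql import SparkSession

    # Assumption: a single-core local session is enough for this toy data.
    spark = SparkSession.builder.master("local[1]").appName("join-demo").getOrCreate()

    df1 = spark.createDataFrame(
        [(21, 999, 1000), (22, 786, 1978), (23, 345, 1563)], ["A", "B", "C"]
    )
    df2 = spark.createDataFrame(
        [(21, "aaa", "a12"), (22, "bbb", "b43"), (23, "ccc", "h67")], ["A", "D", "E"]
    )

    # Both calls should yield the columns A, B, C, E for the three matching keys.
    select_then_join(df1, df2, "A", "E").orderBy("A").show()
    join_then_select(df1, df2, "A", ["A", "B", "C", "E"]).orderBy("A").show()

    spark.stop()
```

Calling `demo()` requires a working PySpark installation. The attempt in the question passes `df2.E`, which is a Column, where `join` expects a DataFrame as its first argument; selecting the needed columns from `df2` first keeps that argument a DataFrame.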


When you join two DataFrames with the join function, it takes three arguments:

1. arg-1: the other DataFrame to join with.
2. arg-2: the column(s), or join condition, on which to join the DataFrames.
3. arg-3: the type of join to perform. By default it is an inner join.

Sample code:

df1.join(df2, df1.id == df2.id, 'outer')

You can find more details here.

Regards,
Neeraj

answered Nov 19 '18 at 14:07
neeraj bhadani
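
To make the effect of the third argument concrete, here is a plain-Python sketch (no Spark needed) that models what 'inner' and 'outer' mean for keys present on only one side. It is only a model of the semantics, not how Spark implements joins. The function name `join_rows` and the sample rows are hypothetical; the rows are altered from the question's data so that the two join types actually differ.

```python
def join_rows(left, right, key, how="inner"):
    """Join two lists of dict rows on `key`.

    Assumes each key value appears at most once per side and that the two
    sides share no column names other than `key`.
    """
    if how not in ("inner", "outer"):
        raise ValueError("this sketch only models 'inner' and 'outer'")

    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    # Non-key column names, used to fill missing values with None.
    left_cols = sorted({c for row in left for c in row if c != key})
    right_cols = sorted({c for row in right for c in row if c != key})

    if how == "inner":
        # Keep only keys found on both sides.
        keys = [k for k in left_by_key if k in right_by_key]
    else:
        # Full outer: keep every key from either side.
        keys = list(left_by_key) + [k for k in right_by_key if k not in left_by_key]

    joined = []
    for k in keys:
        l_row = left_by_key.get(k, {})
        r_row = right_by_key.get(k, {})
        row = {key: k}
        for c in left_cols:
            row[c] = l_row.get(c)  # None when the key is missing on the left
        for c in right_cols:
            row[c] = r_row.get(c)  # None when the key is missing on the right
        joined.append(row)
    return joined


# Hypothetical rows: key 21 exists only on the left, key 24 only on the right.
rows1 = [{"A": 21, "B": 999}, {"A": 22, "B": 786}, {"A": 23, "B": 345}]
rows2 = [{"A": 22, "E": "b43"}, {"A": 23, "E": "h67"}, {"A": 24, "E": "z00"}]

inner = join_rows(rows1, rows2, "A", "inner")
outer = join_rows(rows1, rows2, "A", "outer")
```

In this model the inner join keeps only keys 22 and 23, while the outer join also keeps 21 and 24 with `None` in the missing columns. For the data in the question, where every key matches, both join types give the same rows. The row order here is incidental; Spark makes no ordering promise without an explicit sort.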
