Finding common lines in 2 different files










-2















I am trying to find the lines that two different files have in common and write them to a new text file. The script below does not find the common lines; it just writes out whatever file I pass as arg2. Please help me troubleshoot.



#!/usr/bin/python

import sys


def find_common_lines(arg1, arg2, arg3):
    fh1 = open(arg1, 'r+')
    fh2 = open(arg2, 'r+')
    with open(arg3, 'w+') as f:
        for line in fh1 and fh2:
            if line:
                f.write(line)

    fh1.close()
    fh2.close()


number_of_arguments = len(sys.argv) - 1
if number_of_arguments < 3:
    print("ERROR:\tThe script is called with less than 3 arguments, but it needs 3!")
    print("Usage:\tfind_common_lines.py <file1> <file2> <output_filepath>")
else:
    arg1 = sys.argv[1]
    arg2 = sys.argv[2]
    arg3 = sys.argv[3]
    find_common_lines(arg1, arg2, arg3)


Basically, this is what I want the script to do:



File A



AAB
BBC
DDE
GGC


File B



123
AAB
DDE
345
GHY
GJK


File C (expected output)



AAB
DDE


Thanks!!!










      python






      asked Nov 14 '18 at 18:23









Riddly

      115


























          3 Answers


















              0














Try using a dictionary:



import sys


def find_common_lines(arg1, arg2, arg3):
    # Store every (stripped) line of the first file as a dictionary key.
    alllines_dict = {}
    with open(arg1, 'r') as f:
        while True:
            line = f.readline()
            if not line:  # readline() returns '' at end of file
                break
            alllines_dict[line.strip()] = 1

    # Write out each line of the second file that is also a key in the dict.
    with open(arg3, 'w') as out:
        with open(arg2, 'r') as f:
            while True:
                line2 = f.readline()
                if not line2:
                    break
                line2 = line2.strip()
                ispresent = alllines_dict.get(line2, None)
                if ispresent is not None:
                    out.write(line2 + '\n')


number_of_arguments = len(sys.argv) - 1
print(sys.argv)  # debug output: shows the arguments received
if number_of_arguments < 3:
    print("ERROR:\tThe script is called with less than 3 arguments, but it needs 3!")
    print("Usage:\tfind_common_lines.py <file1> <file2> <output_filepath>")
else:
    arg1 = sys.argv[1]
    arg2 = sys.argv[2]
    arg3 = sys.argv[3]
    find_common_lines(arg1, arg2, arg3)
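When only membership matters, a set does the same job as the dictionary whose values are all 1. Below is a minimal sketch, assuming you want the output in the order of the first file, with each common line written once and blank lines skipped. The dictionary version above follows the order of the second file and writes repeats. The function keeps the question's name and signature so the existing argument-handling code can call it unchanged.

```python
def find_common_lines(arg1, arg2, arg3):
    """Write the lines found in both arg1 and arg2 to the file arg3.

    Lines are compared after strip(), so a missing newline on the last line
    or stray trailing spaces do not prevent a match. The output follows the
    order of arg1 and lists each common line once; blank lines are skipped.
    """
    with open(arg2) as f2:
        # A set answers "is this line present?" in O(1) time on average,
        # which is what the dictionary of 1s is being used for.
        lines2 = {line.strip() for line in f2}

    written = set()  # lines already written, so duplicates are not repeated
    with open(arg1) as f1, open(arg3, "w") as out:
        for line in f1:
            line = line.strip()
            if line and line in lines2 and line not in written:
                written.add(line)
                out.write(line + "\n")
```

Only the second file is held in memory. The first file is streamed line by line, so for very unequal file sizes it makes sense to pass the smaller file as arg2.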





                  answered Nov 14 '18 at 18:47









Rishabh Mishra

                  378310




                      1














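First of all, "fh1 and fh2" does not combine the two files. The "and" operator returns its first operand if that operand is falsy and its second operand otherwise. An open file object is always truthy, so "fh1 and fh2" evaluates to fh2 and the for loop only ever reads the second file. You need two separate tests instead. Try changing the code to something along these lines: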



for line in fh1 and fh2:
    if line:
        f.write(line)


                      to



for line in fh1:
    fh2.seek(0)  # "line in fh2" reads fh2 forward, so rewind it before each test
    if line in fh2:
        f.write(line)
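Both pitfalls in the original loop can be checked in a few lines. First, "fh1 and fh2" is simply fh2. Second, a file object is an iterator, so "line in fh2" consumes the file as it searches. The sketch below uses io.StringIO objects as stand-ins for the two open files. This is an assumption so that it runs without real files; text files opened with open() behave the same way here. common_lines is a hypothetical helper name, not something from the question.

```python
import io

# StringIO objects stand in for the two open files in the question.
fh1 = io.StringIO("AAB\nBBC\nDDE\nGGC\n")
fh2 = io.StringIO("123\nAAB\nDDE\n345\nGHY\nGJK\n")

# "x and y" returns x if x is falsy, otherwise y. File objects are always
# truthy, so "fh1 and fh2" is just fh2 and the loop never touches fh1.
print((fh1 and fh2) is fh2)  # True

# "line in fh2" reads forward until it finds a match and never rewinds,
# so later tests only see whatever is left of the file.
print("AAB\n" in fh2)  # True (reading stopped right after "AAB")
print("AAB\n" in fh2)  # False ("AAB" was already consumed)
fh2.seek(0)            # rewind to the start of the file
print("AAB\n" in fh2)  # True again


def common_lines(fh1, fh2):
    """Return the lines of fh1 that also occur in fh2, in fh1's order."""
    result = []
    for line in fh1:
        fh2.seek(0)  # rescan fh2 from the top for every line of fh1
        if line in fh2:
            result.append(line)
    return result


print(common_lines(io.StringIO("AAB\nBBC\nDDE\nGGC\n"),
                   io.StringIO("123\nAAB\nDDE\n345\nGHY\nGJK\n")))
# ['AAB\n', 'DDE\n']
```

Rewinding means every line of the first file rescans the whole second file, roughly len(file1) * len(file2) comparisons. That is fine for small inputs, but for large files the usual alternative is to load one file's lines into a set. Lines are compared including their trailing newline, so a last line without one will not match.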





                      answered Nov 14 '18 at 18:37









SRT HellKitty

                      29518




                      • Thanks! This worked!

                        – Riddly
                        Nov 14 '18 at 18:43












• Glad I could help! That does the same thing, but if speed/space is an issue, readlines() shouldn't be necessary; it almost always slows code down. Also, use the ` character (top-left key, above Tab, on a US keyboard) to show code in a comment.

                        – SRT HellKitty
                        Nov 14 '18 at 18:48











• hahaha, that is exactly what I was trying to do. Thanks for that too!!!

                        – Riddly
                        Nov 14 '18 at 18:52




























                      0














You can use Python's pandas library (imported as pd) for this:



Create a DataFrame for each .txt file:



In [2017]: df_A = pd.read_fwf('/home/mayankp/Documents/Personal/stackoverflow/A.txt', header=None)

In [2018]: df_A
Out[2018]:
     0
0  AAB
1  BBC
2  DDE
3  GGC

In [2019]: df_B = pd.read_fwf('/home/mayankp/Documents/Personal/stackoverflow/B.txt', header=None)

In [2020]: df_B
Out[2020]:
     0
0  123
1  AAB
2  DDE
3  345
4  GHY
5  GJK


Now merge the two DataFrames with an inner join, which keeps only the rows common to both:



In [2021]: df_C = pd.merge(df_A, df_B, on=0, how='inner')

In [2022]: df_C
Out[2022]:
     0
0  AAB
1  DDE


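Then write the result to a file. header=False stops pandas from writing the column label (0) as the first line, so the file holds only the common lines:



In [2023]: df_C.to_csv('out.csv', index=False, header=False)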


This should be efficient because no explicit Python loops are needed, and there are no complex regexes to write, so the code stays cleaner and simpler.



                      Let me know if this helps.
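The pd.merge(..., how='inner') step can be illustrated with a plain-Python sketch of an inner join on a single column. This is only an illustration of the result, not how pandas implements merge. The pandas documentation says how='inner' keeps the order of the left keys. As in an SQL inner join, a key that occurs more than once on both sides produces one row per matching pair, which matters if either file can contain duplicate lines. inner_join_lines is a hypothetical name used only for this example.

```python
from collections import Counter


def inner_join_lines(left_lines, right_lines):
    """Mimic an inner join of two one-column tables of lines.

    Each left line is emitted once for every equal right line, in left order,
    so a line repeated in both inputs multiplies as rows do in an inner join.
    """
    right_counts = Counter(line.strip() for line in right_lines)
    joined = []
    for line in left_lines:
        key = line.strip()
        # Counter returns 0 for missing keys, so unmatched lines add nothing.
        joined.extend([key] * right_counts[key])
    return joined


file_a = ["AAB", "BBC", "DDE", "GGC"]
file_b = ["123", "AAB", "DDE", "345", "GHY", "GJK"]
print(inner_join_lines(file_a, file_b))  # ['AAB', 'DDE']
print(len(inner_join_lines(["X", "X"], ["X", "X", "X"])))  # 6
```

If repeated lines should appear only once in the output, call drop_duplicates() on the merged DataFrame, or use a set-based approach instead of a join.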






                          answered Nov 14 '18 at 18:37









Mayank Porwal

                          4,9352724



