Finding common lines in 2 different files
I am trying to find the lines that two files have in common and write them to a new text file. I wrote the script below, but it does not find the common lines; it only writes out the contents of the file I pass as arg2. Please help me troubleshoot.

    #!/usr/bin/python
    import sys

    def find_common_lines(arg1, arg2, arg3):
        fh1 = open(arg1, 'r+')
        fh2 = open(arg2, 'r+')
        with open(arg3, 'w+') as f:
            for line in fh1 and fh2:
                if line:
                    f.write(line)
        fh1.close()
        fh2.close()

    number_of_arguments = len(sys.argv) - 1
    if number_of_arguments < 3:
        print("ERROR:\tThe script is called with less than 3 arguments, but it needs 3!")
        print("Usage:\tfind_common_lines.py <file1> <file2> <output_filepath>")
    else:
        arg1 = sys.argv[1]
        arg2 = sys.argv[2]
        arg3 = sys.argv[3]
        find_common_lines(arg1, arg2, arg3)

So, basically, what I want this script to do is:

File A:

    AAB
    BBC
    DDE
    GGC

File B:

    123
    AAB
    DDE
    345
    GHY
    GJK

File C (output):

    AAB
    DDE

Thanks!!!
python
asked Nov 14 '18 at 18:23
Riddly
3 Answers
Try using a dictionary:

    import sys

    def find_common_lines(arg1, arg2, arg3):
        # Record every (stripped) line of the first file as a dictionary key.
        alllines_dict = {}
        with open(arg1, 'r') as f:
            while True:
                line = f.readline()
                if not line:
                    break
                alllines_dict[line.strip()] = 1

        # Write each line of the second file that also appears in the first.
        with open(arg3, 'w') as out:
            with open(arg2, 'r') as f:
                while True:
                    line2 = f.readline()
                    if not line2:
                        break
                    line2 = line2.strip()
                    ispresent = alllines_dict.get(line2, None)
                    if ispresent is not None:
                        out.write(line2 + '\n')

    number_of_arguments = len(sys.argv) - 1
    print(sys.argv)
    if number_of_arguments < 3:
        print("ERROR:\tThe script is called with less than 3 arguments, but it needs 3!")
        print("Usage:\tfind_common_lines.py <file1> <file2> <output_filepath>")
    else:
        arg1 = sys.argv[1]
        arg2 = sys.argv[2]
        arg3 = sys.argv[3]
        find_common_lines(arg1, arg2, arg3)

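
The same lookup idea can also be sketched with a set instead of a dictionary. This is only an illustrative variant, not the answer's code: the function name write_common_lines and the temporary file names are assumptions made for the demo, and it compares lines exactly apart from the trailing newline, whereas strip() above also ignores surrounding whitespace.

```python
import os
import tempfile


def write_common_lines(path_a, path_b, out_path):
    """Write the lines of path_b that also occur in path_a; return them as a list."""
    # A set offers the same fast membership test as the dictionary keys.
    with open(path_a) as fa:
        seen = {line.rstrip("\n") for line in fa}

    common = []
    with open(path_b) as fb, open(out_path, "w") as out:
        for line in fb:
            text = line.rstrip("\n")
            if text in seen:
                out.write(text + "\n")
                common.append(text)
    return common


# Demo on the sample data from the question, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    a, b, c = (os.path.join(tmp, name) for name in ("A.txt", "B.txt", "C.txt"))
    with open(a, "w") as fh:
        fh.write("AAB\nBBC\nDDE\nGGC\n")
    with open(b, "w") as fh:
        fh.write("123\nAAB\nDDE\n345\nGHY\nGJK\n")
    result = write_common_lines(a, b, c)
    with open(c) as fh:
        written = fh.read()

print(result)
```

The output follows the order of the second file, as in the dictionary version.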
answered Nov 14 '18 at 18:47
Rishabh Mishra
First of all, the "and" operator needs two logical expressions to combine. Right now, fh1 and fh2 is a single expression that evaluates to fh2 (an open file object is truthy, so "and" returns its second operand), which means the for loop iterates over the second file only. Try changing the code to something along these lines:

    for line in fh1 and fh2:
        if line:
            f.write(line)

to

    lines2 = set(fh2)  # read the second file once
    for line in fh1:
        if line in lines2:
            f.write(line)

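
To see why the original loop only ever wrote the second file, here is a small sketch. It uses in-memory io.StringIO objects as stand-ins for the two open files (an assumption, so the snippet runs without any files on disk); the contents are File A and File B from the question.

```python
import io

# Hypothetical in-memory stand-ins for the two open files.
fh1 = io.StringIO("AAB\nBBC\nDDE\nGGC\n")
fh2 = io.StringIO("123\nAAB\nDDE\n345\nGHY\nGJK\n")

# `and` returns its second operand when the first one is truthy,
# and a file-like object is truthy, so this is simply fh2.
chosen = fh1 and fh2
print(chosen is fh2)

# Looping over the expression therefore visits only the second file's lines.
lines_seen = [line.strip() for line in chosen]
print(lines_seen)
```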
answered Nov 14 '18 at 18:37
SRT HellKitty
Thanks! This worked!
– Riddly
Nov 14 '18 at 18:43
Glad I could help! That does the same thing, but if speed or memory is an issue, 'readlines()' shouldn't be necessary and almost always slows the code down. Also, use the ` character (the key above Tab on a US keyboard) to show code in a comment.
– SRT HellKitty
Nov 14 '18 at 18:48
Hahaha, that is exactly what I was trying to do. Thanks for that too!!!
– Riddly
Nov 14 '18 at 18:52
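
On the readlines() point in the comment above: a file object is itself an iterator over its lines, so a loop can consume it directly without first building a list. A minimal sketch, with io.StringIO standing in for a real file (an assumption for illustration):

```python
import io

text = "AAB\nBBC\nDDE\nGGC\n"

# readlines() builds the complete list of lines in memory first.
via_readlines = io.StringIO(text).readlines()

# Iterating over the file object yields the same lines one at a time;
# they are collected into a list here only so the two can be compared.
via_iteration = [line for line in io.StringIO(text)]

same_lines = via_readlines == via_iteration
print(same_lines)
```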
You can use Python's pandas library for this:

Create a dataframe for each .txt file like below (pandas is imported as pd):

    In [2017]: df_A = pd.read_fwf('/home/mayankp/Documents/Personal/stackoverflow/A.txt', header=None)

    In [2018]: df_A
    Out[2018]:
         0
    0  AAB
    1  BBC
    2  DDE
    3  GGC

    In [2019]: df_B = pd.read_fwf('/home/mayankp/Documents/Personal/stackoverflow/B.txt', header=None)

    In [2020]: df_B
    Out[2020]:
         0
    0  123
    1  AAB
    2  DDE
    3  345
    4  GHY
    5  GJK

Now, merge both dataframes (like an inner join) to keep only the rows common to both.

    In [2021]: df_C = pd.merge(df_A, df_B, on=0, how='inner')

    In [2022]: df_C
    Out[2022]:
         0
    0  AAB
    1  DDE

Then, you can write this output to a file like below (header=False keeps the column label 0 out of the file):

    In [2023]: df_C.to_csv('out.csv', index=False, header=False)

This avoids writing explicit loops or regular expressions, so the code stays short and simple.

Let me know if this helps.
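
One caveat worth sketching: an inner join pairs every matching row, so a line that is repeated in both files would show up several times in the result. The variant below is only an illustration under stated assumptions: it builds the dataframes from in-memory lists rather than files, uses a made-up column name "line", and drops duplicates before merging.

```python
import pandas as pd

# Sample contents of File A and File B from the question, one entry per line.
lines_a = ["AAB", "BBC", "DDE", "GGC"]
lines_b = ["123", "AAB", "DDE", "345", "GHY", "GJK"]

# "line" is a hypothetical column name chosen for this sketch.
df_a = pd.DataFrame({"line": lines_a}).drop_duplicates()
df_b = pd.DataFrame({"line": lines_b}).drop_duplicates()

# The inner join keeps only the values present in both frames.
df_c = pd.merge(df_a, df_b, on="line", how="inner")
common = df_c["line"].tolist()
print(common)
```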
answered Nov 14 '18 at 18:37
Mayank Porwal