Python: remove duplicates from a CSV if the value in a column is duplicated
I am trying to write a CSV parser so that if the same name appears more than once in the name column, the line with the second occurrence is deleted. For example:
[['CSE_MAIN\LC-CSEWS61', 'DEREGISTERED', '2018-04-18-192446'],
['CSE_MAIN\IT-Laptop12', 'DEREGISTERED', '2018-03-28-144236'],
['CSE_MAIN\LC-CSEWS61', 'DEREGISTERED', '2018-03-28-144236']]
I need the last line to be deleted because it has the same name as the first one.
What I wrote is:
file2 = str(sys.argv[2])
print ("The first file is:" + file2)
reader2 = csv.reader (open(file2))
with open("result2.csv",'wb') as result2:
    wtr2= csv.writer( result2 )
    for r in reader2:
        wtr2.writerow( (r[0], r[6], r[9] ))
newreader2 = csv.reader (open("result2.csv"))
sortedlist2 = sorted(newreader2, key=lambda col: col[2] , reverse = True)
for i in range(len(sortedlist2)):
    for j in range(len(sortedlist2)-1):
        if (sortedlist2[i][0] == sortedlist2[j+1][0] and sortedlist2[i][1]!=sortedlist2[j+1][1]):
            if(sortedlist2[i][1]>sortedlist2[j+1][1]):
                del sortedlist2[i][0-2]
            else:
                del sortedlist2[j+1][0-2]
Thanks.
python csv parsing
edited Nov 11 at 16:25
Ajax1234
asked Nov 11 at 12:05
Rosen
You are deleting list entries (del sortedlist2[i]), but this way nothing is written to a new file yet. Print sortedlist2 to see what is in there.
– user2853437
Nov 11 at 12:35
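As a small illustration of the comment above (a sketch, not part of the original thread): in the question's code the expression 0-2 is just the integer -2, so del sortedlist2[i][0-2] removes one field from a row rather than the row itself. Building a new list is usually simpler than deleting in place while looping over indices. The sample rows come from the question; the names demo, seen and unique_rows are made up for this example.

```python
rows = [
    ['CSE_MAIN\\LC-CSEWS61', 'DEREGISTERED', '2018-04-18-192446'],
    ['CSE_MAIN\\IT-Laptop12', 'DEREGISTERED', '2018-03-28-144236'],
    ['CSE_MAIN\\LC-CSEWS61', 'DEREGISTERED', '2018-03-28-144236'],
]

# 0-2 evaluates to -2, so this deletes the second-to-last field
# of one row; the row itself stays in the list.
demo = [list(r) for r in rows]
del demo[2][0-2]
row_count_after_del = len(demo)   # the list still holds every row
damaged_row = demo[2]             # this row has lost its status field

# Instead of deleting in place, build a new list that keeps only
# the first row seen for each name (column 0).
seen = set()
unique_rows = []
for row in rows:
    if row[0] not in seen:
        seen.add(row[0])
        unique_rows.append(row)

print(row_count_after_del)
print(damaged_row)
print(unique_rows)
```

The filtered list still has to be written out with csv.writer before anything changes on disk.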
2 Answers
If you want to use the csv module, a dict is probably the easiest bet:
>>> {x[0]: x[1:] for x in list(csv.reader(open('bla')))[::-1]}
{'CSE_MAIN\\LC-CSEWS61': ['DEREGISTERED', '2018-04-18-192446'], 'CSE_MAIN\\IT-Laptop12': ['DEREGISTERED', '2018-03-28-144236']}
The reversal ([::-1]) makes sure the first occurrence of a key is selected instead of the last, because a later assignment to the same key overwrites the earlier one. The better option, at the cost of a few more lines, would probably be:
res = {}
for a, b, c in csv.reader(open('bla')):
    if a not in res:
        res[a] = (b, c)
Then you have a "clean" dict, with no need for two iterations like the one-liner.
edited Nov 11 at 12:29
answered Nov 11 at 12:24
kabanus
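A possible way to carry the loop-based idea from the answer above through to a written CSV, offered as a sketch under assumptions rather than as the answerer's code: the helper name drop_duplicate_names and its key_index parameter are hypothetical, and io.StringIO objects stand in for the real input and output files so the example runs on its own.

```python
import csv
import io


def drop_duplicate_names(rows, key_index=0):
    """Yield rows, skipping any whose key column was already seen."""
    seen = set()
    for row in rows:
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            yield row


# In-memory stand-in for the input file, using the question's data.
source = io.StringIO(
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-04-18-192446\n"
    "CSE_MAIN\\IT-Laptop12,DEREGISTERED,2018-03-28-144236\n"
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-03-28-144236\n"
)
target = io.StringIO()

# Stream rows from the reader through the filter into the writer.
writer = csv.writer(target, lineterminator="\n")
writer.writerows(drop_duplicate_names(csv.reader(source)))

result = target.getvalue()
print(result)
```

With real files on Python 3, the same calls would work on handles opened with open(path, newline=""), which is what the csv module documentation recommends for both reading and writing.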
Try with pandas:
import pandas as pd
df = pd.read_csv('path/name_file.csv', header=None)  # no header row, so the columns are numbered 0, 1, 2
df = df.drop_duplicates([0])  # 0 is the column to compare; the first occurrence is kept
df.to_csv('New_file.csv', header=False, index=False)  # save to csv
This removes every row whose value in column 0 repeats an earlier row.
If you only need to delete a specific row, you can use the drop method.
# Your file after loading it with pandas (print(df)):
                      0             1                  2
0   CSE_MAIN\LC-CSEWS61  DEREGISTERED  2018-04-18-192446
1  CSE_MAIN\IT-Laptop12  DEREGISTERED  2018-03-28-144236
2   CSE_MAIN\LC-CSEWS61  DEREGISTERED  2018-03-28-144236
For example, to delete the row with index 2:
df.drop(2, axis=0, inplace=True)  # axis=0 means rows; axis=1 means columns
Output:
                      0             1                  2
0   CSE_MAIN\LC-CSEWS61  DEREGISTERED  2018-04-18-192446
1  CSE_MAIN\IT-Laptop12  DEREGISTERED  2018-03-28-144236
edited Nov 11 at 14:47
answered Nov 11 at 12:17
Rudolf Morkovskyi
Thanks for the response, but I meant that a row should be deleted only if its name is the same as an earlier one. In my example, pandas should iterate through the names and delete the second row only if the name is duplicated.
– Rosen
Nov 11 at 13:30
Mmm... so be it)
– Rudolf Morkovskyi
Nov 11 at 14:45
Oh) I think you need to change 1 to 0 in df.drop_duplicates([0]). I fixed it. Try again.
– Rudolf Morkovskyi
Nov 11 at 14:47