Python Remove duplicates from csv if value in column duplicated

I am trying to write a CSV parser so that, if the same name appears more than once in the name column, the line with the second occurrence is deleted. For example:

[['CSE_MAIN\LC-CSEWS61', 'DEREGISTERED', '2018-04-18-192446'],
['CSE_MAIN\IT-Laptop12', 'DEREGISTERED', '2018-03-28-144236'],
['CSE_MAIN\LC-CSEWS61', 'DEREGISTERED', '2018-03-28-144236']]

I need the last line to be deleted because it has the same name as the first one.

What I wrote is:

import csv
import sys

file2 = str(sys.argv[2])
print("The first file is:" + file2)
reader2 = csv.reader(open(file2))
with open("result2.csv", 'wb') as result2:
    wtr2 = csv.writer(result2)
    for r in reader2:
        wtr2.writerow((r[0], r[6], r[9]))
newreader2 = csv.reader(open("result2.csv"))
sortedlist2 = sorted(newreader2, key=lambda col: col[2], reverse=True)
for i in range(len(sortedlist2)):
    for j in range(len(sortedlist2) - 1):
        if (sortedlist2[i][0] == sortedlist2[j+1][0] and sortedlist2[i][1] != sortedlist2[j+1][1]):
            if (sortedlist2[i][1] > sortedlist2[j+1][1]):
                del sortedlist2[i][0-2]
            else:
                del sortedlist2[j+1][0-2]

Thanks.

asked Nov 11 at 12:05 by Rosen
edited Nov 11 at 16:25 by Ajax1234

You are only deleting list entries (del sortedlist2[i]); nothing is written to a new file yet. Print sortedlist2 to see what is in it.
– user2853437
Nov 11 at 12:35
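
As a side note on the deletion itself, the sketch below illustrates what `del sortedlist2[i][0-2]` does on the sample rows. It is only an illustration: the names `rows`, `damaged` and `trimmed` are made up here, and the rows are typed in by hand rather than read from a file. The point is that `0-2` is ordinary arithmetic, not a range, so the statement deletes a single field from a row.

```python
# Rows from the question, already ordered newest first.
rows = [
    ["CSE_MAIN\\LC-CSEWS61", "DEREGISTERED", "2018-04-18-192446"],
    ["CSE_MAIN\\IT-Laptop12", "DEREGISTERED", "2018-03-28-144236"],
    ["CSE_MAIN\\LC-CSEWS61", "DEREGISTERED", "2018-03-28-144236"],
]

# 0-2 evaluates to -2, so this removes the second-to-last field of the
# row (the status) and leaves the row itself in the list.
damaged = [list(r) for r in rows]
del damaged[2][0-2]

# Removing the whole row means deleting by the outer index only.
trimmed = [list(r) for r in rows]
del trimmed[2]
```

Deleting from a list while looping over `range(len(...))` computed beforehand can also run past the end of the shortened list, which is one reason the answers build a new collection instead.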

python csv parsing

2 Answers

If you want to use the csv module, a dict is probably the easiest bet:

>>> {x[0]: x[1:] for x in list(csv.reader(open('bla')))[::-1]}
{'CSE_MAIN\\LC-CSEWS61': ['DEREGISTERED', '2018-04-18-192446'], 'CSE_MAIN\\IT-Laptop12': ['DEREGISTERED', '2018-03-28-144236']}

The reversal ([::-1]) makes sure the first occurrence of a key is selected instead of the last, because later entries overwrite earlier ones in the dict. The better option, at the cost of more lines, would probably be:

res = {}
for a, b, c in csv.reader(open('bla')):
    if a not in res:
        res[a] = (b, c)

Then you have a "clean" dict and no need for two iterations like the one-liner.
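
If the goal is a deduplicated CSV file rather than a dict, the same idea can be sketched with a set of names already seen. This is a minimal sketch under assumptions: the helper name `drop_repeated_names` is hypothetical, the name is assumed to be in the first column, and `io.StringIO` stands in for the real input and output files.

```python
import csv
import io


def drop_repeated_names(rows, key_index=0):
    """Yield only the first row seen for each value at key_index."""
    seen = set()
    for row in rows:
        # Blank lines come back from csv.reader as empty lists; skip them.
        if row and row[key_index] not in seen:
            seen.add(row[key_index])
            yield row


# Sample data from the question; in practice pass open file objects instead.
source = io.StringIO(
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-04-18-192446\n"
    "CSE_MAIN\\IT-Laptop12,DEREGISTERED,2018-03-28-144236\n"
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-03-28-144236\n"
)
target = io.StringIO()
csv.writer(target, lineterminator="\n").writerows(
    drop_repeated_names(csv.reader(source))
)
result = target.getvalue()
```

Because rows are streamed through a generator, the input order is preserved and the file is read only once.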

answered Nov 11 at 12:24 by kabanus (edited Nov 11 at 12:29)

Try with pandas:

import pandas as pd
df = pd.read_csv('path/name_file.csv', header=None)  # header=None labels the columns 0, 1, 2
df = df.drop_duplicates(subset=[0])  # 0 is the column to compare; the first occurrence is kept
df.to_csv('New_file.csv', index=False, header=False)  # save to CSV without the index and column labels

This deletes every row whose value in column 0 duplicates an earlier row.

If you only need to delete a specific row, you can use the drop method.

# Your file after loading it with pandas (print(df)):
                      0             1                  2
0   CSE_MAIN\LC-CSEWS61  DEREGISTERED  2018-04-18-192446
1  CSE_MAIN\IT-Laptop12  DEREGISTERED  2018-03-28-144236
2   CSE_MAIN\LC-CSEWS61  DEREGISTERED  2018-03-28-144236

For example, to delete the row with index 2:

df.drop(2, axis=0, inplace=True)  # axis=0 means rows; axis=1 means columns

Output:

                      0             1                  2
0   CSE_MAIN\LC-CSEWS61  DEREGISTERED  2018-04-18-192446
1  CSE_MAIN\IT-Laptop12  DEREGISTERED  2018-03-28-144236
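
For completeness, here is a self-contained sketch of the pandas approach on the sample rows. It assumes the file has no header row (hence `header=None`) and reads from an in-memory string instead of a real path. The second variant is a guess at the intent behind the sort in the question: it assumes the fixed-width timestamp in column 2 sorts correctly as text and keeps the newest row per name.

```python
import io

import pandas as pd

# Sample data from the question; replace with a file path in practice.
data = io.StringIO(
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-04-18-192446\n"
    "CSE_MAIN\\IT-Laptop12,DEREGISTERED,2018-03-28-144236\n"
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-03-28-144236\n"
)
df = pd.read_csv(data, header=None)  # columns are labelled 0, 1, 2

# Keep the first row for each name in column 0, in file order.
deduped = df.drop_duplicates(subset=[0], keep="first")

# Variant (assumption about intent): keep the newest row per name by
# sorting on the timestamp column before dropping duplicates.
newest = df.sort_values(by=2, ascending=False).drop_duplicates(subset=[0])
```

Passing `index=False, header=False` to `to_csv` afterwards would write the result back without the extra index column and column labels.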

answered Nov 11 at 12:17 by Rudolf Morkovskyi (edited Nov 11 at 14:47)

Thanks for the response, but I meant that a row should be deleted only if it has the same name as another row. In my example, pandas should iterate through the names and delete the second occurrence only if a name is duplicated.
– Rosen
Nov 11 at 13:30

Mmm... so be it)
– Rudolf Morkovskyi
Nov 11 at 14:45

Oh) I think you need to change 1 to 0 in df.drop_duplicates([0]). I fixed it. Try again.
– Rudolf Morkovskyi
Nov 11 at 14:47
