Python: remove duplicates from CSV if the value in a column is duplicated









I am trying to write a CSV parser so that if the same name appears more than once in the name column, the line with the second occurrence is deleted. For example:

    [['CSE_MAIN\LC-CSEWS61', 'DEREGISTERED', '2018-04-18-192446'],
    ['CSE_MAIN\IT-Laptop12', 'DEREGISTERED', '2018-03-28-144236'],
    ['CSE_MAIN\LC-CSEWS61', 'DEREGISTERED', '2018-03-28-144236']]

I need the last line to be deleted because it has the same name as the first one.

What I wrote is:

    import csv
    import sys

    file2 = str(sys.argv[2])
    print ("The first file is:" + file2)
    reader2 = csv.reader (open(file2))
    with open("result2.csv",'wb') as result2:
        wtr2= csv.writer( result2 )
        for r in reader2:
            wtr2.writerow( (r[0], r[6], r[9] ))
    newreader2 = csv.reader (open("result2.csv"))
    sortedlist2 = sorted(newreader2, key=lambda col: col[2] , reverse = True)
    for i in range(len(sortedlist2)):
        for j in range(len(sortedlist2)-1):
            if (sortedlist2[i][0] == sortedlist2[j+1][0] and sortedlist2[i][1]!=sortedlist2[j+1][1]):
                if(sortedlist2[i][1]>sortedlist2[j+1][1]):
                    del sortedlist2[i][0-2]
                else:
                    del sortedlist2[j+1][0-2]

Thanks.







python csv parsing














edited Nov 11 at 16:25 by Ajax1234
asked Nov 11 at 12:05 by Rosen











  • You are only deleting list entries (del sortedlist2[i]); nothing is written to a new file yet. Print sortedlist2 to see what is in it.
    – user2853437
    Nov 11 at 12:35




























2 Answers

















If you want to use the csv module, a dict is probably the easiest bet:

    >>> {x[0]: x[1:] for x in list(csv.reader(open('bla')))[::-1]}
    {'CSE_MAIN\\LC-CSEWS61': ['DEREGISTERED', '2018-04-18-192446'], 'CSE_MAIN\\IT-Laptop12': ['DEREGISTERED', '2018-03-28-144236']}

Reversing the rows ([::-1]) makes sure the first occurrence of a key is selected instead of the last, since later entries overwrite earlier ones in the dict. The better, though longer, option would probably be:

    res = {}
    for a, b, c in csv.reader(open('bla')):
        if a not in res:
            res[a] = (b, c)

Then you have a "clean" dict, with no need for two iterations as in the one-liner.
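
To get from the dict idea back to a deduplicated CSV, a minimal sketch could look like the following. It keeps the first row seen for each name and writes the survivors out in their original order. The helper name drop_duplicate_names and its key_index parameter are made up for illustration, and io.StringIO stands in for the real input and output files.

```python
import csv
import io


def drop_duplicate_names(rows, key_index=0):
    """Yield only the first row seen for each value at key_index."""
    seen = set()
    for row in rows:
        if not row:
            continue  # skip blank lines
        key = row[key_index]
        if key in seen:
            continue  # a row with this name was already kept
        seen.add(key)
        yield row


# Sample data taken from the question (no header row).
sample = (
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-04-18-192446\n"
    "CSE_MAIN\\IT-Laptop12,DEREGISTERED,2018-03-28-144236\n"
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-03-28-144236\n"
)

reader = csv.reader(io.StringIO(sample))
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(drop_duplicate_names(reader))
result = out.getvalue()
print(result)
```

With real files on Python 3, the two StringIO objects would be replaced by open(path, newline="") for reading and open(path, "w", newline="") for writing.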















edited Nov 11 at 12:29
answered Nov 11 at 12:24 by kabanus






















Try with pandas:

    import pandas as pd
    # header=None because the file has no header row; columns are then labelled 0, 1, 2
    df = pd.read_csv('path/name_file.csv', header=None)
    df = df.drop_duplicates(subset=[0])  # 0 is the column to compare; the first occurrence is kept
    df.to_csv('New_file.csv', header=False, index=False)  # save to csv

This removes every row whose value in column 0 (the name column) has already appeared.

If you just need to delete a specific row, you can use the drop method.

    # Your file after loading it with pandas (print(df)):
                          0             1                  2
    0   CSE_MAIN\LC-CSEWS61  DEREGISTERED  2018-04-18-192446
    1  CSE_MAIN\IT-Laptop12  DEREGISTERED  2018-03-28-144236
    2   CSE_MAIN\LC-CSEWS61  DEREGISTERED  2018-03-28-144236

For example, to delete the row with index 2:

    df.drop(2, axis=0, inplace=True)  # axis=0 means rows; axis=1 would mean columns

Output:

                          0             1                  2
    0   CSE_MAIN\LC-CSEWS61  DEREGISTERED  2018-04-18-192446
    1  CSE_MAIN\IT-Laptop12  DEREGISTERED  2018-03-28-144236
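
The question's code sorts by the timestamp column in descending order before comparing names, which suggests the newest row per name is the one to keep. A possible sketch of that with pandas follows. The column labels name, status and timestamp are invented here for readability, and io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# Sample data taken from the question (no header row).
raw = (
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-04-18-192446\n"
    "CSE_MAIN\\IT-Laptop12,DEREGISTERED,2018-03-28-144236\n"
    "CSE_MAIN\\LC-CSEWS61,DEREGISTERED,2018-03-28-144236\n"
)

# header=None: the file has no header row, so supply labels via names.
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=["name", "status", "timestamp"],  # hypothetical labels
)

# The timestamps are zero-padded strings, so sorting them as text also
# sorts them chronologically. After sorting newest first, keep="first"
# retains the newest row for each name.
newest = (
    df.sort_values("timestamp", ascending=False)
      .drop_duplicates(subset="name", keep="first")
)

print(newest.to_csv(header=False, index=False))
```

If the first occurrence in file order should win instead, the sort_values call can simply be dropped.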














edited Nov 11 at 14:47
answered Nov 11 at 12:17 by Rudolf Morkovskyi











  • Thanks for the response, but I meant that a row should be deleted only if its name is a duplicate. In my example, pandas should iterate through the names and delete the second row only when a name is duplicated.
    – Rosen
    Nov 11 at 13:30

  • Mmm... so be it)
    – Rudolf Morkovskyi
    Nov 11 at 14:45

  • Oh) I think you need to change 1 to 0 in df.drop_duplicates([0]). I fixed it. Try again.
    – Rudolf Morkovskyi
    Nov 11 at 14:47









































