Best way to fix an inconsistent CSV file in Python










I have a CSV file that is not consistent: some rows have a middle name and some do not, as shown below. I don't know the best way to fix this. If a middle name exists, it is always in the second position; if it doesn't, the last name is in the second position instead.



john,doe,52,florida

jane,mary,doe,55,texas

fred,johnson,23,maine

wally,mark,david,44,florida









  • What exactly do you want to do? Is your Python program creating this CSV file? Or are you meant to fix the gap problem with your Python program?

    – gkapellmann
    Nov 13 '18 at 13:02

  • It's creating it. I struggle with programming. I am a network engineer (and a good one, actually) but use Python because it's useful and to keep my ego in check; it doesn't come naturally to me. I write the output to a text file, and the output is inconsistent because of the source. So I decided to make it easier on myself and fix the flawed CSV file first rather than the initial parse; then I could work backward to fixing the parse. But I am open to any guidance. I think my problem is that I don't know what I don't know here. I am currently trying list comprehensions (still no joy).

    – Seth
    Nov 14 '18 at 14:13
















python-3.x csv






edited Nov 13 '18 at 13:02 by Sociopath

asked Nov 13 '18 at 12:59 by Seth

1 Answer
Let's say that you have ① wrong.csv and want to produce ② fixed.csv.



You want to read a line from ①, fix it, and write the fixed line to ②. This can be done like this:



    with open('wrong.csv') as input, open('fixed.csv', 'w') as output:
        for line in input:
            line = fix(line)
            output.write(line)


Now we want to define the fix function...



Each line has either 4 or 5 fields, separated by commas (5 when a middle name is present). So we split the line using the comma as a delimiter and return the line unmodified if the number of fields is 4; otherwise we join field 0 and field 1 (Python counts from zero) with a space, reassemble the output line, and return it to the caller.



    def fix(line):
        items = line.split(',')  # items is a list of strings

        if len(items) == 4:  # no middle name: the line is OK as it stands
            return line

        # join first and middle name with a space
        first_middle = ' '.join((items[0], items[1]))

        # we want to return a "fixed" line, i.e., a string, not a list of
        # strings, so we join the new name with the remaining fields
        return ','.join([first_middle] + items[2:])
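If you would rather keep the middle name in its own column, an alternative is to pad the short rows instead of merging fields. The following is a minimal sketch, not part of the answer above: it assumes every row has either 4 fields (first, last, age, state) or 5 fields (first, middle, last, age, state), as in the sample data, and the helper name `pad_middle` is hypothetical. It uses the standard `csv` module for reading and writing.

```python
import csv
import io


def pad_middle(row):
    """Return a 5-field row, inserting an empty middle name if it is missing.

    Assumes the row has 4 fields (no middle name) or 5 fields (with one).
    """
    if len(row) == 4:
        # first, last, age, state -> first, '', last, age, state
        return [row[0], ""] + row[1:]
    return row


# Sample rows from the question, held in memory so the sketch is self-contained.
# With real files, open them with newline='' as the csv module recommends.
source = io.StringIO(
    "john,doe,52,florida\n"
    "jane,mary,doe,55,texas\n"
)
target = io.StringIO()

writer = csv.writer(target, lineterminator="\n")
for row in csv.reader(source):
    writer.writerow(pad_middle(row))

fixed_text = target.getvalue()
print(fixed_text)
```

With this variant every output row has five columns, so a later reader can rely on fixed positions.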





  • Of course, the loop inside the with … block can be reduced to output.write(''.join(fix(line) for line in input))

    – gboffi
    Nov 13 '18 at 22:37
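The shortened form from the comment can be tried without touching the disk. This is a minimal sketch, assuming a `fix` function that treats 4 fields as "no middle name" and using `io.StringIO` objects as stand-ins for the two files; the names `source`, `target`, and `result` are illustrative and not from the original.

```python
import io


def fix(line):
    items = line.split(',')
    if len(items) == 4:  # no middle name, nothing to do
        return line
    first_middle = ' '.join(items[:2])  # merge first and middle name
    return ','.join([first_middle] + items[2:])


# In-memory stand-ins for wrong.csv and fixed.csv (assumption for the demo).
source = io.StringIO("fred,johnson,23,maine\nwally,mark,david,44,florida\n")
target = io.StringIO()

# A single write call: join() collects the fixed lines into one string.
target.write(''.join(fix(line) for line in source))

result = target.getvalue()
print(result)
```

Note that this builds the whole output in memory before writing, which is fine for small files but less suitable for very large ones.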










answered Nov 13 '18 at 14:10 by gboffi
