text delimiter shifting values in dataframe





















I have a DataFrame like the data_df example below, which I create by reading data from a CSV file with the code below. The problem I'm running into is that some of the values in some columns are getting shifted to the right. For example, the second record's values are shifted one column to the right, starting at the name column. I think there may be a "\" in the name for that record that's causing the shift. Does anyone know how to fix this? Is there something I can do in read_csv that would address it?



Code:



import pandas as pd

data_df = pd.read_csv(filepath)

# Show every row and column of the first five records
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(data_df[:5])


Output:



   Unnamed: 0  call_history_id                            calllog_id
0       16358       1210746736  ca58d850-6fe6-4673-a049-ea4a2d8d7ecf
1       16361       1210976828  c005329b-955d-4d88-98a5-1c47e6a1cb80
2       16402       1217791595  050e9b83-54c2-4c87-abdd-32225c0d3189
3       16471       1228495414  45705ed1-a8e2-4a15-8941-5b0a40b7d409
4       27906       1245173592  04e56818-04a0-4704-ac86-31c31dac2370

        call_id  connection_id  pbx_name    pbx_id  extension_number
0  1.509170e+12   1.509170e+12  sales8x8  sales8x8               595
1  1.509170e+12   1.509170e+12  sales8x8  sales8x8               595
2  1.509170e+12   1.509170e+12  sales8x8  sales8x8               595
3  1.509170e+12   1.509170e+12  sales8x8  sales8x8               595
4  1.509170e+12   1.509170e+12  sales8x8  sales8x8               595

   extension_id   customer_id      address                 name
0           595  2.525100e+29  14086694428           Sun Basket
1           595  2.525100e+29  13214371589                PEREZ
2           595  2.525100e+29  14088566290          14088566290
3           595  2.525100e+29   8059316676              Dialing
4           595  2.525100e+29  12028071151  Implementation Team

   start_timestamp     direction  call_internal  call_missed  duration
0     1/8/18 19:49             I              0            0      4414
1            BRYAN  1/8/18 20:09              I            0         0
2     1/9/18 20:31             I              0            0     14766
3    1/11/18 17:16             I              0            0      1686
4    1/15/18 22:55             I              0            0      3491

   device_model   group_call  group_name  group_number           device_id
0   mediaserver            0           N             N  MasterSlaveService
1          8300  mediaserver           0             N                   N
2   mediaserver            0           N             N  MasterSlaveService
3   mediaserver            0           N             N  MasterSlaveService
4   mediaserver            0           N             N  MasterSlaveService

   history_event_state   created_time   updated_time    group_type
0                    A   1/8/18 19:49   1/8/18 19:49             N
1   MasterSlaveService              A   1/8/18 20:09  1/8/18 20:09
2                    A   1/9/18 20:31   1/9/18 20:31             N
3                    A  1/11/18 17:16  1/11/18 17:16             N
4                    A  1/15/18 22:55  1/15/18 22:55             N
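To check whether a backslash-escaped comma is what pushes row 1 to the right, you could count the fields on each line with Python's built-in csv module. This is a minimal sketch: the sample rows and the helper name mismatched_rows are hypothetical and only mimic the PEREZ/BRYAN record. Read without an escape character, the comma inside the name splits one field into two, and every value after it moves one column over.

```python
import csv
import io

# Hypothetical sample that mimics the problem row; the real file's
# columns and contents are assumptions here.
SAMPLE = (
    "call_id,name,duration\n"
    "1,Sun Basket,4414\n"
    "2,PEREZ\\,BRYAN,8300\n"
    "3,Dialing,1686\n"
)


def mismatched_rows(text, escapechar=None):
    """Return (line_number, field_count) for every row whose field count
    differs from the header's."""
    reader = csv.reader(io.StringIO(text), escapechar=escapechar)
    header = next(reader)
    return [
        (reader.line_num, len(row))
        for row in reader
        if len(row) != len(header)
    ]


# Without an escape character, the backslash-comma splits the name in two.
print(mismatched_rows(SAMPLE))                   # [(3, 4)]
# Treating backslash as the escape character keeps the name as one field.
print(mismatched_rows(SAMPLE, escapechar="\\"))  # []
```

Against the real file, you would pass open(filepath).read() in place of SAMPLE to list the offending line numbers.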






























  • The delimiter of your file is ',', yet some of your fields contain that character. That name is almost certainly written as 'PEREZ, BRYAN', which won't parse properly. If you have control over the file, you should choose a different delimiter.
    – ALollz
    Nov 10 at 23:18







  • @ALollz Thank you for getting back to me so quickly. The value looks like ",PEREZ\,BRYAN,". Is there a handy way to deal with the "\," when reading the data in with read_csv, or do I need to find/replace them all before reading them in?
    – user3476463
    Nov 11 at 2:52










  • @ALollz Thanks, your previous comment that just disappeared worked nicely: sep=r'(?<!\\),'
    – user3476463
    Nov 11 at 4:05










  • I think my suggestion wasn't nearly as correct as dmitriys's solution, which will solve a lot of the issues at once.
    – ALollz
    Nov 11 at 4:13
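The lookbehind separator from the comments can be sketched as follows. This assumes the embedded commas are always written as backslash-comma, and the three-column sample is hypothetical. A sep longer than one character is treated as a regular expression, which pandas supports only with engine="python". Unlike an escape character, the regex leaves the backslash in the parsed value, so this sketch strips it afterwards.

```python
import io

import pandas as pd

# Hypothetical three-column sample; the real file has many more columns.
RAW = (
    "call_id,name,duration\n"
    "1,Sun Basket,4414\n"
    "2,PEREZ\\,BRYAN,8300\n"
)

# Split only on commas that are NOT preceded by a backslash.
df = pd.read_csv(io.StringIO(RAW), sep=r"(?<!\\),", engine="python")

# The backslash survives the split, so remove it from the name column.
df["name"] = df["name"].str.replace("\\,", ",", regex=False)
print(df)
```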
python-3.x pandas csv
asked Nov 10 at 23:13









user3476463

1 Answer

















Accepted answer










The \ is an escape character. Since I take it the values in your file are not enclosed in quotes, the \ is placed before the comma so that PEREZ, BRYAN is treated as one value.



Try passing \ to the escapechar option of pd.read_csv; this should take care of it, e.g. pd.read_csv(filename, escapechar="\\"). (The backslash must be doubled inside a Python string literal; escapechar="\" is a syntax error.)
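Here is a minimal, self-contained check of the escapechar approach. It uses a hypothetical in-memory sample instead of the real file. The default C parser treats the character after the escape as literal text and drops the backslash, so, unlike the regex separator from the comments, no cleanup step is needed afterwards.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file.
RAW = (
    "call_id,name,duration\n"
    "1,Sun Basket,4414\n"
    "2,PEREZ\\,BRYAN,8300\n"
)

# "\\" is a single backslash character in Python source.
df = pd.read_csv(io.StringIO(RAW), escapechar="\\")
print(df)
```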






        answered Nov 11 at 3:52









        dmitriys
