Text delimiter shifting values in DataFrame


I have a dataframe like the data_df example below, which I create by reading in data from a CSV with the code below. The problem I'm running into is that some of the values in some of the columns are getting shifted to the right. For example, the second record's values are shifted one column to the right, starting with the name column. I think there may be a "\" in the name for that record that's causing the shift. Does anyone know how to fix this? Is there something I can do in read_csv that would address this?

Code:

import pandas as pd

data_df = pd.read_csv(filepath)

with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(data_df[:5])

Output:

   Unnamed: 0  call_history_id                            calllog_id
0       16358       1210746736  ca58d850-6fe6-4673-a049-ea4a2d8d7ecf
1       16361       1210976828  c005329b-955d-4d88-98a5-1c47e6a1cb80
2       16402       1217791595  050e9b83-54c2-4c87-abdd-32225c0d3189
3       16471       1228495414  45705ed1-a8e2-4a15-8941-5b0a40b7d409
4       27906       1245173592  04e56818-04a0-4704-ac86-31c31dac2370

        call_id  connection_id  pbx_name    pbx_id  extension_number
0  1.509170e+12   1.509170e+12  sales8x8  sales8x8               595
1  1.509170e+12   1.509170e+12  sales8x8  sales8x8               595
2  1.509170e+12   1.509170e+12  sales8x8  sales8x8               595
3  1.509170e+12   1.509170e+12  sales8x8  sales8x8               595
4  1.509170e+12   1.509170e+12  sales8x8  sales8x8               595

   extension_id   customer_id      address                 name
0           595  2.525100e+29  14086694428           Sun Basket
1           595  2.525100e+29  13214371589                PEREZ
2           595  2.525100e+29  14088566290          14088566290
3           595  2.525100e+29   8059316676              Dialing
4           595  2.525100e+29  12028071151  Implementation Team

   start_timestamp     direction  call_internal  call_missed  duration
0     1/8/18 19:49             I              0            0      4414
1            BRYAN  1/8/18 20:09              I            0         0
2     1/9/18 20:31             I              0            0     14766
3    1/11/18 17:16             I              0            0      1686
4    1/15/18 22:55             I              0            0      3491

   device_model   group_call  group_name  group_number           device_id
0   mediaserver            0           N             N  MasterSlaveService
1          8300  mediaserver           0             N                   N
2   mediaserver            0           N             N  MasterSlaveService
3   mediaserver            0           N             N  MasterSlaveService
4   mediaserver            0           N             N  MasterSlaveService

   history_event_state   created_time   updated_time    group_type
0                    A   1/8/18 19:49   1/8/18 19:49             N
1   MasterSlaveService              A   1/8/18 20:09   1/8/18 20:09
2                    A   1/9/18 20:31   1/9/18 20:31             N
3                    A  1/11/18 17:16  1/11/18 17:16             N
4                    A  1/15/18 22:55  1/15/18 22:55             N

asked Nov 10 at 23:13

user3476463

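One way to confirm which records are affected is to count the fields on each line with Python's csv module and compare the count against the header. This is a minimal sketch, assuming the file is comma-delimited. The three-column sample and the function name rows_with_wrong_field_count are hypothetical stand-ins for the real file.

```python
import csv
import io


def rows_with_wrong_field_count(lines, sep=","):
    """Return (line_number, field_count) for every row whose field count differs from the header's."""
    reader = csv.reader(lines, delimiter=sep)
    expected = len(next(reader))  # number of columns in the header
    bad = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the number of physical lines read so far
            bad.append((reader.line_num, len(row)))
    return bad


# Hypothetical sample: the second data row contains "PEREZ\, BRYAN"
sample = "id,name,city\n1,Sun Basket,SF\n2,PEREZ\\, BRYAN,LA\n"
print(rows_with_wrong_field_count(io.StringIO(sample)))  # [(3, 4)]
```

For the real file, pass open(filepath, newline="") instead of the StringIO. Without an escape character configured, the csv module splits "PEREZ\, BRYAN" at the comma, which is why that row reports one field too many.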

The delimiter of your file is ',', yet some of your fields contain that same character. That name is almost certainly written as 'PEREZ, BRYAN', which won't parse properly. If you have control over the file, you should choose a different delimiter.
– ALollz
Nov 10 at 23:18
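If you do control how the file is written, here is a sketch of the two usual options, assuming the file is produced with pandas: keep commas and let to_csv quote the affected fields, or switch to a delimiter that never appears in the data. The one-row frame is made up, and "|" is only an example delimiter.

```python
import io

import pandas as pd

# Hypothetical frame containing a comma inside a value
df = pd.DataFrame({"address": ["13214371589"], "name": ["PEREZ, BRYAN"], "direction": ["I"]})

# Option 1: keep "," and let pandas quote. to_csv's default quoting
# (csv.QUOTE_MINIMAL) wraps any field that contains the delimiter in quotes.
quoted = df.to_csv(index=False)

# Option 2: use a delimiter that does not occur in the data, e.g. "|".
piped = df.to_csv(index=False, sep="|")

# Both versions read back with the name intact; dtype=str keeps address as text.
back_quoted = pd.read_csv(io.StringIO(quoted), dtype=str)
back_piped = pd.read_csv(io.StringIO(piped), sep="|", dtype=str)
print(back_quoted)
```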


@ALollz Thank you for getting back to me so quickly. The value looks like ",PEREZ\,BRYAN,". Is there a handy way to deal with the "\," when reading the data in with read_csv, or do I need to find/replace them all before reading them in?
– user3476463
Nov 11 at 2:52
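On the find/replace question: rather than editing the escapes by hand, a one-time pass with Python's csv module can rewrite them as standard quoted fields that any CSV reader understands. This is a minimal sketch, assuming the backslash is the only escape character in the file. The function reescape_to_quoted and the sample text are hypothetical.

```python
import csv
import io

import pandas as pd


def reescape_to_quoted(src, dst):
    """Copy CSV rows from src to dst, turning backslash-escaped commas into quoted fields."""
    reader = csv.reader(src, escapechar="\\")  # "\," is read as a literal comma
    writer = csv.writer(dst)  # QUOTE_MINIMAL: fields containing "," get wrapped in quotes
    for row in reader:
        writer.writerow(row)


raw = "id,name,city\n1,Sun Basket,SF\n2,PEREZ\\, BRYAN,LA\n"
out = io.StringIO()
reescape_to_quoted(io.StringIO(raw), out)
cleaned = out.getvalue()

# The rewritten text parses with plain read_csv, no extra options needed.
df = pd.read_csv(io.StringIO(cleaned))
print(df)
```

For real files, open both ends with newline="" as the csv module documentation recommends.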

@ALollz Thanks, the suggestion in your previous comment (which has since disappeared) worked nicely: sep=r'(?<!\\),'
– user3476463
Nov 11 at 4:05
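The lookbehind separator from this comment can be sketched as follows, on a made-up three-column sample. A regular-expression separator is only supported by the Python parsing engine. Because the lookbehind merely prevents the split, the backslash stays in the value, so the final str.replace is an optional cleanup.

```python
import io

import pandas as pd

# Hypothetical sample: the comma inside the name is preceded by a backslash
raw = "id,name,city\n1,Sun Basket,SF\n2,PEREZ\\, BRYAN,LA\n"

# Split only on commas NOT preceded by a backslash; regex separators
# are handled by the Python engine, not the default C engine.
df = pd.read_csv(io.StringIO(raw), sep=r"(?<!\\),", engine="python")

# The lookbehind keeps the backslash in the value, so strip it afterwards.
df["name"] = df["name"].str.replace("\\,", ",", regex=False)
print(df)
```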

I think my suggestion isn't nearly as good as the solution from dmitriys, which will solve a lot of the issues at once.
– ALollz
Nov 11 at 4:13


python-3.x pandas csv


1 Answer

Accepted answer:

The \ is an escape character. Since I take it the values in your file are not enclosed in quotes, the \ is placed before the comma so that PEREZ, BRYAN is treated as one value.

Try passing a backslash to the escapechar option of pd.read_csv and this should take care of it, e.g. pd.read_csv(filename, escapechar="\\"). Note that the backslash has to be doubled inside a regular Python string literal.

answered Nov 11 at 3:52

dmitriys

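A minimal, self-contained check of the escapechar approach, with a hypothetical three-column sample standing in for the real file. With escapechar set, the parser treats the character after a backslash as literal text and drops the backslash itself, so the name comes through as a single field.

```python
import io

import pandas as pd

# Hypothetical sample: the comma inside the name is escaped with a backslash
raw = "id,name,city\n1,Sun Basket,SF\n2,PEREZ\\, BRYAN,LA\n"

# "\\" in the source code is a single backslash character at runtime.
df = pd.read_csv(io.StringIO(raw), escapechar="\\")
print(df)
```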
