text delimiter shifting values in dataframe
I have a dataframe like the data_df example below, which I create by reading data from a CSV with the code below. The problem I'm running into is that some of the values in some of the columns are getting shifted to the right. For example, the values in the second record are shifted one column to the right, starting with the name column. I think there may be a "\" in the name for that record that's causing the shift. Does anyone know how to fix this? Is there something I can do in read_csv that would address it?
Code:
import pandas as pd
data_df = pd.read_csv(filepath)
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(data_df[:5])
Output:
Unnamed: 0 call_history_id calllog_id
0 16358 1210746736 ca58d850-6fe6-4673-a049-ea4a2d8d7ecf
1 16361 1210976828 c005329b-955d-4d88-98a5-1c47e6a1cb80
2 16402 1217791595 050e9b83-54c2-4c87-abdd-32225c0d3189
3 16471 1228495414 45705ed1-a8e2-4a15-8941-5b0a40b7d409
4 27906 1245173592 04e56818-04a0-4704-ac86-31c31dac2370
call_id connection_id pbx_name pbx_id extension_number
0 1.509170e+12 1.509170e+12 sales8x8 sales8x8 595
1 1.509170e+12 1.509170e+12 sales8x8 sales8x8 595
2 1.509170e+12 1.509170e+12 sales8x8 sales8x8 595
3 1.509170e+12 1.509170e+12 sales8x8 sales8x8 595
4 1.509170e+12 1.509170e+12 sales8x8 sales8x8 595
extension_id customer_id address name
0 595 2.525100e+29 14086694428 Sun Basket
1 595 2.525100e+29 13214371589 PEREZ
2 595 2.525100e+29 14088566290 14088566290
3 595 2.525100e+29 8059316676 Dialing
4 595 2.525100e+29 12028071151 Implementation Team
start_timestamp direction call_internal call_missed duration
0 1/8/18 19:49 I 0 0 4414
1 BRYAN 1/8/18 20:09 I 0 0
2 1/9/18 20:31 I 0 0 14766
3 1/11/18 17:16 I 0 0 1686
4 1/15/18 22:55 I 0 0 3491
device_model group_call group_name group_number device_id
0 mediaserver 0 N N MasterSlaveService
1 8300 mediaserver 0 N N
2 mediaserver 0 N N MasterSlaveService
3 mediaserver 0 N N MasterSlaveService
4 mediaserver 0 N N MasterSlaveService
history_event_state created_time updated_time group_type
0 A 1/8/18 19:49 1/8/18 19:49 N
1 MasterSlaveService A 1/8/18 20:09 1/8/18 20:09
2 A 1/9/18 20:31 1/9/18 20:31 N
3 A 1/11/18 17:16 1/11/18 17:16 N
4 A 1/15/18 22:55 1/15/18 22:55 N
python-3.x pandas csv
The delimiter of your file is ',' yet some of your fields use this special character. That name is almost certainly written as 'PEREZ, BRYAN', which won't parse properly. If you have control over the file, you should choose a different delimiter.
– ALollz
Nov 10 at 23:18
@ALollz Thank you for getting back to me so quickly. The value looks like ",PEREZ\,BRYAN,". Is there a handy way to deal with the "\," when reading the data in with read_csv? Or do I need to find and replace them all before reading the file in?
– user3476463
Nov 11 at 2:52
@ALollz Thanks, your previous comment that just disappeared worked nicely: sep=r'(?<!\\),'
– user3476463
Nov 11 at 4:05
I think my suggestion wasn't nearly as correct as dmitriys' solution. That will solve a lot of the issues at once.
– ALollz
Nov 11 at 4:13
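For reference, the regex-separator approach mentioned in the comments above might look roughly like the sketch below. It assumes the separator was a negative lookbehind that skips commas preceded by a backslash, and it uses a small made-up three-column sample rather than the real file. A regex separator requires the Python parsing engine, and the backslash stays in the value, so it is stripped afterwards.

```python
import io
import pandas as pd

# Hypothetical sample: only three of the real columns, with one escaped comma.
raw = (
    "address,name,start_timestamp\n"
    "14086694428,Sun Basket,1/8/18 19:49\n"
    "13214371589,PEREZ\\,BRYAN,1/8/18 20:09\n"
)

# Split only on commas that are NOT preceded by a backslash.
# Multi-character (regex) separators are handled by the Python engine.
df_regex = pd.read_csv(io.StringIO(raw), sep=r"(?<!\\),", engine="python")

# The regex split leaves the escape character in place, so remove it by hand.
df_regex["name"] = df_regex["name"].str.replace("\\,", ",", regex=False)

print(df_regex)
```

This keeps the columns aligned, but it needs the extra cleanup step and the slower Python engine, which is part of why the escapechar option is usually the simpler choice.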
asked Nov 10 at 23:13
user3476463
1 Answer
accepted
The \ is an escape character. Since I take it the values in your file are not enclosed in quotes, the \ is placed before the comma so that PEREZ, BRYAN is treated as one value.
Try passing \ to the escapechar option of pd.read_csv and this should take care of it, e.g. pd.read_csv(filename, escapechar="\\"). Note that the backslash has to be doubled inside a regular Python string literal.
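A minimal sketch of the escapechar suggestion, assuming the file escapes embedded commas as "\," the way the asker describes. The three-column sample is made up for illustration and is not the real file.

```python
import io
import pandas as pd

# Hypothetical sample: only three of the real columns, with one escaped comma.
raw = (
    "address,name,start_timestamp\n"
    "14086694428,Sun Basket,1/8/18 19:49\n"
    "13214371589,PEREZ\\,BRYAN,1/8/18 20:09\n"
)

# escapechar="\\" tells the parser that a backslash makes the next
# character literal, so "\," is data rather than a field separator.
df_escaped = pd.read_csv(io.StringIO(raw), escapechar="\\")

print(df_escaped)
```

With the escape character declared, the second record keeps its three fields and the name comes through with the comma intact and the backslash removed.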
answered Nov 11 at 3:52
dmitriys