How to change index dtype of pandas DataFrame to int32?

The default dtype of a DataFrame index is int64, and I would like to change it to int32.

I tried pd.DataFrame.set_index with a NumPy int32 array, and I also tried constructing a new index with dtype=np.int32. Neither worked: the result was always an int64 index.

Can someone show working code that produces a pandas index with int32 dtype?

I'm using pandas v0.20.1 from conda.

edited Oct 4 '18 at 18:58

jpp

95.1k2156108

asked May 20 '17 at 21:24

Stanpol

336517

It doesn't seem to be possible... I could be wrong, but I couldn't find a way yet. pd.Index(np.arange(10, dtype=np.int32), dtype=np.int32) returns Int64Index([...], dtype='int64')

– MaxU
May 20 '17 at 21:38

Well, I did the same and couldn't figure it out. Now I'm looking through the source code here github.com/pandas-dev/pandas/tree/… but I can't see where this change happens.

– Stanpol
May 20 '17 at 21:41

I could only find support for np.int64, np.uint64 and np.float64 in "numeric" indices.

– MaxU
May 20 '17 at 21:47

Is the goal of using int32 to save memory? Are the values in the index consecutive, or regularly spaced? If so, then a RangeIndex might suffice. It is a memory-saving special case of Int64Index. It saves memory by merely storing the start, stop and step values without enumerating all the values in the range.

– unutbu
May 21 '17 at 0:04

@unutbu yep, that's the correct answer. That's exactly what I learned from the core contributor github.com/pandas-dev/pandas/issues/16404

– Stanpol
May 21 '17 at 0:19
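On current pandas the situation described above no longer applies: pandas 2.0 removed Int64Index and lets an Index hold any NumPy numeric dtype. The sketch below assumes pandas >= 2.0; the DataFrame contents are made up for illustration. On 0.x/1.x the same calls still run but produce an int64 index, which is why the version is recorded.

```python
import numpy as np
import pandas as pd

# Assumption: pandas >= 2.0, where Index keeps any NumPy numeric dtype.
# On pandas 0.x/1.x these lines still run, but the index is upcast to int64.
PANDAS_MAJOR = int(pd.__version__.split(".")[0])

df = pd.DataFrame({"value": [1.5, 2.5, 3.5]})  # illustrative data

# Option 1: cast the default RangeIndex to an int32 index.
df.index = df.index.astype(np.int32)

# Option 2: set_index with an int32 NumPy array, as tried in the question.
df2 = df.set_index(np.array([30, 10, 20], dtype=np.int32))

print(PANDAS_MAJOR, df.index.dtype, df2.index.dtype)
```

Option 1 keeps the existing labels; Option 2 replaces them with new int32 labels.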

python pandas numpy indexing

3 Answers

Not sure this is something worth doing in practice, but the following should work:

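import numpy as np
import pandas as pd

# Written for old pandas (0.2x); pd.Int64Index no longer exists in pandas 2.0+.
class Int32Index(pd.Int64Index):
    # NumericIndex.__new__ casts input to _default_dtype; int32 here keeps int32 data unchanged.
    _default_dtype = np.int32

    @property
    def asi8(self):
        # Return the int32 values directly (fixes the sort_values issue in the comments).
        return self.values

i = Int32Index(np.array([1, 2, 3], dtype='int32'))  # example values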

(from here)

edited Jan 11 '18 at 8:14

answered May 22 '17 at 10:54

Pietro Battiston

3,97812231

In pandas 0.22.0 this doesn't work as expected. i.sort_values will cut the index in (exactly) half. No idea why. i = np.arange(0, 600002, dtype=np.int32); arr = Int32Index(i, name="i"); arr2 = arr.sort_values(); print arr.shape, arr2.shape; assert arr.shape == arr2.shape

– user48956
Jan 9 '18 at 23:07

@user48956: edited to fix this specific problem

– Pietro Battiston
Jan 11 '18 at 8:15
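The exact halving user48956 saw is consistent with the inherited asi8 reinterpreting the int32 buffer as int64, so that each pair of 4-byte values becomes one 8-byte value. The NumPy-only sketch below assumes that mechanism (an inherited asi8 returning self.values.view('i8')); it is a guess at the cause, not something confirmed in this thread, and both function names are hypothetical.

```python
import numpy as np

def broken_asi8(values):
    # Hypothetical pre-fix behaviour: reinterpret the raw int32 bytes as int64.
    # Two 4-byte items fit in one 8-byte item, so the length halves.
    return values.view("i8")

def fixed_asi8(values):
    # The answer's override: return the int32 values unchanged.
    return values

values = np.arange(0, 600002, dtype=np.int32)  # same size as in the comment
print(values.shape, broken_asi8(values).shape, fixed_asi8(values).shape)
```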

All of the code paths I could find coerce the dtype.

Check in pandas.Index.__new__():

if issubclass(data.dtype.type, np.integer):
    from .numeric import Int64Index
    return Int64Index(data, copy=copy, dtype=dtype, name=name)

This allows passing a dtype, but in NumericIndex.__new__() we have:

if copy or not is_dtype_equal(data.dtype, cls._default_dtype):
    subarr = np.array(data, dtype=cls._default_dtype, copy=copy)

That casts the data to cls._default_dtype (np.int64 for Int64Index); the dtype argument is not consulted in this check.
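To see why the requested dtype is lost, and why overriding _default_dtype in a subclass gets around it, here is a toy model of the quoted check. ToyNumericIndex and ToyInt32Index are hypothetical stand-ins written only to mirror the two snippets above; they are not pandas classes.

```python
import numpy as np

class ToyNumericIndex:
    """Hypothetical stand-in mimicking the quoted NumericIndex.__new__ check."""
    _default_dtype = np.int64

    def __init__(self, data, dtype=None, copy=False):
        data = np.asarray(data)
        # `dtype` is accepted but, as in the quoted snippet, never consulted:
        # anything not already in the class default dtype is cast to it.
        if copy or data.dtype != self._default_dtype:
            data = data.astype(self._default_dtype)  # astype returns a copy
        self.values = data


class ToyInt32Index(ToyNumericIndex):
    # Overriding the class attribute lets int32 input pass the check untouched.
    _default_dtype = np.int32


raw = np.arange(5, dtype=np.int32)
print(ToyNumericIndex(raw, dtype=np.int32).values.dtype)  # int64
print(ToyInt32Index(raw).values.dtype)                    # int32
```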

answered May 20 '17 at 21:49

Stephen Rauch

28.3k153356

"Can someone show a working code to produce pandas index with int32 size?"

@PietroBattiston's answer may work, but it's worth explaining why you ordinarily should not want to replace the default RangeIndex with an Int64 / Int32 index.

Storing the definition of a range (start, stop, step) takes less memory than storing every integer in it. Compare, for instance, Python's built-in range with NumPy's np.arange. As described in the pd.RangeIndex docs:

"RangeIndex is a memory-saving special case of Int64Index limited to representing monotonic ranges. Using RangeIndex may in some instances improve computing speed."
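A quick way to check the memory claim is to compare nbytes for a RangeIndex and for materialised integer indexes of the same length. This is a sketch; the length n is arbitrary, and the int32 index only keeps int32 on pandas >= 2.0 (older versions upcast it to int64, so it then costs the same as the int64 one).

```python
import numpy as np
import pandas as pd

n = 1_000_000  # arbitrary length for the comparison

range_idx = pd.RangeIndex(n)                        # stores only start/stop/step
int64_idx = pd.Index(np.arange(n, dtype=np.int64))  # stores every value, 8 bytes each
int32_idx = pd.Index(np.arange(n, dtype=np.int32))  # 4 bytes each, on pandas >= 2.0 only

for name, idx in [("RangeIndex", range_idx), ("int64", int64_idx), ("int32", int32_idx)]:
    print(f"{name:>10}: dtype={idx.dtype}, nbytes={idx.nbytes:,}")
```

Even a halved int32 index still scales with n, while the RangeIndex stays a small constant size.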

answered Oct 4 '18 at 18:21

jpp

95.1k2156108
