How to change index dtype of pandas DataFrame to int32?
The default dtype of a DataFrame index is int64, and I would like to change it to int32.
I tried changing it with pd.DataFrame.set_index and a NumPy array of int32, and also tried making a new index with dtype=np.int32. Neither worked; both always return an index of int64.
Can someone show working code to produce a pandas index with int32 size?
I use conda pandas v0.20.1.
python pandas numpy indexing
It doesn't seem to be possible... I could be wrong, but I couldn't find a way yet. pd.Index(np.arange(10, dtype=np.int32), dtype=np.int32) returns Int64Index([...], dtype='int64'). – MaxU, May 20 '17 at 21:38 (1 upvote)
Well, I did the same and couldn't figure it out. Now trying to look through the source code here github.com/pandas-dev/pandas/tree/… but I don't see where this change happens. – Stanpol, May 20 '17 at 21:41
I could find support only for np.int64, np.uint64 and np.float64 for "numeric" indices. – MaxU, May 20 '17 at 21:47
Is the goal of using int32 to save memory? Are the values in the index consecutive, or regularly spaced? If so, then a RangeIndex might suffice. It is a memory-saving special case of Int64Index. It saves memory by merely storing the start, stop and step values without enumerating all the values in the range. – unutbu, May 21 '17 at 0:04 (2 upvotes)
@unutbu yep, that's the correct answer. That's exactly what I learned from the core contributor github.com/pandas-dev/pandas/issues/16404 – Stanpol, May 21 '17 at 0:19 (1 upvote)
edited Oct 4 '18 at 18:58 – jpp
asked May 20 '17 at 21:24 – Stanpol
3 Answers
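Not sure this is something worth doing in practice, but the following should work:

    import numpy as np
    import pandas as pd

    class Int32Index(pd.Int64Index):
        # Override the dtype that the numeric index coerces its data to.
        _default_dtype = np.int32

        @property
        def asi8(self):
            # Return the underlying values as they are (int32).
            return self.values

    # [...] is a placeholder for the actual index values.
    i = Int32Index(np.array([...], dtype='int32'))

(from here)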
In pandas 0.22.0 this doesn't work as expected. i.sort_values will cut the index in (exactly) half. No idea why. i = np.arange(0, 600002, dtype=np.int32); arr = Int32Index(i, name="i"); arr2 = arr.sort_values(); print(arr.shape, arr2.shape); assert arr.shape == arr2.shape – user48956, Jan 9 '18 at 23:07 (1 upvote)
@user48956: edited so as to fix this specific problem – Pietro Battiston, Jan 11 '18 at 8:15
All of the code paths I could find coerce the dtype.
Check in pandas.Index.__new__():

    if issubclass(data.dtype.type, np.integer):
        from .numeric import Int64Index
        return Int64Index(data, copy=copy, dtype=dtype, name=name)

This allows passing a dtype, but in NumericIndex.__new__() we have:

    if copy or not is_dtype_equal(data.dtype, cls._default_dtype):
        subarr = np.array(data, dtype=cls._default_dtype, copy=copy)

which changes the dtype.
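The effect of the two excerpts can be mimicked without pandas. What follows is a toy model only: ToyNumericIndex, ToyInt32Index and toy_index are hypothetical names, and array.array typecodes ('q' for a 64-bit integer, 'i' for what is a 32-bit integer on most platforms) stand in for NumPy dtypes. It sketches the described logic and is not the pandas implementation.

```python
from array import array


class ToyNumericIndex:
    """Toy stand-in for a numeric index class (hypothetical, not pandas code)."""

    _default_typecode = "q"  # signed 64-bit integer, playing the role of int64

    def __init__(self, data):
        # Like the second excerpt: input whose type differs from the class
        # default is rebuilt using the default type.
        if not isinstance(data, array) or data.typecode != self._default_typecode:
            data = array(self._default_typecode, list(data))
        self.values = data


class ToyInt32Index(ToyNumericIndex):
    # Overriding the class default is what keeps the narrower type.
    _default_typecode = "i"


def toy_index(data):
    # Like the first excerpt: integer input is always routed to the 64-bit class.
    return ToyNumericIndex(data)


data32 = array("i", range(10))
coerced = toy_index(data32)   # goes through the generic path
kept = ToyInt32Index(data32)  # uses the overridden default

print(coerced.values.typecode)  # typecode after the generic path
print(kept.values.typecode)     # typecode with the overridden default
```

In this model the only way to keep the narrower type is to change the class-level default, which is the same lever the Int32Index subclass answer pulls through _default_dtype.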
"Can someone show a working code to produce pandas index with int32 size?"
@PietroBattiston's answer may work. But it's worth explaining why you should ordinarily not want to replace the default RangeIndex with an Int64 / Int32 index.
Storing the logic behind a range of values takes less memory than storing each integer in the range. This should be clear when you compare, for instance, Python's built-in range with NumPy's np.arange. As described in the pd.RangeIndex docs:
"RangeIndex is a memory-saving special case of Int64Index limited to representing monotonic ranges. Using RangeIndex may in some instances improve computing speed."
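The range versus np.arange comparison can be approximated with the standard library alone. In this sketch array.array is swapped in for a NumPy array (an assumption made to avoid third-party imports), with typecode 'q' for 8-byte integers and 'i' for integers that are 4 bytes on most platforms. The array byte counts are computed from the element size and ignore per-object overhead.

```python
import sys
from array import array

n = 1_000_000

as_range = range(n)               # stores only start, stop and step
as_int64 = array("q", range(n))   # one 8-byte integer per value
as_int32 = array("i", range(n))   # one integer per value, 4 bytes on most platforms

range_bytes = sys.getsizeof(as_range)
int64_bytes = as_int64.itemsize * len(as_int64)
int32_bytes = as_int32.itemsize * len(as_int32)

print(f"range      : {range_bytes} bytes")
print(f"64-bit ints: {int64_bytes} bytes")
print(f"32-bit ints: {int32_bytes} bytes")
```

The size of the range object does not grow with n, whereas both arrays grow linearly. Narrowing the element type only shrinks the constant factor, which is why a range-style index is the larger saving whenever the values are regularly spaced.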
answered May 22 '17 at 10:54, edited Jan 11 '18 at 8:14 – Pietro Battiston (author of the Int32Index subclass answer)
answered May 20 '17 at 21:49 – Stephen Rauch (author of the answer tracing the dtype coercion in pandas.Index.__new__())
answered Oct 4 '18 at 18:21 – jpp (author of the RangeIndex answer)