How to change index dtype of pandas DataFrame to int32?

The default dtype of a DataFrame index is int64, and I would like to change it to int32.



I tried changing it with pd.DataFrame.set_index and a NumPy array of int32, and I also tried creating a new index with dtype=np.int32. Neither worked: the result is always an int64 index.



Can someone show working code that produces a pandas index with int32 dtype?



I am using pandas v0.20.1, installed with conda.
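
A minimal reproduction of the behaviour described above, as a sketch. The outcome depends on the installed pandas version: releases before 2.0 (including v0.20.1) coerce integer indexes to int64, while pandas 2.0 and later can hold an int32 index directly. The version check and the throwaway column "x" are illustrative assumptions, not part of the original question.

```python
import numpy as np
import pandas as pd

values = np.arange(10, dtype=np.int32)

# Build an Index and explicitly request int32
idx = pd.Index(values, dtype=np.int32)

# Use that Index as a DataFrame's index
df = pd.DataFrame({"x": np.zeros(10)})
df.index = idx

major = int(pd.__version__.split(".")[0])
# pandas < 2.0 reports int64 here; pandas >= 2.0 keeps int32
print(f"pandas {pd.__version__}: pd.Index -> {idx.dtype}, df.index -> {df.index.dtype}")
```
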







python pandas numpy indexing
edited Oct 4 '18 at 18:58 by jpp

asked May 20 '17 at 21:24 by Stanpol







  • It doesn't seem to be possible... I could be wrong, but I couldn't find a way yet. pd.Index(np.arange(10, dtype=np.int32), dtype=np.int32) returns Int64Index([...], dtype='int64')

    – MaxU
    May 20 '17 at 21:38











  • Well, I did the same and couldn't figure it out. Now I'm looking through the source code here github.com/pandas-dev/pandas/tree/…, but I don't see where this change happens.

    – Stanpol
    May 20 '17 at 21:41











  • I could only find support for np.int64, np.uint64 and np.float64 in "numeric" indices.

    – MaxU
    May 20 '17 at 21:47







  • Is the goal of using int32 to save memory? Are the values in the index consecutive, or regularly spaced? If so, a RangeIndex might suffice. It is a memory-saving special case of Int64Index that stores only the start, stop and step values instead of enumerating every value in the range.

    – unutbu
    May 21 '17 at 0:04







  • @unutbu yep, that's the correct answer. That's exactly what I learned from a core contributor: github.com/pandas-dev/pandas/issues/16404

    – Stanpol
    May 21 '17 at 0:19
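
A sketch of unutbu's suggestion, assuming the index labels are regularly spaced (here 0, 2, 4, ...; the spacing, the size n and the column "x" are illustrative). A pd.RangeIndex describes the labels by start, stop and step, so its footprint stays constant however long the frame is, whereas an explicit integer index stores every label.

```python
import numpy as np
import pandas as pd

n = 1_000_000

# Regularly spaced labels 0, 2, 4, ..., 2*(n-1), described by start/stop/step only
range_idx = pd.RangeIndex(start=0, stop=2 * n, step=2)

# The same labels materialised as an explicit int64 array (8 bytes per label)
explicit_idx = pd.Index(np.arange(0, 2 * n, 2, dtype=np.int64))

# Passing an Index object keeps it as-is, so the frame uses the RangeIndex
df = pd.DataFrame({"x": np.zeros(n, dtype=np.float32)}, index=range_idx)

print(type(df.index).__name__, range_idx.nbytes, explicit_idx.nbytes)
```

The two indexes hold the same labels; only the storage differs.
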
3 Answers
Not sure this is something worth doing in practice, but the following should work on the pandas versions of that time (pd.Int64Index no longer exists in pandas 2.0 and later):



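    import numpy as np
    import pandas as pd

    class Int32Index(pd.Int64Index):
        # NumericIndex.__new__ coerces data to _default_dtype, so override it
        _default_dtype = np.int32

        @property
        def asi8(self):
            # The inherited asi8 views the buffer as int64 ('i8'), which would
            # halve the length of an int32 array and break e.g. sort_values
            return self.values

    i = Int32Index(np.arange(10, dtype=np.int32))  # example values; use your own int32 array

(from here)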
edited Jan 11 '18 at 8:14

answered May 22 '17 at 10:54 by Pietro Battiston







      • In pandas 0.22.0 this doesn't work as expected: i.sort_values() cuts the index (exactly) in half. No idea why. i = np.arange(0, 600002, dtype=np.int32); arr = Int32Index(i, name="i"); arr2 = arr.sort_values(); print arr.shape, arr2.shape; assert arr.shape == arr2.shape

        – user48956
        Jan 9 '18 at 23:07











      • @user48956: edited to fix this specific problem.

        – Pietro Battiston
        Jan 11 '18 at 8:15
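
On pandas 2.0 and later the subclass above is unnecessary (and cannot be defined, since pd.Int64Index was removed), because a plain Index can hold NumPy integer dtypes such as int32. A hedged sketch of that route, assuming pandas >= 2.0; the conversion is guarded by a version check because older releases are not expected to keep int32. The column "x" and the frame size are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(1_000, dtype=np.float64)})
before = df.index.dtype  # default RangeIndex, always int64

major = int(pd.__version__.split(".")[0])
if major >= 2:
    # pandas >= 2.0: astype produces a regular Index backed by int32
    df.index = df.index.astype(np.int32)

print(before, df.index.dtype, type(df.index).__name__)
```

This halves the per-label cost compared with an explicit int64 index, but as the comments on the question note, keeping the default RangeIndex is usually cheaper still.
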
All of the code paths I could find coerce the dtype.



In pandas.Index.__new__(), integer data is routed to Int64Index:



    if issubclass(data.dtype.type, np.integer):
        from .numeric import Int64Index
        return Int64Index(data, copy=copy, dtype=dtype, name=name)


This allows passing a dtype, but NumericIndex.__new__() contains:



    if copy or not is_dtype_equal(data.dtype, cls._default_dtype):
        subarr = np.array(data, dtype=cls._default_dtype, copy=copy)


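This converts the data to cls._default_dtype (np.int64 for Int64Index) whenever the dtypes differ, so the requested int32 dtype is discarded.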
answered May 20 '17 at 21:49 by Stephen Rauch
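
To make the mechanism concrete, here is a hypothetical plain-NumPy re-creation of the quoted branch. The function name and DEFAULT_DTYPE constant are illustrative stand-ins, not pandas APIs; the point is that a dtype argument is accepted but never consulted, so int32 input comes back as int64.

```python
import numpy as np

DEFAULT_DTYPE = np.dtype(np.int64)  # plays the role of Int64Index._default_dtype


def coerce_like_old_numeric_index(data, dtype=None, copy=False):
    """Hypothetical mirror of the quoted NumericIndex.__new__ branch.

    ``dtype`` is accepted, as in pandas, but never consulted.
    """
    data = np.asarray(data)
    if copy or data.dtype != DEFAULT_DTYPE:
        # The original calls np.array(data, dtype=cls._default_dtype, copy=copy);
        # astype likewise returns a converted copy
        data = data.astype(DEFAULT_DTYPE)
    return data


requested = np.arange(5, dtype=np.int32)
result = coerce_like_old_numeric_index(requested, dtype=np.int32)
print(requested.dtype, "->", result.dtype)
```
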
    Can someone show a working code to produce pandas index with int32 size?




@PietroBattiston's answer may work, but it's worth explaining why you ordinarily should not replace the default RangeIndex with an Int64 / Int32 index.



Storing the rule that generates a range of values (start, stop and step) takes less memory than storing every integer in the range. This should be clear if you compare, for instance, Python's built-in range with NumPy's np.arange. As described in the pd.RangeIndex docs:




    RangeIndex is a memory-saving special case of Int64Index limited
    to representing monotonic ranges. Using RangeIndex may in some
    instances improve computing speed.
answered Oct 4 '18 at 18:21 by jpp
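
A quick way to see the range versus np.arange difference described above, as a sketch. The size n is arbitrary, and the exact byte count of the range object depends on the Python build, so only the orders of magnitude matter.

```python
import sys

import numpy as np

n = 1_000_000

lazy = range(n)                       # stores only start, stop and step
eager = np.arange(n, dtype=np.int64)  # stores every value, 8 bytes each

print("range object:", sys.getsizeof(lazy), "bytes")
print("np.arange   :", eager.nbytes, "bytes")
```
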