Collating multiple rows of a column in a pandas DataFrame into one row while maintaining the data type of the column





















I have a pandas DataFrame with a few columns, like this:



username A time place
AAA B 1 YYY
AAA C 2 YYY
AAA D 1 YYY
AAA B 3 ZZZ
AAA C 4 ZZZ
AAA B 3 ZZZ
BBB B 1 YYY
BBB C 2 YYY
BBB D 1 YYY
BBB B 7 ZZZ
BBB C 8 ZZZ
BBB B 9 ZZZ
CCC B 6 YYY
CCC C 5 YYY
CCC D 8 YYY
CCC B 7 ZZZ
CCC C 8 ZZZ
CCC B 9 ZZZ


In the above DataFrame, all the columns except time are strings; time is a float column.
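For reproducibility, the table above could be built roughly as follows. This is only a sketch: the variable name `df` and the explicit float cast are assumptions based on the description, not code from the original post.

```python
import pandas as pd

# Rebuild the sample table: three users, six rows each.
df = pd.DataFrame({
    "username": ["AAA"] * 6 + ["BBB"] * 6 + ["CCC"] * 6,
    "A": ["B", "C", "D", "B", "C", "B"] * 3,
    "time": [1, 2, 1, 3, 4, 3,
             1, 2, 1, 7, 8, 9,
             6, 5, 8, 7, 8, 9],
    "place": (["YYY"] * 3 + ["ZZZ"] * 3) * 3,
})

# The post states that time is a float column, so cast it explicitly.
df["time"] = df["time"].astype(float)

print(df.dtypes)
```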



For every username, I want all of that username's rows collated into one row. The output DataFrame should look like this:



username A time place
AAA B+C+D+B+C+B 1+2+1+3+4+3 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
BBB B+C+D+B+C+B 1+2+1+7+8+9 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
CCC B+C+D+B+C+B 6+5+8+7+8+9 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ


I am using '+' as the separator, but it can be any character commonly used as a separator (such as , or /).



I have been able to do that for all the columns using



df.groupby('username')['A'].apply('+'.join).reset_index()


and the same for the other columns. Finally, I merge the individual DataFrames to get the form I want.



I can do this for the time column as well, but I would like the resulting column to be of type float, and I am having difficulty with that. Hoping somebody more knowledgeable can guide me here.



I have even tried changing the output column after the fact with
df['time'].astype(float)



but I am getting all NaNs.










































      python pandas
















      asked Nov 10 at 21:08









      Acinonyx

      327


























          1 Answer



          accepted










          I believe you need to convert all columns to strings and then aggregate with agg:



          df = df.astype(str).groupby('username', as_index=False).agg('+'.join)
          print (df)
          username A time place
          0 AAA B+C+D+B+C+B 1.0+2.0+1.0+3.0+4.0+3.0 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
          1 BBB B+C+D+B+C+B 1.0+2.0+1.0+7.0+8.0+9.0 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
          2 CCC B+C+D+B+C+B 6.0+5.0+8.0+7.0+8.0+9.0 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
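A self-contained sketch of the approach above, under the assumption that the sample data is rebuilt as shown here (the names `joined` and `parsed` are illustrative only). It also shows one possible way to recover the floats afterwards: a '+'-joined string is not a single number, so the column itself cannot have a float dtype, but each cell can be split back into a list of floats.

```python
import pandas as pd

# Sample data matching the question (time is a float column).
df = pd.DataFrame({
    "username": ["AAA"] * 6 + ["BBB"] * 6 + ["CCC"] * 6,
    "A": ["B", "C", "D", "B", "C", "B"] * 3,
    "time": [1.0, 2.0, 1.0, 3.0, 4.0, 3.0,
             1.0, 2.0, 1.0, 7.0, 8.0, 9.0,
             6.0, 5.0, 8.0, 7.0, 8.0, 9.0],
    "place": (["YYY"] * 3 + ["ZZZ"] * 3) * 3,
})

# Convert everything to strings, then join each column per username.
joined = (df.astype(str)
            .groupby("username", as_index=False)
            .agg(lambda s: "+".join(s)))
print(joined)

# Each time cell is now one string such as "1.0+2.0+...".
# Split it back into individual floats if numeric values are needed.
parsed = joined["time"].apply(lambda v: [float(t) for t in v.split("+")])
print(parsed.iloc[0])
```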


          If you need to sum the numeric columns and join the string columns with + (starting from the original df, not the joined one above):



          import numpy as np
          # Sum numeric columns; join string columns with '+'
          df = (df.groupby('username', as_index=False)
                  .agg(lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else '+'.join(x)))
          print (df)
          username A time place
          0 AAA B+C+D+B+C+B 14.0 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
          1 BBB B+C+D+B+C+B 28.0 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
          2 CCC B+C+D+B+C+B 43.0 YYY+YYY+YYY+ZZZ+ZZZ+ZZZ
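If the individual time values should stay floats rather than being summed, one possible alternative (an assumption on my part, not part of the answer above) is to pass a per-column dictionary to agg and collect time into a list. The helper `join_plus` and the names `as_lists` and `as_sums` are hypothetical.

```python
import pandas as pd

# Sample data matching the question (time is a float column).
df = pd.DataFrame({
    "username": ["AAA"] * 6 + ["BBB"] * 6 + ["CCC"] * 6,
    "A": ["B", "C", "D", "B", "C", "B"] * 3,
    "time": [1.0, 2.0, 1.0, 3.0, 4.0, 3.0,
             1.0, 2.0, 1.0, 7.0, 8.0, 9.0,
             6.0, 5.0, 8.0, 7.0, 8.0, 9.0],
    "place": (["YYY"] * 3 + ["ZZZ"] * 3) * 3,
})


def join_plus(values):
    """Join the string values of one group with '+'."""
    return "+".join(values)


# Keep every time value as a float by collecting each group into a list.
as_lists = df.groupby("username", as_index=False).agg(
    {"A": join_plus, "time": list, "place": join_plus}
)
print(as_lists)

# Or reduce time to a single float per user, as in the answer's second variant.
as_sums = df.groupby("username", as_index=False).agg(
    {"A": join_plus, "time": "sum", "place": join_plus}
)
print(as_sums)
```

With the list variant the time column has object dtype, but every element inside each list is still a float; with the sum variant the column keeps a float dtype.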



























          • I am trying to get the time column to be a float in the final output. If that is not possible, I don't mind getting tips on how to make the time column a float after the agg. thx
            – Acinonyx
            Nov 10 at 23:15










          • So for AAA you need 14.0 for time?
            – jezrael
            Nov 10 at 23:16










          • @Acinonyx - Please check edited answer.
            – jezrael
            Nov 11 at 3:07










          • Cannot vote due to lack of reputation points. My Q is answered.
            – Acinonyx
            Nov 11 at 6:23










          • @Acinonyx - You can upvote now ;)
            – jezrael
            Nov 11 at 6:24


















          edited Nov 10 at 23:32

























          answered Nov 10 at 21:10









          jezrael

          308k20244319



