grouping rows in list in pandas groupby

111 votes, 52 favorites

I have a pandas data frame like:



A 1
A 2
B 5
B 5
B 4
C 6


I want to group by the first column and get second column as lists in rows:



A [1,2]
B [5,5,4]
C [6]


Is it possible to do something like this using pandas groupby?

  • Storing lists in dataframes is inefficient. Is there a reason you want to do this?
    – EdChum, Mar 6 '14 at 10:35

  • A list is just an example; it could be anything that lets me access all entries from the same group in one row.
    – Abhishek Thakur, Mar 6 '14 at 10:41

  • I think if you just group by the column and access the data for each group, you avoid having to generate a list; what you get back is a pandas DataFrame/Series for that group.
    – EdChum, Mar 6 '14 at 10:52

  • Is there a way to group multiple columns and return an array of tuples?
    – Akshay L Aradhya, Oct 25 at 14:11
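
The last comment asks about collecting several columns per group as tuples. One possible sketch, assuming a hypothetical extra column `c` that is not part of the question's data, is to zip the value columns row by row and then group the resulting tuples:

```python
import pandas as pd

# Hypothetical data: column 'c' is added only for illustration.
df = pd.DataFrame({
    'a': ['A', 'A', 'B', 'B', 'B', 'C'],
    'b': [1, 2, 5, 5, 4, 6],
    'c': [3, 3, 3, 4, 4, 4],
})

# Build one (b, c) tuple per row, keeping the original row index.
pairs = pd.Series(list(zip(df['b'], df['c'])), index=df.index)

# Group the tuples by the key column and collect each group into a list.
tuples_per_group = pairs.groupby(df['a']).apply(list)
```

Here `pairs` and `tuples_per_group` are illustrative names. The result is a Series indexed by `a` whose entries are lists of `(b, c)` tuples.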

python pandas
asked Mar 6 '14 at 8:31 by Abhishek Thakur

5 Answers

167 votes, accepted

You can do this by using groupby to group on the column of interest and then applying list to each group:

In [1]:
# create the dataframe
import pandas as pd
df = pd.DataFrame({'a': ['A','A','B','B','B','C'], 'b': [1,2,5,5,4,6]})
df
Out[1]:
   a  b
0  A  1
1  A  2
2  B  5
3  B  5
4  B  4
5  C  6

[6 rows x 2 columns]

In [76]:
df.groupby('a')['b'].apply(list)

Out[76]:
a
A       [1, 2]
B    [5, 5, 4]
C          [6]
Name: b, dtype: object
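
The result above is a Series indexed by the group key. If a two-column DataFrame is wanted instead, as in the question's desired output, a minimal sketch is to reset the index; the column name `b_list` is an arbitrary choice made for this example:

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

# apply(list) returns a Series indexed by the group key;
# reset_index turns the key back into an ordinary column.
grouped = df.groupby('a')['b'].apply(list).reset_index(name='b_list')
```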

  • This takes a lot of time if the dataset is huge, say 10 million rows. Is there a faster way to do this? The number of unique values in 'a' is around 500k, however.
    – Abhishek Thakur, Mar 6 '14 at 11:12

  • groupby is notoriously slow and memory hungry. What you could do is sort by column A, then find the idxmin and idxmax (probably storing them in a dict) and use these to slice your dataframe; I think that would be faster.
    – EdChum, Mar 6 '14 at 11:32

  • @AbhishekThakur Actually that won't work, as idxmin will not work for strings; you would need to store the beginning and end index values.
    – EdChum, Mar 6 '14 at 11:40

  • Unless I'm missing something (no morning coffee yet) you're doing a separate groupby for each row.
    – DSM, Mar 6 '14 at 12:21

  • When I tried this solution on my problem (multiple columns to group by and to group), it didn't work: pandas raised 'Function does not reduce'. Then I used tuple, following the second answer here: stackoverflow.com/questions/19530568/… . See the second answer in stackoverflow.com/questions/27439023/… for an explanation.
    – Andarin, Jun 24 '16 at 10:54
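
The sort-and-slice idea from the comments (sort by the key, remember where each group begins and ends, then slice) could look roughly like this. It is only a sketch of that suggestion, not code from the thread, and the names `starts`, `ends` and `bounds` are made up for the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

# A stable sort keeps the original row order inside each group.
ordered = df.sort_values('a', kind='mergesort')
keys = ordered['a'].to_numpy()
vals = ordered['b'].to_numpy()

# Positions where the key changes mark the start of each group.
starts = np.flatnonzero(np.r_[True, keys[1:] != keys[:-1]])
ends = np.r_[starts[1:], len(keys)]

# Map each key to its (start, end) bounds, then slice on demand.
bounds = {keys[s]: (s, e) for s, e in zip(starts, ends)}
s, e = bounds['B']
b_values = vals[s:e].tolist()
```

Because the boundaries are found by comparing neighbouring keys, this works for string keys as well, which is the case where idxmin was said not to apply.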

21 votes

If performance is important, go down to the NumPy level:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randint(0, 60, 600), 'b': [1, 2, 5, 5, 4, 6] * 100})

def f(df):
    # sort by key so that equal keys are contiguous, then split into key and value arrays
    keys, values = df.sort_values('a').values.T
    # index holds the position of the first occurrence of each unique key
    ukeys, index = np.unique(keys, return_index=True)
    # cut the values at each group boundary
    arrays = np.split(values, index[1:])
    df2 = pd.DataFrame({'a': ukeys, 'b': [list(a) for a in arrays]})
    return df2


Tests:

In [301]: %timeit f(df)
1000 loops, best of 3: 1.64 ms per loop

In [302]: %timeit df.groupby('a')['b'].apply(list)
100 loops, best of 3: 5.26 ms per loop
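
Before relying on a hand-rolled NumPy version, it may be worth checking that it agrees with groupby. The sketch below is a variant written for this check rather than the answer's exact function: it uses a stable sort (the default sort in sort_values is not guaranteed to preserve order within equal keys) and reads the two columns separately. The function name and the seed are arbitrary:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
df = pd.DataFrame({'a': rng.integers(0, 60, 600),
                   'b': [1, 2, 5, 5, 4, 6] * 100})

def group_lists_numpy(frame):
    # Stable sort so values keep their original order within each key.
    ordered = frame.sort_values('a', kind='mergesort')
    keys = ordered['a'].to_numpy()
    values = ordered['b'].to_numpy()
    # return_index=True gives the first position of each unique key.
    ukeys, first = np.unique(keys, return_index=True)
    chunks = np.split(values, first[1:])
    return pd.Series([chunk.tolist() for chunk in chunks],
                     index=pd.Index(ukeys, name='a'), name='b')

expected = df.groupby('a')['b'].apply(list)
result = group_lists_numpy(df)

# True when both approaches give the same keys and the same lists.
same = (result.tolist() == expected.tolist()
        and result.index.equals(expected.index))
```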

11 votes

As you were saying, the groupby method of a pd.DataFrame object can do the job.

Example

import pandas as pd

L = ['A','A','B','B','B','C']
N = [1,2,5,5,4,6]

df = pd.DataFrame(list(zip(L, N)), columns=list('LN'))

groups = df.groupby(df.L)

groups.groups
{'A': [0, 1], 'B': [2, 3, 4], 'C': [5]}

which gives an index-wise description of the groups.

To get the elements of a single group, you can do, for instance

groups.get_group('A')

   L  N
0  A  1
1  A  2

groups.get_group('B')

   L  N
2  B  5
3  B  5
4  B  4
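
Building on get_group, one way to reach the lists the question asks for is to iterate over the groups directly. This is a small sketch using the same `L` and `N` data; `lists_by_key` is an illustrative name:

```python
import pandas as pd

L = ['A', 'A', 'B', 'B', 'B', 'C']
N = [1, 2, 5, 5, 4, 6]
df = pd.DataFrame({'L': L, 'N': N})

# Iterating over a GroupBy yields (key, sub-frame) pairs.
lists_by_key = {key: group['N'].tolist() for key, group in df.groupby('L')}
```

This produces a plain dict rather than a DataFrame, which may be enough when the goal is only to look up all values for a key.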

3 votes

A handy way to achieve this would be:

df.groupby('a').agg({'b': lambda x: list(x)})


Look into writing custom aggregations: https://www.kaggle.com/akshaysehgal/how-to-group-by-aggregate-using-py

  • lambda args: f(args) is equivalent to f
    – BallpointBen, Oct 11 at 17:43
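
Following that comment, the built-in `list` can be passed to agg directly. A short sketch comparing the two spellings (the variable names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

# The lambda only forwards its argument to list ...
with_lambda = df.groupby('a').agg({'b': lambda x: list(x)})

# ... so passing list itself should give the same result.
with_builtin = df.groupby('a').agg({'b': list})
```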

1 vote

To solve this for several columns of a dataframe:

In [4]: import pandas as pd

In [5]: df = pd.DataFrame({'a': ['A','A','B','B','B','C'], 'b': [1,2,5,5,4,6], 'c': [3,3,3,4,4,4]})

In [6]: df
Out[6]:
   a  b  c
0  A  1  3
1  A  2  3
2  B  5  3
3  B  5  4
4  B  4  4
5  C  6  4

In [7]: df.groupby('a').agg(lambda x: list(x))
Out[7]:
           b          c
a
A     [1, 2]     [3, 3]
B  [5, 5, 4]  [3, 4, 4]
C        [6]        [4]


This answer was inspired by Anamika Modi's answer. Thank you!
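
If the long format is needed again later, the list columns can be expanded back into rows. The following is a sketch of that round trip, not part of the original answer; it assumes pandas 1.3 or newer, where explode accepts several columns at once:

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6],
                   'c': [3, 3, 3, 4, 4, 4]})

# Collect every non-key column into per-group lists.
nested = df.groupby('a').agg(list)

# Exploding several columns at once needs pandas 1.3 or newer;
# the lists in each row must have matching lengths.
flat = nested.explode(['b', 'c']).reset_index()
```

Note that the exploded columns come back with object dtype, so a cast may be needed before doing arithmetic on them.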

Accepted answer: answered Mar 6 '14 at 10:28 by EdChum, edited Sep 28 '16 at 12:09
NumPy answer: answered Mar 2 '17 at 8:42 by B. M., edited Aug 27 at 16:13 by Seanny123
get_group answer: answered Mar 6 '14 at 10:12 by Acorbe, edited Mar 6 '14 at 10:17
                        answered Sep 27 at 6:28









                        Anamika Modi

                        up vote
                        1
                        down vote













                        To solve this for several columns of a dataframe:



                        In [5]: df = pd.DataFrame({'a':['A','A','B','B','B','C'], 'b':[1,2,5,5,4,6],'c'
                        ...: :[3,3,3,4,4,4]})

                        In [6]: df
                        Out[6]:
                           a  b  c
                        0  A  1  3
                        1  A  2  3
                        2  B  5  3
                        3  B  5  4
                        4  B  4  4
                        5  C  6  4

                        In [7]: df.groupby('a').agg(lambda x: list(x))
                        Out[7]:
                                   b          c
                        a
                        A     [1, 2]     [3, 3]
                        B  [5, 5, 4]  [3, 4, 4]
                        C        [6]        [4]


                        This answer was inspired by Anamika Modi's answer. Thank you!
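If you want the columns combined row by row, so that each group holds a list of `(b, c)` tuples instead of one list per column, one possible approach is to zip the columns first and then group. This is a sketch, not the only way to do it; `pairs` and `tuples_by_group` are names made up for the example, and it uses `apply(list)` rather than `agg`.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6],
                   'c': [3, 3, 3, 4, 4, 4]})

# Build one (b, c) tuple per row, keeping the original row index.
pairs = pd.Series(list(zip(df['b'], df['c'])), index=df.index)

# Group the tuples by column 'a' and collect each group into a list.
tuples_by_group = pairs.groupby(df['a']).apply(list)
print(tuples_by_group)
```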






                            answered Oct 31 at 16:25









                            Markus Dutschke
