Data manipulation based on trends value










Given a dataset with a Date column and a Value column, I need to find the best way to segment the data by date based on trends in the Value column. The output should be a CSV file with the columns StartDate, EndDate, StartValue, EndValue, where StartDate and EndDate define the bounds of each segment.
A short example follows. Input data:



 **Date** **Value**
01/01/2014 10
01/02/2014 5
01/03/2014 5
01/04/2014 0


output:



 **StartDate** **EndDate** **StartValue** **EndValue**
01/01/2014 01/15/2014 10 5
01/16/2014 02/03/2014 5 5
02/04/2014 03/10/2014 5 4









      python-3.x data-mining data-science data-manipulation






asked Nov 13 '18 at 23:15 by 123josh123






















          1 Answer














          An approach using pandas.DataFrame.shift:



          First, I'll create a dataframe with some random data:



          import numpy as np
          import pandas as pd

          datelist = pd.date_range('1/1/2019', periods=100).tolist()
          values = np.random.randint(1, 5, 100)
          df = pd.DataFrame({'Date': datelist, 'Value': values})
          df = df.set_index('Date')
          df.head(10)

          Date Value
          2019-01-01 1
          2019-01-02 4
          2019-01-03 2
          2019-01-04 2
          2019-01-05 2
          2019-01-06 3
          2019-01-07 2
          2019-01-08 2
          2019-01-09 3
          2019-01-10 2
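
With real data instead of random values, the frame would be read from a file. A minimal sketch, assuming the input is whitespace-separated text laid out like the example in the question and that the dates are month/day/year (the in-memory `raw` buffer is only a stand-in for a real file path):

```python
import io

import pandas as pd

# Stand-in for a real file; the rows are the example input from the question.
raw = io.StringIO(
    "Date Value\n"
    "01/01/2014 10\n"
    "01/02/2014 5\n"
    "01/03/2014 5\n"
    "01/04/2014 0\n"
)

# Assumption: columns are separated by whitespace.
df = pd.read_csv(raw, sep=r"\s+")

# Assumption: dates are month/day/year, as the question's example suggests.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df = df.set_index("Date")
print(df)
```

Passing an explicit `format` avoids any ambiguity between month-first and day-first dates.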


          Drop consecutive duplicate rows, keeping only the first row of each run of equal values:



          df = df.loc[df.Value.shift() != df.Value]

          Date Value
          2019-01-01 1
          2019-01-02 4
          2019-01-03 2
          2019-01-06 3
          2019-01-07 2
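
To make the comparison easier to follow, here is a small sketch that uses fixed values (the ten values printed by `df.head(10)` above) instead of random ones. The names `is_new_run` and `deduped` are only illustrative:

```python
import pandas as pd

dates = pd.date_range("2019-01-01", periods=10, name="Date")
df = pd.DataFrame({"Value": [1, 4, 2, 2, 2, 3, 2, 2, 3, 2]}, index=dates)

# True where a row differs from the row before it. The first row is compared
# against NaN, so it always counts as the start of a new run.
is_new_run = df["Value"].shift() != df["Value"]

# Keep only the first row of each run of equal values.
deduped = df.loc[is_new_run]
print(deduped)
```

Only 2019-01-04, 2019-01-05 and 2019-01-08 are dropped, because each repeats the value of the day before.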


          Reset the index (if the Date column is the index in the original data):



          df = df.reset_index()


          Rename the existing columns so they become the start columns:



          df.columns = ['Start_Date', 'Start_Value']


          Create the end columns by shifting the start columns up one row, so that each row's end is the next row's start:



          df['End_Date'] = df.Start_Date.shift(-1)
          df['End_Value'] = df.Start_Value.shift(-1)
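
A short sketch of what `shift(-1)` does to a three-row frame may help here. The frame `starts` is made up for illustration; the column names follow the ones used above:

```python
import pandas as pd

starts = pd.DataFrame({
    "Start_Date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"]),
    "Start_Value": [1, 4, 2],
})

# shift(-1) moves every value up one row, so row i receives row i+1's value.
starts["End_Date"] = starts["Start_Date"].shift(-1)
starts["End_Value"] = starts["Start_Value"].shift(-1)

print(starts)
print(starts.dtypes)
```

The last row has nothing below it, so its end columns are filled with missing values (NaT for the date, NaN for the value), and the NaN turns the integer `End_Value` column into floats.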


          Drop the NaNs (the final row of the dataframe, produced by the shift(-1)):



          df = df.dropna()


          Set the End_Value type back to int if preferred (the NaN introduced by the shift turned the column into floats):



          df['End_Value'] = df['End_Value'].astype(int)
          df.head(10)

          Start_Date Start_Value End_Date End_Value
          0 2019-01-01 1 2019-01-02 4
          1 2019-01-02 4 2019-01-03 2
          2 2019-01-03 2 2019-01-06 3
          3 2019-01-06 3 2019-01-07 2
          4 2019-01-07 2 2019-01-09 3
          5 2019-01-09 3 2019-01-10 2
          6 2019-01-10 2 2019-01-11 1
          7 2019-01-11 1 2019-01-12 2
          8 2019-01-12 2 2019-01-15 1
          9 2019-01-15 1 2019-01-16 4


          Create a CSV file from the dataframe:



          df.to_csv('trends.csv')
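
The steps above can also be collected into one function that writes the CSV with the column names and order requested in the question. This is a sketch under assumptions, not a definitive implementation: `segments_from_changes` is a hypothetical helper name, the input is assumed to have a date index and a single `Value` column of integers, and the output goes to an in-memory buffer so the example has no side effects.

```python
import io

import pandas as pd


def segments_from_changes(df):
    """Hypothetical helper: one output row per run of equal values."""
    # Keep the first row of each run, then turn the date index into a column.
    runs = df.loc[df["Value"].shift() != df["Value"]].reset_index()
    runs.columns = ["StartDate", "StartValue"]
    # Each segment ends where the next one starts.
    runs["EndDate"] = runs["StartDate"].shift(-1)
    runs["EndValue"] = runs["StartValue"].shift(-1)
    # The last run has no successor; drop it and restore the integer type.
    runs = runs.dropna().astype({"EndValue": int})
    return runs[["StartDate", "EndDate", "StartValue", "EndValue"]]


dates = pd.date_range("2019-01-01", periods=10, name="Date")
df = pd.DataFrame({"Value": [1, 4, 2, 2, 2, 3, 2, 2, 3, 2]}, index=dates)

segments = segments_from_changes(df)

buffer = io.StringIO()
# index=False leaves out the 0, 1, 2, ... row labels.
segments.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a path such as `'trends.csv'` instead of `buffer` writes a file. Note that with this approach EndDate is the date on which the next run starts, not the last date of the current run, and that segments are split on any change in value rather than on a longer-term trend.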





          answered Jan 4 at 14:56 by Chris (edited Jan 5 at 10:05)


























