Data manipulation based on trends value
Given a dataset with Date column and Value column, I need to come up with the best solution of segmenting the data by date based on trends in the Value column. My output should be a CSV filewith the columns: StartDate, EndDate,StartValue,EndValue. Start and End date define the bounds of the segment.
A short example is presented: input data:
**Date** **Value**
01/01/2014 10
01/02/2014 5
01/03/2014 5
01/04/2014 0
output:
**StartDate** **EndDate** **StartValue** **EndValue**
01/01/2014 01/15/2014 10 5
01/16/2014 02/03/2014 5 5
02/04/2014 03/10/2014 5 4
python-3.x data-mining data-science data-manipulation
add a comment |
Given a dataset with Date column and Value column, I need to come up with the best solution of segmenting the data by date based on trends in the Value column. My output should be a CSV filewith the columns: StartDate, EndDate,StartValue,EndValue. Start and End date define the bounds of the segment.
A short example is presented: input data:
**Date** **Value**
01/01/2014 10
01/02/2014 5
01/03/2014 5
01/04/2014 0
output:
**StartDate** **EndDate** **StartValue** **EndValue**
01/01/2014 01/15/2014 10 5
01/16/2014 02/03/2014 5 5
02/04/2014 03/10/2014 5 4
python-3.x data-mining data-science data-manipulation
add a comment |
Given a dataset with Date column and Value column, I need to come up with the best solution of segmenting the data by date based on trends in the Value column. My output should be a CSV filewith the columns: StartDate, EndDate,StartValue,EndValue. Start and End date define the bounds of the segment.
A short example is presented: input data:
**Date** **Value**
01/01/2014 10
01/02/2014 5
01/03/2014 5
01/04/2014 0
output:
**StartDate** **EndDate** **StartValue** **EndValue**
01/01/2014 01/15/2014 10 5
01/16/2014 02/03/2014 5 5
02/04/2014 03/10/2014 5 4
python-3.x data-mining data-science data-manipulation
Given a dataset with Date column and Value column, I need to come up with the best solution of segmenting the data by date based on trends in the Value column. My output should be a CSV filewith the columns: StartDate, EndDate,StartValue,EndValue. Start and End date define the bounds of the segment.
A short example is presented: input data:
**Date** **Value**
01/01/2014 10
01/02/2014 5
01/03/2014 5
01/04/2014 0
output:
**StartDate** **EndDate** **StartValue** **EndValue**
01/01/2014 01/15/2014 10 5
01/16/2014 02/03/2014 5 5
02/04/2014 03/10/2014 5 4
python-3.x data-mining data-science data-manipulation
python-3.x data-mining data-science data-manipulation
asked Nov 13 '18 at 23:15
123josh123123josh123
275
275
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
An approach using pandas.DataFrame.shift
(docs).
Firstly I'll create a dataframe with some data:
import pandas as pd
datelist = pd.date_range('1/1/2019', periods=100).tolist()
values = np.random.randint(1, 5, 100)
df = pd.DataFrame('Date': datelist, 'Value': values)
df = df.set_index('Date')
df.head(10)
Date Value
2019-01-01 1
2019-01-02 4
2019-01-03 2
2019-01-04 2
2019-01-05 2
2019-01-06 3
2019-01-07 2
2019-01-08 2
2019-01-09 3
2019-01-10 2
Drop contiguously duplicate rows:
df = df.loc[df.Value.shift() != df.Value]
Date Value
2019-01-01 2
2019-01-02 1
2019-01-04 2
2019-01-05 3
2019-01-06 1
Reset the index (if the Date
column is the index in the original data):
df = df.reset_index()
Rename the existing columns to be the start columns.
df.columns = ['Start_Date', 'Start_Value']
Create end columns by shifting the start columns back one row.
df['End_Date'] = df.Start_Date.shift(-1)
df['End_Value'] = df.Start_Value.shift(-1)
Drop NaNs (the final row of the dataframe due to the shift(-1)
.
df = df.dropna()
Set the End_Value
type to int
(if preferred).
df['End_Value'] = df['End_Value'].astype(int)
df.head(10)
Start_Date Start_Value End_Date End_Value
0 2019-01-01 1 2019-01-02 4
1 2019-01-02 4 2019-01-03 2
2 2019-01-03 2 2019-01-06 3
3 2019-01-06 3 2019-01-07 2
4 2019-01-07 2 2019-01-09 3
5 2019-01-09 3 2019-01-10 2
6 2019-01-10 2 2019-01-11 1
7 2019-01-11 1 2019-01-12 2
8 2019-01-12 2 2019-01-15 1
9 2019-01-15 1 2019-01-16 4
Create a CSV file from the dataframe:
df.to_csv('trends.csv')
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53290909%2fdata-manipulation-based-on-trends-value%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
An approach using pandas.DataFrame.shift
(docs).
Firstly I'll create a dataframe with some data:
import pandas as pd
datelist = pd.date_range('1/1/2019', periods=100).tolist()
values = np.random.randint(1, 5, 100)
df = pd.DataFrame('Date': datelist, 'Value': values)
df = df.set_index('Date')
df.head(10)
Date Value
2019-01-01 1
2019-01-02 4
2019-01-03 2
2019-01-04 2
2019-01-05 2
2019-01-06 3
2019-01-07 2
2019-01-08 2
2019-01-09 3
2019-01-10 2
Drop contiguously duplicate rows:
df = df.loc[df.Value.shift() != df.Value]
Date Value
2019-01-01 2
2019-01-02 1
2019-01-04 2
2019-01-05 3
2019-01-06 1
Reset the index (if the Date
column is the index in the original data):
df = df.reset_index()
Rename the existing columns to be the start columns.
df.columns = ['Start_Date', 'Start_Value']
Create end columns by shifting the start columns back one row.
df['End_Date'] = df.Start_Date.shift(-1)
df['End_Value'] = df.Start_Value.shift(-1)
Drop NaNs (the final row of the dataframe due to the shift(-1)
.
df = df.dropna()
Set the End_Value
type to int
(if preferred).
df['End_Value'] = df['End_Value'].astype(int)
df.head(10)
Start_Date Start_Value End_Date End_Value
0 2019-01-01 1 2019-01-02 4
1 2019-01-02 4 2019-01-03 2
2 2019-01-03 2 2019-01-06 3
3 2019-01-06 3 2019-01-07 2
4 2019-01-07 2 2019-01-09 3
5 2019-01-09 3 2019-01-10 2
6 2019-01-10 2 2019-01-11 1
7 2019-01-11 1 2019-01-12 2
8 2019-01-12 2 2019-01-15 1
9 2019-01-15 1 2019-01-16 4
Create a CSV file from the dataframe:
df.to_csv('trends.csv')
add a comment |
An approach using pandas.DataFrame.shift
(docs).
Firstly I'll create a dataframe with some data:
import pandas as pd
datelist = pd.date_range('1/1/2019', periods=100).tolist()
values = np.random.randint(1, 5, 100)
df = pd.DataFrame('Date': datelist, 'Value': values)
df = df.set_index('Date')
df.head(10)
Date Value
2019-01-01 1
2019-01-02 4
2019-01-03 2
2019-01-04 2
2019-01-05 2
2019-01-06 3
2019-01-07 2
2019-01-08 2
2019-01-09 3
2019-01-10 2
Drop contiguously duplicate rows:
df = df.loc[df.Value.shift() != df.Value]
Date Value
2019-01-01 2
2019-01-02 1
2019-01-04 2
2019-01-05 3
2019-01-06 1
Reset the index (if the Date
column is the index in the original data):
df = df.reset_index()
Rename the existing columns to be the start columns.
df.columns = ['Start_Date', 'Start_Value']
Create end columns by shifting the start columns back one row.
df['End_Date'] = df.Start_Date.shift(-1)
df['End_Value'] = df.Start_Value.shift(-1)
Drop NaNs (the final row of the dataframe due to the shift(-1)
.
df = df.dropna()
Set the End_Value
type to int
(if preferred).
df['End_Value'] = df['End_Value'].astype(int)
df.head(10)
Start_Date Start_Value End_Date End_Value
0 2019-01-01 1 2019-01-02 4
1 2019-01-02 4 2019-01-03 2
2 2019-01-03 2 2019-01-06 3
3 2019-01-06 3 2019-01-07 2
4 2019-01-07 2 2019-01-09 3
5 2019-01-09 3 2019-01-10 2
6 2019-01-10 2 2019-01-11 1
7 2019-01-11 1 2019-01-12 2
8 2019-01-12 2 2019-01-15 1
9 2019-01-15 1 2019-01-16 4
Create a CSV file from the dataframe:
df.to_csv('trends.csv')
add a comment |
An approach using pandas.DataFrame.shift
(docs).
Firstly I'll create a dataframe with some data:
import pandas as pd
datelist = pd.date_range('1/1/2019', periods=100).tolist()
values = np.random.randint(1, 5, 100)
df = pd.DataFrame('Date': datelist, 'Value': values)
df = df.set_index('Date')
df.head(10)
Date Value
2019-01-01 1
2019-01-02 4
2019-01-03 2
2019-01-04 2
2019-01-05 2
2019-01-06 3
2019-01-07 2
2019-01-08 2
2019-01-09 3
2019-01-10 2
Drop contiguously duplicate rows:
df = df.loc[df.Value.shift() != df.Value]
Date Value
2019-01-01 2
2019-01-02 1
2019-01-04 2
2019-01-05 3
2019-01-06 1
Reset the index (if the Date
column is the index in the original data):
df = df.reset_index()
Rename the existing columns to be the start columns.
df.columns = ['Start_Date', 'Start_Value']
Create end columns by shifting the start columns back one row.
df['End_Date'] = df.Start_Date.shift(-1)
df['End_Value'] = df.Start_Value.shift(-1)
Drop NaNs (the final row of the dataframe due to the shift(-1)
.
df = df.dropna()
Set the End_Value
type to int
(if preferred).
df['End_Value'] = df['End_Value'].astype(int)
df.head(10)
Start_Date Start_Value End_Date End_Value
0 2019-01-01 1 2019-01-02 4
1 2019-01-02 4 2019-01-03 2
2 2019-01-03 2 2019-01-06 3
3 2019-01-06 3 2019-01-07 2
4 2019-01-07 2 2019-01-09 3
5 2019-01-09 3 2019-01-10 2
6 2019-01-10 2 2019-01-11 1
7 2019-01-11 1 2019-01-12 2
8 2019-01-12 2 2019-01-15 1
9 2019-01-15 1 2019-01-16 4
Create a CSV file from the dataframe:
df.to_csv('trends.csv')
An approach using pandas.DataFrame.shift
(docs).
Firstly I'll create a dataframe with some data:
import pandas as pd
datelist = pd.date_range('1/1/2019', periods=100).tolist()
values = np.random.randint(1, 5, 100)
df = pd.DataFrame('Date': datelist, 'Value': values)
df = df.set_index('Date')
df.head(10)
Date Value
2019-01-01 1
2019-01-02 4
2019-01-03 2
2019-01-04 2
2019-01-05 2
2019-01-06 3
2019-01-07 2
2019-01-08 2
2019-01-09 3
2019-01-10 2
Drop contiguously duplicate rows:
df = df.loc[df.Value.shift() != df.Value]
Date Value
2019-01-01 2
2019-01-02 1
2019-01-04 2
2019-01-05 3
2019-01-06 1
Reset the index (if the Date
column is the index in the original data):
df = df.reset_index()
Rename the existing columns to be the start columns.
df.columns = ['Start_Date', 'Start_Value']
Create end columns by shifting the start columns back one row.
df['End_Date'] = df.Start_Date.shift(-1)
df['End_Value'] = df.Start_Value.shift(-1)
Drop NaNs (the final row of the dataframe due to the shift(-1)
.
df = df.dropna()
Set the End_Value
type to int
(if preferred).
df['End_Value'] = df['End_Value'].astype(int)
df.head(10)
Start_Date Start_Value End_Date End_Value
0 2019-01-01 1 2019-01-02 4
1 2019-01-02 4 2019-01-03 2
2 2019-01-03 2 2019-01-06 3
3 2019-01-06 3 2019-01-07 2
4 2019-01-07 2 2019-01-09 3
5 2019-01-09 3 2019-01-10 2
6 2019-01-10 2 2019-01-11 1
7 2019-01-11 1 2019-01-12 2
8 2019-01-12 2 2019-01-15 1
9 2019-01-15 1 2019-01-16 4
Create a CSV file from the dataframe:
df.to_csv('trends.csv')
edited Jan 5 at 10:05
answered Jan 4 at 14:56
ChrisChris
534213
534213
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53290909%2fdata-manipulation-based-on-trends-value%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown