split join data.table R










Objective



Join DT1 (as i, in data.table terms) to DT2 on the key column(s), separately within each group of DT2 defined by the Date column.



I cannot run DT2[DT1, on = 'key'] directly: the key column is unique within a single date but repeats across dates, so a plain join would be incorrect.



Reproducible example with a working solution



DT3 is my expected output. Is there any way to achieve this without the split manoeuvre, which does not feel very data.table-y?



library(data.table)
set.seed(1)
DT1 <- data.table(
  Segment = sample(paste0('S', 1:10), 100, TRUE),
  Activity = sample(paste0('A', 1:5), 100, TRUE),
  Value = runif(100)
)
dates <- seq(as.Date('2018-01-01'), as.Date('2018-11-30'), by = '1 day')
DT2 <- data.table(
  Date = rep(dates, each = 5),
  Segment = sample(paste0('S', 1:10), 3340, TRUE),
  Total = runif(3340, 1, 2)
)
rm(dates)
# Ensure that each Date-Segment combination is unique
DT2 <- unique(DT2, by = c('Date', 'Segment'))
# Split DT2 by Date, join DT1 to each piece on Segment, then stack the pieces
iDT2 <- split(DT2, by = 'Date')
iDT2 <- lapply(
  iDT2,
  function(x)
    x[DT1, on = 'Segment', nomatch = 0]
)
DT3 <- rbindlist(iDT2, use.names = TRUE)
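
The premise above (Segment repeats across dates but is unique within a date) can be checked directly. This is a minimal sketch on hypothetical toy data; toy_DT2 and its values are illustrative assumptions, not taken from the example:

```r
library(data.table)

# Hypothetical toy stand-in for DT2: Segment repeats across dates,
# but each (Date, Segment) pair occurs only once.
toy_DT2 <- data.table(
  Date    = as.Date(c("2018-01-01", "2018-01-01", "2018-01-02")),
  Segment = c("S1", "S2", "S1"),
  Total   = c(1.5, 1.2, 1.8)
)

# anyDuplicated() returns 0 when no row repeats on the `by` columns,
# otherwise the index of a repeated row.
dup_pair    <- anyDuplicated(toy_DT2, by = c("Date", "Segment"))
dup_segment <- anyDuplicated(toy_DT2, by = "Segment")

# Count rows per Segment to see how often each one recurs across dates.
seg_counts <- toy_DT2[, .N, by = Segment]
print(seg_counts)
```

Here dup_pair is zero while dup_segment is not, which is the situation described in the objective.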









  • What about using the merge function with keys 'Date' and 'Segment'?
    – Heikki
    Nov 12 '18 at 20:59











  • Date does not exist in DT1, so it can't be used to merge.
    – Ameya
    Nov 12 '18 at 21:03










  • Sorry, I should have written merge by 'Segment' (only).
    – Heikki
    Nov 12 '18 at 21:40















r data.table






asked Nov 12 '18 at 20:49 by Ameya











1 Answer
You can achieve the same result with a cartesian merge:



DT4 <- merge(DT2, DT1, by = 'Segment', allow.cartesian = TRUE)


Here is a check that DT4 matches DT3 once rows and columns are put in the same order:



> all(DT3[order(Segment, Date, Total, Activity, Value),
          c('Segment', 'Date', 'Total', 'Activity', 'Value')] ==
      DT4[order(Segment, Date, Total, Activity, Value),
          c('Segment', 'Date', 'Total', 'Activity', 'Value')])

[1] TRUE
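
For comparison, the same cartesian join can also be written in data.table's bracket syntax. This is a minimal sketch on hypothetical toy data (the small DT1 and DT2 values below are illustrative assumptions, not the question's data). It checks that a single join on Segment reproduces the split-and-rbindlist result:

```r
library(data.table)

# Hypothetical toy data, much smaller than the question's example.
DT1 <- data.table(
  Segment  = c("S1", "S1", "S2"),
  Activity = c("A1", "A2", "A1"),
  Value    = c(0.1, 0.2, 0.3)
)
DT2 <- data.table(
  Date    = as.Date(c("2018-01-01", "2018-01-01", "2018-01-02")),
  Segment = c("S1", "S2", "S1"),
  Total   = c(1.5, 1.2, 1.8)
)

# Split-by-date approach from the question.
by_split <- rbindlist(
  lapply(split(DT2, by = "Date"),
         function(x) x[DT1, on = "Segment", nomatch = 0L]),
  use.names = TRUE
)

# Single join on Segment only. allow.cartesian = TRUE lifts the guard
# against results with more than nrow(x) + nrow(i) rows.
by_join <- DT2[DT1, on = "Segment", nomatch = 0L, allow.cartesian = TRUE]

# Put columns and rows in the same order before comparing.
cols <- c("Date", "Segment", "Total", "Activity", "Value")
setcolorder(by_split, cols)
setcolorder(by_join, cols)
setorderv(by_split, cols)
setorderv(by_join, cols)

same <- isTRUE(all.equal(by_split, by_join, check.attributes = FALSE))
print(same)
```

On this toy data the result has only 5 rows, so allow.cartesian = TRUE is not strictly required. At the question's scale the join would likely exceed the nrow(x) + nrow(i) guard, which is what the argument lifts.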





  • Thanks, allow.cartesian does it.
    – Ameya
    Nov 12 '18 at 22:10










answered Nov 12 '18 at 21:39 by Heikki






