Get Lead value over multiple partitions










6















I have a problem that I feel could be solved using lag/lead + partitions but I can't wrap my head around it.



Clients are invited to participate in research-projects every two years (aprox.).
A number of clients is selected for each project.
Some clients are selected for multiple research-projects.
Those get sent an invitation. In some cases no invitation is sent. If a client does not react to an invitation, a 2nd invitation (reminder) is sent. A 3rd, a 4rd are also possible.



I need to find out whether a client has had an invitation for a previous research-project. (And optionally, which invitation that was).



The dataset looks like this:



clientID | projectID | invitationID
14 | 267 | 489
14 | 267 | 325
16 | 385 | 475
17 | 546 | NULL
17 | 547 | 885
17 | 548 | 901
18 | 721 | 905
18 | 834 | 906
18 | 834 | 907
19 | 856 | 908
19 | 856 | 929
19 | 857 | 931
19 | 857 | 945
19 | 858 | NULL


Client 14 has had 2 invitations for the same research-project
Client 16 has had 1 invitation for 1 research-project
Client 17 has been selected for 3 research-projects but opted out for project 546, receiving 1 invitation each for the following projects.
Client 18 has been selected for 2 research-projects. For the second project he got a 2 invitations.
Client 19 has been selected for three research-projects. For the first two a reminder was set. Client 19 was selected for project 858 but opted out thus no invitation.


Now I need to determine per client whether there has been a invitation for a previous research-project. (And optionally, which invitation that was). I only need the first invitation (if there were multiple).
So my resulting dataset should look like this (stuff between brackets is optional):



clientID | projectID | invitationID | InvitedForPreviousProject
14 | 267 | 489 | 0
14 | 267 | 325 | 0
16 | 385 | 475 | 0
17 | 546 | NULL | 0
17 | 547 | 885 | 0
17 | 548 | 901 | 1 (885)
18 | 721 | 905 | 0
18 | 834 | 906 | 1 (905)
18 | 834 | 907 | 1 (905)
19 | 856 | 908 | 0
19 | 856 | 929 | 0
19 | 857 | 931 | 1 (908)
19 | 857 | 945 | 1 (908)
19 | 858 | NULL | 1 (931)


Can this be done using LEAD, Rank, Dense-Rank? Create-statement including data below



declare @table table (
[clientID] [int] NULL,
[projectID] [int] NULL,
[invitationID] [int] NULL
)
INSERT @table ([clientID], [projectID], [invitationID]) VALUES
(14, 267, 489),
(14, 267, 325),
(16, 385, 475),
(17, 546, NULL),
(17, 547, 885),
(17, 548, 901),
(18, 721, 905),
(18, 834, 906),
(18, 834, 907),
(19, 856, 908),
(19, 856, 929),
(19, 857, 931),
(19, 857, 945),
(19, 858, NULL)









share|improve this question
























  • I take it you want to get this without a self-join or sub-query (for speed reasons)?

    – Henning Koehler
    Nov 14 '18 at 11:15











  • Yes. It needs to perform, it will run over 6mln+ records

    – Henrov
    Nov 14 '18 at 12:07











  • @Henrov . . . Your table doesn't have enough information. You mention "previous", but SQL tables represent unordered sets. There is no ordering unless a column specifies the ordering.

    – Gordon Linoff
    Nov 14 '18 at 12:21











  • @GordonLinoff De ordering can be derived from the order of the id's. This is a simplified example, in real life I can use datecolumns also. The id's however should be guaranteed in irder (fed by a sequence)

    – Henrov
    Nov 14 '18 at 13:08















6















I have a problem that I feel could be solved using lag/lead + partitions but I can't wrap my head around it.



Clients are invited to participate in research-projects every two years (aprox.).
A number of clients is selected for each project.
Some clients are selected for multiple research-projects.
Those get sent an invitation. In some cases no invitation is sent. If a client does not react to an invitation, a 2nd invitation (reminder) is sent. A 3rd, a 4rd are also possible.



I need to find out whether a client has had an invitation for a previous research-project. (And optionally, which invitation that was).



The dataset looks like this:



clientID | projectID | invitationID
14 | 267 | 489
14 | 267 | 325
16 | 385 | 475
17 | 546 | NULL
17 | 547 | 885
17 | 548 | 901
18 | 721 | 905
18 | 834 | 906
18 | 834 | 907
19 | 856 | 908
19 | 856 | 929
19 | 857 | 931
19 | 857 | 945
19 | 858 | NULL


Client 14 has had 2 invitations for the same research-project
Client 16 has had 1 invitation for 1 research-project
Client 17 has been selected for 3 research-projects but opted out for project 546, receiving 1 invitation each for the following projects.
Client 18 has been selected for 2 research-projects. For the second project he got a 2 invitations.
Client 19 has been selected for three research-projects. For the first two a reminder was set. Client 19 was selected for project 858 but opted out thus no invitation.


Now I need to determine per client whether there has been a invitation for a previous research-project. (And optionally, which invitation that was). I only need the first invitation (if there were multiple).
So my resulting dataset should look like this (stuff between brackets is optional):



clientID | projectID | invitationID | InvitedForPreviousProject
14 | 267 | 489 | 0
14 | 267 | 325 | 0
16 | 385 | 475 | 0
17 | 546 | NULL | 0
17 | 547 | 885 | 0
17 | 548 | 901 | 1 (885)
18 | 721 | 905 | 0
18 | 834 | 906 | 1 (905)
18 | 834 | 907 | 1 (905)
19 | 856 | 908 | 0
19 | 856 | 929 | 0
19 | 857 | 931 | 1 (908)
19 | 857 | 945 | 1 (908)
19 | 858 | NULL | 1 (931)


Can this be done using LEAD, Rank, Dense-Rank? Create-statement including data below



declare @table table (
[clientID] [int] NULL,
[projectID] [int] NULL,
[invitationID] [int] NULL
)
INSERT @table ([clientID], [projectID], [invitationID]) VALUES
(14, 267, 489),
(14, 267, 325),
(16, 385, 475),
(17, 546, NULL),
(17, 547, 885),
(17, 548, 901),
(18, 721, 905),
(18, 834, 906),
(18, 834, 907),
(19, 856, 908),
(19, 856, 929),
(19, 857, 931),
(19, 857, 945),
(19, 858, NULL)









share|improve this question
























  • I take it you want to get this without a self-join or sub-query (for speed reasons)?

    – Henning Koehler
    Nov 14 '18 at 11:15











  • Yes. It needs to perform, it will run over 6mln+ records

    – Henrov
    Nov 14 '18 at 12:07











  • @Henrov . . . Your table doesn't have enough information. You mention "previous", but SQL tables represent unordered sets. There is no ordering unless a column specifies the ordering.

    – Gordon Linoff
    Nov 14 '18 at 12:21











  • @GordonLinoff De ordering can be derived from the order of the id's. This is a simplified example, in real life I can use datecolumns also. The id's however should be guaranteed in irder (fed by a sequence)

    – Henrov
    Nov 14 '18 at 13:08













6












6








6








I have a problem that I feel could be solved using lag/lead + partitions but I can't wrap my head around it.



Clients are invited to participate in research-projects every two years (aprox.).
A number of clients is selected for each project.
Some clients are selected for multiple research-projects.
Those get sent an invitation. In some cases no invitation is sent. If a client does not react to an invitation, a 2nd invitation (reminder) is sent. A 3rd, a 4rd are also possible.



I need to find out whether a client has had an invitation for a previous research-project. (And optionally, which invitation that was).



The dataset looks like this:



clientID | projectID | invitationID
14 | 267 | 489
14 | 267 | 325
16 | 385 | 475
17 | 546 | NULL
17 | 547 | 885
17 | 548 | 901
18 | 721 | 905
18 | 834 | 906
18 | 834 | 907
19 | 856 | 908
19 | 856 | 929
19 | 857 | 931
19 | 857 | 945
19 | 858 | NULL


Client 14 has had 2 invitations for the same research-project
Client 16 has had 1 invitation for 1 research-project
Client 17 has been selected for 3 research-projects but opted out for project 546, receiving 1 invitation each for the following projects.
Client 18 has been selected for 2 research-projects. For the second project he got a 2 invitations.
Client 19 has been selected for three research-projects. For the first two a reminder was set. Client 19 was selected for project 858 but opted out thus no invitation.


Now I need to determine per client whether there has been a invitation for a previous research-project. (And optionally, which invitation that was). I only need the first invitation (if there were multiple).
So my resulting dataset should look like this (stuff between brackets is optional):



clientID | projectID | invitationID | InvitedForPreviousProject
14 | 267 | 489 | 0
14 | 267 | 325 | 0
16 | 385 | 475 | 0
17 | 546 | NULL | 0
17 | 547 | 885 | 0
17 | 548 | 901 | 1 (885)
18 | 721 | 905 | 0
18 | 834 | 906 | 1 (905)
18 | 834 | 907 | 1 (905)
19 | 856 | 908 | 0
19 | 856 | 929 | 0
19 | 857 | 931 | 1 (908)
19 | 857 | 945 | 1 (908)
19 | 858 | NULL | 1 (931)


Can this be done using LEAD, Rank, Dense-Rank? Create-statement including data below



declare @table table (
[clientID] [int] NULL,
[projectID] [int] NULL,
[invitationID] [int] NULL
)
INSERT @table ([clientID], [projectID], [invitationID]) VALUES
(14, 267, 489),
(14, 267, 325),
(16, 385, 475),
(17, 546, NULL),
(17, 547, 885),
(17, 548, 901),
(18, 721, 905),
(18, 834, 906),
(18, 834, 907),
(19, 856, 908),
(19, 856, 929),
(19, 857, 931),
(19, 857, 945),
(19, 858, NULL)









share|improve this question
















I have a problem that I feel could be solved using lag/lead + partitions but I can't wrap my head around it.



Clients are invited to participate in research-projects every two years (aprox.).
A number of clients is selected for each project.
Some clients are selected for multiple research-projects.
Those get sent an invitation. In some cases no invitation is sent. If a client does not react to an invitation, a 2nd invitation (reminder) is sent. A 3rd, a 4rd are also possible.



I need to find out whether a client has had an invitation for a previous research-project. (And optionally, which invitation that was).



The dataset looks like this:



clientID | projectID | invitationID
14 | 267 | 489
14 | 267 | 325
16 | 385 | 475
17 | 546 | NULL
17 | 547 | 885
17 | 548 | 901
18 | 721 | 905
18 | 834 | 906
18 | 834 | 907
19 | 856 | 908
19 | 856 | 929
19 | 857 | 931
19 | 857 | 945
19 | 858 | NULL


Client 14 has had 2 invitations for the same research-project
Client 16 has had 1 invitation for 1 research-project
Client 17 has been selected for 3 research-projects but opted out for project 546, receiving 1 invitation each for the following projects.
Client 18 has been selected for 2 research-projects. For the second project he got a 2 invitations.
Client 19 has been selected for three research-projects. For the first two a reminder was set. Client 19 was selected for project 858 but opted out thus no invitation.


Now I need to determine per client whether there has been a invitation for a previous research-project. (And optionally, which invitation that was). I only need the first invitation (if there were multiple).
So my resulting dataset should look like this (stuff between brackets is optional):



clientID | projectID | invitationID | InvitedForPreviousProject
14 | 267 | 489 | 0
14 | 267 | 325 | 0
16 | 385 | 475 | 0
17 | 546 | NULL | 0
17 | 547 | 885 | 0
17 | 548 | 901 | 1 (885)
18 | 721 | 905 | 0
18 | 834 | 906 | 1 (905)
18 | 834 | 907 | 1 (905)
19 | 856 | 908 | 0
19 | 856 | 929 | 0
19 | 857 | 931 | 1 (908)
19 | 857 | 945 | 1 (908)
19 | 858 | NULL | 1 (931)


Can this be done using LEAD, Rank, Dense-Rank? Create-statement including data below



declare @table table (
[clientID] [int] NULL,
[projectID] [int] NULL,
[invitationID] [int] NULL
)
INSERT @table ([clientID], [projectID], [invitationID]) VALUES
(14, 267, 489),
(14, 267, 325),
(16, 385, 475),
(17, 546, NULL),
(17, 547, 885),
(17, 548, 901),
(18, 721, 905),
(18, 834, 906),
(18, 834, 907),
(19, 856, 908),
(19, 856, 929),
(19, 857, 931),
(19, 857, 945),
(19, 858, NULL)






sql sql-server tsql sql-server-2017 data-partitioning






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 14 '18 at 10:53









Damien_The_Unbeliever

195k17248335




195k17248335










asked Nov 14 '18 at 10:49









HenrovHenrov

7211932




7211932












  • I take it you want to get this without a self-join or sub-query (for speed reasons)?

    – Henning Koehler
    Nov 14 '18 at 11:15











  • Yes. It needs to perform, it will run over 6mln+ records

    – Henrov
    Nov 14 '18 at 12:07











  • @Henrov . . . Your table doesn't have enough information. You mention "previous", but SQL tables represent unordered sets. There is no ordering unless a column specifies the ordering.

    – Gordon Linoff
    Nov 14 '18 at 12:21











  • @GordonLinoff De ordering can be derived from the order of the id's. This is a simplified example, in real life I can use datecolumns also. The id's however should be guaranteed in irder (fed by a sequence)

    – Henrov
    Nov 14 '18 at 13:08

















  • I take it you want to get this without a self-join or sub-query (for speed reasons)?

    – Henning Koehler
    Nov 14 '18 at 11:15











  • Yes. It needs to perform, it will run over 6mln+ records

    – Henrov
    Nov 14 '18 at 12:07











  • @Henrov . . . Your table doesn't have enough information. You mention "previous", but SQL tables represent unordered sets. There is no ordering unless a column specifies the ordering.

    – Gordon Linoff
    Nov 14 '18 at 12:21











  • @GordonLinoff De ordering can be derived from the order of the id's. This is a simplified example, in real life I can use datecolumns also. The id's however should be guaranteed in irder (fed by a sequence)

    – Henrov
    Nov 14 '18 at 13:08
















I take it you want to get this without a self-join or sub-query (for speed reasons)?

– Henning Koehler
Nov 14 '18 at 11:15





I take it you want to get this without a self-join or sub-query (for speed reasons)?

– Henning Koehler
Nov 14 '18 at 11:15













Yes. It needs to perform, it will run over 6mln+ records

– Henrov
Nov 14 '18 at 12:07





Yes. It needs to perform, it will run over 6mln+ records

– Henrov
Nov 14 '18 at 12:07













@Henrov . . . Your table doesn't have enough information. You mention "previous", but SQL tables represent unordered sets. There is no ordering unless a column specifies the ordering.

– Gordon Linoff
Nov 14 '18 at 12:21





@Henrov . . . Your table doesn't have enough information. You mention "previous", but SQL tables represent unordered sets. There is no ordering unless a column specifies the ordering.

– Gordon Linoff
Nov 14 '18 at 12:21













@GordonLinoff De ordering can be derived from the order of the id's. This is a simplified example, in real life I can use datecolumns also. The id's however should be guaranteed in irder (fed by a sequence)

– Henrov
Nov 14 '18 at 13:08





@GordonLinoff De ordering can be derived from the order of the id's. This is a simplified example, in real life I can use datecolumns also. The id's however should be guaranteed in irder (fed by a sequence)

– Henrov
Nov 14 '18 at 13:08












2 Answers
2






active

oldest

votes


















2














Might this help?



declare @table table (
[clientID] [int] NULL,
[projectID] [int] NULL,
[invitationID] [int] NULL
)
INSERT @table ([clientID], [projectID], [invitationID]) VALUES
(14, 267, 489),
(14, 267, 325),
(16, 385, 475),
(17, 546, NULL),
(17, 547, 885),
(17, 548, 901),
(18, 721, 905),
(18, 834, 906),
(18, 834, 907),
(19, 856, 908),
(19, 856, 929),
(19, 857, 931),
(19, 857, 945),
(19, 858, NULL);


--The query uses DENSE_RANK() and a correlated sub-query



WITH ranked AS
(
SELECT t.*
,DENSE_RANK() OVER(PARTITION BY t.clientID ORDER BY t.projectID) AS InvRank
FROM @table t
)
SELECT r.*
,earlierProject.invitationID
FROM ranked r
OUTER APPLY(SELECT TOP 1 *
FROM ranked r2
WHERE r2.clientID=r.clientID
AND r2.projectID<r.projectID
AND r2.InvRank=r.InvRank-1
ORDER BY invitationID ASC
) earlierProject
ORDER BY r.clientID,r.projectID,r.invitationID;


The invitationID will be NULL in case of "0" in your table and set to the needed value in case of an item found.



Hint



There's no need for the APPLY actually. If you'd need the invitationID only, you might place the sub-query as a column directly (slightly faster). But this is better to read and you could get hands on the other columns as well...






share|improve this answer























  • Does what it needs to do. I am going to test this on the actual dataset :)

    – Henrov
    Nov 14 '18 at 12:14






  • 1





    @Henrov If performance matters (lot of rows) it might help to write the intermediate set into a temp table and place an indexes all involved columns (also on InvRank).

    – Shnugo
    Nov 14 '18 at 12:54






  • 1





    Dense_RANKI() works like a charm in this situation. Using a intermediate table with the proper indexes made for good performance . I swear my SSIS server was humming a happy tune when it was crunchinrg the numbers :D

    – Henrov
    Nov 23 '18 at 15:41



















1














You need a column that specifies the ordering. Let me assume that there is an invitation date as well as the other columns.



With this information, your flag is easily calculated by comparing two values:



  • minimum invitation date for the client

  • minimum invitation date for the client/project id

When these are the same, this is the first project with an invitation.



So:



select t.*,
(case when min(invitationDate) over (partition by clientId order by invitationDate) =
min(invitationDate) over (partition by clientId, projectId order by invitationDate)
then 0 else 1
end) as InvitedForPreviousProject
from @table t;





share|improve this answer























  • That seems like smart solution, thank you. I will test that too

    – Henrov
    Nov 14 '18 at 13:16










Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53298418%2fget-lead-value-over-multiple-partitions%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









2














Might this help?



declare @table table (
[clientID] [int] NULL,
[projectID] [int] NULL,
[invitationID] [int] NULL
)
INSERT @table ([clientID], [projectID], [invitationID]) VALUES
(14, 267, 489),
(14, 267, 325),
(16, 385, 475),
(17, 546, NULL),
(17, 547, 885),
(17, 548, 901),
(18, 721, 905),
(18, 834, 906),
(18, 834, 907),
(19, 856, 908),
(19, 856, 929),
(19, 857, 931),
(19, 857, 945),
(19, 858, NULL);


--The query uses DENSE_RANK() and a correlated sub-query



WITH ranked AS
(
SELECT t.*
,DENSE_RANK() OVER(PARTITION BY t.clientID ORDER BY t.projectID) AS InvRank
FROM @table t
)
SELECT r.*
,earlierProject.invitationID
FROM ranked r
OUTER APPLY(SELECT TOP 1 *
FROM ranked r2
WHERE r2.clientID=r.clientID
AND r2.projectID<r.projectID
AND r2.InvRank=r.InvRank-1
ORDER BY invitationID ASC
) earlierProject
ORDER BY r.clientID,r.projectID,r.invitationID;


The invitationID will be NULL in case of "0" in your table and set to the needed value in case of an item found.



Hint



There's no need for the APPLY actually. If you'd need the invitationID only, you might place the sub-query as a column directly (slightly faster). But this is better to read and you could get hands on the other columns as well...






share|improve this answer























  • Does what it needs to do. I am going to test this on the actual dataset :)

    – Henrov
    Nov 14 '18 at 12:14






  • 1





    @Henrov If performance matters (lot of rows) it might help to write the intermediate set into a temp table and place an indexes all involved columns (also on InvRank).

    – Shnugo
    Nov 14 '18 at 12:54






  • 1





    Dense_RANKI() works like a charm in this situation. Using a intermediate table with the proper indexes made for good performance . I swear my SSIS server was humming a happy tune when it was crunchinrg the numbers :D

    – Henrov
    Nov 23 '18 at 15:41
















2














Might this help?



declare @table table (
[clientID] [int] NULL,
[projectID] [int] NULL,
[invitationID] [int] NULL
)
INSERT @table ([clientID], [projectID], [invitationID]) VALUES
(14, 267, 489),
(14, 267, 325),
(16, 385, 475),
(17, 546, NULL),
(17, 547, 885),
(17, 548, 901),
(18, 721, 905),
(18, 834, 906),
(18, 834, 907),
(19, 856, 908),
(19, 856, 929),
(19, 857, 931),
(19, 857, 945),
(19, 858, NULL);


--The query uses DENSE_RANK() and a correlated sub-query



WITH ranked AS
(
SELECT t.*
,DENSE_RANK() OVER(PARTITION BY t.clientID ORDER BY t.projectID) AS InvRank
FROM @table t
)
SELECT r.*
,earlierProject.invitationID
FROM ranked r
OUTER APPLY(SELECT TOP 1 *
FROM ranked r2
WHERE r2.clientID=r.clientID
AND r2.projectID<r.projectID
AND r2.InvRank=r.InvRank-1
ORDER BY invitationID ASC
) earlierProject
ORDER BY r.clientID,r.projectID,r.invitationID;


The invitationID will be NULL in case of "0" in your table and set to the needed value in case of an item found.



Hint



There's no need for the APPLY actually. If you'd need the invitationID only, you might place the sub-query as a column directly (slightly faster). But this is better to read and you could get hands on the other columns as well...






share|improve this answer























  • Does what it needs to do. I am going to test this on the actual dataset :)

    – Henrov
    Nov 14 '18 at 12:14






  • 1





    @Henrov If performance matters (lot of rows) it might help to write the intermediate set into a temp table and place an indexes all involved columns (also on InvRank).

    – Shnugo
    Nov 14 '18 at 12:54






  • 1





    Dense_RANKI() works like a charm in this situation. Using a intermediate table with the proper indexes made for good performance . I swear my SSIS server was humming a happy tune when it was crunchinrg the numbers :D

    – Henrov
    Nov 23 '18 at 15:41














2












2








2







Might this help?



declare @table table (
[clientID] [int] NULL,
[projectID] [int] NULL,
[invitationID] [int] NULL
)
INSERT @table ([clientID], [projectID], [invitationID]) VALUES
(14, 267, 489),
(14, 267, 325),
(16, 385, 475),
(17, 546, NULL),
(17, 547, 885),
(17, 548, 901),
(18, 721, 905),
(18, 834, 906),
(18, 834, 907),
(19, 856, 908),
(19, 856, 929),
(19, 857, 931),
(19, 857, 945),
(19, 858, NULL);


--The query uses DENSE_RANK() and a correlated sub-query



WITH ranked AS
(
SELECT t.*
,DENSE_RANK() OVER(PARTITION BY t.clientID ORDER BY t.projectID) AS InvRank
FROM @table t
)
SELECT r.*
,earlierProject.invitationID
FROM ranked r
OUTER APPLY(SELECT TOP 1 *
FROM ranked r2
WHERE r2.clientID=r.clientID
AND r2.projectID<r.projectID
AND r2.InvRank=r.InvRank-1
ORDER BY invitationID ASC
) earlierProject
ORDER BY r.clientID,r.projectID,r.invitationID;


The invitationID will be NULL in case of "0" in your table and set to the needed value in case of an item found.



Hint



There's no need for the APPLY actually. If you'd need the invitationID only, you might place the sub-query as a column directly (slightly faster). But this is better to read and you could get hands on the other columns as well...






share|improve this answer













Might this help?



declare @table table (
[clientID] [int] NULL,
[projectID] [int] NULL,
[invitationID] [int] NULL
)
INSERT @table ([clientID], [projectID], [invitationID]) VALUES
(14, 267, 489),
(14, 267, 325),
(16, 385, 475),
(17, 546, NULL),
(17, 547, 885),
(17, 548, 901),
(18, 721, 905),
(18, 834, 906),
(18, 834, 907),
(19, 856, 908),
(19, 856, 929),
(19, 857, 931),
(19, 857, 945),
(19, 858, NULL);


--The query uses DENSE_RANK() and a correlated sub-query



WITH ranked AS
(
SELECT t.*
,DENSE_RANK() OVER(PARTITION BY t.clientID ORDER BY t.projectID) AS InvRank
FROM @table t
)
SELECT r.*
,earlierProject.invitationID
FROM ranked r
OUTER APPLY(SELECT TOP 1 *
FROM ranked r2
WHERE r2.clientID=r.clientID
AND r2.projectID<r.projectID
AND r2.InvRank=r.InvRank-1
ORDER BY invitationID ASC
) earlierProject
ORDER BY r.clientID,r.projectID,r.invitationID;


The invitationID will be NULL in case of "0" in your table and set to the needed value in case of an item found.



Hint



There's no need for the APPLY actually. If you'd need the invitationID only, you might place the sub-query as a column directly (slightly faster). But this is better to read and you could get hands on the other columns as well...







share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 14 '18 at 11:31









ShnugoShnugo

49.7k72669




49.7k72669












  • Does what it needs to do. I am going to test this on the actual dataset :)

    – Henrov
    Nov 14 '18 at 12:14






  • 1





    @Henrov If performance matters (lot of rows) it might help to write the intermediate set into a temp table and place an indexes all involved columns (also on InvRank).

    – Shnugo
    Nov 14 '18 at 12:54






  • 1





    Dense_RANKI() works like a charm in this situation. Using a intermediate table with the proper indexes made for good performance . I swear my SSIS server was humming a happy tune when it was crunchinrg the numbers :D

    – Henrov
    Nov 23 '18 at 15:41


















  • Does what it needs to do. I am going to test this on the actual dataset :)

    – Henrov
    Nov 14 '18 at 12:14






  • 1





    @Henrov If performance matters (lot of rows) it might help to write the intermediate set into a temp table and place an indexes all involved columns (also on InvRank).

    – Shnugo
    Nov 14 '18 at 12:54






  • 1





    Dense_RANKI() works like a charm in this situation. Using a intermediate table with the proper indexes made for good performance . I swear my SSIS server was humming a happy tune when it was crunchinrg the numbers :D

    – Henrov
    Nov 23 '18 at 15:41

















Does what it needs to do. I am going to test this on the actual dataset :)

– Henrov
Nov 14 '18 at 12:14





Does what it needs to do. I am going to test this on the actual dataset :)

– Henrov
Nov 14 '18 at 12:14




1




1





@Henrov If performance matters (lot of rows) it might help to write the intermediate set into a temp table and place an indexes all involved columns (also on InvRank).

– Shnugo
Nov 14 '18 at 12:54





@Henrov If performance matters (lot of rows) it might help to write the intermediate set into a temp table and place an indexes all involved columns (also on InvRank).

– Shnugo
Nov 14 '18 at 12:54




1




1





Dense_RANKI() works like a charm in this situation. Using a intermediate table with the proper indexes made for good performance . I swear my SSIS server was humming a happy tune when it was crunchinrg the numbers :D

– Henrov
Nov 23 '18 at 15:41






Dense_RANKI() works like a charm in this situation. Using a intermediate table with the proper indexes made for good performance . I swear my SSIS server was humming a happy tune when it was crunchinrg the numbers :D

– Henrov
Nov 23 '18 at 15:41














1














You need a column that specifies the ordering. Let me assume that there is an invitation date as well as the other columns.



With this information, your flag is easily calculated by comparing two values:



  • minimum invitation date for the client

  • minimum invitation date for the client/project id

When these are the same, this is the first project with an invitation.



So:



select t.*,
(case when min(invitationDate) over (partition by clientId order by invitationDate) =
min(invitationDate) over (partition by clientId, projectId order by invitationDate)
then 0 else 1
end) as InvitedForPreviousProject
from @table t;





share|improve this answer























  • That seems like smart solution, thank you. I will test that too

    – Henrov
    Nov 14 '18 at 13:16















1














You need a column that specifies the ordering. Let me assume that there is an invitation date as well as the other columns.



With this information, your flag is easily calculated by comparing two values:



  • minimum invitation date for the client

  • minimum invitation date for the client/project id

When these are the same, this is the first project with an invitation.



So:



select t.*,
(case when min(invitationDate) over (partition by clientId order by invitationDate) =
min(invitationDate) over (partition by clientId, projectId order by invitationDate)
then 0 else 1
end) as InvitedForPreviousProject
from @table t;





share|improve this answer























  • That seems like smart solution, thank you. I will test that too

    – Henrov
    Nov 14 '18 at 13:16













1












1








1







You need a column that specifies the ordering. Let me assume that there is an invitation date as well as the other columns.



With this information, your flag is easily calculated by comparing two values:



  • minimum invitation date for the client

  • minimum invitation date for the client/project id

When these are the same, this is the first project with an invitation.



So:



select t.*,
(case when min(invitationDate) over (partition by clientId order by invitationDate) =
min(invitationDate) over (partition by clientId, projectId order by invitationDate)
then 0 else 1
end) as InvitedForPreviousProject
from @table t;





share|improve this answer













You need a column that specifies the ordering. Let me assume that there is an invitation date as well as the other columns.



With this information, your flag is easily calculated by comparing two values:



  • minimum invitation date for the client

  • minimum invitation date for the client/project id

When these are the same, this is the first project with an invitation.



So:



select t.*,
(case when min(invitationDate) over (partition by clientId order by invitationDate) =
min(invitationDate) over (partition by clientId, projectId order by invitationDate)
then 0 else 1
end) as InvitedForPreviousProject
from @table t;






share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 14 '18 at 12:26









Gordon LinoffGordon Linoff

775k35306409




775k35306409












  • That seems like smart solution, thank you. I will test that too

    – Henrov
    Nov 14 '18 at 13:16

















  • That seems like smart solution, thank you. I will test that too

    – Henrov
    Nov 14 '18 at 13:16
















That seems like smart solution, thank you. I will test that too

– Henrov
Nov 14 '18 at 13:16





That seems like smart solution, thank you. I will test that too

– Henrov
Nov 14 '18 at 13:16

















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53298418%2fget-lead-value-over-multiple-partitions%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







這個網誌中的熱門文章

How to read a connectionString WITH PROVIDER in .NET Core?

In R, how to develop a multiplot heatmap.2 figure showing key labels successfully

Museum of Modern and Contemporary Art of Trento and Rovereto