Efficient way to find top K products given two lists










2















Given two lists of equal length N, I want to find the K largest products that can be made by multiplying an element from each list. For example, if



> A = [7, 9, 4, 1, 6]
> B = [8, 1, 3, 10, 7]
> K = 3


the result is [90, 72, 70] or [9*10, 9*8, 7*10], found by



> sorted([x*y for x in A for y in B], reverse=True)[:K]
[90, 72, 70]


Is there a more efficient algorithm that doesn't involve multiplying out all N^2 pairs?










share|improve this question
























  • You do not need to multiply every elements of the lists. If you need K values, you can just retrieve first K max values of each lists and multiply each pair. But I think there is an even more optimized solution.

    – LostReality
    Nov 15 '18 at 10:49















2















Given two lists of equal length N, I want to find the K largest products that can be made by multiplying an element from each list. For example, if



> A = [7, 9, 4, 1, 6]
> B = [8, 1, 3, 10, 7]
> K = 3


the result is [90, 72, 70] or [9*10, 9*8, 7*10], found by



> sorted([x*y for x in A for y in B], reverse=True)[:K]
[90, 72, 70]


Is there a more efficient algorithm that doesn't involve multiplying out all N^2 pairs?










share|improve this question
























  • You do not need to multiply every elements of the lists. If you need K values, you can just retrieve first K max values of each lists and multiply each pair. But I think there is an even more optimized solution.

    – LostReality
    Nov 15 '18 at 10:49













2












2








2








Given two lists of equal length N, I want to find the K largest products that can be made by multiplying an element from each list. For example, if



> A = [7, 9, 4, 1, 6]
> B = [8, 1, 3, 10, 7]
> K = 3


the result is [90, 72, 70] or [9*10, 9*8, 7*10], found by



> sorted([x*y for x in A for y in B], reverse=True)[:K]
[90, 72, 70]


Is there a more efficient algorithm that doesn't involve multiplying out all N^2 pairs?










share|improve this question
















Given two lists of equal length N, I want to find the K largest products that can be made by multiplying an element from each list. For example, if



> A = [7, 9, 4, 1, 6]
> B = [8, 1, 3, 10, 7]
> K = 3


the result is [90, 72, 70] or [9*10, 9*8, 7*10], found by



> sorted([x*y for x in A for y in B], reverse=True)[:K]
[90, 72, 70]


Is there a more efficient algorithm that doesn't involve multiplying out all N^2 pairs?







python algorithm big-o






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 15 '18 at 15:09









Peter B

13.5k52045




13.5k52045










asked Nov 15 '18 at 10:30









FlashFlash

8,43385382




8,43385382












  • You do not need to multiply every elements of the lists. If you need K values, you can just retrieve first K max values of each lists and multiply each pair. But I think there is an even more optimized solution.

    – LostReality
    Nov 15 '18 at 10:49

















  • You do not need to multiply every elements of the lists. If you need K values, you can just retrieve first K max values of each lists and multiply each pair. But I think there is an even more optimized solution.

    – LostReality
    Nov 15 '18 at 10:49
















You do not need to multiply every elements of the lists. If you need K values, you can just retrieve first K max values of each lists and multiply each pair. But I think there is an even more optimized solution.

– LostReality
Nov 15 '18 at 10:49





You do not need to multiply every elements of the lists. If you need K values, you can just retrieve first K max values of each lists and multiply each pair. But I think there is an even more optimized solution.

– LostReality
Nov 15 '18 at 10:49












3 Answers
3






active

oldest

votes


















3














As already noted, the first step is to sort both lists A and B in descending order (or just the K largest of both lists). Then, all the max K products will sit in a roughly triangular area in the top-left corner, the max product being A[0]*B[0]. In other words, if A[i]*B[j] is in the top K, then so must be both A[i-1]*B[j] and A[i]*B[j-1] (assuming i, j > 0).



Thus, you can start in the top-left corner and then use a Heap to expand both the "lower" and the "right" neighbor of the current element and put those onto the heap, too, until you have all the K elements you need. Or start with all the K largest elements of A paired with the largest from B already on the heap and only expand in one direction.





Example in Python, using the heapq module, but the same will work in almost any other language. Note that we are adding negative products to the heap as the heap will be sorted smallest-first.



def top_k_prod(A, B, k):
A = heapq.nlargest(k, A)
B = heapq.nlargest(k, B)
result =
heap = [(-A[i] * B[0], i, 0) for i in range(len(A))]
while heap and len(result) < k:
p, a, b = heapq.heappop(heap)
result.append(-p)
if b < len(B)-1:
heapq.heappush(heap, (-A[a] * B[b+1], a, b+1))
return result


Example:



import random
A = [random.randint(0, 100) for _ in range(100)]
B = [random.randint(0, 100) for _ in range(100)]
K = 20
result = top_k_prod(A, B, K)
test = sorted([x*y for x in A for y in B], reverse=True)[:K]
print(result)
# [9900, 9702, 9603, 9600, 9504, 9408, 9405, 9405, 9400, 9400, 9312, 9306, 9300, 9216, 9212, 9212, 9207, 9200, 9120, 9120]
print(result == test)
# True


The complexity should be about O(NlogN + KlogK) for sorting A and B and then about K iterations with heap-operations in the loop. Each cell in the triangular "target" region will only be expanded once from its left neighbor, and cells added to the heap but not used are also limited to K (one in each "row"), giving a maximum of 2*K elements inspected.






share|improve this answer

























  • This can be improved to O(N + KlogK + K) by using numpy.partition(A,n-k)[-k:] to find the K largest values in O(N), and only then sorting them.

    – wowserx
    Nov 15 '18 at 15:22












  • Can also use heapq.nlargest(k, A) to find the largest K values in order in O(KlogK).

    – wowserx
    Nov 15 '18 at 15:26











  • @wowserx Maybe, but originally the question was not even tagged as Python but just a generic algorithm question, so I wanted to keep it generic. You can use a heap in (almost) any language.

    – tobias_k
    Nov 15 '18 at 15:36











  • Right, and you can use a heap in exactly the same way: O(N) heapify, then K heappops for O(N + KlogK + K). I'm saying the algorithmic performance is better if you only sort the K largest elements, instead of calling A = sorted(A, reverse=True) right away.

    – wowserx
    Nov 15 '18 at 16:13


















0














Practical solution:



Find largest K elements from list A and K largest elements from list B by using partial_sort (this is a well-known modification of quick sort, and I am sure python has the same in its library). Largest products formed by these new lists are also the largest products of the original lists. Then use max-heap (priority queue) to find K largest products from new lists.






share|improve this answer






























    0














    If we would find out K max values from both the lists, we would have the max K products from both the lists.



    I would suggest two approaches to find out K max values:



    1. If K <<< N ( K in 10s and N in millions )

      Here you have couple of options.

      • You can use selection algorithm K times for both the lists. That would take O(N*K)

      • K iterations of either Selection Sort or Bubble Sort. You would have K max values at either at the beginning or at the end of the array depending on the type of implementation. Even that would be O(N*K)



    Note that because K <<< N you can say that O(N*K) is almost O(N)





    1. K can be as same as N

      • In this case, The best bet would be to just sort both the lists using Merge Sort or Quick Sort. That would be O(N*lgN)






    share|improve this answer























    • "because K <<< N you can say that O(N * K) is almost O(N)" Well, by that reasoning you could also say that sorting the entire list in O(nlogn) is almost O(n). And I agree, for mot practical purposes, it probably is. But then, why not just sort the entire list? In fact, with K=10 and N=1000000, nlogn is still smaller than n*k.

      – tobias_k
      Nov 15 '18 at 13:31












    • Also, even if you have the max K values from both lists, you still have to find the max K of those K² pairs/products of numbers. This part is not really addressed.

      – tobias_k
      Nov 15 '18 at 13:35











    Your Answer






    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "1"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53317379%2fefficient-way-to-find-top-k-products-given-two-lists%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    3














    As already noted, the first step is to sort both lists A and B in descending order (or just the K largest of both lists). Then, all the max K products will sit in a roughly triangular area in the top-left corner, the max product being A[0]*B[0]. In other words, if A[i]*B[j] is in the top K, then so must be both A[i-1]*B[j] and A[i]*B[j-1] (assuming i, j > 0).



    Thus, you can start in the top-left corner and then use a Heap to expand both the "lower" and the "right" neighbor of the current element and put those onto the heap, too, until you have all the K elements you need. Or start with all the K largest elements of A paired with the largest from B already on the heap and only expand in one direction.





    Example in Python, using the heapq module, but the same will work in almost any other language. Note that we are adding negative products to the heap as the heap will be sorted smallest-first.



    def top_k_prod(A, B, k):
    A = heapq.nlargest(k, A)
    B = heapq.nlargest(k, B)
    result =
    heap = [(-A[i] * B[0], i, 0) for i in range(len(A))]
    while heap and len(result) < k:
    p, a, b = heapq.heappop(heap)
    result.append(-p)
    if b < len(B)-1:
    heapq.heappush(heap, (-A[a] * B[b+1], a, b+1))
    return result


    Example:



    import random
    A = [random.randint(0, 100) for _ in range(100)]
    B = [random.randint(0, 100) for _ in range(100)]
    K = 20
    result = top_k_prod(A, B, K)
    test = sorted([x*y for x in A for y in B], reverse=True)[:K]
    print(result)
    # [9900, 9702, 9603, 9600, 9504, 9408, 9405, 9405, 9400, 9400, 9312, 9306, 9300, 9216, 9212, 9212, 9207, 9200, 9120, 9120]
    print(result == test)
    # True


    The complexity should be about O(NlogN + KlogK) for sorting A and B and then about K iterations with heap-operations in the loop. Each cell in the triangular "target" region will only be expanded once from its left neighbor, and cells added to the heap but not used are also limited to K (one in each "row"), giving a maximum of 2*K elements inspected.






    share|improve this answer

























    • This can be improved to O(N + KlogK + K) by using numpy.partition(A,n-k)[-k:] to find the K largest values in O(N), and only then sorting them.

      – wowserx
      Nov 15 '18 at 15:22












    • Can also use heapq.nlargest(k, A) to find the largest K values in order in O(KlogK).

      – wowserx
      Nov 15 '18 at 15:26











    • @wowserx Maybe, but originally the question was not even tagged as Python but just a generic algorithm question, so I wanted to keep it generic. You can use a heap in (almost) any language.

      – tobias_k
      Nov 15 '18 at 15:36











    • Right, and you can use a heap in exactly the same way: O(N) heapify, then K heappops for O(N + KlogK + K). I'm saying the algorithmic performance is better if you only sort the K largest elements, instead of calling A = sorted(A, reverse=True) right away.

      – wowserx
      Nov 15 '18 at 16:13















    3














    As already noted, the first step is to sort both lists A and B in descending order (or just the K largest of both lists). Then, all the max K products will sit in a roughly triangular area in the top-left corner, the max product being A[0]*B[0]. In other words, if A[i]*B[j] is in the top K, then so must be both A[i-1]*B[j] and A[i]*B[j-1] (assuming i, j > 0).



    Thus, you can start in the top-left corner and then use a Heap to expand both the "lower" and the "right" neighbor of the current element and put those onto the heap, too, until you have all the K elements you need. Or start with all the K largest elements of A paired with the largest from B already on the heap and only expand in one direction.





    Example in Python, using the heapq module, but the same will work in almost any other language. Note that we are adding negative products to the heap as the heap will be sorted smallest-first.



    def top_k_prod(A, B, k):
    A = heapq.nlargest(k, A)
    B = heapq.nlargest(k, B)
    result =
    heap = [(-A[i] * B[0], i, 0) for i in range(len(A))]
    while heap and len(result) < k:
    p, a, b = heapq.heappop(heap)
    result.append(-p)
    if b < len(B)-1:
    heapq.heappush(heap, (-A[a] * B[b+1], a, b+1))
    return result


    Example:



    import random
    A = [random.randint(0, 100) for _ in range(100)]
    B = [random.randint(0, 100) for _ in range(100)]
    K = 20
    result = top_k_prod(A, B, K)
    test = sorted([x*y for x in A for y in B], reverse=True)[:K]
    print(result)
    # [9900, 9702, 9603, 9600, 9504, 9408, 9405, 9405, 9400, 9400, 9312, 9306, 9300, 9216, 9212, 9212, 9207, 9200, 9120, 9120]
    print(result == test)
    # True


    The complexity should be about O(NlogN + KlogK) for sorting A and B and then about K iterations with heap-operations in the loop. Each cell in the triangular "target" region will only be expanded once from its left neighbor, and cells added to the heap but not used are also limited to K (one in each "row"), giving a maximum of 2*K elements inspected.






    share|improve this answer

























    • This can be improved to O(N + KlogK + K) by using numpy.partition(A,n-k)[-k:] to find the K largest values in O(N), and only then sorting them.

      – wowserx
      Nov 15 '18 at 15:22












    • Can also use heapq.nlargest(k, A) to find the largest K values in order in O(KlogK).

      – wowserx
      Nov 15 '18 at 15:26











    • @wowserx Maybe, but originally the question was not even tagged as Python but just a generic algorithm question, so I wanted to keep it generic. You can use a heap in (almost) any language.

      – tobias_k
      Nov 15 '18 at 15:36











    • Right, and you can use a heap in exactly the same way: O(N) heapify, then K heappops for O(N + KlogK + K). I'm saying the algorithmic performance is better if you only sort the K largest elements, instead of calling A = sorted(A, reverse=True) right away.

      – wowserx
      Nov 15 '18 at 16:13













    3












    3








    3







    As already noted, the first step is to sort both lists A and B in descending order (or just the K largest of both lists). Then, all the max K products will sit in a roughly triangular area in the top-left corner, the max product being A[0]*B[0]. In other words, if A[i]*B[j] is in the top K, then so must be both A[i-1]*B[j] and A[i]*B[j-1] (assuming i, j > 0).



    Thus, you can start in the top-left corner and then use a Heap to expand both the "lower" and the "right" neighbor of the current element and put those onto the heap, too, until you have all the K elements you need. Or start with all the K largest elements of A paired with the largest from B already on the heap and only expand in one direction.





    Example in Python, using the heapq module, but the same will work in almost any other language. Note that we are adding negative products to the heap as the heap will be sorted smallest-first.



    def top_k_prod(A, B, k):
    A = heapq.nlargest(k, A)
    B = heapq.nlargest(k, B)
    result =
    heap = [(-A[i] * B[0], i, 0) for i in range(len(A))]
    while heap and len(result) < k:
    p, a, b = heapq.heappop(heap)
    result.append(-p)
    if b < len(B)-1:
    heapq.heappush(heap, (-A[a] * B[b+1], a, b+1))
    return result


    Example:



    import random
    A = [random.randint(0, 100) for _ in range(100)]
    B = [random.randint(0, 100) for _ in range(100)]
    K = 20
    result = top_k_prod(A, B, K)
    test = sorted([x*y for x in A for y in B], reverse=True)[:K]
    print(result)
    # [9900, 9702, 9603, 9600, 9504, 9408, 9405, 9405, 9400, 9400, 9312, 9306, 9300, 9216, 9212, 9212, 9207, 9200, 9120, 9120]
    print(result == test)
    # True


    The complexity should be about O(NlogN + KlogK) for sorting A and B and then about K iterations with heap-operations in the loop. Each cell in the triangular "target" region will only be expanded once from its left neighbor, and cells added to the heap but not used are also limited to K (one in each "row"), giving a maximum of 2*K elements inspected.






    share|improve this answer















    As already noted, the first step is to sort both lists A and B in descending order (or just the K largest of both lists). Then, all the max K products will sit in a roughly triangular area in the top-left corner, the max product being A[0]*B[0]. In other words, if A[i]*B[j] is in the top K, then so must be both A[i-1]*B[j] and A[i]*B[j-1] (assuming i, j > 0).



    Thus, you can start in the top-left corner and then use a Heap to expand both the "lower" and the "right" neighbor of the current element and put those onto the heap, too, until you have all the K elements you need. Or start with all the K largest elements of A paired with the largest from B already on the heap and only expand in one direction.





    Example in Python, using the heapq module, but the same will work in almost any other language. Note that we are adding negative products to the heap as the heap will be sorted smallest-first.



    def top_k_prod(A, B, k):
    A = heapq.nlargest(k, A)
    B = heapq.nlargest(k, B)
    result =
    heap = [(-A[i] * B[0], i, 0) for i in range(len(A))]
    while heap and len(result) < k:
    p, a, b = heapq.heappop(heap)
    result.append(-p)
    if b < len(B)-1:
    heapq.heappush(heap, (-A[a] * B[b+1], a, b+1))
    return result


    Example:



    import random
    A = [random.randint(0, 100) for _ in range(100)]
    B = [random.randint(0, 100) for _ in range(100)]
    K = 20
    result = top_k_prod(A, B, K)
    test = sorted([x*y for x in A for y in B], reverse=True)[:K]
    print(result)
    # [9900, 9702, 9603, 9600, 9504, 9408, 9405, 9405, 9400, 9400, 9312, 9306, 9300, 9216, 9212, 9212, 9207, 9200, 9120, 9120]
    print(result == test)
    # True


    The complexity should be about O(NlogN + KlogK) for sorting A and B and then about K iterations with heap-operations in the loop. Each cell in the triangular "target" region will only be expanded once from its left neighbor, and cells added to the heap but not used are also limited to K (one in each "row"), giving a maximum of 2*K elements inspected.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 15 '18 at 17:25

























    answered Nov 15 '18 at 11:18









    tobias_ktobias_k

    59k969108




    59k969108












    • This can be improved to O(N + KlogK + K) by using numpy.partition(A,n-k)[-k:] to find the K largest values in O(N), and only then sorting them.

      – wowserx
      Nov 15 '18 at 15:22












    • Can also use heapq.nlargest(k, A) to find the largest K values in order in O(KlogK).

      – wowserx
      Nov 15 '18 at 15:26











    • @wowserx Maybe, but originally the question was not even tagged as Python but just a generic algorithm question, so I wanted to keep it generic. You can use a heap in (almost) any language.

      – tobias_k
      Nov 15 '18 at 15:36











    • Right, and you can use a heap in exactly the same way: O(N) heapify, then K heappops for O(N + KlogK + K). I'm saying the algorithmic performance is better if you only sort the K largest elements, instead of calling A = sorted(A, reverse=True) right away.

      – wowserx
      Nov 15 '18 at 16:13

















    • This can be improved to O(N + KlogK + K) by using numpy.partition(A,n-k)[-k:] to find the K largest values in O(N), and only then sorting them.

      – wowserx
      Nov 15 '18 at 15:22












    • Can also use heapq.nlargest(k, A) to find the largest K values in order in O(KlogK).

      – wowserx
      Nov 15 '18 at 15:26











    • @wowserx Maybe, but originally the question was not even tagged as Python but just a generic algorithm question, so I wanted to keep it generic. You can use a heap in (almost) any language.

      – tobias_k
      Nov 15 '18 at 15:36











    • Right, and you can use a heap in exactly the same way: O(N) heapify, then K heappops for O(N + KlogK + K). I'm saying the algorithmic performance is better if you only sort the K largest elements, instead of calling A = sorted(A, reverse=True) right away.

      – wowserx
      Nov 15 '18 at 16:13
















    This can be improved to O(N + KlogK + K) by using numpy.partition(A,n-k)[-k:] to find the K largest values in O(N), and only then sorting them.

    – wowserx
    Nov 15 '18 at 15:22






    This can be improved to O(N + KlogK + K) by using numpy.partition(A,n-k)[-k:] to find the K largest values in O(N), and only then sorting them.

    – wowserx
    Nov 15 '18 at 15:22














    Can also use heapq.nlargest(k, A) to find the largest K values in order in O(KlogK).

    – wowserx
    Nov 15 '18 at 15:26





    Can also use heapq.nlargest(k, A) to find the largest K values in order in O(KlogK).

    – wowserx
    Nov 15 '18 at 15:26













    @wowserx Maybe, but originally the question was not even tagged as Python but just a generic algorithm question, so I wanted to keep it generic. You can use a heap in (almost) any language.

    – tobias_k
    Nov 15 '18 at 15:36





    @wowserx Maybe, but originally the question was not even tagged as Python but just a generic algorithm question, so I wanted to keep it generic. You can use a heap in (almost) any language.

    – tobias_k
    Nov 15 '18 at 15:36













    Right, and you can use a heap in exactly the same way: O(N) heapify, then K heappops for O(N + KlogK + K). I'm saying the algorithmic performance is better if you only sort the K largest elements, instead of calling A = sorted(A, reverse=True) right away.

    – wowserx
    Nov 15 '18 at 16:13





    Right, and you can use a heap in exactly the same way: O(N) heapify, then K heappops for O(N + KlogK + K). I'm saying the algorithmic performance is better if you only sort the K largest elements, instead of calling A = sorted(A, reverse=True) right away.

    – wowserx
    Nov 15 '18 at 16:13













    0














    Practical solution:



    Find largest K elements from list A and K largest elements from list B by using partial_sort (this is a well-known modification of quick sort, and I am sure python has the same in its library). Largest products formed by these new lists are also the largest products of the original lists. Then use max-heap (priority queue) to find K largest products from new lists.






    share|improve this answer



























      0














      Practical solution:



      Find largest K elements from list A and K largest elements from list B by using partial_sort (this is a well-known modification of quick sort, and I am sure python has the same in its library). Largest products formed by these new lists are also the largest products of the original lists. Then use max-heap (priority queue) to find K largest products from new lists.






      share|improve this answer

























        0












        0








        0







        Practical solution:



        Find largest K elements from list A and K largest elements from list B by using partial_sort (this is a well-known modification of quick sort, and I am sure python has the same in its library). Largest products formed by these new lists are also the largest products of the original lists. Then use max-heap (priority queue) to find K largest products from new lists.






        share|improve this answer













        Practical solution:



        Find largest K elements from list A and K largest elements from list B by using partial_sort (this is a well-known modification of quick sort, and I am sure python has the same in its library). Largest products formed by these new lists are also the largest products of the original lists. Then use max-heap (priority queue) to find K largest products from new lists.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 15 '18 at 10:47









        gudokgudok

        2,79121224




        2,79121224





















            0














            If we would find out K max values from both the lists, we would have the max K products from both the lists.



            I would suggest two approaches to find out K max values:



            1. If K <<< N ( K in 10s and N in millions )

              Here you have couple of options.

              • You can use selection algorithm K times for both the lists. That would take O(N*K)

              • K iterations of either Selection Sort or Bubble Sort. You would have K max values at either at the beginning or at the end of the array depending on the type of implementation. Even that would be O(N*K)



            Note that because K <<< N you can say that O(N*K) is almost O(N)





            1. K can be as same as N

              • In this case, The best bet would be to just sort both the lists using Merge Sort or Quick Sort. That would be O(N*lgN)






            share|improve this answer























            • "because K <<< N you can say that O(N * K) is almost O(N)" Well, by that reasoning you could also say that sorting the entire list in O(nlogn) is almost O(n). And I agree, for mot practical purposes, it probably is. But then, why not just sort the entire list? In fact, with K=10 and N=1000000, nlogn is still smaller than n*k.

              – tobias_k
              Nov 15 '18 at 13:31












            • Also, even if you have the max K values from both lists, you still have to find the max K of those K² pairs/products of numbers. This part is not really addressed.

              – tobias_k
              Nov 15 '18 at 13:35
















            0














            If we would find out K max values from both the lists, we would have the max K products from both the lists.



            I would suggest two approaches to find out K max values:



            1. If K <<< N ( K in 10s and N in millions )

              Here you have couple of options.

              • You can use selection algorithm K times for both the lists. That would take O(N*K)

              • K iterations of either Selection Sort or Bubble Sort. You would have K max values at either at the beginning or at the end of the array depending on the type of implementation. Even that would be O(N*K)



            Note that because K <<< N you can say that O(N*K) is almost O(N)





            1. K can be as same as N

              • In this case, The best bet would be to just sort both the lists using Merge Sort or Quick Sort. That would be O(N*lgN)






            share|improve this answer























            • "because K <<< N you can say that O(N * K) is almost O(N)" Well, by that reasoning you could also say that sorting the entire list in O(nlogn) is almost O(n). And I agree, for mot practical purposes, it probably is. But then, why not just sort the entire list? In fact, with K=10 and N=1000000, nlogn is still smaller than n*k.

              – tobias_k
              Nov 15 '18 at 13:31












            • Also, even if you have the max K values from both lists, you still have to find the max K of those K² pairs/products of numbers. This part is not really addressed.

              – tobias_k
              Nov 15 '18 at 13:35














            0












            0








            0







            If we would find out K max values from both the lists, we would have the max K products from both the lists.



            I would suggest two approaches to find out K max values:



            1. If K <<< N ( K in 10s and N in millions )

              Here you have couple of options.

              • You can use selection algorithm K times for both the lists. That would take O(N*K)

              • K iterations of either Selection Sort or Bubble Sort. You would have K max values at either at the beginning or at the end of the array depending on the type of implementation. Even that would be O(N*K)



            Note that because K <<< N you can say that O(N*K) is almost O(N)





            1. K can be as same as N

              • In this case, The best bet would be to just sort both the lists using Merge Sort or Quick Sort. That would be O(N*lgN)






            share|improve this answer













            If we would find out K max values from both the lists, we would have the max K products from both the lists.



            I would suggest two approaches to find out K max values:



            1. If K <<< N ( K in 10s and N in millions )

              Here you have couple of options.

              • You can use selection algorithm K times for both the lists. That would take O(N*K)

              • K iterations of either Selection Sort or Bubble Sort. You would have K max values at either at the beginning or at the end of the array depending on the type of implementation. Even that would be O(N*K)



            Note that because K <<< N you can say that O(N*K) is almost O(N)





            1. K can be as same as N

              • In this case, The best bet would be to just sort both the lists using Merge Sort or Quick Sort. That would be O(N*lgN)







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Nov 15 '18 at 11:55









            Anand UndaviaAnand Undavia

            1,7461924




            1,7461924












            • "because K <<< N you can say that O(N * K) is almost O(N)" Well, by that reasoning you could also say that sorting the entire list in O(nlogn) is almost O(n). And I agree, for mot practical purposes, it probably is. But then, why not just sort the entire list? In fact, with K=10 and N=1000000, nlogn is still smaller than n*k.

              – tobias_k
              Nov 15 '18 at 13:31












            • Also, even if you have the max K values from both lists, you still have to find the max K of those K² pairs/products of numbers. This part is not really addressed.

              – tobias_k
              Nov 15 '18 at 13:35


















            • "because K <<< N you can say that O(N * K) is almost O(N)" Well, by that reasoning you could also say that sorting the entire list in O(nlogn) is almost O(n). And I agree, for mot practical purposes, it probably is. But then, why not just sort the entire list? In fact, with K=10 and N=1000000, nlogn is still smaller than n*k.

              – tobias_k
              Nov 15 '18 at 13:31












            • Also, even if you have the max K values from both lists, you still have to find the max K of those K² pairs/products of numbers. This part is not really addressed.

              – tobias_k
              Nov 15 '18 at 13:35

















            "because K <<< N you can say that O(N * K) is almost O(N)" Well, by that reasoning you could also say that sorting the entire list in O(nlogn) is almost O(n). And I agree, for mot practical purposes, it probably is. But then, why not just sort the entire list? In fact, with K=10 and N=1000000, nlogn is still smaller than n*k.

            – tobias_k
            Nov 15 '18 at 13:31






            "because K <<< N you can say that O(N * K) is almost O(N)" Well, by that reasoning you could also say that sorting the entire list in O(nlogn) is almost O(n). And I agree, for mot practical purposes, it probably is. But then, why not just sort the entire list? In fact, with K=10 and N=1000000, nlogn is still smaller than n*k.

            – tobias_k
            Nov 15 '18 at 13:31














            Also, even if you have the max K values from both lists, you still have to find the max K of those K² pairs/products of numbers. This part is not really addressed.

            – tobias_k
            Nov 15 '18 at 13:35






            Also, even if you have the max K values from both lists, you still have to find the max K of those K² pairs/products of numbers. This part is not really addressed.

            – tobias_k
            Nov 15 '18 at 13:35


















            draft saved

            draft discarded
















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53317379%2fefficient-way-to-find-top-k-products-given-two-lists%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            這個網誌中的熱門文章

            How to read a connectionString WITH PROVIDER in .NET Core?

            In R, how to develop a multiplot heatmap.2 figure showing key labels successfully

            Museum of Modern and Contemporary Art of Trento and Rovereto