Elasticsearch 6.3.1 returns different results on the cloud and locally despite using dfs_query_then_fetch (queried with Python's elasticsearch package)
I am using Elasticsearch to query data: I search for a medical term and get back the corresponding disease code. Here is my sample query:
es.search(index="myindex", body={"query": {"match": {"text_field": "search_term"}}}, search_type="dfs_query_then_fetch")
# Expected output - ABC
# Local Output - ABC
# Output on Amazon EMR - XYZ
The problem is that when I run it on the cloud, the output is completely different.
I have exactly the same index on the cloud and locally, yet the cloud results are wrong. We run it on an Amazon EMR instance, where I have even tried re-creating the index, but with no luck.
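One way to pin down where the two environments diverge is to send the identical query to both clusters and compare the ranked hits, not just the top code. This is a minimal sketch, not part of the original post: it works on the plain response dicts returned by `es.search()` and assumes each document stores its disease code in a hypothetical `code` field.

```python
# Hypothetical names: "text_field" comes from the question, but the "code"
# field holding the disease code is an assumption about the documents.

def build_match_query(field, term):
    """Request body for a simple match query (a nested dict, not bare pairs)."""
    return {"query": {"match": {field: term}}}


def ranked_codes(response, code_field="code", n=5):
    """(doc id, code, score) for the top-n hits of an es.search() response."""
    hits = response["hits"]["hits"][:n]
    return [(h["_id"], h["_source"].get(code_field), h["_score"]) for h in hits]


def first_divergence(local_hits, cloud_hits):
    """First rank where the two lists differ in id or code, or None if identical."""
    for rank, (a, b) in enumerate(zip(local_hits, cloud_hits)):
        if a[:2] != b[:2]:  # scores may differ slightly; compare id and code only
            return rank
    if len(local_hits) != len(cloud_hits):
        return min(len(local_hits), len(cloud_hits))
    return None
```

With elasticsearch-py, each response would come from `es.search(index="myindex", body=build_match_query("text_field", "search_term"), search_type="dfs_query_then_fetch")` on a client pointed at the respective cluster; the host URLs are deployment-specific.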
Local OS: Ubuntu 16.04
OS on Amazon EMR: Amazon Linux
Any help would be really appreciated.
elasticsearch amazon-emr elasticsearch-py
asked Nov 12 at 7:05
Sagar Dawda
Same number of shards for the index locally and on cloud?
– Russ Cam
Nov 12 at 10:18
Yes, Russ. My index is small, so I use the default number of shards and replicas. It's consistent on both the cloud and the local machine.
– Sagar Dawda
Nov 12 at 10:33
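Beyond the shard count, the document count is a cheap check that catches an index whose contents differ between environments. A minimal sketch, assuming `es` is an elasticsearch-py 6.x client and `myindex` is a concrete index name rather than an alias (the settings response is keyed by the concrete index name):

```python
def index_fingerprint(es, index):
    """Collect settings and counts that commonly explain ranking differences."""
    # get_settings returns {index_name: {"settings": {"index": {...}}}};
    # setting values come back as strings, so convert them to int.
    settings = es.indices.get_settings(index=index)[index]["settings"]["index"]
    return {
        "number_of_shards": int(settings["number_of_shards"]),
        "number_of_replicas": int(settings["number_of_replicas"]),
        "doc_count": es.count(index=index)["count"],
    }


def diff_fingerprints(local, cloud):
    """Return {key: (local_value, cloud_value)} for every key that differs."""
    return {k: (local[k], cloud.get(k)) for k in local if local[k] != cloud.get(k)}
```

`diff_fingerprints(index_fingerprint(local_es, "myindex"), index_fingerprint(cloud_es, "myindex"))` returns an empty dict when both copies match on these points; `local_es` and `cloud_es` are hypothetical client names.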
2 Answers
Thanks to everyone who responded to my question.
I figured out what the problem was.
A bootstrap script on AWS starts the Elasticsearch service and, in parallel, runs my Python index-creation script.
Because the cluster takes some time to come up, a few requests time out during index creation. As a result, the index is only partially built, which explains the differing results.
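A bootstrap script can avoid this race by blocking until the cluster is healthy before indexing, then checking the final document count. A minimal sketch, assuming an elasticsearch-py client; it relies only on `cluster.health`, `indices.refresh`, and `count`, and uses short server-side timeouts so a node that is still starting is simply retried. The `load_documents` helper in the commented wiring is hypothetical.

```python
import time


def wait_for_cluster(es, status="yellow", timeout_s=300, poll_s=5):
    """Poll cluster health until it reaches `status` or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            health = es.cluster.health(wait_for_status=status, timeout="%ds" % poll_s)
            if not health.get("timed_out", False):
                return health
        except Exception:  # node not accepting connections yet; keep polling
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError("cluster did not reach %r within %ss" % (status, timeout_s))
        time.sleep(poll_s)


def verify_doc_count(es, index, expected):
    """Refresh the index and fail loudly if documents are missing."""
    es.indices.refresh(index=index)
    actual = es.count(index=index)["count"]
    if actual != expected:
        raise RuntimeError("%s has %d docs, expected %d" % (index, actual, expected))
    return actual


# Example wiring (hypothetical `load_documents` returns a list of bulk actions):
# from elasticsearch import Elasticsearch, helpers
# es = Elasticsearch(["http://localhost:9200"])
# wait_for_cluster(es)
# docs = load_documents()
# ok, errors = helpers.bulk(es, docs, raise_on_error=False)
# verify_doc_count(es, "myindex", len(docs))
```

Waiting for `yellow` rather than `green` is deliberate: on a single node, replica shards cannot be assigned, so `green` would never be reached.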
Hope this helps anyone running Elasticsearch on Amazon EMR.
Cheers!
answered Nov 12 at 18:19
Sagar Dawda
Try using the "preference" parameter while querying the data. Something like this:
es.search(index="myindex",
          body={"query": {"match": {"text_field": "search_term"}}},
          preference="_primary_first")
Update:
Some values, such as "_primary_first", are deprecated as of Elasticsearch 6.x and will be removed entirely in Elasticsearch 7.0.
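Since the `_primary*` values are going away, the Elasticsearch reference documents another option: a custom preference string (any value not starting with `_`). Searches that pass the same string use the same ordering of shard copies as long as the cluster state does not change, which keeps scores stable across repeated requests. A minimal sketch that builds the `es.search()` arguments; the session identifier is a hypothetical value chosen by the caller:

```python
def search_kwargs(term, session_id, field="text_field", index="myindex"):
    """Build keyword arguments for es.search() with a stable shard-copy preference."""
    # Values starting with "_" are reserved for built-in preference options.
    if session_id.startswith("_"):
        raise ValueError("custom preference values must not start with '_'")
    return {
        "index": index,
        "body": {"query": {"match": {field: term}}},
        "search_type": "dfs_query_then_fetch",
        # Same string -> same shard-copy ordering, while the cluster state is unchanged.
        "preference": session_id,
    }
```

For example, `es.search(**search_kwargs("search_term", "user-42"))`, where `"user-42"` is an arbitrary illustrative string. This only stabilizes which shard copies are read; it would not repair an index that is missing documents.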
edited Nov 13 at 7:10
answered Nov 12 at 10:14
Abhilash Bolla
Will run this query and let you know how it goes.
– Sagar Dawda
Nov 12 at 10:32
This option is deprecated in version 6.x and will be removed in 7. Since I am using dfs_query_then_fetch, it should ideally query all the shards.
– Sagar Dawda
Nov 12 at 12:29
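For context on that last point: `dfs_query_then_fetch` adds a pre-query phase that collects term and document frequencies from every shard, so all shards score with the same global statistics. That removes per-shard skew, but the statistics still come only from the documents that are actually in the index. A minimal sketch of the BM25 IDF term that Lucene uses by default in Elasticsearch 6.x shows how missing documents shift scores (the counts are illustrative, not from the post):

```python
import math


def bm25_idf(doc_count, doc_freq):
    """IDF term of Lucene's BM25Similarity: ln(1 + (N - n + 0.5) / (n + 0.5)).

    doc_count (N): documents considered for the field; doc_freq (n): documents
    containing the term. With dfs_query_then_fetch both are summed across shards.
    """
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Illustrative numbers only: a fully built index vs. one where some documents
# (including several that contain the term) were never indexed.
full = bm25_idf(doc_count=1000, doc_freq=10)
partial = bm25_idf(doc_count=900, doc_freq=2)
print("full index idf:    %.3f" % full)
print("partial index idf: %.3f" % partial)
```

Because the IDF of each query term changes when documents are missing, the relative ranking of hits can change too, even with identical queries and shard layouts.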