RASA NLU: Can't extract entity









I've trained my Rasa NLU model to recognize the content between square brackets as the pst entity. The training data covers both of the scenarios below, with more than 50 examples.




There are two scenarios, which differ only in spacing:



  1. When I pass http://www.google.comm, 1283923, [9283911,9309212,9283238], only the opening bracket [ is recognized as the pst entity.

  2. When I pass http://www.google.comm, 1283923, [9283911, 9309212, 9283238], it works fine and recognizes [9283911, 9309212, 9283238] as the pst entity, as expected.



For scenario 1, I've tried all the available pipelines, but each time only the opening square bracket [ is recognized as the pst entity.

The response for scenario 1 is:




{
  'intent': {
    'name': None,
    'confidence': 0.0
  },
  'entities': [
    {
      'start': 0,
      'end': 22,
      'value': 'http://www.google.comm',
      'entity': 'url',
      'confidence': 0.8052099168500071,
      'extractor': 'ner_crf'
    },
    {
      'start': 24,
      'end': 31,
      'value': '1283923',
      'entity': 'defect_id',
      'confidence': 0.8334249141074151,
      'extractor': 'ner_crf'
    },
    {
      'start': 33,
      'end': 34,
      'value': '[',
      'entity': 'pst',
      'confidence': 0.5615805162522188,
      'extractor': 'ner_crf'
    }
  ],
  'intent_ranking': [],
  'text': 'http://www.google.comm, 1283923, [9283911,9309212,9283238]'
}
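The start and end values in this response read as character offsets into text, so slicing the input shows what each extracted entity covers. A minimal sketch in plain Python, assuming zero-based offsets with an exclusive end (which is what the values above suggest); the helper name span_text is hypothetical and not part of Rasa NLU:

```python
# The input text and the three entity spans reported in the response above.
text = "http://www.google.comm, 1283923, [9283911,9309212,9283238]"
entities = [
    {"start": 0, "end": 22, "entity": "url"},
    {"start": 24, "end": 31, "entity": "defect_id"},
    {"start": 33, "end": 34, "entity": "pst"},
]


def span_text(text, entity):
    """Return the substring covered by an entity's start/end offsets."""
    return text[entity["start"]:entity["end"]]


extracted = {e["entity"]: span_text(text, e) for e in entities}

# Offsets the pst entity would need in order to cover the whole bracketed list.
expected_start = text.index("[")
expected_end = text.index("]") + 1

if __name__ == "__main__":
    for name, value in extracted.items():
        print(name, repr(value))
    print("expected pst span:", expected_start, expected_end,
          repr(text[expected_start:expected_end]))
```

The reported pst span ends after a single character, whereas a span covering the whole list would run from offset 33 to 58.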




Can anyone tell me what I am missing in the configuration? The only difference between the two inputs is spacing, and the model should have learned to handle it, since the training data covers both scenarios.










  • Which pipeline, and specifically which tokenizer, generated the above? What makes you think you need NLP rather than just a regex pattern matcher?
    – Caleb Keller
    Nov 10 at 16:25










  • I am running NLU for other intents and entities too, so I want to use only Rasa NLU for this project.
    – abhishake
    Nov 12 at 13:11










  • I am using only the spacy_sklearn pipeline. Should I use a different pipeline for extraction?
    – abhishake
    Nov 12 at 13:13














rasa-nlu






asked Nov 10 at 15:06 by abhishake











1 Answer

Using a regex is a good idea for your purpose. Rasa NLU accepts regex features in the training data, which help the ner_crf extractor recognize entities. Normal NLU training data looks something like this:




{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "Hi",
        "intent": "greet",
        "entities": []
      }
    ]
  }
}




You can provide regex data for training in the NLU JSON file as follows:




{
  "rasa_nlu_data": {
    "regex_features": [
      {
        "name": "pst",
        "pattern": "\\[.*\\]"
      }
    ]
  }
}




Reference: Regular Expression in Rasa NLU
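Before adding a pattern to regex_features, it can be checked against both spacing variants with Python's re module. A minimal sketch: the pattern \[[^\]]*\] and the helper find_pst are illustrative assumptions, not the answer's pattern and not part of Rasa NLU.

```python
import re

# Hypothetical pattern: an opening bracket, any run of characters
# other than "]", then a closing bracket.
PST_PATTERN = re.compile(r"\[[^\]]*\]")


def find_pst(text):
    """Return (start, end, value) for the first bracketed list, or None."""
    match = PST_PATTERN.search(text)
    if match is None:
        return None
    return match.start(), match.end(), match.group(0)


scenario_1 = "http://www.google.comm, 1283923, [9283911,9309212,9283238]"
scenario_2 = "http://www.google.comm, 1283923, [9283911, 9309212, 9283238]"

if __name__ == "__main__":
    print(find_pst(scenario_1))
    print(find_pst(scenario_2))
```

This only confirms that the pattern matches the raw text in both scenarios. Whether ner_crf then labels the full span still depends on the tokenizer and the training data.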






  • I've tried this solution, but unfortunately, it doesn't make any difference in the output.
    – abhishake
    Nov 12 at 16:26










answered Nov 12 at 15:48 by Karthik Sunil










