RASA NLU: Can't extract entity

I've trained my Rasa NLU model to recognize the content between square brackets as a pst entity. The training data covers both scenarios below, with more than 50 examples.

There are two scenarios, which differ only in spacing:

1. When I pass http://www.google.comm, 1283923, [9283911,9309212,9283238], only the opening bracket [ is recognized as the pst entity.

2. When I pass http://www.google.comm, 1283923, [9283911, 9309212, 9283238], it works fine and [9283911, 9309212, 9283238] is recognized as the pst entity, as expected.

For scenario 1, I've tried every pipeline I could, but each one recognizes only the opening square bracket [ as the pst entity.

This is the response I get:


    {
        'intent': {
            'name': None,
            'confidence': 0.0
        },
        'entities': [
            {
                'start': 0,
                'end': 22,
                'value': 'http://www.google.comm',
                'entity': 'url',
                'confidence': 0.8052099168500071,
                'extractor': 'ner_crf'
            },
            {
                'start': 24,
                'end': 31,
                'value': '1283923',
                'entity': 'defect_id',
                'confidence': 0.8334249141074151,
                'extractor': 'ner_crf'
            },
            {
                'start': 33,
                'end': 34,
                'value': '[',
                'entity': 'pst',
                'confidence': 0.5615805162522188,
                'extractor': 'ner_crf'
            }
        ],
        'intent_ranking': [],
        'text': 'http://www.google.comm, 1283923, [9283911,9309212,9283238]'
    }

Can anyone tell me what I am missing in the configuration? The only difference between the two inputs is the spacing, and my model should already know about both spacings, since the training data covers both scenarios.

asked Nov 10 at 15:06

abhishake

14519

Which pipeline, and specifically which tokenizer, generated the above? What makes you think you need NLP rather than just a regex pattern matcher?
– Caleb Keller
Nov 10 at 16:25

I am running NLU for other intents and entities too, so I want to use only Rasa NLU for this project.
– abhishake
Nov 12 at 13:11

I am using only the spacy_sklearn pipeline. Should I use a different pipeline for extraction?
– abhishake
Nov 12 at 13:13
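
To illustrate why the tokenizer question matters, here is a minimal sketch of how the two inputs break into different token sequences. It uses plain whitespace splitting as a stand-in; this is an assumption for illustration only, not the tokenizer that spacy_sklearn actually uses, and the helper name whitespace_tokens is made up. A real tokenizer will split differently, but the point is the same: an entity extractor that labels tokens sees a different sequence for each spacing.

```python
# Illustration only: whitespace splitting stands in for a real tokenizer.
compact = "http://www.google.comm, 1283923, [9283911,9309212,9283238]"
spaced = "http://www.google.comm, 1283923, [9283911, 9309212, 9283238]"


def whitespace_tokens(text):
    """Return (token, start, end) triples for whitespace-separated tokens."""
    tokens = []
    offset = 0
    for token in text.split():
        # Search from the end of the previous token so repeats map correctly.
        start = text.index(token, offset)
        end = start + len(token)
        tokens.append((token, start, end))
        offset = end
    return tokens


for label, text in (("compact", compact), ("spaced", spaced)):
    print(label, [tok for tok, _, _ in whitespace_tokens(text)])
```

Printing the real tokens from your own pipeline in the same way should show where the bracketed list is being split.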

rasa-nlu


1 Answer
It is a good idea to use a regex for your purpose. Rasa NLU supports entity extraction with regexes. Normal NLU training data looks something like this:


    {
        "rasa_nlu_data": {
            "common_examples": [
                {
                    "text": "Hi",
                    "intent": "greet",
                    "entities": []
                }
            ]
        }
    }

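You can provide regex data for training in the NLU JSON file as follows. Note that the square brackets must be escaped; otherwise the pattern is read as a character class.

    {
        "rasa_nlu_data": {
            "regex_features": [
                {
                    "name": "pst",
                    "pattern": "\\[.*\\]"
                }
            ]
        }
    }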

Reference: Regular Expression in Rasa NLU
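
As far as I know, regex_features in Rasa NLU only add extra features for the ner_crf extractor and do not extract entities by themselves, so the pattern alone may not change the output. If a plain pattern matcher is acceptable for this one entity, a minimal sketch with Python's standard re module is below. The helper extract_pst and the "regex_fallback" extractor label are hypothetical names, not part of Rasa NLU; the returned dicts just mimic the shape of the entities in the question's output.

```python
import re

# Escaped brackets match a literal "[...]" span; an unescaped "[..*]" would
# instead be a character class matching a single "." or "*".
PST_PATTERN = re.compile(r"\[[^\]]*\]")


def extract_pst(text):
    """Hypothetical helper: find bracketed lists, shaped like Rasa entities."""
    entities = []
    for match in PST_PATTERN.finditer(text):
        entities.append({
            "start": match.start(),
            "end": match.end(),
            "value": match.group(0),
            "entity": "pst",
            "extractor": "regex_fallback",  # made-up label, not a Rasa component
        })
    return entities


compact = "http://www.google.comm, 1283923, [9283911,9309212,9283238]"
spaced = "http://www.google.comm, 1283923, [9283911, 9309212, 9283238]"

for text in (compact, spaced):
    print(extract_pst(text))
```

Because the pattern works on the raw string rather than on tokens, both spacings yield a single pst span starting at offset 33.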

answered Nov 12 at 15:48

Karthik Sunil

315

I've tried this solution, but unfortunately, it doesn't make any difference in the output.
– abhishake
Nov 12 at 16:26
