An entity is a sequence of words united by meaning or by a rule. Entities represent specific information such as a date, time, place, name, color, size, quantity, or type of product or service. Entities are not tied to intents and are extracted independently of them.
For example, the phrases "vertical cordless vacuum cleaner" and "washing robot vacuum cleaner" belong to the entity "Type of vacuum cleaner". The expressions "Chinese washing machine as narrow as possible" and "Samsung washing machine from the two hundredth series" belong to the entity "Type of washing machine".
There are 2 types of entities:
- System (preinstalled) entities are available to every agent and provide basic information processed by the NLU system. Users cannot add, change or train them; this is done by the platform developers. See the list of system entities.
If you need to make changes or additions to system entities, contact technical support.
- Custom entities are created and configured by the user and help increase the accuracy of intent recognition. With them you can mark any necessary information in phrases. Once an entity has been recognized, its value can be written to a variable for further use in the script.
- Every training example in every intent must be marked up if it contains words that match one of the known entities. This applies even if the entity is irrelevant to that intent, because the entity classifier is trained on all examples, whether or not they contain entities, and is independent of the intent classifier.
- Entities must not overlap in training examples. For example, in the phrase "lead pre-production engineer", mark only the entity "lead pre-production engineer"; the nested entities "lead engineer" and "engineer" must not be marked separately.
- Punctuation marks inside entities are ignored, so training examples for entities can be written without them. For example, in "lead engineer, the city of New York" the comma is unnecessary and can be omitted.
- The order of words within an entity is a feature that the system uses to find entities, whereas the position of the entity within the phrase does not matter. Where appropriate, add examples with different word order inside the entity. For example, "leading engineer", "I'm looking for a leading engineer vacancy", "leading engineer in New York", "do you have a leading engineer vacancy in Los Angeles", "is there any vacancy for a leading engineer somewhere within the New York region".
- The context around an entity does not affect its training or the accuracy of its extraction. In principle, a couple of examples are enough to extract any simple entity (for example, "computer", "laptop"). However, for stable operation, it is recommended to specify at least 15-20 examples for each entity.
- Keep in mind that many entities are standard (system) entities and are extracted by default. There is no point in creating custom entities such as date, time, place, address, name, full name, number, or amount: they are extracted anyway, and duplicating them only complicates the training examples unnecessarily.
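
The markup rules above can be checked before training with a small script. The following is only a sketch in Python and is not part of the platform: the `Span` structure, the helper names, the entity names "position" and "role", and the contents of `SYSTEM_ENTITIES` are all hypothetical and chosen for illustration. It strips punctuation from a training phrase, then reports overlapping entity markup and custom markup that duplicates a system entity.

```python
import string
from dataclasses import dataclass

# Hypothetical names for entities the platform already extracts by default.
SYSTEM_ENTITIES = {"date", "time", "place", "address", "name", "number", "amount"}


@dataclass(frozen=True)
class Span:
    start: int   # offset of the first character of the marked text
    end: int     # offset one past the last character
    entity: str  # name of the entity assigned to this text


def strip_punctuation(phrase: str) -> str:
    """Replace ASCII punctuation (except hyphens and apostrophes) with spaces
    and collapse repeated whitespace."""
    table = str.maketrans({ch: " " for ch in string.punctuation if ch not in "-'"})
    return " ".join(phrase.translate(table).split())


def mark(phrase: str, text: str, entity: str) -> Span:
    """Build a Span for the first occurrence of `text` in `phrase`."""
    start = phrase.find(text)
    if start < 0:
        raise ValueError(f"{text!r} not found in {phrase!r}")
    return Span(start, start + len(text), entity)


def find_problems(spans: list[Span]) -> list[str]:
    """Report overlapping spans and custom markup that duplicates a system entity."""
    problems = []
    widest = None  # span with the greatest end offset seen so far
    for cur in sorted(spans, key=lambda s: (s.start, s.end)):
        if widest is not None and cur.start < widest.end:
            problems.append(f"overlap: {widest.entity!r} and {cur.entity!r}")
        if widest is None or cur.end > widest.end:
            widest = cur
        if cur.entity in SYSTEM_ENTITIES:
            problems.append(f"duplicates a system entity: {cur.entity!r}")
    return problems


phrase = strip_punctuation("lead pre-production engineer, the city of New York")
ok = [mark(phrase, "lead pre-production engineer", "position")]
nested = ok + [mark(phrase, "engineer", "role")]
print(find_problems(ok))      # no problems expected
print(find_problems(nested))  # the nested "engineer" span is reported
```

Offsets are computed on the phrase after punctuation has been removed, so the spans stay valid for the text that is actually used as the training example.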