NLP

From BC$ MobileTV Wiki

Natural Language Processing (commonly abbreviated NLP) is the analysis of human language, usually given as text input, in order to understand the meaning of a query or statement. That understanding may come from semantic associations, statistics, a combination of the two, or other techniques.


Specifications

No major specifications have been published to date, but several competing RFCs and intriguing approaches exist.


Word2Vec

Word2Vec (short for "words to vectors") is a technique for producing "word embeddings". Each word in a given language is converted into a vector of numbers, and words used in similar contexts end up with similar vectors. One or more different numerical representations may be made for the same text. Representing words this way helps associate and group words by meaning, which supports tasks like classification, reasoning and NLP in general.
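The idea can be illustrated with a small, dependency-free Python sketch. It is not gensim's API or the original C implementation. It shows two pieces under simplifying assumptions. The first is how the skip-gram variant of Word2Vec turns a sentence into (center word, context word) training pairs, using a fixed window. Real implementations also sample the window size and subsample frequent words. The second is how learned vectors are compared with cosine similarity. The function names `skipgram_pairs` and `cosine` and the 3-dimensional `toy_vectors` are hypothetical, hand-picked for illustration rather than trained.

```python
import math


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: each word is paired with every
    neighbour up to `window` positions to its left and right
    (fixed-size window, a simplification of real Word2Vec)."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical 3-dimensional vectors, chosen by hand for illustration only;
# real Word2Vec vectors are learned from a corpus and are much longer.
toy_vectors = {
    "hello": [0.9, 0.1, 0.0],
    "hi": [0.8, 0.2, 0.1],
    "bryan": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    print(skipgram_pairs("my name is bryan".split(), window=1))
    print(round(cosine(toy_vectors["hello"], toy_vectors["hi"]), 3))
    print(round(cosine(toy_vectors["hello"], toy_vectors["bryan"]), 3))
```

With `window=1`, "my name is bryan" yields six pairs, such as ("my", "name") and ("name", "is"). The toy vector for "hello" scores much closer to "hi" than to "bryan", which is the kind of grouping by meaning that trained embeddings aim for.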


Example

An example would be:

Hello everyone, my name is Bryan

An NLP system should recognize that:

  • "Hello" is a salutation
  • "everyone" is the audience to which the opening salutation is directed
  • "my" ties the upcoming noun to a person (namely, the person entering the query, if they are speaking in the first person)
  • "name" indicates that what follows is a proper name (a title or way to refer to someone or something)
  • "is" signifies a state of being, or a fact
  • "Bryan" represents the value of the fact (e.g. in computer code that might look like: name="Bryan" or name.equals("Bryan") or name->"Bryan", etc.)
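For this one sentence, the recognition steps listed above can be approximated by a toy rule-based sketch in Python. The word lists `SALUTATIONS` and `AUDIENCES`, the regular expression `NAME_PATTERN`, and the function name `interpret` are all hypothetical. Real NLP systems generally rely on part-of-speech tagging, parsing, or trained models rather than hand-written rules like these.

```python
import re

# Hypothetical, hand-picked word lists; a real system would use a lexicon or a trained model.
SALUTATIONS = {"hello", "hi", "hey", "greetings"}
AUDIENCES = {"everyone", "everybody", "all"}

# Toy pattern for the first-person "my name is X" construction only;
# X must be a capitalised word, as proper names usually are in English.
NAME_PATTERN = re.compile(r"\b[Mm]y name is\s+([A-Z][A-Za-z'-]*)")


def interpret(utterance):
    """Return a dict of the facts this toy rule set recognises."""
    facts = {}
    words = re.findall(r"[A-Za-z']+", utterance)

    # A greeting word in first position is treated as a salutation,
    # and a known audience word right after it as the audience addressed.
    if words and words[0].lower() in SALUTATIONS:
        facts["salutation"] = words[0]
        if len(words) > 1 and words[1].lower() in AUDIENCES:
            facts["audience"] = words[1]

    # "my" ties the name to the speaker, "is" marks a fact, and the
    # capitalised word that follows is stored as the value (name="Bryan").
    match = NAME_PATTERN.search(utterance)
    if match:
        facts["name"] = match.group(1)
    return facts


if __name__ == "__main__":
    print(interpret("Hello everyone, my name is Bryan"))
```

For the example sentence, this returns a salutation of "Hello", an audience of "everyone" and a name of "Bryan". Any sentence outside these few patterns simply yields fewer facts, which shows why hand-written rules do not scale and why statistical approaches such as word embeddings are used instead.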



Tools

Java

Python

PHP

Proprietary/API

START

START, the world's first Web-based question answering system, has been online and continuously operating since December 1993. It was developed by Boris Katz and his associates in the InfoLab Group at the MIT Computer Science and Artificial Intelligence Laboratory. Unlike information retrieval systems (e.g., search engines), START aims to supply users with "just the right information," instead of merely providing a list of hits. Currently, the system can answer millions of English questions about places (e.g., cities, countries, lakes, coordinates, weather, maps, demographics, political and economic systems), movies (e.g., titles, actors, directors), people (e.g., birth dates, biographies), dictionary definitions, and much more.


Resources


Tutorials


External Links

References

  1. The amazing power of word vectors: https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
  2. Word2Vec Tutorial - The Skip-Gram Model: http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
  3. Demystifying Word2Vec: https://www.deeplearningweekly.com/blog/demystifying-word2vec
  4. An Intuitive Understanding of Word Embeddings: From Count Vectors to Word2Vec: https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/
  5. models.word2vec – Deep learning with word2vec: https://radimrehurek.com/gensim/models/word2vec.html (the first major "word2vec" Python library, based on the original C library)
  6. Natural-language/JJ Parsing/VBG For/IN The/DT Web/NN: http://nlp.naturalparsing.com/browserparser/parse
  7. Et tu, Watson? IBM's supercomputer can critique your writing: http://www.engadget.com/2015/07/17/ibm-watson-tone-analyzer-writing/
  8. UMBC WebBase corpus of 3B English words: http://ebiquity.umbc.edu/blogger/2013/05/01/umbc-webbase-corpus-of-3b-english-words/

See Also

Semantic Web | AI | Text | WordNet | Translation