Text analytics
{{Wikify|date=January 2008}}
{{Expand|date=January 2008}}{{context}}
The term '''text analytics''' describes a set of linguistic, lexical, pattern recognition,
extraction, tagging/structuring, visualization, and predictive techniques. The term
also describes processes that apply these techniques, whether independently or in
conjunction with query and analysis of fielded, numerical data, to solve business
problems. These techniques and processes discover and present knowledge – facts,
business rules, and relationships – that is otherwise locked in textual form, impenetrable
to automated processing.

A typical application is to scan a set of documents written in a [[natural language]] and either model the document set for predictive classification purposes or populate a database or search index with the information extracted. Current approaches to text analytics use [[natural language processing]] techniques that focus on specialized domains. 
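
As a rough illustration of the "populate a search index" step, the following minimal sketch builds a small inverted index over a document set. The function names (<code>tokenize</code>, <code>build_index</code>) and the sample documents are assumptions made for this example; production systems typically add stemming, stop-word handling and persistent storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical sample documents, invented for illustration only.
documents = {
    "doc1": "Acme Corp acquired Widget Ltd in 2007.",
    "doc2": "Widget Ltd reported record profits.",
}

index = build_index(documents)
print(index["widget"])  # ids of documents that mention "widget"
```

A query for a term then reduces to a dictionary lookup, which is why extracted text is commonly stored in this form.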

Typical subtasks are:

* [[Named Entity Recognition]]: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.
* [[Coreference]]: identification of chains of [[noun phrase]]s that refer to the same object. For example, [[Anaphora (linguistics)|anaphora]] is a type of coreference.
* [[Relationship Extraction]]: extraction of named relationships between entities in text.
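
Two of the subtasks above, named entity recognition and relationship extraction, can be sketched with hand-written patterns. This is a deliberately simple, rule-based illustration, not a description of how any particular product works: the patterns <code>ORG</code>, <code>YEAR</code> and <code>MONEY</code>, the "acquired" relation and the sample sentence are all assumptions made for the example, and coreference is not covered.

```python
import re

# Hypothetical, deliberately narrow patterns chosen for this example.
ORG = r"(?:[A-Z][a-z]+ )+(?:Corp|Ltd|Inc)\b"          # e.g. "Acme Corp"
YEAR = r"\b(?:19|20)\d{2}\b"                          # a simple temporal expression
MONEY = r"\$\d+(?:\.\d+)?(?: (?:million|billion))?"   # a simple numerical expression

# Relation pattern: one organization followed by "acquired" and another one.
ACQUIRED = re.compile(rf"({ORG}) acquired ({ORG})")


def extract_entities(text):
    """Return the spans matched by each entity pattern, grouped by type."""
    return {
        "organization": re.findall(ORG, text),
        "date": re.findall(YEAR, text),
        "money": re.findall(MONEY, text),
    }


def extract_acquisitions(text):
    """Return (relation, buyer, target) triples found in the text."""
    return [("acquired", m.group(1), m.group(2)) for m in ACQUIRED.finditer(text)]


# Invented sample sentence, for illustration only.
sample = "Acme Corp acquired Widget Ltd in 2007 for $12.5 million."

print(extract_entities(sample))
print(extract_acquisitions(sample))
```

Patterns like these are brittle outside the narrow wording they were written for, which is one reason practical systems tend to focus on specialized domains.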

==See also==

* [[Noisy text analytics]]
* [[Information extraction]]
* [[Computational linguistics]]
* [[Natural language processing]]
* [[Named entity recognition]]
* [[Text mining]]

==Software and Applications==

===Commercial Software and Applications===
 
* [[AeroText]] - provides a suite of text mining applications for content analysis. Content used can be in multiple languages.
* [[Alethes OpenEyes]] [http://www.alethes.it] - provides a complete suite for text analytics in eight languages, including information extraction, entity recognition, taxonomy generation, clustering, categorization, summarization, and sentiment analysis.
* [[Anderson Analytics]] - provider of text analytics and content analysis especially as it relates to consumer behavior.
* [[Attensity]] provides hosted, integrated and stand-alone text analytics software.
* [[Carabao Language Kit]] - suite of components for text analytics, categorization, sense disambiguation, idiom extraction, and named entity recognition, with tools to add a new language or edit existing ones.
* [[Clarabridge]] is a provider of end-to-end text analytics software and solutions for Voice of the Customer, Quality Assurance, Competitive Intelligence and other application areas.
* [[Clearforest]] [http://www.clearforest.com] is a provider of solutions and software to extract structured data from unstructured texts. It was acquired by Reuters, which then merged with Thomson to form Thomson Reuters.
* [[IBM LanguageWare]] [http://www.alphaworks.ibm.com/tech/lrw] is the IBM suite for Text Analytics (Tools and Runtime).
* Ixreveal [http://www.ixreveal.com] is a commercial vendor of text mining software and of patented OLAP (OnLine Analytical Processing) for text software, specializing in complete solutions for structured and unstructured data that use advanced analytics algorithms and techniques. Its uReveal and uReka! [http://www.ureka.info] products have been adopted by major international companies and by US local and federal government agencies in areas such as fraud and recovery, voice of the customer, and law enforcement.
* [[Infonic]] provides commercial sentiment analysis of financial news feeds for the Thomson Reuters RMDS trading information system. The "sentiment scores" that this software provides are used within [[algorithmic trading]] systems by several major trading banks. [[Infonic]] also develops unique document summarization and textual navigation technologies that aid in [[Knowledge Management]].
* [[Rapid-I]] is a provider of predictive analytics, data mining, and text mining software, solutions, and services.
* [[SPSS]] [http://www.spss.com] - provider of SPSS Text Analysis for Surveys, Text Mining for Clementine, LexiQuest Mine and LexiQuest Categorize, commercial text analytics software that can be used in conjunction with SPSS Predictive Analytics Solutions. 
* [[Teezir Search Solutions]] designs, delivers and hosts knowledge management applications for professional services firms. Its flagship solution is Teezir Expert Finder, a search engine that identifies experts within an organization, based on all documents on the firm's networks.
* [[TEMIS]] is a software vendor providing Information Discovery solutions that serve the Information Intelligence needs of business corporations.

===Open-Source Software and Applications===

* [[RapidMiner]] - open-source software for data and text mining
* GATE - open-source toolbox for text engineering and natural language processing

==External links==
{{linkfarm}} 
* Automatic Content Extraction, Linguistic Data Consortium: http://projects.ldc.upenn.edu/ace/
* Automatic Content Extraction, NIST: http://www.itl.nist.gov/iad/894.01/tests/ace/
* Message Understanding Conference: http://www.itl.nist.gov/iaui/894.02/related_projects/muc/
* Seth Grimes's Text Analytics expert channel at the Business Intelligence Network: http://www.b-eye-network.com/channels/index.php?filter_channel=1394
* Text Analytics Summit: http://www.textanalyticsnews.com/
* Text Analytics Wiki: http://textanalytics.wikidot.com/start
* Text Analytics Yahoo group: http://tech.groups.yahoo.com/group/TextAnalytics/
* Text Analytics LinkedIn group: http://www.linkedin.com/e/gis/22313/3A5CAF691C78

[[es:Extracción de la información]]
[[Category:Natural language processing]]