Have BI Vendors Mastered Text Mining?
July 11, 2008
To date the focus in BI has been on structured data. However, large volumes of information are contained within unstructured sources such as blogs, wikis, news feeds, transcripts, PDFs, email, Word documents, and multimedia. In fact, one report I read suggested that 85 percent of a company's data is unstructured. Whilst this may be a little on the high side for many businesses, it does point to the growing volume of data that is not being captured by current BI tools.
In response to this trend, BI vendors are stepping up their efforts in text mining. Both Google and Microsoft have released Enterprise Search solutions that parse unstructured data sources throughout the enterprise and return results similar to those of Internet search engines. However, search fails to provide the deeper answers that BI tools have become synonymous with. In other words, search can tell you what is happening, but not why it is happening.
Text mining takes unstructured data to this next level by transforming text into a structured format. It automatically classifies documents and identifies key relationships that provide insight into the WHY. Standard Enterprise Search cannot surface such relationships.
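To make the idea of "transforming text into a structured format" a little more concrete, here is a minimal Python sketch. It is only an illustration, not how any BI vendor's product works: the category names, the keyword lists, and the `to_record` function are all invented for this example, and real text mining tools use far more sophisticated linguistic and statistical techniques than simple keyword counting.

```python
import re
from collections import Counter

# Hypothetical keyword lists, invented purely for illustration.
CATEGORY_KEYWORDS = {
    "complaint": {"refund", "broken", "late", "faulty"},
    "praise": {"great", "excellent", "love", "fast"},
}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def to_record(doc_id, text):
    """Turn one free-text document into a structured, table-like record."""
    tokens = tokenize(text)
    counts = Counter(tokens)

    # Score each category by how often its keywords occur in the document.
    scores = {
        category: sum(counts[word] for word in words)
        for category, words in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    category = best if scores[best] > 0 else "unclassified"

    return {
        "doc_id": doc_id,
        "category": category,
        "token_count": len(tokens),
        "top_terms": [term for term, _ in counts.most_common(3)],
    }


if __name__ == "__main__":
    email = "The parcel arrived late and the unit was broken, I want a refund"
    print(to_record(1, email))
```

Each record now has fixed fields (an identifier, a category, a few counts), so it could sit in a table alongside conventional structured data and be queried with the same tools.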
Text mining is much more than searching and filtering to find the right document: it needs to extract key data and insights from such documents, and connect these with the processes and tools used for mining structured data. BI vendors have yet to achieve this capability. However, given the speed of progress in data mining, we can hold out hope that it will not be too far off.

