Processing and content analysis of various document types using MapReduce and InfoSphere BigInsights

(Sajad Izadi, Benjamin G. Leonhardi, and Piotr Pruski) Businesses often need to analyze large numbers of documents of various file types. Apache Tika is a free, open source library that extracts text content from a variety of document formats, such as Microsoft® Word, RTF, and PDF.
