Apache Big Data Seville 2016 – Classifying Unstructured Text – Deterministic and Machine Learning Approaches – Christian Winkler & Stephanie Fischer

Text is one of the most widely used forms of communication and is ubiquitous on the Internet. Social networks like Facebook and Twitter consist mainly of unstructured text; the same is true of content-driven websites.

Humans grasp the meaning of text easily; computers find it much harder. Used correctly, however, computers can help humans tremendously in structuring and classifying huge amounts of text. This “symbiosis” lets humans work more efficiently, reduces repetitive work, and makes the uncovered structure usable.

Our talk starts with visualizations that give us ideas for how to classify texts automatically. We then demonstrate that manual intervention is sometimes necessary and how its results can serve as a basis for machine learning, which helps significantly in classifying more complicated cases.

As software tools we use R, Apache Solr, D3.js, and several NLP and ML tools from the ASF.
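
To make the workflow in the abstract more concrete, here is a minimal sketch of the general idea: deterministic keyword rules handle the clear cases, and a small classifier trained on manually labeled texts handles the rest. It is not the speakers' implementation. It uses plain Python rather than the tools named above, and the keyword rules, the sample texts, the labels, and every function name are assumptions made up for illustration. Naive Bayes stands in here for whichever machine learning method the talk actually uses.

```python
import math
from collections import Counter, defaultdict

# Hypothetical keyword rules for the deterministic stage (not from the talk).
RULES = {
    "sports": {"football", "goal", "match"},
    "tech": {"software", "server", "database"},
}


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def classify_by_rules(text):
    """Return a label if exactly one rule set matches, otherwise None."""
    tokens = set(tokenize(text))
    hits = [label for label, keywords in RULES.items() if tokens & keywords]
    return hits[0] if len(hits) == 1 else None


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify(text, model):
    """Try the deterministic rules first; fall back to the trained model."""
    return classify_by_rules(text) or model.predict(text)


# Hypothetical manually labeled texts that no rule covers (the "manual intervention").
manual_texts = [
    "the team won the game last night",
    "the striker scored twice",
    "the new app crashed on startup",
    "update the operating system",
]
manual_labels = ["sports", "sports", "tech", "tech"]

model = NaiveBayes().fit(manual_texts, manual_labels)

print(classify("the match ended with a late goal", model))  # decided by a rule
print(classify("the team scored", model))                   # decided by the model
```

Trying the rules first keeps the easy decisions transparent and reproducible, while the manually labeled examples are spent only on texts the rules cannot decide.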

