KUB Datalab: Harvesting

Tools

At KUB Datalab we use and support a large array of software.

We have organised our main tools into the categories below. Bear in mind that many types of software can be used for multiple purposes, so each tool is listed under its main purpose.

Tools for harvesting data

Orange

Orange is a component-based visual programming software package for data visualization, machine learning, data mining, and data analysis.
Orange components are called widgets. They range from simple data visualization, subset selection, and preprocessing to empirical evaluation of learning algorithms and predictive modeling.
Visual programming is implemented through an interface in which workflows are created by linking predefined or user-designed widgets, while advanced users can use Orange as a Python library for data manipulation and widget alteration.

Python

Python is a programming language available under an open source license. It is worth knowing a bit of Python, partly because the language is becoming more and more widespread in research, and partly because more analyses in the humanities, social sciences and natural sciences depend on algorithms and calculations. KUB Datalab’s Python courses cover, for example, analysis of text data and web scraping.
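As a small illustration of what web scraping involves, the sketch below collects the link targets from a piece of HTML using only Python's standard library. The class name LinkCollector, the function extract_links and the sample page are made up for this example and are not taken from KUB Datalab's course material; a real scraper would also download the page first and respect the site's terms of use.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical sample page, kept inline so the example runs without network access
page = '<p>See <a href="https://example.org/a">A</a> and <a href="/b">B</a>.</p>'
print(extract_links(page))
```

The same idea scales to whole pages: feed the downloaded HTML to the parser and work with the collected list afterwards.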

R

R is a programming language specifically designed for statistical data analysis. It is more or less the industry standard for exploratory data analysis, data cleaning and visualization.
KUB Datalab offers courses in R, both general and tailored to specific needs. In our open workshops, we consult on how to solve specific problems in and with R. You must find out on your own which statistical test you need to apply and which visualization best suits your data. Once that is decided, we will do our utmost to get you to your goal.
Our approach is based on the tidyverse. We find that Base R solutions are, in general, more difficult for beginners to grasp. Close collaboration with our resident Python experts ensures that we are ready to switch gears if necessary.

RegExp

Regular expressions are a structured way of describing patterns in text. A solid grasp of regular expressions enables us, for example, to find every word in a text that begins with "th", is followed by 3 or 4 characters, and ends with either "e" or "r". Regular expressions are useful in many situations and are available in several of the software packages supported by KUB Datalab. We aim to provide training in regular expressions and to incorporate the method wherever it is useful.
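The "th" example above can be written out as a short Python sketch. It assumes one particular reading of the description: the 3 or 4 characters are letters, digits or underscores that sit between the initial "th" and the final "e" or "r", and the match must be a whole word. The names TH_WORD and find_th_words and the sample sentence are invented for this illustration.

```python
import re

# Assumed reading of the pattern:
#   \b        start of a word
#   th        the literal letters "th"
#   \w{3,4}   3 or 4 word characters
#   [er]      a final "e" or "r"
#   \b        end of the word
TH_WORD = re.compile(r"\bth\w{3,4}[er]\b", re.IGNORECASE)


def find_th_words(text):
    """Return every word in text that matches the pattern, in order."""
    return TH_WORD.findall(text)


sample = "The thinker sat in the theatre while thunder rolled over there."
print(find_th_words(sample))  # ['thinker', 'theatre', 'thunder']
```

Short words such as "the" and "there" are skipped because they do not have enough characters between "th" and the final letter. The same pattern syntax works with small variations in R and in other tools that support regular expressions.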