Tools for cleaning data


At KUB Datalab we use and support a large array of software.

We have organised our main tools into the categories below. Bear in mind that many types of software can be used for several purposes, so each tool is listed under its main purpose.

OpenRefine is a free tool that helps you clean messy data. A typical workflow is to import a data file, apply OpenRefine’s data cleaning operations, and export the cleaned file. OpenRefine supports a range of import and export formats. You can work through its graphical user interface or write expressions using GREL (General Refine Expression Language) and regular expressions. OpenRefine does not help you collect, analyse, or visualise data.
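The import, clean, export workflow can be sketched in a few lines of Python. This is not OpenRefine code and does not use any OpenRefine API; it is only a rough illustration of the kind of operation involved, here trimming and collapsing whitespace in every cell. The function names clean_cell and clean_csv and the sample data are made up for this example.

```python
import csv
import io


def clean_cell(value):
    """Trim leading/trailing whitespace and collapse internal runs to one space."""
    return " ".join(value.split())


def clean_csv(raw_text):
    """Read CSV text, clean every cell, and return the cleaned CSV text."""
    reader = csv.reader(io.StringIO(raw_text))      # "import" step
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")   # "export" step
    for row in reader:
        writer.writerow([clean_cell(cell) for cell in row])  # "clean" step
    return out.getvalue()


# Hypothetical messy input, invented for illustration.
messy = "name,city\n  Anna   Jensen ,Copenhagen \nBo  Holm,  Aarhus\n"
print(clean_csv(messy))
```

In OpenRefine itself the same cleaning is done interactively, column by column, and every step is recorded so it can be undone.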

Regular expressions are a structured way of describing patterns in text. With a solid grasp of regular expressions we can, for example, find every word in a text that begins with "th", is followed by 3 or 4 characters, and ends with either "e" or "r". Regular expressions are useful in many situations and are available in several of the software packages supported at KUB Datalab. We aim to provide training in regular expressions and to incorporate the method wherever it is useful.
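The "th" example can be written out as a concrete pattern. The sketch below uses Python's re module and rests on one reading of the description: "th", then 3 or 4 word characters, then a final "e" or "r", so matching words are 6 or 7 characters long. The function name find_th_words and the sample sentence are invented for illustration, and the exact syntax can differ slightly between tools.

```python
import re

# \b        word boundary, so the match starts at the beginning of a word
# th        the literal letters "th"
# \w{3,4}   3 or 4 word characters (letters, digits or underscore)
# [er]      a final "e" or "r"
# \b        word boundary, so the match ends at the end of the word
PATTERN = re.compile(r"\bth\w{3,4}[er]\b", re.IGNORECASE)


def find_th_words(text):
    """Return every word in text that fits the pattern, in order of appearance."""
    return PATTERN.findall(text)


# Hypothetical sample sentence, invented for illustration.
sample = "The theatre thunder was there, but the thriller on the throne ran longer."
print(find_th_words(sample))  # ['theatre', 'thunder', 'throne']
```

Here "there" is too short and "thriller" too long to match, while "longer" ends correctly but does not begin with "th".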