Courses
Text Functions
Data cleanup
Unlike numbers, text (also known as strings) usually comes in non-standard formats where a clean-up is needed. This is due to the broad input possibilities that text fields allow for. Text can contain numbers, blank spaces, special characters etc. It is not uncommon that a certain text input is separated in several fields.
Problems with text fields:
- Blank spaces: Too many blank spaces, that may be hard to spot.
- Wrong input: E.g. surname in first name field and first name in surname field.
- Misunderstandings: E.g. what is considered a middle name or what is the format for country code?
- Hard to analyze: Hard to extract and analyze information, e.g. large text fields with comments instead of drop-lists with limited choices.
To force users to input cleaner data from the start, text fields should be restricted with Data Validation if possible. This is often the case to some extent in software like ERP systems, but there still a lot of room for input errors. There can also be problematic programming logic like programmers of the ERP only created one input field for a contact name, and you want to work with surnames. Text functions in Excel can handle much of the basic cleanup needed due to input errors and faulty programming logic.
Tip: Input with limited choices
The rule of thumb is that if an input field can be divided into several fields with limited choices for the user (preferably restricted by a drop-down list), it should, as this makes data easier to analyse later on.
Warning: Clean-up at the core
Avoid workarounds when it comes to severe input errors. Prioritize clean-up in the systems you extract data from over clean-up of extracts. This is part of the daily work maintaining correct master data.

