Here are five tips borne of hard hours grafting through messy databases in search of some grains of gold. Sounds like an exaggeration? Think again… or at least have a read first!
1. Preserve your source data
When you’re about to spend your weekend crunching through a data dump, do yourself a favour and make sure you can back out, or at least understand, any changes you have made.
To gain insight from data you will often need to clean it (for typos or inconsistent categorisations for instance). Keep a separate, unedited worksheet containing the original data, and make any cleaning edits to a copy. Make sure you highlight and document any changes made so you can do it again. Better still, don’t change the data at all but do it with formulae – for example, if you’re correcting a common misspelling, use an INDEX/MATCH combination to replace the misspelling in the raw data with the correct spelling on a clean copy.
2. Save calculation time
If you’re dealing with a massive database and asking Excel to perform many calculations, then you can hard-paste formulae once they’re calculated. This will significantly speed up a workbook that is slow to calculate.
Make sure you keep one row of the formula – so you can copy down to update the calculation when you add to or change your source data. And make sure you shade that row differently and clearly label what you have done.
3. Format consistently
Don’t save good formatting ONLY for the results of your analysis. Instead, focus on your spreadsheet – and your work – as a whole. Deciding on a few rules for formatting will make your analysis more comprehensible, not least to yourself. For example, shade raw input data cells one colour, calculations another, parameters a third. Use consistent formatting for headings, and for units, and for annotations. You can do this with Excel’s built-in themes, but it’s not necessary and arguably less transferable – more important that you get the rules straight in your head.
4. Filter to understand your data
Spend a bit of time getting to know your data. In particular, use Advanced Filter (or, if you prefer, a quick PivotTable) to generate a list of unique categories – this will expose any typos or inconsistencies in classification. For example in the data below, a unique list exposes that some entries for Algeria record the country name in French:
5. Use bins to get more from your data
If you’ve a field such as customer size, categorise the data records in bins to try and understand the characteristics of each type. In this example we’ve just allocated stores into ‘Small’ or ‘Large’ bins according to their revenue:
But we could have more ‘bins’, and allocate stores to them using the MATCH() function. We can then use other data – staff numbers for instance – to understand the characteristics of each of our bins.