Think Database even when Spreadsheeting

Despite my preference for R over Excel, I still use a spreadsheet for many applications. But I find spreadsheets are often misused or poorly constructed, and a bad one can create more work later than it saved you in the beginning. One way to prevent this is to think database when you are spreadsheeting.

An Excel spreadsheet is not a database and shouldn’t be thought of as one, even if the two words are used interchangeably by people like, say, upper management (if your organization is anything like mine). But if you are doing anything with a lot of rows or columns of data, it is always a good idea to construct the spreadsheet as if it were a database. This at least gives you a chance of being able to add data to the spreadsheet or extract data from it later.

For example, we routinely receive large spreadsheets, thousands of rows long, that are almost useless outside the bounds of the spreadsheet format. Fancy titles, merged cells, subtotals, empty rows, and other spreadsheet junk that disrupts the data make extracting it tedious at best and nearly impossible at worst. It would have been so much more useful had the author made one sheet hold the data (one header row with unique field names and just the raw data below, one record per row) and a second sheet hold the spreadsheet math that provides the summary and calculated information. This is actually easier to construct and doesn’t lock the data in the spreadsheet. It is then easy enough to export the raw data as a dbf or comma-delimited file to work on it in a proper database program, or just to use the data without relying on the Excel structure.
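
To make the "one header row, one record per row" idea concrete, here is a minimal Python sketch of what a data sheet exported as a comma-delimited file might be checked against, with the summary math kept separate from the data. The sample data, the field names (site, date, flow_gpm), and the helper names check_table and totals_by are all made up for illustration. The check is also far from complete: it will not notice a subtotal row that happens to have the right number of cells.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical exports: a clean data sheet, and one with typical spreadsheet junk
# (a title row above the header, an empty spacer row, and a subtotal row).
CLEAN = "site,date,flow_gpm\nA,2009-11-01,120\nA,2009-11-02,130\nB,2009-11-01,95\n"
JUNKY = "Monthly Flow Report,,\nsite,date,flow_gpm\nA,2009-11-01,120\n,,\nSubtotal,,120\n"


def read_rows(text):
    """Parse comma-delimited text into a list of rows (each a list of cells)."""
    return list(csv.reader(io.StringIO(text)))


def check_table(rows):
    """List the problems that keep the rows from being a plain table."""
    if not rows:
        return ["no rows at all"]
    problems = []
    header = rows[0]
    # Every column needs a name, and the names must be unique.
    if any(name.strip() == "" for name in header):
        problems.append("blank field name in header")
    dupes = sorted(n for n, count in Counter(header).items() if count > 1 and n.strip())
    if dupes:
        problems.append("duplicate field names: " + ", ".join(dupes))
    # Every later row should be one complete record (row numbers are 1-based).
    for number, row in enumerate(rows[1:], start=2):
        if all(cell.strip() == "" for cell in row):
            problems.append(f"row {number} is empty")
        elif len(row) != len(header):
            problems.append(f"row {number} has {len(row)} cells, expected {len(header)}")
    return problems


def totals_by(text, key, value):
    """The 'summary sheet': sum one numeric field, grouped by another field."""
    totals = defaultdict(float)
    for record in csv.DictReader(io.StringIO(text)):
        totals[record[key]] += float(record[value])
    return dict(totals)


print(check_table(read_rows(CLEAN)))
print(check_table(read_rows(JUNKY)))
print(totals_by(CLEAN, "site", "flow_gpm"))
```

The clean export comes back with no problems and can be summarized by field name straight away. The junky one fails on the very first row, because the title sits where the field names should be.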

I know the temptation to add the spreadsheet junk is there. Pretty colors, lines, comment blocks, font changes, etc. are fine for the sheet that does the calcs (even if all that is usually a waste of time); just don’t let the junk get in the way of the data. That’s the important stuff.

What do you think?  Drop us a comment.

Published on November 10, 2009 at 1:24 pm
