Think Database even when Spreadsheeting

Despite my preference for R over Excel, I still use a spreadsheet for many applications. But I find spreadsheets are often misused, or poorly constructed, and can create work later that can exceed the time and effort they might have saved you in the beginning. One way to prevent this is to think database when you are spreadsheeting.

An Excel spreadsheet is not a database and shouldn’t be thought of one, even if the two words are used interchangeably by people like, say, upper management (if your organization is anything like mine). But if you are doing anything with a lot of rows or columns of data, it is always a good idea to construct the spreadsheet as if it were a database. This at least gives you a chance that you will be able to add or extract data from the spreadsheet.

For example, we routinely receive large spreadsheets of thousands of rows long that are, outside of the bounds of the spreadsheet format, almost useless. The addition of fancy titles, merged cells, subtotals, empty rows, and other spreadsheet junk that disrupts the data makes extracting the data tedious at best and nearly impossible at worst. It would have been so much more useful had the author made one sheet hold the data—one header row with unique field names and just the raw data below, one record per row—and one sheet holding the spreadsheet math to provide the summary and calculated information. This is actually easier to construct and doesn’t lock the data in the spreadsheet.  It is easy enough to export the raw data as a dbf or comma delimited file to work on it in a proper database program or just use the data without relying on the Excel structure.

I know there is a temptation there to add the spreadsheet junk.  Pretty colors, lines, comment blocks, font changes, etc. are fine for that sheet that does the calcs (even if all that is usually a waste of time), just don’t let the junk get in the way of the data.  That’s the important stuff.

What do you think?  Drop us a comment.


Great Things #1

Do you have that thing that you always come back to, that you never seem to tire of, that always seems to have hidden depths and complexities that await exploring, that just makes you feel good? No, I ‘m not talking about your spouse (but if you were thinking about yours, congratulations). And no, not your Ferrari 599 either (you lucky bastard). Nor your HP 15C. Okay, maybe the 15 comes close.

I know, this is totally subjective and the answers are as varied as we all are. But for me, at least most of the time, it is \LaTeX. No not the shiny black stuff worn by the nubile young things on the Internet (which isn’t too bad in itself… smooth curves catching the light just so, hints of smooth…) sorry, seemed to have drifted off there for a second. Where was I? Oh, yeah, \LaTeX. The document preparation system. (Pronounced lah-tech or lay-tech).

\LaTeX is essentially a set of macros sitting on top of the venerable typesetting system, \TeX, invented by that computer master, Donald E. Knuth. No, it isn’t a word processor, and it shows its age by being almost GUI-free, but it is the thing to use to produce documents you can be proud of. All with a simple text editor and some markup commands that can be simple enough for a school child or complicated enough to keep the most nerdy of PhDs happy for years.

I’ve prepared three books and many papers and reports with \LaTeX and it was a pleasure to use and produced beautiful documents with little need for me to fiddle with margins, headers, page and section breaks, tables of contents, indexes, citations, etc., since most of these jobs are done automatically. You can, of course, customize the document to the nth degree, but most of the time the default settings are fine. And all the time you know what the program is doing, unlike M$ Word which I have to say drives me crazy.

And all you need is a decent text editing program and you’re in business. You’re source file is in plain old ASCII which means you can edit the file on virtually any computer and compile it there too–and get the same output whether you happen to be using a PC, a Mac, Linux, or pretty much any other system. And no worries about corrupted files. The source file isn’t touched during compilation, and when was the last time your text editor screwed up your ASCII file? Probably never.

Yes, the learning curve for \LaTeX can be steep–depending on what you’re trying to do–but it is so satisfying to learn and use. Especially compared to that pinacle of mediocrity that is Word. Better words than I can write about that comparison can be found here:

And the following link discusses the beauty of \LaTeX:

And I almost forgot one feature that stands above most others, the ability to typeset beautiful equations.  Though I gather the M$ equation editor is getting better, imagine just being able to write something like this:

\int_{0}^{1} x dx = \left[ \frac{1}{2}x^2 \right]_{0}^{1}
= \frac{1}{2}

and get something like this:

\int_{0}^{1} x dx = \left[ \frac{1}{2}x^2 \right]_{0}^{1} =\frac{1}{2}

or this:

\gamma \equiv \frac{1}{\sqrt{1 - v^2/c^2}}

and get this:

\Large{\gamma \equiv \frac{1}{\sqrt{1 - v^2/c^2}}}

Even WordPress here and WikiPedia use \LaTeX for formula setting.  So much easier than futzing with the M$ equation editor.

Am I being archaic and anachronistic? Sure. But it is better, a better way, and just more satisfying than almost any other thing on my computer. As a matter of fact, \LaTeX and R (see previous post) are the only things I need on my computer. Okay, a browser too. With these I can do anything I want. And all for free, I might add, though I do support the users group ( and its talented membership who make all this possible. I’d much rather give them my money rather than Micro$oft. I think they have enough.

So if you’re up to a challenge, give it a try and let us know what you think.

R in the News


So I was reading the latest R journal and there was a reference to an article in the New York Times on R. Here is the link:

This just makes me happy that this great, free, software is finally being given it’s due. I never thought it would happen in this world dominated with Micro$oft mediocrity — in this case, Excel. Now I’ve used Excel a lot and it does some things well, or at least adequately, but until you move from the GUI thinking you really don’t know what you’re missing.

R is an interactive statistical environment that generally has a terminal text interface. And it’s brilliant. Its capabilities are almost endless — if the number and variation of packages are any indication — and once you get past the initial steep learning curve it makes data analysis so fast and easy and deep that you will hate to go back to Excel.

Don’t believe me? Here is a small example:

Problem: Show a histogram of 100 values pulled from a normal distribution with a mean of 10 and a standard deviation of 2.

Excel solution:

1. Open new file

2. Make column of 100 numbers from -4 to 4 (z graph limits).

3. Calculate x values in next column (x=z*std+mean); copy down to fill column.

4. Calculate f(x) using Excel’s NORMDIST function; copy down to fill column.

5. Load Analysis toolpack.

6. Make bins for the histogram.

7. Open data analysis window, choose histogram, input range for data and bins.

8. Open chart wizard, choose column chart.

9. Input data ranges for bins and values.

10. Format data series to reduce gaps between columns.

R solution:

1. Open R.

2. Type: x=rnorm(100,mean=10,sd=2) and hit enter.

3. Type: hist(x, breaks=20, col=”blue”) and hit enter.

Now really, which would you really want to do? And don’t think this is an untypical example just to make R look good; almost everything I need to do can be done better, faster, and more efficiently with R than with spreadsheets.

Give it a try.

Design by PowerPoint

Might as well dive right into the deep end and talk about one of my “favorite” manifestations of Bad Engineering.  

How many of you cringe when you hear the words, “make up some slides on this”, or “need to make a brief …” or “next slide please.”?  Well, you should.  

Okay, okay, I’ll concede that there is a purpose for Powerpoint-like tools in some circumstances, but what I cringe about is the use of powerpoint presentations as a substitute to technical reports.  It used to be that you might make some presentation viewgraphs (foils, transparencies, etc.) after you had written the report.  Now we just skip the report altogether and go right to the powerpoint.  What’s wrong with this?  Well, nothing, if you are purposely trying to keep your audience in the dark.  Course this is mainly due to how the briefs are constructed.

You know the drill, you’ve been in countless meeting where someone stands at the lectern, looks at the screen, and reads slide after countless slides until the audience is asleep or have blown their brains out.  Let me tell you, this is not the way to get your message out, and it is certainly not the way to do engineering.  Even if you aren’t verbose in the slides, the very format of the powerpoint presentation counters richness of detail that is so important in engineering.

This hasn’t been lost on some people who haven’t been seduced by the cheap allure of a gizmo-ridden slide.  Powerpoint was identified as a major source of the widespread technical communication problem at NASA by the Columbia Accident Investigation Board (CAIB):

As information gets passed up an organizational hierarchy, from people who do analysis to mid-level managers to high-level leadership, key explanations and supporting information is filtered out.  In this context, it is easy to understand how a senior manager might read this PowerPoint slide and not realize that it addresses a life-threatening situation.

At many points during the investigation, the Board was surprised to receive similar presentation slides from NASA officials in place of technical reports.  The Board views the endimic use of PowerPoint briefing slides instead of technical papers as an illustration of the problematic methods of technical communication at NASA.

[Columbia Accident Investigation Board, Report, volume 1 (August 2003), p. 191.]

The problems with PP and its ilk are masterfully discussed in the great Edward Tufte’s pamphlet, The Cognitive Style of PowerPoint.  He discussed not only the problems with bullet points and incomplete sentences, but the inability for PP to handle high information density.  I urge you to get a copy (or a few to spread around).



What is the alternative, I hear you ask.  Well, here’s what I do now.  I write a technical report with all the explanations, data, and information graphics.  This I provide in advance of any briefings.  If a briefing is still needed, and if it is just a few people in an office or small conference room, then I pull out the important points, data, and graphics and put together an 11X17 briefing sheet that usually provides plenty of room to contain all the detail I want.  Besides, they always have my original report to refer to later, if needed.  I use this handout as my brief and go through it as a lap brief.  If the brief is to a larger audience, I use the same handout but project the major data graphics on the screen – sans gizmos, flying text, or 3D cartoon graphs.  

Yeah, I know.  It’s tough to buck against decades of mediocre briefings and even harder to sit down and write the report when you know that a quick PP presentation would do.  But then that would be Bad Engineering.  And we don’t do that here, right?

(Oh and by the way, if you do use powerpoint, for cripe’s sake make it a pps file so you aren’t fumbling around for the “play presentation” button.)