Author Archives: dre

Life with a Scientist and Pets


It’s an experiment!

My wife lives with two dogs and me. I like to consider myself a scientist, so this leads to a funny scenario that happens often. Sometimes our pets will beg for food. I do not condone giving them food. However, sometimes when they beg, my wife will assure the dogs (earnestly) that they will not like it. On several occasions I’ve been incredulous that we don’t actually know this. Of course, an experiment is required! Case in point: this last week we discovered our dogs do in fact like butternut squash soup, and tomatoes!

Where’ve I been?

Apologies, but I’ve been downright terrible at updating this blog. I’ve had personal projects I’ve been working on and other blogs I’ve been at! I’ll do my best to at least “check in” weekly here, but no promises. Here’s what I’ve been up to recently:

Cool Stuff

I just saw “The Book of Mormon” with my wife. I loved it and definitely recommend it, but be warned: the language is quite vulgar.

Speaking of my awesome wife, she completed a half marathon this last weekend!

I just finished reading Seconds by Bryan Lee O’Malley, a fantastic science-fantasy book about small choices.

Finally, I found two great quotes on failure this week. Here they are in Tweet format!

Seeya next time!

-Dre

Data Cleaning

The Issue isn’t “Big Data”, it’s “Clean Data”

Hoo boy! Wow, I started this blog “back up” in the middle of June! Since then, I’ve moved to Oshkosh, Wisconsin and been busy as heck. My blogging has lagged, but I don’t want that to be a habit. My goal will definitely be one post a week (interspersed with whatever Quora/Basketball stuff I get in), as well as the weekly Boxscore Geeks podcast (on Thursdays). Bug me if you don’t see it :)

Big Data?

In graduate school I was running face recognition on a “huge set of data”. Be ready to laugh. I believe it was somewhere around 2-4 gigs’ worth. Since then, data has exploded: it’s easier to get and easier to store. I have gigs and gigs of data on my Amazon AWS account. About once a month they send me a bill for a little under a dollar. If you’re not talking in terabytes, or heck, petabytes, it’s small potatoes. Yet, interestingly, the issue I see with data is not big data. No, it’s clean data!

On a great podcast with Ari Caroline about healthcare, this issue came up. In sports, we have lots of data, and most of it is in useful, tabular formats. Want to know what hand a player shoots with? That’s a check box in a column on some site. In healthcare, it gets more complex. From any set of doctors’ notes, you could easily infer some information. And you can easily store an effectively unlimited amount of notes on the cloud. To make robust data sets that are easy to browse, though, you’d need to be able to ask the notes questions. And there the data gets trickier.
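To make the "check box versus notes" gap concrete, here is a minimal sketch of pulling one tabular field out of free text. The sample notes, the `HANDED` pattern, and the `dominant_hand` helper are all made up for illustration; real clinical notes are far messier, and a single regular expression would miss most phrasings.

```python
import re

# Hypothetical free-text notes, invented for this example.
NOTES = [
    "Pt is right-handed, presents with wrist pain.",
    "Left hand dominant. No prior injuries.",
    "Complains of knee pain after running.",
]

# Matches phrasings like "right-handed" or "left hand dominant".
HANDED = re.compile(
    r"\b(left|right)[\s-]*(?:handed|hand dominant)\b", re.IGNORECASE
)


def dominant_hand(note):
    """Return 'left', 'right', or None when the note does not say."""
    match = HANDED.search(note)
    return match.group(1).lower() if match else None


# Turn the notes into rows with an explicit column, like a sports stats table.
rows = [{"note": n, "dominant_hand": dominant_hand(n)} for n in NOTES]
for row in rows:
    print(row["dominant_hand"], "|", row["note"])
```

Even in this toy case one note simply doesn't contain the answer, which is the point: the storage is trivial, and the structure is the hard part.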

This isn’t uncommon. In fact, on a recent Freakonomics podcast, Steven Levitt said this was a problem he noticed at many of the companies he consulted for.

I never would have thought this before I started working with companies. I never would have imagined that it is an I.T. problem that you simply cannot get the data you want, and the data are held in 27 different data sets that have different identifiers, so you simply…So sometimes when my little consulting firm TGG comes into a company we’ll spend something like three or six person months working with a company of trying to just put together a data set to do a basic analysis that I think many listeners would think wow I would think that a big, fancy company would be able to do this with the push of a button. But it really is… the I.T. support and the complexity in these big firms blows your mind about how hard it is to do the littlest, simple things.

The issue isn’t that companies don’t have the data. It’s that they don’t have the data in easy-to-digest formats!
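As a small illustration of the "different identifiers" problem from the quote, here is a sketch that joins two record sets whose keys are formatted differently. The record layouts, the `normalize_id` rule (keep the digits, drop leading zeros), and the `join_records` helper are assumptions invented for this example, not anything a particular company uses.

```python
def normalize_id(raw):
    """Reduce an ID like 'EMP-0042', 42, or '00042' to the string '42'."""
    digits = "".join(ch for ch in str(raw) if ch.isdigit())
    return digits.lstrip("0") or "0"


# Two hypothetical data sets describing the same people with different keys.
hr_records = [
    {"emp_id": "EMP-0042", "name": "Ana"},
    {"emp_id": "EMP-0107", "name": "Raj"},
]
sales_records = [
    {"employee": 42, "sales": 1200},
    {"employee": "00107", "sales": 800},
    {"employee": 9, "sales": 50},  # no matching HR record
]


def join_records(left, left_key, right, right_key):
    """Join right rows onto left rows by normalized ID; report the misses."""
    index = {normalize_id(row[left_key]): row for row in left}
    joined, unmatched = [], []
    for row in right:
        match = index.get(normalize_id(row[right_key]))
        if match is None:
            unmatched.append(row)
        else:
            joined.append({**match, **row})
    return joined, unmatched


joined, unmatched = join_records(hr_records, "emp_id", sales_records, "employee")
print(joined)
print(unmatched)
```

Two data sets and one rule is the easy version. With 27 data sets, each pair can need its own rule, and the unmatched rows are where the person-months tend to go.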

I noticed an example of this firsthand at one of the big companies I worked for. I had a side project I wanted to work on. In the middle of one of those team building sessions, I was talking with a co-worker. A project he was working on lined up perfectly! I excitedly told my boss about it in our next one-on-one. And… he was completely confused. He didn’t realize my co-worker was even working on a related project. This was baffling to me. Seriously? My boss wasn’t aware of what one of his own employees was working on? And consider the implications. If instead of randomly talking with my co-worker, I’d asked my boss: “Who would know best about…?”, could he have answered it?

It used to be really hard to collect data. That’s changed. Write a scraper, sign up for a cloud account, and go! If you’re inside a company, the data’s probably somewhere… Now, the issue is how to make sure the data is in a usable state to glean information from it. It’s a much harder problem, and one I hope gets as much buzz and press as “Big Data.”

-Dre

I have returned.

Welcome Back!

And I have returned with a blog! Yes, I’m currently using the older WordPress 2012 theme. Frankly, it’s simple and easy to customize. I’ve got some fun projects I’d like to share, and figured it was worth getting my own place on the web to do it again.

While I’m doing intros, I figured I’d throw out that I don’t intend to do comments on my blog. Don’t get me wrong, a ton of you say great stuff. The time and effort to cultivate a good comment section though is more than I want right now. So, feel free to ping me @nerdnumbers on Twitter. I’ll probably add a contact form at some point.

Anyway, it’s good to be back!