April 01, 2008

Stupidity squared

As my regular readers know, I'm something of an expert in stupidity, mostly my own. But that's not what I'm referring to right now.

I write for a living, and I'm also anal-retentive enough to be careful about grammar, usage, and spelling. Not that I never make a mistake, but you're not going to find totally illiterate writing here, the way you might in some sectors of internet-land.

So I had to ask myself: How difficult is it really to distinguish between coherent writing and stupid, illiterate writing? Not at all difficult is my answer. But apparently some people think it's necessary to have a computer analyze the writing to determine whether it's stupid. (And this is not some April Fool's joke, like that pathetic joke Google put up on Blogger Buzz.)

Here's the deal. Go to stupidfilter.org and you find this: "Too long have we suffered in silence under the tyranny of idiocy. In the beginning, the internet was a place where one could communicate intelligently with similarly erudite people. Then, Eternal September hit and we were lost in the noise. The advent of user-driven web content has compounded the matter yet further, straining our tolerance to the breaking point. It's time to fight back."

Sound annoying enough yet? Well, just wait.

"The solution we're creating is simple: an open-source filter software that can detect rampant stupidity in written English. This will be accomplished with weighted Bayesian or similar analysis and some rules-based processing, similar to spam detection engines. The primary challenge inherent in our task is that stupidity is not a binary distinction, but rather a matter of degree. To this end, we're collecting a ranked corpus of stupid text, gleaned from user comments on public websites and ranked on a five-point scale."

And just how useful do you think this project is? Consider this: "u r dum !!11! lol" is stupid to the extreme. But do we need a computer applet to tell us that? Not really, but that's what we find at stupidfilter. From the FAQs:

"The idea is that the most egregiously stupid comments will also be the easiest to detect while remaining ignorant of context; comments with too much or too little capitalization, too many text-message abbreviations, excessive use of 'LOL,' exclamation points, and so on."

And how hard is it to rate comments like the one I made up above? Not very, but that's what we find at the same FAQs:

"Keep in mind we grade stupidity on a scale of 1 to 5. Someone might get a 1 or 2 for a comment that used no punctuation, whereas a comment consisting of nothing but text message abbreviations with a dash of LOLLLLL thrown in for good measure would probably rate a solid 4 or 5."

Oy, vey. Well, just to show I'm a stand-up guy, I decided to give the program a chance.
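
Before the tests, here is a rough sketch in Python of what context-free rules like the ones in the FAQs might look like. To be clear, this is a guess at the flavor of the thing, not stupidfilter's actual code: the function name `stupidity_score`, the abbreviation list, and every threshold are assumptions made up for illustration, and the Bayesian half of the project is not modeled at all.

```python
import re

# Hypothetical abbreviation list; the real project ranks a corpus instead.
TEXT_SPEAK = {"u", "r", "ur", "omg", "wtf", "plz", "thx"}


def is_text_speak(word: str) -> bool:
    """True for listed abbreviations and for any stretched 'lol' (lol, LOLLLLL)."""
    w = word.lower()
    return w in TEXT_SPEAK or re.fullmatch(r"l+o+l+", w) is not None


def stupidity_score(text: str) -> int:
    """Return a rough 1-5 score from surface features only (no context)."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not words:
        return 1

    letters = [c for c in text if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters)
    abbrev_ratio = sum(is_text_speak(w) for w in words) / len(words)

    score = 1
    # Too much capitalization, or none at all in a comment of some length.
    if upper_ratio > 0.5 or (upper_ratio == 0 and len(words) > 3):
        score += 1
    # Too many text-message abbreviations (assumed threshold: a quarter of words).
    if abbrev_ratio > 0.25:
        score += 1
    # Runs of exclamation points.
    if re.search(r"!{2,}", text):
        score += 1
    # No ordinary punctuation anywhere.
    if not re.search(r"[.,;:?]", text):
        score += 1
    return min(score, 5)


print(stupidity_score("u r dum !!11! lol"))     # 5
print(stupidity_score("It was a minor blip."))  # 1
```

Notice that the second example sails through with a 1, since nothing in rules like these looks at what the words actually say.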

Test 1. I took two passages from a relatively recent post of mine about some stupid parent in Montgomery County who was arrested at his kid's elementary school when he got over-aggressive about his desires for his kid's curriculum.

First, here's a baseline test: "Is it just me or is it somewhat odd for a parent to be arrested (second item) when discussing his kid's elementary school work with school officials? I don't know. I've lived for over 20 years in Montgomery County, where helicopter parenting takes on a whole new meaning, and yet this still seems a little out of line." Fairly non-stupid, at least for Pillage Idiot, right?

Here's the result at stupidfilter:
Text is not likely to be stupid.

CLASSIFY succeeds; success probability: 0.5066 pR: 0.0114
Best match to file #0 (/home/sfp/code/nonstupid_cor.css) prob: 0.5066 pR: 0.0114
Total features in input file: 11472919
#0 (/home/sfp/code/nonstupid_cor.css): features: 11472919, L1: 240419182 L2: 340256281 L3: 553124907, l4: 1327203205 prob: 5.07e-01, pR: 0.01
#1 (/home/sfp/code/stupid_cor.css): features: 1510682, L1: 29910336 L2: 41683685 L3: 67105951, l4: 160824653 prob: 4.93e-01, pR: -0.01

I have no idea what most of that means, but perhaps that's because I'm, er, stupid.
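
For what it's worth, at least two of those numbers can be related to each other. The reported figures are consistent with pR being the base-10 logarithm of the odds, log10(prob / (1 - prob)). That is an inference from the numbers printed in this post, not something the site explains, and the function name below is made up for the occasion.

```python
import math


def pr_from_prob(p: float) -> float:
    """Base-10 log-odds of a probability strictly between 0 and 1 (assumed meaning of pR)."""
    return math.log10(p / (1.0 - p))


# "success probability" values copied from the filter output in this post.
for p in (0.5066, 0.6175, 0.4605):
    print(f"prob={p:.4f}  pR={pr_from_prob(p):+.4f}")
```

On that reading, a pR near zero just means the filter is close to a coin flip, and the sign says which way it leans.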

Next, I checked the news account of what the arrested man said: "According to an arrest document, the 6-foot-4 Rogers 'stood up, cupped his hands around his mouth and screamed very loudly, "I am Rosa Parks. I will not ride on the back of the bus."'" That's about as stupid as a person can be, but here's what the stupidfilter computer showed:
Text is not likely to be stupid.

CLASSIFY succeeds; success probability: 0.6175 pR: 0.2080
Best match to file #0 (/home/sfp/code/nonstupid_cor.css) prob: 0.6175 pR: 0.2080
Total features in input file: 11472919
#0 (/home/sfp/code/nonstupid_cor.css): features: 11472919, L1: 130118990 L2: 202077007 L3: 390256047, l4: 1207598011 prob: 6.17e-01, pR: 0.21
#1 (/home/sfp/code/stupid_cor.css): features: 1510682, L1: 16486110 L2: 23417883 L3: 38819315, l4: 94786575 prob: 3.83e-01, pR: -0.21

It's "not likely to be stupid," eh? Well, I guess the "not likely" gives them an out. Yeah, yeah, I know the filter isn't directed to substantive stupidity. That's my point.

Test 2. Hillary's early response to the Tuzla exposure was this: "You know, I think that, a minor blip, you know, if I said something that, you know, I say a lot of things – millions of words a day - so if I misspoke, that was just a misstatement." Totally stupid, right?

A little surprise here:
Text is likely to be stupid.

CLASSIFY fails; success probability: 0.4605 pR: -0.0688
Best match to file #1 (/home/sfp/code/stupid_cor.css) prob: 0.5395 pR: 0.0688
Total features in input file: 1510682
#0 (/home/sfp/code/nonstupid_cor.css): features: 11472919, L1: 136163155 L2: 194867166 L3: 334824012, l4: 914439786 prob: 4.60e-01, pR: -0.07
#1 (/home/sfp/code/stupid_cor.css): features: 1510682, L1: 17608926 L2: 25394698 L3: 45709338, l4: 138558262 prob: 5.40e-01, pR: 0.07

I guess it must have been all the "you know's." To verify this, I deleted them and tried it again: "I think that, a minor blip, if I said something that - I say a lot of things – millions of words a day - so if I misspoke, that was just a misstatement."

The results? A surprise again.
Text is likely to be stupid.

CLASSIFY fails; success probability: 0.4804 pR: -0.0340
Best match to file #1 (/home/sfp/code/stupid_cor.css) prob: 0.5196 pR: 0.0340
Total features in input file: 1510682
#0 (/home/sfp/code/nonstupid_cor.css): features: 11472919, L1: 118999011 L2: 174065450 L3: 306798586, l4: 865356242 prob: 4.80e-01, pR: -0.03
#1 (/home/sfp/code/stupid_cor.css): features: 1510682, L1: 15294168 L2: 22135632 L3: 39684900, l4: 120275436 prob: 5.20e-01, pR: 0.03

And now, I've rewritten Hillary's stupidity so that it reads well, even though it's still substantively stupid: "It was a minor blip. I say perhaps millions of words a day, and this was just a misstatement."

This time, the result is negative:
Text is not likely to be stupid.

CLASSIFY succeeds; success probability: 0.5241 pR: 0.0420
Best match to file #0 (/home/sfp/code/nonstupid_cor.css) prob: 0.5241 pR: 0.0420
Total features in input file: 11472919
#0 (/home/sfp/code/nonstupid_cor.css): features: 11472919, L1: 71720625 L2: 104271070 L3: 181129542, l4: 487604746 prob: 5.24e-01, pR: 0.04
#1 (/home/sfp/code/stupid_cor.css): features: 1510682, L1: 9177190 L2: 12891812 L3: 21612594, l4: 56643680 prob: 4.76e-01, pR: -0.04

Well, I'm a little tired of this, and I'm sure you are, too, which is why you're not even bothering to read this post any more.

What I'm looking for is a computer that can distinguish substantively stupid writing, like my rewrite of Hillary's statement, or some of the rantings of her supporters, or the New York Times editorial page. Find it, and that's when I'll be interested.