Just How Good Was Chuck?

If Only We Could Know!

Much like global warming, the science is in. We can all stop arguing now, because 97.24% of all scientists and TV watchers agree and there’s just no more disputing the facts.

What am I talking about, you ask? Via a political site called Ace of Spades (Fair warning, “Ace” may not be your cup of tea – I’m being polite about the credit-trail) you too can get a handle on this question by going to Graph TV. There’s also a bit of an explanation about what all the charts and graphs represent at a place called A.V. Club.

In short, for any show you care to name (and I care to name – Chuck!) Graph TV shows you a plot of the IMDb ratings for each and every episode.

Now go take a peek and I’ll wait for you to get back. Then you can follow me over the jump!

What Does It All Mean?

So, did you see the graph of IMDb ratings for all five seasons of Chuck? If you’re at all like me, your first reaction was “Huh?” and your second reaction, as you absorbed the implications, was more like a big “WTF??? How come S3 is rated highest??? What am I looking at here?”

Now please, for a moment, I’d like you to ignore the lines and focus on the dots. Close inspection shows that the episode with the highest IMDb rating is Colonel... no, The Ring (Hey! It really starts with Colonel, right???) (Which makes sense.) followed closely by The Other Guy (where Sarah finally lets Chuck know that she loves him too), Chuck defeating the uber-detested Shaw in The Ring, Pt. 2, and as expected, the couple’s engagement and wedding episodes. But wait, it looks like the controversial series finale is rated every bit as highly.

[Explain that, Buckley!]

I will, but it’s pretty clear that the dots are telling us something. BTW, if you put your cursor over any of the dots, you’ll see the name of the episode it represents. From Dan Selcke’s article at the A.V. Club site:

Type in your favorite show and get a definitive chart attesting to its quality, thus ending all future arguments on the matter before they begin.

So the plot provides a visualization of the show’s quality. Well, even Selcke tells us that’s not exactly the best way to look at it.

The website, created by software engineer Kevin Wu, looks up the IMDB user ratings for every episode of a given TV show and turns them into points on simple graphs that show the ebb and flow of public opinion over the course of a series.

That’s a more accurate and more informative statement. The graphs are a collection of data that somehow represent opinions, opinions about quality, maybe. May we ask whose opinions? No, not really. But the opinions are not (cannot be!) unbiased; they are self-selected. Further, they are biased in an unknown way. That makes a difference when we start to interpret those graphs. In fact, it makes interpretation very difficult.

So all is not quite what it seems. Still, I don’t want to denigrate the effort. There’s valuable data in those graphs and the interpretation is worthy of discussion. But as I learned way back as a student studying hard to be a scientist, a collection of data points is not information is not knowledge is not intelligence is not wisdom. What we have here is raw data and some lines.

It’s a pretty impressive collection of data, actually, with over 85,000 ratings used. That’s a reasonably large sample of opinions, and the graph is said to be an indication of the show’s quality. It’s not.

The best we can say, I think, is that it’s a representation of the opinion of a universe of Chuck fans who also opted to express their opinions about the relative quality of the show’s episodes. There’s no attempt made to correlate the values given for Chuck to the numbers given for, say, Castle. In fact, it would be wrong to compare our show’s numbers to the numbers for, say, The Bachelor, because that’s a completely different show with a completely different audience. In fact, that’s more than wrong. It’s so wrong it’s illegal in 29 states and the District of Columbia!

The last thing to notice is that the lines drawn through the data may be prejudicial. I could cite some examples (and give you one heck of a personal anecdote) but I’ll spare you that. Suffice it to say that I could find only a little information about how those lines were generated; based on the R-squared values shown when you put your cursor over any of the lines, they appear to be a piecewise (per season) linear, least-squares fit to the data, which would be within the scope of Kevin Wu’s project. The R-squared numbers tell you how much of the “scatter” in the data each line accounts for: the closer to 1, the less scatter is left over.
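
For the curious, a per-season straight-line fit and its R-squared can be sketched in a few lines of Python. This is only a guess at the kind of calculation involved, not Kevin Wu’s actual code; the name `linear_fit` and the ratings below are invented for illustration and are not real IMDb numbers.

```python
from statistics import mean

def linear_fit(ratings):
    """Least-squares line through (1, r1), (2, r2), ... for one season.

    Returns (slope, intercept, r_squared). Needs at least two episodes.
    """
    xs = list(range(1, len(ratings) + 1))  # episode numbers within the season
    x_bar, y_bar = mean(xs), mean(ratings)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ratings))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared = 1 - (scatter left around the line) / (scatter around the mean)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ratings))
    ss_tot = sum((y - y_bar) ** 2 for y in ratings)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared

# Hypothetical ratings, invented for illustration (NOT the real IMDb data)
ratings_by_season = {
    1: [8.0, 8.1, 8.3, 8.2, 8.5],
    2: [8.6, 8.4, 8.9, 8.5, 9.0],
}

# "Piecewise" just means each season gets its own independent line
season_fits = {s: linear_fit(r) for s, r in ratings_by_season.items()}
for season, (slope, intercept, r2) in season_fits.items():
    print(f"S{season}: slope={slope:+.3f} per episode, R^2={r2:.2f}")
```

An R-squared near 1 means the dots hug the line; a value near 0 means the line says very little about where any single episode lands.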

Even so, there’s no guarantee that a better representation wouldn’t have been a fit to a curve instead of a straight line. More importantly, outliers (usually the highest and lowest points) are often excluded in such an analysis. That’s probably not warranted here, but it might be better to exclude episodes that occurred after unusual and non-repeating events (**cough**Olympics**Cough** **cough**Presidential Speeches**cough**). That would change the appearance of the graph, but perhaps make it more informative.
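
One way to check whether a curve beats a straight line is to fit polynomials of different degree to the same season and compare their R-squared values. The sketch below does that in plain Python under stated assumptions: the helper names `polyfit` and `r_squared` and the ratings are hypothetical, and episode positions are rescaled to the range -1 to 1 only to keep the arithmetic well-behaved.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (constant term first),
    found by solving the normal equations with Gaussian elimination."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y], where A[i][j] = xs[i] ** j
    aug = [
        [sum(x ** (r + c) for x in xs) for c in range(n)]
        + [sum(y * x ** r for x, y in zip(xs, ys))]
        for r in range(n)
    ]
    # Forward elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aug[r][n] - known) / aug[r][r]
    return coeffs

def r_squared(xs, ys, coeffs):
    """Fraction of the variance in ys accounted for by the fitted polynomial."""
    def predict(x):
        return sum(c * x ** k for k, c in enumerate(coeffs))
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical season of ratings, invented for illustration (NOT real IMDb data)
ratings = [8.4, 8.6, 8.5, 8.7, 8.3, 8.2, 8.0, 8.1, 8.4, 8.6, 8.9, 9.1]
# Rescale episode positions to [-1, 1]
xs = [2 * i / (len(ratings) - 1) - 1 for i in range(len(ratings))]

r2_line = r_squared(xs, ratings, polyfit(xs, ratings, 1))
r2_cubic = r_squared(xs, ratings, polyfit(xs, ratings, 3))
print(f"linear R^2 = {r2_line:.3f}, cubic R^2 = {r2_cubic:.3f}")
```

One caveat: on the same data a cubic can never score a lower R-squared than a line, because the line is a special case of the cubic. A higher number by itself doesn’t prove the curve is the better description.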

Take away the lines, which are at best mathematical summaries of parts of the data set, and you can see some interesting things. There’s a huge, general rise in the perceived quality of the show noted by the participants from the beginning through season two. It looks like momentum carried the show through “The Misery Arc” at least until Chuck and Sarah got to Paris. But the data show a kind of “choppiness” after that. It’s clear that the respondents were pretty happy with the way S3 ended with the defeat of Daniel Shaw and the Sacking of the Buy More. I know I was.

S4 shows a downward trend, but that’s only relative to S2 and S3. The episodes are still rated generally higher than S1. Somehow, even though my reviews of S4 are very high, this seems strange. Perhaps it has something to do with “Great Expectations” (apologies to Charles Dickens). Certainly the people who submitted ratings for S1 episodes were not the same people who submitted ratings for S4. Apples, oranges, fruit cup.

The most distinct thing about the graph is the amazing upsurge in ratings through S5. It started very low and ended very high. The high is easily explained – nearly everyone knew the show was ending and wanted to see the finale simply because there would be no more. Excitement was in the air and there could very well have been a large, new batch of IMDb respondents for Sarah and Goodbye. The low beginning may have had very little to do with episode quality and much more to do with fan exhaustion after several “Save Chuck” efforts came to ultimate fruition. High school seniors are prone to the same phenomenon, and some were already looking for the next, new thing. (See? I told you I’d get back to it!)

Finally, there seems to be a big sine wave running through all five seasons. Chuck starts low, goes up through the end of S2, levels off, goes down through the beginning of S5 and rises, up and down and up like a wave. That’s not about episode quality so much as it is about mass psychology, I think. A large part of the fan base was reacting to the longer story arcs, Chuck and Sarah’s story and off-camera stories (like ComicCon). But there were also important things happening behind the scenes and in the front offices (like Comcast buying NBC-U and the subsequent effects on the budget) that did affect show quality and viewer perceptions. All that, in one graph!

[And for my next trick, I’ll show you how all the plots about Global Warming need to be examined CAREFULLY! – The preceding is a paid, political announcement – ed.]

That’s my $0.02. Now tell me, please. What do YOU see in the graphs?

– joe


About joe

In my life I've been a professor, martial artist, rock 'n roller, rocket scientist, lover, poet and brain surgeon. I'm lying about the brain surgery.
20 Responses to Just How Good Was Chuck?

  1. joe says:

    Arg! I keep needing to go back and edit! The word “data” is PLURAL not SINGULAR! Data are, not data is. AAARRRGGGG!!!

  2. phaseou812 says:

    I lost my 2 cents so I don’t even have that to give you . . . but it looks like a lot of dots . . . and a dazed and hazed remembrance of my show over the years. For the record though, I do see Chuck vs. the Ring (9.3) as being the highest rated episode. And I suppose it does make sense with Season 4 flat lining as I remember there seemed to be a big drop off or a matter of disagreement after season 3 amongst the fans of the show. Interesting to see the sharp incline on season 5 considering that would be the season, if you had not been a fan of Chuck prior . . . the most difficult to follow from a consistency perspective or an arc perspective.

  3. atcDave says:

    I’m really not comfortable drawing any conclusions from that. It looks like the lowest rated episodes would be Helicopter and Bearded Bandit. I think a huge flaw in sampling from IMDb is that those ratings were generated over such a long period, people participating were likely not even the same from 2007 to current. And even those who were may have greatly, and unknowingly, changed their own standards over time.
    I think overall, more meaningful statistics would be gained by a full context, all at once sort of thing. That would at least eliminate some of the time related biases.

  4. Ernie Davis says:

    I’m mostly with Dave, that it’s tough to draw conclusions, so I’ll speculate on something Joe said.

    Finally, there seems to be a big sine wave running through all five seasons. Chuck starts low, goes up through the end of S2, levels off, goes down through the beginning of S5 and rises, up and down and up like a wave. That’s not about episode quality so much as it is about mass psychology, I think.

    I think that is generally true, but even more so, you can, by using the trendline as an axis impose a fuzzy sine wave on each season that tracks the elements and pace of … wait for it … The Hero’s Journey.

    The seasons are all structured the same way based on the storytelling framework. We start with a generally elevating mood as we go from the premiere with its initial advances and growth and victories for the characters to their first failures which are usually attributable to the hero’s inadequacies, pride or weakness. The heroes, after their low point, start to recover until the next challenge (an external as opposed to an internal challenge) brings them low, in danger of losing it all, but in the end, the hero having grown is finally able to persevere and emerge triumphant. So there is a psychological up and down built into the storytelling that I think is reflected in many people’s opinions.

    There will be significant outliers, such as Phase 3, which is a really great and popular episode, but actually represents both Chuck and Sarah at a low point. But if you can look at it a little fuzzy I think it is there.

    • atcDave says:

      I think the poll you did for this site is really much more interesting and useful Ernie.
      The biggest reason being, it was done all at once with every participant evaluating the entire series. I think that reduces a lot of the time related biases.
      Neither good nor bad, it also eliminates more casual viewers. I’ll say anyone coming to this site (unlike IMDb) is by definition not a casual viewer.
      The biggest drawback to your data is just the sample size. Obviously we cannot compete with a site like IMDb with its thousands of participants. So our poll will be dominated by a lunatic fringe of the fandom (I resemble that remark!).

      • noblz says:


        Sample size is the problem with the IMDb data. Each episode has a wildly varying number of “votes”. Some near 1000, some less than 100. The math needed to normalize for that problem with the data is very, very difficult and I’m sure not corrected for by the site authors.

        Back during many arguments with ernie on the IMDb blog, I once thought to use the IMDb votes but couldn’t get past the variation in the samples. Take these data with a grain of salt.
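
One rough way to take some of the sting out of uneven vote counts is to pull thinly-voted episodes toward the series-wide average before comparing them. The sketch below is an assumption-laden illustration, not anything the Graph TV site is known to do: `shrunk_rating`, the `PRIOR_VOTES` knob and the (rating, votes) pairs are all hypothetical.

```python
def shrunk_rating(rating, votes, prior_mean, prior_votes):
    """Blend an episode's rating with the series-wide mean.

    Episodes with few votes are pulled strongly toward prior_mean;
    episodes with many votes barely move.
    """
    return (votes * rating + prior_votes * prior_mean) / (votes + prior_votes)

# Hypothetical (rating, vote count) pairs, NOT real IMDb numbers
episodes = {
    "Episode A": (9.3, 950),
    "Episode B": (9.4, 80),
    "Episode C": (8.1, 400),
}

# Vote-weighted average rating across the whole (made-up) series
total_votes = sum(v for _, v in episodes.values())
series_mean = sum(r * v for r, v in episodes.values()) / total_votes

PRIOR_VOTES = 200  # assumed strength of the pull toward the mean; a tuning knob
adjusted = {
    name: shrunk_rating(r, v, series_mean, PRIOR_VOTES)
    for name, (r, v) in episodes.items()
}
for name, value in adjusted.items():
    print(f"{name}: raw={episodes[name][0]:.1f}, adjusted={value:.2f}")
```

This does nothing about the self-selection problem; it only keeps an episode with 80 votes from outranking one with 950 on the strength of a handful of enthusiasts.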


      • atcDave says:

        I didn’t realize some of those sample sets are no bigger than ours! Interesting.

      • joe says:

        Noblz, I agree with that. But I’m pretty certain that they did not intend to publish something in a peer-reviewed journal based on a linear regression of ratings by a self-selected sample.* Sheesh. I doubt they even wanted to imply that some real science could be done like this – it’s barely a first cut at a reasonable massaging of the available data to get information out of it.

        But honestly, considering these data, it’s not a bad first cut. The real problem is with making any sort of judgment about shows this way. No one can really go very far interpreting the results and no one should try.

        So, to mangle Shakespeare, Horatio, the problem lies not in the stars, but within ourselves!

        * I also need to add that there’s something very wrong with the so-called peer-review process the way it’s practiced today generally. From what I can tell, depending on what’s being analyzed, the problem is with the corrupting power of money, politics and academic pressures colluding to bring nearly all research into question. “Publish or perish” is a beast! But that’s a much bigger topic.

  5. gatesoutcast says:

    Darn it, I should have paid attention in my freshman year stats class. What is the line, “there are lies, damned lies, then statistics!”

  6. garnet says:

    I’ll take the whole analysis with a grain of salt. I know you can’t compare apples with oranges, but I checked a few other shows. Breaking Bad rates much higher than CHUCK, but for me I’d watch ANY episode of CHUCK with more enjoyment than any BB. White Collar rates slightly higher than CHUCK, but I find CHUCK’s characters much more entertaining (and I would consider myself a White Collar fan. I’ll be mildly bothered if they don’t manage to get at least a partial season six, but nowhere near as bothered as I was by the end of CHUCK). Dexter rates significantly higher, but when I think of Dexter, the whole concept rates a “meh” from me. And yes I read the novel on which Dexter was, fairly loosely, based. All in all interesting, but it isn’t going to change my opinion.

    • atcDave says:

      No doubt I’d rate Chuck above any of those!

    • joe says:

      Now, now, Garnet. I tried to tell everyone in the post, you really can’t go comparing the data for any two shows that way. The self-selected nature of the respondents makes that invalid.

      But I know what you mean. “Quality” is not the same as entertainment value, and the value any one of us places on a given episode is different from anyone else’s. That doesn’t stop us from *wanting* some objective measure to agree with our opinions, though! 😉

      • atcDave says:

        Gee, you mean there isn’t some objective criteria for ranking shows that all viewers agree on, even those who don’t like any of the same shows…

        I know I’m shocked!

  7. anthropocene says:

    Those R-squared values are abysmal even for ***cough***social-science***cough***research. A non-linear regression model couldn’t do any worse.

      • anthropocene says:

        Because it’s Friday afternoon (it is here, anyway), I put the IMDb ratings for Season 4 into SPSS and tried a bunch of regression models. Best was cubic, which gave a kind of sine wave with a big tail, and R^2 = 0.303.

  8. CaptMediocre says:

    Do shows get better over time, or do disgruntled fans “go away”, no longer rating the eps?

  9. Zsjaer says:

    two seasons good.. As many times discussed.

