Jeremy Sklarsky

The World Wide Web is My Oyster.

Seinfeld Rankings Part II: The Writers

A blog about nothing

(not that there’s anything wrong with that)

In my previous post, we examined NY Magazine’s vulture.com rankings of Seinfeld episodes. By assigning each episode points based on its ranking (I arbitrarily made a linear point scale from worst to best - unless anyone reading is a stats person and wants to weigh in…), we found that Season 5 was the runaway winner. An average Season 5 episode would’ve ranked ~52nd, meaning its episodes were, on average, in the top 1/3 of the whole show. I thought it would be fun to continue this thought experiment.

Who are the best writers in the history of the show? Solving this would take two wi-fi enabled flights back and forth to Tampa and a couple hours when you wake up earlier than your wife even though you’re on vacation and you can’t sleep in.

In the previous exercise, we grabbed our rankings from Vulture’s website using a simple jQuery statement, and then iterated through the strings to get the info we needed to make our rankings. Now, we need to get each episode’s writer and link them up to the rankings to learn who is truly master of the domain.

I’ll show some code, but if you’re bored or uninterested, just skip to the bottom for the summary.

Writer? We’re talking about a sitcom

Luckily for us, there is a free API called The Open Movie Database that gives us most of the same information found on IMDB. I may do something with IMDB ratings at some point, but to be honest, they are mostly useless and would probably yield very little information of interest. Anyhow, OMDb has a very simple API as far as TV series are concerned. There are 2 methods I used. The first is http://www.omdbapi.com/?t={show}&Season={season#}. A request for Seinfeld’s first season, http://www.omdbapi.com/?t=seinfeld&Season=1, returns a nice JSON object:

{
  Title: "Seinfeld",
  Season: "1",
  Episodes: [
    {
      Title: "The Stakeout",
      Released: "1990-05-31",
      Episode: "2",
      imdbRating: "7.8",
      imdbID: "tt0697784"
    },
    {
      Title: "The Robbery",
      Released: "1990-06-07",
      Episode: "3",
      imdbRating: "7.7",
      imdbID: "tt0697768"
    },
    {
      Title: "Male Unbonding",
      Released: "1990-06-14",
      Episode: "4",
      imdbRating: "7.6",
      imdbID: "tt0697645"
    },
    {
      Title: "The Stock Tip",
      Released: "1990-06-21",
      Episode: "5",
      imdbRating: "7.7",
      imdbID: "tt0697788"
    }
  ],
  Response: "True"
}

So now, we simply make this call for each season of the show, incrementing the season number until there is no response. Along the way, we collect the imdbIDs for our second call. Then, we use the i={imdbID} API method for each episode we collected. A sample response looks like this:

{
  Title: "Male Unbonding",
  Season: "1",
  Episode: "4",
  Runtime: "23 min",
  Genre: "Comedy",
  Director: "Tom Cherones",
  Writer: "Larry David (created by), Jerry Seinfeld (created by), Larry David, Jerry Seinfeld",
  Actors: "Jerry Seinfeld, Julia Louis-Dreyfus, Michael Richards, Jason Alexander",
  imdbRating: "7.6",
  imdbID: "tt0697645",
  seriesID: "tt0098904"
}
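The season-by-season loop can be sketched in plain Ruby. To keep it self-contained and offline, `fetch_season` below is a hypothetical stand-in for the real HTTP call (which would hit http://www.omdbapi.com/?t=seinfeld&Season={n}, e.g. with Net::HTTP):

```ruby
require 'json'

# Walk seasons 1, 2, 3, ... collecting imdbIDs until OMDb reports
# no such season (Response: "False"). fetch_season is a stand-in
# callable that returns the raw JSON for a season number.
def collect_episode_ids(fetch_season)
  ids = []
  season = 1
  loop do
    data = JSON.parse(fetch_season.call(season))
    break unless data["Response"] == "True"
    ids.concat(data["Episodes"].map { |ep| ep["imdbID"] })
    season += 1
  end
  ids
end

# A canned two-season stub in place of the live API:
stub = lambda do |n|
  {
    1 => '{"Response":"True","Episodes":[{"imdbID":"tt0697784"},{"imdbID":"tt0697768"}]}',
    2 => '{"Response":"True","Episodes":[{"imdbID":"tt0697645"}]}'
  }.fetch(n, '{"Response":"False"}')
end

collect_episode_ids(stub)
# => ["tt0697784", "tt0697768", "tt0697645"]
```

Swapping the stub for a real HTTP fetch is the only change needed to run it against OMDb itself.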

Now, we just have to parse the Writer attribute, split on the commas, and filter out any of the writer credits we want to ignore (e.g. “created by”, “story by”, “story editor”, etc.). We are just interested in the main billed writers for this experiment.
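A minimal sketch of that filtering, assuming all we need is to drop any credit carrying a parenthetical qualifier:

```ruby
# Split OMDb's Writer string on commas, reject credits with a
# parenthetical qualifier ("(created by)", "(story by)", etc.),
# and de-duplicate what's left to get the main billed writers.
def billed_writers(writer_string)
  writer_string.split(",")
               .map(&:strip)
               .reject { |credit| credit.include?("(") }
               .uniq
end

billed_writers("Larry David (created by), Jerry Seinfeld (created by), " \
               "Larry David, Jerry Seinfeld")
# => ["Larry David", "Jerry Seinfeld"]
```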

It’s going in the vault

So, now that it looks like I’m going to be making upwards of 180 or so API calls, I’m going to want to persist this data so I don’t have to make all these calls over and over again. We’ll make a few tables: Episodes, Seasons, Writers, and EpisodeWriters (a join table, since an episode can have multiple writers). Join them up and we’re ready to go.

Working in a Ruby ActiveRecord-enabled environment makes this all really easy. Maybe I’ll do this all again in Node one day.

Here’s the Writer class:

class Writer < ActiveRecord::Base
  has_many :episode_writers
  has_many :episodes, through: :episode_writers

  def total_points
    self.episodes.size > 0 ? self.episodes.collect {|ep| ep.points}.inject(:+) : 0
  end

  def average
    self.episodes.size > 0 ? (total_points / self.episodes.size.to_f).round(2) : 0
  end

  def episode_count
    self.episodes.size
  end

  def average_rank
    Episode.all.size - self.average
  end

  def self.all_time_points
    Writer.all.sort_by {|writer|writer.total_points}.reverse
  end

  def self.all_time_average
    Writer.all.sort_by {|writer|writer.average}.reverse
  end
end

He named names!

For argument’s sake, we’ll limit our sample to writers billed on at least 5 episodes over the course of the series. This gives us a total of 11 writers, sorted by average episode rank (writing duos Jeff Schaffer / Alec Berg and Tom Gammill / Max Pross are listed together; Jerry and Larry, while collaborators on a lot of episodes, are listed separately because of Larry’s significant solo career). Ready?

| Writer          | Average Rank | Total Points | Episode Count |
|-----------------|--------------|--------------|---------------|
| Schaffer / Berg | 48.9         | 1429         | 12            |
| Carol Leifer    | 54           | 570          | 5             |
| David Mandel    | 61.4         | 746          | 7             |
| Larry Charles   | 67.1         | 1404         | 14            |
| Gregg Kavet     | 82.7         | 768          | 9             |
| Andy Robin      | 84.6         | 917          | 11            |
| Gammill / Pross | 90           | 936          | 12            |
| Larry David     | 93.2         | 3739         | 50            |
| Jerry Seinfeld  | 97.1         | 1064         | 15            |
| Peter Mehlman   | 99.2         | 1307         | 19            |
| Spike Feresten  | 100.5        | 540          | 8             |

Very interesting! Larry David is by far the biggest (billed) individual contributor to the show, but his average episode ranks towards the bottom. By contrast, Jeff Schaffer and Alec Berg, who later went on to create The League, have both the second-most total points (behind Larry) and the best average rank. Here’s a list of their episodes and rankings:

'The Summer of George' Rank: 9
'The Secret Code'      Rank: 10
'The Gymnast'          Rank: 19
'The Doodle'           Rank: 33
'The Voice'            Rank: 38
'The Butter Shave'     Rank: 47
'The Seven'            Rank: 48
'The Maid'             Rank: 50
'The Chicken Roaster'  Rank: 57
'The Foundation'       Rank: 64
'The Calzone'          Rank: 97
'The Strike'           Rank: 127

10 of 12 episodes are considered above average by Vulture, 9 of 12 land in the top 1/3, 3 in the top 20, and only 1 in the bottom 1/3 of episodes. In fact, an entire season of Schaffer/Berg episodes would’ve ranked higher than Season 5, the overall top-ranked season.

Larry, on the other hand, has the dubious distinction of having written both Vulture’s top and bottom ranked episodes. Of course, calling “The Finale” the worst Seinfeld episode is a stunt. It’s barely a Seinfeld episode and, outside of the music and cast, is nothing like the rest of the series. So it is at this point, perhaps, that the ranking system breaks down. “The Contest” is not only the most acclaimed Seinfeld episode, it is roundly considered one of the greatest scripted half hours in television history. In fact, its critical success is partly what led to the explosion in Seinfeld’s popularity. Without it, there might not have been five more seasons. Is it worth only 1 more point in the rankings than the next best episode on the list (“The Subway”)? I can’t say for sure.

I should, however, note that while Larry and Jerry’s rankings are somewhat low, this is skewed data. It is said that the pair (through the end of Season 7, and later Jerry alone) had final say on all scripts and did the last pass of every episode. They tended to take billing as an episode’s writer less and less as the series went on, and were more likely to be the billed writers in the early, low-ranking seasons. Whether this is reflective of their contributions to the show’s writing I cannot say, but I thought it worth mentioning.

Directors

We can do a similar analysis using virtually the same code, just swapping out Writer for Director because object oriented metaprogramming is cool like that.

We’ll just check in on the two directors who directed the overwhelming majority of the show’s episodes: Tom Cherones (Seasons 1-5) and Andy Ackerman (6-9). My gut told me that Ackerman would have better stats, which he did, though slightly.

| Director        | Average Rank | Total Points | Episode Count |
|-----------------|--------------|--------------|---------------|
| Andy Ackerman   | 77           | 6828         | 75            |
| Tom Cherones    | 90           | 5308         | 68            |

Ranking Seinfeld Seasons

I was recently sent an email containing a link to NYMag/Vulture’s rankings of every Seinfeld episode, which they put out in response to Hulu releasing the entire series. What follows is my response to the group email.

All: Using my newfangled programming skills I was able to parse the text of the article and crunch a couple of numbers. I think there are some legitimate gripes with some individual episode rankings. But with the exception of what Shosh might deem a too highly ranked Season 8, I think the aggregate season rankings ring somewhat true.

What I’ve done is take each ranking and assign it its inverse point value (total # of episodes minus ranking). So, for example, the #2 ranked episode gets 167 points and the 167th ranked episode gets 2 points. I did both point totals and averages just to see if there would be any discrepancy due to the early seasons having fewer episodes. There was no discrepancy.

I also was able to infer, based on the average score, what the “average” episode’s rank would’ve been. So for example, the average season 5 episode would’ve ranked about 52nd.

According to this article, the overall best season is Season 5, and by a long shot. The average Season 5 episode ranks 21 places higher than the next best season’s, #6. As expected, Seasons 1, 2, and 3 are ranked lowest (and in that order) both in total score and average score.

| SEASON | TOTALS | AVERAGES | AVERAGE RANK |
|--------|--------|----------|--------------|
| 5      | 2455   | 116.9    | 52.1         |
| 6      | 2106   | 95.72    | 73.3         |
| 9      | 2102   | 95.54    | 73.5         |
| 8      | 1937   | 88.04    | 81.0         |
| 4      | 1703   | 77.4     | 91.6         |
| 7      | 1609   | 76.61    | 92.4         |
| 3      | 1544   | 70.18    | 98.8         |
| 2      | 652    | 54.33    | 114.7        |
| 1      | 73     | 18.25    | 150.8        |

Tech notes, if you’re interested

Used jQuery to scrape the website $('p:contains("(Season")').each(function(i, item){console.log(item.textContent.split(").")[0])})

Defaulted back to Ruby for the rest, putting the list into an array STRING

STRING = [
  '169. "The Puerto Rican Day Parade" (Season 9',
  '168. "The Outing" (Season 4',
  '167. "The Finale" (Season 9',
  '166. "The Jacket" (Season 2',
  '165. "The Tape" (Season 3',
  '164. "The Deal" (Season 2',
  etc...
]

def convert_num(num)
  169-num
end

def sort_hash(hash)
  hash.sort_by {|k,v| v}.reverse.to_h
end

rankings_hash = Hash.new { |h, k| h[k] = [] }
average_score = {}
total_score = {}

STRING.each do |rank|
  num = convert_num(rank.split(".").first.to_i)
  season = rank.split(" (Season ").last
  rankings_hash[season].push(num.to_i)
end

rankings_hash.each do |season, scores|
  total_score[season] = scores.inject(:+)
  average_score[season] = scores.inject{ |sum, el| sum + el }.to_f / scores.size
end

puts sort_hash(total_score)
puts sort_hash(average_score)

The Good Parts

I recently finished reading “Javascript: The Good Parts” in an effort to expand my understanding of Javascript, the language I’ve used on a daily basis since starting as a UI developer at MediaMath.

I was a little let down by the experience. Much of it was familiar to me - as I’ve intuited a great deal from writing Javascript for the better part of the last 6 months. The “good parts” to which the author refers are mostly a set of flexible capabilities that allow for a degree of customization on the part of the user. The author highlights two such categories, namely, assigning Functions to objects and how the language handles inheritance.

I was hoping for more insight into the inner workings of the language. The title is a misnomer; perhaps the book should be titled “Javascript: Overcoming the Bad Parts.” The author demonstrates a series of patterns he has implemented to make assigning functions and creating class-like objects a bit easier. Perhaps I was expecting something different.

Instead, I thought I would highlight my new understanding of one of the fundamental aspects of Javascript I’d heard about but scarcely understood. The curriculum at Flatiron School was largely based around Ruby, its ecosystem, and the Rails MVC framework. Javascript, at least while I was there, was just used as a way to manipulate the DOM and make our sites interactive.

Functions: First or Second Class Citizen?

I was able to grasp pretty quickly what people meant by “objects being first class citizens” in Ruby. While a “Class” is technically a special type of object whose class is “Class,” this is a bit reductive. Most Ruby programming, especially in a Rails environment, is organized around classes. Every instance of a class is an object, and they all have the same abilities. All objects inherit directly from their class; their only differences are the literal values stored in instance variables. We can pass objects around and store them in variables - anything we handle has a return value of some kind (which is itself an object). They have to be literal “things.” A method is not a literal thing - it is a set of instructions that, when executed, is turned into a thing.

Javascript is different. We are told that functions, not objects, are first class citizens in Javascript. I never quite understood what that meant.

As Crockford puts it:

The lineage of an object is irrelevant. What matters about an object is what it can do, not what it is descended from.

You can make class-like objects in Javascript that have custom functions, constants, and defaults. Then you can make other objects inherit from that object - whether prototypical or pseudoclassical. It’s not really important. But something really caught my eye in the chapter about arrays.

Because an array is really an object, we can add methods directly to an individual array.

Mind === blown. In Ruby, the usual way to add an Array method would be to extend it to the entire class from which the object descended. Then, ALL arrays in the program would have access to that method. In Javascript, once an object is created, it doesn’t really communicate back up to the mothership. We can assign a function to a single array and it will only affect that specific object.

Examples

So, for example, if you had an array, var myArray = [1,2,3];, you could define a function on that array:

myArray.myLength = function() {
  alert(this.length)
}

This helped me get to an understanding of the difference between Ruby and Javascript in their treatment of objects and functions. Take an object in Ruby, say, person = Person.new. To set a property, or instance variable, we’d need an instance method that allows us to set that instance variable:

class Person
  def instance_variable=(arg)
    @instance_variable = arg
  end
end

That variable could only be another object - a standard data type like a String, Array, Float, or an instance of another class. But it would have to be an object.

In Javascript, not only do we not need permission from the class to set any property on our objects (no more of those nasty NoMethodErrors!), but the value can be either a literal data type or a function itself. That’s why we’re able to make a custom function and assign it to an array. There’s no magic happening with the Array prototype; we’re simply setting a property on the array, like length, only our array doesn’t care whether the property is equal to a string, a number, an object, another array, or even a function!

Here’s the fun part. We can actually see the evidence! In a browser console, when we write console.dir(myArray), we get a very interesting output.

0: 1
1: 2
2: 3
length: 3
myLength: ()
__proto__: Array[0]

The array itself has 3 numbered properties - also known as the indices of the array (myArray[1], for example), a property called length, and a property called myLength. That’s the function we made. Underneath that is __proto__: Array[0]. I won’t list all the details, but expanding that list will show us every function and property our array inherited from the Array prototype object that birthed it. myLength() is stored as a property on the object. The object still has access to all the functions it got from its parent object, but that list is untouched. This is proof that you can not only store functions in variables, but also set them as property values in objects. That’s the guts of how functions get “first class citizen” status in the Javascript universe.

Of course, we could have always defined myLength as a function on Array.prototype. That would have extended the function to all other arrays. Another one of Javascript’s “good parts” is that it allows for both styles of inheritance.

Callbacks! A Good Part!

Similarly, where Ruby only allows objects to be passed into methods as arguments, this feature of Javascript means we can pass functions in as arguments. Because these functions don’t have to be invoked, or evaluated, until they are called, we can pass them in as callbacks. They don’t even have to be named: we can declare an anonymous function right there in the invocation of another function.

I’ve been using this stuff without even realizing it. Take this example from one of our Backbone views:

orgId: -1,
defaultEntity: 'advertisers',
segments: [],

stripId: function (str) {
  if (str) {
    var id = str.match(/[(][0-9]+[)]/);
    return $.trim(str.replace(id, ''));
  }

  return null;
},

In like 10 lines of code, we have examples of 4 properties, each a different thing: an integer, a string, an array, and a function. Javascript doesn’t care. This gives you such incredible flexibility. That, to me, would constitute a “Good Part!”

Accessing Shadow DOM Elements With Selenium Webdriver

Web components are a fantastic way to speed up front end development and standardize design aspects across a large team. But they pose a huge problem when it comes to writing tests.

Shadow DOM elements are not accessible to selenium-webdriver’s traditional means of searching the window for HTML elements. Using javascript and selenium’s executeScript() function, you can find the element or value that is hidden by a shadow DOM and return its value or WebElement.

The easiest nut to crack (cross your fingers) is when a site has already incorporated jQuery. This will allow you to use a $ selector to access elements within the shadow DOM. This is one of those programming problems where I spent hours working out the solution and the answer was a single line of code. Assuming you’ve named your selenium-webdriver instance driver, you can simply run the following:

driver.executeScript("return $('body /deep/ <#yourSelector>')")

What this does is have selenium execute literal javascript in your browser instance. By putting return at the front of the javascript, it will send a value back to your test file.

The command above will return a web element. In the case of the test I was writing, I wanted to check to see if some text matched my expectation. (Author’s note: test was written in Javascript using the mocha framework and Chai ‘should’ assertion syntax).

driver.executeScript("return $('body /deep/ ._mm_column')[0].textContent").then(function(title){
  title.should.contain(segmentName);
});

That’s it. Now I can access the shadow DOM.

Checksum Gives Me Indigestion

Today I encountered a problem I hadn’t thought much about before: how can I tell if the contents of two files are the same? Directly comparing two files is pretty simple: read the contents of each and check whether the resulting objects are equal. Given 3 files, we can do the same thing pairwise.

Let’s make 3 files. test1.txt and test2.txt will contain the string “THIS IS SOME TEXT.” test3.txt will contain “THIS IS SOME OTHER TEXT.”

one = File.open("test1.txt", "r").read
two = File.open("test2.txt", "r").read
three = File.open("test3.txt", "r").read

puts one == two
puts one == three

What do we expect this program to output? We expect line 5 to evaluate to true and line 6 to evaluate to false, which they do.

This is a good solution, but it is not scalable. What if, instead of comparing only three files, we wanted to compare much larger files across hundreds of thousands of files? That would be a nightmare. So we need to find a more efficient way to do this.

Why would we need to do this you ask? Well, if we’re maintaining a file server or database we’d need a quick way to eliminate duplicate files to keep the server lean and prevent confusion later down the line. Another common application for needing to check file equivalency is for checking your data’s integrity during transmission or storage.

How is this done? By creating something called a checksum. Wireshark provides this summary:

A checksum is basically a calculated summary of such a data portion. Network data transmissions often produce errors, such as toggled, missing or duplicated bits. As a result, the data received might not be identical to the data transmitted, which is obviously a bad thing. Because of these transmission errors, network protocols very often use checksums to detect such errors. The transmitter will calculate a checksum of the data and transmits the data together with the checksum. The receiver will calculate the checksum of the received data with the same algorithm as the transmitter. If the received and calculated checksums don’t match a transmission error has occurred.

In other words, data transmitted over a network is being spell checked as it is copied.
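In Ruby terms (a toy sketch using the standard library’s MD5 digest), the sender fingerprints the data and the receiver recomputes the fingerprint on whatever arrived:

```ruby
require 'digest/md5'

sent     = "THIS IS SOME TEXT."
checksum = Digest::MD5.hexdigest(sent)  # travels alongside the data

received = "THIS IS SOME TEXT."         # arrived intact
garbled  = "THIS IS SOME TEXL."         # one toggled byte in transit

Digest::MD5.hexdigest(received) == checksum  # => true
Digest::MD5.hexdigest(garbled)  == checksum  # => false
```

If the recomputed checksum doesn’t match the one that was sent, a transmission error has occurred.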

Checksums and Ruby Data Structures

What’s another reason for this? Storing a bunch of files in memory gets expensive very quickly. If files all have different names, then the only way to search for duplicate values is by reading the contents of a file and then comparing it to all the values stored in memory, like we did in the first example. What if instead we just stored a checksum, a smaller digital fingerprint of the file’s contents? Then we have any number of ways to store, search, or compare our data.

Ruby’s core doesn’t include hashing algorithms, but fortunately the Digest module and the MD5 hashing algorithm are part of the standard library, so all we have to do is require them.

require 'digest/md5'
one = File.open("test1.txt", "r")
two = File.open("test2.txt", "r")
three = File.open("test3.txt", "r")

def checksum(*files)

  hash = Hash.new { |h, k| h[k] = [] }
  files.each do |file|
    # for each file, read the contents
    # and store a checksum as a key in the hash
    md5 = Digest::MD5.new
    md5 << file.read
    hash[md5.hexdigest] << file
  end
  hash
end

checksum(one, two, three)

Running this program results in this: => {"81e3a7e854d334e82f75a2bcdbe6a3da"=>[#<File:test1.txt>, #<File:test2.txt>], "32b2eccab2dcc035c50820d0943e5b94"=>[#<File:test3.txt>]}

So even though these were three different files, our checksum algorithm was able to determine that the first two files have equivalent values. What’s the application for this?

Searching through a hash for a key is fast - much faster than iterating through an array. So if we wanted to find duplicate files, instead of using the file name (an intuitive choice) for the key, we could store this checksum value as the key. In a sense, the checksum is both the key AND the value. With its place reserved in memory, all we’d have to do is check to see if the new file’s checksum exists as a key in our hash.

Consider this program. We initialize a Checker class with two files, test1.txt and test3.txt. Then we run our unique? function on test2.txt. Remember, files 1 and 2 have the same contents. We now have very small fingerprints of 1 and 3 stored in memory, and instead of reading their entire contents to check them against our new file, we simply create a fingerprint for the new file and compare it to our current set of fingerprints.

class Checker
  require 'pry'
  require 'digest/md5'

  attr_accessor :my_hash, :files

  def initialize(*files)
    @files = files
    checksum
  end

  def checksum

    @my_hash = {}
    files.each do |file|
      # for files we want to store
      # create a checksum, create a key value pair
      # :checksum => file
      md5 = Digest::MD5.new
      md5 << file.read
      @my_hash[md5.hexdigest] = file
    end
    @my_hash
  end

  def unique?(file)
    # to check if a file is unique compared to the 
    # rest of the system
    md5 = Digest::MD5.new
    md5 << file.read
    # will return true if the file's checksum is unique
    # else, => false
    !my_hash.has_key?(md5.hexdigest)
  end

end

#load our files into memory
one = File.open("test1.txt", "r")
two = File.open("test2.txt", "r")
three = File.open("test3.txt", "r")

#create a new checker instance
check = Checker.new(one, three)
# check if new file is unique
puts check.unique?(two)

Since the checksum value already exists in the hash, the check.unique?(two) returns false.

More on MD5

Life or Death? The Emperor’s Proposition

Time for another probability brain teaser. This one comes to us from Braingle. Here’s a quick rundown of the problem:

You are a prisoner sentenced to death. The Emperor offers you a chance to live by playing a simple game. He gives you 50 black marbles, 50 white marbles and 2 empty bowls. He then says, “Divide these 100 marbles into these 2 bowls. You can divide them any way you like as long as you use all the marbles. Then I will blindfold you and mix the bowls around. You then can choose one bowl and remove ONE marble. If the marble is WHITE you will live, but if the marble is BLACK… you will die.”

How do I want to solve this? Unless I can figure out the answer with my own intuition, we’re going to have to brute force this thing. So, the most obvious thing to do would be set up a simulation where we:

• Figure out all possible marble arrangement scenarios
• Run each scenario a lot of times and check how many times we survive
• Sort the results by % of times we survive

The fun part here is building a digital model of the problem. We have a couple of pieces to account for: two “pouches,” 100 “marbles,” and a random selector. For the pouches, we’ll use an array of two arrays and fill it with “marble” elements, then use the Ruby method sample to randomly choose which “pouch” to pick from, and sample again to pick an element from the sub-array (the pouch). In order to set this experiment up to run all at once, we’ll collect all the pouches in a hash. This way, we can store both the pouches and some descriptive info so we can discern the results later.

class Emperor

  attr_accessor :marble_count, :pouches

  def initialize(marble_count)
    @marble_count = marble_count
    @pouches = {}
  end
end

The tricky part from a Ruby perspective is going to be populating the arrays with every possible marble arrangement. There are no limitations on how you can arrange the marbles. We don’t need to set up both pouches, just figure out what one pouch looks like, and make the other pouch the inverse.

This means there can be anywhere between 0-50 white marbles in one pouch, accompanied by anywhere from 0-50 black marbles. So 51 possible white marble arrangements, each with 51 possible black marble arrangements. 51 * 51 means there are 2601 possible combinations.
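That count is easy to sanity-check in Ruby by enumerating every (white, black) pair for the first pouch:

```ruby
# 0..50 possible white counts crossed with 0..50 possible black
# counts for pouch one; pouch two is always just the leftovers.
marble_count = 100
splits = (0..marble_count / 2).to_a.product((0..marble_count / 2).to_a)
splits.size
# => 2601
```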

  def populate
    @combinations = (marble_count / 2) + 1
    combinations.times do |w|
      combinations.times do |b|
        name = "#{w} white, #{b} black"
        pouches[name] = [ [], [] ]

        w.times do
          pouches[name][0] << true
        end

        b.times do
          pouches[name][0] << false
        end

        (marble_count / 2 - w).times do
          pouches[name][1] << true
        end

        (marble_count / 2 - b).times do
          pouches[name][1] << false
        end
      end
    end
  end

So now, we have a hash where the key describes the contents of the first pouch (we’ll just infer the contents of the second pouch): something like “20 white, 30 black.” The value is then an array with two arrays representing the pouches. But instead of “white” and “black”, we’ll use the values true and false to represent white and black since it will be pretty easy to translate the boolean into passing or failing the Emperor’s test.

Now that the pouches are filled, it’s time to run some tests. For each set of pouches, let’s say we’re going to run the test 10,000 times and store the results in a results = {} hash. In short, the emperor will randomly choose a pouch, and then randomly choose an element from that pouch. If it comes back true, add 1 point to the current marble configuration. For argument’s sake, we’ll say that if you choose a pouch with zero marbles in it, that will count as a death sentence!

  def choose
    pouches.each do |descriptor, pouches|
      10000.times do
        results[descriptor] ||= 0
        results[descriptor] += 1 if pouches.sample.sample
      end
    end
  end

Now all we have to do is sort the results hash and print the winner.

  def sort
    results.max_by{|k,v|v}
  end

  def print_winner
    winner = sort
    puts "#{winner[0]} will win #{(winner[1]/10000.0)*100}% of the time"
  end

We can delegate the entire operation to a single call method.

  def call
    populate
    choose
    print_winner
  end

So running the following Emperor.new(100).call gives us the following result: 1 white, 0 black will win 74.96% of the time

Interesting! The best scenario is putting one white marble in the first pouch, and the remaining 49 white and all 50 of the black marbles in the second pouch.

Why is this the case? An even split of marbles will produce odds of 50%. But if we limit the range of possibilities for one of the pouches, we can guarantee that every time the emperor picks that pouch we get the desired outcome, while picking the second pouch is almost a 50/50 shot. So we get close to a 75% chance of living. That’s really the best odds this poor sap can hope for.
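We can check the simulation against the exact arithmetic: half the time the Emperor draws from the one-white-marble pouch (guaranteed survival), and half the time from the other pouch, which holds 49 white and 50 black marbles:

```ruby
# P(live) = P(pouch 1) * 1 + P(pouch 2) * (49 white / 99 marbles)
p_live = 0.5 * 1 + 0.5 * (49.0 / 99)
(p_live * 100).round(2)
# => 74.75
```

That’s right in line with the simulated 74.96%; sampling noise accounts for the difference.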

Complete code

class Emperor

  attr_accessor :marble_count, :pouches, :combinations, :results, :sorted_results

  def initialize(marble_count)
    @marble_count = marble_count
    @pouches = {}
    @results = {}
  end

  def populate
    @combinations = (marble_count / 2) + 1
    combinations.times do |w|
      combinations.times do |b|
        name = "#{w} white, #{b} black"
        pouches[name] = [ [], [] ]

        w.times do
          pouches[name][0] << true
        end

        b.times do
          pouches[name][0] << false
        end

        (marble_count/2 - w).times do
          pouches[name][1] << true
        end

        (marble_count/2 - b).times do
          pouches[name][1] << false
        end
      end
    end
  end

  def choose
    # "pair" is the two-pouch array; naming it "pouches" would shadow
    # the pouches accessor we're iterating over.
    pouches.each do |descriptor, pair|
      10000.times do
        results[descriptor] ||= 0
        results[descriptor] += 1 if pair.sample.sample
      end
    end
  end

  def sort
    results.max_by { |k, v| v }
  end

  def print_winner
    winner = sort
    puts "#{winner[0]} will win #{(winner[1]/10000.0)*100}% of the time"
  end

  def call
    populate
    choose
    print_winner
  end

end

Emperor.new(100).call

So Long, Farewell, Auf Wiedersehen, Goodbye

Our time at Flatiron is coming to an end. While we still have 2 weeks left, I thought it would be a good time to reflect on what has transpired over the last 10 weeks. In the run-up to starting school, I constantly searched for blog posts by Flatiron students about the experience and read whatever I could find. What I didn't find, however, were many posts that offered advice on how best to take advantage of the course. So I thought it would be useful to give some advice to the me of 10 weeks ago.


Caption: Me Ten Weeks Ago

#1. Labs exist for you, not the other way around

Trying to complete every single lab as it is deployed looks like Lucy at the conveyor belt.

While it is not impossible to finish every lab, that shouldn’t be your primary goal. At Flatiron, there are no grades and the points don’t matter. Really understanding a few labs will be far more valuable than just getting the tests to pass on a lot. The labs will always be there for you - this 12 week period where your entire job is to learn new things won’t be.

#2. Build side projects

There’s a reason the end of the semester culminates in project mode. Building things from scratch takes the training wheels off and takes you out of your comfort zone of programming to someone else’s test. In nearly all of my group projects, we at some point encountered a problem that I had seen before - not in a lab, but in a side project. If you can find the time, this will be invaluable.

#3. Start playing with API’s early

Don't be scared of APIs! There's a good chance you will encounter an API towards the end of the semester, either in a lab or in one of your group projects. A few suggestions of really awesome, free APIs with good documentation: Google Maps, Foursquare, Spotify, Wunderground, and NYC Open Data, just to name a few. When you get to Hashketball and Green Grocer, don't get mad, get even. Those labs will help make working with APIs super, super simple.

#4. Don’t just make the tests pass, read the tests.

Testing is crazy useful and I wish I had paid more attention to it during the semester. When working on a project I'm excited about, I sometimes code like an ambulance driver racing to a crash scene. Writing tests might seem boring, but there will come a time - likely during project mode or on a side project - when you will find yourself debugging a feature or a model and repeating yourself over and over again. At times I thought, "I won't need to learn testing, I'll just program it right the first time!" How arrogant I could (a.k.a. can) be!

dinners

While making a potluck dinner organizer app, every time we wanted to test a feature, I had to create a new event by manually clicking around the site to see what happened. Look how many I created! How much easier would it have been to write a test that could automate this process? Sometimes it felt like we were testing how fast I could create a new dinner!

Eventually your projects will just become too large to manually test the entire flow of a use case. Practice writing RSpec- and Capybara-powered tests and let the computer do the work for you. That's, after all, why we're here.

#5 Go on, git!

As labs get bigger, get in the habit of using version control, branching, and merging your branches into master. Don't code on your master branch! You can get away with it when working on your labs, but when you get to project mode you will want git to be second nature. Not only will it be there to save your ass when you delete something you shouldn't have, but it's virtually impossible to collaborate on a big project without using git and GitHub.

#6 Remember why you came here

I remember being a liberal-arts history major as an undergrad and EVERYTHING WAS SO IMPORTANT! Ideas, concepts, beliefs, principles, everything was so GD important. At Flatiron, that experience has been similar in its own unique way. It is so easy to be overwhelmed by the amount there is to learn, the fear of having to get a job or learn a new language, stress at home from being so busy, getting pissed over group and table assignments, resenting your instructors for deploying labs in the wrong order or changing the schedule without telling you. I could go on…

But when you're in the trenches it can be easy to focus all your energy on the one thing that's in front of you at the moment, regardless of how insignificant it might be in the long run. You're a passionate person, with beliefs, goddamnit! That's why they accepted you in the first place. But in the throes of it all, it can be easy to lose sight of why you came to Flatiron in the first place. If you're like me, coding is not some interest you decided to pursue on a lark. It is a lifeline, a guardian angel sent from heaven, a search and rescue mission to pull you from the rubble of your past. In short, a key to a better life. Remember what they tell you at the beginning of the semester. While at Flatiron you will learn to write code and program in Ruby, but that's not really why you're there. You are there because you want to change your life.

Get Informed About Form_for Formed Forms

Over the weekend, in building a super-fun, and very simple one page app called Am I Ruby? with Kate, Rachel, and Sophie, we stumbled upon a most curious feature of Rails.

Embracing object-oriented design to the best of our abilities, we attempted to implement the so-called "fat model, skinny controller" principle. Simply put, most of the application-specific logic should live in a model, and the controller should serve only as a traffic cop routing information to and from views.

Seeing as this was a simple one-page app with no immediate need for persistence, our Search model was tableless. In other words, it was a plain old Ruby class.
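A tableless model is just a plain old Ruby object. Here is a minimal sketch of roughly what ours looked like (the keyword list is a made-up stand-in, not the real Am I Ruby? logic):

```ruby
class Search
  attr_accessor :keyword

  # Hypothetical stand-in: the real app checked search results,
  # not a hard-coded word list.
  RUBYISH = %w[block proc lambda gem rake].freeze

  def am_i_ruby(keyword)
    RUBYISH.include?(keyword.to_s.downcase)
  end
end

Search.new.am_i_ruby("gem")   # => true
Search.new.am_i_ruby("java")  # => false
```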

After much of our application was built, we set up our controller:

class SearchesController < ApplicationController

  def index
    @search = Search.new
  end

  def show
  end

  def create
    @result = Search.new.am_i_ruby(search_params)
    respond_to do |f|
      f.html
      f.js
    end
  end

  private

  def search_params
    params[:search][:keyword]
  end

end

The controller would define a new Search model in order to serve up an object to the form_for @search in our main page’s view.

<%= form_for(@search, remote: true) do |f| %>
  <%= f.text_field :keyword, id: 'keyword' %>
  <%= f.submit "Search" %>
<% end %>

We were feeling pretty good…until we saw this: undefined method 'model_name' for #<Search:0x007fbc18f959d0>. OUCH!! Especially since we never even wrote a method called model_name…what gives?

A quick glance at the trace shows us this:

ActionPack, ActionView, ActiveSupport…a lot of things I kind of recognize but am not super familiar with. About halfway down this huge trace I finally see one I do know for sure: ActiveRecord. Since our Search class was tableless, we never told it to inherit from ActiveRecord. So at some point, form_for tried to call model_name on a class that doesn't have access to that method. Let's assume that has something to do with ActiveRecord.

Model Name

As it turns out, Rails form helpers rely heavily on ActiveModel and ActiveRecord. It's not magic: at some point Rails has to introspect on the object to know which controller and action to send the submitted data to, and how to format the params hash. That's really it.

model_name is part of the ActiveModel module that helps Rails with its naming conventions, magically pluralizing and singularizing our model names at will. How do we fix this? We have to give our class access to this method. We search for it and find where it lives in the Rails source code.

class Search
  extend ActiveModel::Naming
end
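What does Naming actually buy us? Roughly, the ability to derive route and param names from the class name. Here's a simplified pure-Ruby sketch of that kind of introspection (an illustration only, not the real Rails implementation):

```ruby
class Search; end

# form_for needs to turn the class into naming conventions like these:
singular = Search.name.downcase                                   # "search"
plural   = singular.end_with?("h") ? singular + "es" : singular + "s"
route    = "/#{plural}"     # POSTs will go to "/searches"
param    = singular.to_sym  # submitted fields come back under :search
```

The real ActiveModel::Naming handles far more (namespaces, irregular plurals, i18n keys), but this is the shape of the problem it solves.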

Refresh the page and we get a new error: undefined method 'to_key' for #<Search:0x007fbc1849fcb0>. to_key is also an ActiveModel method, but it lives in the Conversion module, not Naming. It returns an array of the object's key attributes (or nil if there are none). We're going to assume form_for uses those to identify the object when building the form.

class Search
  extend ActiveModel::Naming
  include ActiveModel::Conversion
end

Refresh again, and we get another error: undefined method 'persisted?'. We know the drill by now: persisted? returns a boolean telling Rails whether the object has been persisted to the database - that's how form_for decides between creating and updating. At this point we have a couple of options.

We can define persisted? and basically be done. As long as the Search class has a defined attribute for every field in the form, the page will load error free.

class Search

  extend ActiveModel::Naming
  include ActiveModel::Conversion

  def persisted?
    false
  end

end

That’s not too bad. But I have an inquiring mind and want to keep pushing this further.

We can keep going down the rabbit hole of including/extending modules based on the error messages we find.

class Search

  extend ActiveModel::Naming
  include ActiveModel::Conversion
  include ActiveRecord::Persistence
  include ActiveRecord::Core
  extend ActiveRecord::ModelSchema::ClassMethods

end

But since I've now added 5 modules, 3 of which are part of ActiveRecord, and it's well after midnight with no end in sight, I'm starting to get really tired of this. I have a feeling that all roads will eventually lead to ActiveRecord::Base.

When we look at the ActiveRecord source code, we find that require 'active_model' is literally the third line. In fact, Base has almost no code of its own - it just requires, includes, and extends nearly all of the other Active* modules in the Rails source code. So why not just make the class inherit from ActiveRecord::Base and cover all our (ActiveRecord) bases?

class Search < ActiveRecord::Base
end

So even though our model is not going to talk to a database, we can still give it all the functionality of ‘tabled’ class that allows form_for to do its thing. But, if you’re afraid this might confuse another developer (or future you), this is all you need to add to your class to allow it to interact with form_for:

class Search

  extend ActiveModel::Naming
  include ActiveModel::Conversion

  def persisted?
    false
  end

end

The SQL Was Better Than the Original

One of the greatest benefits of developing applications with a framework like Ruby on Rails is the flexibility of being able to access data from our databases without having to write ugly SQL statements. While SQL is a very powerful and efficient language for retrieving data, it is, at least to this author, nowhere near as elegant or intuitive as Ruby.

As part of the Rails framework, we have access to ActiveRecord, a Ruby DSL (domain-specific language) that allows us to communicate with the database with ease. ActiveRecord builds out custom methods that erase the blood-brain barrier between the database and our Ruby model classes. It allows our models to treat their relevant data as though it were an attribute of each instance of a particular class.

My first instinct, as someone who has recently fallen in love with Ruby, is to do as much with that language as possible. The problem is that it is nowhere near as efficient as using a simple SQL statement. In our labs and early projects, that isn't really an issue, as the size of our databases and programs doesn't require the efficiency of a larger-scale application. But as students, we need to train ourselves to develop good coding habits! That is where ActiveRecord comes into play.

Experiment

For science!

To begin our test, I created a simple domain model that many of my Flatiron classmates will be very familiar with: Artist and Song. For every step along the way, I seed the database with a number of artists and randomly assign them songs (at a rate of 100 songs per artist).

def make_artists(num)
  num.times do
    Artist.create
  end
end

def make_songs(num)
  num.times do
    # assign each song to a random existing artist
    Song.create(:artist_id => rand(1..Artist.count))
  end
end

For our experiment we’ll be executing a simple query: Which artist has the most songs?

Approach 1: Ruby

We create a primitive data structure, load in our data and sort with basic ruby methods. We create a giant hash where each artist in the database is a key pointing to an array of each of their songs. We then sort by the size of values to find our artist with the most songs.

  def self.most_songs_ruby
    hash = {}
    Artist.all.each do |artist|
      hash[artist] = artist.songs
    end
    hash.max_by { |k, v| v.size }
  end

Approach 2: ActiveRecord and SQL

  def self.most_songs_sql
    joins(:songs).
    group(:artist_id).
    order("COUNT(*) DESC").
    limit(1).first
  end

Using ActiveRecord, we join up the artists and songs table, group our results by artist_id, and order our results by the count of songs each artist has. ActiveRecord then translates this into a SQL statement that queries our database.

SELECT artists.* FROM artists
INNER JOIN songs ON songs.artist_id = artists.id
GROUP BY songs.artist_id
ORDER BY COUNT(*) DESC
LIMIT 1

After each round of seeding, we run puts Benchmark.measure {Artist.most_songs_sql} and puts Benchmark.measure {Artist.most_songs_ruby} from a rails console to determine the speed of each querying method.
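If you haven't used it before, Benchmark ships with Ruby's standard library; measure returns a Benchmark::Tms object whose real attribute is wall-clock seconds. A trivial stand-in block here, since the actual queries need a seeded Rails database:

```ruby
require 'benchmark'

# Time an arbitrary block of work:
time = Benchmark.measure { 100_000.times { |i| i * i } }

puts time.real  # wall-clock seconds for the block
```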

| Artists/Songs | Ruby (seconds) | SQL (seconds) | Mult. |
|---------------|----------------|---------------|-------|
| 1/100         | .117           | .019          | 6.2   |
| 5/500         | .088           | .012          | 7.4   |
| 10/1000       | .128           | .012          | 10.93 |
| 50/5000       | .183           | .014          | 13.44 |
| 100/10000     | .300           | .015          | 19.84 |
| 500/50000     | 4.396          | .131          | 33.43 |
| 1000/100000   | 15.830         | .282          | 56.12 |

Results

Even with a small database, we immediately find SQL querying to be vastly faster than querying with Ruby. What slows Ruby down is needing to load every row from our database into a Ruby object. The SQL query, on the other hand, sorts a pre-existing data structure and only needs to return one object into memory: the artist with the most songs.

So even though finding the artist with the most songs when there are 10 artists in the database only takes 1/10 of a second with our Ruby method, ActiveRecord executes that query 10x faster.

More importantly, we can see how quickly using Ruby to sort through data becomes inefficient. As our database grows, Ruby's sort times scale up with it; SQL sort times just barely inch along. As a result, we find that as the dataset grows, SQL becomes even more efficient. By the time there are 1,000 artists and 100,000 songs in our database, SQL is approximately 56x faster. Our Ruby method takes almost 16 seconds (a.k.a. you're fired).

As an aside, I found that because of how a Ruby interpreter stores objects in memory, repeating the method on the same dataset improved run times. However, we see similar improvements in SQL run times. And of course, this assumes a static dataset which we wouldn’t find in the real world.

Conclusion

If you just need to pull one object out of the database (e.g. Artist.find_by(:name => "Lady Gaga").songs), Ruby is fine. But once you need to sort, calculate, count, or do anything else, go SQL or go home.

Buyer Beware?

Since starting Flatiron School, I've seen an increasing number of articles about coding schools and bootcamps. Most of the coverage is pretty positive - generally people support the idea that a) learning to code is a valuable skill and b) education should, as a general rule, prepare people to enter the workforce.

But one article in the Washington Post that was recently sent my way bothered me a great deal. While it talks about some of the success stories from my coding school, the author isn’t all positive.

The bigger challenge, I fear, is how well these kinds of programs will scale. Unlike most training programs, Flatiron is extraordinarily selective. Its admissions rate of 6 percent rivals Harvard’s. All admits must go through interviews with both co-founders and jump through other hoops such as coding a tic-tac-toe game (even if they have no background in programming). It’s no wonder, then, that employers return again and again to Flatiron for high-quality hires: Flatiron has not only trained these students, but has also pre-screened them to make sure it ends up offering only the most perseverant, passionate, marketable workers around.

I'm encouraged by the success stories. I really did my research to try to get into the right school and hope to have the same success. But when the author says that Flatiron is successful because it is so selective and "The bigger challenge, I fear, is how well these kinds of programs will scale" - that bothers me.

Doesn't everybody know the reason Harvard graduates are so successful is that Harvard is so selective? It's not the education - it's the pre-screened candidates getting to build a lifelong professional network with other pre-screened students. Why is nobody asking that question about a liberal arts college education?

It is literally insane that we not only encourage but de facto require teenagers to virtually mortgage their futures to spend four years doing something that prepares them to enter an economy that existed in 1985 but has changed drastically since.

Part of the backlash, I wonder, is an entrenched workforce - both within tech and in other industries - who might somehow feel as if coding schools are “cheating” and not playing by a set of unspoken rules: pay your dues, do your time, and only then are you allowed to get a job. We sometimes have a tendency to put down another’s route to make ourselves feel better about the fact that we took the scenic route.

So yes - buyer beware of these coding programs, and do your research as I did. But if someone wants to spend 10 or 15 thousand dollars learning to code, people start to freak out; meanwhile, an 18-year-old can study creative writing at a small liberal arts college for 4 years, spend $160,000 or be in debt for years with no immediate job prospects, and no one bats an eye. Maybe policy makers do need to keep an eye on these new schools, but I wish that logic were applied to college too.

If it is incumbent on a professional school to put up high job-placement numbers, why don't we hold college to the same standard? If Flatiron and similar schools are successful, I don't think the end result will necessarily be that everyone skips college for technical training - my hope is that their success will force colleges to take a hard look at what they are doing to prepare students to enter the world.