So You’re Saying There’s a Chance…

Feb 25th, 2015 12:37 pm

I have always been fascinated by probability. Everything that is going to happen can be expressed in terms of probability. Of course, just because something can be expressed that way doesn’t mean we always have the tools to express it accurately. But sometimes we do know, objectively, how likely something is to occur, and we ignore it anyway. That’s because humans are notoriously bad at understanding probabilities and at changing our behavior based on those numbers. Very few people are afraid of driving, but many (including your faithful author) are terrified of flying, even though, for some time now, the most dangerous part of a commercial flight has been the drive to the airport.

I think part of the reason for this is that, for the most part, we’ve been taught to approach probabilistic problems with math. Why is this a problem? Because I suck at math. Also, we descended from apes. They don’t typically consult actuarial tables when determining whether the sound in the bush is a predator.

The Birthday Problem

Here’s a quick question: If you stick a random group of people in a room, what are the odds that 2 of them have the same birthday? Or, to be more specific, how many people do you need for there to be at least a 50% chance that two of them share a birthday? The first time I heard this I thought “well, you’d probably need 182.5 people” (half of 365).

That’s wrong. You only need 23 people for there to be a better than 50% chance that two of them share the same birthday. Do you know how to solve this with math?

[Image: the mathematical formula for the birthday problem]

This is ~~garbage~~ pretty hard to understand.
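For the curious, here is roughly what that formula says, written as a short Ruby sketch instead of notation. It assumes 365 equally likely birthdays and ignores leap years (the same assumption the simulation below makes). The chance that nobody shares a birthday is (365/365) * (364/365) * (363/365) * … with one factor per person, and the chance that at least two people do is 1 minus that product. The method name is just for illustration.

```ruby
# Exact probability that at least two of `people` share a birthday,
# assuming `days` equally likely birthdays (365, ignoring leap years).
def shared_birthday_probability(people, days = 365)
  return 1.0 if people > days # pigeonhole: more people than days guarantees a match

  # Probability that all birthdays are different:
  # (days/days) * ((days - 1)/days) * ... * ((days - people + 1)/days)
  all_distinct = (0...people).reduce(1.0) { |prob, k| prob * (days - k) / days }
  1.0 - all_distinct
end

[5, 10, 20, 23, 30, 40, 50, 60, 70].each do |n|
  puts "#{n} people: #{(shared_birthday_probability(n) * 100).round(1)}%"
end
```

Rounded to one decimal place, these values line up with the expected percentages in the table below, and 23 is the first group size where the probability crosses 50%.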

Here’s why I LOVE programming - especially object-oriented programming. Ruby allows me to think about how I would approach solving the problem, not just algorithmically but in the literal sense - with words. How would we solve this problem? We’d take a bunch of people, give them random birthdays, and then check to see if any of them share the same birthday. We’d do this a lot - say, 50,000 times - and see how often a match turns up. Programming lets us do this in a couple lines of code. Math does not. To try it, make a new instance of the Birthday class and pass in how many people you want to test for and how many simulations you want to run.

class Birthday

  attr_accessor :people, :matches

  # people:      how many people are in the room
  # simulations: how many random rooms to generate
  def initialize(people, simulations)
    @people = people.to_i
    @matches = 0
    @simulations = simulations.to_i
    @simulations.times { call }
    percents
  end

  # Fill one room with random birthdays (0-364, ignoring leap years)
  # and count it as a match if any birthday appears more than once.
  def call
    birthdates = Array.new(@people) { rand(365) }
    @matches += 1 if birthdates.size != birthdates.uniq.size
  end

  def percents
    # Divide as floats first, then round the final percentage.
    percent = (@matches.to_f / @simulations * 100).round(2)
    puts "With #{@people} people in a room, there is a #{percent}% chance"
    puts "that at least 2 people will share a birthday."
    puts "#{@simulations} simulations were run."
    puts ""
  end

end
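For comparison, here is a condensed sketch of the same simulation as a single method that returns the fraction of rooms with a shared birthday instead of printing it. The method name `simulated_match_rate` and the optional `seed:` keyword are hypothetical additions; passing a seed to Ruby’s `Random` only makes a run repeatable.

```ruby
# Monte Carlo estimate: fill `simulations` rooms with `people` random
# birthdays each and return the fraction of rooms that contain a repeat.
def simulated_match_rate(people, simulations, seed: nil)
  rng = seed ? Random.new(seed) : Random.new
  matches = simulations.times.count do
    birthdays = Array.new(people) { rng.rand(365) } # 0..364, no leap years
    birthdays.uniq.size < birthdays.size            # true if any birthday repeats
  end
  matches.to_f / simulations
end

rate = simulated_match_rate(23, 50_000, seed: 42)
puts "23 people: #{(rate * 100).round(2)}% of simulated rooms had a shared birthday"
```

With tens of thousands of rooms the estimate should land within a fraction of a percentage point of the exact 50.7%.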

Here’s what we get when we run our simulation 50,000 times (with the expected percentages from Wikipedia listed for reference):

| n (people) | % expected | % simulated |
|------------|------------|-------------|
| 5  | 2.7%       | 2.62%    |
| 10 | 11.7%      | 11.8%    |
| 20 | 41.1%      | 40.9%    |
| 23 | 50.7%      | 50.9%    |
| 30 | 70.6%      | 70.7%    |
| 40 | 89.1%      | 89.1%    |
| 50 | 97%        | 97.1%    |
| 60 | 99.4%      | 99.4%    |
| 70 | 99.9%      | 99.9%    |

Looks like we were pretty close. That was fun! Let’s do another one. This might cause some controversy.

Let’s Make a Deal

Marilyn vos Savant is aptly named: she’s a genius, with one of the highest IQs on record, if you care about that sort of thing. She even had a syndicated column in Parade magazine where people wrote in with brain teasers. Her most famous column was about the notorious “Monty Hall Problem.” She answered the question correctly, and was promptly lambasted in public by some of the country’s “leading” math and statistics experts. To this day the problem causes a fight whenever it comes up.

Here’s the scenario. You’re on a game show and the host says “There are 3 doors. Behind one of these doors is a car. Behind the other two are goats.”

You pick Door #1. The host, who knows what is behind each door, then opens one of the two remaining doors to reveal a goat. Let’s say he opens Door #3. Then you are given the option to keep your original choice of Door #1 or switch to Door #2.

Here’s the question: is it better to keep your original pick or switch to the other remaining door? Or does it not matter?

Of course, when I first heard this problem I thought “Easy: it doesn’t matter whether you switch or not. It’s a 50/50 coin flip. Two doors, two choices. One is a goat, one is a car.”

It really hurt my brain to find out how wrong this was. Switching is always the better bet. Like 100% better. Twice as good. But I didn’t feel too bad. Math professors were getting it wrong too.

Writing code to solve this problem not only gives you the answer, but helps you understand the problem better.

class MontyHall

  attr_accessor :doors, :user_choice, :total_games, :staying_wins, :switching_wins

  def initialize(times)
    @total_games = 0
    @staying_wins = 0
    @switching_wins = 0
    @times = times
  end

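  # Play the game @times times, recording whether staying or switching would have won.
  def call
    @times.times do
      @doors = ["goat", "goat", "goat"]
      hide_car
      get_user_choice
      @total_games += 1
      if staying_wins?
        @staying_wins += 1
      elsif switching_wins?
        @switching_wins += 1
      end
    end
  end

  # Put the car behind one random door.
  def hide_car
    doors[rand(0..2)] = "car"
  end

  # The contestant picks a random door.
  def get_user_choice
    @user_choice = rand(0..2)
  end

  # Staying wins only if the first pick was the car.
  def staying_wins?
    doors[user_choice] == "car"
  end

  # Switching wins whenever the first pick was a goat: the host has revealed
  # the other goat, so the one remaining door must hide the car.
  def switching_wins?
    doors[user_choice] == "goat"
  end

  def percents
    # Divide as floats first, then round the final percentage.
    staying = (@staying_wins.to_f / @total_games * 100).round(2)
    switching = (@switching_wins.to_f / @total_games * 100).round(2)
    puts "After #{@total_games} games:"
    puts "Staying every time wins #{staying}%"
    puts "Switching every time wins #{switching}%"
  end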

end

This script puts “car” in a slot in an array, gives the user a random pick, and then tests whether keeping the user’s pick or switching results in winning the car. In the end, of course, we find that switching doors wins the car about 66.7% (2/3) of the time, while keeping your original choice wins only about 33.3% (1/3) of the time.

But the code skips a step - it skips the part where the host opens a door and then gives the contestant the choice. Why does it skip that part? BECAUSE IT IS IRRELEVANT TO THE OUTCOME.

The above code is an exercise in futility, as it is simply an obfuscated way of testing how random your computer’s random number generator is. But that’s the point of Monty Hall. It tries to hide how simple the problem is. The question shouldn’t be “is it to your advantage to switch doors?” That is just a fancier way of asking “what are the odds that your original pick was correct?” That is obvious and intuitive: 1 in 3. The fact that the host opens one of the doors DOES NOT CHANGE THIS, because he always opens a door with a goat behind it. Your original pick is still only a 1 in 3 shot of being right. Ergo, the one remaining door will have the car 2/3 of the time.
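The same argument can be checked with no randomness at all. This sketch walks through all nine equally likely combinations of car location and first pick, and, unlike the class above, it does include the host’s move: he opens a goat door the player didn’t pick, and switching means taking the one door that is left. When the first pick is the car, the host has two goats to choose from; which one he opens doesn’t change who wins, so the sketch simply takes the first. The variable names are illustrative.

```ruby
doors = [0, 1, 2]
cases = doors.product(doors) # all 9 [car, first_pick] pairs, each equally likely

results = cases.map do |car, pick|
  opened   = (doors - [pick, car]).first    # host reveals a goat the player didn't pick
  switched = (doors - [pick, opened]).first # the only door neither picked nor opened
  { stay: pick == car, switch: switched == car }
end

stay_wins   = results.count { |r| r[:stay] }
switch_wins = results.count { |r| r[:switch] }

puts "Staying wins #{stay_wins} of #{cases.size} cases"
puts "Switching wins #{switch_wins} of #{cases.size} cases"
```

Staying wins only in the 3 cases where the first pick was already the car; switching wins in the other 6, which is exactly the 1/3 versus 2/3 split.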

This is why programming is fun. Without relying on formal math, we can express these complex problems in metaphorical terms (an array of goats and a car) that are easier to understand regardless of your background in math.

Head over to the NYTimes feature on the Monty Hall Problem to play an interactive version of the game.

Bash: How I Learned to Stop Worrying and Love the Terminal

Feb 15th, 2015 7:34 am

Background

When you first had a computer, how did you use it? Some of that will depend on when you were born. My family bought its first computer (a custom-built IBM-compatible clone!) in 1992, when I was 9. It didn’t have Windows. It ran DOS.

DOS was Microsoft’s command-line operating system. It wasn’t actually UNIX, but like UNIX, you used it by typing commands at a prompt.

DOS

One time, I thought it would be funny to delete the contents of my computer’s autoexec.bat file. This was basically a set of plain text instructions for what the computer should do when you turned it on. Turns out “nothing” is what happens when you delete those contents. Literally. We had to call a tech-savvy neighbor who brought a copy of his file to rewrite ours.

At 9 years old I was opening files, making new directories, and searching, all from the DOS prompt. But then Windows happened, I got into Apple products, and I was spoiled forever. Typing clunky commands seemed archaic, and less precise, compared to pointing at what you want and getting it on demand. I thought UNIX would be lost on me forever.

Enter the Terminal

[Image: Grand Central Terminal]
^This is a terminal. People are so afraid of Terminal they call this thing Grand Central Station instead.

This, it turns out, was incorrect thinking. Windows and OS X are the bouncers of the computing world. They tell you what you can do and how to do it. Terminal, on the other hand, is like the bartender who lets you in after closing time and lets you drink whatever you want.

This is your OS

The biggest benefit is speed.

Most people at Flatiron School know about tab completion at this point. Quite simply, while in a directory, start typing the name of a file or folder, hit Tab, and Terminal will finish the name for you. From your home directory, $ cd Doc plus Tab becomes $ cd Documents/ pretty quickly. If several files or folders start with the same characters, there is a helper for that too: press Tab twice and Terminal will list every match, so $ cd D followed by Tab Tab will show something like Desktop/ Documents/ Downloads/ Dropbox/ and you’ll at least know what you’re working with.

The real fun is the speed and convenience that come from automation. With a few quick key commands, custom Bash scripts, and aliases, you can save yourself major time. Let’s look at a few by opening up our Bash profile.

Building our own scripts

$ cd ~
“What was the name of that file?” (types ls)
“Hmm, must be a hidden file.” (types ls -a)
“OK.” (types subl .bash_profile)
“Man, that was annoying, and now I’ve forgotten what directory I used to be in.”

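We could simply type subl ~/.bash_profile, but even that gets tedious. So let’s add this function to our profile. Now typing subl-bash anywhere in Terminal opens our profile in Sublime. And guess what: tab completion works for functions too. (Changes to .bash_profile only take effect in new Terminal windows, or after you run source ~/.bash_profile.)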

  function subl-bash {
    subl ~/.bash_profile
  }

That was fun. Let’s make another one. What’s something you want to do almost every time you make a new directory? Generally, it’s navigating into that directory.

mkdir my-awesome-directory
cd my-awesome-directory

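I think we can cut that down to one line of code. Like Ruby methods, Bash functions can take arguments, but Bash doesn’t give them names: the () after the function name stays empty, and inside the function we access arguments by position, with $1 for the first, $2 for the second, and so on. So mkcd directory-name will now create a directory and navigate into it in one shot.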

  function mkcd () {
    mkdir -p "$1" &&   # makes the directory (-p also creates any missing parent directories)
      cd "$1"          # navigates to the new directory, but only if mkdir succeeded
  }
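Why a shell function rather than a standalone script? A script runs as a separate process, and a process can only change its own working directory. This hypothetical Ruby counterpart to mkcd shows the point: it creates the directory and moves the Ruby process into it, but if you ran it as a script, your Terminal session would still be wherever it was when the script exited. The method and directory names are illustrative.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical Ruby counterpart to the mkcd shell function.
def mkcd(path)
  FileUtils.mkdir_p(path) # like `mkdir -p`: creates missing parents, fine if it already exists
  Dir.chdir(path)         # changes the working directory of this Ruby process only
end

start_dir = Dir.pwd
landed_in = nil

Dir.mktmpdir do |tmp|
  mkcd(File.join(tmp, "my-awesome-directory"))
  landed_in = Dir.pwd
  puts "Ruby is now in #{landed_in}"
  Dir.chdir(start_dir) # step back out before the temporary directory is deleted
end
```

The shell version works because a function defined in .bash_profile runs inside your current shell, so its cd changes the directory you are actually sitting in.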

What’s another thing we do as Flatiron students literally every day? We clone a git repository, try to remember the name of the repo, and then navigate into that directory. With this script, which uses the basename command [1], you can clone and navigate with a single command. Use it by typing gcl and pasting the address of the GitHub repository you want to clone.

#alias gcl="git clone"  <-- if you have a gcl alias already, make sure to comment it out.

function gcl () {
  git clone "$1" &&               # clone the repo; stop here if the clone fails
    cd "$(basename "$1" .git)"    # $( ... ) is replaced with the output of the command inside it.
                                  # basename chops everything up to the last "/" off the address,
                                  # and the second argument, .git, removes that suffix too.
}
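If the basename step feels like magic, Ruby’s `File.basename` takes the same two arguments (a path and a suffix to strip), which makes it easy to see what directory name gcl will cd into. The repository addresses below are made-up examples.

```ruby
# File.basename(path, suffix) keeps only the part after the last "/"
# and drops the suffix if present, just like `basename "$1" .git` in Bash.
urls = [
  "https://github.com/example-user/example-repo.git",
  "git@github.com:example-user/example-repo.git",
  "https://github.com/example-user/example-repo"
]

urls.each do |url|
  puts "#{url} -> #{File.basename(url, '.git')}"
end
```

All three forms reduce to example-repo, which is the folder git clone creates for them.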

This one’s my favorite so far. First, install chrome-cli (available through Homebrew: brew install chrome-cli). This gives you command line access to Chrome.

Then, add this function to your .bash_profile. Enter “labs” into Terminal and in one shot we navigate to our labs directory, close whatever nonsense we have open in Chrome, and go to the Ironboard login page.

  function labs {
    cd ~/dev/web-007/labs-ruby;  # this should be the path to wherever your labs folder is
    chrome-cli close -w;         # closes the active Chrome window, along with whatever time-wasting tabs are in it
    chrome-cli open http://learn.flatironschool.com/users/auth/github;  # opens the Ironboard login page
    clear;                       # clears the Terminal window
  }

This next one navigates to my blog’s directory, opens the posts folder in Sublime, opens Chrome to my local preview, and generates a local preview of the blog. Your function might have to change a bit if you are using a different blogging platform. Good thing Octopress lets us use the command line!

  function blog {
    cd ~/dev/blog/jeremysklarsky.github.io/;                  # navigates to my blog's directory
    subl ~/dev/blog/jeremysklarsky.github.io/source/_posts;   # opens up the posts folder in Sublime
    chrome-cli open localhost:4000;                           # opens up the live preview page in Chrome
    rake preview;                                             # generates the live preview (give Chrome a second to refresh)
  }

And for good measure, let’s add some custom aliases.

  alias web='cd ~/dev/web-007'
  alias dev='cd ~/dev'
  alias ruby-labs='cd ~/dev/web-007/labs-ruby'

Now we have one-word shortcuts to our most commonly used directories. Think how much time we can save over the course of the semester with just a few of these shortcuts. The Terminal is our oyster.

Bruce

References:
1. http://unix.stackexchange.com/questions/44735/how-to-get-only-filename-using-sed

Yelp! I Need Some Data, Not Just Any Data…

Feb 14th, 2015 12:31 pm

Author’s note: This is my first ever post!

Background

My first two weeks at Flatiron School have been an absolute whirlwind. I can feel us slowly approaching a time when we can actually deploy our skills and do something useful.

Diving into Ruby headfirst has been a joy, though so far it has remained largely in the realm of the theoretical and hypothetical. I’m not one to scoff at intellectual exercises for their own sake, but the painstaking task of iterating over made-up hashes was starting to test my powers of concentration and emotional constitution.

Some of the most powerful lectures have been data scraping and parsing demonstrations.

In a couple of quick clicks our lead ~~guru~~ instructor managed to pull the prices of every apartment listed on Craigslist and find the average. Looks like we haven’t been iterating for nothing. BOOM!

[Image: Kramer, shocked]
^Everyone in class.

Writing a Yelp-er Method

That looked like fun and I wanted a turn. For some years I’ve been fairly obsessed with data collection and manipulation, but my participation in the sport was limited by, obviously, my data collection and manipulation abilities. But no longer! In my first two weeks as a developer, I’ve learned to adopt the idea that I am not limited to what my computer tells me I’m allowed to do. It is a tool to do my bidding!

For my test case, I thought scraping Yelp would be a good place to start. After a few clicks, I realized the easiest way to access Yelp’s API would not be parsing a long JSON string like we had in the Spotify lab. Luckily for you, dear readers, Yelp’s API documentation led us to a very helpful Ruby gem that turns search results directly into Ruby objects with tons of data pertaining to your search.

Let’s get started:
1. In Terminal, run gem install yelp.
2. Add require 'yelp' to your Ruby program, or add gem 'yelp' to your Gemfile.
3. Make a million dollars.

require 'yelp'

# Replace these placeholder strings with your own Yelp API credentials.
client = Yelp::Client.new({ consumer_key: 'YOUR_CONSUMER_KEY',
                            consumer_secret: 'YOUR_CONSUMER_SECRET',
                            token: 'YOUR_TOKEN',
                            token_secret: 'YOUR_TOKEN_SECRET'
                          })

When the interpreter runs this code, your client object has access to all of the methods defined in the gem. You interact with Yelp’s data by passing the parameters laid out in Yelp’s API through the gem’s methods. Yelp gives you plenty of helpful parameters to customize your search; check out the Search API documentation for some of them.

italian_places = Hash.new { |hash, key| hash[key] = {} }  # missing keys start out as empty hashes

params = { term: 'italian',
           category_filter: 'restaurants'
         }

locale = { cc: "US", lang: 'en' }

client.search('Park Slope', params, locale)

For our test search, I focused on some pretty basic parameters: search for Italian restaurants in the nearby neighborhood of Park Slope. The search returns an array-like object called a Burst, which contains Business objects.

Yelp then has a separate Business API for accessing the info in the returned data structure. The gem lets us call them as methods on each individual business object returned in the Burst. Here are a few of the major ones.

## search
response = client.search('San Francisco')

response.businesses
# [<Business 1>, <Business 2>, ...]

response.businesses[0].name
# "Kim Makoi, DC"

response.businesses[0].rating
# 5.0

## business
response = client.business('yelp-san-francisco')

response.name
# Yelp

response.categories
# [["Local Flavor", "localflavor"], ["Mass Media", "massmedia"]]

Now here is the fun part: parsing the data. We can iterate through the returned businesses and store the values we care about in our own hash, which I’ve created and called italian_places.

client.search('Park Slope', params, locale).businesses.each do |place|
  italian_places[place.name] = {
    :review_count => place.review_count,
    :rating => place.rating,
    :phone => place.phone}
end
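The loop above assigns a complete inner hash for each restaurant, so the default block given to italian_places never actually runs. That block matters when you fill in fields one at a time: Hash.new calls it with the hash and the missing key, and storing an empty hash under that key lets nested assignments work without any setup. A minimal sketch, with a made-up restaurant name:

```ruby
# Missing keys get a fresh empty hash, stored back into the outer hash.
places = Hash.new { |hash, key| hash[key] = {} }

places["Example Trattoria"][:rating] = 4.5        # no need to create the inner hash first
places["Example Trattoria"][:review_count] = 100

puts places.inspect

# A plain hash returns nil for a missing key, so the same nested assignment fails.
begin
  plain = {}
  plain["Example Trattoria"][:rating] = 4.5
rescue NoMethodError => e
  puts "A plain hash fails here: #{e.class}"
end
```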

For the moment, I can’t seem to get more than ~17 results to show up. This may have something to do with my amateurish Yelp authorization key. In any event, the results look something like this:

italian_places = {
 "Piccoli Trattoria"=>{:review_count=>213, :rating=>4.5, :phone=>"7187880066"},
 "Al Di La Trattoria"=>{:review_count=>516, :rating=>4.0, :phone=>"7186368888"},
 "Peppino's"=>{:review_count=>218, :rating=>4.5, :phone=>"7187687244"},
 "Mariella"=>{:review_count=>62, :rating=>4.5, :phone=>"7184992132"},
 "Scottadito Osteria Toscana"=>{:review_count=>484, :rating=>4.0, :phone=>"7186364800"},
 "Giovanni's Brooklyn Eats"=>{:review_count=>178, :rating=>4.0, :phone=>"7187888001"},
 "Scalino"=>{:review_count=>142, :rating=>4.0, :phone=>"7188405738"}}

From there we have a hash with which we can do anything we want. Sort by rating, sort by rating and most reviews, create adjusted ratings for restaurant categories based on the category average (i.e. grading the restaurant on a curve), whatever we want. The world (wide web) is our oyster!
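Here is a small sketch of two of those ideas, run on the seven sample results shown above: ranking by rating with review count as the tiebreaker, and a simple curve that reports each rating relative to the group average. Subtracting the average is only one way to grade on a curve, and the variable names are illustrative.

```ruby
italian_places = {
  "Piccoli Trattoria"          => { review_count: 213, rating: 4.5, phone: "7187880066" },
  "Al Di La Trattoria"         => { review_count: 516, rating: 4.0, phone: "7186368888" },
  "Peppino's"                  => { review_count: 218, rating: 4.5, phone: "7187687244" },
  "Mariella"                   => { review_count: 62,  rating: 4.5, phone: "7184992132" },
  "Scottadito Osteria Toscana" => { review_count: 484, rating: 4.0, phone: "7186364800" },
  "Giovanni's Brooklyn Eats"   => { review_count: 178, rating: 4.0, phone: "7187888001" },
  "Scalino"                    => { review_count: 142, rating: 4.0, phone: "7188405738" }
}

# Highest rating first; among equal ratings, more reviews first.
ranked = italian_places.sort_by { |_name, info| [-info[:rating], -info[:review_count]] }
ranked.each { |name, info| puts "#{info[:rating]}  (#{info[:review_count]} reviews)  #{name}" }

# "Grade on a curve": how far each rating sits above or below the group average.
average = italian_places.values.sum { |info| info[:rating] } / italian_places.size
curved  = italian_places.transform_values { |info| (info[:rating] - average).round(2) }

puts "Average rating: #{average.round(2)}"
curved.each { |name, diff| puts "#{name}: #{diff >= 0 ? '+' : ''}#{diff}" }
```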


Jeremy Sklarsky

The World Wide Web is My Oyster.
