Word Count Exercise in Ruby

UPDATE: I’ve learned a lot since writing this post awhile back. I can’t say enough good things about Sandi Metz’s book, Practical Object-Oriented Design in Ruby. It’s recommended reading for any new programmer on my team.


“Exercism” is a great little Ruby Gem & Website made by Katrina Owen that allows you to get crowd-sourced code reviews on practice problems. I’ve completed the first five problems and just started working on the sixth. It’s been great to get feedback from other people and learn some great new tricks in Ruby!

One of the great features of the website is the ability to compare your first iteration with your final/approved version. I’d love to show you the refactoring I did for the “word count” exercise.

Here are the instructions for the challenge:

WORD COUNT

Write a program that given a phrase can count the
occurrences of each word in that phrase.

For example for the input "olly olly in come free":

olly: 2
in: 1
come: 1
free: 1

Sounds simple enough. Let’s do this!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
class Phrase
  attr_accessor :words

  def initialize(words)
    @words = words
  end

  def word_count
    word_list = {}
    words.split(" ").each do |word|
      word_list = process_word_in_list(word, word_list)
    end
    word_list
  end

private
  def process_word_in_list(word, word_list)
    return word_list if word.empty?
    word_list[word] ||= 0
    word_list[word] += 1
    word_list
  end

end

Seemed pretty straight-forward at first. We’ll make a hash, then add to a counter inside the hash if the word shows up. Easy peasy.

But there are three issues:

  1. The word_count method isn’t super easy to read
  2. The process_word_in_list method is crazy ugly
  3. This version doesn’t pass all the tests. We must also account for word case and punctuation.

(This is about to get much uglier before it gets better)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
class Phrase
  attr_accessor :words

  def initialize(words)
    @words = words
  end

  def word_count
    prep_words_for_counting()
    word_list = {}
    words.split(" ").each do |word|
      word_list = process_word_in_list(word, word_list)
    end
    word_list
  end

private
  def process_word_in_list(word, word_list)
    word = parse_word(word)
    return word_list if word.empty?
    word_list[word] ||= 0
    word_list[word] += 1
    word_list
  end

  def prep_words_for_counting
    words.gsub!(",", " ")
  end

  def parse_word(word)
    word.gsub!(/[^a-zA-Z0-9]/, "")
    word.downcase!
    word
  end

end

Oh lord. This thing is a beast. But I didn’t see much hope yet, so I sent it off for review.

The first review of my code exposed a few great learning points:

  1. You can create a Hash with a default value. Hash.new(0) creates a hash and sets “0” as the default value for every new key. This solves my word_list[word] ||= 0 problem at line 21.
  2. I don’t need to downcase each word at line 32 – I can downcase the whole string.
  3. The magical Ruby String method, scan

Scan iterates through str, matching the pattern (which may be a Regexp or a String). For each match, a result is generated and either added to the result array or passed to the block.

Some examples:

a = "cruel world"
a.scan(/\w+/)        #=> ["cruel", "world"]
a.scan(/.../)        #=> ["cru", "el ", "wor"]

With my new-found Ruby knowledge, here’s v2:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
class Phrase
  attr_accessor :words

  def initialize(words)
    @words = words
  end

  def word_count
    word_list = Hash.new(0)
    words.downcase.scan(/\w+/) do |word|
      word_list[word] += 1
    end
    word_list
  end

end

So much cleaner! Using scan removed a huge portion of my checks and pattern matching. And although I think this code should get a good grade, I learned even more from the great community on Exercism.

Raymond Vellener from the Netherlands offered this simple piece of advice: “Great improvement from the last submission. Perhaps you could consider separating the word_count method?”

Definitely a nitpick, but I could see Ray’s point. The words/downcase/regex/array combo in the middle of that method could be more expressive. How about we extract these details into its own block?

The final result:

class Phrase
  attr_accessor :words

  def initialize(words)
    @words = words
  end

  def word_count
    list = Hash.new(0)
    each_word { |word| list[word] += 1 }
    list
  end

private
  def each_word
    words.downcase.scan(/\w+/) { |word| yield word }
  end

end

My new each_word method is very expressive: it tells the reader exactly what will come from it, while moving the implementation details to a discreet private method.

That’s it! This exercise annoyed me at first, but I loved learning about scan and found a great example of moving complex methods into my own Ruby block.

Have any other tips or tricks? Have you done any exercises on Exercism or need help completing one? Let me know on Twitter!

UPDATE

I’ve gotten some great feedback on Twitter about simplifying the Hash buildup using inject or each_with_object.

Jérémy Lecour gave me a good rule on when to use each: “For me: if you accumulate (build a Hash, …), use #each_with_object, but if you really mutate/change the object, then use #inject”. To that end, I’ve further simplified the example:

class Phrase
  attr_accessor :words

  def initialize(words)
    @words = words
  end

  def word_count
    each_word.each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }
  end

private
  def each_word
    words.downcase.scan(/\w+/)
  end

end

Hi there, I'm Jon.

Writer. Musician. Adventurer. Nerd.

Husband. Dad to three. From: all over the place.

Exvangelical, but still amazed. Enneagram 7.