The syntax of a programming language is one of the main factors in how difficult the language is to learn. Here we examine the traditional syntaxes for iterating through a collection of any type, and compare them with a newer, more semantic one based on objects called Enumerators.
Introduction
Let’s imagine we’ve got a collection of items, such as an array, a hash or a set. We may want to be able to iterate through all its elements, either with a determinate order or without it. To achieve this task in most programming languages we usually resort to loop structures, for example a for loop. In this case, we will need to use a special syntax in order to describe the conditions in which the iterations will be made. For example, in a C-like language:
for (int i = 0; i < arr_size; i++) {
    // Code goes here
}
Likewise, in a Pascal-style language we could write something like the following:
for index := 0 to arr_size - 1 do
begin
  // Code goes here
end;
Since loops of this kind have a very generic syntax, and they’re frequently used for these tasks, some languages provide a for..in loop able to iterate through the items of a collection:
// C++11
for (auto& item : collection) { }
// Pascal
for item in collection do ;
These syntaxes can sometimes become cumbersome, for example, when using the ::iterator class of STL containers in C++:
for (vector<MyClass>::iterator it = collection.begin(); it != collection.end(); ++it) { }
This makes the code more difficult to read and makes it less semantic, since a programmer who isn’t aware of the use of iterators in C++ wouldn’t find it easy to understand.
Let’s fix this.
The Enumerable mixin
each
Let me introduce you to our first Ruby enumerator, given by the each method:
collection.each do |item|
  # Do things with 'item'
end
Well, that was easy. But still, a couple of things are happening here. First, we
are calling the each method for collection (suppose this is an array, for example),
which accepts no parameters (so we could have written .each() do...).
Then, we are specifying a block between do and end; this is a piece of
code received by the method, which can call it on demand (more on that later).
The block does, however, receive one parameter, indicated between vertical bars:
|item|. Each time the block is run, item will contain a different element
of the collection.
This method is nonetheless a very special one, since a whole module of code
can be incorporated into any class that implements it. This module is the
Enumerable mixin[1]. A mixin is a piece of code that adds functionality to a class
but can't be used on its own. Enumerable incorporates several methods that take
advantage of the each method to retrieve elements, look for items with certain
properties, map functions to all of them, accumulate elements using an operator, etc.
In the following subsections we will take a look at the main purposes these methods can fulfill. Not all the methods available will be covered, but an exhaustive reference can be found in the Ruby documentation.
Taking elements
Extracting elements from a collection is usually a functionality implemented directly into the class, but despite that, Enumerable includes some basic methods which may come in handy.
Firstly, the take method is pretty straightforward: it returns as many
elements from the collection as specified. Likewise, the first method is as simple as
it sounds: if no parameters are passed, it returns the first item. Otherwise,
it can achieve the same task as take.
arr = [6, 2, 8, 3, 1, 8]
arr.take 3
=> [6, 2, 8]
arr.first
=> 6
arr.last
=> 8
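As a small illustration of the last point, the following sketch (reusing the array from the example above) checks that first with an argument and take give the same result:

```ruby
arr = [6, 2, 8, 3, 1, 8]

# With an argument, first behaves like take
first_three = arr.first(3)
taken_three = arr.take(3)

puts first_three.inspect          # prints [6, 2, 8]
puts first_three == taken_three   # prints true
```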
Other interesting methods that return elements of a collection are cycle and drop.
The former can provide items indefinitely by cycling through the collection (or a
fixed number of times when given an argument, as below), whereas the latter returns
the elements left after dropping as many as the parameter indicates.
(0..2).cycle(3).to_a
=> [0, 1, 2, 0, 1, 2, 0, 1, 2]
(0..5).drop 3
=> [3, 4, 5]
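When cycle is called without an argument or a block, it returns an Enumerator that never runs out of items. A minimal sketch of how this can be combined with take (the variable names are only illustrative):

```ruby
# An endless Enumerator; nothing is computed until elements are requested
endless = (0..2).cycle

# take stops iterating as soon as it has collected seven elements
first_seven = endless.take(7)

puts first_seven.inspect   # prints [0, 1, 2, 0, 1, 2, 0]
```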
Detecting items with certain properties
The Enumerable mixin incorporates a lot of methods for detecting and finding
elements by user criteria. The simplest and least informative ones just return
a boolean value; for instance any?, all?, one? and none?:
arr = [5, 2, 7]
arr.any? &:even?
=> true
arr.all? &:even?
=> false
arr.one? &:even?
=> true
arr.none? &:even?
=> false
The syntax used in this example is a shortcut for passing a block, that is, the
call to any? above is equivalent to the following:
arr.any? do |i|
  i.even?
end
=> true
Other methods that return specific elements are find, find_index, max and min.
The find method takes a block, calls it for each element of the
collection, and returns the first element that makes the block evaluate
as true. find_index behaves similarly but returns the index of the matching
element, instead of the element itself. Lastly, the min and max methods
return the minimum and maximum values respectively, accepting a block that
can serve as a comparison operator.
(10..20).find do |i|
  i % 6 == 0
end
=> 12
(10..20).find_index do |i|
  i % 6 == 0
end
=> 2
arr.min
=> 2
arr.max
=> 7
%w(hola hi hei).min do |a, b|
  a.length <=> b.length
end
=> "hi"
- Note
- Comparison blocks in Ruby are expected to work as a three-way comparison[2], that is,
they return 1, 0 or -1 according to whether a > b, a = b or a < b.
The spaceship (<=>) operator implements this kind of behavior.
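To make the three possible results concrete, here is a short sketch. It also shows min_by, another Enumerable method that can replace a comparison block when elements are compared by a single derived value; using it here is a suggestion and not part of the original example.

```ruby
# The spaceship operator on integers: less than, equal, greater than
results = [1 <=> 2, 2 <=> 2, 3 <=> 2]
puts results.inspect   # prints [-1, 0, 1]

words = %w(hola hi hei)

# Explicit three-way comparison block
shortest_with_block = words.min { |a, b| a.length <=> b.length }

# Same result, comparing by the value the block (here a symbol) returns
shortest_with_min_by = words.min_by(&:length)

puts shortest_with_block    # prints hi
puts shortest_with_min_by   # prints hi
```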
Filtering elements
A useful application of iterating through a collection is to filter its
elements and obtain a new collection, either by retaining or dropping the ones
which match a certain criterion. This can be achieved with two methods called
select and reject:
data = [0, 2, 3, 0, 0, 0, 1, 1, 5, 6, 0]
data.select &:zero?
=> [0, 0, 0, 0, 0]
data.reject &:zero?
=> [2, 3, 1, 1, 5, 6]
- Note
- As seen in the example, these methods don’t actually modify the object identified by data, but instead they return a new array as a result. Their “exclaimed” counterparts, select! and reject!, on the contrary, do alter the object with the result of the filter.
data.select! &:zero?
=> [0, 0, 0, 0, 0]
data
=> [0, 0, 0, 0, 0]
The exclamation notation for methods that alter the object they’re called on, or have side effects in general, is pretty common in Ruby, and several of the manipulation methods listed in the following subsection also have an “exclaimed” counterpart on arrays (for example sort! and map!).
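The difference between a method and its “exclaimed” version can be checked with a short sketch like the following, which uses sort and sort! on a throwaway array (the variable names are illustrative):

```ruby
numbers = [3, 1, 2]

# sort returns a new array and leaves the receiver untouched
sorted = numbers.sort
before_bang = numbers.dup

# sort! reorders the receiver in place
numbers.sort!

puts sorted.inspect        # prints [1, 2, 3]
puts before_bang.inspect   # prints [3, 1, 2]
puts numbers.inspect       # prints [1, 2, 3]
```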
Manipulating collections
Enumerable provides several methods that allow reordering and applying
different processes to items. The sort method is simple and works as
expected, using Quicksort as the underlying algorithm[3].
A block can be passed to sort to serve as the comparison operator.
l18n = ["Hola mundo!", "Hello world!", "Salut le monde!"]
l18n.sort # Lexicographic order
=> ["Hello world!", "Hola mundo!", "Salut le monde!"]
l18n.sort do |a, b|
  b.length <=> a.length # Inverse length order
end
=> ["Salut le monde!", "Hello world!", "Hola mundo!"]
The zip method creates tuples of elements from the receiver and the one or
several collections passed to it:
langs = [:es, :en, :fr]
=> [:es, :en, :fr]
langs.zip l18n
=> [[:es, "Hola mundo!"], [:en, "Hello world!"], [:fr, "Salut le monde!"]]
The group_by method classifies the items of the collection
according to the different values the block returns and composes a Hash:
(1..8).group_by &:even?
=> {false=>[1, 3, 5, 7], true=>[2, 4, 6, 8]}
Finally, a more generic method that enables the programmer to apply a function
to every element and put together an array with the results is map, also
known as collect:
(0..8).map do |i|
  i % 5
end
=> [0, 1, 2, 3, 4, 0, 1, 2, 3]
Other methods
There are two more methods worthy of mention: reduce, which is able to
aggregate values into an accumulator, and lazy, which creates a lazy
enumerator out of the collection (an object that can serve as a wrapper
for manipulation of infinite elements).
Firstly, reduce generalizes the task of iteratively merging elements
into any kind of result such as a sum or a mean. It can be used jointly with
the map method to gather information about the collection:
# Sum the length of all the words in a string
%w(I can has cheezburger).map(&:length).reduce(&:+)
=> 18
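reduce also accepts an initial value for the accumulator and an explicit block, which avoids the intermediate array created by map. The following sketch computes the same sum that way and then derives the mean word length mentioned above (variable names are illustrative):

```ruby
words = %w(I can has cheezburger)

# Start the accumulator at 0 and add each word's length to it
total_length = words.reduce(0) { |sum, word| sum + word.length }

# The mean follows from the accumulated sum
mean_length = total_length.to_f / words.size

puts total_length   # prints 18
puts mean_length    # prints 4.5
```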
For its part, lazy can take any collection and create a lazy enumerator out
of it, but it’s most useful when used with infinite sequences:
nonnegative = (0..Float::INFINITY).lazy
fibonacci = nonnegative.map do |n|
  if n <= 1
    n
  else
    # Sum the two previous elements
    fibonacci.take(n).drop(n - 2).reduce(&:+)
  end
end
fibonacci.take(10).force
=> [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
Note that in this last example the force method is used to make the interpreter
calculate the results. Otherwise, it would just return another lazy enumerator
to allow chaining methods.
This particular version of the Fibonacci sequence is very slow, because it
will recursively calculate every number in the sequence out of the previous
ones. A memoized version can be written with a lambda function[4]. Later
on, another faster way to enumerate the Fibonacci sequence will be shown.
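One possible shape for such a memoized version is sketched below. It is not necessarily the one the footnote refers to: the memo hash and the fib lambda are names chosen for this illustration, and results are cached in a plain Hash so each number is computed only once.

```ruby
# Cache of already computed values, seeded with the two base cases
memo = { 0 => 0, 1 => 1 }

# The lambda refers to itself through the local variable it is assigned to
fib = lambda do |n|
  memo[n] ||= fib.call(n - 1) + fib.call(n - 2)
end

# Map the lambda lazily over the nonnegative integers
memoized_fibonacci = (0..Float::INFINITY).lazy.map { |n| fib.call(n) }

puts memoized_fibonacci.first(10).inspect
# prints [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Because every value is stored the first time it is needed, asking for later elements reuses the earlier ones instead of recomputing them.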
Using Enumerable in your class
Adding all this functionality to a class in Ruby is as easy as implementing an
each method and including the Enumerable mixin, like in the following example:
class Blog
  include Enumerable

  def initialize
    # The class uses any kind of internal collection
    @posts = [
      "Mean inequalities",
      "Introduction to JavaScript",
      "Introduction to Category Theory",
      "Genetic algorithms"
    ]
  end

  # The each method should invoke 'yield' for every element
  def each
    @posts.each do |p|
      yield p
    end
  end
end
libreim = Blog.new
libreim.first
=> "Mean inequalities"
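A common refinement, sketched here with a hypothetical Playlist class, is to make each return an Enumerator when it is called without a block, mirroring what the built-in collections do. The to_enum and block_given? methods are part of Ruby's core; the class and its contents are made up for the example.

```ruby
# Hypothetical class used only for illustration
class Playlist
  include Enumerable

  def initialize(*songs)
    @songs = songs
  end

  def each
    # Without a block, hand back an Enumerator instead of failing on yield
    return to_enum(:each) unless block_given?

    @songs.each { |song| yield song }
    self
  end
end

playlist = Playlist.new("Intro", "Verse", "Outro")

# Enumerable methods work as before
upcased = playlist.map(&:upcase)

# each without a block can now be chained with Enumerator methods
numbered = playlist.each.with_index(1).to_a

puts upcased.inspect    # prints ["INTRO", "VERSE", "OUTRO"]
puts numbered.inspect   # prints [["Intro", 1], ["Verse", 2], ["Outro", 3]]
```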
Using blocks
Being able to receive and call blocks of code within methods is the key point for
Enumerable methods to work. The each method provides the ability to iterate
through all elements, which allows the rest of the methods to act on each element
according to the returned value of a block.
As we’ve seen before, a block in Ruby is just a piece of code wrapped between
do and end. Method symbols can be used together with an ampersand (&) as
a shorthand for blocks. This means that if we have a collection of objects
of class MyClass, and they implement method my_method, then its symbol is
:my_method, and one could use the syntax &:my_method to compact the following
block:
do |i|
  i.my_method
end
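The equivalence can be verified with a real method. In this sketch, upcase plays the role of my_method; the ampersand converts the symbol into a block by calling Symbol#to_proc on it.

```ruby
words = %w(ruby enumerable mixin)

# Explicit block
with_block = words.map do |word|
  word.upcase
end

# Shorthand: the symbol is converted into an equivalent block
with_symbol = words.map(&:upcase)

# The conversion can also be done by hand
upcase_proc = :upcase.to_proc

puts with_block == with_symbol   # prints true
puts upcase_proc.call("each")    # prints EACH
```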
Any method defined by the programmer can receive and call a block, either via the
yield statement or the call method. In this first example, we use yield to
execute a block without parameters:
def call_block
  puts "Before calling block"
  yield
  puts "After calling block"
end
call_block do
  puts "This is block"
end
Output:
Before calling block
This is block
After calling block
The second example shows how to receive a block as a parameter with the ampersand
syntax, execute it with call and pass parameters to it:
def call_block(&block)
  puts "Before calling block"
  block.call 4
  puts "After calling block"
end
call_block do |arg|
  puts "I was passed argument #{arg}"
end
Output:
Before calling block
I was passed argument 4
After calling block
Lastly, blocks can be stored as objects of class Proc in variables[5]. In order to do this, we can just pass the block to the constructor:
square = Proc.new do |n|
  n**2
end
square.call 5
=> 25
Procs can be passed as blocks by using the same ampersand syntax. Notice that in
this case the variable itself is used instead of a symbol (:square), because we
want to pass the whole object instead of its name.
def apply_proc(arg, &proc)
  proc.call arg
end
apply_proc 5, &square
=> 25
The Enumerator class
Most of the methods seen above return processed results when they receive a block, but when they don’t, they can return an object of class Enumerator[6]. This object is a wrapper that contains the information necessary to iterate through the collection.
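As a brief sketch of this behavior, calling each on an array without a block yields an Enumerator, which can then be advanced by hand with next and inspected with peek (both are standard Enumerator methods; the variable names are illustrative):

```ruby
# No block given, so each returns an Enumerator instead of iterating
enum = [10, 20, 30].each

puts enum.class   # prints Enumerator

# External iteration: request elements one at a time
first_item = enum.next
second_item = enum.next

# peek looks at the upcoming element without consuming it
upcoming = enum.peek

puts first_item    # prints 10
puts second_item   # prints 20
puts upcoming      # prints 30
```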
Additionally, Enumerators implement the each method as well, so
Enumerable methods can also be called on them. This means Enumerators can be
chained, which is useful to modify the way we act on collections without implementing
new methods. For example, if we wanted to enumerate an array of items starting from
the back and grouping them according to their index modulo 3, we would write something
like the following:
arr = %w(a b c d e)
arr.reverse_each.group_by.each_with_index do |item, index|
  index % 3
end
=> {0=>["e", "b"], 1=>["d", "a"], 2=>["c"]}
Notice how the chaining order changes how elements are returned. Indexes are not
returned in the example above, but they can be obtained by just swapping group_by
and each_with_index:
arr.reverse_each.each_with_index.group_by do |item, index|
  index % 3
end
=> {0=>[["e", 0], ["b", 3]], 1=>[["d", 1], ["a", 4]], 2=>[["c", 2]]}
Enumerators need not be created out of existing collections; the new method of the
class can be used as well. This method accepts a block with a parameter that will
act as “yielder object”. The block is expected to iteratively push each element to
the yielder, which will in turn retrieve only the elements needed, pausing the
generation of elements otherwise. A simple Enumerator can be just a countdown from
10 to 1:
countdown = Enumerator.new do |yielder|
  n = 10
  until n == 0
    yielder << n
    n -= 1
  end
end
countdown.take 5
=> [10, 9, 8, 7, 6]
Since the yielder will only use the necessary elements, Enumerators can generate infinite sequences. For example, the following Enumerator generates prime numbers, and the next one is a fast Fibonacci generator.
# Primes Enumerator
primes = Enumerator.new do |yielder|
  n = 2
  loop do
    yielder << n
    # Find next prime
    prime = false
    until prime
      n += 1
      prime = (2..Math.sqrt(n).floor).all? do |i|
        n % i != 0
      end
    end
  end
end
primes.take 10
=> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
# Fibonacci Enumerator
fibonacci = Enumerator.new do |yielder|
  a, b = 0, 1
  loop do
    yielder << a
    a, b = b, a + b
  end
end
# Calculate 500 elements, retrieve last
fibonacci.take(500).drop(499)
=> [86168291600238450732788312165664788095941068326060883324529903470149056115823592713458328176574447204501]
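Infinite Enumerators like these combine naturally with lazy. The sketch below rebuilds the Fibonacci generator so that it is self-contained and then filters it lazily; without lazy, select would try to walk the whole infinite sequence and never return.

```ruby
fib = Enumerator.new do |yielder|
  a, b = 0, 1
  loop do
    yielder << a
    a, b = b, a + b
  end
end

# lazy defers the filtering, so only as many elements as needed are generated
even_fibs = fib.lazy.select(&:even?).first(5)

puts even_fibs.inspect   # prints [0, 2, 8, 34, 144]
```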
Have you noticed the loop word used above to create an infinite loop? It’s actually
not a keyword but a method, Kernel#loop, which calls its block repeatedly and
returns an Enumerator when no block is given. It can also be seen that the typical
[] accessor can’t be used on an Enumerator, simply because it’s not implemented;
but we can add it ourselves:
class Enumerator
  # One-based access: enum[n] returns the n-th element generated
  def [](index)
    take(index)[index - 1]
  end
end
fibonacci[100]
=> 218922995834555169026
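Coming back to loop, one of its documented features fits well with Enumerators: it rescues the StopIteration exception that next raises when an Enumerator is exhausted, so it can drive external iteration without an explicit exit condition. A minimal sketch, with illustrative variable names:

```ruby
enum = [1, 2, 3].each
seen = []

# next raises StopIteration after the last element; loop rescues it and ends
loop do
  seen << enum.next
end

puts seen.inspect   # prints [1, 2, 3]

# Called without a block, loop itself returns an Enumerator
loop_enum = loop
puts loop_enum.class   # prints Enumerator
```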
Conclusions
In this article we’ve studied how different syntaxes serve the same purpose of iteration. Classic
for loops are generic and not very semantic. This means that, in addition to being a
special structure, they commonly force the programmer to expose the logic used to
iterate through a collection. This is something that generally should be avoided, and
the class of the collection should be the one providing the functionality.
Furthermore, generic loops can be used for very different tasks, so they often need explanatory comments. To solve this, the Enumerable mixin incorporates very specific iteration methods that are mostly self-explanatory, and make the code easier both to write and to read later. Finally, in addition to iterating through existing collections, items can be generated on demand with Enumerator objects. Enumerators and lazy Enumerators are powerful tools to calculate elements of finite and infinite sequences. I’d encourage you to give them a try, be creative and find new ways to use them.