Wednesday, January 7, 2009

Streams, Lazy Programs for Lazy Programmers.


Streams are nothing new at all. Furthermore they have been introduced at least once into the Ruby World on James' excellent blog, Shades of Gray, with which many of the Ruby community are familiar, at least I hope ;). I can only encourage you to read the whole High Order Ruby series. For those looking for more reading on this topic the following link provides a gold mine; High Order Perl.

Although I had read James' blog entry about two years ago and was fascinated by it, I somehow never had the time to play around with streams. It took a second motivation which I found in a remarkable site: Structure and Interpretation of Computer Programs, Video Lecture. This was an absolute treasure for me. Looking at Abelson's presentation of Streams I really wanted to have that in Ruby.

For those who do not like my blah, blah, and that is probably about everybody save YHS, but ok it is my Blog after all, I have compiled a small TOC so that they do not have to read the dull stuff while I can still write it, yeah!





TOC

Streams in Ruby
A verbose introduction why, how, when I did what and why, how and when I did not do other things.

Diving into Streams
A step by step introduction into streams and their implementation in Ruby.

Ruby Meets Streams
Enumerable Ruby Objects can easily be seen as streams too.

Streams' Applications
And yes of course they do serve.





Back to TOC
Streams in Ruby

A stream is a lazy, potentially endless data structure. In Scheme a stream is constructed with the cons-stream special form. As an example Abelson constructs the stream of all integers as follows:
1 (define (integers-from start)
2 (cons-stream start (integers-from (1+ start) ) )
3 )
4 (define (integers)
5 (integers-from 0)
6 )

As such the cons-stream form constructs a stream by creating a Lisp cons cell with a car that consists of an already computed value, called head and a cdr that contains a promise to compute a new head and a new promise.

A practical side note: This was the most difficult thing to remember for me. The concept seems simple enough, but whenever I had a problem in my code I had forgotten that the promise has to return a new stream and not just a new head. For me the following two invariants on streams were the absolute cornerstone for understanding them:
  • While constructing a stream the tail is a promise and thus not evaluated.
  • The tail of a stream will always yield a stream again, potentially the empty stream.
Once these two rules sank into my brain it was amazing how easy stream manipulation can become.

Given the first of the two invariants above it is quite clear how to specify a
promise in Ruby, with code blocks.
We can therefore allready imagine how a cons_stream method will look like in Ruby.
1 def cons_stream head, &tail
2 # . . .
3 end
The result of the cons_stream call of course is an object implementing Streams in Ruby. This can easily be verified on the command line:
ruby -rlab419/functional/streams -e 'include Lab419::SimpleStream;p cons_stream( 42 ){ EmptyStream }'
#<Lab419::SimpleStream::Stream:0xb7d85664 @tail=#<Proc:0xb7d8572c@-e:1>, @head=42>

We can very well see that the promise is stored as a proc object and the head as an actual value.

Here however I will take a different path then James took, and I encourage you (again) to read his blog if you prefer the more idiomatic Ruby way to look at streams. I however really wanted
to emphasize on the abstraction streams constitute. And although it is wonderful how easily this can be done in Ruby, I believe that the stream paradigm itself merits to be looked at
at a more abstract level.
This said, my implementation gives you the best of the two worlds of course. The functional layer being only a wrapper around the Ruby Object layer you can use both at will.




Back to TOC
Diving into Streams

Let us have some fun now and play with streams:
1 include Lab419::SimpleStream
2 constant = cons_stream( 42 ){ constant }
3 p constant.head        # --> 42
4 p constant.tail.head   # --> 42
5 p constant.tail        # --> #<Lab419::SimpleStream::Stream:0xb7d68424
# @tail=#<Proc:0xb7d6867c@diving-into-streams-01.rb:2>, @head=42>


It is quite remarkable, that we can use constant inside the promise ( which is implemented as the code block, remember? ) without any problems [line #2]. No surprises in lines 3 & 4. And if we remember the second invariant line #5 is not going to surprise us either. Furthermore we can already see some of the internal representation of streams. As mentioned before it is a plain vanilla Ruby class, that exposes an Object Oriented interface to streams as well.

The aforementioned recursiveness of streams can be seen in the next example again.
1 include Lab419::SimpleStream
2 def integers from=0
3   cons_stream( from ){ integers( from + 1 ) }
4 end
5
6 i = integers
7 p i.head             # --> 0
8 p i.tail.tail.head   # --> 2

As soon as we have understood the lazy nature of the promise the recursive call to integers does not trouble us any more. You can compute the following without any problems:
1 42_420.times do
2   i = i.tail
3 end
4 p i.head


One of the big advantages of using streams is that we can exploit these cheap recursions of streams to formulate elegant recursive procedures without the danger of stack overflows and even those who could not be tail call optimized.

As an example let us consider the Fibonacci numbers as a stream. Then a natural definition of such a stream, let us call it fibo would be:


fibo is a stream for which the head is 0.
fibo is also a stream for which the head of its tail is 1.
fibo is eventually a stream for which the head of the tail of the tail is the
sum of the head and the head of the tail.


We will try now to translate this into Ruby, and we will see that this can be done in an analogous way, without any effort:
 1 include Lab419::SimpleStream
 2
 3 def add_streams lhs, rhs
 4   cons_stream( lhs.head + rhs.head ){
 5     add_streams( lhs.tail, rhs.tail )
 6   }
 7 end
 8
 9 def fibo
10   cons_stream( 0 ) {
11     cons_stream( 1 ){
12       add_streams( fibo, fibo.tail )
13     }
14   }
15 end
16
17 p fibo[0..6]

Running this program delivers

512/31 > ruby -rlab419/functional/streams diving-into-streams-03.rb
[0, 1, 1, 2, 3, 5, 8]


Please note, that this implementation of add_streams does not consider the possibility of finite streams, we will come to that later.
Another important point is that we can consider streams as Enumerables very easily too.

Actually this is about all, yes the power of streams has much to do with their simplicity. Everything else I will talk about now will not introduce anything new. It will however show some of the conveniences built into the implementation, and show us how to handle Finite Streams. It is somehow funny that they are the special case while infinite streams are the normal case.




Back to TOC
Ruby Meets Streams

Finite Streams
Finite Streams are just a special case of Infinite Streams. A mathematical definition of finite streams might look like the following:

A finite is a stream for which exists a promise delivering an empty stream.
An empty stream is defined as a stream for which the tail delivers an empty stream
and for which the behavior of head remains undefined.

Following this definition and having already an EmptyStream at our disposal it is straightforward to define a finite stream from a given number of objects:
1 def finite_stream *elements
2   return EmptyStream if elements.empty?
3   cons_stream( elements.shift ) { finite_stream( *elements  ) }
4 end


But as infinity is such an important concept for streams, we need a way to make finite streams infinite, the most logical approach is to cycle through them. Such a task is of course almost frivolously easy:
1 include Lab419::SimpleStream
2
3 def cyclic_stream elements, current=0
4   cons_stream( elements[current] ) { cyclic_stream( elements, current.succ % elements.size ) }
5 end
6
7 primes = cyclic_stream [ 2, 3, 5, 7 ]
8
9 p primes[0..9]


Ruby Objects as Streams
The implementation of streams in Labrador (branch lab419-functional-0.1 ) is quite straightforward and provides a functional interface as implemented in module Lab419::SimpleStream or an object oriented interface.
Please note that you can use all enumerables as streams by means of the #to_stream method, while String implements a #to_byte_stream method too.




Back to TOC
Streams' Applications

Without this section of this Blog entry one might be thinking that streams are a neat concept but that would be about all.

Abelson shows a very elegant solution of the N Queens problem by using streams. I will settle here for something less spectacular.

Streams as a Lexer
The name stream somehow induces a typical usage of streams. IOStreams can very
well be implemented as streams. The following example implements a very simple lexer that works as a stream.

 1 
 2 class String
 3   def next_token *tokens
 4     tokens.each do | token |
 5       if /^#{token}/ === self then
 6         return [ $1 || $& , self.sub( /^#{token}/, "" ) ]
 7       end
 8     end
 9   end
10   [ nil, nil ]
11 end
12
13 def lexer_stream io, line="", *tokens
14   line = io.readline.chomp if line.empty?
15   token, line = line.next_token( *tokens )
16   cons_stream( token ){ lexer_stream( io, line, *tokens ) }
17 rescue EOFError
18   EmptyStream
19 end
20
21 lexer = lexer_stream( $stdin, "", /\s*(\w+)/, /\s*(\d+)/, /\s*(.)/ )
22
23 until lexer.empty? do
24   puts lexer.head
25   lexer = lexer.tail
26 end
27


First we define a get_token helper in class String. This does not modifie its receiver but returns the parsed string and the rest of the line. It is really just a toy to test the stream part of the example.

Things become a little bit more interesting at line #13. A lexer_stream is constructed by getting the first token from the input (io) which is line buffered by means of line and defining the tail as a lexer_stream with the same io, the same tokens and the new line buffer. Simple?

And it works like charm.

In parsing it is sometimes convenient to implement a certain lookahead. Imagine how elegantly a parser can be written if it has the possibility to push tokens back on the input stream. With a stream based lexer this is a one-liner.

lexer = cons_stream( pushbacktoken ){ lexer }

So we will have
 1 
 2 class String
 3   def next_token *tokens
 4     tokens.each do | token |
 5       if /^#{token}/ === self then
 6         return [ $1 || $& , self.sub( /^#{token}/, "" ) ]
 7       end
 8     end
 9   end
10   [ nil, nil ]
11 end
12
13 def lexer_stream io, line="", *tokens
14   line = io.readline.chomp if line.empty?
15   token, line = line.next_token( *tokens )
16   cons_stream( token ){ lexer_stream( io, line, *tokens ) }
17 rescue EOFError
18   EmptyStream
19 end
20
21 def push_back a_stream, a_token
22   cons_stream( a_token ){ a_stream }
23 end
24
25 lexer = lexer_stream( $stdin, "", /\s*(\w+)/, /\s*(\d+)/, /\s*(.)/ )
26
27 until lexer.empty? do
28   if lexer.head == "break" then
29     lexer = push_back( "break", lexer )
30     puts "returning #{lexer.inspect}"
31     break
32   end
33
34   puts lexer.head
35   lexer = lexer.tail
36 end
37




That is about all, please check out the project at Rubyforge and thanx for your attention.



Update
thanks to Brian Candler

Brian pointed out correctly that Ruby Enumerators are lazy too in Ruby1.9, and almost lazy in 1.8. He made a nice contribution to Facets where he made them lazy in Ruby1.8 too.
Please check it out here on Rubyforge.

I knew actually that Enumerables and Enumerators could be lazy, but I ignored that Ruby1.9 had a new syntax which really encourages this kind of constructs.

1 x = Enumerator::new do | enator | 
2   loop do
3     enator.yield rand
4   end
5 end
6
7 p x.next
8 p x.next
9 p x.take( 10 )


As a matter of fact Generators, Fibers and Enumerators in 1.9 are very, very promising.

2 comments:

robert said...

Your article inspired my to this streams for starters:
class Object
  Stream = Struct.new :head, :promise do
    def tail
      self.class.new(promise[head], promise)
    end
  end

  def to_cons(&promise)
    Stream.new(self, promise)
  end
end

integers = 1.to_cons {|n| n + 1}

3.times do
  p integers.head
  integers = integers.tail
end
:-)

Robert Dober said...

This is very nice, indeed I ask myself why I confuse all those Ruby programmers with my functional style ;)