Wednesday, July 29, 2009

How to module_eval

In my whole Ruby Metaprogrammer life I have never used any *_eval string.
This was a deliberate choice I made and I still believe that it was reasonable in the sense that it suits my style. Personally I was able to structure and refactor my metacode much better, well, I thought...

Now I am working on code that shall be used as a library, closely related to core, thus, hopefully widely used. I am not a fan of premature optimization. I often use methods like Enumerable#inject for there elegance not worrying at all about the speed.
This is the exception
But then there is this of course:

Monday, July 27, 2009

Why I use Verify

I was just "specing" up my Thread Safe Wrappers when I stumbled about methods on Array I had never seen before.Eg. #taguri=. Realizing that it ( and some others ) were introduced by rspec I had a choice between a rock and a hard place: Either not using introspection in my definitions of methods in the Array wrapper (and probably the same for Hash, String and friends), or deleting those methods manually.
The first choice is just impossible because it would mean lots and lots of typing, which is just unacceptable, and the second choice would mean that there would be code in the production code which is there for the sole purpose of not implementing methods from the Rspec framework.

It occurred to me than that Verify has been written for that exact reason by YHS. Not to step on the toes of the testee!

I immediately started to rewrite my *_spec.rb files into *_verify.rb files (not that this convention is enforced by Verify but it seems good practice ;). And it was just then that I realized I did not know my API (as lean as it is) by heart and I had to go into the examples folder of my GIT repository to find it.

That is unacceptable, we need this stuff on the web, right?

Here we go then:

 7 require 'mockify'
 8 require 'verify'
10 Verify "Some Important Facts" do
11   verify "42 is true" do
12     42
13   end
14   refute "42 is pretty big" do
15     42 > 100_000
16   end
17   verify_exceptions NoMethodError do
18     42.unrelinguished!
19   end
20 end
22 Verify "With parameters" do
23   verify msg: "Everything is well", target: 42, actual: 21 * 2
24   refute msg: "But not too much",   target: 42, actual: 41
25 end
27 Verify "Capturing stdout" do
28   out = with_output do | o |
29     puts 42
30     verify target: "42\n", actual: o.string
31     puts
32     verify target: "42\n\n", actual: o.string
33   end
34   verify msg: "with_output converted out to an array of lines",
35     actual: &:chomp ),
36     target: [ "42", "" ]
37 end
39 Verify "Providing stdin" do
40   with_input %w{The Quick Brown Fox} do
41     verify %{first line is "The"} do
42       "The" == gets.chomp
43     end
44   end
45 end

Monday, January 19, 2009

Methodless Ruby

In an earlier post I have shown how Ruby can be used (or abused) as a language implementing different object models than the built in model. Ruby's built in model is Single Inheritance and Module Mixin as probably everybody reading this knows. If you have stumbled upon this blog and do not know Ruby's object model you might want to read about it first. E.g. in the famous Pickaxe Book.

The last time I was exploring object models I blogged about "Classless Ruby".
I was however still making use of modules and methods.

Today I will get rid of them too, well I will still write some Core class methods but once that is done there will be no more classes, no more modules, no more methods and no more instance variables.

The outcome will resemble prototypes, but the beauty of the whole, the beauty of Ruby, is that even that is just my implementation and different developement paradigms could be explored almost as easily.

'nuff ta'king for now...


Where To Put The Behavior?
A very revolutionary idea stolen from Perl 5! (five not 120)

But What Is Behavior, Now?
Hashes might still store proc objects for different purposes though.

Fixing Some Problems
Some of the shortcomings discovered so far can be fixed...

Quinctilie Vare, legiones hash redde![1]
I am much luckier than Quinctilius Varus, I can give the hash back.

Being lazy I want mixins an that kind of stuff, right!

What did I actually do? Where shall I go?

Back to TOC
Where To Put Behavior?

I did not really think a long time about this, and I do not believe that there are many alternatives with the given constraints. Hashes really should do the job nicely.
They allow us to store the data and the behavior together ( we are not abandoning the encapsulation principle yet ). As I mentioned above that is exactly what Larry Wall did in Perl 5 and that allowed for some nice object oriented programs after all.

A straightforward shot at this might look like the following
 1 book = {
 2   :author => 'Mark Twain',
 3   :date   => 2364,
 4   :title  => 'Tom Sawyer meets Data',
 5   :image  => proc{ | myself |
 6     puts %<"#{myself[:title]}" by #{myself[:author]} was published in #{myself[:date]}>
 7 }
 8 }
10 book[:image].call( book )

When we look at line #10 however we are not very happy, are we? Well we should not. There are two redundancies involved. Firstly we have to pass the receiver into the proc object again, and secondly we have to call the proc object recursively. We will get rid of this nuisance immediately.

1 class Hash
2   alias_method :__old_get__, :[]
3   def [] name, *args
4     value = __old_get__( name )
5     return value unless value && value.is_a? Proc
6 self, *args )
7   end
8 end

and now this works

1 book[:image] # --> "Tom Sawyer meets Data" by Mark Twain was published in 2364

Back to TOC
But What Is Behavior, Now?

We are very, very happy with our design, we can pass arguments into our new [] call syntax as the following example shows:
 1 book = {
 2   :author => 'Mark Twain',
 3   :date   => 2364,
 4   :title  => 'Tom Sawyer meets Data',
 5   :image  => proc{ | myself |
 6     puts %<"#{myself[:title]}" by #{myself[:author]} was published in #{myself[:date]}
 7     annotations: #{(myself[:annotations]||[]).join(", ")}>
 8   },
 9   :add_annotations => proc{ |myself, *annotations|
10     myself[ :annotations ] ||= []
11     myself[ :annotations ] += annotations
12   }
14 }
16 book[:add_annotations, "Edited by J-L.Picard", "Reviewed by William T. Riker"]
17 book[:image]

and we get the hoped for

"Tom Sawyer meets Data" by Mark Twain was published in 2364
annotations: Edited by J-L.Picard, Reviewed by William T. Riker

Thus I am happy again, and often when I am happy, there is something really, really wrong. One of the things that are wrong became apparent when I needed to store a proc object in the hash and that proc object was not implementing behavior of the object.
1 book[:adder] = proc{ | a, b | a + b }
2 a = book[:adder]

When running this I got of course an error

method-less-4.rb:34:in `block in
': undefined method `+' for # (NoMethodError)
from method-less-4.rb:10:in `call'
from method-less-4.rb:10:in `[]'
from method-less-4.rb:35:in `

Back to TOC
Fixing Some Problems

So far we have successfully bound behavior to data with procs. We have however introduced an ambiguity between Behavior members and Proc members in our objects.
Furthermore we have not yet introduced a generic object creation mechanism. No, dup will not suffice, but there is not needed much more.
Worse though, there is no object initialization mechanism either. We will fix these three issues now and then review what we have got, again.

Behavior is not Proc
Subclassing Proc shall give us the information we need to do the right thing when accessing an object's member. Furthermore a nice subclass factory method, e.g. Kernel#behavior will provide more readable code.

 2 Behavior = Class::new Proc
 4 module Kernel
 5   def behavior &blk
 6     Behavior::new( &blk )
 7   end
 8 end

In lines 1 to 8 we just subclass Proc and define a Kernel method that wraps the constructor call of Behavior which is the inherited one.
10 class Hash
11   alias_method :__old_get__, :[]
12   def [] name, *args
13     value = __old_get__( name )
14     return value unless value && value.is_a?( Behavior )
15 self, *args )
16   end
17 end

Thanks to our little sub-classing game before we can dispatch on Behavior now. By specifying Behavior with the behavior method, and Proces with the proc method (see lines 23, 27 & 31) we implement the desired behavior ( no pun intended ).

19 book = {
20   :author => 'Mark Twain',
21   :date   => 2364,
22   :title  => 'Tom Sawyer meets Data',
23   :image  => behavior{ | myself |
24     puts %<"#{myself[:title]}" by #{myself[:author]} was published in #{myself[:date]}
25     annotations: #{(myself[:annotations]||[]).join(", ")}>
26   },
27   :add_annotations => behavior{ |myself, *annotations|
28     myself[ :annotations ] ||= []
29     myself[ :annotations ] += annotations
30   },
31   :adder => proc{ | a, b | a + b }
32 }
34 book[:add_annotations, "Edited by J-L.Picard", "Reviewed by William T. Riker"]
35 book[:image]
38 book[:adder] = proc{ | a, b | a + b }
39 a = book[:adder]
40 p 40, 2 )

and we get

"Tom Sawyer meets Data" by Mark Twain was published in 2364
annotations: Edited by J-L.Picard, Reviewed by William T. Riker

Constructing and Initializing Objects
In order to construct objects we have to make sure that they do not share any data with the prototype. This holds even more as, in this kind of prototype programming any object can be used as the receiver of the construction message.
1 class Hash
2   def new initial_values = {}
3     dup
4     initial_values.inject( dup ){ | o, (k, v) |
5       o.update k => ( v.dup rescue v )
6     }
7   end
8 end

In the code above we just define a new method for a hash. N.B. This is an instance method and must not be confused with Hashes class method. Maybe a different name would have been in order, I honestly do not know. We duplicate the receiver and inject the initial values hash. That can than be used as follows.
 1 book = {
 2   :image  => behavior{ | myself |
 3     puts %<"#{myself[:title]}" by #{myself[:author]} was published in #{myself[:date]}
 4     annotations: #{(myself[:annotations]||[]).join(", ")}>
 5   },
 6   :add_annotations => behavior{ |myself, *annotations|
 7     myself[ :annotations ] ||= []
 8     myself[ :annotations ] += annotations
 9   }
10 }
12 i_robot = :title => "I Robot", :author => "Isaac Asimov", :date => 1950 )
13 pickaxe  = :title => "Programming Ruby 1.9", :author => "Dave Thomas", :date => "2009" )
14 pickaxe[:add_annotations, "Live Long and Ruby!", "and there was Andy Hunt", "... and Chad Fowler" ]
16 i_robot[ :image ]
17 pickaxe[ :image ]

which duly produces the following output:

"I Robot" by Isaac Asimov was published in 1950
"Programming Ruby 1.9" by Dave Thomas was published in 2009
annotations: Live Long and Ruby!, and there was Andy Hunt, ... and Chad Fowler

Please also note that we could have written pickaxe = in line #13 above, without any change in the semantics of the program.

Did I say Prototype?

I have used the term prototype rather nonchalantly so far. Maybe it is time to define a bit better what I understand under this term in this context.

A prototype is an object oriented model in which the behavior of any object can be shared by using this very object as a prototype. This is a little more general than e.g. the Javascript prototype model in which each object simply has a prototype and this prototype cannot be used as a member of the set of objects it defines.
In our case however that is perfectly possible.
That is the reason why the above remark about line #13 holds: Either book or i_robot can be used as a prototype, although they are perfectly members of the set of objects defined by the prototype.

Although I find it very amusing, amazing and instructive what I have done here I have to get my feet back on the ground again.

This design suffers from some serious drawbacks.

  • All members are accessible.

  • Calling behavior with blocks has a clumsy syntax (namely send :[],... do end).

  • Name Errors (e.g. behavior names misspelled ) are completely obfuscated.

  • There is one only constructor, it does not allow for different behavior depending on the prototype.

  • Last but not least, we have monopolized hashes. And that of course is completely inacceptable.

The public accessibility of all members is somehow not really an issue in some application contexts. The clumsy syntax can be worked around and probably the naming problem can be resolved by a different implementation of Hash#[] e.g. using #fetch.
However, the last two points are inacceptable.

Back to TOC
Quinctilie Vare, legiones hash redde!

Our first duty to the emperor Ruby is to give her her hashes back. I have lost them in the quest for new territory for the Ruby empire but luckily for me I can give them back (for a - however very little - prize ).

The solution of course is not to use instances of Hash as our Object -- Behavior glue. Well actually we still will use hashes, but specialized hashes, a lession learned from our Behavior class. Well let's do it:
 2 class BHash < Hash
 3   def [] name, *args
 4     value = super( name )
 5     return value unless value && value.is_a?( Behavior )
 6 self, *args )
 7   end
 8   def new *args
 9     dup.tap do | o |
10       o[:init, *args] if o.fetch( :init, nil ).is_a? Behavior 
11     end
12   end
13 end

First we have defined our subclass of Hash called BHash (for Behavior Hash). We can find the redefinition of the #[] method again and the generic constructor. Thus we have freed the core Hash class from all our achievements. Please do note that some other issues could be handled here easily now, as e.g. raising a NameError for access to undefined keys. This would make the code even longer though and not be of much technical interest here.

15 class Hash
16   def to_bhash
17     BHash::new.update self
18   end
19 end
21 module Kernel
22   def object a_hash
23     a_hash.to_bhash
24   end
25 end

In the next block of code we have handled the issue I had mentioned above, the small prize to pay for our sub-classing. This prize might even be a blessing in disguise because it frees the hash literal syntax, well for, hash literals.
Thus first I defined a convenient conversion method, that creates a BHash out from a Hash and then I provided a wrapper method in Kernel called object. This changes the look of file of our object definitions a little bit.

27 book = object :image  => behavior{ | myself |
28                 puts %<"#{myself[:title]}" by #{myself[:author]} was published in #{myself[:date]}
29     annotations: #{(myself[:annotations]||[]).join(", ")}>
30     },
31     :add_annotations => behavior{ | myself, *annotations |
32       myself[ :annotations ] ||= []
33       myself[ :annotations ] += annotations
34     },
35     :init => behavior{ | myself, title, others={} |
36       myself[ :title ] = title
37       myself.update others
38     }
41 gray = "The Picture of Dorian Gray", :author => "Oscar Wilde", :date => 1890 )
42 pickaxe  = "Programming Ruby 1.9", :author => "Dave Thomas", :date => 2009 )
43 pickaxe[:add_annotations, "Live Long and Ruby!", "and there was Andy Hunt", "... and Chad Fowler" ]
45 gray[ :image ]
46 pickaxe[ :image ]

And this code is satisfying our conditions above and produces the expected output:

"The Picture of Dorian Gray" by Oscar Wilde was published in 1890
"Programming Ruby 1.9" by Dave Thomas was published in 2009
annotations: Live Long and Ruby!, and there was Andy Hunt, ... and Chad Fowler

 1 Behavior = Class::new Proc
 3 module Kernel
 4   def behavior &blk
 5     Behavior::new( &blk )
 6   end
 7   def object a_hash
 8     a_hash.to_bhash
 9   end
10 end
12 class BHash < Hash
13   def [] name, *args
14     value = super( name )
15     return value unless value && value.is_a?( Behavior )
16 self, *args )
17   end
18   def new *args
19     dup.tap do | o |
20       o[:init, *args] if o.fetch( :init, nil ).is_a? Behavior 
21     end
22   end
23 end
25 class Hash
26   def to_bhash
27     BHash::new.update self
28   end
29 end
30 book = object :image  => behavior{ | myself |
31                 puts %<"#{myself[:title]}" by #{myself[:author]} was published in #{myself[:date]}
32     annotations: #{(myself[:annotations]||[]).join(", ")}>
33     },
34     :add_annotations => behavior{ | myself, *annotations |
35       myself[ :annotations ] ||= []
36       myself[ :annotations ] += annotations
37     },
38     :init => behavior{ | myself, title, others={} |
39       myself[ :title ] = title
40       myself.update others
41     }
44 gray = "The Picture of Dorian Gray", :author => "Oscar Wilde", :date => 1890 )
45 pickaxe  = "Programming Ruby 1.9", :author => "Dave Thomas", :date => 2009 )
46 pickaxe[:add_annotations, "Live Long and Ruby!", "and there was Andy Hunt", "... and Chad Fowler" ]
48 gray[ :image ]
49 pickaxe[ :image ]

Back to TOC
Code Reuse

Well we are all lazy, it's almost a hype ;). Seriously there is one feature we might miss at first sight, that of mixins. On second sight of course and knowing that we use a subclass of Hash as our behavior and state container we see that we can simply update the container with different behavior. A naive, but perfectly working approach would be to do the following:

1 class BHash < Hash
2   def add_behavior behavior
3     abort "Not a behavior #{behavior}" unless 
4       behavior.is_a? self.class  # a tailormade exception would be raised in production code, of course
5     update behavior
6   end
7 end

Given this primitive wrapper around Hash#update we can now extend our prototypes / objects as in the following code:
 1 isbn = object :set_isbn => behavior { |myself, isbn | myself[ :isbn ] = myself[ :add_isdn_check, isbn ] },
 2               :add_isdn_check => behavior { | _, isbn | 
 3                 # begin of boring cksum code
 4                 isbn + "*"
 5                 # end of boring chksum code
 6               }
 8 book.add_behavior isbn
 9 hegel = "Hegel's Philosophy of Reality", :author => "Robert M. Wallace", :date => 2005 )
10 hegel[ :set_isbn, "521844843" ]
11 hegel[ :image ]
12 puts hegel[ :isbn ]


"Hegel's Philosophy of Reality" by Robert M. Wallace was published in 2005

Back to TOC

I have now implemented the basic devices needed for prototype based programming, and I am asking myself, What now?. As a matter of fact I am looking back on the code I have written I am missing some properties badly:

  • Call syntax is clumsy for blocks.

  • Code reuse does not allow access to overridden behavior.

  • Data and behavior are mixed.

  • Data and behavior is public.

Now I am sure, that all of these matters can be fixed, but shall this be done with hashes as the basic device?

I am really not sure and there is just one way to find out, implementing some of it and playing around with it...

And that is what I will be doing now. Thanks for having stepped by.

Wednesday, January 7, 2009

Streams, Lazy Programs for Lazy Programmers.

Streams are nothing new at all. Furthermore they have been introduced at least once into the Ruby World on James' excellent blog, Shades of Gray, with which many of the Ruby community are familiar, at least I hope ;). I can only encourage you to read the whole High Order Ruby series. For those looking for more reading on this topic the following link provides a gold mine; High Order Perl.

Although I had read James' blog entry about two years ago and was fascinated by it, I somehow never had the time to play around with streams. It took a second motivation which I found in a remarkable site: Structure and Interpretation of Computer Programs, Video Lecture. This was an absolute treasure for me. Looking at Abelson's presentation of Streams I really wanted to have that in Ruby.

For those who do not like my blah, blah, and that is probably about everybody save YHS, but ok it is my Blog after all, I have compiled a small TOC so that they do not have to read the dull stuff while I can still write it, yeah!


Streams in Ruby
A verbose introduction why, how, when I did what and why, how and when I did not do other things.

Diving into Streams
A step by step introduction into streams and their implementation in Ruby.

Ruby Meets Streams
Enumerable Ruby Objects can easily be seen as streams too.

Streams' Applications
And yes of course they do serve.

Back to TOC
Streams in Ruby

A stream is a lazy, potentially endless data structure. In Scheme a stream is constructed with the cons-stream special form. As an example Abelson constructs the stream of all integers as follows:
1 (define (integers-from start)
2 (cons-stream start (integers-from (1+ start) ) )
3 )
4 (define (integers)
5 (integers-from 0)
6 )

As such the cons-stream form constructs a stream by creating a Lisp cons cell with a car that consists of an already computed value, called head and a cdr that contains a promise to compute a new head and a new promise.

A practical side note: This was the most difficult thing to remember for me. The concept seems simple enough, but whenever I had a problem in my code I had forgotten that the promise has to return a new stream and not just a new head. For me the following two invariants on streams were the absolute cornerstone for understanding them:
  • While constructing a stream the tail is a promise and thus not evaluated.
  • The tail of a stream will always yield a stream again, potentially the empty stream.
Once these two rules sank into my brain it was amazing how easy stream manipulation can become.

Given the first of the two invariants above it is quite clear how to specify a
promise in Ruby, with code blocks.
We can therefore allready imagine how a cons_stream method will look like in Ruby.
1 def cons_stream head, &tail
2 # . . .
3 end
The result of the cons_stream call of course is an object implementing Streams in Ruby. This can easily be verified on the command line:
ruby -rlab419/functional/streams -e 'include Lab419::SimpleStream;p cons_stream( 42 ){ EmptyStream }'
#<Lab419::SimpleStream::Stream:0xb7d85664 @tail=#<Proc:0xb7d8572c@-e:1>, @head=42>

We can very well see that the promise is stored as a proc object and the head as an actual value.

Here however I will take a different path then James took, and I encourage you (again) to read his blog if you prefer the more idiomatic Ruby way to look at streams. I however really wanted
to emphasize on the abstraction streams constitute. And although it is wonderful how easily this can be done in Ruby, I believe that the stream paradigm itself merits to be looked at
at a more abstract level.
This said, my implementation gives you the best of the two worlds of course. The functional layer being only a wrapper around the Ruby Object layer you can use both at will.

Back to TOC
Diving into Streams

Let us have some fun now and play with streams:
1 include Lab419::SimpleStream
2 constant = cons_stream( 42 ){ constant }
3 p constant.head        # --> 42
4 p constant.tail.head   # --> 42
5 p constant.tail        # --> #<Lab419::SimpleStream::Stream:0xb7d68424
# @tail=#<Proc:0xb7d6867c@diving-into-streams-01.rb:2>, @head=42>

It is quite remarkable, that we can use constant inside the promise ( which is implemented as the code block, remember? ) without any problems [line #2]. No surprises in lines 3 & 4. And if we remember the second invariant line #5 is not going to surprise us either. Furthermore we can already see some of the internal representation of streams. As mentioned before it is a plain vanilla Ruby class, that exposes an Object Oriented interface to streams as well.

The aforementioned recursiveness of streams can be seen in the next example again.
1 include Lab419::SimpleStream
2 def integers from=0
3   cons_stream( from ){ integers( from + 1 ) }
4 end
6 i = integers
7 p i.head             # --> 0
8 p i.tail.tail.head   # --> 2

As soon as we have understood the lazy nature of the promise the recursive call to integers does not trouble us any more. You can compute the following without any problems:
1 42_420.times do
2   i = i.tail
3 end
4 p i.head

One of the big advantages of using streams is that we can exploit these cheap recursions of streams to formulate elegant recursive procedures without the danger of stack overflows and even those who could not be tail call optimized.

As an example let us consider the Fibonacci numbers as a stream. Then a natural definition of such a stream, let us call it fibo would be:

fibo is a stream for which the head is 0.
fibo is also a stream for which the head of its tail is 1.
fibo is eventually a stream for which the head of the tail of the tail is the
sum of the head and the head of the tail.

We will try now to translate this into Ruby, and we will see that this can be done in an analogous way, without any effort:
 1 include Lab419::SimpleStream
 3 def add_streams lhs, rhs
 4   cons_stream( lhs.head + rhs.head ){
 5     add_streams( lhs.tail, rhs.tail )
 6   }
 7 end
 9 def fibo
10   cons_stream( 0 ) {
11     cons_stream( 1 ){
12       add_streams( fibo, fibo.tail )
13     }
14   }
15 end
17 p fibo[0..6]

Running this program delivers

512/31 > ruby -rlab419/functional/streams diving-into-streams-03.rb
[0, 1, 1, 2, 3, 5, 8]

Please note, that this implementation of add_streams does not consider the possibility of finite streams, we will come to that later.
Another important point is that we can consider streams as Enumerables very easily too.

Actually this is about all, yes the power of streams has much to do with their simplicity. Everything else I will talk about now will not introduce anything new. It will however show some of the conveniences built into the implementation, and show us how to handle Finite Streams. It is somehow funny that they are the special case while infinite streams are the normal case.

Back to TOC
Ruby Meets Streams

Finite Streams
Finite Streams are just a special case of Infinite Streams. A mathematical definition of finite streams might look like the following:

A finite is a stream for which exists a promise delivering an empty stream.
An empty stream is defined as a stream for which the tail delivers an empty stream
and for which the behavior of head remains undefined.

Following this definition and having already an EmptyStream at our disposal it is straightforward to define a finite stream from a given number of objects:
1 def finite_stream *elements
2   return EmptyStream if elements.empty?
3   cons_stream( elements.shift ) { finite_stream( *elements  ) }
4 end

But as infinity is such an important concept for streams, we need a way to make finite streams infinite, the most logical approach is to cycle through them. Such a task is of course almost frivolously easy:
1 include Lab419::SimpleStream
3 def cyclic_stream elements, current=0
4   cons_stream( elements[current] ) { cyclic_stream( elements, current.succ % elements.size ) }
5 end
7 primes = cyclic_stream [ 2, 3, 5, 7 ]
9 p primes[0..9]

Ruby Objects as Streams
The implementation of streams in Labrador (branch lab419-functional-0.1 ) is quite straightforward and provides a functional interface as implemented in module Lab419::SimpleStream or an object oriented interface.
Please note that you can use all enumerables as streams by means of the #to_stream method, while String implements a #to_byte_stream method too.

Back to TOC
Streams' Applications

Without this section of this Blog entry one might be thinking that streams are a neat concept but that would be about all.

Abelson shows a very elegant solution of the N Queens problem by using streams. I will settle here for something less spectacular.

Streams as a Lexer
The name stream somehow induces a typical usage of streams. IOStreams can very
well be implemented as streams. The following example implements a very simple lexer that works as a stream.

 2 class String
 3   def next_token *tokens
 4     tokens.each do | token |
 5       if /^#{token}/ === self then
 6         return [ $1 || $& , self.sub( /^#{token}/, "" ) ]
 7       end
 8     end
 9   end
10   [ nil, nil ]
11 end
13 def lexer_stream io, line="", *tokens
14   line = io.readline.chomp if line.empty?
15   token, line = line.next_token( *tokens )
16   cons_stream( token ){ lexer_stream( io, line, *tokens ) }
17 rescue EOFError
18   EmptyStream
19 end
21 lexer = lexer_stream( $stdin, "", /\s*(\w+)/, /\s*(\d+)/, /\s*(.)/ )
23 until lexer.empty? do
24   puts lexer.head
25   lexer = lexer.tail
26 end

First we define a get_token helper in class String. This does not modifie its receiver but returns the parsed string and the rest of the line. It is really just a toy to test the stream part of the example.

Things become a little bit more interesting at line #13. A lexer_stream is constructed by getting the first token from the input (io) which is line buffered by means of line and defining the tail as a lexer_stream with the same io, the same tokens and the new line buffer. Simple?

And it works like charm.

In parsing it is sometimes convenient to implement a certain lookahead. Imagine how elegantly a parser can be written if it has the possibility to push tokens back on the input stream. With a stream based lexer this is a one-liner.

lexer = cons_stream( pushbacktoken ){ lexer }

So we will have
 2 class String
 3   def next_token *tokens
 4     tokens.each do | token |
 5       if /^#{token}/ === self then
 6         return [ $1 || $& , self.sub( /^#{token}/, "" ) ]
 7       end
 8     end
 9   end
10   [ nil, nil ]
11 end
13 def lexer_stream io, line="", *tokens
14   line = io.readline.chomp if line.empty?
15   token, line = line.next_token( *tokens )
16   cons_stream( token ){ lexer_stream( io, line, *tokens ) }
17 rescue EOFError
18   EmptyStream
19 end
21 def push_back a_stream, a_token
22   cons_stream( a_token ){ a_stream }
23 end
25 lexer = lexer_stream( $stdin, "", /\s*(\w+)/, /\s*(\d+)/, /\s*(.)/ )
27 until lexer.empty? do
28   if lexer.head == "break" then
29     lexer = push_back( "break", lexer )
30     puts "returning #{lexer.inspect}"
31     break
32   end
34   puts lexer.head
35   lexer = lexer.tail
36 end

That is about all, please check out the project at Rubyforge and thanx for your attention.

thanks to Brian Candler

Brian pointed out correctly that Ruby Enumerators are lazy too in Ruby1.9, and almost lazy in 1.8. He made a nice contribution to Facets where he made them lazy in Ruby1.8 too.
Please check it out here on Rubyforge.

I knew actually that Enumerables and Enumerators could be lazy, but I ignored that Ruby1.9 had a new syntax which really encourages this kind of constructs.

1 x = Enumerator::new do | enator | 
2   loop do
3     enator.yield rand
4   end
5 end
7 p
8 p
9 p x.take( 10 )

As a matter of fact Generators, Fibers and Enumerators in 1.9 are very, very promising.