html2text function in Ruby

While adding a text-email feature to RssFwd (a Ruby on Rails app), I had trouble finding an html2text Ruby library that did what I needed: simple, clean text output. So I wrote my own.

So, in case somebody else is looking for this little function (small, but irritating to be missing), here's my shot at producing output similar to lynx -dump:

The code:

require 'cgi'

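def html2text(html)
  # Collapse &nbsp; entities and whitespace runs into single spaces, then
  # replace links whose text is identical to their URL with the bare URL.
  text = html.
    gsub(/(&nbsp;|\n|\s)+/im, ' ').squeeze(' ').strip.
    gsub(/<([^\s]+)[^>]*(src|href)=\s*(.?)([^>\s]*)\3[^>]*>\4<\/\1>/i, '\4')

  # Replace every remaining tag that carries a src/href with a numbered
  # marker and remember its URL for the list at the bottom.
  links = []
  linkregex = /<[^>]*(src|href)=\s*(.?)([^>\s]*)\2[^>]*>\s*/i
  while linkregex.match(text)
    links << $~[3]
    text.sub!(linkregex, "[#{links.size}]")
  end

  # Drop scripts, stylesheets and comments (non-greedy, so text between two
  # such blocks survives), turn block-level tags into line breaks, strip all
  # other tags, and finally decode the basic HTML entities.
  text = CGI.unescapeHTML(
    text.
      gsub(/<(script|style)[^>]*>.*?<\/\1>/im, '').
      gsub(/<!--.*?-->/m, '').
      gsub(/<hr(| [^>]*)>/i, "___\n").
      gsub(/<li(| [^>]*)>/i, "\n* ").
      gsub(/<blockquote(| [^>]*)>/i, '> ').
      gsub(/<(br)(| [^>]*)>/i, "\n").
      gsub(/<(\/h[\d]+|p)(| [^>]*)>/i, "\n\n").
      gsub(/<[^>]*>/, '')
  ).lstrip.gsub(/\n[ ]+/, "\n") + "\n"

  # Append the numbered URL list.
  links.each_with_index do |link, i|
    text << "\n [#{i + 1}] <#{CGI.unescapeHTML(link)}>" unless link.nil?
  end
  text
end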

Sample HTML input string:

<h1>Title</h1>
This is the body. Testing <a href="http://www.google.com/">link to Google</a>.<p />
Testing image <img src="/noimage.png">.<br />
The End.

The generated output string:

Title

This is the body. Testing [1]link to Google.

Testing image [2].
The End.

[1] <http://www.google.com/>
[2] </noimage.png>
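
To see where the [1] and [2] markers come from, the link-numbering loop can be run on its own. This is only a sketch: number_links and LINK_TAG are names made up for this illustration, and the regex is copied from the function above.

```ruby
# Same pattern as linkregex in html2text: any tag with a src or href
# attribute; capture group 3 holds the URL without its quotes.
LINK_TAG = /<[^>]*(src|href)=\s*(.?)([^>\s]*)\2[^>]*>\s*/i

# Hypothetical helper: returns the text with each link-bearing tag replaced
# by a numbered marker, plus the collected URLs in order of appearance.
def number_links(html)
  text = html.dup
  links = []
  while (m = LINK_TAG.match(text))
    links << m[3]
    text.sub!(LINK_TAG, "[#{links.size}]")
  end
  [text, links]
end

text, links = number_links('See <a href="http://example.com/">here</a>.')
puts text
puts links.inspect
```

The closing </a> is still in the text at this point; in html2text it is removed later by the generic tag-stripping gsub.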

Comments are welcome.

Note:

The <> around each URL in the list of links at the bottom? They supposedly help more archaic e-mail programs understand that the URL should not be broken up into multiple lines.

The HTML entities aren't fully converted by CGI.unescapeHTML (e.g. &mdash;). If there's a better method to use, lemme know.
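
One possible workaround, as a sketch rather than a complete answer: translate a handful of named entities from a lookup table before handing the text to CGI.unescapeHTML. EXTRA_ENTITIES and decode_entities are hypothetical names, and the table is deliberately tiny; a full solution would need the complete HTML entity list.

```ruby
require 'cgi'

# Hypothetical, deliberately small table of named entities that
# CGI.unescapeHTML leaves untouched. Extend as needed.
EXTRA_ENTITIES = {
  'nbsp'   => ' ',       # plain space is good enough for text e-mail
  'mdash'  => "\u2014",
  'ndash'  => "\u2013",
  'hellip' => "\u2026",
  'copy'   => "\u00A9"
}.freeze

# Replace the named entities from the table in a single pass, then let
# CGI.unescapeHTML handle &amp;, &lt;, &gt;, &quot; and numeric references.
# Names that are not in the table are left exactly as they were.
def decode_entities(text)
  named = text.gsub(/&([a-zA-Z]+);/) { |match| EXTRA_ENTITIES.fetch($1, match) }
  CGI.unescapeHTML(named)
end

puts decode_entities('Tom &amp; Jerry &mdash; the end')
```

The table pass runs first on purpose: an escaped entity such as &amp;mdash; then still comes out as the literal text &mdash; instead of being decoded twice.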