Syntax highlighting without breaking HTML

Ruby on Rails provides the very nice helper method highlight, which wraps occurrences of a search term within a string in highlighting markup. We’ve been using it quite a bit on a large project, but recently began to notice that it broke the HTML in certain places.

It turned out that our search term was showing up inside an attribute on a tag. highlight, oblivious to the structure of the text it was processing, was inserting its tags in the middle of attributes, and all kinds of craziness was ensuing that just wouldn’t do.
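To make the failure concrete, here is a minimal sketch in plain Ruby. The method naive_highlight is a hypothetical stand-in for the Rails helper (it is not the Rails implementation), and the link markup is invented for illustration. It shows how a tag-unaware replacement lands inside attribute values as well as in the visible text.

```ruby
# Hypothetical stand-in for the Rails highlight helper (an assumption for
# this sketch): wraps every case-insensitive occurrence of term in a tag,
# with no awareness of HTML structure.
def naive_highlight(text, term)
  text.gsub(/(#{Regexp.escape(term)})/i, '<strong class="highlight">\1</strong>')
end

# Invented sample markup: the term appears in two attributes and in the text.
html = '<a href="/search/results" title="Search results">New search</a>'

# The href and title values now contain <strong> tags, so the markup is broken.
puts naive_highlight(html, 'search')
```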

A little time with Google led me to Enhance Usability by Highlighting Search Terms, an article that appeared on A List Apart a couple of years ago. In it, Matt Riggott and Brian Suda explain how they used regular expressions in PHP to get around precisely this problem. I downloaded the code and set about converting it to Ruby. Ours doesn’t do nearly as much as theirs, which includes tools to extract search terms from various search engines. It only needs to work with our internal search tool, and so it was as simple as:

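 def highlight_in_html(content, term)
   # Capture each tag, then the run of text that follows it.
   regexp = /(<[\/\!]*?[^<>]*?>)([^<]*)/
   # Wrap in a paragraph so text before the first tag is captured too.
   content = '<p>' + content + '</p>'

   matches = content.scan(regexp)
   # Highlight only the text; tags and their attributes pass through untouched.
   matches.collect { |tag, text| [tag, highlight(text, term)] }.flatten.join
 end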

It’s not thoroughly tested as yet, but seems to do the job.
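For anyone who wants to try the idea outside Rails, the following is a rough, self-contained sketch of the same technique. The names mark_term and highlight_outside_tags are hypothetical, mark_term is only a stand-in for the Rails highlight helper, and the sample markup is invented.

```ruby
# Stand-in for the Rails highlight helper (an assumption for this sketch).
def mark_term(text, term)
  return text if term.to_s.empty?
  text.gsub(/(#{Regexp.escape(term)})/i, '<strong class="highlight">\1</strong>')
end

# Split the markup into [tag, following text] pairs and highlight only the
# text part, so attribute values are never touched.
def highlight_outside_tags(content, term)
  wrapped = "<p>#{content}</p>" # guarantees that leading text follows a tag
  pairs = wrapped.scan(/(<[\/\!]*?[^<>]*?>)([^<]*)/)
  pairs.map { |tag, text| tag + mark_term(text, term) }.join
end

html = '<a href="/search" title="search">New search</a>'
puts highlight_outside_tags(html, 'search')
# prints: <p><a href="/search" title="search">New <strong class="highlight">search</strong></a></p>
```

One known limit of this regular-expression approach: a literal > inside an attribute value, or a stray < in the text, will confuse the split. Markup like that would need a real HTML parser.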
