In a lengthy blog post covering many of the intricacies, and some of the politics, of character encodings in Ruby, Yehuda Katz includes a few paragraphs that left me more than a little excited:
The most common scenario where you can see this issue is when the user pastes in content from Microsoft Word, and it makes it into the database and back out again as gibberish.
After a lot of research, I have discovered several hacks that, together, should completely solve this problem. I am still testing the solution, but I believe we should be able to completely solve this problem in Rails. By Rails 3.0 final, Rails application should be able to reliably assume that POSTed form data comes in as UTF-8.
When using Rails 3.0 with Ruby 1.9.2-final, you will generally not have to care about encodings.
If it does indeed work out that way, that’s a whole category of bug reports I’ll finally be able to say goodbye to.
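For anyone who hasn't been bitten by the Word-paste scenario in the quote, here is a minimal sketch of how it tends to go wrong on Ruby 1.9 or later. It assumes (my assumption, not something from Katz's post) that the browser submitted Word's curly quotes as Windows-1252 bytes; the byte values and variable names are purely illustrative, and this is not the fix Rails itself applies.

```ruby
# Hypothetical raw bytes for a curly-quoted "hi" as Windows-1252 would encode it
# (0x93 and 0x94 are the left and right double quotation marks in that code page).
raw = "\x93hi\x94".dup.force_encoding(Encoding::BINARY)

# If the application simply assumes the bytes are UTF-8, the string is invalid,
# which is where the gibberish (or an exception) later comes from.
as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding?   # => false

# Labelling the bytes with the encoding they were actually sent in, and then
# transcoding, recovers the intended characters.
fixed = raw.dup.force_encoding(Encoding::Windows_1252).encode(Encoding::UTF_8)
puts fixed.valid_encoding?     # => true
puts fixed == "\u201Chi\u201D" # => true
```

The catch, of course, is that the server has to know which encoding the bytes arrived in, which is exactly why being able to assume UTF-8 for POSTed form data would be such a relief.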