Allow any tags to contain unknown tags (Steven Parkes)
Remove escaped quote (') from matching (#55)
Fix ‘undefined method downcase for nil:NilClass’ on JRuby (#58)
Unescape hex numeric character references
GH #21, #32, #33, #36: Fix for reported segfaults
GH#8: Nil-check before downcasing attribute key
GH#25: Proper ruby 1.9 encoding support
GH#28. Use integers instead of ?? on 1.9, which is just a string.
including noscript to ElementInclusions , so that hpricot wont fail when trying to parse a meta tag inside head section when noscript is present.
latest changes from fast_xs mainline
Fixes to get Hpricot running on Rubinius:
Use free, not XFREE
Remove RSTRUCT craziness, don’t break Array#at
Bring JRuby support up to speed, including Java-based hpricot_css support
Change JRuby fast_xs to have same escaping behavior as C fast_xs
fix for issue #2, downcasing of html attributes inside the parser.
solve issue #3 with bogus etags being preserved in `to_s` rather than just `to_original_html`.
fix error when attempting to reparent cleared node. (issue #5)
Hpricot::Attributes proxy object for using `ele.attributes = v` directly. however, it is preferred to use the jquery-like `elements.attr(k, v)`.
big problems on Ruby 1.8.6, use INT2FIX instead of INT2NUM. hashes were being cast to bignums.
patch for 1.8.5 to define RARRAY_PTR. thanks, mike perham!
inspecting empty document bug, courtesy of @TalLevAmi.
Saving memory and speed by using RStruct-based elements in the C extension.
Bug in tag parsing, causing runaway <script> and <style> tags in HTML.
Problem compiling under Ruby 1.9, due to our_rb_hash_lookup function meant for Ruby 1.8.
CData was missing inner_text method.
Rewritten parser routine, much lighter on memory, quite a bit faster.
Friendlier with Ruby 1.9.
Fixes to nth-child and text() selectors.
Hpricot for JRuby -- nice work Ola Bini!
Inline Markaby for Hpricot documents.
XML tags and attributes are no longer downcased like HTML is.
new syntax for grabbing everything between two elements using a Range in the search method: (doc/(“font”..“font/br”)) or in nodes_at like so: (doc/“font”).nodes_at(“*”..“br”). Only works with either a pair of siblings or a set of a parent and a sibling.
Ignore self-closing endings on tags (such as form) which are containers. Treat them like open parent tags. Reported by Jonathan Nichols on the hpricot list.
Escaping of attributes, yanked from Jim Weirich and Sam Ruby’s work in Builder.
Element#raw_attributes gives unescaped data. Element#attributes gives escaped.
Added: Elements#attr, Elements#remove_attr, Elements#remove_class.
Added: Traverse#preceding, Traverse#following, Traverse#previous, Traverse#next.
support for a[text()=“Click Me!”] and h3 and the like.
Hpricot.buffer_size accessor for increasing Hpricot’s buffer if you’re encountering huge ASP.NET viewstate attribs.
some support for colons in tag names (not full namespace support yet.)
Element.to_original_html will attempt to preserve the original HTML while merging your changes.
Element.to_plain_text converts an element’s contents to a simple text format.
Element.inner_text removes all tags and returns text nodes concatenated into a single string.
no @raw_string variable kept for comments, text, and cdata – as it’s redundant.
xpath-style indices (//p/a[1]) but keep in mind that they aren’t zero-based.
node_position is the index among all sibling nodes, while position is the position among children of identical type.
comment() and text() search criteria, like: //p/text(), which selects all text inside paragraph tags.
every element has css_path and xpath methods which return respective absolute paths.
more flexibility all around: in parsing attributes, tags, comments and cdata.
The :fixup_tags option will try to sort out the hierarchy so elements end up with the right parents.
Elements such as script and style (identified as having CDATA contents) receive a single text node as their children now. Previously, Hpricot was parsing out tags found in scripts.
Better scanning of partially quoted attributes (found by Brent Beardsly on uswebgen.com/)
Better scanning of unquoted attributes – thanks to Aaron Patterson for the test cases!
Some tags were being output in the empty tag style, although browsers hated that. FIXED!
Added Elements#at for finding single elements.
Added Elem::Trav#[] and Elem::Trav#[]= for reading and writing attributes.
Fixed negative string size error on empty tokens. (news.bbc.co.uk)
Allow the parser to accept just text nodes. (such as: Hpricot.parse('TEXT'))
from JQuery to Hpricot::Elements: remove, empty, append, prepend, before, after, wrap, set, html(…), to_html, to_s.
on containers: to_html, replace_child, insert_before, insert_after, innerHTML=.
Hpricot(…) is an alias for parse.
open up all properties to setters, let people do as they may.
use to_html for the full html of a node or set of elements.
doctypes were messed.
Rewrote the HTree parser to be simpler, more adequate for the common man. Will add encoding back in later.
For whatever reason, wrote this HTML parser in C. I guess Ragel is addictive and I want to improve HTree.
Generated with the Darkfish Rdoc Generator 2.