include/HTMLPurifier/standalone/HTMLPurifier/Lexer/PEARSax3.php

\HTMLPurifier_Lexer_PEARSax3

Package: SugarCRM

Proof-of-concept lexer that uses the PEAR package XML_HTMLSax3 to parse HTML.

PEAR, not surprisingly, also has a SAX parser for HTML. I don't know very much about its implementation, but it's fairly well written. However, that abstraction comes at a price: performance. You need to have the package installed, and if its API changes, it might break our adapter. I'm not sure whether or not it's UTF-8 aware, and it has some entity parsing trouble (in all areas, text and attributes).

Personally, I don't recommend using the PEAR class, and the defaults don't use it. The unit tests also run against the SAX parser, but how it handles poorly formed HTML is up to it.

Parent(s)
\HTMLPurifier_Lexer
Todo
Generalize so that XML_HTMLSax is also supported.  
Warning
Entity-resolution inside attributes is broken.  

Properties

Property protected $_special_entity2str = array( '&quot;' => '"', '&amp;' => '&', '&lt;' => '<', '&gt;' => '>', '&#39;' => "'", '&#039;' => "'", '&#x27;' => "'" )
inherited

Most common entity to raw value conversion table for special entities.

Inherited from: \HTMLPurifier_Lexer::$_special_entity2str
Default value: array( '&quot;' => '"', '&amp;' => '&', '&lt;' => '<', '&gt;' => '>', '&#39;' => "'", '&#039;' => "'", '&#x27;' => "'" )
Details
Type
n/a
Inherited_from
\HTMLPurifier_Lexer::$_special_entity2str
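
As a rough illustration of how a table like this can be applied, the sketch below copies the documented default value into a local variable and passes it to PHP's strtr(). The helper name decodeSpecialEntities is hypothetical; how HTML Purifier itself consumes the table is not shown on this page.

```php
<?php
// Copy of the documented default value of $_special_entity2str.
$specialEntity2Str = array(
    '&quot;' => '"',
    '&amp;'  => '&',
    '&lt;'   => '<',
    '&gt;'   => '>',
    '&#39;'  => "'",
    '&#039;' => "'",
    '&#x27;' => "'",
);

// Hypothetical helper (not part of HTML Purifier): replaces each special
// entity with its raw character in a single pass.
function decodeSpecialEntities($text, array $table)
{
    // strtr() with an array never re-scans its own output,
    // so '&amp;lt;' becomes '&lt;' rather than '<'.
    return strtr($text, $table);
}

echo decodeSpecialEntities('a &lt; b &amp;&amp; c &#039;d&#039;', $specialEntity2Str), "\n";
```

A single-pass replacement is used here so that doubly escaped input is only decoded one level.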
Property protected $last_token_was_empty = ''
Details
Type
n/a
Property private $parent_handler = ''
Details
Type
n/a
Property private $stack = array()
Default value: array()
Details
Type
n/a
Property protected $tokens = array()

Internal accumulator array for SAX parsers.

Default value: array()
Details
Type
n/a
Property public $tracksLineNumbers = false
inherited

Whether or not this lexer implements line-number/column-number tracking.

Inherited from: \HTMLPurifier_Lexer::$tracksLineNumbers

If it does, set to true.

Default value: false
Details
Type
n/a
Inherited_from
\HTMLPurifier_Lexer::$tracksLineNumbers

Methods

method protected CDATACallback( $matches ) : void
static inherited

Callback function for escapeCDATA() that does the work.

Inherited from: \HTMLPurifier_Lexer::CDATACallback()
Parameters
Name Type Description
$matches
Details
Params
$matches PCRE matches array, with index 0 the entire match and 1 the inside of the CDATA section.  
Returns
Escaped internals of the CDATA section.  
Warning
Though this is public in order to let the callback happen, calling it directly is not recommended.  
method public __construct( ) : void
inherited

Inherited from: \HTMLPurifier_Lexer::__construct()
method public closeHandler( $parser, $name ) : void

Close tag event handler, interface is defined by PEAR package.

Parameters
Name Type Description
$parser
$name
method public create( $config )
static inherited

Retrieves or sets the default Lexer as a Prototype Factory.

Inherited from: \HTMLPurifier_Lexer::create()

By default HTMLPurifier_Lexer_DOMLex will be returned. There are a few exceptions involving special features that only DirectLex implements.

Parameters
Name Type Description
$config

Instance of HTMLPurifier_Config

Returns
Concrete lexer.
Details
Note
The behavior of this class has changed: rather than accepting a prototype object, it now accepts a configuration object. To specify your own prototype, set %Core.LexerImpl to it. This change in behavior de-singletonizes the lexer object.
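
The note above can be sketched roughly as follows. This assumes the HTML Purifier library has already been loaded; the wrapper function createLexerSketch is hypothetical and returns null when the library is absent, so the snippet still runs on its own.

```php
<?php
// Hypothetical wrapper: returns a lexer chosen through the configuration
// object, or null when HTML Purifier has not been loaded.
function createLexerSketch($impl = null)
{
    if (!class_exists('HTMLPurifier_Config') || !class_exists('HTMLPurifier_Lexer')) {
        return null;
    }
    $config = HTMLPurifier_Config::createDefault();
    if ($impl !== null) {
        // For example the string 'DirectLex', or a lexer instance to use
        // as the prototype.
        $config->set('Core.LexerImpl', $impl);
    }
    // The factory takes the configuration object, not a prototype.
    return HTMLPurifier_Lexer::create($config);
}

$lexer = createLexerSketch();
echo $lexer === null ? "HTML Purifier not loaded\n" : get_class($lexer) . "\n";
```

Leaving %Core.LexerImpl unset lets the factory pick its default as described above.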
method public dataHandler( $parser, $data ) : void

Data event handler, interface is defined by PEAR package.

Parameters
Name Type Description
$parser
$data
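
To give a feel for how SAX-style handlers such as closeHandler() and dataHandler() can feed an accumulator like $tokens, here is a minimal sketch. The class TokenCollectorSketch, its array-based token format, and the three-argument openHandler() are all assumptions for illustration; the events are fired by hand rather than by XML_HTMLSax3, and the real lexer is not shown on this page.

```php
<?php
// Hypothetical collector, not the real HTMLPurifier_Lexer_PEARSax3.
class TokenCollectorSketch
{
    // Accumulator filled as events arrive, in the spirit of $tokens.
    public $tokens = array();

    // Open tag event (assumed signature).
    public function openHandler($parser, $name, $attrs)
    {
        $this->tokens[] = array('start', $name, $attrs);
    }

    // Close tag event.
    public function closeHandler($parser, $name)
    {
        $this->tokens[] = array('end', $name);
    }

    // Character data event.
    public function dataHandler($parser, $data)
    {
        $this->tokens[] = array('text', $data);
    }
}

// Simulate the events a SAX parser might emit for: <b>Hi</b>
$collector = new TokenCollectorSketch();
$collector->openHandler(null, 'b', array());
$collector->dataHandler(null, 'Hi');
$collector->closeHandler(null, 'b');

echo count($collector->tokens), "\n";
```

Because the parser drives the handlers, the collector only needs to append to its array; ordering falls out of the event sequence.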
method protected escapeCDATA( $string ) : void
static inherited

Translates CDATA sections into regular sections (through escaping).

Inherited from: \HTMLPurifier_Lexer::escapeCDATA()
Parameters
Name Type Description
$string

HTML string to process.

Details
Returns
HTML with CDATA sections escaped.  
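
One plausible way to translate CDATA sections into escaped text is a regular expression plus a callback, sketched below. The function names escapeCdataSketch and cdataCallbackSketch are hypothetical stand-ins, and the pattern is an assumption; the callback follows the documented contract (index 0 is the entire match, index 1 the inside of the CDATA section).

```php
<?php
// Hypothetical stand-in for the documented callback: receives the PCRE
// matches array and returns the escaped internals of the CDATA section.
function cdataCallbackSketch(array $matches)
{
    return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
}

// Hypothetical stand-in for escapeCDATA(): rewrites every non-empty
// CDATA section as ordinary escaped text.
function escapeCdataSketch($string)
{
    return preg_replace_callback(
        '/<!\[CDATA\[(.+?)\]\]>/s',
        'cdataCallbackSketch',
        $string
    );
}

echo escapeCdataSketch('<p><![CDATA[1 < 2 & "x"]]></p>'), "\n";
```

The lazy quantifier keeps two CDATA sections in the same string from being merged into one match.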
method protected escapeCommentedCDATA( $string ) : void
static inherited

Special CDATA case that is especially convoluted for