include/HTMLPurifier/standalone/HTMLPurifier/Lexer/PEARSax3.php


Classes
- \HTMLPurifier_Lexer_PEARSax3

\HTMLPurifier_Lexer_PEARSax3

Package: SugarCRM

Proof-of-concept lexer that uses the PEAR package XML_HTMLSax3 to parse HTML.

PEAR, not surprisingly, also has a SAX parser for HTML. I don't know very much about its implementation, but it's fairly well written. However, that abstraction comes at a price: performance. You need to have it installed, and if its API changes, it might break our adapter. I'm not sure whether it's UTF-8 aware, but it has some trouble parsing entities (in all areas, text and attributes).

Personally, I don't recommend using the PEAR class, and the defaults don't use it. The unit tests do run against the SAX parser too, but how it handles poorly formed HTML is up to it.

Parent(s): \HTMLPurifier_Lexer
Todo: Generalize so that XML_HTMLSax is also supported.
Warning: Entity-resolution inside attributes is broken.
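The adapter idea described above (a SAX parser fires open/close/data callbacks, and the lexer turns them into tokens) can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the class's actual code: instead of XML_HTMLSax3 it uses a hypothetical `ToySaxDriver` that replays a fixed event list, and tokens are plain arrays rather than HTML Purifier token objects.

```php
<?php
// Hypothetical stand-in for a SAX parser such as XML_HTMLSax3: it replays a
// list of [type, value] events and invokes the matching handler callback.
final class ToySaxDriver
{
    public function __construct(private array $events) {}

    public function parse(object $handler): void
    {
        foreach ($this->events as [$type, $value]) {
            match ($type) {
                'open'  => $handler->openHandler($this, $value, []),
                'close' => $handler->closeHandler($this, $value),
                'data'  => $handler->dataHandler($this, $value),
            };
        }
    }
}

// Hypothetical adapter: each callback appends to an accumulator, playing the
// role of the $tokens and $stack properties listed for this class.
final class TokenCollector
{
    public array $tokens = [];
    public array $stack = [];

    public function openHandler(object $parser, string $name, array $attrs): void
    {
        $this->tokens[] = ['start', $name, $attrs];
        $this->stack[] = $name;
    }

    public function closeHandler(object $parser, string $name): void
    {
        $this->tokens[] = ['end', $name];
        array_pop($this->stack);
    }

    public function dataHandler(object $parser, string $data): void
    {
        $this->tokens[] = ['text', $data];
    }
}

$collector = new TokenCollector();
(new ToySaxDriver([['open', 'b'], ['data', 'hi'], ['close', 'b']]))->parse($collector);
print_r($collector->tokens);
```

Keeping the parser behind callbacks is what makes the adapter fragile in the way the description warns: if the parser's callback signatures change, the handler methods stop matching.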

Properties


                 $_special_entity2str

inherited

Most common entity to raw value conversion table for special entities.

Inherited from: \HTMLPurifier_Lexer::$_special_entity2str

Default value

array(
                    '&quot;' => '"',
                    '&amp;'  => '&',
                    '&lt;'   => '<',
                    '&gt;'   => '>',
                    '&#39;'  => "'",
                    '&#039;' => "'",
                    '&#x27;' => "'"
            )

Details

Type: n/a
Inherited_from: \HTMLPurifier_Lexer::$_special_entity2str
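As a rough illustration of how a conversion table like this can be applied, PHP's built-in `strtr()` with an array argument performs every replacement in a single pass and never rescans its own output. This is a sketch, not necessarily how `HTMLPurifier_Lexer` consumes the table; the function name `decodeSpecialEntities` is hypothetical.

```php
<?php
// Hypothetical helper: decode only the "special" entities from the table.
function decodeSpecialEntities(string $html): string
{
    static $special = [
        '&quot;' => '"',
        '&amp;'  => '&',
        '&lt;'   => '<',
        '&gt;'   => '>',
        '&#39;'  => "'",
        '&#039;' => "'",
        '&#x27;' => "'",
    ];
    // strtr() does not rescan replaced text, so "&amp;lt;" becomes "&lt;"
    // rather than "<" (no double decoding).
    return strtr($html, $special);
}

echo decodeSpecialEntities('a &lt; b &amp;&amp; c'), "\n"; // a < b && c
echo decodeSpecialEntities('&amp;lt;'), "\n";              // &lt;
```

Entities outside the table, such as `&copy;`, are left untouched by this sketch.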


                 $last_token_was_empty= ''

Details

Type: n/a


                 $parent_handler= ''

Details

Type: n/a


                 $stack= 'array()'

Default value

array()

Details

Type: n/a


                 $tokens= 'array()'

Internal accumulator array for SAX parsers.

Default value

array()

Details

Type: n/a


                 $tracksLineNumbers= 'false'

inherited

Whether or not this lexer implements line-number/column-number tracking.

Inherited from: \HTMLPurifier_Lexer::$tracksLineNumbers

If it does, set to true.

Default value

false

Details

Type: n/a
Inherited_from: \HTMLPurifier_Lexer::$tracksLineNumbers

Methods

CDATACallback($matches) : string

static inherited

Callback function for escapeCDATA() that does the work.

Inherited from: \HTMLPurifier_Lexer::CDATACallback()

Parameters

Name	Type	Description
$matches

Details

Params: $matches PCRE matches array, with index 0 the entire match and 1 the inside of the CDATA section.
Returns: Escaped internals of the CDATA section.
Warning: Though this is public in order to let the callback happen, calling it directly is not recommended.
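A sketch of what a callback of this shape might do, assuming it HTML-escapes the captured inner text with `htmlspecialchars()`. The regular expression and the name `cdataCallbackSketch` are assumptions for illustration, not the library's code; the point is the layout of `$matches`.

```php
<?php
// Hypothetical callback in the style of CDATACallback(): $matches[0] is the
// whole "<![CDATA[...]]>" section and $matches[1] is its inner text.
function cdataCallbackSketch(array $matches): string
{
    // Assumption: escaping the inner text turns it into ordinary character data.
    return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
}

preg_match('/<!\[CDATA\[(.+?)\]\]>/s', 'x<![CDATA[a < b]]>y', $m);
echo $m[0], "\n";                   // <![CDATA[a < b]]>
echo cdataCallbackSketch($m), "\n"; // a &lt; b
```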

__construct() : void

inherited

Inherited from: \HTMLPurifier_Lexer::__construct()

closeHandler($parser, $name) : void

Close tag event handler, interface is defined by PEAR package.

Parameters

Name	Type	Description
$parser
$name
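The class also carries a `$last_token_was_empty` property. One plausible use, sketched below purely as an assumption, is handling a SAX parser that reports a self-closing tag such as `<br />` as an open event followed by an extra close event: the close handler swallows that extra close instead of emitting a stray end token. The class name and token shapes are hypothetical.

```php
<?php
// Hypothetical handler: a "last token was empty" flag suppresses the extra
// close event assumed to follow a self-closing tag.
final class EmptyAwareHandler
{
    public array $tokens = [];
    private bool $lastTokenWasEmpty = false;

    public function openHandler(object $parser, string $name, array $attrs, bool $closed): void
    {
        $this->tokens[] = [$closed ? 'empty' : 'start', $name];
        $this->lastTokenWasEmpty = $closed;
    }

    public function closeHandler(object $parser, string $name): void
    {
        if ($this->lastTokenWasEmpty) {
            // Swallow the extra close that follows a self-closing tag.
            $this->lastTokenWasEmpty = false;
            return;
        }
        $this->tokens[] = ['end', $name];
    }

    public function dataHandler(object $parser, string $data): void
    {
        $this->lastTokenWasEmpty = false;
        $this->tokens[] = ['text', $data];
    }
}

// Simulated event sequence for "<p>a<br />b</p>".
$h = new EmptyAwareHandler();
$p = new stdClass(); // stands in for the parser object
$h->openHandler($p, 'p', [], false);
$h->dataHandler($p, 'a');
$h->openHandler($p, 'br', [], true);
$h->closeHandler($p, 'br'); // extra close, ignored
$h->dataHandler($p, 'b');
$h->closeHandler($p, 'p');
print_r($h->tokens);
```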

create(HTMLPurifier_Config $config) : HTMLPurifier_Lexer

static inherited

Retrieves or sets the default Lexer as a Prototype Factory.

Inherited from: \HTMLPurifier_Lexer::create()

By default HTMLPurifier_Lexer_DOMLex will be returned. There are a few exceptions involving special features that only DirectLex implements.

Parameters

Name	Type	Description
$config	HTMLPurifier_Config	Instance of HTMLPurifier_Config

Returns

Type	Description
HTMLPurifier_Lexer	Concrete lexer.

Details

Note: The behavior of this method has changed: rather than accepting a prototype object, it now accepts a configuration object. To specify your own prototype, set %Core.LexerImpl to it. This change de-singletonizes the lexer object.
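The prototype-factory behavior described above can be sketched roughly as follows. Everything here is hypothetical: a plain array stands in for `HTMLPurifier_Config`, the key `'LexerImpl'` only loosely mirrors %Core.LexerImpl, and the lexer classes are stubs. A configured object wins and is cloned (the essence of a prototype factory); otherwise a default implementation is constructed.

```php
<?php
interface LexerSketch
{
    public function name(): string;
}

final class DefaultLexerStub implements LexerSketch
{
    public function name(): string { return 'default'; }
}

final class CustomLexerStub implements LexerSketch
{
    public function name(): string { return 'custom'; }
}

// Hypothetical factory: clone a configured prototype, else use the default.
function createLexerSketch(array $config): LexerSketch
{
    $impl = $config['LexerImpl'] ?? null;
    if ($impl instanceof LexerSketch) {
        // Cloning gives each caller its own instance (no shared singleton).
        return clone $impl;
    }
    return new DefaultLexerStub();
}

$proto = new CustomLexerStub();
$a = createLexerSketch(['LexerImpl' => $proto]);
echo $a->name(), "\n";                    // custom
echo createLexerSketch([])->name(), "\n"; // default
```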

dataHandler($parser, $data) : void

Data event handler, interface is defined by PEAR package.

Parameters

Name	Type	Description
$parser
$data

escapeCDATA(string $string) : string

static inherited

Translates CDATA sections into regular sections (through escaping).

Inherited from: \HTMLPurifier_Lexer::escapeCDATA()

Parameters

Name	Type	Description
$string	string	HTML string to process.

Details

Returns: HTML with CDATA sections escaped.
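A minimal sketch of this kind of translation, assuming it is done with `preg_replace_callback()` and `htmlspecialchars()`: each CDATA section is replaced by its escaped contents, so a lexer without CDATA support sees ordinary text. The function name `escapeCdataSketch` and the regular expression are assumptions, not the library's code.

```php
<?php
// Hypothetical sketch: replace every CDATA section with its escaped contents.
function escapeCdataSketch(string $html): string
{
    return preg_replace_callback(
        '/<!\[CDATA\[(.+?)\]\]>/s',
        // $m[1] is the inner text of one CDATA section.
        static fn (array $m): string => htmlspecialchars($m[1], ENT_COMPAT, 'UTF-8'),
        $html
    );
}

echo escapeCdataSketch('<p><![CDATA[<b>&</b>]]></p>'), "\n";
// <p>&lt;b&gt;&amp;&lt;/b&gt;</p>
```

The non-greedy `(.+?)` keeps two adjacent sections from being merged into one match; note that this pattern does not match an empty `<![CDATA[]]>` section.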

escapeCommentedCDATA($string) : void

static inherited

Special CDATA case that is especially convoluted for
