How are non-alphanumeric characters handled in the XML, e.g., &, %, and the copyright symbol?

For the character issue, I'll walk through the scenarios we've considered and how we handle each:

  1. Items in an ordered list entered through the WYSIWYG editor will come across in HTML:

         <ol>
           <li>Item 1</li>
           <li>Item 2</li>
         </ol>

  2. Items entered with a bullet symbol using the WYSIWYG Symbol Library are initially saved as HTML-named entities by the WYSIWYG editor. We convert the HTML-named entity to an HTML-decimal numbered entity.
  3. Any UTF-8 character (non-alphanumeric) entered via the WYSIWYG editor is saved as an HTML-named entity. We convert all HTML-named entities to their equivalent HTML-decimal numbered entity.
  4. Any content the editor types directly into the raw HTML that is saved as a raw UTF-8 character (example: H&HN vs. H&amp;HN), whether in the teaser and body fields or in fields not controlled by the WYSIWYG editor (like Headline and Sub-headline), is converted to HTML-decimal numbered entities.
  5. If a user enters an HTML-hexadecimal numbered entity, we convert that to an HTML-decimal numbered entity as well, wherever we find it.
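To make the scenarios above concrete, here is a minimal Python sketch of this kind of normalization. It is an illustration only, not our system's actual implementation: the function name `to_decimal_entities` is hypothetical, the lookup uses Python's standard `html.entities.name2codepoint` table (HTML 4 names only), and it assumes that only ampersands and non-ASCII characters need converting while markup such as `<li>` passes through untouched.

```python
import re
from html.entities import name2codepoint

# Matches a hexadecimal, decimal, or named entity, or else a bare ampersand.
_ENTITY_RE = re.compile(
    r"&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));|&"
)


def _entity_to_decimal(match):
    hex_digits, dec_digits, name = match.groups()
    if hex_digits is not None:
        # &#x26; -> &#38;
        return f"&#{int(hex_digits, 16)};"
    if dec_digits is not None:
        # Already decimal; re-emit in canonical form.
        return f"&#{int(dec_digits)};"
    if name is not None:
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Unknown name: treat the ampersand as a literal character.
            return "&#38;" + match.group(0)[1:]
        # &amp; -> &#38;, &copy; -> &#169;
        return f"&#{codepoint};"
    # Bare ampersand, as in H&HN.
    return "&#38;"


def to_decimal_entities(text):
    """Hypothetical helper: rewrite entities and raw characters as decimal entities."""
    text = _ENTITY_RE.sub(_entity_to_decimal, text)
    # Any remaining non-ASCII character becomes its decimal entity.
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


if __name__ == "__main__":
    for sample in ("H&HN", "H&amp;HN", "H&#x26;HN", "\u00a9 &copy; &bull;"):
        print(to_decimal_entities(sample))
```

Because existing decimal entities are re-emitted unchanged, running the function twice on the same text gives the same result as running it once.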

Basically, you can count on our system returning the HTML-decimal numbered entity for non-alphanumeric characters. EXCEPTION: If you have articles that were imported, they may contain the raw ampersand character & rather than the HTML-named entity &amp;. In that case, we do not translate the special character. If you have a specific set of articles that fall into this category, do the following:

  1. Open the article in the admin tool.
  2. Open the relevant field's (Teaser or Body) WYSIWYG editor.
  3. Close the editor and save its updates. The special characters will now be saved as HTML-named entities.
  4. Save the article.
  5. Re-try your export feed.
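If you are not sure which imported articles are affected, a quick scan of the exported text for raw ampersands can narrow the list before you re-save anything. The following is a rough sketch under stated assumptions: `find_bare_ampersands` is a hypothetical helper, and it assumes you have the export (or the article bodies) available as a plain string.

```python
import re

# An ampersand that does NOT start a decimal, hexadecimal, or named entity.
_BARE_AMP_RE = re.compile(
    r"&(?!#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)"
)


def find_bare_ampersands(text):
    """Return (line_number, line) pairs for lines containing a raw '&'."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if _BARE_AMP_RE.search(line)
    ]


if __name__ == "__main__":
    sample = "H&HN\nH&amp;HN\nAT&T &#38; co\nok &#x26;"
    for number, line in find_bare_ampersands(sample):
        print(number, line)
```

Lines flagged by the scan point to articles worth opening and re-saving with the steps above.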

NOTE: For Teasers, depending on your site, you may need to remove the <p> tags from around the teaser's copy to prevent awkward breaks on the front pages of your site. Example: <p>sample teaser &amp; data</p> becomes just sample teaser &amp; data.

| HTML Decimal Numbered Entity | UTF-8 | HTML Named Entity | HTML Hexadecimal Numbered Entity |
|---|---|---|---|
| &#38; | & | &amp; | &#x26; |
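If you strip the teaser's <p> tags on your side when consuming the feed, the clean-up from the note can be sketched as below. This is an assumption-laden example, not part of our export: `strip_wrapping_p` is a hypothetical helper, and it deliberately leaves the teaser alone unless it is exactly one paragraph wrapped in a single pair of <p> tags.

```python
import re

# A single <p ...> ... </p> pair wrapping the whole teaser.
_WRAPPING_P_RE = re.compile(
    r"^\s*<p(?:\s[^>]*)?>(.*)</p>\s*$", re.DOTALL | re.IGNORECASE
)


def strip_wrapping_p(teaser):
    """Remove one wrapping <p>...</p> pair; return other input unchanged."""
    match = _WRAPPING_P_RE.match(teaser)
    # Conservative check: skip teasers whose inner text contains another
    # tag starting with "<p" (for example a second paragraph).
    if match and "<p" not in match.group(1).lower():
        return match.group(1).strip()
    return teaser


if __name__ == "__main__":
    print(strip_wrapping_p("<p>sample teaser &amp; data</p>"))
```

Multi-paragraph teasers are returned as-is so that paragraph breaks inside the copy are not lost.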