Original                    Result
SCP-001                     scp-001
User Curated Lists          user-curated-lists
Kate McTiriss's Proposal    kate-mctiriss-s-proposal

However, Wikidot URLs can be more complicated. For instance, a URL may name any number of categories that the page is in, in a specific order, separated by colons:

Original                    Result
FRAGMENT:some page (1)      fragment:some-page-1
deleted:Spc 1059            deleted:spc-1059
:fragment::page:            fragment:page

Multiple colons are merged into one, and any trailing or leading colons are stripped.

The same applies to dashes: runs of dashes are merged into one, and any leading or trailing dashes are stripped. Because spaces and other extraneous characters are converted to dashes first, such characters are effectively removed when they appear at the start or end of a name. The stripping also occurs at category boundaries:

Original                      Result
some--page                    some-page
-spaghetti                    spaghetti
(TOP SECRET) Special File!    top-secret-special-file
fragment: !Page               fragment:page
-category-:-page-             category:page
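The merging and stripping rules for colons and dashes can be sketched as follows. This is a minimal illustration, not Wikidot's actual code: the function name collapse_separators is hypothetical, and the input is assumed to be already lowercased with other characters already converted to dashes. Dropping a section that ends up empty is also an assumption.

```python
import re


def collapse_separators(name: str) -> str:
    """Merge repeated dashes and colons, and strip them at section edges.

    Hypothetical helper: assumes the name is already lowercased and that
    spaces and punctuation have already been turned into dashes.
    """
    cleaned = []
    # Splitting on ":" yields empty sections for leading, trailing,
    # or doubled colons; those are dropped below.
    for section in name.split(":"):
        # Merge runs of dashes, then strip dashes at the section boundaries.
        section = re.sub(r"-+", "-", section).strip("-")
        if section:
            cleaned.append(section)
    return ":".join(cleaned)


print(collapse_separators("-category-:-page-"))
```

Splitting on colons first keeps the category-boundary rule simple, since each section can then be cleaned the same way as a plain page name.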

Underscores

Unlike dashes, underscores receive special treatment. In general they are handled like any other non-normal character and converted into dashes. However, a single underscore is permitted at the start of any section of a name. This allows special pages like _template or _404 to exist, even within categories.
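A small sketch of the underscore rule, under stated assumptions: the function name normalize_underscores is hypothetical, sections are taken to be the colon-separated parts of the name, and the handling of repeated leading underscores (only the first is kept) is a guess rather than documented behavior.

```python
import re


def normalize_underscores(name: str) -> str:
    """Keep one leading underscore per section; turn the rest into dashes.

    Hypothetical helper illustrating the rule described above.
    """
    sections = []
    for section in name.split(":"):
        # A single underscore is allowed at the start of a section.
        prefix = "_" if section.startswith("_") else ""
        rest = section[len(prefix):]
        # Every other underscore is treated like a non-normal character.
        rest = rest.replace("_", "-")
        # Tidy up the dashes this may introduce.
        rest = re.sub(r"-+", "-", rest).strip("-")
        sections.append(prefix + rest)
    return ":".join(sections)


print(normalize_underscores("fragment:_404"))
print(normalize_underscores("some_page"))
```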

Character Transformations

In addition to the transformations above, Wikidot also converts several Latin Unicode characters to simplified ASCII equivalents, removing diacritics and other modifiers. For instance, ě becomes e and À becomes A. Some characters expand to two letters instead: Ö becomes Oe and Ü becomes Ue.

This step also converts punctuation such as spaces, commas, and slashes to dashes. This is unusual, given that a later conversion step achieves the same result.
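The diacritic handling can be approximated with Python's standard unicodedata module. This is only a sketch: it uses Unicode decomposition as a stand-in for whatever mapping Wikidot applies, the function name to_ascii is hypothetical, and the SPECIAL_CASES table contains only the two expansions named above.

```python
import unicodedata

# Only the expansions mentioned in the text; the real mapping may hold more.
SPECIAL_CASES = {"Ö": "Oe", "Ü": "Ue"}


def to_ascii(text: str) -> str:
    """Approximate the character transformation step (hypothetical helper)."""
    # Compose first so that "O" + combining diaeresis matches the table too.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch in SPECIAL_CASES:
            out.append(SPECIAL_CASES[ch])
            continue
        # Decompose the character and drop the combining marks (diacritics).
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


print(to_ascii("Àě"))
print(to_ascii("Über"))
```

Characters without a decomposition pass through unchanged here; in the full process they would be caught by the later conversion of non-normal characters to dashes.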

Full Procedure

All of the steps performed by the normalization process are as follows:

  • Trim leading and trailing whitespace

  • Transform characters to their ASCII equivalents (see above)

  • Lowercase all ASCII characters

  • Convert all non-normal characters (anything other than alphanumerics, dashes, underscores, and colons) to dashes

  • Remove all leading and trailing dashes

  • Merge multiple dashes into a single dash

  • Merge multiple colons into a single colon

  • Remove all leading and trailing dashes next to colons (e.g. fragment:-test → fragment:test)

  • Remove all leading and trailing dashes next to underscores (e.g. _-template- → _template)

  • Remove all leading and trailing colons
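
The steps above can be sketched in order as a single function. This is a hedged illustration rather than Wikidot's implementation: the name normalize is hypothetical, the ASCII conversion is approximated with Unicode decomposition (the two-letter expansions such as Ö to Oe are omitted), and interior underscores are left alone because the list does not include a step for them.

```python
import re
import unicodedata


def _strip_diacritics(text: str) -> str:
    # Approximation of the ASCII conversion: decompose, drop combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def normalize(name: str) -> str:
    """Apply the listed steps in order (hypothetical sketch)."""
    name = name.strip()                          # trim whitespace
    name = _strip_diacritics(name)               # ASCII equivalents (approximate)
    name = name.lower()                          # lowercase
    name = re.sub(r"[^a-z0-9_:-]", "-", name)    # non-normal characters to dashes
    name = name.strip("-")                       # leading/trailing dashes
    name = re.sub(r"-{2,}", "-", name)           # merge dashes
    name = re.sub(r":{2,}", ":", name)           # merge colons
    name = re.sub(r"-*:-*", ":", name)           # dashes next to colons
    name = re.sub(r"-*_-*", "_", name)           # dashes next to underscores
    name = name.strip(":")                       # leading/trailing colons
    return name


for original in ["SCP-001", "FRAGMENT:some page (1)", "-category-:-page-"]:
    print(original, "->", normalize(original))
```

Run against the examples in the tables above, this sketch reproduces the listed results, which is a useful sanity check on the ordering of the steps.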