User:WindBOT/Documentation

I am a fully automated bot, but that doesn't mean that I am not flexible. You can tweak me from the wiki itself by editing my filters!

Technical details
I am written in Python (2.6). If you want to modify my code, make sure you know enough Python before continuing.

All code must be placed on User:WindBOT/Filters, and indented by two spaces (for the top-level block). Python uses indentation to delimit code blocks, so whitespace matters. Although this is not a necessity, try to use 4 spaces per indentation level.

How to disable a filter
Wait! Before you disable a filter, consider why you want to disable it. Did it produce an expected result but on a page where such a result is not appropriate? In this case, blacklist this page instead of disabling the filter.

If I am malfunctioning, chances are that the problem lies in one of my filters. Thus, instead of completely shutting me down, it would be wiser to disable only the chunk of code that is misbehaving. To make me ignore a certain line, add a "#" in front of it:

  # This line will be ignored

If there are multiple lines, wrap them inside triple-quotes:

  """This line will be ignored
  and this one as well
  and this one is cake
  and the previous one was a lie but it was still ignored"""

If all else fails, you can simply delete the block of code from the page. I can't come up with code by myself yet, so I won't do anything. If the problem really is elsewhere, block the bot.
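As a rough illustration of both techniques, here is a minimal sketch. The names `registered`, `fix_typo` and `shout` are hypothetical stand-ins invented for this example and are not part of my actual code. It only shows that a line starting with "#" and a block wrapped in triple quotes are both skipped when Python runs the code.

```python
# Hypothetical stand-ins for illustration only; not the bot's real API.
registered = []

def fix_typo(text):
    # A tiny example filter: returns a corrected copy of the text.
    return text.replace('teh', 'the')

def shout(text):
    # The "misbehaving" filter we want to switch off.
    return text.upper()

registered.append(fix_typo)

# Disabled with "#": Python treats the next line as a comment.
# registered.append(shout)

# Disabled with triple quotes: the block becomes a plain string that is
# evaluated and discarded, so nothing inside it runs.
"""
registered.append(shout)
registered.append(shout)
"""

page = 'teh cake'
for f in registered:
    page = f(page)
print(page)  # the cake
```

Note that the opening triple quotes must sit at the same indentation as the surrounding code, because Python still parses the string as a statement.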

Filter types
I work using filters. They are simply Python functions which take a certain input as an argument, and are expected to return a modified version of this input (if the filter changed something) or an identical version (if the filter didn't change anything). There are multiple types of filters:
 * Regular filters: These are no-frills, direct filters.
   * When to use: When no other filter type is adequate. This type of filter can be very destructive if the function is not careful enough.
   * Input/Output: Raw Wikitext of a page
   * Implementation details: Your filter is called only once, over the whole content of the page.
   * How to use: To register a regular filter, call the  function:
 * Safe filters: These are Wikitext-safe filters.
   * When to use: Semantics-related filters are a good fit for these. Use them to filter human-readable text.
   * Input/Output: Sanitized Wikitext of a page (readable text content), the labels of external and internal wikilinks (labels only), and the URLs of internal wikilinks only when they are combined with their label ( like this ).
   * Implementation details: Your filter is called once over the textual body of the page, then once per link label.
   * How to use: To register a safe filter, call the  function:
 * Link filters: These filters act on links within wiki pages.
   * When to use: When you want to apply filters on links. Note: Use a safe filter if all you want to do is modify link labels, unless you want to leave the page's body untouched.
   * Input/Output: A single link instance. The class definition is given at the bottom of this document.
   * Implementation details: Your filter is called once per link in the page.
   * How to use: To register a link filter, call the  function:
 * Locale filters: These filters act on localization dictionaries.
   * When to use: When you want to extract only certain parts of translation files.
   * Input/Output: N/A
   * Implementation details: The localization dictionary is a huge dictionary with keys being the string IDs (, etc.), and each value being another dictionary. This inner dictionary has language names as keys ( , ...) and the actual translated string as value.
   * How to use: Call the  function:   where:
     * (Required)  is the localization dictionary.
     * (Optional)  filters strings by their translation availability. For example,   will only keep strings which are available in both   and.
     * (Optional)  filters strings by their string ID, which must contain this string as prefix.
     * (Optional)  filters strings by their string ID, which must contain this string as suffix.
     * (Optional)  is a list of keys that should be excluded no matter what.
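To make the locale filtering criteria above more concrete, here is a minimal sketch of how such a selection could work. The function name `filter_locale`, its parameter names, the string IDs and the language names are all assumptions made up for this example; my real function may differ in name, signature and behavior.

```python
def filter_locale(dictionary, languages=None, prefix=None, suffix=None,
                  exceptions=None):
    """Hypothetical sketch: keep only the entries matching every criterion."""
    kept = {}
    for string_id, translations in dictionary.items():
        # Excluded keys are dropped no matter what.
        if exceptions and string_id in exceptions:
            continue
        # The string ID must start with the prefix, if one is given.
        if prefix and not string_id.startswith(prefix):
            continue
        # The string ID must end with the suffix, if one is given.
        if suffix and not string_id.endswith(suffix):
            continue
        # A translation must exist in every requested language.
        if languages and not all(lang in translations for lang in languages):
            continue
        kept[string_id] = translations
    return kept

# Made-up localization dictionary: string ID -> {language name: translation}
strings = {
    'item_hat_name': {'english': 'Hat', 'german': 'Hut'},
    'item_hat_desc': {'english': 'A hat.'},
    'item_scarf_name': {'english': 'Scarf', 'german': 'Schal'},
    'menu_title_name': {'english': 'Title', 'german': 'Titel'},
}

result = filter_locale(strings,
                       languages=['english', 'german'],
                       prefix='item_',
                       suffix='_name',
                       exceptions=['item_scarf_name'])
print(sorted(result))  # ['item_hat_name']
```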

Filters themselves may be filtered (yeah, really) so that they are only applied to certain articles:
 * Use  (where   is one of the functions described above) to add ,  ,  ... as filters that will only be applied on German pages.
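The idea of restricting filters to certain articles can be sketched as a wrapper around ordinary filter functions. Everything named here is hypothetical: `only_on_language`, `fix_spacing`, and the choice to pass the page title as a second argument are assumptions for illustration, not my actual interface. The sketch assumes German pages have titles ending in "/de".

```python
def fix_spacing(text):
    # An ordinary filter: returns a modified copy, or the same text
    # if there was nothing to change.
    return text.replace('  ', ' ')

def only_on_language(language, *filters):
    """Hypothetical helper: wrap filters so they only act on pages whose
    title ends with '/<language>'."""
    suffix = '/' + language

    def restrict(f):
        def restricted(text, title):
            if title.endswith(suffix):
                return f(text)
            return text  # Other pages are returned untouched.
        return restricted

    return [restrict(f) for f in filters]

german_filters = only_on_language('de', fix_spacing)
print(german_filters[0]('Ein  Hut', 'Hat/de'))  # Ein Hut
print(german_filters[0]('A  hat', 'Hat'))       # A  hat (unchanged)
```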

Filter generators
As previously mentioned, filters are Python functions. However, a lot of filters are similar in function and in purpose. Therefore, declaring a new Python function for each filter would be redundant and cumbersome.

Since functions are first-class values in Python, you can pass around, edit, and create functions programmatically. This is what filter generators do. They take a few arguments describing your desired filter, and generate a corresponding Python function, which you can then add using the method described above.
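A filter generator can be pictured as a function that returns another function. The sketch below is only an illustration of that pattern; `make_replace_filter` is a hypothetical name and not one of my real generators.

```python
def make_replace_filter(old, new):
    """Hypothetical generator: build a filter replacing `old` with `new`."""
    def replace_filter(text):
        # The inner function remembers `old` and `new` from the call
        # that created it (a closure).
        return text.replace(old, new)
    return replace_filter

# One generator call produces one ready-to-use filter function.
fix_teh = make_replace_filter('teh', 'the')
print(fix_teh('teh hat and teh scarf'))  # the hat and the scarf
```

The generated function has the same shape as a hand-written filter, so it can be registered in exactly the same way.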

Filter generators:
 * : This generates a straightforward text replacement filter, which replaces all instances of  by.
 * : The bulk version of . Generates a text replacement filter with multiple things to replace.   should be a Python dictionary of the form: { 'text1': 'replacement1', 'text2': 'replacement2', 'text3': 'replacement3' }
 * : This generates a simple regex filter. To use backreferences in the replacement argument, use  for group 1,   for group 2, etc.
 * : The bulk version of the  filter generator. To use backreferences in the replacement argument, use   for group 1,   for group 2, etc.   should be a Python dictionary of the form: { 'regex1': 'replacement1', 'regex2': 'replacement2', 'regex3': 'replacement3' }
 * : This generates a filter guaranteed to be applied only to whole words (if used as safe filter), and with wikitext aliases. The first argument,, is the "correct" spelling of the word. The rest of the arguments are regular expressions (that only match whole words! You do not need to check for this) which will be replaced with  . Note that you can (and should) repeat   as one of the alternate spellings, in order to enforce  's capitalization.
 * : This is effectively the same as . It adds   itself as a spelling of , which replaces all instances of   by the correctly-capitalized version of it. Note: You do not need to call   on this one.   automatically calls   by itself, as it is meant to be used only on textual content.
 * : This generates word filters for all strings in the localization dictionary, going from language   to language  . This function automatically adds the generated word filters to the safe filters list. If the   argument is provided, the word filters will be applied only on pages in that language (for example,   will make the filters only be applied on /de pages).
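The regex and whole-word generators in the list above could be sketched roughly as follows. All function names here are hypothetical, the backreferences use the standard `\1`, `\2` syntax of Python's `re` module (which may not be the syntax my generators expect), and case-insensitive matching in the word filter is an assumption made so that capitalization can be enforced.

```python
import re

def make_regex_filter(pattern, replacement):
    """Hypothetical generator: build a single regex replacement filter."""
    compiled = re.compile(pattern)
    def regex_filter(text):
        return compiled.sub(replacement, text)
    return regex_filter

def make_regex_filters(mapping):
    """Hypothetical bulk version: `mapping` is {regex: replacement}."""
    # Sort the patterns so they are applied in a predictable order.
    filters = [make_regex_filter(p, r) for p, r in sorted(mapping.items())]
    def bulk_filter(text):
        for f in filters:
            text = f(text)
        return text
    return bulk_filter

def make_word_filter(correct, *variants):
    """Hypothetical whole-word filter: replace any variant by `correct`."""
    # \b anchors the match to word boundaries, so the variants themselves
    # do not need to check for whole words.
    pattern = re.compile(r'\b(?:' + '|'.join(variants) + r')\b',
                         re.IGNORECASE)
    def word_filter(text):
        # A function replacement avoids backslash handling in `correct`.
        return pattern.sub(lambda match: correct, text)
    return word_filter

swap = make_regex_filter(r'(\w+)-(\w+)', r'\2-\1')
print(swap('red-blue'))  # blue-red

bulk = make_regex_filters({r'colou?r': 'color', r'\s+': ' '})
print(bulk('colour   of  color'))  # color of color

# Repeating the correct spelling as a variant enforces its capitalization.
word = make_word_filter('Engineer', 'engineer', 'enginer')
print(word('The ENGINEER met an enginer and engineering'))
# The Engineer met an Engineer and engineering
```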