Bots in WikiProjects

Author : Jbuenol

From TechnologicalWiki

Introduction. What is a Bot?

Writing in a wiki (Wikipedia, Wikimedia, ...) is a process carried out daily by many people around the world. They create articles, which other people must then correct to bring them up to a good editorial standard. This is hard work because of the large number of articles that are written, and this is why bots are used. What is a bot? A bot is a program, application or script that carries out an automated task. In a wiki, bots perform tasks that a normal user would otherwise do by hand, such as logging in or reviewing articles.

PyWikipedia Bots Framework

PyWikipedia is a framework made up of a set of libraries and utilities with which you can carry out automated tasks and create your own bots. The framework is written in Python. The bots are not part of the wiki; they are client applications that interact with the wiki.

Prerequisites

You need to install the Python environment first.

You can download it from the official Python website (python.org). The framework has been tested with Python 2.5, so it is recommended that you install this version.

Framework Installation

The easiest way to get PyWikipediaBot is to download the latest nightly release. Outdated versions can be found at SourceForge. All you have to do is download PyWikipedia to your computer and decompress the file; no further installation is required.

The pywikipedia files are found here.

- Download with SVN -

You can use SVN (subversion.tigris.org) to retrieve an up-to-date version of PyWikipediaBot. If you use Windows, TortoiseSVN is advised.

To check out the source code using the command-line SVN client, use this command:

   $ svn checkout http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia/ pywikipedia

Or, to skip the spell-checking files (which saves some time), add --ignore-externals:

   $ svn checkout --ignore-externals http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia/ pywikipedia

With either of those commands, the source code will be placed in a new directory named pywikipedia inside your current working directory (the last positional parameter is used as the destination directory).

For non-command-line tools, the only information needed is the repository path: http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia/

On a Mac, follow these instructions.

Setting Up the Framework for Your Wiki

user-config.py file

The first step is to configure your wiki and its language. To do this, you have to create a file[1] (user-config.py) with the following content:

                                                  user-config.py FILE

    family = 'yourwiki'
    mylang = 'en'           # en,es,fr,...

    usernames[family][mylang] = u'YourWikiUser'
    sysopnames[family][mylang] = u'YourWikiUser'

This file must be created in the folder where you have stored the framework. It is advisable to register the user as a bot in the wiki if you are going to use it as one. That way, changes made by that user will not be shown in Recent Changes unless specified otherwise.
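
The assignments in user-config.py rely on names such as usernames and sysopnames that already exist when the file is read. The following is a minimal sketch, not the framework's actual loader, of how a settings file of this shape could be evaluated. The load_user_config helper and the use of nested defaultdicts are assumptions made only for illustration.

```python
from collections import defaultdict

# Sample settings in the same shape as the user-config.py file shown above.
USER_CONFIG = """
family = 'yourwiki'
mylang = 'en'

usernames[family][mylang] = u'YourWikiUser'
sysopnames[family][mylang] = u'YourWikiUser'
"""


def load_user_config(source):
    """Hypothetical helper: evaluate settings text and return the resulting names."""
    # Pre-create the nested dictionaries so that usernames[family][mylang]
    # can be assigned without the settings file defining them first.
    settings = {
        "usernames": defaultdict(dict),
        "sysopnames": defaultdict(dict),
    }
    exec(source, settings)
    return settings


settings = load_user_config(USER_CONFIG)
bot_user = settings["usernames"][settings["family"]][settings["mylang"]]
print(bot_user)
```

Keeping the user name keyed by family and language is what would let a single configuration hold accounts for several wikis at once.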

family.py file

If the bot is used on a private wiki, you need to create a new family file for your site. These files are located in the families folder and are named <name_of_site>_family.py. Several family files are included by default when you download the framework:

  • wikipedia_family.py
  • wikitech_family.py
  • wikitravel_family.py

...

This file defines a new family[2] (a new site). It must contain the wiki name, the URL path and the site language. The following code is a basic example of how to create a new family for the framework:

                                                  family.py FILE

import family

# The wikimedia family that is known as yourwiki
# Translation used on all yourwiki for the 'article' text.
# A language not mentioned here is not known by the robot

class Family(family.Family):

    def __init__(self):
        family.Family.__init__(self)
        self.name = 'yourwiki'
        self.langs = {
            'de': 'de',               # Prefixes of the languages.
            'en': 'en',               # Only these languages are valid in the user-config.py file.
            'sv': 'sv',
            'fr': 'fr',
            'ro': 'ro',
            'es': 'es',
            'ksh': 'ksh',
        }

    def hostname(self, code):
        return 'MyDomain'             # The IP address or the domain.

    def path(self, code):
        return '/wiki/index.php'      # This is the path used to interact with the wiki.

    # We assume the URL of the wiki is http://MyDomain/wiki/index.php/<name_of_article>
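
To make the relationship between hostname(), path() and the final article URL concrete, here is a small self-contained sketch. ExampleFamily is a stand-in that does not import the framework, and article_url is a hypothetical helper, not a framework function.

```python
class ExampleFamily(object):
    """Stand-in with the same hostname() and path() methods as the family above."""

    def hostname(self, code):
        return 'MyDomain'

    def path(self, code):
        return '/wiki/index.php'


def article_url(fam, code, title):
    """Hypothetical helper: build http://<hostname><path>/<title>."""
    # Spaces in titles are conventionally written as underscores in wiki URLs.
    safe_title = title.replace(' ', '_')
    return 'http://%s%s/%s' % (fam.hostname(code), fam.path(code), safe_title)


print(article_url(ExampleFamily(), 'en', 'MyArticle'))
```

If the URL built this way does not open the article in a browser, the values returned by hostname() or path() probably need adjusting.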

Utilities, Libraries and Robots (Framework Content)

For details on the contents of this framework, click here.

Using Bots

These bots are written in Python. To use them, they must be run from a Python environment through the command line.

Running a bot

You must use the python command to launch a bot. For example, to execute[3] the replace.py script you have to write:

    python replace.py param1 param2 param3 param4 ... paramN
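
The parameters that follow the script name reach the script as a list of strings. As a rough sketch of what a script could do with them, the code below separates the framework-wide options named later in this document (-family, -lang, -log) from the script's own parameters. The split_args helper and the -name:value spelling are assumptions for illustration; this is not the framework's handleArgs() implementation.

```python
GLOBAL_OPTIONS = ('-family', '-lang', '-log')


def split_args(argv):
    """Hypothetical helper: separate framework-wide options from script parameters."""
    global_args, local_args = {}, []
    for arg in argv:
        # partition() leaves value empty when the option has no ':' part.
        name, _, value = arg.partition(':')
        if name in GLOBAL_OPTIONS:
            global_args[name] = value
        else:
            local_args.append(arg)
    return global_args, local_args


# Equivalent of: python replace.py -lang:en -family:yourwiki -log old new
global_args, local_args = split_args(
    ['-lang:en', '-family:yourwiki', '-log', 'old', 'new'])
print(global_args)
print(local_args)
```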

Some examples

  • login.py - logs in to the wiki. The user that is logged in is the one registered in the user-config.py file.
  • replace.py - replaces each occurrence of a given pattern with a string.
  • spellcheck.py - checks an article against an editable dictionary.
  • followlive.py - sets up monitoring of short articles.

...

Creating your first Bot

We are going to learn how to develop a basic bot. First we will learn how to create it, and then we will run it on our wiki. Once we have set up the user-config.py file as we saw before (wiki family and bot user established), the only thing left to do is create our script (the bot). In this example, the script is named MyFirstBot.py. To interact with the wiki, we are going to use two Python modules, wikipedia.py and pagegenerators.py.

The first step is to import the wikipedia.py library into our code. This is done with the instruction

   import wikipedia

From this point on, our script can access the wikipedia module, and we can use features associated with the site and its articles. This module contains two classes that can be instantiated, Page and Site. The wikipedia module acts as the wiki wrapper, from which we can instantiate the site object and the articles (Page objects). In this example we are going to read all articles, and then append the following line to the end of the first article: "This has been modified through a Bot". To get an article instance, the bot must call the Page(site, title) constructor:

For example:

   # This example retrieves the article named "MyArticle" from the wiki named "yourwikiname"
   mypage = wikipedia.Page(wikipedia.Site("en", "yourwikiname"), u"MyArticle")
   mysite = wikipedia.Site("en", "yourwikiname")

The mypage variable contains the page wrapper. You can read and write the article:

  mypage.get()
  # to obtain the text of the article.
  mypage.put(u"new content of the article",u"subject")
  # to re-write the text of the article.
  mypage.put(u"%s\nnew content of the article" % mypage.get() ,u"subject")
  # to append a line to the text of the article.
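
The read-then-write pattern above can be tried without a live wiki. The sketch below uses FakePage, an in-memory stand-in invented for this example that only imitates get() and put(); it is not the framework's Page class, and append_line is a hypothetical helper.

```python
class FakePage(object):
    """In-memory stand-in for a wiki page; it only imitates get() and put()."""

    def __init__(self, title, text=u''):
        self.title = title
        self._text = text
        self.summaries = []

    def get(self):
        return self._text

    def put(self, newtext, comment=None):
        # Overwrite the stored text and remember the edit summary.
        self._text = newtext
        self.summaries.append(comment)


def append_line(page, line, summary):
    """Read the current text and save it back with one extra line at the end."""
    page.put(u"%s\n%s" % (page.get(), line), summary)


mypage = FakePage(u"MyArticle", u"Original text.")
append_line(mypage, u"This has been modified through a Bot", u"Bot: appending a line")
print(mypage.get())
```

Because put() replaces the whole text, the existing content has to be read first and included in what is written back; otherwise the article would be reduced to the new line alone.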

Now we are going to fetch all articles from the wiki. To do so, we need to use a Python generator. A generator is a function that produces a sequence of results instead of a single value; in this case, a sequence of pages. We are going to use the pagegenerators library (pagegenerators.py file):

   import pagegenerators

With this, we can use several generators of page lists. We are going to use the AllpagesPageGenerator() function, which returns an iterator over all pages of the wiki site.

   pages = pagegenerators.AllpagesPageGenerator()
   # pages yields all pages of the wiki, so we can process each one of them.

Other generators:

   NewpagesPageGenerator()
   CategorizedPageGenerator(category)
   ShortPagesPageGenerator()
   LongPagesPageGenerator()
   ...
   (for more information, see pagegenerators.py)

To access the elements, you can call the next() method of the iterator object, which returns the next element, so the bot can process each article individually. You can also use a for loop to iterate over the pages:

   # First way: get the next page from the generator.
   page = pages.next()
   text = page.get()
   ...

   # Second way: get all pages and process them.
   for page in pages:
       text = page.get()
       ...
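
The two ways of consuming a generator can be seen with plain Python, independently of the wiki. In this sketch, page_titles is a made-up generator that yields strings rather than pages. It uses the built-in next(), which needs Python 2.6 or later; under Python 2.5 the equivalent call is pages.next(), as shown above.

```python
def page_titles(titles):
    """Generator: yields one title at a time instead of returning a whole list."""
    for title in titles:
        yield title


pages = page_titles([u"First", u"Second", u"Third"])

# First way: ask for a single element.
first = next(pages)

# Second way: loop over whatever is left.
remaining = [title for title in pages]

print(first)
print(remaining)
```

Note that the loop continues from where next() stopped: a generator can be consumed only once, so an element that has already been taken is not produced again.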

So, we have created our first bot for the wiki. To display the responses from the wiki, you can use the output() function.

   wikipedia.output(u"%s" % page.get())
   # To show the article content on the screen.

To finish, here is the complete example of our bot:

                                                  MyFirstBot.py FILE

         # This bot retrieves all pages from the wiki and writes the content of each one to the screen.

         import wikipedia
         import pagegenerators

         mypage = wikipedia.Page(wikipedia.Site("en", "yourwikiname"), u"Title_of_an_article")
         mysite = wikipedia.Site("en", "yourwikiname")

         pages = pagegenerators.AllpagesPageGenerator()

         for page in pages:
                 wikipedia.output(u"%s" % page.get())

The Page Class

This class represents an instance of a wiki article.

   __init__              : Page(Site, Title) - the page with title Title on wikimedia site Site
   title                 : The name of the page, in a form suitable for an interwiki link
   urlname               : The name of the page, in a form suitable for a URL
   titleWithoutNamespace : The name of the page, with the namespace part removed
   section               : The section of the page (the part of the name after '#')
   sectionFreeTitle      : The name without the section part
   aslink                : The name of the page in the form Title or lang:Title
   site                  : The wiki this page is in
   encoding              : The encoding of the page
   isAutoTitle           : If the title is a well known, auto-translatable title
   autoFormat            : Returns (dictName, value), where value can be a year, date, etc.,
                           and dictName is 'YearBC', 'December', etc.
   isCategory            : True if the page is a category, false otherwise
   isImage               : True if the page is an image, false otherwise
   get (*)               : The text of the page
   exists (*)            : True if the page actually exists, false otherwise
   isRedirectPage (*)    : True if the page is a redirect, false otherwise
   isEmpty (*)           : True if the page has 4 characters or less content, not
                           counting interwiki and category links
   botMayEdit (*)        : True if bot is allowed to edit page
   interwiki (*)         : The interwiki links from the page (list of Pages)
   categories (*)        : The categories the page is in (list of Pages)
   linkedPages (*)       : The normal pages linked from the page (list of Pages)
   imagelinks (*)        : The pictures on the page (list of ImagePages)
   templates (*)         : All templates referenced on the page (list of strings)
   getRedirectTarget (*) : The page the page redirects to
   isDisambig (*)        : True if the page is a disambiguation page
   getReferences         : List of pages linking to the page
   namespace             : The namespace in which the page is
   permalink (*)         : The url of the permalink of the current version
   move                  : Move the page to another title
   put(newtext)          : Saves the page
   put_async(newtext)    : Queues the page to be saved asynchronously
   delete                : Deletes the page (requires being logged in)
   (*) : This loads the page if it has not been loaded before; permalink might
         even reload it if it has been loaded before
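
To illustrate what the title-related entries in the list above refer to (namespace, titleWithoutNamespace, section, sectionFreeTitle), the following sketch splits a full title into its parts. It is only an illustration of the idea: split_title and its short namespace list are assumptions, not the Page class's real implementation.

```python
def split_title(full_title, namespaces=(u'Talk', u'User', u'Category', u'Image')):
    """Hypothetical helper: return (namespace, title without namespace, section)."""
    # Everything after the first '#' is the section; it may be absent.
    section_free, _, section = full_title.partition(u'#')
    prefix, sep, rest = section_free.partition(u':')
    # Only treat the prefix as a namespace when it is a known one, so that
    # titles which merely contain a colon stay in the main namespace.
    if sep and prefix in namespaces:
        return prefix, rest, section or None
    return u'', section_free, section or None


print(split_title(u'Talk:Main Page#History'))
print(split_title(u'Star Wars: Episode I'))
```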

The Site Class

This class represents the wiki site. It is the way to get at the articles.

   messages                            :     There are new messages on the site
   forceLogin()                        :     Does not continue until the user
                                              has logged in to the site
   getUrl()                            :     Retrieve an URL from the site
   mediawiki_message(key)              :     Retrieve the text of the MediaWiki
                                              message with the key "key"
   has_mediawiki_message(key)          :     True if this site defines a MediaWiki
                                              message with the key "key"
   Special pages:
       Dynamic pages:
           allpages()                  :     Special:Allpages
           newpages()                  :     Special:Newpages
           longpages()                 :     Special:Longpages
           shortpages()                :     Special:Shortpages
           categories()                :     Special:Categories
       Cached pages:
           deadendpages()              :     Special:Deadendpages
           ancientpages()              :     Special:Ancientpages
           lonelypages()               :     Special:Lonelypages
           uncategorizedcategories()   :     Special:Uncategorizedcategories
           uncategorizedpages()        :     Special:Uncategorizedpages
           unusedcategories()          :     Special:Unusedcategories

Other Functions

   getall()                            :     Load pages via Special:Export
   setAction(text)                     :     Use 'text' instead of "Wikipedia
                                              python library" in editsummaries
   handleArgs()                        :     Checks whether text is an argument
                                              defined on wikipedia.py (these are -family,
                                              -lang, -log and others)
   translate(xx, dict)                 :     dict is a dictionary, giving text
                                              depending on language, xx is a language.
                                              Returns the text in the most
                                              applicable language for the xx: wiki
   setUserAgent(text)                  :     Sets the string being passed to the HTTP
                                              server as the User-agent: header. Defaults
                                              to 'Pywikipediabot/1.0'.
   output(text)                        :     Prints the text 'text' in the encoding of
                                              the user's console.
   input(text)                         :     Asks input from the user, printing the
                                              text 'text' first.
   showDiff(oldtext, newtext)          :     Prints the differences between oldtext and
                                              newtext on the screen.
   getLanguageLinks(text,xx)           :     get all interlanguage links in wikicode
                                              text 'text' in the form xx:pagename.
   removeLanguageLinks(text)           :     gives the wiki-code 'text' without any
                                              interlanguage links.
   replaceLanguageLinks(oldtext, new)  :     in the wiki-code 'oldtext' remove the
                                              language links and replace them by the language
                                              links in new, a dictionary with the languages as
                                              keys and either Pages or titles as values
   getCategoryLinks(text,xx)           :     get all category links in text 'text'
                                              (links in the form xx:pagename)
   removeCategoryLinks(text,xx)        :     remove all category links in 'text'
   replaceCategoryLinks(oldtext,new)   :     replace the category links in oldtext by
                                              those in new (new a list of category Pages).
   stopme()                            :     Put this in a bot when it is no longer
                                              communicating with the wiki. It will
                                              remove the bot from the list of running
                                              processes, so that it no longer slows down
                                              other bot threads.
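
As an illustration of the idea behind translate(xx, dict), choosing a text according to the wiki's language, here is a simplified sketch. The pick_translation helper and its rule of falling back to English are assumptions for this example; the framework's own function may choose the "most applicable language" differently.

```python
def pick_translation(code, texts, fallback=u'en'):
    """Simplified sketch: pick the text for a language code, else a fallback."""
    if code in texts:
        return texts[code]
    # Fall back to a default language when there is no entry for this wiki.
    # (This fallback rule is an assumption, not the framework's own logic.)
    return texts.get(fallback)


# Edit summaries keyed by language code.
summaries = {
    u'en': u'Bot: appending a line',
    u'de': u'Bot: neue Zeile',
}

print(pick_translation(u'de', summaries))
print(pick_translation(u'ksh', summaries))
```

A dictionary like this is a convenient way to keep a bot's edit summaries in one place when the same script runs on wikis in several languages.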

External links

http://emijrp.blogspot.com/2008/02/curso-de-bots-i.html - Bots course, part 1 (in Spanish)

http://emijrp.blogspot.com/2008/02/curso-de-bots-ii.html - Bots course, part 2 (in Spanish)

http://meta.wikimedia.org/wiki/Using_the_python_wikipediabot - Using the python wikipediabot

http://meta.wikimedia.org/wiki/Pywikipedia_bot_on_non-Wikimedia_projects - Pywikipedia bot on non-Wikimedia projects

Notes

  1. Be careful with the file format. The files must be saved in UTF-8 for the framework to work properly.
  2. The families folder contains several families created by default. In some cases the configuration in these files is more complete.
  3. Do not run massive tests on Wikipedia. To test your bot, use Wikipedia's test area (Testing zone) or your own wiki.