Bots in WikiProjects
Author : Jbuenol
From TechnologicalWiki
Introduction. What is a Bot?
Writing on a wiki (Wikipedia, Wikimedia, ...) is a process carried out daily by many people around the world. They create articles, which other people must then correct to bring them to a good editorial standard. This is hard work because of the large number of articles written. For this reason, bots are used. What is a bot? A bot is a program, application or script that performs an automated task. On a wiki, bots carry out tasks that a normal user would otherwise do, such as logging in, reviewing articles...
PyWikipedia Bots Framework
There is a framework made up of a set of libraries and utilities with which you can perform automated tasks and create your own bots. The framework is developed in Python. The bots are not part of the wiki; they are client applications that interact with the wiki.
Prerequisites
You need to install the Python environment first.
You can download it by clicking here. The framework has been tested with Python 2.5, so installing that version is recommended.
Framework Installation
The easiest way to download PyWikipediaBot is to use the latest nightly release available at this site. Outdated versions can be found at Sourceforge. All you have to do is download PyWikipedia to your computer and decompress the file; no further installation is required.
The pywikipedia files are found here.
- Download with SVN -
You can use SVN (subversion.tigris.org) to retrieve an up-to-date version of PyWikipediaBot. If you use Windows, TortoiseSVN is advised.
To check out the source code using the command line SVN client use this command:
$ svn checkout http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia/ pywikipedia
Or, to skip the spell-checking files (which saves some download time), add --ignore-externals:
$ svn checkout --ignore-externals http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia/ pywikipedia
With either of those commands, the source code will be placed in a new directory named pywikipedia inside your current working directory (the last positional parameter is used as the destination directory).
For non-command-line tools, the only information needed is the repository path: http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia/
On a Mac, follow these instructions.
Setting the Framework to your Wiki
user-config.py file
The first step is to configure your wiki and its language. To do this, create a file[1] (user-config.py) with the following content:
user-config.py FILE
family = 'yourwiki'
mylang = 'en' # en,es,fr,...
usernames[family][mylang] = u'YourWikiUser'
sysopnames[family][mylang] = u'YourWikiUser'
This file must be created in the folder where you have stored the framework. If the account is going to be used as a bot, it is advisable to register the user as a bot in the wiki. That way, changes made by the user will not be shown in Recent Changes unless specified otherwise.
family.py file
If the bot is used on a private wiki, you need to create a new file for your site. These files are located in the families folder and are named <name_of_site>_family.py. Several family files are included by default when you download the framework:
- wikipedia_family.py
- wikitech_family.py
- wikitravel_family.py
...
This file creates an instance of a new family[2] (a new site). It must contain the wiki name, the URL path and the site language. The following code is a basic example of how to create a new family for the framework:
family.py FILE
import family

# The wikimedia family that is known as yourwiki
# Translation used on all yourwiki for the 'article' text.
# A language not mentioned here is not known by the robot

class Family(family.Family):
    def __init__(self):
        family.Family.__init__(self)
        self.name = 'yourwiki'
        self.langs = {
            'de': 'de',    # Prefixes of the languages.
            'en': 'en',    # Only these languages are valid in the user-config.py file
            'sv': 'sv',
            'fr': 'fr',
            'ro': 'ro',
            'es': 'es',
            'ksh': 'ksh',
        }

    def hostname(self, code):
        return 'MyDomain'  # The IP or the domain.

    def path(self, code):
        # We suppose the URL of the wiki is http://MyDomain/wiki/index.php/<name_of_article>
        return '/wiki/index.php'  # This is the path used to interact with the wiki.
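To see how hostname() and path() relate to the wiki URL mentioned in the comment, the following is a minimal sketch that does not use the framework. DemoFamily is a stand-in class (the real base class is family.Family), and article_url is a hypothetical helper introduced only for illustration; the framework builds its URLs internally and may do so differently.

```python
# Stand-in for illustration only; the real base class is family.Family.
class DemoFamily(object):
    name = 'yourwiki'
    langs = {'en': 'en', 'es': 'es'}

    def hostname(self, code):
        return 'MyDomain'          # The IP or the domain.

    def path(self, code):
        return '/wiki/index.php'   # Path used to interact with the wiki.


def article_url(fam, code, title):
    """Hypothetical helper: build http://<hostname><path>/<title>."""
    if code not in fam.langs:
        # Mirrors the rule that only declared languages are usable.
        raise ValueError('language %r is not declared in langs' % code)
    return 'http://%s%s/%s' % (fam.hostname(code), fam.path(code), title)


url = article_url(DemoFamily(), 'en', 'MyArticle')
print(url)
```

If the printed URL does not open your wiki's article in a browser, the values returned by hostname() and path() are the first things to check.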
Utilities, Libraries and Robots (Framework Content)
To learn about the contents of this framework, click here.
Using Bots
These bots are written in Python. To use them, run them from a Python environment through the command line.
Running a bot
Use the python command to launch a bot. For example, to execute[3] the replace.py script, type:
python replace.py param1 param2 param3 param4 ... paramN
Some examples
- login.py - log in to the wiki. The user that logs in is the one registered in the user-config.py file.
- replace.py - replace each occurrence of a given pattern with a string.
- spellcheck.py - check an article against an editable dictionary.
- followlive.py - monitor short articles.
...
Creating your first Bot
We are going to learn how to develop a basic bot. First we will create it, and then we will run it on our wiki. Once the user-config.py file has been set up as shown before (wiki family and bot user established), all that remains is to create our script (the bot). In this example, the script is named MyFirstBot.py. To interact with the wiki, we are going to use two Python modules: wikipedia.py and pagegenerators.py.
The first step is to import the wikipedia.py library into our code. This is done with the instruction
import wikipedia
From this point on, the wikipedia module can be accessed from our script, and we can use features associated with the site and the articles. The module contains two classes that can be instantiated (Page and Site). The wikipedia module acts as the wiki wrapper, from which we can instantiate the site object and the articles (Page objects). In the example we are going to read all articles, and then append the following line to the end of the first article: "This has been modified through a Bot". To get an article instance, the bot must call the Page(site, title) constructor:
eg.
# This example retrieves the article named "MyArticle" from the wiki named "yourwikiname"
mypage = wikipedia.Page(wikipedia.Site("en", "yourwikiname"), u"MyArticle")
mysite = wikipedia.Site("en", "yourwikiname")
The mypage variable contains the page wrapper. You can read and write the article:
mypage.get()  # to obtain the text of the article.
mypage.put(u"new content of the article", u"subject")  # to rewrite the text of the article.
mypage.put(u"%s\nnew content of the article" % mypage.get(), u"subject")  # to append a line to the text of the article.
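The append pattern can be tried without touching a real wiki. The sketch below uses FakePage, a hypothetical in-memory stand-in that only mimics the get() and put(newtext, comment) calls used in this section; it is not the framework's Page class, and append_line is a helper name made up for this example.

```python
class FakePage(object):
    """In-memory stand-in for wikipedia.Page (illustration only)."""

    def __init__(self, title, text):
        self.title = title
        self._text = text
        self.last_comment = None

    def get(self):
        # Return the current text of the article.
        return self._text

    def put(self, newtext, comment=None):
        # Overwrite the article text and remember the edit summary.
        self._text = newtext
        self.last_comment = comment


def append_line(page, line, comment):
    """Read the current text and write it back with one extra line."""
    page.put(u"%s\n%s" % (page.get(), line), comment)


mypage = FakePage(u"MyArticle", u"Original text")
append_line(mypage, u"This has been modified through a Bot", u"bot test")
print(mypage.get())
```

With the real framework, the same append_line function should work on an object returned by wikipedia.Page(...), assuming get() and put() behave as described here.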
Now we are going to fetch all articles from the wiki. To do so, we need a Python generator. A generator is a function that produces a sequence of results instead of a single value; in this case, a sequence of pages. We are going to use the pagegenerators library (pagegenerators.py file):
import pagegenerators
With this, we can use several page-list generators. We are going to use the AllpagesPageGenerator() function, which returns an iterator over all pages of the wiki site.
pages = pagegenerators.AllpagesPageGenerator()  # pages iterates over all pages, so we can process each one of them.
Other Generators :
NewpagesPageGenerator()
CategorizedPageGenerator( category )
ShortPagesPageGenerator()
LongPagesPageGenerator()
...
(for more information see pagegenerators.py)
To access the elements, you can call the iterator's next() method, which returns the next element. This way, the bot can process each article individually. We can also use a for loop to iterate over the pages:
First way:
page = pages.next()  # To get the next page from the list.
text = page.get()
...
Second way:
for page in pages:  # To get all pages and process them.
    text = page.get()
    ...
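The two ways of consuming a generator can be compared on a small stand-in that yields titles instead of pages. fake_allpages is a hypothetical function written for this example and is not part of pagegenerators.py. The sketch uses the built-in next(), available from Python 2.6; on Python 2.5 the equivalent is the pages.next() method.

```python
def fake_allpages(titles):
    """Yield titles one at a time, the way a page generator yields pages."""
    for title in sorted(titles):
        yield title


pages = fake_allpages([u"Zebra", u"Apple", u"Mango"])

# First way: take a single element by hand.
first = next(pages)

# Second way: a loop consumes whatever is left in the generator.
rest = []
for title in pages:
    rest.append(title)

print(first)
print(rest)
```

Note that the loop starts where next() stopped: a generator can only be walked once, so a bot that needs the pages twice has to create the generator again.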
So, we have created our first bot for the wiki. To display the responses from the wiki, you can use the output() function.
wikipedia.output(u"%s" % page.get()) # To show the article content on the screen.
To finish, here is the complete example of our bot:
MyFirstBot.py FILE
# This bot retrieves all pages from the wiki and writes the content of each one to the screen.
import wikipedia
import pagegenerators
mypage = wikipedia.Page(wikipedia.Site("en", "yourwikiname"), u"Title_of_an_article")
mysite = wikipedia.Site("en", "yourwikiname")
pages = pagegenerators.AllpagesPageGenerator()
for page in pages:
    wikipedia.output(u"%s" % page.get())
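While testing, it can be useful to stop after a handful of pages rather than walking the whole wiki. The following is a minimal sketch of one way to do that with itertools.islice from the Python standard library. demo_pages is a hypothetical stand-in generator; in a real bot the generator would be the one returned by pagegenerators.AllpagesPageGenerator().

```python
import itertools


def demo_pages():
    """Stand-in generator that never ends, like a very large wiki."""
    n = 0
    while True:
        n += 1
        yield u"Page %d" % n


# Process only the first 3 elements, then stop.
processed = []
for title in itertools.islice(demo_pages(), 3):
    processed.append(title)

print(processed)
```

Because islice pulls elements lazily, the remaining pages are never requested from the generator.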
The Page Class
This class represents an instance of a wiki article.
__init__ : Page(Site, Title) - the page with title Title on wikimedia site Site
title : The name of the page, in a form suitable for an interwiki link
urlname : The name of the page, in a form suitable for a URL
titleWithoutNamespace : The name of the page, with the namespace part removed
section : The section of the page (the part of the name after '#')
sectionFreeTitle : The name without the section part
aslink : The name of the page in the form [[Title]] or [[lang:Title]]
site : The wiki this page is in
encoding : The encoding of the page
isAutoTitle : If the title is a well known, auto-translatable title
autoFormat : Returns (dictName, value), where value can be a year, date, etc., and dictName is 'YearBC', 'December', etc.
isCategory : True if the page is a category, false otherwise
isImage : True if the page is an image, false otherwise
get (*) : The text of the page
exists (*) : True if the page actually exists, false otherwise
isRedirectPage (*) : True if the page is a redirect, false otherwise
isEmpty (*) : True if the page has 4 characters or less content, not counting interwiki and category links
botMayEdit (*) : True if bot is allowed to edit page
interwiki (*) : The interwiki links from the page (list of Pages)
categories (*) : The categories the page is in (list of Pages)
linkedPages (*) : The normal pages linked from the page (list of Pages)
imagelinks (*) : The pictures on the page (list of ImagePages)
templates (*) : All templates referenced on the page (list of strings)
getRedirectTarget (*) : The page the page redirects to
isDisambig (*) : True if the page is a disambiguation page
getReferences : List of pages linking to the page
namespace : The namespace in which the page is
permalink (*) : The url of the permalink of the current version
move : Move the page to another title
put(newtext) : Saves the page
put_async(newtext) : Queues the page to be saved asynchronously
delete : Deletes the page (requires being logged in)
(*) : This loads the page if it has not been loaded before; permalink might even reload it if it has been loaded before
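The relationship between the title, the section, and the section-free title can be illustrated with plain string handling. This is only a sketch of the idea: split_section is a hypothetical helper, not the framework's implementation, and the real Page class may treat edge cases differently.

```python
def split_section(title):
    """Return (section-free title, section) for a title such as 'Page#Part'.

    The section is the part of the name after the first '#',
    or None when the title has no section part.
    """
    if u"#" in title:
        base, section = title.split(u"#", 1)
        return base, section
    return title, None


with_section = split_section(u"MyArticle#History")
without_section = split_section(u"MyArticle")
print(with_section)
print(without_section)
```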
The Site Class
This class represents the wiki site. It is the way to get the articles.
messages : There are new messages on the site
forceLogin() : Does not continue until the user has logged in to the site
getUrl() : Retrieve a URL from the site
mediawiki_message(key) : Retrieve the text of the MediaWiki message with the key "key"
has_mediawiki_message(key) : True if this site defines a MediaWiki message with the key "key"
Special pages:
Dynamic pages:
allpages() : Special:Allpages
newpages() : Special:Newpages
longpages() : Special:Longpages
shortpages() : Special:Shortpages
categories() : Special:Categories
Cached pages:
deadendpages() : Special:Deadendpages
ancientpages() : Special:Ancientpages
lonelypages() : Special:Lonelypages
uncategorizedcategories() : Special:Uncategorizedcategories
uncategorizedpages() : Special:Uncategorizedpages
unusedcategories() : Special:Unusedcategories
Other Functions
getall() : Load pages via Special:Export
setAction(text) : Use 'text' instead of "Wikipedia python library" in edit summaries
handleArgs() : Checks whether text is an argument defined on wikipedia.py (these are -family, -lang, -log and others)
translate(xx, dict) : dict is a dictionary, giving text depending on language, xx is a language. Returns the text in the most applicable language for the xx: wiki
setUserAgent(text) : Sets the string being passed to the HTTP server as the User-agent: header. Defaults to 'Pywikipediabot/1.0'.
output(text) : Prints the text 'text' in the encoding of the user's console.
input(text) : Asks input from the user, printing the text 'text' first.
showDiff(oldtext, newtext) : Prints the differences between oldtext and newtext on the screen.
getLanguageLinks(text,xx) : get all interlanguage links in wikicode text 'text' in the form xx:pagename.
removeLanguageLinks(text) : gives the wiki-code 'text' without any interlanguage links.
replaceLanguageLinks(oldtext, new) : in the wiki-code 'oldtext' remove the language links and replace them by the language links in new, a dictionary with the languages as keys and either Pages or titles as values
getCategoryLinks(text,xx) : get all category links in text 'text' (links in the form xx:pagename)
removeCategoryLinks(text,xx) : remove all category links in 'text'
replaceCategoryLinks(oldtext,new) : replace the category links in oldtext by those in new (new a list of category Pages).
stopme() : Call this in a bot when it is no longer communicating with the wiki. It will remove the bot from the list of running processes, and thus not slow down other bot threads anymore.
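As an illustration of the idea behind translate(xx, dict), the sketch below picks a text for a language code from a dictionary. translate_sketch is a hypothetical function written for this page, and falling back to English when the language is missing is an assumption; the framework's own rule for choosing the "most applicable language" may be more elaborate.

```python
def translate_sketch(code, texts, fallback='en'):
    """Return the text for language 'code', or the fallback language's text.

    Returns None when neither the language nor the fallback is present.
    """
    if code in texts:
        return texts[code]
    return texts.get(fallback)


# Edit summaries per language (example values).
summary = {
    'en': u'Robot: appending a line',
    'es': u'Robot: agregando texto',
}

print(translate_sketch('es', summary))
print(translate_sketch('de', summary))  # 'de' is missing, so English is used
```

A dictionary like summary is a convenient way to keep edit summaries for several language versions of a wiki in one bot script.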
External links
http://emijrp.blogspot.com/2008/02/curso-de-bots-i.html - Bots course, part 1 (Spanish)
http://emijrp.blogspot.com/2008/02/curso-de-bots-ii.html - Bots course, part 2 (Spanish)
http://meta.wikimedia.org/wiki/Using_the_python_wikipediabot - Using the python wikipediabot
Notes
1. ↑ Be careful with the file format. The files must be saved in UTF-8 format to work properly.
2. ↑ The families folder contains several families created by default. The configuration in some of these files is more complete.
3. ↑ Do not do massive tests on Wikipedia. To test your bot, use the test area of Wikipedia (Testing zone) or your own wiki.


