The wiki for open technologies analysis
From TechnologicalWiki
Scope
This article defines the requirements for "The wiki for open technologies". The project aims to be a reference on the internet for open technologies, and its goal is to empower innovation.
Contact
Oscar Puyal oscar.puyal@cesla.info
Raquel Frisa raquel.frisa@vodafone.com
Contributors
Alberto Bernal
Javier Bueno
mail : javier.bueno@cesla.info
wiki user : jbuenol
Daniel Perella
David Henar
Elena Moreno
Executive Summary
This document is organized in the following sections:
- Introduction. This section describes the current state of the art in the wiki world.
- The technological wiki Application. This section describes the platform objectives for The technological wiki project and summarizes the requirements. It also proposes the architecture of the application and its packages.
- The technological wiki service. This section describes the objectives for the service and summarizes the requirements. It also proposes pages and portals.
- Physical Architecture.
- Logical Architecture.
- Wiki Contents.
- External Search Service. This section describes the functionalities of the external search service.
Introduction
A wiki is a collection of web pages designed to enable anyone who accesses it to contribute or modify content, using a simplified markup language. A defining characteristic of wiki technology is the ease with which pages can be created and updated. Generally, there is no review before modifications are accepted.
Many wikis are open to alteration by the general public without requiring them to register user accounts. Sometimes logging in for a session is recommended, to create a "wiki-signature" cookie for signing edits automatically. Many edits, however, can be made in real time and appear almost instantly online. This can facilitate abuse of the system. Wikis promote meaningful topic associations between different pages by making page link creation almost intuitively easy and by showing whether an intended target page exists or not.
A wiki is not a carefully-crafted site for casual visitors. Instead, it seeks to involve the visitor in an ongoing process of creation and collaboration that constantly changes the Web site landscape.
The success of Wikipedia has shown how efficient wikis can be for collaborative projects. Wikis are also used in business to provide intranets and knowledge management systems. Most wikis are implemented on MediaWiki software, which is used extensively for private networks and for public projects. Plugins and modifications for this software, and for other platforms that follow the same principle, are growing every day, resulting in a large number of solutions. Beyond doubt, wikis are currently one of the most important trends on the Internet.
The technological wiki Application
CESLA aims to develop “The technological wiki” platform. This platform is aimed at the open source community and at the internet world as a whole, to allow collaboration between different groups and organizations.
The application pursues various objectives:
- To collect knowledge about new technologies.
- To share knowledge with end users of the internet and mobile devices.
- To implement the latest improvements in wiki technologies.
- To launch a sustainable initiative from an economical point of view.
Requirement
Use Cases
The Technological Wiki must cover the use cases summarized below.
UC1: The wiki platform will be Mediawiki.
UC2: Open office edition.
UC3: The main page will be a dynamic one.
UC4: Content organized by portals.
UC5: Private and public space.
UC6: Editors management.
UC7: External search options.
Physical architecture
Logical architecture
Mediawiki
| Package | Function |
|---|---|
| index.php | Main access point for the MediaWiki software. |
| api.php | External access point for the API, but can also be used internally by other code. |
| Directory /includes/ | This directory stores all files needed by MediaWiki. |
| Article.php | Represents an article of the wiki. It can modify the article (edit, deletion, ...) and maintains state such as text (in wikitext format), flags, etc. |
| LinkCache.php | Keeps information on the existence of articles. |
| Linker.php | These functions are used primarily for page content: links, embedded images, table of contents. Links are also used in the skin. |
| OutputPage.php | Attempts to clean up some of the insanity in creating meta and link tags in the headers. Values are now escaped consistently, which should be a good thing. |
| Pager.php | Contains the IndexPager class used for paging results of MySQL queries. |
| Parser.php | Defines the parser object used to convert wikitext to HTML. |
| Setup.php | Includes some commonly used files and creates the “global object variables” so that MediaWiki can work. |
| Skin.php | Encapsulates a "look and feel" for the wiki. All of the functions that render HTML, and make choices about how to render it, are here, and are called from various other places when needed. |
| Title.php | Represents the title of an article, and does all the work of translating among various forms such as plain text, URL, database key. It also represents a few features of articles that don't involve their text, such as access rights. |
| User.php | Encapsulates the state of the user viewing/using the site. Can be queried for things like the user's settings, name, etc. Handles the details of getting and saving to the “user table” of the database, and dealing with sessions and cookies. |
| WebStart.php | Does the initial setup for a web request: security checks, loads “Settings.php” and “Setup.php”. |
| Wiki.php | Contains the definition of the class MediaWiki. MediaWiki is the to-be base class for this whole project. |
| Directory /languages/ | This directory contains files used for “internationalisation”. |
| Language.php | Contains the Language class, which represents the language used for incidental text, and also has some character encoding functions and other locale stuff. |
| Directory /maintenance/ | This directory contains maintenance scripts that must be run from a command line interface. |
| Directory /skins/ | This directory contains all skin classes, JavaScript, CSS and some images used by those skins. |
Installed Extensions
- AddAuthor. Links an article to its author and shows the author in the article header.
  - author Javier Bueno
  - license http://www.gnu.org/copyleft/gpl.html GNU General Public License 2.0 or later
- Cite. Adds two parser hooks to MediaWiki, <ref> and <references />, which operate together to add citations to pages.
  - author Ævar Arnfjörð Bjarmason
  - license http://www.gnu.org/copyleft/gpl.html GNU General Public License 2.0 or later
- Contribution Scores. Scores the contributions made by users.
  - author Tim Laqua
- Dynamic Page List. Allows including a list of links to the most recently modified articles, sorted by category.
  - author n:en:User:IlyaHaykinson, n:en:User:Amgine, w:de:Benutzer:Unendlich, m:User:Dangerman, m:User:Algorithmix <gero.scholz@t-online.de>
  - license http://opensource.org/licenses/gpl-license.php GNU Public License
- Interwiki. Allows including remote content from another wiki in a local article.
  - author Javier Bueno
  - license http://opensource.org/licenses/gpl-license.php GNU Public License
- MySearch. Searches for a term in pages belonging to a set of sites/domains (using the MediaWiki search form).
  - author Elena Moreno, Javier Bueno
- ProjectsUI. Adds a set of tabs at the beginning of the page. This feature is intended to improve project management, so a new project section can be created for each tab.
  - author Javier Bueno
  - license http://opensource.org/licenses/gpl-license.php GNU Public License
- ReCaptcha. Adds a captcha to prevent spam and robots.
  - author Mike Crawford, Ben Maurer
  - license MIT License
- Rss Reader. Allows users to include an RSS reader on their pages. It is currently installed on the main page, where it shows recent changes.
  - author mutante, Duesentrieb, Rdb, Mafs, Alxndr, Cmreigrut
  - Requires:
    - magpie rss parser <http://magpierss.sourceforge.net/>
    - iconv <http://www.gnu.org/software/libiconv/>, see also <http://www.php.net/iconv>
- RstToHTML. Allows including reStructuredText code in an article.
  - author Paul Kippes
  - Requires local installation of reStructuredText parser - See: http://docutils.sourceforge.net/rst.html
- SimplyPermission. Adds per-page, per-user-group permissions (Read - Protect). The purpose of the extension is to provide a private area for pages: only permitted users can view them.
  - author Aran Dunkley User:Nad
  - license GNU General Public License 2.0 or later
- Uncategorized. Shows a template at the beginning of the article if the article does not belong to any valid category.
  - author Javier Bueno
  - license http://opensource.org/licenses/gpl-license.php GNU Public License
- UserRightList. Shows or modifies user rights. This extension is for sysops.
  - author Jim Hu
  - license MIT License
- WebChat. Adds a tab called "chat" to every page. Clicking the tab logs you in to the chat room for the page you are viewing. This extension is recommended for talking about or discussing the content of a page online. For example, if you need help with a page, you can see who is online and try to contact them.
  - author Robert Leverington <robert@rhl.me.uk>
  - license http://www.gnu.org/copyleft/gpl.html GNU General Public License 2.0 or later
- WhosOnline. Shows the users who are currently online.
  - author Maciej Brencz
  - license http://www.gnu.org/copyleft/gpl.html GNU General Public License 2.0 or later
- YouTube Viewer. Allows users to include links through which YouTube videos can be viewed.
  - author Sylvain Machefert
For a complete description of the extensions, click here.
Wiki's content
Wiki main page
The main access to the wiki is through the main page. This page shows the portals into which the information is organized. The main page keeps the Wikipedia look and feel to make the wiki easier to use. On this page, the most relevant articles and news will be shown mostly as text to facilitate mobile integration.
If you want to change your wiki's look and feel you can find more information here.
Project portal
It is possible to create new projects whose information is structured and organized with tags. Organizations and independent people will have a space to organize information about their projects or ideas. These projects are referenced in the project portal.
The main points in the Projects portal are:
- Introduction
- Relevant project
- Tutorial
- List of projects
When a project is created, it uses a Default Project Template which specifies the minimal structure a project should contain. This initial structure can be customized to suit the user's needs.
Open source portal
Innovation in open technologies is the main goal of the wiki. Mobile phone applications, mobile Linux initiatives, the most important associations, and other open source articles have a point of reference in this portal. Open source content in this wiki is managed from this page.
The main points in the Open source portal are:
- Introduction
- Relevant collaborations
- List of articles
- Articles to review
- Pending articles
How-to portal
As a way to empower innovation, this portal centralizes all How-to articles. The portal aims to collect the main procedures that are relevant for researchers, as well as guides on how to implement services for end users.
The main points in the How-to portal are:
- Introduction
- Relevant collaborations
- List of articles
- Articles to review
- Pending articles
Technologies portal
This portal covers what is going to happen in the future and which technologies will lead the change on the internet. The wiki community and readers have a point of reference in this portal to discuss and show their ideas.
Community portal
The community portal is a reference for the Wiki community where contributors and visitors will find:
- Reference to "your first article"
- The five pillars
- Behavior rules
- Link to how to collaborate
- Community discussion (CAFE)
- Link to contact (Spanish Wikipedia)
Administration portal
The Administration portal is where the wiki's administrators can access the main parts of the wiki.
- Portals Management
- Articles to review
- User Management
- Documentation
Intellectual property rights
The legal collaboration framework is defined by the license references. As in the most important wikis, the Wiki for Open Contents will show the following points at the bottom of the common view:
- GNU Free Documentation License
- Copyrights
- Privacy policy
- About wiki for open technologies
- Disclaimers
External Search
This section summarizes the requirements for the External Search Service, which is provided by the MySearch service and specializes in open source content.
The service will be provided as follows:
- Through the left menu option "Search", it will be possible to search for related information internally within this wiki and externally through the Internet.
- The results will be presented separately:
  - Page title matches: articles whose title exactly matches the search criteria.
  - Page text matches: articles whose content contains the search criteria.
  - External matches: external sites whose content contains the search criteria.
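The three result groups described above can be sketched as follows. This is only an illustration in Python, not the MySearch implementation: the function name `group_results`, the in-memory `articles` dictionary and the `external_hits` list are all assumptions made for the example, and the external hits are assumed to have been matched already by the external search engine.

```python
def group_results(query, articles, external_hits):
    """Split search results into the three groups shown on the results page.

    articles: dict mapping article title -> article text (hypothetical
              in-memory stand-in for the wiki's pages).
    external_hits: list of (site_title, summary) pairs, assumed to be
                   already matched by the external search engine.
    """
    needle = query.strip().lower()
    groups = {"title": [], "text": [], "external": []}
    for title, text in articles.items():
        if title.lower() == needle:
            # Exact title match; such an article is not repeated under "text".
            groups["title"].append(title)
        elif needle in text.lower():
            groups["text"].append(title)
    for site_title, _summary in external_hits:
        groups["external"].append(site_title)
    return groups


if __name__ == "__main__":
    sample_articles = {
        "Java": "Java is a programming language.",
        "Android": "Android apps are often written in Java.",
        "Python": "A different language.",
    }
    print(group_results("java", sample_articles, [("Java Tutorials", "Learn Java")]))
```

In this sketch an article that matches by title is listed only once, under the title matches, even if its text also contains the search criteria.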
Proposed modifications
The proposed modifications are listed below.
- Number of results to show. The total number of results must be shown. Currently, the number shown by default only represents the number of wiki results.
- Direct links to Search Subsections. It is necessary to add direct links to each of the search subsections at the top of the page. When the number of results is high, e.g. higher than the configured number of shown results (20), the user has to scroll through the whole page in order to view the external search results. This may be uncomfortable for some users.
- Advanced search option for External Search Service. Additionally, the external search service may provide the user with different options to make the search more accurate. It is necessary to add a direct link to an "Advanced Search" option.
  - The options presented need to be discussed according to the search engine features.
- External matches. This section shows the configured number of results matching the search criteria. Each result is listed with its site title and summary. See next figure.
  - It is necessary to perform some modifications to highlight the results:
    - Search terms must be highlighted in the summary by printing them in red, as is done for internal results. For example, a search for "java programming" produces the results shown previously, so the words "java" and "programming" should appear in red.
    - URL addresses should appear in blue.
- Tools for user feedback. A wiki is essentially a collaborative tool for users, and collaboration is the main goal of this one. It is therefore essential to provide ways for users to give feedback on the search facilities. It is proposed to include these specific functions:
  - Form to add a URL to the search results provided. This feature must be checked against the current features of the MySearch search engine. Ideally, the user would add the URL and title of new entries for the search engine index.
  - Form to propose articles to be created.
    - When a user is searching and there is no internal article associated with the criteria, the option to create a page is given to the user. But the user may not be an expert in the field or may not have the time to create an article from scratch. Thus, it is necessary to establish a mechanism to propose the creation of such an article.
    - Wikipedia provides a Special page where proposed articles for creation are listed by category (Economics, Science, Arts, Technology). We can adopt a similar approach by defining our own categories for new articles. The page should be "Special" and editable only by administrators, but also modifiable through this form by any registered user.
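The highlighting proposed for external matches (search terms printed in red inside the summary) could look roughly like the sketch below. It is written in Python for illustration only; the function name `highlight_terms`, the inline `style` attribute and the whole-word matching rule are assumptions, not the behaviour of MySearch or of MediaWiki's internal search.

```python
import html
import re


def highlight_terms(summary, query, color="red"):
    """Return the summary as HTML with each search term wrapped in a coloured span."""
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return html.escape(summary)
    # Whole-word, case-insensitive match on any of the terms (assumed rule).
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    # With one capturing group, re.split returns the non-matching text at
    # even indexes and the matched terms at odd indexes.
    parts = pattern.split(summary)
    out = []
    for index, part in enumerate(parts):
        escaped = html.escape(part)
        if index % 2 == 1:
            out.append(f'<span style="color:{color}">{escaped}</span>')
        else:
            out.append(escaped)
    return "".join(out)


if __name__ == "__main__":
    print(highlight_terms("Java programming guide", "java programming"))
```

The summary is HTML-escaped piece by piece so that text coming from external sites cannot inject markup into the results page.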
Search Infrastructure
1. Nutch
External Search is based on Nutch & Solr. Nutch (v1.0) is open source web-search software. It builds on Lucene Java, adding web-specifics such as a crawler, a link-graph database, parsers for HTML and other document formats, etc. Nutch works as a vertical search engine: it searches particular sites, delving into their different levels, so it is focused on specific slices of content. This is useful when searches target one specific type of user. For example, on a web site focused on technology, a user looking for articles about Java is interested in Java as software, not in results about the island of Java.
External search consists of three parts:
- Crawler
- Searcher
- Web Service
Crawler
The web crawling, or spidering, process is done in this module. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine (here, Solr), which indexes the downloaded pages to provide fast searches.
Nutch is mainly a web crawler. It can be configured through a set of parameters to index a set of pages and store the data in a database. The crawling starts with a list of URLs to visit, called the seeds, which are provided in a config file. As the crawler visits these URLs, it identifies all the hyperlinks in each page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
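The seed and frontier mechanism described above can be illustrated with a small breadth-first sketch in Python. It is not how Nutch is implemented: the function `crawl`, the `fetch_links` callback, the depth limit and the URL-prefix filter are all assumptions chosen to stand in for "a set of policies", and the example uses an in-memory link graph instead of real HTTP requests.

```python
from collections import deque


def crawl(seeds, fetch_links, max_depth=2, allowed_prefixes=None):
    """Visit URLs breadth-first, starting from the seeds.

    fetch_links: callable returning the hyperlinks found on a URL
                 (hypothetical; a real crawler would download and parse the page).
    max_depth / allowed_prefixes: example policies limiting the crawl.
    """
    frontier = deque()
    seen = set()
    for url in seeds:
        if url not in seen:
            seen.add(url)
            frontier.append((url, 0))
    visited = []
    while frontier:
        url, depth = frontier.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # policy: do not follow links beyond max_depth
        for link in fetch_links(url):
            if link in seen:
                continue
            if allowed_prefixes and not link.startswith(tuple(allowed_prefixes)):
                continue  # policy: stay inside the configured sites
            seen.add(link)
            frontier.append((link, depth + 1))
    return visited


if __name__ == "__main__":
    # Tiny hand-written link graph used instead of the network.
    graph = {
        "http://a.example/": ["http://a.example/1", "http://b.example/"],
        "http://a.example/1": ["http://a.example/2"],
        "http://a.example/2": ["http://a.example/3"],
    }
    print(crawl(["http://a.example/"], lambda u: graph.get(u, []),
                allowed_prefixes=["http://a.example/"]))
```

Restricting the frontier to a list of allowed prefixes is one simple way to get the vertical, site-focused behaviour described for this search service.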
Searcher
Data from visited pages are stored locally in a set of folders. These data are accessed through Nutch, which implements a mechanism to search topics by keywords. Nutch also allows filters to be applied to a search.
Web Service
This is the service which sends requests from the MySearch extension to Nutch. This Java web service runs in an Apache Tomcat server.
2. Solr
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.
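As an illustration of the HTTP API mentioned above, the following Python sketch builds a search URL for Solr's select handler, asking for JSON output and hit highlighting. The base URL `http://localhost:8080/solr` and the field name `content` are assumptions for the example and depend on how the server and schema are actually configured; the sketch only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, terms, rows=20, start=0):
    """Build a URL for Solr's /select handler.

    base_url: root of the Solr web application (hypothetical value in the example).
    rows/start: page size and offset of the result page.
    """
    params = {
        "q": terms,
        "rows": rows,
        "start": start,
        "wt": "json",         # response format
        "hl": "true",         # enable hit highlighting
        "hl.fl": "content",   # field to highlight (assumed field name)
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(build_solr_query_url("http://localhost:8080/solr", "java programming"))
```

The default of 20 rows mirrors the configured number of shown results mentioned in the proposed modifications.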
For more information on how to configure and use the platform, see here.
Remote Content
The Remote Content feature allows the user to embed remote content from another wiki in any article written in this wiki. It is based on requests to the MediaWiki API: an article sends a request to the remote API, and the remote API returns the answer in XML (or JSON, etc.) format. The local wiki processes the answer, so remote content is synchronized with local content: when anybody changes the remote content, the change is reflected in the local article.
The goal of this feature is to create a network of wikis. As a result, articles become visible to more users. In addition, the user gets synchronized content.
Some old MediaWiki versions have an API that requires a login before it answers a query request. This feature does not support such wikis.
To embed remote content, the user only has to follow these steps:
1. Create an article in the usual way.
2. Type the following content :
<interwiki> Domain=http://en.wikipedia.org APIprefix=w Prefix=wiki Article=Ruby_on_Rails Templates=no </interwiki>
- Domain : The domain of the remote content. In the example, the article is fetched from the http://en.wikipedia.org domain.
- APIprefix : The path prefix under which the remote API answers. In the example w -> http://en.wikipedia.org/w/api.php
- Prefix : The prefix added to the domain to create the remote links. In the example wiki -> http://en.wikipedia.org/wiki/Ruby_on_Rails
- Article : The article name. In the example Ruby_on_Rails.
- Templates : Each wiki has its own features and integrated extensions, so the rendering process could show strange results. For this reason, the platform allows remote features and remote templates to be customized.
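To make the role of each parameter concrete, the sketch below parses the body of an `<interwiki>` tag and derives the API URL and the article link from it. It is written in Python purely for illustration and is not the Interwiki extension's own code; the function names `parse_interwiki` and `remote_urls` are assumptions.

```python
def parse_interwiki(body):
    """Turn 'Key=value Key=value ...' into a dictionary of parameters."""
    params = {}
    for token in body.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without '='
            params[key] = value
    return params


def remote_urls(params):
    """Build the remote API URL and the remote article link."""
    domain = params["Domain"].rstrip("/")
    api_url = f"{domain}/{params['APIprefix']}/api.php"
    article_url = f"{domain}/{params['Prefix']}/{params['Article']}"
    return api_url, article_url


if __name__ == "__main__":
    body = ("Domain=http://en.wikipedia.org APIprefix=w Prefix=wiki "
            "Article=Ruby_on_Rails Templates=no")
    print(remote_urls(parse_interwiki(body)))
```

Only the first `=` in each token separates the key from the value, so values such as URLs are kept intact.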
External API
MediaWiki APIs are usually available over HTTP. The API path may differ from one wiki to another; for example, on Wikipedia it is http://en.wikipedia.org/w/api.php. Using some parameters, third-party applications can make requests and retrieve data from the remote wiki.
1. Login (modern versions do not require a login to request article data):
http://en.wikipedia.org/w/api.php?action=login&lgname=user&lgpassword=password
2. Query:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&meta=siteinfo&titles=Main%20Page&rvprop=content
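A third-party application could build such a query and read the article text from the answer roughly as sketched below, in Python. The helper names `build_query_url` and `extract_wikitext` are assumptions, `format=json` is added to request JSON output, and the sample response is hand-written to show the expected shape of the legacy JSON format (content under the `*` key), not captured from a real server.

```python
import json
from urllib.parse import urlencode


def build_query_url(api_url, title):
    """Build a query URL asking for the current wikitext of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def extract_wikitext(response_text):
    """Map each returned page title to the wikitext of its revision."""
    data = json.loads(response_text)
    result = {}
    for page in data["query"]["pages"].values():
        revisions = page.get("revisions")
        if revisions:  # missing pages carry no revisions
            result[page["title"]] = revisions[0]["*"]
    return result


if __name__ == "__main__":
    print(build_query_url("http://en.wikipedia.org/w/api.php", "Main Page"))
    # Hand-written sample in the shape of a JSON answer (illustrative only).
    sample = ('{"query": {"pages": {"1": {"pageid": 1, "title": "Main Page", '
              '"revisions": [{"*": "Welcome to the wiki"}]}}}}')
    print(extract_wikitext(sample))
```

A real client would fetch the URL over HTTP and pass the response body to `extract_wikitext`; the network call is left out so the sketch stays self-contained.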





