Relation of wiki and manuals

From MEPIS Documentation Wiki


There is some effort underway to automate parts of the data transfer from this wiki to the MEPIS Manual; see e.g. Talk:Relation of wiki and manuals and Talk:KDE4 desktop. It is not yet clear which tools to use, who would do the work, or whether it would really help. There are, however, some tiny fragments of code that do something like this, along with some related knowledge and related docs. This page collects these useful things.


Code fragments

Get list of all pages in category Manual

 awk <Category\:Manual '/Articles in category/,/Retrieved from/'|
   fgrep "li>"|tr '"' ' '|awk '{print $3}'|tr / ' '|awk '{print $NF}' >man_list

(This may not work for pages with spaces in their names; it should be tested.) (It will likely not work when the pages are listed in multiple columns.)

This fragment is a real mess. Parsing formatted HTML output is a desperate programming paradigm, and doing it with a shaky pile of shell commands does not help. Still, I think HTML is the best way to get this list out of the wiki, as we are using a public interface. (XML export cannot be used for special pages.)
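As a possible alternative, here is a minimal sketch that reads the page name from the title attribute of each link instead of cutting up the href. It assumes every entry on the saved category page looks like <li><a href="..." title="Page name">...</a></li>; that layout, the function name category_titles and the sample input are assumptions made for illustration, not something checked against this wiki. Splitting the input at every "<" should let it cope with several entries on one line, and spaces are turned into underscores because that is how MediaWiki writes page names in URLs.

```shell
# Hypothetical helper: print one page name per line from a saved category page.
# Assumed entry layout: <li><a href="..." title="Page name">Page name</a></li>
category_titles() {
    awk '/Articles in category/,/Retrieved from/' |
        grep '<li>' |
        tr '<' '\n' |
        sed -n 's/^a [^>]*title="\([^"]*\)".*/\1/p' |
        tr ' ' '_'
}

# Made-up sample standing in for the saved Category:Manual page.
printf '%s\n' \
    '<h2>Articles in category "Manual"</h2>' \
    '<ul><li><a href="/index.php/Bluetooth" title="Bluetooth">Bluetooth</a></li>' \
    '<li><a href="/index.php/KDE4_desktop" title="KDE4 desktop">KDE4 desktop</a></li></ul>' \
    '<div class="printfooter">Retrieved from ...</div>' |
    category_titles
```

With the real page this would be run as category_titles <Category\:Manual >man_list. Titles containing character entities such as &amp; are not handled.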

Download xml text of all pages in category Manual

 while read -r x; do wget -O "$x.input" 'http'://"$x"; done <man_list

The XML export is a nice and clean way to get data out of the wiki, likely the best one.

Convert xml-encoded text to clean text only

 awk '/<text xml:/,/<\/text>/' <Bluetooth.input | sed -e 's/.*<text xml:space="preserve">//' -e 's/<\/text>//' >Bluetooth.txt

A tool that does real XML parsing and element extraction would be better, but treating the export as a piece of text is not so bad either.

At this stage we have the text that is available when a user starts editing a page. We could also try to pretend that our script is a user going to edit the page and get the text that way, but unfortunately this cannot be done with wget; there is likely some JavaScript action involved.
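One thing the text-only approach leaves behind is XML character entities: in an XML export the characters <, >, & and " inside the page text are normally escaped. A minimal sketch for turning them back is shown here; the function name decode_entities is made up, and only these four entities are handled. The &amp; rule comes last so that a literal "&amp;lt;" in the page ends up as "&lt;" and not as "<".

```shell
# Hypothetical helper: undo the four common XML escapes on stdin.
# &amp; is decoded last so that already-decoded text is not decoded twice.
decode_entities() {
    sed -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e 's/&amp;/\&/g'
}

# Made-up sample line.
printf '%s\n' 'Press &lt;b&gt;Fn&lt;/b&gt; &amp; wait' | decode_entities
```

It could be appended to the extraction step, e.g. ... | decode_entities >Bluetooth.txt.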

Convert wiki-style links to url-style links

 sed -e 's/\[http:\([^ ]*\) \([^]]*\)]/<a href="http:\1">\2<\/a>/g'   <Bluetooth.txt >Bluetooth.out
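The fragment above only catches links starting with http:. A small variation that also accepts https: could look like this sketch; the function name wiki_links_to_html and the sample line are made up, and bare links without a label ([http://...]) are still left untouched.

```shell
# Hypothetical helper: convert [http://url label] and [https://url label]
# on stdin into <a href="url">label</a>.
wiki_links_to_html() {
    sed -e 's/\[\(https\{0,1\}:[^] ]*\) \([^]]*\)\]/<a href="\1">\2<\/a>/g'
}

# Made-up sample line.
printf '%s\n' 'See [http://example.org/bt Bluetooth page] now' | wiki_links_to_html
```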

The resulting file can be viewed as the source of this page: Bluetooth.out

(The category was changed by hand to avoid interference with the script that collects all Manual pages.)

Changing ====Title==== into <h4>Title</h4>

 sed -e 's/^====\(.*\)====/<h4>\1<\/h4>/'

Changing ===Title=== into <h3>Title</h3>

 sed -e 's/^===\(.*\)===/<h3>\1<\/h3>/'

Changing ==Title== into <h2>Title</h2>

 sed -e 's/^==\(.*\)==/<h2>\1<\/h2>/'
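The order of the three heading rules matters: the pattern for == also matches lines starting with === or ====, so the longest marker has to be handled first. A sketch that combines the three expressions above in a safe order (the function name wiki_headings_to_html and the sample titles are made up):

```shell
# Hypothetical helper: convert wiki headings on stdin to HTML headings.
# Longest marker first; otherwise ====Title==== would become <h2>==Title==</h2>.
wiki_headings_to_html() {
    sed -e 's/^====\(.*\)====/<h4>\1<\/h4>/' \
        -e 's/^===\(.*\)===/<h3>\1<\/h3>/' \
        -e 's/^==\(.*\)==/<h2>\1<\/h2>/'
}

# Made-up sample lines.
printf '%s\n' '==Setup==' '===Pairing===' '====Troubleshooting====' |
    wiki_headings_to_html
```

Once a line has been rewritten it starts with "<", so the later, shorter rules no longer touch it.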

Unordered lists

*SomeText converts to <li>SomeText</li>. Frame the whole list with the tags <ul> and </ul>

 awk '/^\*/&&!i{print "<ul>";i=1} !/^\*/&&i{print "</ul>";i=0} /^\*/{print "<li>" substr($0,2) "</li>";next} {print} END{if(i)print "</ul>"}'

Ordered lists

#SomeText converts to <li>SomeText</li>. Frame the whole list with the tags <ol> and </ol>

 awk '/^#/&&!i{print "<ol>";i=1} !/^#/&&i{print "</ol>";i=0} /^#/{print "<li>" substr($0,2) "</li>";next} {print} END{if(i)print "</ol>"}'
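Both kinds of lists could also be handled in a single pass. The following is only a sketch (the function name wiki_lists_to_html and the variable names are made up): it remembers which list is currently open, closes it when the list type changes or a non-list line arrives, and also closes a list that runs to the end of the file. Nested lists (** or ##) are not handled.

```shell
# Hypothetical helper: wrap runs of *item lines in <ul> and #item lines in <ol>.
wiki_lists_to_html() {
    awk '
        # Close the list that is currently open, if any.
        function end_list() {
            if (cur != "") { print "</" cur ">"; cur = "" }
        }
        {
            kind = ""
            if ($0 ~ /^\*/) kind = "ul"
            else if ($0 ~ /^#/) kind = "ol"

            # List type changed (or the list ended): close old, open new.
            if (kind != cur) {
                end_list()
                if (kind != "") { print "<" kind ">"; cur = kind }
            }

            if (kind != "") print "<li>" substr($0, 2) "</li>"
            else print
        }
        END { end_list() }
    '
}

# Made-up sample input.
printf '%s\n' 'Intro' '*one' '*two' '#first' 'Outro' | wiki_lists_to_html
```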

Other desired transforms

  • ...

Packing the results

 tar czvf for-manual.tgz *.out

Full script and HTML fragments for manual

Full script using the above fragments, all intermediate files and the resulting HTML fragments for-manual.tgz can be found at

The only way to trigger an update there is to ask User:Hanzl. If he is not in the right mood to do that, you can also get the script and run it yourself (if you dare):

 mkdir for-manual
 cd for-manual
 chmod +x get-from-wiki

Keep in mind that you are running a script written by somebody else, and any error in it could result in, e.g., the removal of all your files. To play it safe, you could create a special user account just for this.

Feel free to modify the code fragments, make your own full scripts, and put it all on your own web server if you feel that User:Hanzl has lost interest in this, lost his internet connection, or whatever.


XML Export format
