
Improving the forum engine on the client side

by admin

Many of us know that a website’s appearance can be changed locally, without modifying any files on the server. There are different ways to do this; the most popular are custom scripts and styles, which the browser applies automatically to each loaded page. Of course, such modifications are very limited, and they cannot solve any serious problem that requires a non-standard request to the database.
Nevertheless, even then the situation may not be hopeless. In this article I want to describe my own experience of creating a local add-in for the phpBB2 forum engine that fixes the display of the read status of topics and posts. Although the result is a fully working product that I now use all the time, the purpose of this article is not to present the product but to describe my approach to the problem. I am publishing the code of the resulting program, but because of the specifics of the task it is not universal and cannot be used as is, without preliminary (and rather laborious) adaptation to the engine of a particular site. I am warning you about this in advance to avoid disappointment.
Now let’s look at the problem statement in more detail.
Although the third version of phpBB has long been the current one, the second is still quite common on the Internet. Here is a list of the phpBB2 shortcomings that my tool is designed to combat:

  1. After you re-login or restart your browser, or after a certain period of inactivity, all unread topics are automatically marked as read.
  2. When you open a topic, the whole topic is marked as read, whichever page you opened.
  3. Read status is stored in cookies as a serialized array. The maximum size of a cookie is usually limited to 4 KB, which allows the status to be stored for only ~140 topics. For an active forum, this is very little.

Some might shrug: what’s the problem? The problem is that if you are trying to keep track of discussions, the orange sheet icon that marks unread topics and posts (in the base skin) is a great way to see quickly which topics have been updated and which have not. Suppose you visit the forum just to reply quickly to one post and postpone reading the other updated topics. On your next visit the unread status will have been reset, and you will have to search for new messages manually, remembering when you last visited each topic and checking the dates or texts of the posts that have appeared since then. And even if you always read all the threads at once, there is no guarantee: browsers can crash, computers can crash, and the power can go out (and not everyone has a UPS).
Of course, the most correct solution would be to modify the forum engine, but in practice this is not always possible. So I decided to try to solve the problem myself, on the client side. A complete, ideal solution is out of the question without direct access to the database on the server. In most cases it is impossible to determine whether a given sub-forum contains unread topics, and so to display its icon correctly on the main page (unless you download and analyze every page of its topic list, which takes too long). Nevertheless, some significant improvements are quite realistic.
The general architecture of the add-in is as follows. On the client machine there is a local database that stores the timestamp of the last visit to each topic, plus a proxy server through which all forum pages pass. The proxy corrects the labels of topics and messages according to the real read status taken from the local database (and, of course, updates the database as necessary).
As the database format I chose a text file consisting of lines of the form "000000000000000000". Each line corresponds to one topic: the line number (counting from zero) is the topic identifier, and the content of the line is the timestamp of the last visit in UNIX time. There is no topic zero, so line zero is used in a special way: it stores a date/time for the whole forum, so that all topics can be marked as read at once. Thus, for each topic, the real time of the last visit is the maximum of two values: the forum-wide one and the one for that particular topic.
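The "maximum of two values" rule can be illustrated with a minimal sketch. It is written in Python rather than the Perl of the actual proxy, and the function names, the sample timestamps and the comparison used to decide that a post is unread are assumptions made for illustration, not the author's code.

```python
def effective_last_visit(records, topic_id):
    """Return the later of the forum-wide time (line 0) and the topic's own time."""
    forum_wide = records[0]
    return max(forum_wide, records[topic_id])


def is_unread(records, topic_id, post_time):
    """Assumed rule: a post is unread if it is newer than the effective last visit."""
    return post_time > effective_last_visit(records, topic_id)


# Hypothetical in-memory copy of the file: index 0 is the forum-wide mark,
# topic 1 was never opened, topic 2 was opened after the forum-wide mark.
records = [1297600000, 0, 1297677464]
```

Here topic 1 inherits the forum-wide mark, while topic 2 keeps its own, later timestamp.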
Why did I opt for a text format? First, it makes it easy to correct errors, if necessary, without resorting to a binary editor. Second, my proxy server is written in Perl, which works with text files more conveniently than with binary ones. Parsing a number from a text string is not a resource-intensive operation, and thanks to the fixed line length you can jump directly to the desired entry by its index, without reading the whole file line by line.
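Direct access by index might look roughly like the sketch below. It is in Python, not the Perl of the real proxy, and it assumes 18 digits per line (as in the example line) followed by a single "\n" byte; the function names and the file name are hypothetical.

```python
import os
import tempfile

WIDTH = 18          # digits per line, as in the example "000000000000000000"
RECORD = WIDTH + 1  # assumption: every line ends with a single "\n" byte


def _check_index(f, topic_id):
    """Refuse indexes outside the pre-created file instead of growing it."""
    size = f.seek(0, os.SEEK_END)
    if topic_id < 0 or (topic_id + 1) * RECORD > size:
        raise IndexError("no line for topic %d" % topic_id)


def read_stamp(path, topic_id):
    """Jump straight to the topic's line and parse its timestamp."""
    with open(path, "rb") as f:
        _check_index(f, topic_id)
        f.seek(topic_id * RECORD)
        return int(f.read(WIDTH).decode("ascii"))


def write_stamp(path, topic_id, stamp):
    """Overwrite the topic's line in place with a zero-padded timestamp."""
    text = str(stamp).zfill(WIDTH)
    if stamp < 0 or len(text) > WIDTH:
        raise ValueError("timestamp does not fit into one line")
    with open(path, "r+b") as f:
        _check_index(f, topic_id)
        f.seek(topic_id * RECORD)
        f.write(text.encode("ascii"))


# Small demonstration on a temporary ten-line file.
with tempfile.TemporaryDirectory() as tmp:
    db = os.path.join(tmp, "forum.db")
    with open(db, "wb") as f:
        f.write((b"0" * WIDTH + b"\n") * 10)
    write_stamp(db, 7, 1297677464)
    demo_value = read_stamp(db, 7)
    demo_untouched = read_stamp(db, 6)
```

Because every line has the same length, the byte offset of a topic is simply its identifier multiplied by the record size, so neither function has to scan the file.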
As for the proxy server implementation, I chose Perl because it is very convenient for working with text (HTML parsing is, of course, the key part). By my standards the resulting speed is quite acceptable (in any case, the potential performance gain from switching to another language looks less significant to me than the time and effort involved). The proxy server listens on its own port and, having received a request from the browser, forwards it to the destination server, reads the response and sends the received page back to the browser. On its way, the page passes through a filter whose behavior depends on the destination address. If the address belongs to one of the forums we are interested in, and specifically to one of the scripts viewtopic.php, viewforum.php or index.php (or just the root URL of the forum), the filter parses the page and replaces the topic and post labels based on the date/time of the last visit taken from the local database. Otherwise the filter does nothing and simply passes the unmodified content to the browser.
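The decision "filter this page or pass it through" can be sketched as follows. This is a Python illustration, not the Perl code of the proxy; the host name, the forum root path and the function name are hypothetical, and only the script names come from the description above.

```python
from urllib.parse import urlsplit

# Scripts whose output gets rewritten; "" stands for the forum root URL.
FILTERED_SCRIPTS = {"viewtopic.php", "viewforum.php", "index.php", ""}

# Hypothetical registry of supported forums: host -> root path of the forum.
FORUMS = {"forum.example.org": "/phpBB2/"}


def should_filter(url):
    """Return True if the page at this URL should go through the HTML filter."""
    parts = urlsplit(url)
    root = FORUMS.get(parts.hostname or "")
    if root is None or not parts.path.startswith(root):
        return False  # not one of our forums: pass through unmodified
    script = parts.path[len(root):]
    return script in FILTERED_SCRIPTS
```

With this registry, `should_filter("http://forum.example.org/phpBB2/viewtopic.php?t=5")` is true, while a request for `profile.php` on the same forum or for any other host is passed through untouched.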
The most difficult and unpredictable part is the parsing. Each forum may have its own themes and extensions installed, so the HTML received from the server varies very widely. To account for the peculiarities of different forums, I hardcoded only the key structural features of the phpBB2 engine and moved the forum-specific marker strings into separate modules, which the proxy server loads at startup and which let it process each forum by its own set of rules. Of course, if you need to add support for a new forum, you will have to pick out and configure the marker strings manually, by analyzing the HTML page received from the server. If the changes to the engine are minor, that will be all. But the engine may also have been heavily modified, so that the HTML structure is completely different. Then the entire core module will have to be reworked, and it is not certain that support for multiple forums can be kept at all. It may be easier to run a separate proxy customized for one such heavily modified forum.
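The split between a generic core and per-forum rule sets could be organized along these lines. This is only a Python sketch of the idea: the class, the host name and the icon file names are hypothetical placeholders, and the real marker strings have to be taken from the HTML of the specific forum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForumRules:
    """Forum-specific strings that the generic core looks for (all hypothetical)."""
    host: str
    read_icon: str
    unread_icon: str


# One entry per supported forum, loaded once at startup.
RULES = {
    rules.host: rules
    for rules in [
        ForumRules("forum.example.org", "folder.gif", "folder_new.gif"),
    ]
}


def relabel(fragment, rules, unread):
    """Swap the status icon in one HTML fragment to match the local read status."""
    wanted = rules.unread_icon if unread else rules.read_icon
    other = rules.read_icon if unread else rules.unread_icon
    return fragment.replace(other, wanted)
```

Adding a forum with a different theme then means adding one more `ForumRules` entry rather than touching `relabel`, as long as the page structure itself is still standard.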
For the parsing to work correctly you will also have to change your forum profile. By default the engine outputs the date/time of posts without seconds, and since we cannot get the time from anywhere else, the marking error would be up to one minute, which is, of course, too much. Therefore you need to choose a fuller time format, with seconds, in your profile settings. I settled on the format "D d.m.Y, G:i:s", which outputs the date as "Mon 14.02.2011, 10:57:44" (the proxy server does not need the day of the week, of course, but you should not forget about your own comfort). If you prefer another format, you will need to adjust the timestamp parsing function accordingly. Do not forget about phpBB modifications that can insert the words "today" or "yesterday" instead of the date. Unfortunately, all of this will also require tweaking the code.
A few more points remain to be mentioned.
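A parsing function for this particular format might look like the Python sketch below (the real proxy does this in Perl). It assumes that the board time is UTC, which will not be true for every profile, and it does not handle the "today"/"yesterday" variants; the function name is hypothetical.

```python
import calendar
from datetime import datetime


def parse_post_time(text):
    """Convert a "D d.m.Y, G:i:s" string into a UNIX timestamp (UTC assumed)."""
    # The day of the week is only there for the reader, so drop it.
    _, _, rest = text.strip().partition(" ")
    parsed = datetime.strptime(rest, "%d.%m.%Y, %H:%M:%S")
    return calendar.timegm(parsed.timetuple())


stamp = parse_post_time("Mon 14.02.2011, 10:57:44")
```

Skipping the weekday instead of parsing it keeps the function independent of the language in which the forum prints day names.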

  • Perhaps the main problem is that it is extremely inconvenient to run all traffic through such a proxy server on a regular basis. You can put up with a slowdown on a single forum, but slowness in everyday browsing is very unpleasant. I solved this for myself as follows: I have long used Proxomitron to cut out ads and add all sorts of extras. Among other things, it can use different proxy servers depending on the requested address. I simply created rules so that the forums I need load through my proxy (and not all of their pages, only the ones that get processed), while everything else goes direct, so this drawback does not bother me. Those who do not use Proxomitron can look for an alternative or just set up quick manual proxy switching in their browser (if the browser allows it).
  • I did not want to write a full-fledged HTTP server or hunt for a ready-made one, so I limited myself to a simple, self-written and rather limited implementation (only HTTP/1.0 is supported, with gzip compression turned off). That is good enough for me personally, but it may not be enough for others.
  • There is another phpBB problem that has been bugging me for a while. If I open or refresh several forum pages at once, the engine logs me out (which, combined with the reset of topic read statuses, used to infuriate me). I have not figured out the reason for this behavior yet, but my proxy server gave me a way around it as well. It was originally multithreaded, so that processing one page would not hold up the others. I commented out the multithreading code and got slower but strictly sequential page loading, so now I can open a bunch of links in background tabs without worrying about being logged out (it would no longer be so fatal, but I still do not want to re-enter my login and password and refresh a bunch of open pages). This solution has its disadvantages, of course: if one page hangs for some reason, the rest will not load, and you have to either wait for the timeout or restart the proxy server. And when working with several forums, it is wrong for one forum to wait while another is loading. A solution could be to run several independent threads, one per forum, or to run several proxy instances on different ports. I did not remove the multithreading code but only commented it out, so you can uncomment it if you want and use the normal multithreaded version.
  • The proxy server cannot track the number of lines in the database and add new ones as needed (I’ll be honest: I was too lazy to implement that), so you need to create a file filled with zero lines beforehand. The number of lines depends on the forum (more precisely, on the number of topics in it). To avoid problems, it is better to create the file with a generous margin.
  • The proxy server also supports the phpBB mod "View single post" (the viewpost.php script). When you view an individual post, the visit time in the database is not updated, because otherwise it would be easy to miss unread posts that precede the one being viewed. If anyone thinks this is wrong, the code can easily be changed by removing a single if.
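Creating the zero-filled database file in advance, as mentioned in the list above, can be done with a few lines of code. The Python sketch below assumes 18-digit lines ending in "\n"; the function name, the file name, the topic count and the default margin factor of 2 are hypothetical choices, not values from the original program.

```python
import os
import tempfile


def create_database(path, topic_count, margin=2.0, width=18):
    """Write a file of zero lines with room for topic_count * margin topics."""
    lines = int(topic_count * margin) + 1  # +1 for line zero (forum-wide time)
    # Mode "x" refuses to overwrite an existing database by accident.
    with open(path, "x", newline="\n") as f:
        f.write(("0" * width + "\n") * lines)
    return lines


# Demonstration in a temporary directory for a forum with about 5000 topics.
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "forum.db")
    line_count = create_database(db_path, topic_count=5000)
    file_size = os.path.getsize(db_path)
```

Even with a generous margin the file stays small: each topic costs only one short line.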

And if anyone is still not scared off by all this hassle, the server itself can be downloaded from here (archive, 5 Kb). It contains the rule set for the official Total Commander forum, which I used as the basis for my experiments and for which, essentially, all of this was made.
