
Alfresco Document Management System

by admin

A search of Habr did not turn up any detailed articles on Alfresco. In this article I will try to kill two birds with one stone: explain what the Alfresco system is and describe how we use it in our work.
How do you store documents in a small organization? The simplest way is on a local disk. If people need to work together, documents are sent by email or, the most popular option, kept on a network drive. Another great option is Google Docs, but I am not sure it is widely used in Russian practice.
I do not know exactly how large an organization has to be before it starts thinking about an electronic document management system, but I would guess somewhere around 50-100 employees working with documents.
When you think about an electronic document management system, the first things that come to mind are expensive solutions from well-known vendors such as Microsoft, EMC, 1C, etc. But there is an alternative to closed solutions: Alfresco, an open source document management system or, in the usual English terminology, an Open Source Enterprise Content Management system (ECM, CMS).
Alfresco’s competitors are closed products such as EMC Documentum, Open Text, and SharePoint. The Alfresco developers themselves describe these competitors as a legacy of the ’90s that is:

  • too expensive
  • too difficult to use, deploy, and scale
  • too difficult to modify to fit your needs
  • too "proprietary"

I will try to describe the system, and you can decide whether the developers are right.

    What is Alfresco

    Alfresco was originally conceived as an open source alternative to Microsoft SharePoint. In the course of its development it moved away from that goal, and it now provides a number of unique features not available in similar systems. Suffice it to say that Alfresco works stably over the SharePoint protocol via HTTPS.
    I think openness is Alfresco's main advantage: there is no vendor lock-in, and the system is free. Another advantage I see is that it is built on modern Java technologies such as Spring, JSF, Hibernate, and Lucene; new versions will use Spring Surf. And I know that big, serious businesses love Java-based systems.
    Users work with the system through a browser. It is also possible to work with files through Windows Explorer, as with an ordinary network folder (CIFS protocol), or via FTP. We work with the English version; a Russian localization is also available.
    [Screenshot: the standard Alfresco document management page]
    Alfresco lets you create, store, and modify documents, and much more. You can create a document directly in the system, either blank or based on your company’s templates. The system can search through the contents of documents and supports document versioning. The full history of changes is stored, so you can always see who added or deleted what.
    There is also a document workflow system, with the ability to change the workflow scheme on the fly. A good article on the subject: "Electronic document management or what not to do".

    Is it suitable for your application? Extensibility

    Alfresco is ready to use out of the box: you can download the free Community Edition, install it, and start using it today. It is very easy. There is also a paid Enterprise Edition; the main difference is the availability of technical support.
    Alfresco can be installed on both Windows and *nix-compatible systems; a Java Runtime Environment is required. The distribution includes a bundled OpenOffice, used for converting between document types and for extracting text for indexing and full-text search. Tomcat is also included and can be replaced with any suitable web container.
    Alfresco maintains its own user database. However, it is possible to auto-create users on first login or synchronize with an external source: LDAP, Microsoft Active Directory, company domain, etc.
    Industry-accepted ECM standards are supported. For example, Alfresco’s storage layer is moving smoothly from its own implementation of the JSR-170 standard to data access via CMIS, which removes the last restriction tying you to the storage supplied with Alfresco.
    The system works with documents in any format: Microsoft Office, OpenOffice, PDF, etc. If a required format is not in the list of supported formats, you can add your own module that converts it to one of the supported formats, and a chain of conversions to all the required output formats will then be built.
    An advantage of Alfresco as an open system is full access to the source code: you can change any part of the system, provided you have good specialists, of course. The license allows it.
    The system can be extended with extension modules. A module can contain anything: business logic, page styles, new pages, data model extensions, and new services. Extension modules can talk to Alfresco over a number of protocols, of which REST is the best supported. The user interface is meant to be implemented with Spring Surf; for everything else there are no restrictions. Java is used most often, with server-side JavaScript, Groovy, and JRuby less common. The main thing is to have CMIS support.
    You can abandon the standard web interface completely and implement your own. Alfresco is then used only as storage.
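    The idea of building a chain of conversions can be illustrated with a small sketch. This is not Alfresco's actual transformer code: the converter pairs and the function name `find_conversion_chain` are assumptions made up for the example, and the search is a plain breadth-first search over a graph of formats.

```python
from collections import deque

# Hypothetical direct converters: (source format, target format).
# These pairs are illustrative, not Alfresco's actual transformer registry.
DIRECT_CONVERTERS = {
    ("docx", "odt"),
    ("odt", "pdf"),
    ("pdf", "txt"),
    ("odt", "txt"),
}


def find_conversion_chain(source, target, converters=DIRECT_CONVERTERS):
    """Return the shortest list of formats leading from source to target, or None."""
    if source == target:
        return [source]
    # Build an adjacency map: format -> formats reachable in one step.
    graph = {}
    for src, dst in converters:
        graph.setdefault(src, []).append(dst)
    # Breadth-first search finds a chain with the fewest conversion steps.
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        for nxt in sorted(graph.get(path[-1], [])):
            if nxt in seen:
                continue
            if nxt == target:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None
```

    In this sketch, registering a single extra pair such as `("xyz", "odt")` is enough to make every format reachable from `odt` available for `xyz` documents as well, which is the effect the paragraph above describes.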
    To integrate with other software, various types of authentication are supported, and they can be chained together. For example, a user can enter the system through single sign-on. If the user arrives unauthenticated, Alfresco will try to authenticate them (it will ask for a username and password or a certificate, depending on how the system is configured).
    Alfresco has a very flexible data model and many ways to extend it, but that is a topic for a separate article. In short, the model supports multiple inheritance and is dynamic: you can add an aspect to an object at any time, and the object will gain all the properties of that aspect.
    Access to data and functionality can be configured flexibly. The authorization system operates with the notions of data object, permission, user, group, and role. Roles are assigned to users and groups while the application is running, and a role assignment can cascade to an entire subtree of data.
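    Chained authentication can be pictured as a list of authenticators tried in order. The sketch below is a toy model under that assumption; the function names, the request dictionary, and the in-memory user table are all hypothetical and are not Alfresco's API.

```python
class AuthenticationError(Exception):
    """Raised when no authenticator in the chain accepts the request."""


def sso_authenticator(request):
    # Hypothetical SSO step: trust a user name already established upstream.
    return request.get("sso_user")


def password_authenticator(request, users={"alice": "secret"}):
    # Hypothetical fallback: check a user name and password pair
    # against a tiny in-memory table (illustration only).
    name = request.get("username")
    if name in users and users[name] == request.get("password"):
        return name
    return None


def authenticate(request, chain):
    """Try each authenticator in order; the first one that returns a user wins."""
    for authenticator in chain:
        user = authenticator(request)
        if user is not None:
            return user
    raise AuthenticationError("no authenticator accepted the request")
```

    With `chain = [sso_authenticator, password_authenticator]`, a request that already carries an SSO identity is accepted by the first step, while a request without one falls through to the username and password check.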
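    A rough way to picture dynamic aspects is a node that gains properties whenever an aspect is attached. The sketch below is a simplified model, not Alfresco's content model API; the aspect names, their default properties, and the `Node` class are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical aspect definitions: aspect name -> default properties it contributes.
ASPECTS = {
    "versionable": {"version_label": "1.0"},
    "titled": {"title": "", "description": ""},
}


@dataclass
class Node:
    name: str
    aspects: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)

    def add_aspect(self, aspect):
        """Attach an aspect at runtime; the node gains that aspect's properties."""
        self.aspects.add(aspect)
        for key, default in ASPECTS[aspect].items():
            # Keep a value that is already set; only fill in missing properties.
            self.properties.setdefault(key, default)

    def has_aspect(self, aspect):
        return aspect in self.aspects
```

    A node created as `Node("contract.doc")` starts with no properties; after `add_aspect("titled")` it carries `title` and `description`, without any change to the node's type.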
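    Cascading role assignment can be sketched as a tree in which each node collects roles from its ancestors. This is a toy model, not Alfresco's permission service: the `Folder` class, the `inherit` flag, and the authority and role names used here are illustrative assumptions.

```python
class Folder:
    """A node in a toy content tree with locally assigned and inherited roles."""

    def __init__(self, name, parent=None, inherit=True):
        self.name = name
        self.parent = parent
        self.inherit = inherit      # whether roles cascade down from the parent
        self.local_roles = {}       # authority (user or group) -> set of roles

    def assign(self, authority, role):
        """Assign a role to a user or group on this node at runtime."""
        self.local_roles.setdefault(authority, set()).add(role)

    def roles_for(self, authorities):
        """Collect roles for a user and the user's groups, walking up the tree."""
        roles = set()
        node = self
        while node is not None:
            for authority in authorities:
                roles |= node.local_roles.get(authority, set())
            # Stop climbing as soon as a node opts out of inheritance.
            node = node.parent if node.inherit else None
        return roles
```

    Assigning a role on a parent folder makes it visible on every descendant that keeps inheritance switched on, while a folder created with `inherit=False` only sees roles assigned directly on it.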
    There are a large number of ready-made extensions to Alfresco.

    Number of users. Scalability

    Because Alfresco is open and free, you are not limited by the number of client licenses. Rather, you are limited by the performance of your servers and database and the scalability of the system.
    Based on our experience, a server with a 2.4 GHz Intel Core processor and 8 GB of memory is enough to serve up to a thousand registered active users. As the number of users grows, you need to analyze which parts of the system are the busiest. The system works reliably in a cluster, ensuring the integrity and freshness of data, but it needs to be configured properly; more on this below.
    There are examples of Alfresco deployments in a large non-profit organization in Russia with a user base of 40,000 or more. Foreign deployments also include cases with hundreds of thousands of active users, or with a much smaller number of users but multi-terabyte storage.

    Our experience with Alfresco

    The system is used in a company that is Europe’s largest software producer. Estimated number of internal users: 30 thousand. Expected number of external users: more than 3 million.
    Alfresco was chosen as the only ECM system on the market that offered good enterprise support, an implementation of the SharePoint protocol, and available examples of deployments with 1000+ users. As far as I know, Microsoft SharePoint had no such examples, and it may not have met the other criteria either.
    The repository currently holds ~2000 documents of 5-10MB each.
    Major customizations made:

    • Changed the look of the system; added headers and company logos where needed.
    • Modified Alfresco to work with the application server, database, and authentication system adopted as standard within the company.
    • Bound Alfresco to existing metadata in the company portal, such as country registries, customer categories, etc.
    • Added a module for creating so-called "projects" from templates and for creating documents from templates.
    • Built an access differentiation system. According to Alfresco representatives, this is the only implementation that uses Alfresco’s access control system so deeply.
    • Added publishing of documents that have passed the workflow steps to other resources in the company, and reverse import of documents into the system.
    • Significantly changed the standard workflow in accordance with company standards.
    • Implemented the ability to customize the workflow on the fly through the user interface, including sending notifications to those responsible for the job at each step.
    • Integrated a third-party library for converting documents and extracting data from them.

    The system is already in production. We have run into a number of problems, some of which have not yet been solved.
    For example, on a developer’s local machine the system runs quite briskly. However, at the client’s site, in a cluster of 5 application servers, the system sometimes starts to slow down for no apparent reason. We still have not solved this, even though we have involved the Alfresco developers.
    Unfortunately, our architecture is built so that the indexes of the search engine (Lucene) are stored on a network disk. This seriously contradicts the developers’ recommendations, and we often find that the indexes become corrupted.
    Another problem is OpenOffice, which is used for converting documents and extracting data from them. Even the latest version of OpenOffice in server mode can only convert one file at a time. Attempting to convert multiple files at once leads to unpredictable results. OpenOffice also has the unpleasant tendency to eat up a lot of memory over time and stop responding to requests. I can recommend several ways to work around this:

    • use JODConverter to start and automatically restart multiple OpenOffice servers at once;
    • use other libraries for data conversion and extraction (e.g. Aspose, but it is not free).
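    The first workaround boils down to a pool in which each OpenOffice server handles one file at a time and is restarted after a number of conversions. The sketch below only models that idea: it does not use JODConverter's API and starts no real processes, and `OfficeWorker`, `OfficePool`, and `max_tasks` are hypothetical names.

```python
import queue
import threading


class OfficeWorker:
    """Stand-in for one OpenOffice server process (no real process is started)."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.conversions = 0

    def convert(self, filename):
        # A real worker would send the file to its OpenOffice instance here.
        self.conversions += 1
        return f"{filename}.pdf"


class OfficePool:
    """Gives each worker one task at a time and recycles it after max_tasks."""

    def __init__(self, size, max_tasks):
        self.max_tasks = max_tasks
        self.restarts = 0
        self._lock = threading.Lock()
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(OfficeWorker(i))

    def convert(self, filename):
        worker = self._idle.get()          # blocks until a worker is free
        try:
            return worker.convert(filename)
        finally:
            if worker.conversions >= self.max_tasks:
                # Replace a long-running worker before it uses too much memory.
                worker = OfficeWorker(worker.worker_id)
                with self._lock:
                    self.restarts += 1
            self._idle.put(worker)
```

    Because a worker is removed from the idle queue while it converts, no two callers can ever use the same worker concurrently, and the restart threshold bounds how long any single instance lives.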

    The developers recommend using MySQL/InnoDB as a metadata store, but you can also use other databases for which there are Hibernate/iBatis dialects.
    There are also a number of recommendations that improve performance and reliability. Among the most important:

    • as already mentioned, do not store Lucene indexes on network drives;
    • use a filesystem with modern anti-fragmentation features (ext4).

    At the moment our project is still under active development. Despite some management and technical mistakes in its implementation, I like Alfresco, it is pleasant to work with, and I believe in the prospects of open systems for business.

    Conclusion

    Alfresco is a good base for building a company’s document management system. I think that in the near future Alfresco may replace many obsolete systems. Of course there are still some problems to solve, and Alfresco will hardly take over the whole world, but I do think it can take a significant share of the market for corporate document storage and workflow.
    It is possible to use Alfresco in the cloud. For example, Amazon AWS already has ready-made instances with Alfresco preinstalled.
    Rumor has it that Oracle has its eye on buying Alfresco. Whether that would be a threat or an opportunity for Alfresco is not yet known; time will tell.
    It would be very interesting to see your Alfresco implementation stories in the comments.
