This article gives a brief introduction to CMIS and Talend, then walks through a simple use case of the tAdvancedCMISInput component.
What is CMIS ?
CMIS (Content Management Interoperability Services) is an open standard that aims to provide web-based interoperability between document and content management systems. The first release of the specification was published by the OASIS committee in May 2010.
The latest version (v1.1) was approved and published a few months ago (November 2012).
Many major ECM providers (Microsoft, IBM, Oracle, Open Text, Alfresco, Nuxeo, …) are involved in the specification project. This gives the standard a certain legitimacy and has contributed to its adoption and implementation, both in the open source ecosystem and in existing commercial solutions.
Despite its youth, the current CMIS version already provides an interesting range of features :
- CMIS objects : Document, Folder, Relationship, Policy, Access Control, Version management
- Repository service : describes the capabilities, types and properties of the target repository
- Navigation service : navigates the repository tree through descendant/ascendant queries
- Object service : CRUD of documents, folders, relationships and policies
- Multi-filing service : used to file/un-file objects into/from folders
- Discovery service : for querying CMIS objects through the CMIS Query Language (SQL-like)
- Versioning service : used to navigate or update version series
- Relationship service : used to retrieve CMIS object relationships
- Policy service : used to apply or remove a policy on CMIS objects
- ACL service : used to apply or remove ACLs on CMIS objects
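To give a feel for the SQL-like query language used by the Discovery service, here is a minimal sketch in plain Java that assembles a CMIS-QL statement from a type and a list of properties. The class and method names (`CmisQuerySketch`, `buildQuery`, `quote`) are hypothetical and exist only for this illustration; `cmis:document`, `cmis:objectId` and `cmis:name` are standard CMIS names. The backslash escaping in `quote` reflects my reading of the CMIS specification and should be checked against the target repository. Sending the statement to a server would require a CMIS client library, which is not shown here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only: it builds a CMIS-QL statement as a string.
final class CmisQuerySketch {

    // Builds "SELECT <properties> FROM <type>", plus an optional WHERE clause.
    static String buildQuery(String typeQueryName, List<String> propertyQueryNames, String whereClause) {
        if (propertyQueryNames == null || propertyQueryNames.isEmpty()) {
            throw new IllegalArgumentException("at least one property is required");
        }
        StringBuilder sb = new StringBuilder("SELECT ");
        for (int i = 0; i < propertyQueryNames.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(propertyQueryNames.get(i));
        }
        sb.append(" FROM ").append(typeQueryName);
        if (whereClause != null && !whereClause.trim().isEmpty()) {
            sb.append(" WHERE ").append(whereClause.trim());
        }
        return sb.toString();
    }

    // Wraps a value in single quotes, escaping backslashes and quotes with a backslash
    // (assumption: this matches the escaping rule of the CMIS query language).
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    public static void main(String[] args) {
        String query = buildQuery(
                "cmis:document",
                Arrays.asList("cmis:objectId", "cmis:name"),
                "cmis:name LIKE " + quote("A%"));
        // prints: SELECT cmis:objectId, cmis:name FROM cmis:document WHERE cmis:name LIKE 'A%'
        System.out.println(query);
    }
}
```

Keeping the statement as a plain string mirrors how CMIS clients usually pass queries to the Discovery service; the helper adds nothing beyond consistent formatting and quoting.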
Because standardization is a rather long process, the current version of CMIS (v1.1) still has some restrictions, but many improvements are expected in the next major release.
What is Talend ?
Talend is an open source software vendor that provides data quality, data integration, master data management, enterprise application integration and big data solutions.
Talend's offering includes the following major components:
- Data quality : a data profiling tool that analyses data conformance and generates graphical reports
- Data integration : graphical editing of data integration jobs and publishing to various runtime environments (batch, JEE, ESB and OSGi containers)
- Master Data Management (MDM) : master data governance and provisioning for enterprise-level data
- Enterprise Service Bus (ESB) : a modular application integration framework with graphical editing, powered by the Apache CXF, Apache Camel and Apache ActiveMQ open source integration projects
- Business Process Management (aka Bonita Open Solution) : workflow editing and deployment
CMIS and Talend
Talend already provides some ways to work with CMIS through other parts of its integration suite:
- Talend BPM supports CMIS to access a remote content management server. It provides a set of connectors for common CRUD operations within a business process.
- Talend ESB supports web services and REST endpoints following SOA principles, so it is compatible with CMIS
But so far there is no CMIS component for Talend Data Integration.
This kind of component can be useful in the following use cases:
- Data migration : mass import of metadata and content from an existing CMS or a folder tree, to a CMIS repository
- Point-to-point integration : between two CMSs, between a CMS and a data warehouse, …
CMIS components for Talend Data Integration
- Version : 0.1-alpha
- CMIS implementation library : Apache Chemistry OpenCMIS
- Talend CMIS Plugin : for business data model discovery UI
- tAdvancedCMISInput : connects to a CMIS repository and extracts metadata and content
- tAdvancedCMISOutput : connects to a CMIS repository and loads metadata and content
Getting started with tAdvancedCMISInput
The following scenario is a very simple use case :
- Connect to the online Nuxeo CMIS demo server
- Extract metadata and content for all Pictures (a cmis:document sub-type available on the demo server)
- Display the metadata in the console
- Store the content to a temporary directory
Prerequisites
- Java 6 or higher
- Talend 5.0 or higher
- tAdvancedCMISInput installed in the palette (see the installation documentation)
Steps
- In the repository view, right-click on jobs and select Create a new job
- In the component Palette, select Business > CMIS > tAdvancedCMISInput
- Drag and drop tAdvancedCMISInput in your current job
- Connect a tLogRow in order to display the result in the console
- Your job should look like this
- Fill the connection properties with the connection parameters of the Nuxeo server
- Open the type and properties selector with the Object Type button
- Once you’ve selected the type and properties, the query and mapping table are updated with your selection
- Check Download Content as needed and choose a content path to store the documents on your local drive
- Create a new schema for the row1 data flow
- Map the CMIS properties to the columns of the schema (in the mapping table)
- Run your job
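The mapping and content download steps above can be pictured with a small standalone sketch. It is not the component's actual code: `CmisExtractSketch`, `mapRow` and `storeContent` are hypothetical names, the property values are made up, and the content stream is faked with an in-memory byte array instead of a real CMIS download. It also uses `java.nio.file`, so it assumes Java 7 or later rather than the Java 6 minimum of the component.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the mapping table and the "Download Content" option.
final class CmisExtractSketch {

    // Projects one CMIS result (property id -> value) onto schema columns,
    // following the order of the mapping table (property id -> column name).
    static Map<String, Object> mapRow(Map<String, Object> cmisProperties,
                                      Map<String, String> propertyToColumn) {
        Map<String, Object> row = new LinkedHashMap<String, Object>();
        for (Map.Entry<String, String> entry : propertyToColumn.entrySet()) {
            row.put(entry.getValue(), cmisProperties.get(entry.getKey()));
        }
        return row;
    }

    // Copies a content stream into targetDir, keeping only the last segment of fileName.
    static Path storeContent(InputStream content, Path targetDir, String fileName) throws IOException {
        Path name = Paths.get(fileName).getFileName();
        if (name == null || name.toString().isEmpty()) {
            throw new IllegalArgumentException("fileName has no usable file name: " + fileName);
        }
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(name);
        Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Mapping table: CMIS property id -> schema column (illustrative values).
        Map<String, String> mapping = new LinkedHashMap<String, String>();
        mapping.put("cmis:objectId", "id");
        mapping.put("cmis:name", "name");

        // One made-up query result.
        Map<String, Object> properties = new LinkedHashMap<String, Object>();
        properties.put("cmis:name", "sunset.jpg");
        properties.put("cmis:objectId", "1234");

        System.out.println(mapRow(properties, mapping)); // prints: {id=1234, name=sunset.jpg}

        // Stand-in for a document content stream.
        InputStream fakeContent = new ByteArrayInputStream("fake image bytes".getBytes(StandardCharsets.UTF_8));
        Path dir = Files.createTempDirectory("cmis-content");
        System.out.println("Stored at " + storeContent(fakeContent, dir, "sunset.jpg"));
    }
}
```

A `LinkedHashMap` is used so that the output columns keep the order of the mapping table, which is what a reader would expect to see in the console.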
You’re done ! Your data and content have been extracted from Nuxeo ! The Talend console displays the metadata, and the content is stored in your temporary directory.