CMIS and Talend in action


As part of internal work, ekito developed a prototype of CMIS components for Talend Open Studio for Data Integration.

This article gives a brief introduction to CMIS and Talend, then walks through a simple use case of the tAdvancedCMISInput component.

What is CMIS?

CMIS (Content Management Interoperability Services) is an open standard that aims to provide web-based interoperability between document and content management systems. The first release of the specification was published by the OASIS committee in May 2010.

The latest version (v1.1) was approved and published a few months ago (November 2012).

Many major ECM providers (Microsoft, IBM, Oracle, Open Text, Alfresco, Nuxeo, …) are involved in the specification project. This gives the standard a certain legitimacy and has contributed to its dissemination and implementation, both in the open source ecosystem and in existing commercial solutions.

Despite its youth, the current CMIS version already provides an interesting range of features:

  • CMIS objects : Document, Folder, Relationship, Policy, Access Control, Version management
  • Repository service : describes the capabilities, types and properties of the target repository
  • Navigation service : navigates the repository tree through descendant/ascendant queries
  • Object service : CRUD operations on documents, folders, relationships and policies
  • Multi-filing service : files/un-files objects into/from folders
  • Discovery service : queries CMIS objects through the CMIS Query Language (SQL-like)
  • Versioning service : navigates or updates version series
  • Relationship service : retrieves the relationships of a CMIS object
  • Policy service : applies or removes a policy on CMIS objects
  • ACL service : applies or removes ACLs on CMIS objects
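
To give a feel for the Discovery service, here is a minimal sketch of how a CMIS Query Language statement might be assembled as plain text. The class CmisQuerySketch and its buildQuery method are hypothetical names introduced for illustration; they are not part of CMIS or of any library. The identifiers cmis:document, cmis:objectId and cmis:name are standard CMIS type and property ids.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a library class): assembles a CMIS QL statement as text. */
final class CmisQuerySketch {

    /** Builds "SELECT <properties> FROM <type>", plus an optional WHERE clause. */
    static String buildQuery(String typeId, List<String> properties, String whereClause) {
        // An empty property list falls back to "*", i.e. all properties of the type.
        String columns = properties.isEmpty() ? "*" : String.join(", ", properties);
        StringBuilder query = new StringBuilder("SELECT ")
                .append(columns)
                .append(" FROM ")
                .append(typeId);
        // The WHERE clause is appended as-is; no escaping or validation is done here.
        if (whereClause != null && !whereClause.trim().isEmpty()) {
            query.append(" WHERE ").append(whereClause);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // prints: SELECT cmis:objectId, cmis:name FROM cmis:document WHERE cmis:name LIKE 'report%'
        System.out.println(buildQuery(
                "cmis:document",
                Arrays.asList("cmis:objectId", "cmis:name"),
                "cmis:name LIKE 'report%'"));
    }
}
```

The resulting string is what a CMIS client would submit to the repository's query endpoint; whether a given repository supports querying at all is advertised through the Repository service.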

Since standardization is a rather long process, the current version of CMIS (v1.1) still has some restrictions, but many improvements are expected in the next major release.

More detailed information can be found on Wikipedia and on the OASIS committee page.

About Talend

Talend is an open source software vendor that provides data quality, data integration, master data management, enterprise application integration and big data solutions.

 

Talend Unified Platform

Talend's offering includes the following major components:

  • Data quality : data profiling tool that analyses data conformance and generates graphical reports
  • Data integration : graphical editing of data integration jobs and publishing to various runtime environments (batch, JEE, ESB and OSGi containers)
  • Master Data Management (MDM) : master data governance and provisioning for enterprise-level data
  • Enterprise Service Bus (ESB) : a modular application integration framework with graphical editing, powered by the Apache CXF, Apache Camel and Apache ActiveMQ open source integration projects
  • Business Process Management (aka Bonita Open Solution) : workflow editing and deployment

CMIS and Talend

Talend already provides some CMIS capabilities elsewhere in its suite:

  • Talend BPM supports CMIS for accessing remote content management servers. It provides a set of connectors for common CRUD operations within a business process.
  • Talend ESB supports web service and REST endpoints following SOA principles, so it is compatible with CMIS.

But so far there is no CMIS component for Talend Data Integration.

This kind of component can be useful in the following use cases:

  • Data migration : mass import of metadata and content from an existing CMS or a folder tree, to a CMIS repository
  • Point-to-point integration : between two CMSs, between a CMS and a data warehouse, …

CMIS components for Talend Data Integration

ID card

  • Version : 0.1-alpha
  • CMIS implementation library : Apache Chemistry OpenCMIS
  • Talend CMIS Plugin : for business data model discovery UI

CMIS type selector

  • tAdvancedCMISInput : connects to a CMIS repository and extracts metadata and content
  • tAdvancedCMISOutput : connects to a CMIS repository and loads metadata and content

Getting started with tAdvancedCMISInput

Scenario

The following scenario is a very simple use case:

  1. Connect to the online Nuxeo CMIS demo server
  2. Extract metadata and content for all Pictures (a cmis:document sub-type available on the demo server)
  3. Display the metadata in the console
  4. Store the content to a temporary directory
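
Step 1 boils down to handing a few connection parameters to the CMIS client library. As a rough sketch, assuming the AtomPub binding and assuming that the string keys below are the ones behind the SessionParameter constants of Apache Chemistry OpenCMIS (worth checking against the OpenCMIS documentation), the parameters could be collected as follows. The class name, the URL and the credentials are placeholders, not the actual values of the Nuxeo demo server, and this is not necessarily how tAdvancedCMISInput is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: collects connection parameters for a CMIS AtomPub session. */
final class CmisSessionParametersSketch {

    /** Returns the parameters in insertion order, ready to hand to a CMIS client. */
    static Map<String, String> atomPubParameters(String url, String user, String password) {
        Map<String, String> parameters = new LinkedHashMap<>();
        // Keys assumed to match OpenCMIS SessionParameter.BINDING_TYPE, ATOMPUB_URL, USER, PASSWORD.
        parameters.put("org.apache.chemistry.opencmis.binding.spi.type", "atompub");
        parameters.put("org.apache.chemistry.opencmis.binding.atompub.url", url);
        parameters.put("org.apache.chemistry.opencmis.user", user);
        parameters.put("org.apache.chemistry.opencmis.password", password);
        return parameters;
    }

    public static void main(String[] args) {
        // Placeholder endpoint and credentials, for illustration only.
        Map<String, String> parameters =
                atomPubParameters("https://cmis.example.org/atom", "demo-user", "demo-password");
        // Print only the keys, so that the password does not end up in the console.
        for (String key : parameters.keySet()) {
            System.out.println(key);
        }
    }
}
```

With OpenCMIS on the classpath, such a map would typically be passed to SessionFactory.getRepositories(...) to list the available repositories, or to SessionFactory.createSession(...) once a repository id has been added.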

Prerequisites

  1. Java 6 or higher
  2. Talend 5.0 or higher
  3. tAdvancedCMISInput installed in the palette (See installation documentation here).

Process

  • In the repository view, right-click on Jobs and select Create a new job
  • In the component Palette, select Business > CMIS > tAdvancedCMISInput
  • Drag and drop tAdvancedCMISInput into your current job
  • Connect a tLogRow to display the result in the console
  • Your job should look like this

tAdvancedCMISInput job example

  • Fill the connection properties with the connection parameters of the Nuxeo server

Connection parameters

  • Open the type and properties selector with the Object Type button

CMIS type selector

  • Once you’ve selected the type and properties, the query and mapping table are updated with your selection

Mapping parameters

  • Check Download Content as needed and choose a content path to store the documents on your local drive
  • Create a new schema for the row1 data flow

Schema of the data flow

  • Map the CMIS properties to the columns of the schema (in the mapping table)
  • Run your job

Talend job execution

You’re done! Your data and content have been extracted from Nuxeo! The Talend console displays the metadata, and the content is stored in your temporary directory.
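
Steps 3 and 4 of the scenario (display the metadata, store the content) can be pictured with plain Java. This is only an illustrative sketch under stated assumptions: ExtractedRowSketch, formatRow and storeContent are hypothetical names, the sample row and its bytes are made up, and the code generated by Talend for the job will look different.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of one extracted row: metadata to the console, content to disk. */
final class ExtractedRowSketch {

    /** Joins the metadata values of a row into one pipe-separated line. */
    static String formatRow(Map<String, String> row) {
        return String.join("|", row.values());
    }

    /** Copies a content stream into directory/fileName, replacing any existing file. */
    static Path storeContent(Path directory, String fileName, InputStream content) {
        try {
            Files.createDirectories(directory);
            Path target = directory.resolve(fileName);
            Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up metadata standing in for one document returned by the repository.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("cmis:name", "sample.jpg");
        row.put("cmis:contentStreamMimeType", "image/jpeg");
        System.out.println(formatRow(row));

        // Made-up bytes standing in for the document's content stream.
        byte[] fakeContent = "not a real picture".getBytes(StandardCharsets.UTF_8);
        Path directory = Paths.get(System.getProperty("java.io.tmpdir"), "cmis-content-sketch");
        Path stored = storeContent(directory, row.get("cmis:name"), new ByteArrayInputStream(fakeContent));
        System.out.println("Content stored in " + stored);
    }
}
```

In a real job the file name would come from repository metadata, so it should be sanitized before being resolved against the target directory.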

These components have been published on GitHub (source code) and TalendForge (binaries).

Author: Julien Boulay

Eclectic developer & architect
Activist for usability, performance and interoperability of systems.

My favorite quote : "The best feature is the one we don't need to develop !"

My hashtags : #windchill #java #talend #nodejs #angularjs #oss #docker
