challenges and solutions

To discuss on few challenges and their solutions:

Performance problem:
Scenario: Code was initially written in Pro – C. Due to business and technical enhancements, talend tool was used.
Challenge: Validations was done by storing value in variable, which completed occupied the memory heap space. This was done without knowledge of components.
Solution: code was enhanced with help of talend experts and it had a great deal in improving performance
Business situation: Faced this problem in real time. We didn’t face this problem in other testing environments, since input file, we processed was of small size. But in real time (first release), input file was around 100MB, creating havoc among production support team, and was solved in a less than week’s time.

Parallel Processing:
Scenario: slow performance of running jobs (modules) one after another.
Challenge: Even though, performance was improvised with first release, it was taking time for huge files.
Solution: we ran all sub jobs in parallel using a component called tparallelize, leading to a major performance improvement.
Business situation: with enhanced performance, customer was delighted. Now, client has started recommending talend tool for their new projects too.

Latest Trading transaction:
Scenario: We faced a situation in production, where trading was done every second. So in database, we need to delete the antiquated trade details and store the latest one for a particular customer.
Challenge: Multiple threads were running on the talend program. When one thread was deleting and inserting the record, before it committed its transaction, another thread deleted the same record, leading to conflicts. In talend, trading code was complicated and effective use of components was not made. Most code was written directly in java, not giving way to thread synchronization.
Solution: Talend code was modularized i.e. made into smaller modules. A table for locking was introduced in database. Only one thread was allowed to insert/update/delete in all trade tables. Failed transactions of other threads were reprocessed.
Business situation: Criticality of business was very high. Since trading details were not shown properly, customers got pestered. With a limited time, code was developed, tested, deployed.

Efficient logging of Error:
Scenario: Error was logged ineffectively
Challenge: it was difficult to identify the root cause since the same error/warning was logged multiple times.
Solution: we used a component called taggregaterow which groups data based on given data
Business situation: with enhanced logging, issues were resolved instantly and having little dependence on the development team.


My Recommendations

Few areas that still yet to be explored by me and have full potential are:

In Data Integration tool, the analysts can provide clear requirements of the customer by functional diagrams (similar to UMLs) by using Business Intelligence modules, which is better than verbal or written elaboration. Developers can use this, for a clear understanding and meeting the demands technically.

More efficient components which can add to performance and meet requirements easily like sending mail, RSS input and output, fetching mail from server and saving it locally, requesting a web service and reading its result, fetching a file through http, start and stop clock to determine the time differences, loading dynamic properties at run time, synchronizing and parallelizing sub jobs, pattern matching, adding CRC for all rows in a flow.

Currently, all our modules are created in java technology. Talend open source can also be used in Perl language, which is yet to be utilized for its full potential.

Cons of the Data Integration Tool

    Here are the cons mostly based on my usage

  • Code Generated in code mode is non editable. So flow can be altered only in design mode. Thereby degrading the flexibility for technology specialist developers.
  • Limit on the components usage in a single job. Maximum accumulated size on the component usage depends on the data integration software used.
  • Multitasking is not possible. i.e. talend software can’t be opened in different workspace simultaneously.
  • Access to source requires a JDBC driver.
  • There are no complementary profiling, data quality or metadata management products. These roles can be fulfilled by third part products but there are few open source options currently available.
  • The product can allocate resources across a grid but does not have the underlying automation of data partitioning and repartitioning and this could result in bottleneck jobs.

Pros of Data Integration Tool

    To my knowledge and usage of the talend integration suite tool, these are the pros

  • Configured easily for technologies like JAVA and PERL, and all databases like DB2, MySQL, Access, Netezza, Sybase, and lots more
  • An efficient ETL tool in converting the source data to any destinations. Source and Destination can be flat file, database, anything else too
  • Version control feature inbuilt
  • Jobs can be designed in design mode, using components from the palette. Easy traversal to code generated, viewable in code mode.
  • Reusability of jobs (graphical design, of one or more components connected together), components (preconfigured connector used to perform a specific data integration operation
  • Reliable exports of graphical design to runnable java archives with an auto generated batch and shell scripts.
  • Developer friendly tool, useful for setting the flow of data integration. Can be used by beginners to develop enhanced modules using components, with an auto generated code. The GUI ETL interface for import of metadata, configuration and linking of components and generation of code will provide productivity gains for developers and will be faster than standard programming.
  • Comparison utility to identify changes between versions of the same job.
  • Modules can be imported/exported to independent machines. Thereby, support multiuser enhancements on the same graphical design.
  • Common routines can be used for validation and conversion. It allows developer to write user defined routines, which can be shared with in the project.
  • Document on the job can be generated using the generate document as HTML option, which provides the flow, details and settings of all the component used in the job and the same can be used for reviewing the job instead of going through the job.
  • Modules i.e. graphical design can be locked i.e. non editable version for security purposes.
  • Supports modular programming with easy components to call sub jobs from parent job. Features like die on child error, run child job as independent jobs corroborate to its efficiency.

  • Context node groups files can be used for holding the contextual variables such as file paths or DB connection details
  • SQL Templates node groups all system SQL templates and gives the possibility to create user-defined SQL templates.
  • Routine, a piece of code which can be iterative in a technical Job hence is likely to be reused several times within the same project.
  • Business Models allow data integration project stakeholders to graphically represent their needs regardless of the technical implementation requirements. Business Models help the developers understand these expressed needs and translate them into technical processes (Jobs).
  • Metadata is definitional data that provides information about or documentation of other data managed within Talend Open Studio. Used to store DB connections, SAP connections, File Excels and many more at the default specified repository.
  • Graphical designs to be reused can be created as Joblets.
  • Exception handling can be monitored and logged to any data storage elements.
  • Statistics on performance can be logged in the separate files if needed.

Talend Data Integration Tool

I have worked with the Talend Integration Suite to meet the requirements of my client. Talend Integration Suite is a licensed software of Talend and was introduced as an enhanced version for Talend open studio with advanced features. Few features worth mentioning are professional-grade technical support and additional enterprise-class features.

    Talend Integration Suite is available in various editions that meet the needs of any organization:

  1. Talend Integration Suite Team Edition provides all collaborative features and time-based scheduling functions.
  2. Talend Integration Suite Professional Edition extends the Team Edition with advanced event-based scheduling and extra execution features, error recovery management and real-time capabilities.
  3. Talend Integration Suite RTx extends the Professional Edition with SOA features, as well as additional real-time capabilities to focus on intensive real-time operational integration needs.
  4. Talend Integration Suite Enterprise Edition includes additional capabilities such as high availability and grid management among other enterprise-grade functionality.
  5. Talend Integration Suite MPx extends the Enterprise Edition with massively parallel execution and large volume handling capabilities.

More information on Teamwork and Consolidation of Development, Industrialization, Managing Complex Deployments, Execution Monitoring, Massively Parallel, Real Time Integration, Technical support is available at this hyperlink

Talend Open Source, can be downloaded at this hyperlink

Information on usage of Talend open source and details on all the components (preconfigured connector used to perform a specific data integration operation) can be found at same site.


Data Management

  • Talend Open Studio: an open source solution for data integration to improve the efficiency of data integration job design through a graphical development environment.
  • Talend Integration Suite: a subscription service that extends Talend Open Studio with technical support and enterprise-class features.
  • Talend Open Profiler: an open source data profiling tool that examines the content, structure and quality of complex data structures.
  • Talend Data Quality: an open source data quality product that cleanses inaccurate and inconsistent data, identifies and resolves duplicate records and provides the capability to augment and enhance data.
  • Talend MDM Community Edition & Enterprise Edition: a set of functions for master data management that provides functionality for integration, quality, governance, mastering and collaborating on enterprise data.
  • Talend Integration Express: a hosted service that extends Talend Open Studio with technical support and additional features for teamwork and productivity.

  • Application Integration

  • Talend ASF: delivers enterprise service management capability on an Apache open source foundation.
  • Talend ESB Standard Edition & Enterprise Edition: an enterprise service bus and a standards-based connectivity layer used to integrate distributed systems across functional, enterprise and geographic boundaries. It is powered by the Apache CXF, Apache Camel and Apache ActiveMQ open source integration projects.
  • Talend ESB Studio: an Eclipse-based tooling environment for modeling, configuring and deploying integration solutions using Talend ESB.
  • Talend Service Factory: based on the Apache projects for web services and OSGi, Talend Service Factory provides a single, pre-configured package to enable developers to build and deploy SOAP and REST Web Services on Tomcat, Jetty, Equinox, Felix or other JEE Web Servers.
  • Talend Integration Factor: a fully supported, production ready distribution based on Apache Camel.
  • Talend Integration Factory: a single installation package certified for a wide variety of Java production environments.

  • Cloud Based Integration

  • Talend Cloud: a unified platform for data integration, data quality, master data management and enterprise service bus that enables integration in cloud deployments and hybrid environments
  • Introduction

    Talend is an open source software vendor that provides data integration, data management and enterprise application integration software and solutions. Headquartered in Suresnes, France and Los Altos, California, Talend has offices in North America, Europe and Asia, and a global network of technical and services partners.

    Credited as the first commercial open source vendor of data integration software. Talend first product, Talend Open Studio, an open source was a hit in software industry with maximum downloads.

    Few others products worth mentioning are Talend Integration Suite, Talend Open Profiler, Talend Data Quality, Talend MDM CE & EE, Talend Integration Express, Talend ASF, Talend ESB SE & EE, Talend ESB Studio, Talend Service Factory, Talend Integration Factory, Talend Cloud.

    More information can be obtained at talend official site at