Informatica

Hello and welcome to an Informatica breakfast seminar!

The benefits of a so-called "Lean Application Portfolio", or in plain terms a streamlined application portfolio, are many. Do you have a strategy for how your company can realize those benefits, and do you have an IT strategy that addresses the ongoing retirement of applications?

Welcome to an Informatica seminar on 11 December at the Grand Hôtel in Stockholm, 08:00-10:30. We start with breakfast and registration at 08:00.

During the seminar we will discuss the constantly changing application landscape and how the burden of legacy systems grows ever heavier and undermines your company's IT environment.

We will discuss:

  • What the term "Lean Application Portfolio" means
  • What happens if you do nothing about the growing, sprawling application portfolio
  • How to build a business case and what the project preparations look like
  • How you can implement an IT strategy for retiring applications
  • Informatica's solution for efficiently achieving a "Lean Application Portfolio"
  • Customer examples and project experience
Date: 11 December 2014
Time: 08:00-10:30
Location: Grand Hôtel, Stockholm
Södra Blasieholmshamnen 8
Stockholm, 111 48
MS SQL Transformation – Informatica: ETL Informatica PowerCenter 9.5
Transformation type: Active/Passive, Connected

The SQL transformation processes SQL queries midstream in a pipeline. You can insert, delete, update, and retrieve rows from a database. You can pass the database connection information to the SQL transformation as input data at run time. The transformation processes external SQL scripts or SQL queries that you create in an SQL editor. The SQL transformation processes the query and returns rows and database errors.

You might need to create database tables before adding new transactions. You can create an SQL transformation to create the tables in a workflow. The SQL transformation returns database errors in an output port. You can configure another workflow to run if the SQL transformation returns no errors.

When you create an SQL transformation, you configure the following options:

Mode. The SQL transformation runs in one of the following modes:
Script mode. The SQL transformation runs ANSI SQL scripts that are externally located. You pass a script name to the transformation with each input row. The SQL transformation outputs one row for each input row.
Query mode. The SQL transformation executes a query that you define in a query editor. You can pass strings or parameters to the query to define dynamic queries or change the selection parameters. You can output multiple rows when the query has a SELECT statement.

Passive or active transformation. The SQL transformation is an active transformation by default. You can configure it as a passive transformation when you create the transformation.
Database type. The type of database the SQL transformation connects to.
Connection type. Pass database connection information to the SQL transformation or use a connection object.

Script Mode

An SQL transformation running in script mode runs SQL scripts from text files. You pass each script file name from the source to the SQL transformation ScriptName port. The script file name contains the complete path to the script file.

When you configure the transformation to run in script mode, you create a passive transformation. The transformation returns one row for each input row. The output row contains results of the query and any database error.

When the SQL transformation runs in script mode, the query statement and query data do not change. When you need to run different queries in script mode, you pass the scripts in the source data. Use script mode to run data definition queries such as creating or dropping tables.

When you configure an SQL transformation to run in script mode, the Designer adds the ScriptName input port to the transformation. When you create a mapping, you connect the ScriptName port to a port that contains the name of a script to execute for each row. You can execute a different SQL script for each input row. The Designer creates default ports that return information about query results.

An SQL transformation configured for script mode has the following default ports:

Port | Type | Description
ScriptName | Input | Receives the name of the script to execute for the current row.
ScriptResult | Output | Returns PASSED if the script execution succeeds for the row. Otherwise contains FAILED.
ScriptError | Output | Returns errors that occur when a script fails for a row.

You need to create order and inventory tables before adding new data to the tables. You can create an SQL script to create the tables and configure an SQL transformation to run the script.

You create a file called create_order_inventory.txt that contains the SQL statements to create the tables.
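As a rough sketch, the script might contain two CREATE TABLE statements separated by a semicolon (the column definitions below are illustrative assumptions, not part of the original example):

CREATE TABLE ORDERS (
    ORDER_ID     INTEGER       NOT NULL,
    CUSTOMER_ID  INTEGER       NOT NULL,
    ORDER_DATE   DATE,
    ORDER_TOTAL  DECIMAL(12,2)
);
CREATE TABLE INVENTORY (
    ITEM_ID      INTEGER       NOT NULL,
    ITEM_NAME    VARCHAR(100),
    QTY_ON_HAND  INTEGER
);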

The following mapping shows how to pass the script name to the SQL transformation:

The Integration Service reads a row from the source. The source row contains the SQL script file name and path: C:\81\server\shared\SrcFiles\create_order_inventory.txt (in your environment this might be on the D: or E: drive).

The transformation receives the file name in the ScriptName port. The Integration Service locates the script file and parses the script. It creates an SQL procedure and sends it to the database to process. The database validates the SQL and executes the query.

The SQL transformation returns the ScriptResult and ScriptError values. If the script executes successfully, the ScriptResult output port returns PASSED. Otherwise, the ScriptResult port returns FAILED. When the ScriptResult is FAILED, the SQL transformation returns error messages in the ScriptError port. The SQL transformation returns one row for each input row it receives.

Rules and Guidelines for Script Mode

Use the following rules and guidelines for an SQL transformation that runs in script mode:

You can use a static or dynamic database connection with script mode.
To include multiple query statements in a script, you can separate them with a semicolon.
You can use mapping variables or parameters in the script file name.
The script code page defaults to the locale of the operating system. You can change the locale of the script.
The script file must be accessible by the Integration Service. The Integration Service must have read permissions on the directory that contains the script. If the Integration Service uses operating system profiles, the operating system user of the operating system profile must have read permissions on the directory that contains the script.
The Integration Service ignores the output of any SELECT statement you include in the SQL script. The SQL transformation in script mode does not output more than one row of data for each input row.
You cannot use scripting languages such as Oracle PL/SQL or Microsoft/Sybase T-SQL in the script.
You cannot use nested scripts where the SQL script calls another SQL script.
A script cannot accept run-time arguments.

Query Mode

When an SQL transformation runs in query mode, it executes an SQL query that you define in the transformation. You pass strings or parameters to the query from the transformation input ports to change the query statement or the query data.

When you configure the SQL transformation to run in query mode, you create an active transformation. The transformation can return multiple rows for each input row.

Create queries in the SQL transformation SQL Editor. To create a query, type the query statement in the SQL Editor main window. The SQL Editor provides a list of the transformation ports that you can reference in the query. You can double-click a port name to add it as a query parameter.

When you create a query, the SQL Editor validates the port names in the query. It also verifies that the ports you use for string substitution are string datatypes. The SQL Editor does not validate the syntax of the SQL query.

You can create the following types of SQL queries in the SQL transformation:

Static SQL query. The query statement does not change, but you can use query parameters to change the data. The Integration Service prepares the query once and runs the query for all input rows.
Dynamic SQL query. You can change the query statements and the data. The Integration Service prepares a query for each input row.

When you create a static query, the Integration Service prepares the SQL procedure once and executes it for each row. When you create a dynamic query, the Integration Service prepares the SQL for each input row. You can optimize performance by creating static queries.

Selecting Multiple Database Rows

When the SQL query contains a SELECT statement, the transformation returns one row for each database row it retrieves. You must configure an output port for each column in the SELECT statement. The output ports must be in the same order as the columns in the SELECT statement.

When you configure output ports for database columns, you need to configure the datatype of each database column you select. Select a native datatype from the list. When you select the native datatype, the Designer configures the transformation datatype for you.

The native datatype in the transformation must match the database column datatype. The Integration Service matches the column datatype in the database with the native database type in the transformation at run time. If the datatypes do not match, the Integration Service generates a row error.

Note: Although the Teradata database allows Bigint columns, the Transformation Developer does not include Bigint datatype as one of the native datatypes you can use in the SQL transformation.

The following figure shows the ports in the transformation configured to run in query mode:

The input ports receive the data in the WHERE clause. The output ports return the columns from the SELECT statement. The SQL query selects name and address from the employees table. The SQL transformation writes a row to the target for each database row it retrieves.
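As a sketch of such a static query, with port and table names that are assumptions rather than taken from the figure, the SQL Editor query might look like this (?EMP_ID? is a bound input port; NAME and ADDRESS map to output ports in the same order as the SELECT columns):

-- Static query: the statement never changes, only the bound EMP_ID value does
SELECT NAME, ADDRESS
FROM   EMPLOYEES
WHERE  EMP_ID = ?EMP_ID?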

Using Dynamic SQL Queries:

A dynamic SQL query can execute different query statements for each input row. When you create a dynamic SQL query, you use string substitution to define string parameters in the query and link them to input ports in the transformation.

To change a query statement, configure a string variable in the query for the portion of the query you want to change. To configure the string variable, identify an input port by name in the query and enclose the name with the tilde (~). The query changes based on the value of the data in the port. The transformation input port that contains the query parameter must be a string datatype. You can use string substitution to change the query statement and the query data.

When you create a dynamic SQL query, the Integration Service prepares a query for each input row. You can pass the full query or pass part of the query in an input port:

Full query. You can substitute the entire SQL query with query statements from source data.
Partial query. You can substitute a portion of the query statement, such as the table name.
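For example, a partial dynamic query might substitute only the table name. In the sketch below, Table_Port is an assumed string input port; the Integration Service replaces ~Table_Port~ with the port value (for example EMPLOYEES_EU or EMPLOYEES_US) for each input row:

-- Partial substitution: only the table name changes from row to row
SELECT NAME, ADDRESS
FROM   ~Table_Port~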

Passing the Full Query

You can pass the full SQL query through an input port in the transformation. To pass the full query, create a query in the SQL Editor that consists of one string variable to represent the full query:

~Query_Port~

The transformation receives the query in the Query_Port input port.

The following figure shows ports in the SQL transformation:

The Integration Service replaces the ~Query_Port~ variable in the dynamic query with the SQL statements from the source. It prepares the query and sends it to the database to process. The database executes the query. The SQL transformation returns database errors to the SQLError port.

The following mapping shows how to pass the query to the SQL transformation:

When you pass the full query, you can pass more than one query statement for each input row.

You can pass any type of query in the source data. When you configure SELECT statements in the query, you must configure output ports for the database columns you retrieve from the database. When you mix SELECT statements and other types of queries, the output ports that represent database columns contain null values when no database columns are retrieved.
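To illustrate, the source column connected to Query_Port could carry rows like the ones below (table and column names are made up for the example). Each row's value is executed as-is; only the SELECT row produces values in the output ports that represent database columns:

-- Row 1: a single DML statement
INSERT INTO ORDER_AUDIT (ORDER_ID, STATUS) VALUES (1001, 'NEW')
-- Row 2: two statements in one row, separated by a semicolon
UPDATE INVENTORY SET QTY_ON_HAND = QTY_ON_HAND - 1 WHERE ITEM_ID = 55; DELETE FROM STAGING_ORDERS WHERE ORDER_ID = 1001
-- Row 3: a SELECT whose columns must match the transformation's output ports
SELECT ORDER_ID, STATUS FROM ORDER_AUDIT WHERE STATUS = 'NEW'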

Connecting to Databases

You can use a static database connection or you can pass database connection information to the SQL transformation at run time.

You can use a connection environment SQL statement or transactional SQL statement with the SQL transformation. Configure the SQL statements in a relational connection object. The Integration Service runs the connection environment SQL when it connects to the database. It runs the transactional SQL statement before the initiation of each transaction.
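For illustration only, and assuming an Oracle connection, the two kinds of statements might look like the following; neither is prescribed by the transformation itself:

-- Connection environment SQL: runs once, when the connection is established
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'

-- Transactional SQL statement: runs before each transaction starts
SET TRANSACTION ISOLATION LEVEL READ COMMITTED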

Use one of the following types of connections to connect the SQL transformation to a database:

Static connection. Configure the connection object in the session. You must first create the connection object in Workflow Manager.
Logical connection. Pass a connection name to the SQL transformation as input data at run time. You must first create the connection object in Workflow Manager.
Full database connection. Pass the connect string, user name, password, and other connection information to the SQL transformation input ports at run time.
Note: If a session has multiple partitions, the SQL transformation creates a separate database connection for each partition.

Using a Static Database Connection

You can configure the SQL transformation to connect to a database with a static connection. A static database connection is a database connection defined in the Workflow Manager.

To use a static connection, choose a relational connection object when you configure the session. To avoid datatype conversion errors, use a relational connection for the same database type that is configured in the transformation.

Passing a Logical Database Connection

You can configure the SQL transformation to connect to a database with a logical database connection. A logical database connection is a connection object name that you pass to the transformation at run time. Define the relational connection object in the Workflow Manager. When you configure the transformation to use a logical database connection, the Designer creates the LogicalConnectionObject input port.

You can pass a logical connection for each input row. Configure the mapping to pass the connection object name to the LogicalConnectionObject port. To avoid datatype conversion errors, use a relational connection for the same database type that is configured in the transformation.

Passing Full Connection Information

You can pass all the database connection information to an SQL transformation as input port data. When you configure the SQL transformation to connect to a database with a full connection, the Designer creates input ports for connection components. The database type defaults to the database type you configured for the transformation.

The following table describes the ports that the Designer creates when you configure an SQL transformation to connect to a database with a full connection:

Port | Required/Optional | Description
ConnectString | Required | Contains the database name and database server name.
DBUser | Required | Name of the user with permissions to read and write from the database.
DBPasswd | Required | DBUser password.
CodePage | Optional | Code page the Integration Service uses to read from or write to the database. Use the ISO code page name, such as ISO-8859-6. The code page name is not case sensitive.
AdvancedOptions | Optional | Connection attributes. Pass the attributes as name-value pairs. Delimit each attribute from another with a semicolon. Attribute names are not case sensitive.

Passing the Connect String

The native connect string contains the database name and database server name. The connect string allows PowerCenter and the database client to direct calls to the correct database.

The following table lists the native connect string syntax for each database:

Database | Connect String Syntax | Example
IBM DB2 | dbname | mydatabase
Informix | dbname@servername | mydatabase@informix
Microsoft SQL Server | servername@dbname | sqlserver@mydatabase
Oracle | dbname.world (same as TNSNAMES entry) | oracle.world
Sybase ASE | servername@dbname | sambrown@mydatabase
Teradata | ODBC_data_source_name, or ODBC_data_source_name@db_name, or ODBC_data_source_name@db_user_name | TeradataODBC, TeradataODBC@mydatabase, TeradataODBC@sambrown

Rules and Guidelines for Database Connections

Use the following rules and guidelines when configuring database connections for the SQL transformation:

You need the PowerCenter license key to connect to different database types. A session fails if PowerCenter is not licensed to connect to the database.
To improve performance, use a static database connection. When you configure a dynamic connection, the Integration Service establishes a new connection for each input row.
When you have a limited number of connections to use in a session, you can configure multiple SQL transformations. Configure each SQL transformation to use a different static connection. Use a Router transformation to route rows to a SQL transformation based on connectivity information in the row.
When you configure the SQL transformation to use full connection data, the database password is plain text. You can pass logical connections when you have a limited number of connections you need to use in a session. A logical connection provides the same functionality as the full connection, and the database password is secure.
When you pass logical database connections to the SQL transformation, the Integration Service accesses the repository to retrieve the connection information for each input row. When you have many rows to process, passing logical database connections might have a performance impact.

Let me know if you have any questions; your comments are always welcome.


 

Name

Extract Transform and Load (ETL) Design

Description

This document will address specific design elements that must be resolved before the ETL process can begin. These include determining:

• Whether it is better to use an ETL suite of tools or hand-code the ETL process with available resources.

• If batch processing will provide the data in a timely manner.

• How much of the ETL process will be automated with schedulers, alert notifications and workflow procedures.

Rationale

Certain design elements are a fundamental and necessary first decision in the development of an ETL system. These choices affect everything and a change in these elements can mean implementing the entire system over again from the very start. The key to applying these design elements is to apply them consistently.

Benefits

By addressing these design elements, we ensure that the ETL system can do the following:

• Deliver data most effectively to end user tools

• Add value to data in the cleaning and conforming steps

• Protect and document the lineage of data

Proven Technology

Depending on the scope of the ETL process, a decision must be made as to whether an ETL software suite will be used or the process will be ‘hand-coded’ using available resources. If the scope of the project is large, purchasing an ETL software suite will ultimately reduce the cost of building and maintaining the ETL process. Although the ETL process can be created as a ‘hand-coded’ process, there are advantages to using purchased ETL tools.

• Simpler, faster, less expensive development

• Simpler connectivity to a wide variety of complex sources such as SAP applications and/or mainframes.

• Parallel pipe-lined multithreaded operation

• ETL tools can be used effectively by less skilled staff

• Many ETL tools have integrated metadata repositories that can synchronize metadata from source systems, target databases and other business intelligence tools.

• Most ETL tools automatically generate metadata at every step in the process and enforce a consistent metadata-driven methodology.

• Most ETL tools have a comprehensive built-in scheduler, aiding in documentation, ease of creation, and change management.

• The metadata repository of most ETL tools can automatically produce data lineage (looking backward) and data dependency analysis (looking forward).

• ETL tools should be able to handle all forms of complex data type conversions.

• Most ETL tools deliver good performance for very large data stores. If the ETL data volume is or is expected to become large, an ETL suite of tools is recommended.

• Most ETL tools will perform an automatic change-impact analysis for downstream processes and applications that are affected by a proposed schema change.

• An ETL-tool approach can be augmented with selected processing modules hand-coded in an underlying programming language.

• A proven ETL tool suite can help you avoid reinventing the wheel. These tools are designed for what you are trying to do: provide usable data to the end user.

For smaller ETL projects, hand-coded projects may be quicker, cheaper and more flexible. Some of the advantages of hand-coding an ETL system are:

• A purchased tool-based approach will limit you to the tool vendor’s abilities and their unique scripting language. However, all ETL tools allow escapes to standard programming languages in isolated modules.

• Hand-coded ETL provides unlimited flexibility, if that is indeed what you need.

• Automated unit testing tools are available in a hand-coded system but not with a tool-based approach.

• You can more directly manage metadata in hand-coded systems, although at the same time you must create all your own metadata interfaces.

• A brief requirements analysis of an ETL system quickly points you toward file-based processing, not database-stored procedures. File-based processes are more direct. They’re simply coded, easily tested and well understood.

Batch vs. Streaming Data Flow

The standard design for an ETL system is based on periodic batch extracts from the source data, which then flows through the system, resulting in a batch update to the data exported from the ETL system. However, when the real-time nature of the data exported becomes sufficiently urgent, it may be necessary to implement a streaming data flow in which the data at the record level continuously flows from the extraction process to the data exported from the system.

Scheduler Automation

It must be determined how deeply to control the overall ETL system with automated scheduler technology. At one extreme, all jobs are manually controlled and executed. At the other extreme, a master scheduler tool manages all the ETL jobs, statuses, alerts and flow processes.

Exception Handling

Exception handling should not be a random series of alerts or comments placed in files but rather should be a system-wide, uniform mechanism for reporting all instances of exceptions created by the ETL processes into a single database, with the name of the process, the time of the exception, its initially diagnosed severity, the action subsequently taken and the ultimate resolution status of the exception.
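A minimal sketch of such a uniform exception store, with table and column names that are assumptions and simply mirror the attributes listed above:

CREATE TABLE ETL_EXCEPTION_LOG (
    EXCEPTION_ID      INTEGER       NOT NULL,   -- surrogate key for the exception record
    PROCESS_NAME      VARCHAR(100)  NOT NULL,   -- ETL process that raised the exception
    EXCEPTION_TIME    TIMESTAMP     NOT NULL,   -- when the exception occurred
    SEVERITY          VARCHAR(20),              -- initially diagnosed severity
    ACTION_TAKEN      VARCHAR(255),             -- action subsequently taken
    RESOLUTION_STATUS VARCHAR(20)               -- ultimate resolution status
);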

Quality Handling

All quality problems need to generate an audit record attached to the final dimension or fact data. Corrupted or suspected data needs to be handled with a small number of uniform responses, such as filling in missing text data with a question mark or supplying least biased estimators of numeric values that exist.
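As a sketch of what such uniform responses can look like, assuming a hypothetical staging table STG_SALES with a text column and a numeric column:

-- Missing text gets the uniform '?' marker; a missing amount gets the
-- least biased estimate available here, the average of the existing values.
SELECT COALESCE(CUSTOMER_NAME, '?')                    AS CUSTOMER_NAME,
       COALESCE(SALE_AMOUNT, AVG(SALE_AMOUNT) OVER ()) AS SALE_AMOUNT
FROM   STG_SALES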

Recovery & Restart

You need to build your ETL system around the ability to recover from the abnormal ending of a job and restart. ETL jobs need to be reentrant, or otherwise impervious to incorrect multiple updating.

Metadata

In ETL, a metadata repository is where all the metadata information about sources, targets, transformations, mappings, workflows, sessions, etc., is stored. From this repository, metadata can be manipulated, queried and retrieved with the help of wizards provided by metadata capturing tools. During the ETL process, when we map source and target systems, we are actually mapping their metadata. Metadata stored in a repository can be a handy resource for learning about the organization’s data systems. Each department in an organization may have different business definitions, data types, or attribute names for the same attribute, or a single business definition may cover many attributes. These anomalies can be overcome by properly maintaining metadata for these attributes in a centralized repository. Thus, metadata plays a vital role in explaining how, why, and where data can be found, retrieved, stored and used efficiently in an information management system.

Security

Physical and administrative safeguards need to surround every on-line table and backup tape in the ETL environment. Archived data sets should be stored with checksums to verify that they have not been altered in any way.


Best Practices when building Maps

When you build maps, try to use the following steps. Sometimes this can help clarify “how” to architect the database. Before beginning, ask the following questions:
1. How much data is there passing through this map?
2. What is the source, target, aggregator, and lookup sizes (in terms of Rows and Megabytes of space)?
3. What is the time frame this map is expected to run in?
4. Are we loading data over the network?
5. Are we waiting on a flat file?  More than one?
6. Is there a way to balance the size of the source / target and maybe eliminate the lookups by utilizing Database Routines?
7. Is the speed such a problem that we must eliminate aggregators?
8. Can we tune the SQL on the source qualifier? (See the sketch after this list.)
9. Do we need to generate new sequence ID numbers for each row?
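On questions 6 and 8, one common option is a SQL override on the Source Qualifier that pushes a join or filter into the source database, which can remove the need for a Lookup transformation. The sketch below is illustrative only; the tables and columns are assumptions:

-- Source Qualifier override: join to the reference table in the database
-- instead of using a Lookup, and filter early to cut the row volume
SELECT o.ORDER_ID,
       o.CUSTOMER_ID,
       c.CUSTOMER_NAME,
       o.ORDER_TOTAL
FROM   ORDERS o
JOIN   CUSTOMERS c ON c.CUSTOMER_ID = o.CUSTOMER_ID
WHERE  o.ORDER_DATE >= DATE '1999-01-01'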

Guidelines Utilized to Divide Maps:
The Designer tool is highly capable of depicting, graphically, complex mappings. It is also fairly easy to develop these complex mappings. However, it can quickly become difficult to maintain them. In an effort to offer the best of the Designer, we suggest breaking up the complex mappings into more manageable pieces.

The maps should only be broken up as a last resort when attempting to make more operations parallel, or in an attempt to speed up the whole process (end-to-end). Start with development using the Designer. As pieces get too difficult, or problems present themselves (due to speed, mobility, or cross-functionality), take a look at the architecture of the maps to see if they would benefit from being subdivided.
* Does the map have multiple targets?
* If so, could each target be loaded in parallel?
* What’s the effect of breaking up the logic?
* Can the logic be placed into mapplets?
* Are the targets setup with Constraints?
* Does a parent need to be loaded first?
• ONE MAP PER TARGET TABLE. This rule forces you to think about where and how the logic will be applied. It also allows us to control the load order by constraints, as well as which maps and sessions can be run concurrently, and it forces you to think about how to apply common logic in the most efficient manner. Another benefit is that all of the data that flows into that target goes through the same set of business rules designed for the target fields.

• On some occasions (due to lack of speed) we need to take this further: one map per target per action, where the action is Insert, Update, or Delete.

• Breaking the maps out like this keeps them simple and maintainable. Each map is therefore responsible for a single set of business rules, as well as a single task.

 


Standards for Cleaning up ETL Log and Bad Files

Directing Session and Workflow logs to the Correct Directory
Each person creating workflows in the PowerCenter development environment needs to check to be certain that output files resulting from the run of the workflow and session(s) (i.e., .log, .bad, .ind files) are being directed to the correct directories on the host machine. The destination needs to be hard-coded in the properties of the workflow and session using the ‘Properties’ tab for each object.
For example, the value for $PMSessionLogFile for the Development Repository is set to /dev/null. The result is that if any team has not entered a specific destination for session log information, the session will fail due to an invalid directory being used.

Cleaning up Files in Log Directory
Below is a sample script that was written to clean out the /load/pmserver/tmp directory of any files.  You can use this script as part of a batch job you set up in UC4 to keep your team’s log directory (or any other directory your team uses) free of files of a particular age.  Just be certain to modify it to meet your circumstances.  For example, if I were on the UD team and wanted to purge files from the /usr/local/autopca/ud/log/prospect directory that were older than 7 days, the command would be:

/usr/bin/find /usr/local/autopca/ud/log/prospect -mtime +7 -exec rm {} \;


Informatica Forum here in Stockholm 2014

Informatica Forum

The Digital Value, Strand Hotel, Stockholm
21 May, 08:30-13:00

Information about Informatica Forum on 21 May at the Radisson Blu Strand Hotel in Stockholm. See you there.

Data that is managed as a business asset is increasingly the factor that defines potential and success. Whether an organization succeeds in developing its data assets, assuring their quality, complying with data regulations, and integrating data into every step, from daily processes to strategic decisions and analyses, determines whether it belongs among today's or tomorrow's winners. Some organizations have even developed "data trading" into a new strategic business.

At Informatica Forum you will learn, among other things:

  • How to maximize the "data-driven" potential, for example through new forms of collaboration between IT and the rest of the business
  • What an information architecture that can handle today's and tomorrow's multifaceted Data Management requirements looks like
  • How the newly launched Informatica 9.6 and Vibe enable success in Big Data projects
  • Examples of Informatica 9.6 in practice

Register Now

Date: 21 May 2014
Time: 08:30-13:00
Location: Strand Hotel
Nybrokajen 9
103 27 Stockholm

In today's fast-moving, always-connected and globalized world, it is increasingly important that the right person has access to the right information at the right time. Maximizing the information potential of all available data requires that the business and IT set common goals that result in both a strategy and concrete initiatives.

In the exponential data explosion, the need for tools that help you find and analyze the right data, quickly and easily, is critical. Add to that the challenge of complying with laws and regulations that change rapidly, all within the limits of constrained budgets.

Informatica Vibe is the world's first "Intelligent Data Platform". Informatica 9.6, which is the core of Vibe, lets you map data once and deploy it anywhere, which also future-proofs your solution. With Vibe you can automatically process and quality-assure data and take full advantage of the latest technologies, such as Hadoop, without needing coding expertise in each area. A few seats remain for Informatica Forum on 21 May at the Radisson Blu Strand Hotel in Stockholm.

 

Mehboob
Microsoft Certified Solutions Associate (MCSA)

 

 

Kimball’s DW/BI Architecture:

Let us review DW/BI systems and dimensional modeling basics by investigating the components of a DW/BI environment based on the Kimball architecture. There are four separate and distinct components to consider in the DW/BI environment: operational source systems, ETL system, data presentation area, and business intelligence applications.

Operational Source Systems:
These are the operational systems of record that capture the business’s transactions. Think of the source systems as outside the data warehouse because presumably you have little or no control over the content and format of the data in these operational systems. The main priorities of the source systems are processing performance and availability. It is safe to assume that source systems are not
queried in the broad and unexpected ways that DW/BI systems typically are queried. Source systems maintain little historical data; a good data warehouse can relieve the source systems of much of the responsibility for representing the past. In many cases, the source systems are special purpose applications without any commitment to sharing common data such as product, customer, geography, or calendar with other operational systems in the organization. Of course, a broadly adopted cross-application enterprise resource planning (ERP) system or operational master data management system could help address these shortcomings.

(Figure: The four components of the Kimball DW/BI architecture)

Extract, Transformation, and Load System
The extract, transformation, and load (ETL) system of the DW/BI environment consists of a work area, instantiated data structures, and a set of processes. The ETL system is everything between the operational source systems and the DW/BI presentation area. Extraction is the first step in the process of getting data into the data warehouse environment. Extracting means reading and understanding the source data and copying the data needed into the ETL system for further manipulation. At this point, the data belongs to the data warehouse. After the data is extracted to the ETL system, there are numerous potential transformations, such as cleansing the data (correcting misspellings, resolving domain conflicts, dealing with missing elements, or parsing into standard formats), combining data from multiple sources, and de-duplicating data. The ETL system adds value to the data with these cleansing and conforming tasks by changing the data and enhancing it. In addition, these activities can be architected to create diagnostic metadata, eventually leading to business process reengineering to improve data quality in the source systems over time.

The last step of the ETL process is the physical structuring and loading of data into the presentation area’s target dimensional models. Because the primary mission of the ETL system is to hand off the dimension and fact tables in the delivery step, these subsystems are critical. Many of these defined subsystems focus on dimension table processing, such as surrogate key assignments, code lookups to provide appropriate descriptions, splitting or combining columns to present the appropriate data values, or joining underlying third normal form table structures into flattened denormalized dimensions.
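As an illustration of the dimension-flattening step mentioned above, joining assumed third normal form tables into one denormalized product dimension might look like this (all names are hypothetical):

-- Flatten normalized product, brand and category tables into one row per product
SELECT p.PRODUCT_ID,
       p.PRODUCT_NAME,
       b.BRAND_NAME,
       c.CATEGORY_NAME
FROM   PRODUCT  p
JOIN   BRAND    b ON b.BRAND_ID    = p.BRAND_ID
JOIN   CATEGORY c ON c.CATEGORY_ID = b.CATEGORY_ID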

After validating the data for conformance with the defined one-to-one and many-to-one business rules, it may be pointless to take the final step of building a 3NF physical database, just before transforming the data once again into denormalized structures for the BI presentation area. There are cases in which the data arrives at the doorstep of the ETL system in a 3NF relational format. In these situations, the ETL system developers may be more comfortable performing the cleansing and transformation tasks using normalized structures. Although a normalized database for ETL processing is acceptable, we have some reservations about this approach. The creation of both normalized structures for the ETL and dimensional structures for presentation means that the data is potentially extracted, transformed, and loaded twice—once into the normalized database and then again when you load the dimensional model.

This two-step process requires more time and investment for the development, more time for the periodic loading or updating of data, and more capacity to store the multiple copies of the data. At the bottom line, this typically translates into the need for larger development, ongoing support, and hardware platform budgets. Unfortunately, some DW/BI initiatives have failed miserably because they focused all their energy and resources on constructing the normalized structures rather
than allocating time to developing a dimensional presentation area that supports improved business decision making. Although enterprise-wide data consistency is a basic goal of the DW/BI environment, there may be effective and less costly approaches than physically creating normalized tables in the ETL system, if these structures don’t already exist. It is acceptable to create a normalized database to support the ETL processes; however, this is not the end goal. The normalized structures must be off-limits to user queries because they defeat the twin goals of understandability and performance.

Presentation Area to Support Business Intelligence:
The DW/BI presentation area is where data is organized, stored, and made available for direct querying by users, report writers, and other analytical BI applications. Because the back room ETL system is off-limits, the presentation area is the DW/BI environment as far as the business community is concerned; it is all the business sees and touches via their access tools and BI applications.

Next up will be Agile Considerations. Until then,

Your comments are always welcome here                                                                                               

Mehboob

MCTS – BI


Data Profiling warehouse

Profiling Source Data: After you create the Data Profiling warehouse, you create data profiles in PowerCenter. A data profile contains functions that perform calculations on the source data. When you create a data profile, the Designer generates a profile mapping and a profile session.

You can run profile sessions against the mapping to gather information about source data. The Data Profiling warehouse stores the results of profile sessions. After you run profile sessions, you can view reports that display the session results.

Complete the following tasks to profile a source, mapplet, or groups in a source or mapplet:

1. Create a data profile.

2. Run a profile session.

3. View profile reports.

The Designer provides a Profile Manager and Profile Wizard to complete these tasks.

1. Create a Data Profile
To profile source data, you create a data profile based on a source or mapplet in the repository. Data profiles contain functions that perform calculations on the source data. For example, you can use a function to validate business rules in a data profile. You can apply profile functions to a column within a source, to a single source, or to multiple sources.

You can create the following types of data profiles:
Auto profile. Contains a predefined set of functions for profiling source data. Use an auto profile during mapping development to learn more about source data.

Custom profile. A data profile you define with the functions you need to profile source data. Use a custom profile during mapping development to validate documented business rules about the source data. You can also use a custom profile to monitor data quality or validate the results of BI reports.

You use the Designer to create a data profile. When you create a profile, the Designer generates a mapping and a session based on the profile information.

You can configure a data profile to write verbose data to the Data Profiling warehouse during a profile session. Verbose data provides more details about the data that results from a profile function. For example, for a function that validates business rules, verbose data may include the invalid rows in the source. For a function that determines the number of distinct values, verbose data can include a list of distinct values.

After you create a data profile, you can view profile details from the Profile Manager. You can also edit and delete the data profile.

2. Run the Profile Session
After you create a data profile, you can run the profile session. The Integration Service writes the profile session results to the Data Profiling warehouse.

You can run profile sessions from the following places:

Profile Manager. You can create and run temporary and persistent profile sessions from the Profile Manager. A temporary session runs on demand and is not stored in the repository. A persistent session can run on demand and is stored in the repository.

Workflow Manager. If you create a persistent profile session when you create the data profile, you can edit and run the profile workflow from the Workflow Manager.

3. View Data Profiling Reports
When you run a profile session, the Integration Service loads the session results to the Data Profiling warehouse. You can view the session results using PowerCenter Data Profiling reports. You can view reports on the profile session results in the following ways:

View PowerCenter Data Profiling reports from the Profile Manager.

View customizable reports in Data Analyzer.

The Profile Manager is a tool in the Designer that helps you manage data profiles. Use the Profile Manager to set default data profile options, work with data profiles in the repository, run profile sessions, view profile results, and view sources and mapplets with at least one profile defined for them. When you launch the Profile Manager, you can access profile information for the open folders in the repository.

There are two views in the Profile Manager:

Profile View. The Profile View tab displays the data profiles in the open folders in the repository.

Source View. The Source View tab displays the source definitions in the open folders in the repository for which you have defined data profiles.

Note: If the repository folder is read-only, you can view and run data profiles in the Profile View. You can also view Data Profiling reports. You cannot edit or delete data profiles.

From the Profile View and the Source View, you can complete the following tasks to manage, run, and view data profiles:

Create a custom profile.

View data profile details.

Edit a data profile.

Delete a data profile.

Run a session.

Regenerate a profile mapping.

Check in profile mappings.

Configure default data profile options.

Configure domains for profile functions.

Purge the Data Profiling warehouse.

Display the status of interactive sessions.

Display PowerCenter Data Profiling reports.

The Profile Manager launches immediately after you create a data profile. You can manually launch the Profile Manager from the following Designer tools:

Source Analyzer. Click Sources > Profiling > Launch Profile Manager.

Mapplet Designer. Click Mapplets > Profiling > Launch Profile Manager.

Repository Navigator. Open a folder and select a source definition. Right-click on the source definition and select Launch Profile Manager.

Profile View
The Profile View tab displays all of the data profiles in the open folder in the repository. Use the Profile View to determine the data profiles that exist for a particular repository folder.

Source View
The Source View displays the source definitions with data profiles in the open folder in the repository. A folder must be open before you can launch Profile Manager. Use the Source View to determine if a source definition already has data profiles defined. The Source View shows if the data profile is an auto profile or custom profile.

You can also use the Source View when you want to work with a data profile but are more familiar with the source name than the data profile name. For example, you want to run a profile session, and you know the source definition name but not the data profile name.

When you select the Source View tab in the Profile Manager, the Profile Navigator displays data profiles as nodes under the source definition for which you defined the data profile.

If you change or delete a data profile or a source or mapplet with a data profile, you can click View > Refresh to refresh the Source View.

Your comments are always welcome here                                                                                             

Mehboob

MCTS – BI


Data Integration Architecture Makes a Difference:

It’s time to rethink the DI process, and move it toward an agile, responsive enterprise environment. In the information technology world, various forms of architecture – especially enterprise architecture
and data architecture – help the business define its requirements and plans. Enterprise architecture focuses on the systems and applications, and data architecture is concerned with databases, data
models and semantic models. An enterprise DI architecture builds upon these methodologies, providing a comprehensive framework that supports repeatable DI strategies across all applications and data
sources.

An enterprise DI architecture is the result of an evolution of DI from project-based activities to a strategic initiative. It transforms DI into an ongoing, continual, and repeatable process that delivers value to all parts of the organization. An architectural approach is based on buy-in for DI from all parts of the business, to provide information that can be reused across the enterprise, via automated methodologies.
Enterprise DI architecture is not tied to any one technology or methodology. A well-designed architecture should support existing ETL-based interfaces, a combination of ETL, data federation and virtualization, or some other type of approach. The purpose of a DI architecture is to enable change within the interface and data structures as the business requires, along with the addition or replacement of interfaces and data sources without disrupting the enterprise data fabric.

Data on the back end is captured from a range of sources, from transactional systems, machines and
applications, users, cloud or social-media, and any and all other sources that arise in the future. The data may be captured within operational data stores, data warehouses, appliances, or through
emerging frameworks such as the open-source Apache Hadoop File System. The data flowing into the enterprise is identified, deduplicated and cleansed to ensure it is trusted data that has the confidence of decision makers downstream. Data quality processes are often already in place within existing data warehouse and ETL environments; these practices need to be extended to all other enterprise data inputs as well.

Master data management ensures that there is a single, enterprise version of the data. A metadata layer provides details on all the critical information that is captured and available to decision makers across
the organization. Moving toward the front end, a data services layer virtualizes the data that is coming in from various sources – such as production databases, data warehouses, and unstructured data –
abstracting it for consumption by any and all clients and device types.

Finally, a self-service framework on the front end enables the rapid configuration of end-user interfaces – such as dashboards and portals – for insights and consumption. Throughout the process, open and
standardized technology enables the development and deployment of a range of solutions.

Your comments are always welcome here                                                                                           

Mehboob

MCTS – BI


This Informatica Day will be the official Nordic launch for game-changing new products and the new Informatica vision for the future. Informatica’s Charles Race, SVP EMEA Field Operations, will provide the visionary keynote and Nordic launch of Informatica Vibe™ – the world’s only virtual data machine that lets you map once and deploy anywhere.

We live in a time of unprecedented technology changes and technology disruptions such as mobile, cloud computing and the internet of things. These changes and disruptions have the potential to deliver incredible business opportunity.

As time has progressed, computing access has become more ubiquitous, data sources and volume have exponentially increased, and the challenges to adopt technology have become staggering.

To remain competitive in the information age it is essential to better utilise data, drive operational efficiencies, and reduce complexity and costs in order to offer more targeted products and services to the right customers at the right time.

Informatica believes that the key is to transform fragmented, unpredictable, raw data into consolidated, high-quality, actionable information from any source, any technology, at any latency to put your information potential to work.

Event Details

Wednesday 09 October 2013

Berns
Berzelii Park
Stockholm SE-103 27

Register

www.informatica.com/se

Your comments are always welcome here                                                                                             

Mehboob

MCTS – BI


B2B Data Transformation Installation (for Windows)

Installing and configuring B2B Data Transformation (B2BDT) on new or existing hardware, either in conjunction with PowerCenter or co-existing with other host applications on the same server.

Note: B2B Data Transformation was formerly called Complex Data Exchange (CDE). Any references to PowerExchange Complex Data Exchange in this document are now referred to as B2B Data Transformation (B2BDT).

Description

Consider the following questions when determining what type of hardware to use for B2BDT:

If the hardware already exists:

1. Is the processor, operating system supported by B2BDT?

2. Are the necessary operating system and patches applied?

3. How many CPUs does the machine currently have? Can the CPU capacity be expanded?

4. How much memory does the machine have? How much is available to the B2BDT application?

5. Will B2BDT share the machine with other applications? If yes, what are the CPU and memory requirements of other applications?

If the hardware does not already exist:

1. Has the organization standardized on hardware or operating system vendor?

2. What type of operating system is preferred and supported?

Regardless of the hardware vendor chosen, the hardware must be configured and sized appropriately to support the complex data transformation requirements for B2BDT.

Among other factors, the hardware requirements for the B2BDT environment depend upon the data volumes, the number of concurrent users and the application server and operating system used. For exact sizing recommendations, contact Informatica Professional Services for a B2BDT Sizing and Baseline Architecture engagement.

Planning for the B2BDT Installation

There are several variations of the hosting environment from which B2BDT services will be invoked. This has implications on how B2BDT is installed and configured.

Host Software Environment

The most common configurations are:

1. B2BDT to be used in conjunction with PowerCenter

2. B2BDT as a standalone configuration

3. B2BDT in conjunction with a non-PowerCenter integration using an adapter for other middleware software such as WebMethods or Oracle BPEL.

B2BDT 4.4 includes a mechanism for exposing B2BDT services through web services so that they can be called from applications capable of calling web services.

Depending on what host options are chosen, installation options may vary.

Installation of B2BDT for a PowerCenter Host Environment

Be sure to have the necessary licenses and the additional plug-in to make PowerCenter work. Refer to the appropriate installation guide or contact Informatica support for details on the installation of B2BDT in PowerCenter environments.

Installation of B2BDT for a Standalone Environment

When using B2BDT services in a standalone environment, it is expected that one of the invocation methods (e.g., Web Services, .Net, Java APIs, Command Line or CGI) will be used to invoke B2BDT services. Consult accompanying B2BDT documentation for use in these environments.

Non-PowerCenter Middleware Platform Integration

Be sure to plan for additional agents to be installed. Refer to the appropriate installation guide or contact Informatica support for details for installing B2BDT in environments other than PowerCenter.

Other Decision Points

Where will the B2BDT service repository be located?

The choices for the location of the service repository are i) a path on the local file system or ii) use of a shared network drive. The justification for using a shared network drive is typically to simplify service deployment if two separate B2BDT servers want to share the same repository. While the use of a shared repository is convenient for a multi-server production environment it is not advisable for development as there could be a danger of multiple development teams potentially overwriting the same project files.

When a repository is shared between multiple machines, if a service is deployed via the B2BDT Studio, the Service Refresh Interval setting controls how fast other installations of B2BDT that are currently running detect the deployment of a service.

What are multi-user considerations?

If multiple users share a machine (but not at same time) the environment variable “IFConfigLocation4” can be used to set the location of the configuration file to point to a different configuration file for each user.

Security Considerations

As the B2BDT repository, workspace and logging locations are directory-based, all directories to be used should be granted read and write permissions for the user identity under which the B2BDT service will run.

The identity associated with the caller of the B2BDT services will also need to have permissions to execute the files installed in B2BDT binary directory.

Special consideration should be given to environments such as web services, where the user identity under which the B2BDT service runs may be different from the interactive user or the user associated with the calling application.

Log File and Tracing Locations

Log files and tracing options should be configured for appropriate recycling policies. The calling application must have permissions to read, write and delete files to the path that is set for storing these files.

B2BDT Pre-install Checklist

It is best to review the environment and record the information in a detailed checklist to facilitate the B2BDT install.

Minimum System Requirements

Verify that the minimum requirements for the Operating System, Disk Space, Processor Speed and RAM are met and record them in the checklist.

• B2BDT Studio requires Microsoft .NET Framework, version 2.0.

• If this version is not already installed, the installer will prompt for and install the framework automatically.

• B2BDT requires a Sun Java 2 Runtime Environment, version 1.5.X or above.

• B2BDT is bundled with the appropriate JRE version. The installer can be pointed to an existing JRE, or a JRE can be downloaded from Sun.

• To install the optional B2BDT libraries, reserve additional space (refer to documentation for additional information).

PowerCenter Integration Requirements

Complete the checklist for integration if B2BDT will be integrated with PowerCenter.

For an existing PowerCenter installation, the B2BDT client needs to be installed on at least one PC on which the PowerCenter client resides. B2BDT components also need to be installed on the PowerCenter server. If utilizing an existing PowerCenter installation ensure the following:

• Which version of PowerCenter is being used (8.x required)?

• Is the PowerCenter version 32 bit or 64 bit?

• Are the PowerCenter client tools installed on the client PC?

• Is the PowerCenter server installed on the server?

For new PowerCenter installations, the PowerCenter Pre-Install Checklist should be completed. Keep in mind that the same hardware will be utilized for PowerCenter and B2BDT.

For Windows Server, verify the following:

• The login account used for the installation has local administrator rights.

• 500 MB or more of temporary workspace is available.

• The Java 2 Runtime Environment version 1.5 or higher is installed and configured.

• Microsoft .NET Framework, version 2.0 is installed.

Non-PowerCenter Integration Requirements

In addition to the general B2BDT requirements, non-PowerCenter agents require that additional components are installed.

B2BDT Agent for BizTalk – requires that Microsoft BizTalk Server (version 2004 or 2006) is installed on the same computer as B2BDT. If B2BDT Studio is installed on the same computer as BizTalk Server 2004, the Microsoft SP2 service pack for BizTalk Server must be installed.

B2BDT Translator for Oracle BPEL – requires that BPEL 10.1.2 or above is installed.

B2BDT Agent for WebMethods – requires that WebMethods 6.5 or above is installed.

B2BDT Agent for WebSphere Business Integration Message Broker – requires that WBIMB 5.0 with CSD06 (or WBIMB 6.0) are installed. Also ensure that the Windows platform supports both the B2BDT Engine and WBIMB.

A valid license key is needed to run a B2BDT project and must be installed before B2BDT services will run on the computer. Contact Informatica support to obtain a B2BDT license file (B2BDTLicense.cfg). B2BDT Studio can be used without installing a license file.

B2BDT Installation and Configuration

The B2BDT installation consists of two main components – the B2BDT development workbench (Studio) and the B2BDT Server (which is an application deployed on a server). The installation tips apply to Windows environments.

This section should be used as a supplement to the B2B Data Transformation Installation Guide.

Before installing B2BDT complete the following steps:

* Verify that the hardware meets the minimum system requirements for B2BDT.

* Ensure that the combination of hardware and operating system is supported by B2BDT.

* Ensure that sufficient space has been allocated for the B2BDT ServiceDB.

* Ensure that all necessary patches have been applied to the operating system.

* Ensure that the B2BDT license file has been obtained from technical support.

* Ensure that the installation user ID has administrative privileges.

Adhere to the following sequence of steps to successfully install B2BDT.

1. Complete the B2BDT pre-install checklist and obtain valid license keys.

2. Install the B2BDT development workbench (Studio) on the Windows platform.

3. Install the B2BDT server on a server machine. When used in conjunction with PowerCenter, the server component must be installed on the same physical machine where PowerCenter resides.

4. Install the necessary client agents when B2BDT is used in conjunction with WebSphere, WebMethods, or BizTalk.

In addition to the standard B2BDT components that are installed by default, additional libraries can be installed.

Refer to the B2BDT documentation for detailed information on these library components.

Let me know if you have any questions; your comments are always welcome here.


 


Training courses are available across the Informatica product range: PowerCenter, Data Quality, B2B, Master Data Management, Archiving & Application Retirement, Data Masking & Test Data Management, Ultra Messaging, Data Replication, and Data Services.

New Product Offerings:

  • ILM 2 day Test Data Management & Masking 9.5 class
  • MDM 4 day Multidomain Hub Config 9.5 class
  • Ultra Messaging 4 day class
  • New onDemand Training now available.

Public Class Schedule EMEA: May – July 2013
Please use the links in the table below to find out the latest details for each course.

PowerCenter classroom instructor-led courses

PowerCenter online instructor-led courses

Data Quality classroom-led courses

Data Quality online instructor-led courses

Master Data Management classroom instructor-led courses

Master Data Management online instructor-led courses

B2B online instructor-led courses

ILM classroom instructor-led courses

ILM online instructor-led courses

Velocity online instructor-led courses

 


 

PowerCenter & SAP NetWeaver Integration

PowerExchange for SAP NetWeaver integrates with mySAP applications:

Data integration using the ABAP program

IDoc integration using ALE

Data integration using BAPI/RFC functions

Data migration integration

Business content integration

Data Integration Using the ABAP Program

You can extract data from mySAP applications using the ABAP program. Create a mapping in the Designer that uses the ABAP program. Generate and install the ABAP program on the SAP server that extracts the source data. When you configure a session, you can access the source data through streaming or a staged file. The Integration Service accesses streamed data through CPI-C. It accesses staged files through FTP, SFTP, or standard file I/O, typically using network file sharing, such as NFS. The steps to extract data from mySAP applications using the ABAP program:

1. Import an SAP R/3 source definition.

2. Create a mapping.

3. Generate and install the ABAP program.

4. Create a session and run a workflow.

IDoc Integration Using ALE

You can integrate PowerCenter with mySAP applications using Application Link Enabling (ALE) to send and receive Intermediate Documents (IDocs). IDocs are messages that exchange electronic data between SAP applications or between SAP applications and external programs. The message-based architecture of ALE comprises three layers:

Application layer. Provides ALE an interface to R/3 to send or receive messages from external systems.

Distribution layer. Filters and converts messages to ensure that they are compatible between different releases of R/3 and R/2.

Communications layer. Enables ALE to support synchronous and asynchronous communication. You use IDocs for asynchronous communication.

The architecture of ALE provides a way to send IDocs as text files without connecting to a central database. This allows applications to communicate with each other without converting between formats to accommodate hardware or platform differences. ALE has the following components:

Logical component. Determines how messages flow between the various applications or systems.

Physical component. Transport layer that routes IDoc messages using the tRFC (Transactional RFC) protocol.

Message types. Application messages that classify categories of data. For example, ORDERS and MATMAS (Material Master).

IDoc types. Data structure associated with the message type. For example, MATMAS01, MATMAS02 for MATMAS. IDocs contain the data belonging to the message type.

IDocs contain three record types:

Control record. Identifies the message type.

Data records. Contain the IDoc data in segments.

Status records. Describe the status of the IDoc. Status record names are the same for each IDoc type.

Data Integration Using BAPI/RFC Functions

Business Application Programming Interfaces (BAPI) provide a way for third-party applications to integrate data with SAP. You use BAPIs to create, change, delete, list, and detail objects in SAP.

The Business Application Programming Interfaces allow object-oriented access to the SAP system through methods for the business object types. Together with the business object types, BAPIs define and document the interface standard at the business level.

BAPIs also provide an object-oriented view of business components in SAP. You define BAPIs in the SAP Business Objects Repository. You implement and store them as Remote Function Call (RFC) enabled function modules in the Function Builder of the ABAP Workbench. You can call BAPIs as an ABAP program within SAP. You use RFCs to call BAPIs outside of SAP.

Use a BAPI/RFC transformation to create, change, or delete data in mySAP applications. When you run a session with a BAPI/RFC transformation, the Integration Service makes the RFC function calls to SAP to process SAP data. You can use a BAPI/RFC transformation for one of the following reasons:

Migrate data to SAP. For example, your organization uses PeopleSoft applications for enterprise purchasing management. You want to migrate to mySAP applications to manage enterprise purchasing. Use PowerExchange for PeopleSoft to extract data from PeopleSoft and a BAPI/RFC transformation to write purchase management data to mySAP applications.

Synchronize data in SAP. For example, a mySAP application contains customer orders. You want to add line items to some of the orders. You can use a BAPI/RFC transformation to make a BAPI/RFC call to add the line items to the orders in SAP.

Data Migration Integration

You can migrate data from legacy applications, from other ERP systems, or data from any number of other sources and prepare it for input into mySAP applications. The Integration Service extracts the data from the data source and prepares the data in an SAP format flat file that you can load into mySAP applications.

Business Content Integration:

You can integrate PowerCenter with mySAP applications to provide an efficient, high-volume data warehousing solution. SAP business content is a collection of metadata objects that can be integrated with other applications and used for analysis and reporting. SAP produces the business content data, and PowerCenter consumes it. PowerCenter can consume all or changed business content data from mySAP applications and write it to a target data warehouse. You can then use the data warehouse to meet analysis and reporting needs. PowerExchange for SAP NetWeaver BI can perform the following tasks:

Extracting data from SAP BI

Loading data into SAP BI

PowerExchange for SAP NetWeaver BI interacts with InfoCubes and InfoSources. An InfoCube is a self-contained dataset created with data from one or more InfoSources. An InfoSource is a collection of data that logically belongs together, summarized into a single unit. TCP/IP is the native communication interface between PowerCenter and SAP NetWeaver. PowerCenter and SAP NetWeaver also use the following interfaces:

CPI-C

Remote Function Call (RFC)

Common Program Interface-Communications:

CPI-C is the communication protocol that enables online data exchange and data conversion between SAP and PowerCenter. PowerExchange for SAP NetWeaver uses CPI-C to communicate with SAP NetWeaver only when you run ABAP stream mode sessions.

To initialize CPI-C communication with PowerCenter, SAP NetWeaver requires information, such as the host name of the application server and SAP gateway. This information is stored in a configuration file named saprfc.ini on the node where the Integration Service process runs. The Integration Service uses parameters in the saprfc.ini file to connect to SAP NetWeaver when you run ABAP stream mode sessions.

RFC:

RFC is the remote communications protocol used by SAP NetWeaver. It is based on Remote Procedure Call (RPC). PowerCenter makes remote function calls to communicate with SAP NetWeaver.

To execute remote calls from PowerCenter, SAP NetWeaver requires information, such as the connection type and the service name and gateway on the application server. This information is stored in a configuration file named saprfc.ini on the node hosting the PowerCenter Client and the node where the Integration Service and SAP BW Service processes run.
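
These connection parameters live in saprfc.ini as DEST entries. A minimal sketch of a Type A (application server) entry is shown below; the DEST name, host, and system number are placeholders, and some setups also need other entry types (for example, for the SAP BW Service), so follow the configuration documentation for the exact entries your environment requires:

    DEST=SAPR3DEV
    TYPE=A
    ASHOST=sapapp01
    SYSNR=00

The DEST value is typically what the SAP connection object in PowerCenter refers to as its connect string.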

The transport system is a set of ABAP programs installed on the SAP system. It provides a way to import SAP metadata into the PowerCenter repository. It also enables run-time functionalities, such as passing mapping variables and filters. You use the transport system in the following situations:

Configuring PowerExchange for SAP NetWeaver. You need to transport some customized objects that were developed by Informatica to the SAP system. These objects include tables, structures, programs, and functions. PowerCenter calls one or more of these objects each time you make a request to the SAP system.

Deploying run-time transports and ABAP programs from development to production. If you are using ABAP to integrate with mySAP applications, you might want to deploy the run-time transports provided by Informatica and the ABAP programs installed by the Designer to extract data when you move from development to production.

 

The SAP system administrator is responsible for completing transport tasks. Write to me if you have any further questions.

Mehboob

MCTS – BI


Informatica PowerCenter 9.1.0 Optimizing Sessions

** Grid.

** Pushdown Optimization.

** Concurrent Sessions and Workflows.

** Buffer Memory.

** Caches.

** Target-Based Commit.

** Real-time Processing.

** Staging Areas.

** Log Files.

** Error Tracing.

** Post-Session Emails.

Grid:

You can use a grid to increase session and workflow performance. A grid is an alias assigned to a group of nodes that allows you to automate the distribution of workflows and sessions across nodes.

When you use a grid, the Integration Service distributes workflow tasks and session threads across multiple nodes. A Load Balancer distributes tasks to nodes without overloading any node. Running workflows and sessions on the nodes of a grid provides the following performance gains:

** Balances the Integration Service workload.

** Processes concurrent sessions faster.

** Processes partitions faster.

The Integration Service requires CPU resources for parsing input data and formatting the output data. A grid can improve performance when you have a performance bottleneck in the extract and load steps of a session. A grid can improve performance when memory or temporary storage is a performance bottleneck. When a PowerCenter mapping contains a transformation that has cache memory, deploying adequate memory and separate disk storage for each cache instance improves performance.

Running a session on a grid can improve throughput because the grid provides more resources to run the session. Performance improves when you run a few sessions on the grid at a time. Running a session on a grid is more efficient than running a workflow over a grid if the number of concurrent session partitions is less than the number of nodes.

Pushdown Optimization:

To increase session performance, push transformation logic to the source or target database. Based on the mapping and session configuration, the Integration Service executes SQL against the source or target database instead of processing the transformation logic within the Integration Service.
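
As a rough illustration (the table and column names are made up, and the generated SQL depends on the mapping, the session configuration, and the database), a Filter and an Expression that would otherwise run inside the Integration Service might be pushed to the source database as a single statement such as:

    SELECT CUST_ID,
           UPPER(CUST_NAME) AS CUST_NAME
    FROM   CUSTOMERS
    WHERE  STATUS = 'ACTIVE'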

Concurrent Sessions and Workflows: If possible, run sessions and workflows concurrently to improve performance. For example, if you load data into an analytic schema where you have dimension and fact tables, load the dimensions concurrently.

Buffer Memory:

When the Integration Service initializes a session, it allocates blocks of memory to hold source and target data. The Integration Service allocates at least two blocks for each source and target partition. Sessions that use a large number of sources and targets might require additional memory blocks. If the Integration Service cannot allocate enough memory blocks to hold the data, it fails the session. You can configure the amount of buffer memory, or you can configure the Integration Service to calculate buffer settings at run time.

To increase the number of available memory blocks, adjust the following session properties:

** DTM Buffer Size. Increase the DTM buffer size on the Properties tab in the session properties.

** Default Buffer Block Size. Decrease the buffer block size on the Config Object tab in the session properties.

Before you configure these settings, determine the number of memory blocks the Integration Service requires to initialize the session. Then, based on default settings, calculate the buffer size and the buffer block size to create the required number of session blocks.

If you have XML sources or targets in a mapping, use the number of groups in the XML source or target in the calculation for the total number of sources and targets.
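
As a quick worked example of the two-blocks rule above (single-partition session; XML groups and additional partitions increase the counts as noted):

    required blocks = (sources + targets) x 2 = (2 + 3) x 2 = 10

A session that reads from 2 sources and writes to 3 targets therefore needs at least 10 memory blocks, and the DTM buffer size divided by the buffer block size should comfortably provide at least that many.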

Increasing DTM Buffer Size:

The DTM Buffer Size setting specifies the amount of memory the Integration Service uses as DTM buffer memory. The Integration Service uses DTM buffer memory to create the internal data structures and buffer blocks used to bring data into and out of the Integration Service. When you increase the DTM buffer memory, the Integration Service creates more buffer blocks, which improves performance during momentary slowdowns.

Increasing DTM buffer memory allocation generally causes performance to improve initially and then level off. When you increase the DTM buffer memory allocation, consider the total memory available on the Integration Service process system.

If you do not see a significant increase in performance, DTM buffer memory allocation is not a factor in session performance.

Note: Reducing the DTM buffer allocation can cause the session to fail early in the process because the Integration Service is unable to allocate memory to the required processes.

To increase the DTM buffer size, open the session properties and click the Properties tab. Edit the DTM Buffer Size property in the Performance settings.

Caches:

The Integration Service uses the index and data caches for XML targets and Aggregator, Rank, Lookup, and Joiner transformations. The Integration Service stores transformed data in the data cache before returning it to the pipeline. It stores group information in the index cache. Also, the Integration Service uses a cache to store data for Sorter transformations.

To configure the amount of cache memory, use the cache calculator or specify the cache size. You can also configure the Integration Service to calculate cache memory settings at run time.

If the allocated cache is not large enough to store the data, the Integration Service stores the data in a temporary disk file, a cache file, as it processes the session data. Performance slows each time the Integration Service pages to a temporary file. Examine the performance counters to determine how often the Integration Service pages to a file.

Perform the following tasks to optimize caches:

** Limit the number of connected input/output and output only ports.

** Select the optimal cache directory location.

** Increase the cache sizes.

** Use the 64-bit version of PowerCenter to run large cache sessions.

Target-Based Commit:

The commit interval setting determines the point at which the Integration Service commits data to the targets. Each time the Integration Service commits, performance slows. Therefore, the smaller the commit interval, the more often the Integration Service writes to the target database, and the slower the overall performance.

Real-time Processing:

Flush Latency

Flush latency determines how often the Integration Service flushes real-time data from the source. The lower you set the flush latency interval, the more frequently the Integration Service commits messages to the target. Each time the Integration Service commits messages to the target, the session consumes more resources and throughput drops.

Increase the flush latency to improve throughput. Throughput increases as you increase the flush latency up to a certain threshold, depending on the hardware and available resources.

Source-Based Commit:

Source-based commit interval determines how often the Integration Service commits real-time data to the target. To obtain the fastest latency, set the source-based commit to 1.

Log Files:

A workflow runs faster when you do not configure it to write session and workflow log files. Workflows and sessions always create binary logs. When you configure a session or workflow to write a log file, the Integration Service writes logging events twice. You can access the binary session and workflow logs in the Administrator tool.

Error Tracing:

To improve performance, reduce the number of log events generated by the Integration Service when it runs the session. If a session contains a large number of transformation errors, and you do not need to correct them, set the session tracing level to Terse. At this tracing level, the Integration Service does not write error messages or row-level information for reject data.

If you need to debug the mapping and you set the tracing level to Verbose, you may experience significant performance degradation when you run the session. Do not use Verbose tracing when you tune performance. The session tracing level overrides any transformation-specific tracing levels within the mapping. This is not recommended as a long-term response to high levels of transformation errors.

Post-Session Emails:

When you attach the session log to a post-session email, enable flat file logging. If you enable flat file logging, the Integration Service gets the session log file from disk. If you do not enable flat file logging, the Integration Service gets the log events from the Log Manager and generates the session log file to attach to the email. When the Integration Service retrieves the session log from the log service, workflow performance slows, especially when the session log file is large and the log service runs on a different node than the master DTM. For optimal performance, configure the session to write to a log file when you configure post-session email to attach a session log.

Let me know if you have any questions; your comments are always welcome here.


Mehboob

MCTS – BI


In a recent Gartner survey, only 1 in 5 companies were highly satisfied with the quality of their data. It’s all too common that users don’t trust the data in data warehouses and applications. It’s a significant reason they use spreadsheets or other workarounds. Read about what it takes to build trust in your data, increase data warehouse adoption, and improve business decisions. See the Informatica Training: Data Quality Developer course, or check the Informatica Communities.

Food for Thought

Magic Quadrant for Master Data Management of Customer Data Solutions
Master Data Management (MDM) improves data consistency and trust. Read the 2012 Gartner Magic Quadrant for Master Data Management of Customer Data Solutions report, and learn more about Informatica’s leadership position.

Themed Topic of the Month: Best Practices for Building Trust
Building trust in your data requires more than luck. You need to put in place the right data governance processes. Learn how to get started, and what it takes to succeed.

TDWI Data Governance Checklist
Get started with this TDWI checklist to learn how discovery and profiling, business glossaries, and data quality tools fit into data governance processes.
IDC White Paper: Making Information Assets Trustworthy
Examine the role of data profiling and data quality in data governance, and read about the lessons learned by companies that have successfully adopted these tools.
TDWI Data Governance Webinar Replay
Listen to TDWI’s Claudia Imhoff as she discusses why every business leader wants to have trusted, secure, consistent and usable information, and the framework for data governance that can be applied in today’s volatile, distributed data world.
Data Governance for Big Data: It’s Even More Critical Now
Now is exactly the right time to think about data governance for big data; before the wild, untamed data from outside the company starts getting mixed with your potentially more trustworthy, tamed, internal data.

Featured Customer Case Study of the Month

U.S. Xpress Enterprises, Inc.
Read how U.S. Xpress delivered a return on investment (ROI) in less than three months and realized millions in savings by focusing on trusted data.

Learn more here: Informatica Training: Data Quality Developer.

Let me know if you have any questions; your comments are always welcome here. (2012-12-21)

Mehboob

MCTS – BI


Mapping with Fact and Dimension Tables

In previous lessons, you used the Source Qualifier, Expression, Aggregator, and Lookup transformations in mappings. In this lesson, you learn how to use the following transformations:

* Stored Procedure. Call a stored procedure and capture its return values.

* Filter. Filter data that you do not need, such as discontinued items in the ITEMS table.

* Sequence Generator. Generate unique IDs before inserting rows into the target.

You create a mapping that outputs data to a fact table and its dimension tables.

Creating Targets

Before you create the mapping, create the following target tables:

F_PROMO_ITEMS. A fact table of promotional items.
D_ITEMS, D_PROMOTIONS, and D_MANUFACTURERS. Dimensional tables.
1. Open the Designer, connect to the repository, and open the tutorial folder.

2. Click Tools > Target Designer. To clear the workspace, right-click the workspace, and select Clear All.

3. Click Targets > Create.

4. In the Create Target Table dialog box, enter F_PROMO_ITEMS as the name of the target table, select the database type, and click Create.

5. Repeat step 4 to create the other tables needed for this schema: D_ITEMS, D_PROMOTIONS, and D_MANUFACTURERS. When you have created all these tables, click Done.

6. Open each target definition, and add the following columns to the appropriate table:

D_ITEMS

Column              Datatype   Precision   Not Null   Key
ITEM_ID             Integer                Not Null   Primary Key
ITEM_NAME           Varchar    72
PRICE               Money      default

D_PROMOTIONS

Column              Datatype   Precision   Not Null   Key
PROMOTION_ID        Integer                Not Null   Primary Key
PROMOTION_NAME      Varchar    72
DESCRIPTION         Varchar    default
START_DATE          Datetime   default
END_DATE            Datetime   default

D_MANUFACTURERS

Column              Datatype   Precision   Not Null   Key
MANUFACTURER_ID     Integer                Not Null   Primary Key
MANUFACTURER_NAME   Varchar    72

F_PROMO_ITEMS

Column              Datatype   Precision   Not Null   Key
PROMO_ITEM_ID       Integer                Not Null   Primary Key
FK_ITEM_ID          Integer                           Foreign Key
FK_PROMOTION_ID     Integer                           Foreign Key
FK_MANUFACTURER_ID  Integer                           Foreign Key
NUMBER_ORDERED      Integer
DISCOUNT            Money      default
COMMENTS            Varchar    default

The datatypes may vary, depending on the database you choose.

Note: For F_PROMO_ITEMS, you include foreign key columns that correspond to the primary keys in each of the dimension tables.

7. Click Repository > Save.

Creating Target Tables

The next step is to generate and execute the SQL script to create each of these new target tables.

1. Select all of the target definitions.

2. Click Targets > Generate/Execute SQL.

3. In the Database Object Generation dialog box, connect to the target database.

4. Select Generate from Selected tables, and select the options for creating the tables and generating keys.

5. Click Generate and Execute.

6. Click Close.
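
For reference, the DDL generated for D_ITEMS in step 5 corresponds roughly to the following sketch; exact datatypes depend on the database type you selected, and the DECIMAL precision shown for the Money column is an assumption:

    CREATE TABLE D_ITEMS (
        ITEM_ID   INTEGER NOT NULL PRIMARY KEY,
        ITEM_NAME VARCHAR(72),
        PRICE     DECIMAL(10,2)
    );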

Creating the Mapping

You create a mapping to filter out discontinued items. You can call a stored procedure to find the number of each item customers have ordered, and generate a unique ID for each row in the fact table.

1. In the Designer, switch to the Mapping Designer, and create a mapping.

2. Name the mapping m_PromoItems.

3. From the list of target definitions, select the tables you just created and drag them into the mapping.

4. From the list of source definitions, add the following source definitions to the mapping:

PROMOTIONS
ITEMS_IN_PROMOTIONS
ITEMS
MANUFACTURERS
ORDER_ITEMS

5. Delete all Source Qualifier transformations that the Designer creates when you add these source definitions.

6. Add a Source Qualifier transformation named SQ_AllData to the mapping, and connect all the source definitions to it.

7. Click View > Navigator to close the Navigator and allow extra space in the workspace.

8. Click Repository > Save.

Creating a Filter Transformation

The Filter transformation filters rows from a source. If you connect a Filter transformation to a Source Qualifier transformation, you can filter rows passed through the Source Qualifier transformation using any condition you want to apply. In this exercise, you remove discontinued items from the mapping.

The mapping contains a Filter transformation that limits rows queried from the ITEMS table to those items that have not been discontinued.
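
Conceptually, the filter you configure below has the same effect as adding a WHERE clause to the query against ITEMS, along the lines of this sketch (in the mapping, the rows are filtered midstream rather than in generated SQL):

    SELECT ITEM_ID, ITEM_NAME, PRICE
    FROM   ITEMS
    WHERE  DISCONTINUED_FLAG = 0;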

1. Create a Filter transformation and name it FIL_CurrentItems.

2. Drag the following ports from the Source Qualifier transformation into the Filter transformation:

ITEM_ID
ITEM_NAME
PRICE
DISCONTINUED_FLAG

3. Open the Filter transformation.

4. Click the Properties tab to specify the filter condition.

5. Click Open in the Filter Condition field. The Expression Editor dialog box appears.

6. Select the word TRUE in the Formula field and press Delete.

7. Click the Ports tab.

8. Enter DISCONTINUED_FLAG = 0. The following example shows the complete condition:

DISCONTINUED_FLAG = 0

9. Click Validate, and then click OK. The new filter condition now appears in the Value field.

10. Click OK to return to the workspace.

Connecting the Filter Transformation

Now, you need to connect the Filter transformation to the D_ITEMS target table. Currently sold items are written to this target.

1. Connect the ports ITEM_ID, ITEM_NAME, and PRICE to the corresponding columns in D_ITEMS.

2. Click Repository > Save.

Creating a Sequence Generator Transformation

A Sequence Generator transformation generates unique values, such as primary keys, for a target in a mapping. You can also use it to cycle through a closed set of values. Many relational databases include sequences, which are special database objects that generate values. The Sequence Generator transformation functions like a sequence object in a database. However, in PowerCenter, you do not need to write SQL code to create and use the sequence in a mapping.

The Sequence Generator transformation has the following properties:

The starting number (normally 1).
The current value stored in the repository.
The number that the Sequence Generator transformation adds to its current value for every request for a new ID.
The maximum value in the sequence.
A flag indicating whether the Sequence Generator transformation counter resets to the minimum value once it has reached its maximum value.

The Sequence Generator transformation has two output ports, NEXTVAL and CURRVAL, which correspond to the two pseudo-columns in a sequence. When you query a value from the NEXTVAL port, the transformation generates a new value.
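
For comparison, the database-side equivalent that this transformation replaces would look something like the following in Oracle syntax (the sequence name is illustrative):

    CREATE SEQUENCE SEQ_PROMO_ITEM_ID START WITH 1 INCREMENT BY 1;

    SELECT SEQ_PROMO_ITEM_ID.NEXTVAL FROM DUAL;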

In the new mapping, you add a Sequence Generator transformation to generate IDs for the fact table F_PROMO_ITEMS. Every time the Integration Service inserts a new row into the target table, it generates a unique ID for PROMO_ITEM_ID.

1. Create a Sequence Generator transformation and name it SEQ_PromoItemID.

2. Open the Sequence Generator transformation.

3. Click the Ports tab. The two output ports, NEXTVAL and CURRVAL, appear in the list.

Note: You cannot add any new ports to this transformation or reconfigure NEXTVAL and CURRVAL.

4. Click the Properties tab. The properties for the Sequence Generator transformation appear. You do not have to change any of these settings.

5. Click OK.

6. Connect the NEXTVAL column from the Sequence Generator transformation to the PROMO_ITEM_ID column in the target table F_PROMO_ITEMS.

7. Click Repository > Save.

Creating a Stored Procedure Transformation

When you install the sample database objects to create the source tables, you also create a stored procedure, SP_GET_ITEM_COUNT. This procedure takes one argument, an ITEM_ID value, and returns the number of times that item has been ordered.

The following table shows the syntax for the stored procedure:

Database Syntax
Oracle
CREATE FUNCTION SP_GET_ITEM_COUNT 
   (ARG_ITEM_ID IN NUMBER) RETURN NUMBER
   IS SP_RESULT NUMBER;
      BEGIN
          SELECT COUNT(*)
             INTO SP_RESULT
             FROM ORDER_ITEMS 
             WHERE ITEM_ID = ARG_ITEM_ID;
          RETURN (SP_RESULT);
      END;
Microsoft SQL Server
CREATE PROCEDURE SP_GET_ITEM_COUNT (@ITEM_ID INT)
                AS SELECT COUNT(*) FROM ORDER_ITEMS
                WHERE ITEM_ID = @ITEM_ID
Sybase ASE
CREATE PROCEDURE SP_GET_ITEM_COUNT (@ITEM_ID INT)
AS SELECT COUNT(*) FROM ORDER_ITEMS WHERE ITEM_ID = @ITEM_ID
Informix
CREATE PROCEDURE SP_GET_ITEM_COUNT (ITEM_ID_INPUT INT)
  RETURNING INT;
  DEFINE CNT INT;
  SELECT COUNT(*) INTO CNT FROM ORDER_ITEMS WHERE ITEM_ID = ITEM_ID_INPUT;
  RETURN CNT;
DB2
CREATE PROCEDURE  SP_GET_ITEM_COUNT (IN ARG_ITEM_ID INT,
                                     OUT SP_RESULT  INT,
                                     OUT SQLCODE_OUT INT )
    LANGUAGE SQL
P1: BEGIN
    -- Declare variables
    DECLARE SQLCODE INT DEFAULT 0;
    -- Declare handler
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
        SET SQLCODE_OUT = SQLCODE;
    SELECT COUNT(*) INTO SP_RESULT
           FROM ORDER_ITEMS
           WHERE ITEM_ID=ARG_ITEM_ID;
    SET SQLCODE_OUT = SQLCODE;
END P1
Teradata
CREATE PROCEDURE SP_GET_ITEM_COUNT 
 (IN ARG_ITEM_ID integer, OUT SP_RESULT integer)

 BEGIN
    SELECT COUNT(*)
      INTO :SP_RESULT
      FROM ORDER_ITEMS
      WHERE ITEM_ID = :ARG_ITEM_ID;

  END;
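
If you want to verify the procedure outside of PowerCenter before adding it to the mapping, you can call it directly from a SQL client. For example, against the Oracle function and the Microsoft SQL Server procedure shown above (the ITEM_ID value 10 is only an illustration):

    -- Oracle
    SELECT SP_GET_ITEM_COUNT(10) FROM DUAL;

    -- Microsoft SQL Server
    EXEC SP_GET_ITEM_COUNT @ITEM_ID = 10;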

In the mapping, add a Stored Procedure transformation to call this procedure. The Stored Procedure transformation returns the number of orders containing an item to an output port.

1. Create a Stored Procedure transformation and name it SP_GET_ITEM_COUNT. The Import Stored Procedure dialog box appears.

2. Select the ODBC connection for the source database. Enter a user name, owner name, and password. Click Connect.

3. Select the stored procedure named SP_GET_ITEM_COUNT from the list and click OK.

4. In the Create Transformation dialog box, click Done. The Stored Procedure transformation appears in the mapping.

5. Open the Stored Procedure transformation, and click the Properties tab.

6. Click Open in the Connection Information section. The Select Database dialog box appears.

7. Select the source database and click OK. You can call stored procedures in both source and target databases.

Note: You can also select the built-in database connection variable, $Source. When you use $Source or $Target, the Integration Service determines which source database connection to use when it runs the session. If it cannot determine which connection to use, it fails the session.

8. Click OK.

9. Connect the ITEM_ID column from the Source Qualifier transformation to the ITEM_ID column in the Stored Procedure transformation.

10. Connect the RETURN_VALUE column from the Stored Procedure transformation to the NUMBER_ORDERED column in the target table F_PROMO_ITEMS.

11. Click Repository > Save.

Completing the Mapping

The final step is to map data to the remaining columns in targets.

1. Connect the following columns from the Source Qualifier transformation to the targets:
Source Qualifier      Target Table      Column
PROMOTION_ID          D_PROMOTIONS      PROMOTION_ID
PROMOTION_NAME        D_PROMOTIONS      PROMOTION_NAME
DESCRIPTION           D_PROMOTIONS      DESCRIPTION
START_DATE            D_PROMOTIONS      START_DATE
END_DATE              D_PROMOTIONS      END_DATE
MANUFACTURER_ID       D_MANUFACTURERS   MANUFACTURER_ID
MANUFACTURER_NAME     D_MANUFACTURERS   MANUFACTURER_NAME
ITEM_ID               F_PROMO_ITEMS     FK_ITEM_ID
PROMOTION_ID          F_PROMO_ITEMS     FK_PROMOTION_ID
MANUFACTURER_ID       F_PROMO_ITEMS     FK_MANUFACTURER_ID
DISCOUNT              F_PROMO_ITEMS     DISCOUNT
COMMENTS              F_PROMO_ITEMS     COMMENTS

2. Click Repository > Save.

The mapping is now complete. You can create and run a workflow with this mapping.

Let me know if you have any questions; your comments are always welcome here.

Mehboob

MCTS – BI


 

PowerExchange for SAP NetWeaver Overview

2012-11-17

SAP NetWeaver is an application platform that integrates multiple business applications and solutions, such as Customer Relationship Management (CRM), Advanced Planner and Optimizer (APO), and Bank Analyzer. Developers can add business logic within SAP NetWeaver using Java 2 Enterprise Edition (J2EE) or Advanced Business Application Programming-Fourth Generation (ABAP/4 or ABAP), a language proprietary to SAP.

You can use the following integration methods to extract data from or load data into SAP systems:

You can use the ABAP, Application Link Enabling (ALE), BAPI/RFC functions, data migration, or business content integration methods.
You can also extract data from or load data to the SAP Business Information Warehouse.

Note: PowerExchange for SAP NetWeaver and PowerExchange for SAP NetWeaver BI require separate licenses.

SAP NetWeaver is the basis for SAP solutions. Because PowerCenter works with the SAP NetWeaver application platform, you can integrate PowerCenter with any SAP industry solution or mySAP application that provides BAPI/RFC or ALE integration methods.

PowerCenter & SAP NetWeaver Integration Methods

PowerExchange for SAP NetWeaver integrates with mySAP applications in the following ways:

Data integration using the ABAP program
IDoc integration using ALE
Data integration using BAPI/RFC functions
Data migration integration
Business content integration

Data Integration Using the ABAP Program

You can extract data from mySAP applications using the ABAP program. Create a mapping in the Designer that uses the ABAP program. Generate and install the ABAP program on the SAP server that extracts the source data. When you configure a session, you can access the source data through streaming or a staged file. The PowerCenter Integration Service accesses streamed data through CPI-C. It accesses staged files through FTP, SFTP, or standard file I/O, typically using network file sharing, such as NFS.

Complete the following steps to extract data from mySAP applications using the ABAP program:

1. Import an SAP R/3 source definition.
2. Create a mapping.
3. Generate and install the ABAP program.
4. Create a session and run a workflow.

IDoc Integration Using ALE

You can integrate PowerCenter with mySAP applications using Application Link Enabling (ALE) to send and receive Intermediate Documents (IDocs). IDocs are messages that exchange electronic data between SAP applications or between SAP applications and external programs.

The message-based architecture of ALE comprises three layers:

Application layer. Provides ALE an interface to R/3 to send or receive messages from external systems.
Distribution layer. Filters and converts messages to ensure that they are compatible between different releases of R/3 and R/2.
Communications layer. Enables ALE to support synchronous and asynchronous communication. You use IDocs for asynchronous communication.

The architecture of ALE provides a way to send IDocs as text files without connecting to a central database. This allows applications to communicate with each other without converting between formats to accommodate hardware or platform differences.

ALE has the following components:

Logical component. Determines how messages flow between the various applications or systems.
Physical component. Transport layer that routes IDoc messages using the tRFC (Transactional RFC) protocol.
Message types. Application messages that classify categories of data. For example, ORDERS and MATMAS (Material Master).
IDoc types. Data structure associated with the message type. For example, MATMAS01, MATMAS02 for MATMAS. IDocs contain the data belonging to the message type.

IDocs contain three record types:

Control record. Identifies the message type.
Data records. Contain the IDoc data in segments.
Status records. Describe the status of the IDoc. Status record names are the same for each IDoc type.

Data Integration Using BAPI/RFC Functions

Business Application Programming Interfaces (BAPI) provide a way for third-party applications to integrate data with SAP. You use BAPIs to create, change, delete, list, and detail objects in SAP.

The Business Application Programming Interfaces allow object-oriented access to the SAP system through methods for the business object types. Together with the business object types, BAPIs define and document the interface standard at the business level.

BAPIs also provide an object-oriented view of business components in SAP. You define BAPIs in the SAP Business Objects Repository. You implement and store them as Remote Function Call (RFC) enabled function modules in the Function Builder of the ABAP Workbench. You can call BAPIs as an ABAP program within SAP. You use RFCs to call BAPIs outside of SAP.

Use a BAPI/RFC transformation to create, change, or delete data in mySAP applications. When you run a session with a BAPI/RFC transformation, the PowerCenter Integration Service makes the RFC function calls to SAP to process SAP data.

You can use a BAPI/RFC transformation for one of the following reasons:

Migrate data to SAP. For example, your organization uses PeopleSoft applications for enterprise purchasing management. You want to migrate to mySAP applications to manage enterprise purchasing. Use PowerExchange for PeopleSoft to extract data from PeopleSoft and a BAPI/RFC transformation to write purchase management data to mySAP applications.
Synchronize data in SAP. For example, a mySAP application contains customer orders. You want to add line items to some of the orders. You can use a BAPI/RFC transformation to make a BAPI/RFC call to add the line items to the orders in SAP.

Data Migration Integration

You can migrate data from legacy applications, from other ERP systems, or data from any number of other sources and prepare it for input into mySAP applications. The PowerCenter Integration Service extracts the data from the data source and prepares the data in an SAP format flat file that you can load into mySAP applications.

Business Content Integration

You can integrate PowerCenter with mySAP applications to provide an efficient, high-volume data warehousing solution. SAP business content is a collection of metadata objects that can be integrated with other applications and used for analysis and reporting. SAP produces the business content data, and PowerCenter consumes it. PowerCenter can consume all or changed business content data from mySAP applications and write it to a target data warehouse. You can then use the data warehouse to meet analysis and reporting needs.

PowerCenter and SAP NetWeaver BI Integration Methods

PowerExchange for SAP NetWeaver BI can perform the following tasks:

Extracting data from SAP BI
Loading data into SAP BI

PowerExchange for SAP NetWeaver BI interacts with InfoCubes and InfoSources. An InfoCube is a self-contained dataset created with data from one or more InfoSources. An InfoSource is a collection of data that logically belongs together, summarized into a single unit.

Communication Interfaces

TCP/IP is the native communication interface between PowerCenter and SAP NetWeaver. PowerCenter and SAP NetWeaver also use the following interfaces:

CPI-C
Remote Function Call (RFC)

Common Program Interface-Communications

CPI-C is the communication protocol that enables online data exchange and data conversion between SAP and PowerCenter. PowerExchange for SAP NetWeaver uses CPI-C to communicate with SAP NetWeaver only when you run ABAP stream mode sessions.

To initialize CPI-C communication with PowerCenter, SAP NetWeaver requires information, such as the host name of the application server and SAP gateway. This information is stored in a configuration file named saprfc.ini on the node where the PowerCenter Integration Service process runs. The PowerCenter Integration Service uses parameters in the saprfc.ini file to connect to SAP NetWeaver when you run ABAP stream mode sessions.

Remote Function Call -RFC

RFC is the remote communications protocol used by SAP NetWeaver. It is based on Remote Procedure Call (RPC). PowerCenter makes remote function calls to communicate with SAP NetWeaver.

To execute remote calls from PowerCenter, SAP NetWeaver requires information, such as the connection type and the service name and gateway on the application server. This information is stored in a configuration file named saprfc.ini on the node hosting the PowerCenter Client and the node where the PowerCenter Integration Service and SAP BW Service processes run.

Transport System

The transport system is a set of ABAP programs installed on the SAP system. The ABAP programs import SAP metadata into the PowerCenter repository. The ABAP programs also enable run-time functionalities, such as passing mapping variables and filters.

You use the transport system in the following situations:

Configuring PowerExchange for SAP NetWeaver.

You need to transport some customized objects that were developed by Informatica to the SAP system. These objects include tables, structures, programs, and functions. PowerCenter calls these custom objects when you make a request to the SAP system.

Deploying run-time transports and ABAP programs from development to production.

To use ABAP to integrate with mySAP applications, deploy the run-time transports provided by Informatica and the ABAP programs installed by the Designer to extract data when you move from development to production.

The SAP system administrator must install and configure transports on the SAP system.

Operating System Profiles for SAP Sessions

If the PowerCenter Integration Service uses operating system profiles, the operating system user specified in the operating system profile must have access to run time files.

Use the following rules and guidelines if the PowerCenter Integration Service uses operating system profiles for an SAP session:

The operating system profile must have permissions to access the \server\bin directory to read saprfc.ini and create the RFC trace file.
If the session writes to SAP NetWeaver BI and there is a filter condition in the InfoPackage, the operating system profile must have permissions to access the BWParam directory.

Next will be Installing and Configuring PowerExchange for SAP NetWeaver


 

PowerExchange for Salesforce

PowerExchange for Salesforce integrates PowerCenter with Salesforce to extract data from Salesforce sources and write data to Salesforce targets. Salesforce sources and targets represent objects in the Salesforce object model. Salesforce objects are tables that correspond to tabs and other user interface elements on the Salesforce web site. For example, the Account object contains the information that appears in fields on the Salesforce Account tab. You can view, create, update, and delete data in Salesforce objects.

PowerExchange for Salesforce uses the Salesforce API to apply existing access policies, and it does not impose additional data access restrictions.

PowerExchange for Salesforce provides the following benefits:

* Data integration and migration. Integrate data from diverse data sources, including other applications, with Salesforce data. Also, use PowerCenter to migrate data from various data sources to Salesforce.

* Data analysis. Use the built-in data quality capability of PowerCenter to analyze and evaluate Salesforce data.

* Access to Salesforce functionality through PowerCenter. Gain full access to Salesforce objects as sources and targets in PowerCenter.

PowerCenter and Salesforce Integration: PowerExchange for Salesforce uses Secure Sockets Layer (SSL) to encrypt all data that crosses the Internet. It uses 128-bit SSL encryption. PowerExchange for Salesforce also uses 128-bit encryption before it stores user IDs, passwords, and session IDs in the repository or in temporary storage locations.

Designer and Salesforce Integration: You use the Designer to import Salesforce objects as Salesforce source or target definitions or as Salesforce Lookup, Merge, or Picklist transformations. You can then use those objects in mappings for seamless integration between PowerCenter and Salesforce. You can import both standard and custom Salesforce objects. Standard object types are objects packaged within Salesforce, such as Account, AccountPartner, Event, Opportunity, and Product. Custom object types extend the Salesforce data for an organization by defining data entities that are unique to the organization.

PowerCenter Integration Service and Salesforce Integration : The PowerCenter Integration Service connects to Salesforce to extract, transform, and load Salesforce data. The PowerCenter Integration Service uses a Salesforce login call to authenticate with the Salesforce service. You can specify the login server to use either a production environment or a special testing and development environment called the Salesforce Sandbox.

A connection object stores the Salesforce user ID, password, and end point URL information for the run-time connection. Each Salesforce source or target in a mapping references a Salesforce application connection object. You can use multiple Salesforce application connections in a mapping to access different sets of Salesforce data for the sources and targets.

The PowerCenter Integration Service uses the Salesforce security mechanism to authenticate users and manage sessions. The Salesforce API performs user ID and password authentication at the initiation of a run-time session.

At session run time, the PowerCenter Integration Service generates an SOQL query based on the Salesforce source definitions and field projections in the mapping. The SOQL language is a derivative of SQL. The Salesforce API performs SOQL syntax validation at run time.
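
For example, if the mapping projects only the Id and Name fields of the Account object, the generated query would be along these lines (illustrative only; the actual statement depends on the source definition, the field projections, and any filter you configure):

    SELECT Id, Name FROM Account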

Code Pages : Salesforce processes UTF-8 characters. The PowerCenter Integration Service handles Salesforce data based on the following data movement modes:

* ASCII. When the PowerCenter Integration Service runs in ASCII mode, it does not perform any code page validation or any data conversion. You might get inconsistent or truncated data if the PowerCenter Integration Service runs in ASCII mode but processes non-ASCII character data when writing to a Salesforce target.

* Unicode. When the PowerCenter Integration Service runs in Unicode mode, it converts data from the source character set to UCS-2, processes the data, and then converts the UCS-2 data to the target code page character set before loading the data. If a session writes to a Salesforce target, the PowerCenter Integration Service converts the UCS-2 data to the UTF-8 character set.

Installation and Configuration: This chapter provides information about upgrading, installing, and configuring PowerExchange for Salesforce.

Prerequisites: Before you upgrade or install PowerExchange for Salesforce, install or upgrade PowerCenter on the machine that runs the Integration Service.

Installing and Configuring PowerExchange for Salesforce: To install and configure PowerExchange for Salesforce, complete the following steps:

1.  Register the plug-in.

2.  Configure the HTTP proxy options. Configure the HTTP proxy options for the PowerCenter Client and the PowerCenter Integration Service.

After you install and configure PowerExchange for Salesforce, you can create connections to access Salesforce. Create connection objects in the Workflow Manager so the PowerCenter Integration Service can connect to Salesforce.

Upgrading PowerExchange for Salesforce: When you upgrade PowerExchange for Salesforce, complete the following tasks:

1.  Upgrade the plug-in registration.

2.  Complete post-upgrade tasks.

Plug-in Registration: After you install PowerExchange for Salesforce, register the plug-in with the repository. If you are upgrading from a previous version, update the plug-in registration when you register the plug-in.

A plug-in is an XML file that defines the functionality of PowerExchange for Salesforce. To register the plug-in, the repository must be running in exclusive mode. Use the Informatica Administrator or the pmrep RegisterPlugin command to register the plug-in.

The plug-in file for PowerExchange for Salesforce is pmsfdc.xml. When you install the Repository component, the installer copies pmsfdc.xml to the following directory:

<PowerCenter Installation Directory>/server/bin/native
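
For the pmrep route, a hedged sketch looks like the following; the repository, domain, and user names are placeholders, and you should confirm the exact options against the Command Reference for your release:

    pmrep connect -r MyRepository -d MyDomain -n Administrator -x MyPassword
    pmrep registerplugin -i "<PowerCenter Installation Directory>/server/bin/native/pmsfdc.xml"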

Note: If you do not have the correct privileges to register the plug-in, contact the user who manages the PowerCenter Repository Service.


Creating a Folder in the PowerCenter Repository

Folders provide a way to organize and store all metadata in the repository, including mappings, schemas, and sessions. Folders are designed to be flexible to help you organize the repository logically. Each folder has a set of properties you can configure to define how users access the folder. For example, you can create a folder that allows all users to see objects within the folder, but not to edit them.

Folder Permissions:

Permissions allow users to perform tasks within a folder. With folder permissions, you can control user access to the folder and the tasks you permit them to perform. Folder permissions work closely with privileges.

Privileges grant access to specific tasks, while permissions grant access to specific folders with read, write, and execute access.

Folders have the following types of permissions:

* Read permission. You can view the folder and objects in the folder.

* Write permission. You can create or edit objects in the folder.

* Execute permission. You can run or schedule workflows in the folder.

When you create a folder, you are the owner of the folder. The folder owner has all permissions on the folder, which cannot be changed.

Connecting to the Repository

To complete this tutorial, you need to connect to the PowerCenter repository. To connect to the repository:

1. Launch the PowerCenter Repository Manager.

2. Click Repository >> Add Repository.

The Add Repository dialog box appears.

3. Enter the repository and user name. Use the name of the repository in “PowerCenter Repository and User Account.” Use the name of the user account you created in “Creating a User.”

4. Click OK. The repository appears in the Navigator.

5. Click Repository >> Connect or double-click the repository to connect.

The Connect to Repository dialog box appears.

6. In the connection settings section, click Add to add the domain connection information. The Add Domain dialog box appears.

7. Enter the domain name, gateway host, and gateway port number from “Domain”.

8. Click OK. If a message indicates that the domain already exists, click Yes to replace the existing domain.

9. In the Connect to Repository dialog box, enter the password for the Administrator user.

10. Select the Native security domain.

11. Click Connect.

Creating a Folder:

You create a folder where you will define the data sources and targets, build mappings, and run workflows in later lessons. To create a new folder:

1. In the Repository Manager, click Folder >> Create.

2. Enter your name prefixed by Tutorial_ as the name of the folder. By default, the user account logged in is the owner of the folder and has full permissions on the folder.

3. Click OK. The Repository Manager displays a message that the folder has been successfully created.

4. Click OK.

The new folder appears as part of the repository.

5. Exit the Repository Manager.

That’s it, a “piece of cake,” right?

 


 

Creating Source Definitions

Now that you have added the source tables containing sample data, you are ready to create the source definitions in the repository. The repository contains a description of source tables, not the actual data contained in them. After you add these source definitions to the repository, you use them in a mapping. To import the sample source definitions:

1. In the Designer, click Tools >> Source Analyzer to open the Source Analyzer.

2. Double-click the tutorial folder to view its contents.

Every folder contains nodes for sources, targets, schemas, mappings, mapplets, cubes, dimensions and reusable transformations.

3. Click Sources > Import from Database.

4. Select the ODBC data source to access the database containing the source tables.

5. Enter the user name and password to connect to this database. Also, enter the name of the source table owner, if necessary.

Use the database connection information you entered in “PowerCenter Source and Target.”

In Oracle, the owner name is the same as the user name. Make sure that the owner name is in all caps. For example, JDOE.

6. Click Connect.

7. In the Select tables list, expand the database owner and the TABLES heading. If you click the All button, you can see all tables in the source database. A list of all the tables you created by running the SQL script appears in addition to any tables already in the database.

8. Select the following tables:

* CUSTOMERS

* DEPARTMENT

* DISTRIBUTORS

* EMPLOYEES

* ITEMS

* ITEMS_IN_PROMOTIONS

* JOBS

* MANUFACTURERS

* ORDERS

* ORDER_ITEMS

* PROMOTIONS

* STORES

Hold down the Ctrl key to select multiple tables. Or, hold down the Shift key to select a block of tables. You may need to scroll down the list of tables to select all tables.

Note: Database objects created in Informix databases have shorter names than those created in other types of databases. For example, the name of the table ITEMS_IN_PROMOTIONS is shortened to ITEMS_IN_PROMO.

9. Click OK to import the source definitions into the repository. The Designer displays the newly imported sources in the workspace. You can click Layout >> Scale to Fit to fit all the definitions in the workspace.

A database definition (DBD) node also appears under the Sources node in the tutorial folder. This new entry has the same name as the ODBC data source used to access the sources you just imported. If you double-click the DBD node, the list of all the imported sources appears.
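Before moving on, it can help to confirm the owner name and that the tutorial tables actually exist in the source schema. Below is a minimal sketch of such a check, assuming an Oracle source database and that the tables were created under your own user account; the queries are illustrative and not part of the tutorial itself.

-- Hypothetical verification queries (Oracle); adjust for other database types.
-- The owner name used for the import is the Oracle user name in upper case.
SELECT USER FROM DUAL;

-- List the tutorial tables created by the SQL script in the current schema.
SELECT table_name
FROM   user_tables
WHERE  table_name IN ('CUSTOMERS', 'DEPARTMENT', 'DISTRIBUTORS', 'EMPLOYEES',
                      'ITEMS', 'ITEMS_IN_PROMOTIONS', 'JOBS', 'MANUFACTURERS',
                      'ORDERS', 'ORDER_ITEMS', 'PROMOTIONS', 'STORES')
ORDER  BY table_name;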

Viewing Source Definitions

You can view details for each source definition.

To view a source definition:

1. Double-click the title bar of the source definition for the EMPLOYEES table to open the EMPLOYEES source definition.

The Edit Tables dialog box appears and displays all the properties of this source definition. The Table tab shows the name of the table, business name, owner name, and the database type. You can add a comment in the Description section.

2. Click the Columns tab. The Columns tab displays the column descriptions for the source table.

The source definition must match the structure of the source table. Therefore, you must not modify source column definitions after you import them. (A SQL sketch for double-checking the source table structure against the imported columns follows these steps.)

3. Click the Metadata Extensions tab. Metadata extensions allow you to extend the metadata stored in the repository by associating information with individual repository objects. For example, you can store contact information, such as name or email address, with the sources you create.

You create user-defined metadata extensions that define the date you created the source definition and the name of the person who created the source definition.

4. Click the Add button to add a metadata extension.

5. Name the new row SourceCreationDate and enter today’s date as the value.

6. Click the Add button to add another metadata extension and name it SourceCreator.

7. Enter your first name as the value in the SourceCreator row.

8. Click Apply.

9. Click OK to close the dialog box.

10. Click Repository >> Save to save the changes to the repository.
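As a cross-check on the Columns tab you just viewed, you can query the database directly and compare the result with the imported definition. The sketch below is a minimal example, assuming an Oracle source database and that EMPLOYEES belongs to the connected user’s schema; it is not part of the tutorial itself.

-- Compare this output with the Columns tab of the EMPLOYEES source definition;
-- the imported definition should match the table structure exactly.
SELECT column_name,
       data_type,
       data_length,
       nullable
FROM   user_tab_columns
WHERE  table_name = 'EMPLOYEES'
ORDER  BY column_id;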

Creating Target Definitions and Target Tables

You can import target definitions from existing target tables, or you can create the definitions and then generate and run the SQL to create the target tables. In this lesson, you create a target definition in the Target Designer, and then create a target table based on the definition.

Creating Target Definitions:

The next step is to create the metadata for the target tables in the repository. The actual tables that the target definitions describe do not exist yet.

Target definitions define the structure of tables in the target database, or the structure of file targets the Integration Service creates when you run a session. If you add a relational target definition to the repository for a table that does not exist in a database, you need to create the target table. You do this by generating and executing the necessary SQL code within the Target Designer; a sketch of what that generated DDL might look like follows the steps below.

In the following steps, you copy the EMPLOYEES source definition into the Target Designer to create the target definition. Then, you modify the target definition by deleting and adding columns to create the definition you want.

To create the T_EMPLOYEES target definition:

1. In the Designer, click Tools >> Target Designer to open the Target Designer.

2. Drag the EMPLOYEES source definition from the Navigator to the Target Designer workspace. The Designer creates a new target definition, EMPLOYEES, with the same column definitions as the EMPLOYEES source definition and the same database type. Next, modify the target column definitions.

3. Double-click the EMPLOYEES target definition to open it.

4. Click Rename and name the target definition T_EMPLOYEES.

Note: If you need to change the database type for the target definition, you can select the correct database type when you edit the target definition.

5. Click the Columns tab. The target column definitions are the same as the EMPLOYEES source definition.

(The Columns tab includes an Add button and a Delete button.)

6. Select the JOB_ID column and click the delete button.

7. Delete the following columns:

* ADDRESS1

* CITY

* STATE

* HOME_PHONE

When you finish, the target definition contains only the remaining EMPLOYEES columns. Note that the EMPLOYEE_ID column is a primary key. The primary key cannot accept null values, so the Designer selects Not Null and disables the Not Null option. You now have a column ready to receive data from the EMPLOYEE_ID column in the EMPLOYEES source table. Note: If you want to add a business name for any column, scroll to the right and enter it.

8. Click OK to save the changes and close the dialog box.

9. Click Repository >> Save.
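As mentioned earlier, the Target Designer can generate and execute the DDL that creates this target table for you. The sketch below shows roughly what such DDL could look like for T_EMPLOYEES; the column names (other than EMPLOYEE_ID) and the data types are assumptions for illustration, not the exact SQL the Target Designer generates.

-- Hypothetical DDL sketch for the T_EMPLOYEES target table (Oracle syntax).
-- Column names other than EMPLOYEE_ID and all data types are assumed.
CREATE TABLE T_EMPLOYEES (
    EMPLOYEE_ID   NUMBER        NOT NULL,  -- primary key, cannot accept nulls
    LAST_NAME     VARCHAR2(40),
    FIRST_NAME    VARCHAR2(40),
    OFFICE_PHONE  VARCHAR2(24),
    PRIMARY KEY (EMPLOYEE_ID)
);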

That’s it. Any further questions or comments, negative or positive, are welcome here.

Mehboob MCTS & MCITP

Informatica Administrator:

Informatica Administrator is a web application that you use to administer the PowerCenter domain and PowerCenter security. You can also administer application services for the Informatica Analyst and Informatica Developer. Application services for Informatica Analyst and Informatica Developer include the Analyst Service, the Model Repository Service, and the Data Integration Service.

Domain Page

Administer the Informatica domain on the Domain page of the Administrator tool. Domain objects include services, nodes, and licenses.

You can complete the following tasks in the Domain page:

* Manage application services. Manage all application services in the domain, such as the Integration Service and Repository Service.

* Configure nodes. Configure node properties, such as the backup directory and resources. You can also shut down and restart nodes.

* Manage domain objects. Create and manage objects such as services, nodes, licenses, and folders. Folders allow you to organize domain objects and manage security by setting permissions for domain objects.

* View and edit domain object properties. View and edit properties for all objects in the domain, including the domain object.

* View log events. Use the Log Viewer to view domain, PowerCenter Integration Service, SAP BW Service, Web Services Hub, and PowerCenter Repository Service log events.

* Generate and upload node diagnostics. You can generate and upload node diagnostics to the Configuration Support Manager. In the Configuration Support Manager, you can diagnose issues in your Informatica environment and maintain details of your configuration.

Other domain management tasks include applying licenses and managing grids and resources.

Security Tab

You administer PowerCenter security on the Security tab of Informatica Administrator. You manage users and groups that can log in to the following PowerCenter applications:

* Administrator tool

* PowerCenter Client

* Metadata Manager

* Data Analyzer

You can also manage users and groups for the Informatica Developer and Informatica Analyst.

You can complete the following tasks in the Security page:

* Manage native users and groups. Create, edit, and delete native users and groups.

* Configure LDAP authentication and import LDAP users and groups. Configure a connection to an LDAP directory service. Import users and groups from the LDAP directory service.

* Manage roles. Create, edit, and delete roles. Roles are collections of privileges. Privileges determine the actions that users can perform in PowerCenter applications.

* Assign roles and privileges to users and groups. Assign roles and privileges to users and groups for the domain, PowerCenter Repository Service, Metadata Manager Service, or Reporting Service.

* Manage operating system profiles. Create, edit, and delete operating system profiles. An operating system profile is a level of security that the Integration Service uses to run workflows. The operating system profile contains the operating system user name, service process variables, and environment variables. You can configure the Integration Service to use operating system profiles to run workflows.

Domain Configuration

The Service Manager maintains configuration information for an Informatica domain in relational database tables. The configuration is accessible to all gateway nodes in the domain. The domain configuration database stores the following types of information about the domain:

* Domain configuration. Domain metadata such as the host names and the port numbers of nodes in the domain. The domain configuration database also stores information on the master gateway node and all other nodes in the domain.

* Usage. Includes CPU usage for each application service and the number of Repository Services running in the domain.

* Users and groups. Information on the native and LDAP users and the relationships between users and groups.

* Privileges and roles. Information on the privileges and roles assigned to users and groups in the domain.

Each time you make a change to the domain, the Service Manager updates the domain configuration database. For example, when you add a node to the domain, the Service Manager adds the node information to the domain configuration. All gateway nodes connect to the domain configuration database to retrieve the domain information and to update the domain configuration.

PowerCenter Client: The PowerCenter Client application consists of the tools to manage the repository and to design mappings, mapplets, and sessions to load the data. The PowerCenter Client application has the following tools:

* Designer. Use the Designer to create mappings that contain transformation instructions for the Integration Service.

* Mapping Architect for Visio. Use the Mapping Architect for Visio to create mapping templates that generate multiple mappings.

* Repository Manager. Use the Repository Manager to assign permissions to users and groups and manage folders.

* Workflow Manager. Use the Workflow Manager to create, schedule, and run workflows. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data.

* Workflow Monitor. Use the Workflow Monitor to monitor scheduled and running workflows for each Integration Service.

Install the client application on a Microsoft Windows computer.

PowerCenter Designer: The Designer has the following tools that you use to analyze sources, design target schemas, and build source-to-target mappings:

* Source Analyzer. Import or create source definitions.

* Target Designer. Import or create target definitions.

* Transformation Developer. Develop transformations to use in mappings. You can also develop user-defined functions to use in expressions.

* Mapplet Designer. Create sets of transformations to use in mappings.

* Mapping Designer. Create mappings that the Integration Service uses to extract, transform, and load data.

You can display the following windows in the Designer:

* Navigator. Connect to repositories and open folders within the Navigator. You can also copy objects and create shortcuts within the Navigator.

* Workspace. Open different tools in this window to create and edit repository objects, such as sources, targets, mapplets, transformations, and mappings.

* Output. View details about tasks you perform, such as saving your work or validating a mapping.

More to come. Until then, leave your comments; both negative and positive feedback will be great.

Mehboob

MCTS & MCITP

