![]() More information on setting up DBFS can be found in Oracle Database SecureFiles and Large Objects Developer's Guide.īecause ETL can become complex and suffer from poor performance, Oracle Database provides a user interface that enables you to monitor and report on database operations that are part of an ETL plan.Ī database operation is a user-defined logical object that contains a set of related database tasks, for example an ETL processing job, defined by end users or application code. Oracle recommends that you create the DBFS in a separate database from the data warehouse, and that the file system be mounted using the DIRECT_IO option to avoid contention on the system page cache while moving the raw data files in and out of the file system. DBFS is similar to NFS in that it provides a shared network file system that looks like a local file system. DBFS creates a mountable file system which can be used to access files stored in the database as SecureFiles LOBs. It is highly recommended that you stage your raw data across as many physical disks as possible to ensure the reading of the raw data is not a bottleneck during the load.Īn excellent place to stage the data is in an Oracle Database File System (DBFS). ![]() The overall speed of your load is determined by how quickly the raw data can be read from the staging area and written to the target table in the database. Figure 18-2 illustrates the new functionality, which is discussed throughout later sections. The underlying database has to enable the most appropriate ETL process flow for a specific customer need, and not dictate or constrain it from a technical perspective. It is important to understand that the database offers toolkit functionality rather than trying to address a one-size-fits-all solution. Oracle offers a wide variety of new capabilities to address all the issues and tasks relevant in an ETL scenario. The task shifts from serial transform-then-load process (with most of the tasks done outside the database) or load-then-transform process, to an enhanced transform-while-loading. The new functionality renders some of the former necessary process steps obsolete while some others can be remodeled to enhance the data flow and the data transformation to become more scalable and non-interruptive. The ETL process flow can be changed dramatically and the database becomes an integral part of the ETL solution. Doing so may provide better performance than performing each step independently, but it may also introduce difficulties in modifying, adding, or dropping individual transformations, as well as recovering from failed transformations. It may also be possible to combine many simple logical transformations into a single SQL statement or single PL/SQL procedure. ![]() However, a disadvantage to multistaging is that the space and time requirements increase. This load-then-transform strategy also provides a natural checkpointing scheme to the entire transformation process, which enables the process to be more easily monitored and restarted. When using Oracle Database as a transformation engine, a common strategy is to implement each transformation as a separate SQL operation and to create a separate, temporary staging table (such as the tables new_sales_step1 and new_sales_step2 in Figure 18-1) to store the incremental results for each step. This chapter does not seek to illustrate all of the typical transformations that would be encountered in a data warehouse, but to demonstrate the types of fundamental technology that can be applied to implement these transformations and to provide guidance in how to choose the best techniques.įigure 18-1 Multistage Data Transformationĭescription of "Figure 18-1 Multistage Data Transformation" However, the transformation techniques introduced in this chapter meet the majority of real-world data transformation requirements, often with more scalability and less programming than alternative approaches. ![]() Real-world data transformations are often considerably more complex. The examples in this chapter are relatively simple. This chapter introduces techniques for implementing scalable and efficient data transformations within the Oracle Database. Many, if not all, data transformations can occur within an Oracle database, although transformations are often implemented outside of the database (for example, on flat files) as well. They can range from simple data conversions to extremely complex data scrubbing techniques. Data transformations are often the most complex and, in terms of processing time, the most costly part of the extraction, transformation, and loading (ETL) process.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |