چه کسانی این کتاب را می‌خوانند

دانشجوعلاقه‌مند یادگیری
کتابخوان حرفه‌ایلذت مطالعه
نویسندهالهام‌گیری

Pentaho 3. 2 Data Integration : Explore, transform, validate, and integrate your data with ease: Beginner's Guide

María Carina Roldán

قیمت نهایی

۴۰٬۰۰۰ تومان۴۹٬۰۰۰ تومان۱۸٪ تخفیف
  • تخفیف زمان‌دار−۹٬۰۰۰ تومان

۹٬۰۰۰ تومان صرفه‌جویی نسبت به قیمت اصلی

بلافاصله پس از خرید، فایل کتاب روی دستگاه شما آمادهٔ دانلود است.

تحویل فوری
پرداخت امن
ضمانت فایل
پشتیبانی

نسخه اصلی و اورجینال

فایل دیجیتال کامل و بدون دستکاری — همان نسخه‌ای که پس از خرید دریافت می‌کنید.

مشخصات کتاب

نویسنده
María Carina Roldán
سال انتشار
۲۰۱۰
فرمت
PDF
زبان
انگلیسی
تعداد صفحات
۵ صفحه
حجم فایل
۱۰ مگابایت
شابک
9781847199546، 9781847199553، 1847199542، 1847199550

دربارهٔ کتاب

Pentaho Data Integration (a.k.a. Kettle) is a full-featured open source ETL (Extract, Transform, and Load) solution. Although PDI is a feature-rich tool, effectively capturing, manipulating, cleansing, transferring, and loading data can get complicated.This book is full of practical examples that will help you to take advantage of Pentaho Data Integration’s graphical, drag-and-drop design environment. You will quickly get started with Pentaho Data Integration by following the step-by-step guidance in this book. The useful tips in this book will encourage you to exploit powerful features of Pentaho Data Integration and perform ETL operations with ease.Starting with the installation of the PDI software, this book will teach you all the key PDI concepts. Each chapter introduces new features, allowing you to gradually get involved with the tool. First, you will learn to work with plain files, and to do all kinds of data manipulation. Then, the book gives you a primer on databases and teaches you how to work with databases inside PDI. Not only that, you’ll be given an introduction to data warehouse concepts and you will learn to load data in a data warehouse. After that, you will learn to implement simple and complex processes.Once you’ve learned all the basics, you will build a simple datamart that will serve to reinforce all the concepts learned through the book.Explore, transform, validate, and integrate your data with easeGet started with Pentaho Data Integration from scratch.Enrich your data transformation operations by embedding Java and j#097;vascript code in PDI transformations.Create a simple but complete Datamart Project that will cover all key features of PDI.Part of Packt’s Beginner’s Guide series: Each task gradually develops from the previous task with an organized sequence of instructions accompanied with relevant explanation and a set of challenging tasks that the reader will be able to accomplish.What you will learn from this bookInstall Penataho Data Integration and get familiar with the graphical designer–SpoonWork with files to get data from simple input sources, preview, and send it back in any of the common output formatsPerform basic transformation operations on data such as performing calculations, adding constants, counting, filtering, ordering, and searching dataSolve real-world problems by manipulating the flow of data by combining or splitting itSolve sophisticated problems such as normalizing data from pivoted tables with easeExplore the various PDI options to validate data and to handle errorsConnect to a database engine of your choice to get and store information coming from any sourcePerform advanced operations with databases such as loading data warehouses dimensionsCreate advanced processes such as nesting jobs, iterating on jobs and transformations, and creating subtransformationsImplement business processes by scheduling tasks, checking conditions, organizing files and folders, sending emails with alerts, and treating errors in a way that meets your requirements Cover......Page 1 Copyright......Page 3 Credits......Page 4 Foreword......Page 5 The Kettle Project......Page 7 About the Author......Page 10 About the Reviewers......Page 11 Table of Contents......Page 14 Preface......Page 24 Pentaho Data Integration and Pentaho BI Suite......Page 30 Pentaho Data Integration......Page 32 Loading data warehouses or data marts......Page 34 Data cleansing......Page 35 Integrating PDI using Pentaho BI......Page 36 Time for action – installing PDI......Page 37 Time for action – starting and customizing Spoon......Page 38 Setting preferences in the Options window......Page 41 Storing transformations and jobs in a repository......Page 42 Time for action – creating a hello world transformation......Page 43 Directing the Kettle engine with transformations......Page 48 Exploring the Spoon interface......Page 49 transformation......Page 50 Time for action – installing MySQL on Windows......Page 52 Time for action – installing MySQL on Ubuntu......Page 55 Summary......Page 57 Reading data from files......Page 58 Time for action – reading results of football matches from files......Page 59 Input steps......Page 64 Text file input step......Page 65 Text file input step and regular expressions......Page 66 Regular expressions......Page 67 Grids......Page 69 Time for action – sending the results of matches to a plain file......Page 70 Output files......Page 72 Rowset......Page 73 Streams......Page 74 Getting system information......Page 75 Time for action – updating a file with news about examinations......Page 76 Getting information by using Get System Info step......Page 80 Date fields......Page 81 Numeric fields......Page 82 a terminal window......Page 83 about countries......Page 85 What is XML......Page 90 XPath......Page 91 Configuring the Get data from XML step......Page 92 How and when you can use variables......Page 93 Summary......Page 95 Basic calculations......Page 96 Calculator step......Page 97 Adding or modifying fields by using different PDI steps......Page 105 The Calculator step......Page 106 Formula step......Page 107 Calculations on groups of rows......Page 111 grouping data......Page 112 Group by step......Page 117 Time for action – counting frequent words by filtering......Page 120 Filtering rows using the Filter rows step......Page 126 Time for action – finding out which language people speak......Page 128 The Stream lookup step......Page 132 Summary......Page 135 Splitting streams......Page 136 a dataset......Page 137 Copying rows......Page 142 Distributing rows......Page 143 Time for action – assigning tasks by distributing......Page 144 Splitting the stream based on conditions......Page 148 Filter rows step......Page 149 PDI steps for splitting the stream based on conditions......Page 151 Switch/ Case step......Page 152 Merging streams......Page 154 Time for action – gathering progress and merging all together......Page 155 PDI options for merging streams......Page 157 Append Stream......Page 160 Summary......Page 162 Doing simple tasks with the JavaScript step......Page 164 Time for action – calculating scores with JavaScript......Page 165 Using the JavaScript language in PDI......Page 170 Inserting JavaScript code using the Modified Java Script Value step......Page 171 Modifying fields......Page 173 Testing your code......Page 174 Time for action – testing the calculation of averages......Page 175 Testing the script using the Test script button......Page 176 Time for action – calculating flexible scores by using variables......Page 177 Using named parameters......Page 181 Using transformation predefined constants......Page 182 JavaScript......Page 185 Looking at previous rows......Page 187 Avoiding coding by using purpose-built steps......Page 188 Summary......Page 190 Converting rows to columns......Page 192 rows to columns......Page 193 Converting row data to column data by using the Row denormalizer step......Page 196 Aggregating data with a Row denormalizer step......Page 199 by country......Page 200 Using Row denormalizer for aggregating data......Page 201 the dataset......Page 203 Modifying the dataset with a Row Normalizer step......Page 205 Summarizing the PDI steps that operate on sets of rows......Page 207 Generating a custom time dimension dataset by using Kettle variables......Page 209 Time for action – creating the time dimension dataset......Page 210 Getting variables......Page 214 starting date......Page 215 Using the Get Variables step......Page 216 Summary......Page 217 Capturing errors......Page 218 of a film......Page 219 Using PDI error handling functionality......Page 223 Aborting a transformation......Page 224 Time for action – aborting when there are too many errors......Page 225 Time for action – treating errors that may appear......Page 226 Treating rows coming to the error stream......Page 228 Time for action – validating genres with a Regex Evaluation step......Page 229 Validating data......Page 231 Time for action – checking films file with the Data Validator......Page 232 Defining simple validation rules using the Data Validator......Page 234 Cleansing data......Page 236 Summary......Page 238 Introducing the Steel Wheels sample database......Page 240 database......Page 242 Connecting with Relational Database Management Systems......Page 245 Exploring the Steel Wheels database......Page 246 Time for action – exploring the sample database......Page 247 A brief word about SQL......Page 248 Exploring any configured database with the PDI Database explorer......Page 251 Time for action – getting data about shipped orders......Page 252 Getting data from the database with the Table input step......Page 254 Using the SELECT statement for generating a new dataset......Page 255 parameters......Page 257 Making flexible queries by using Kettle variables......Page 259 variables......Page 260 Time for action – loading a table with a list of manufacturers......Page 262 Inserting new data into a database table with the Table output step......Page 268 existent ones......Page 269 Time for action – testing the update of existing products......Page 272 Inserting or updating data with the Insert/Update step......Page 274 Time for action – deleting data about discontinued items......Page 279 Deleting records of a database table with the Delete step......Page 282 Summary......Page 283 Time for action – populating the Jigsaw database......Page 284 Exploring the Jigsaw database model......Page 287 of products to buy......Page 289 Looking up values in a database with the Database lookup step......Page 291 suggested products to buy......Page 293 Joining data from the database to the stream data by using a Database join step......Page 295 Introducing dimensional modeling......Page 298 Combination lookup/update step......Page 299 region dimension......Page 302 Describing data with dimensions......Page 304 Loading Type I SCD with a Combination lookup/update step......Page 305 Dimension lookup/update step......Page 309 of product changes......Page 311 Keeping an entire history of data with a Type II slowly changing dimension......Page 312 Loading Type II SCDs with the Dimension lookup/update step......Page 314 Summary......Page 319 Introducing PDI jobs......Page 320 Time for action – creating a simple hello world job......Page 321 Executing processes with PDI jobs......Page 328 Using Spoon to design and run jobs......Page 329 Using the transformation job entry......Page 330 arguments and parameters......Page 332 Running jobs from a terminal window......Page 335 window......Page 336 Using named parameters and command-line arguments in transformations......Page 337 fixed arguments and parameters......Page 338 Deciding between the use of a command-line argument and a named parameter......Page 340 administrator if something is wrong......Page 341 Changing the flow of execution on the basis of conditions......Page 347 Creating and using a file results list......Page 349 Summary......Page 350 Enhancing your processes with the use of variables......Page 352 by setting a variable with the name of the file......Page 353 Setting variables inside a transformation......Page 358 Time for action – generating files with top scores......Page 360 subtransformation......Page 364 Creating and using subtransformations......Page 368 copying and getting rows......Page 371 Transferring data between transformations by using the copy /get rows mechanism......Page 375 nesting jobs......Page 377 Running a job inside another job with a job entry......Page 378 Understanding the scope of variables......Page 379 Iterating jobs and transformations......Page 380 transformation for every input row......Page 381 Executing for each row......Page 384 Summary......Page 389 Exploring the sales datamart......Page 390 Loading the dimensions......Page 393 Time for action – loading dimensions for the sales datamart......Page 394 Extending the sales datamart model......Page 399 dimensions......Page 401 Getting the information from the source with SQL queries......Page 407 Obtaining the surrogate key for a Type I SCD......Page 411 Obtaining the surrogate key for a Type II SCD......Page 412 Obtaining the surrogate key for the Time dimension......Page 414 obtained from the command line......Page 417 Time for action – loading the sales star......Page 419 Time for action – automating the loading of the sales datamart......Page 422 Summary......Page 426 PDI best practices......Page 428 Extending Kettle with plugins......Page 431 Overcoming real world risks with some remote execution......Page 433 Scaling out to overcome bigger risks......Page 434 PDI as a process action......Page 435 PDI as a datasource......Page 436 More about the Pentaho suite......Page 437 PDI Enterprise Edition and Kettle Developer Support......Page 438 Summary......Page 439 Appendix A: Working with Repositories......Page 440 Time for action – creating a PDI repository......Page 441 Creating repositories to store your transformations and jobs......Page 443 Time for action – logging into a repository......Page 444 Defining repository user accounts......Page 445 Creating transformations and jobs in repository folders......Page 446 Examining and modifying the contents of a repository with the Repository explorer......Page 447 Migrating from a file-based system to a repository-based system and vice-versa......Page 449 Summary......Page 450 Running transformations and jobs stored in files......Page 452 Running transformations and jobs from a repository......Page 453 Specifying command line options......Page 454 Providing options when running Pan and Kitchen......Page 455 Variables......Page 456 Appendix C: Quick Reference: Steps and Job Entries......Page 458 Transformation steps......Page 459 Job entries......Page 463 General shortcuts......Page 466 Designing transformations and jobs......Page 467 Repositories......Page 468 Experiencing the mouse-over assistance......Page 470 Using the mouse-over assistance toolbar......Page 471 Experiencing the job drill-down feature......Page 472 Summary......Page 473 PDI basics......Page 474 splitting a stream......Page 475 PDI error handling......Page 476 loading slowly changing dimensions......Page 477 deciding the scope of variables......Page 478 remote execution and clustering......Page 479 Index......Page 480 Explore, transform, validate, and integrate your data with ease Get started with Pentaho Data Integration from scratch. Enrich your data transformation operations by embedding Java and JavaScript code in PDI transformations. Create a simple but complete Datamart Project that will cover all key features of PDI. Part of Packt's Beginner's Guide series: Each task gradually develops from the previous task with an organized sequence of instructions accompanied with relevant explanation and a set of challenging tasks that the reader will be able to accomplish. In Detail Pentaho Data Integration (a.k.a. Kettle) is a full-featured open source ETL (Extract, Transform, and Load) solution. Although PDI is a feature-rich tool, effectively capturing, manipulating, cleansing, transferring, and loading data can get complicated. This book is full of practical examples that will help you to take advantage of Pentaho Data Integration's graphical, drag-and-drop design environment. You will quickly get started with Pentaho Data Integration by following the step-by-step guidance in this book. The useful tips in this book will encourage you to exploit powerful features of Pentaho Data Integration and perform ETL operations with ease. Starting with the installation of the PDI software, this book will teach you all the key PDI concepts. Each chapter introduces new features, allowing you to gradually get involved with the tool. First, you will learn to work with plain files, and to do all kinds of data manipulation. Then, the book gives you a primer on databases and teaches you how to work with databases inside PDI. Not only that, you'll be given an introduction to data warehouse concepts and you will learn to load data in a data warehouse. After that, you will learn to implement simple and complex processes. Once you've learned all the basics, you will build a simple datamart that will serve to reinforce all the concepts learned through the book. A practical, easy-to-read guide that gives you full understanding of the Pentaho Data Integration tool and shows you how to use it to your advantage to manipulate data As part of Packt's Beginner's Guide, this book focuses on teaching by example. The book walks you through every aspect of PDI, giving step-by-step instructions in a friendly style, allowing you to learn in front of your computer, playing with the tool. The extensive use of drawings and screenshots make the process of learning PDI easy. Throughout the book numerous tips and helpful hints are provided that you will not find anywhere else. The book provides short, practical examples and also builds from scratch a small datamart intended to reinforce the learned concepts and to teach you the basics of data warehousing. This book is for software developers, database administrators, IT students, and everyone involved or interested in developing ETL solutions, or, more generally, doing any kind of data manipulation. If you have never used PDI before, this will be a perfect book to start with. You will find this book is a good starting point if you are a database administrator, data warehouse designer, architect, or any person who is responsible for data warehouse projects and need to load data into them. You don't need to have any prior data warehouse or database experience to read this book. Fundamental database and data warehouse technical terms and concepts are explained in easy-to-understand language.

قیمت نهایی

۴۰٬۰۰۰ تومان