A.I. Analytics

About Course

More and more organizations are taking on the challenge of analyzing big data. This course teaches you how to use Hadoop and its ecosystem.
In this course, you'll learn how to use technologies like Hive, Pig, Oozie, and Sqoop alongside Hadoop. You will learn how big data is driving organizational change and the key challenges organizations face when trying to analyze massive data sets. You will also learn fundamental techniques, such as data mining and stream processing.


You will also learn how to design and implement the PageRank algorithm using MapReduce, a programming paradigm that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster. You will learn how big data has improved web search and how online advertising systems work. By the end of this course, you will have a better understanding of the various applications of big data methods in industry and research.
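To make the MapReduce paradigm concrete before the modules begin, here is a toy PageRank iteration written in plain Scala (no Hadoop required). The four-page link graph, the damping factor of 0.85 and the 20 iterations are illustrative assumptions, not course material; the "map" step emits rank contributions per outlink and the "reduce" step sums them per page, which is exactly the split a Hadoop job would make.

// Toy PageRank in the map/reduce style: a sketch, not the course's implementation.
object PageRankSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical link graph: page -> pages it links to.
    val links = Map(
      "a" -> Seq("b", "c"),
      "b" -> Seq("c"),
      "c" -> Seq("a"),
      "d" -> Seq("c"))
    val damping = 0.85
    var ranks = links.keys.map(p => p -> 1.0).toMap
    for (_ <- 1 to 20) {
      // "Map" phase: each page sends an equal share of its rank to every outlink.
      val contributions = links.toSeq.flatMap { case (page, outs) =>
        outs.map(dest => dest -> ranks(page) / outs.size)
      }
      // "Reduce" phase: sum the contributions arriving at each page.
      ranks = links.keys.map { p =>
        val received = contributions.filter(_._1 == p).map(_._2).sum
        p -> ((1 - damping) + damping * received)
      }.toMap
    }
    ranks.toSeq.sortBy(-_._2).foreach { case (p, r) => println(f"$p: $r%.4f") }
  }
}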

01
Introduction To Big Data
  1. What is Big Data
  2. Big Data Opportunities
  3. Big Data Challenges
  4. Characteristics of Big Data
  5. Real-Time Big Data Use Cases
02
Introduction To Hadoop
  1. Hadoop Distributed File System
  2. Comparing Hadoop & SQL
  3. Industries using Hadoop
  4. Data Locality
  5. Hadoop Architecture
  6. Map Reduce & HDFS
  7. Using the Hadoop single node image (Clone)
03
The Hadoop Distributed File System (HDFS): Storage
  1. HDFS Design & Concepts
  2. Blocks, Name nodes and Data nodes
  3. Anatomy of File Write and Read
  4. The Hadoop DFS Command-Line Interface
  5. Basic File System Operations
  6. Multi Node Cluster Setup and its operations
  7. More detailed explanation about Configuration files
  8. Metadata, FS image, Edit log, Secondary Name Node and Safe Mode
  9. FSCK Utility (Block Report)
  10. How to add New Data Node dynamically
  11. HDFS High-Availability and HDFS Federation
  12. How to decommission a Data Node dynamically (Without stopping cluster)
  13. How to override default configuration at system level and Programming level
  14. ZOOKEEPER Leader Election Algorithm
  15. Java API for HDFS Commands (see the sketch after this list)
  16. Exercise and small use case on HDFS
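As a taste of items 5 and 15, here is a minimal sketch of basic file system operations driven through Hadoop's FileSystem API. The course teaches the Java API; the same API is used here from Scala, the language of module 09. The NameNode URI and file names are assumptions for illustration; in practice fs.defaultFS usually comes from core-site.xml on the classpath.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsBasics {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Assumed NameNode address; normally picked up from core-site.xml.
    conf.set("fs.defaultFS", "hdfs://localhost:9000")
    val fs = FileSystem.get(conf)

    val dir = new Path("/user/demo")
    if (!fs.exists(dir)) fs.mkdirs(dir)

    // Copy a local file into HDFS, then list the directory contents.
    fs.copyFromLocalFile(new Path("input.txt"), new Path(dir, "input.txt"))
    fs.listStatus(dir).foreach(s => println(s"${s.getPath}  ${s.getLen} bytes"))

    fs.close()
  }
}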
04
Core Hadoop (Map-Reduce)
  1. Functional Programming Basics
  2. Map and Reduce Basics and its Architecture
  3. Anatomy of a Map Reduce Job Run
  4. Legacy Architecture: Job Submission, Job Initialization and Task Assignment
  5. Task Execution, Progress and Status Updates
  6. Job Completion, Failures, Shuffling and Sorting
  7. Splits, Record reader, Partition, Types of partitions & Combiner
  8. Hands-on “Word Count” in MapReduce in standalone and pseudo-distributed mode (see the sketch after this list)
  9. Types of Schedulers and Counters
  10. Getting the data from RDBMS into HDFS using Custom data types
  11. Distributed Cache and Hadoop Streaming (Python, Ruby and R)
  12. Sequential Files and Map Files
  13. Optimization Techniques: Speculative Execution, JVM Reuse and Number of Slots
  14. Enabling Compression and Compression Codecs
  15. Map side Join with distributed Cache
  16. Secondary Sorting: creating custom data types and comparators
  17. Types of I/O Formats: Multiple Outputs, NLineInputFormat
  18. Handling small files using Combine File Input Format
  19. YARN and Hands on Practical session
  20. Hadoop 1.x vs Hadoop 2.x vs Hadoop 3.x
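Here is a sketch of the module's "Word Count" hands-on as a complete MapReduce job, written in Scala against the standard org.apache.hadoop.mapreduce API (the course does this in Java; the job structure is identical). It assumes Scala 2.13 and the Hadoop client libraries on the classpath, and takes input and output paths as arguments.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.jdk.CollectionConverters._

// Map phase: one (word, 1) pair per token in each input line.
class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val word = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.toLowerCase.split("\\W+").filter(_.nonEmpty).foreach { t =>
      word.set(t); ctx.write(word, one)
    }
}

// Reduce phase: sum the counts for each word; also reusable as a combiner.
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit =
    ctx.write(key, new IntWritable(values.asScala.map(_.get).sum))
}

object WordCount {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "word count")
    job.setJarByClass(classOf[TokenMapper])
    job.setMapperClass(classOf[TokenMapper])
    job.setCombinerClass(classOf[SumReducer]) // pre-aggregate on the map side
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}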
05
NOSQL Basics And HBASE Database
  1. NOSQL
  2. ACID in RDBMS and BASE in NOSQL
  3. CAP Theorem and Types of Consistency
  4. Types of NOSQL Databases in detail, Columnar Databases in detail (HBASE and CASSANDRA), TTL, Bloom Filters and Compactions
  5. HBASE
  6. HBase concepts
  7. HBase Data Model and Comparison between RDBMS and NOSQL
  8. HBase Architecture, Master and Region Servers
  9. Block Cache and sharing
  10. HBase Operations (DDL and DML) through Shell and Java API Programming (see the sketch after this list)
  11. Splits, Catalog Tables
  12. Data Modelling (Sequential, Salted, Promoted and Random Keys)
  13. Client Side Buffering and Process 1 million records using Client side Buffering
  14. HBASE Counters, Enabling Replication and HBASE RAW Scans
  15. HBASE Filters, Bulk Loading and Coprocessors (Endpoints and Observers)
  16. Real world use case consisting of HDFS, MR and HBASE
  17. Hadoop in the Cloud
  18. Introduction to AWS (Amazon Web Services)
  19. Launching a 4-Node Cluster in AWS and EMR
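A minimal sketch of item 10's DML through the standard HBase Java client API, again driven from Scala. The table name 'users', the column family 'info' and the row contents are assumptions; it presumes the table was created beforehand (for example, create 'users', 'info' in the HBase shell) and that hbase-site.xml is on the classpath.

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseDml {
  def main(args: Array[String]): Unit = {
    // Connection settings come from hbase-site.xml on the classpath.
    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("users"))

    // Put: write one cell under row key "u001".
    val put = new Put(Bytes.toBytes("u001"))
    put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Sanjana"))
    table.put(put)

    // Get: read the cell back and print it.
    val result = table.get(new Get(Bytes.toBytes("u001")))
    println(Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))))

    table.close(); conn.close()
  }
}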
06
HIVE With MYSQL
  1. Introduction and Architecture
  2. Hive Services, Hive Shell, Hive Server and Hive Web Interface (HWI)
  3. Metastore, Types of Metastores
  4. Configuring an External Metastore
  5. HIVE QL
  6. OLTP vs OLAP
  7. Working with Tables and different File Formats in HIVE
  8. Primitive data types and complex data types
  9. Working with Partitions, Multiple Inserts and dynamic Partitioning
  10. User Defined Functions
  11. Hive Bucketed Tables and Sampling
  12. External partitioned tables, mapping data to partitions in a table, writing the output of one query to another table, multiple inserts
  13. Differences between ORDER BY, DISTRIBUTE BY and SORT BY (see the sketch after this list)
  14. RC File, Indexes, Views and Map-side Joins
  15. Compression on hive tables and Migrating Hive tables
  16. Dynamic substitution in Hive and different ways of running Hive
  17. How to enable Update in HIVE
  18. Log Analysis on Hive
  19. Access HBASE tables using Hive
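A sketch of querying HiveServer2 over JDBC, illustrating item 13's DISTRIBUTE BY and SORT BY. The connection URL, the credentials and the employees(dept, salary) table are assumptions for illustration; the Hive JDBC driver jar must be on the classpath.

import java.sql.DriverManager

object HiveQuery {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    // Assumed HiveServer2 endpoint and database.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hive", "")
    val stmt = conn.createStatement()
    // SORT BY orders rows within each reducer and DISTRIBUTE BY picks the reducer;
    // ORDER BY would force a single reducer to produce a total order.
    val rs = stmt.executeQuery(
      "SELECT dept, salary FROM employees DISTRIBUTE BY dept SORT BY dept, salary DESC")
    while (rs.next()) println(s"${rs.getString(1)}\t${rs.getString(2)}")
    rs.close(); stmt.close(); conn.close()
  }
}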
07
PIG, HCATALOG And INTEGRATIONS
  1. PIG
  2. Execution Types
  3. Grunt Shell, Pig Latin, Data Processing, and Schema on read
  4. Primitive data types and complex data types
  5. Tuple schema, BAG Schema and MAP Schema
  6. Loading and Storing, Filtering, Grouping & Joining (see the sketch after this list)
  7. Debugging commands (Illustrate and Explain)
  8. Validations in PIG, Type casting in PIG
  9. Working with Functions, User Defined Functions
  10. Types of JOINS in pig and Replicated Join in detail
  11. SPLITS and Multiquery execution
  12. Error Handling, FLATTEN and ORDER BY
  13. Parameter Substitution, Nested For Each
  14. User Defined Functions, Dynamic Invokers and Macros
  15. How to access HBASE using PIG
  16. How to Load and Write JSON DATA using PIG
  17. Piggy Bank (library of user-contributed Pig functions)
  18. HCATALOG
  19. Installation
  20. Introduction to HCATALOG
  21. About Hcatalog with PIG, HIVE and MR
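A sketch of item 6's load/filter/group pattern executed from code through Pig's PigServer API (the Grunt shell runs the same Pig Latin interactively). The access.log file and its two-column layout are assumptions; ExecType.MAPREDUCE would run the same script on a cluster instead of locally.

import org.apache.pig.{ExecType, PigServer}

object PigFromCode {
  def main(args: Array[String]): Unit = {
    val pig = new PigServer(ExecType.LOCAL)
    // Pig Latin: load, filter, group and count. 'access.log' is a hypothetical
    // space-delimited file of (ip, url) pairs.
    pig.registerQuery("logs = LOAD 'access.log' USING PigStorage(' ') AS (ip:chararray, url:chararray);")
    pig.registerQuery("pages = FILTER logs BY url != '/health';")
    pig.registerQuery("grouped = GROUP pages BY url;")
    pig.registerQuery("counts = FOREACH grouped GENERATE group AS url, COUNT(pages) AS hits;")
    // Pull the result back into the client and print each tuple.
    val it = pig.openIterator("counts")
    while (it.hasNext) println(it.next())
    pig.shutdown()
  }
}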
08
SQOOP, FLUME, OOZIE And ZOOKEEPER
  1. SQOOP
  2. Installation
  3. Import Data: (full table, only a subset, target directory, protecting the password, file formats other than CSV, compressing, controlling parallelism, importing all tables)
  4. Modelling sequences
  5. Incremental Import: (import only new data, last imported data, storing the password in the Metastore, sharing the Metastore between Sqoop clients)
  6. Free Form Query Import
  7. Export data to RDBMS, HIVE and HBASE
  8. Hands on Exercises
  9. FLUME
  10. Installation
  11. Introduction to Flume
  12. Flume Agents: Sources, Channels and Sinks
  13. Log user information into HDFS from a Java program using LOG4J and the Avro Source (see the sketch after this list)
  14. Log user information into HBASE from a Java program using LOG4J and the Avro Source
  15. Log user information into HDFS from a Java program using the Tail Source
  16. Flume use case: stream data from Twitter into HDFS and HBASE
  17. Analyse the data using HIVE and PIG
  18. OOZIE
  19. Workflow (Start, Action, End, Kill, Fork and Join), Schedulers, Coordinators and Bundles
  20. A workflow showing how to schedule Sqoop, Hive, MR and PIG jobs
  21. Real-world use case that finds the top websites used by users of certain ages, scheduled to run every hour
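A sketch of the driver side of items 13 and 14: a small program that emits user events through LOG4J, on the assumption that log4j.properties routes them to Flume's log4j appender pointed at an agent's Avro source, whose sink then lands them in HDFS or HBASE. The logger wiring, event format and sleep interval are illustrative.

import org.apache.log4j.Logger

object UserEventLogger {
  // Assumed log4j.properties wiring (appender name and host/port are examples):
  //   log4j.appender.flume=org.apache.flume.clients.log4jappender.Log4jAppender
  //   log4j.appender.flume.Hostname=localhost
  //   log4j.appender.flume.Port=41414
  private val log = Logger.getLogger(getClass)

  def main(args: Array[String]): Unit = {
    // Each log line becomes one Flume event on its way to HDFS/HBASE.
    for (i <- 1 to 100) {
      log.info(s"user=u$i action=login ts=${System.currentTimeMillis()}")
      Thread.sleep(100)
    }
  }
}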
09
SCALA (Object Oriented And Functional Programming)
  1. Getting started With Scala
  2. Interactive Scala – REPL, data types, variables, expressions, simple functions
  3. Running the program with Scala Compiler
  4. Explore the type lattice and use type inference
  5. Define Methods and Pattern Matching
  6. Scala Environment Set up
  7. Scala set up on Windows
  8. Scala set up on UNIX
  9. What is Functional Programming
  10. Differences between OOP and Functional Programming
  11. Collections (Very Important for Spark)
  12. Iterating, mapping, filtering and counting
  13. Regular expressions and matching with them
  14. Maps, Sets, groupBy, Options, flatten, flatMap (see the sketch after this list)
  15. Word count, IO operations, file access, flatMap
  16. Object Oriented Programming
  17. Classes and Properties
  18. Objects, Packaging and Imports
  19. Traits
  20. Objects, classes, inheritance, Lists with multiple related types, apply
  Integrations
  21. What is SBT?
  22. Integration of Scala in Eclipse IDE
  23. Integration of SBT with Eclipse
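A short sketch tying together items 12 to 15: iterating, mapping, filtering, groupBy, Options and flatMap, ending in the word count the module names. The input file name is an assumption; any text file works.

import scala.io.Source

object CollectionsTour {
  def main(args: Array[String]): Unit = {
    // flatMap: one line becomes many words; filter drops the empty splits.
    val words = Source.fromFile("input.txt")
      .getLines()
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty)
      .toList

    // groupBy + map: the classic collections word count.
    val counts: Map[String, Int] = words.groupBy(identity).map { case (w, ws) => w -> ws.size }

    // Option instead of null: Map#get returns Option[Int].
    println(counts.get("hadoop").getOrElse(0))

    // Pattern matching over the five most frequent words.
    counts.toSeq.sortBy(-_._2).take(5).foreach {
      case (word, n) => println(f"$word%-15s $n")
    }
  }
}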

Download Complete Course Details

Download Now

Pre-Requisites

  • Graduate/PG in computer science or a related numerical discipline.
  • Basic knowledge of Java, MapReduce, data structures and SQL.
  • Knowledge of statistics and mathematics is beneficial.

Job Opportunities

  • Big Data Analytics Business Consultant
  • Big Data Analytics Architect
  • Big Data Engineer
  • Data Administrator
  • Big Data Solution Architect
  • Big Data Analyst
  • Analytics Associate
  • Data Analyst
  • Business Analyst
  • Data/Analytics Manager
  • Business Intelligence Manager
  • Business Intelligence Analytics Consultant
  • Metrics and Analytics Specialist

Satisfied Clients

Sanjana

I attended the Big Data Hadoop course, and the training went very well; I was able to explore in-and-out concepts of working with the big data ecosystem. The trainer who taught me had vast knowledge of big data solutions, and the exercises the institute provided really helped me understand the in-depth ideas of big data. The trainer was very friendly and ready to provide help and support all the time, and never hesitated to clarify our questions. I would strongly recommend this institute to anyone looking for a Big Data Hadoop training centre in Pune.

Tushar K

I am Tushar and I completed my Hadoop admin training in Pune at A.I. Analytics. My trainer was very professional and his approach to the class was very interactive and interesting. He always made sure that everyone in the class was clear about the day's Hadoop training topics. I would like to thank A.I. Analytics and my trainer for providing this big data training and placement in Pune.

Yashesha

I completed my Big Data training at A.I. Analytics. The explanations given by the trainer were clear and easy to learn from. He would move to the next topic only after we had completely understood the current session, and he would clear our doubts whenever we called him. He is very friendly, and the whole training session was very interactive and useful.

Book Your FREE Consultation

Also avail 20% OFF. Offer valid for the next 3 days.

FIRST 100 STUDENTS ONLY

Call Now