SequelGate Training Institute

Data Science Training

 

SequelGate Technologies offers Data Science Training built on a carefully planned syllabus, curriculum and course plan. The instructors for this course are well versed in Data Science and Big Data Analytics, and they are energetic and attentive, keeping the course streamlined so that every student learns properly.
We understand what the industry needs, so we deliver our Data Science Training in a practical, hands-on way, both LIVE Online and in the Classroom. Our syllabus is framed to match real-world requirements, from beginner level to advanced level.

 

This course includes: Hadoop, RStudio, Python, Scala, Machine Learning, Tableau, Excel and more!
Register Today

 

 
 

Data Science Training Course Content

 

 

Module -1

HADOOP - Framework for Big Data

HDFS

  • What is Big Data?
  • What are the challenges of processing big data?
  • What technologies support big data?
  • What is Hadoop?
  • Why Hadoop?
  • History of Hadoop
  • Use Cases of Hadoop
  • Hadoop Ecosystem

Understanding the Cluster

  • Typical Workflow
  • Writing files to HDFS
  • Reading files from HDFS
  • Rack Awareness, 5 daemons

MapReduce:

  • Before MapReduce
  • MapReduce Overview
  • Job Tracker
  • Task Tracker, Job Scheduling
  • Mapper and Reducer code
  • Configuring development environment – Eclipse

How MapReduce Works:

  • Anatomy of a MapReduce Job Run
  • Job Submission, Job Initialization
  • Task Assignment
  • Job Completion, Job Scheduling
  • Job Failures, Shuffle and sort
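
As a small taste of the map, shuffle and sort, and reduce steps covered in this unit, here is a minimal sketch in plain Python. It is not Hadoop code: the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) and the sample sentences are assumptions made for illustration, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict


def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle and sort: group the values by key, then order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(grouped):
    """Reducer: sum the list of counts collected for each word."""
    return {key: sum(values) for key, values in grouped}


def word_count(lines):
    """Chain the three phases the way a MapReduce job would."""
    return reduce_phase(shuffle_and_sort(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical input; on a real cluster these lines would come from HDFS.
    sample = ["big data needs hadoop", "hadoop stores big files"]
    print(word_count(sample))
```

In a real Hadoop job the mapper and reducer run as separate tasks on different nodes, and the framework performs the shuffle and sort between them.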

PIG

  • Pig basics
  • Install and configure PIG on a cluster
  • PIG Vs MapReduce and SQL
  • Pig Vs Hive
  • Pig Latin
  • Primitive Data Types and Complex Data Types
  • Types of Modes
  • Interactive mode
  • Script mode
  • Embedded mode
  • Modes of running PIG
  • Running in Grunt shell
  • Programming in Eclipse
  • Loading and Storing Datasets, Filters, Groups, Co-Groups, Foreach, Nested Foreach, Parallel, Distinct, Limit, Sample, Different Types of Joins
  • Debugging Commands (Illustrate and Explain)
  • Processing XML Files using Piggybank
  • Processing Log Files using Regex
  • Pig Date Functions
  • PIG UDFs, UDAFs
  • Working with Predefined Functions, User-Defined Functions
  • Pig Macros
  • How to Load and Write JSON Data using PIG
  • Accessing HBase using PIG

HIVE

  • Hive Introduction
  • Hive Architecture
  • Different Modes to Access HIVE
  • Command Line Interface
  • Web Interface (HWI)
  • Thrift Interface
  • Hive Metastore
  • HiveQL
  • Primitive Data Types and Complex Data Types
  • Working with Partitions
  • Hive Bucketed Tables and Sampling
  • External Tables
  • Nested Queries
  • Multiple Inserts
  • Dynamic Partitions
  • Different Types of Joins
  • ORDER BY, SORT BY, DISTRIBUTE BY, CLUSTER BY
  • INDEXES, VIEWS
  • Compression on Hive Tables and Migrating Hive Tables
  • Hive SerDes
  • Processing XML Files using Regex
  • Processing Log Files using Regex
  • Accessing HBase Tables using Hive
  • Hive UDF
  • Hive UDAF
  • Hive UDTF

HBase

  • HBase Introduction
  • HBase Data Model and Comparison between RDBMS and NoSQL
  • HBase Architecture
  • HMaster, HRegionServer, ZooKeeper, HRegion
  • File Storage Architecture
  • HFiles Compaction, Decompaction, Region Splits
  • HBase Operations (DDL and DML) through Shell
  • HBase Installation
  • Internal ZooKeeper, External ZooKeeper
  • HBase Counters
  • HBase Filters
  • HBase Use Cases
  • Install and Configure HBase on a Multi Node Cluster
  • Create Database, Develop and Run Sample Applications
  • Access Data Stored in HBase using Clients like Java, Python
  • MapReduce Client to Access the HBase Data
  • HBase and Hive Integration
  • HBase Admin Tasks

Cassandra

  • Introduction
  • Installation
  • Creation of Database

 

Module -2

Scala - Language for Data Science & Big Data

Scala Introduction & Environment Setup:

  • Scala is object-oriented, Scala is functional, Scala runs on the JVM
  • Installing Scala

Scala Basic Syntax

  • First Scala Program
  • Interactive Mode Programming
  • Script Mode Programming

Scala Data Types:

  • Literals
  • Strings
  • Escape Sequences

Scala Variables:

  • Declaration
  • Data Types
  • Type Inference
  • Multiple assignments
  • Variable Types

Scala Operators:

  • Arithmetic
  • Relational
  • Logical
  • Operator Precedence in Scala

Scala Conditions

 

Scala Loops

 

Scala Strings:

 

Scala Regular Expressions:

  • Forming regular expressions
  • Matching Literals and Constants
  • Matching Tuples and Lists
  • Matching with Types and Guards
  • Pattern Variables and Constants in case Expressions
  • Regular-expression Examples
  • Pattern matching with Extractors

Scala Functions:

  • Declarations
  • Definitions
  • Calling 
  • Function Literals
  • Anonymous
  • Currying

Scala Arrays

  • Declaring
  • Processing
  • Multi-Dimensional
  • Create Array with Range
  • Scala Arrays Methods

Scala Collections

  • Basic Operations on List
  • Concatenating Lists
  • Creating Uniform Lists
  • Tabulating a Function
  • Scala List Methods
  • Concatenating Sets, Find max, min elements in Set
  • Find common values in Sets
  • Scala Set Methods
  • Basic Operations on Map
  • Check for a Key in Map

Scala Classes & Objects:

  • OOP Basics
  • Defining Fields, Methods, Constructors

Module -3

Spark - Framework for Data Science & Big Data Analytics

  • Introduction to Apache Spark:
  • What is Spark?
  • Spark Ecosystem & Modes of Spark
  • Overview of Spark on a Cluster
  • Spark Standalone Cluster
  • Spark Web UI & Spark Common Operations

Spark Core

  • Performing Basic Operations on Files in Spark Shell and Overview of SBT
  • Building a Spark Project with SBT
  • Running a Spark Project with SBT
  • Playing with RDDs:
  • RDDs, Transformations in RDD, Actions in RDD
  • Loading Data in RDD
  • Saving Data through RDD
  • Key-Value Pair RDD
  • MapReduce and Pair RDD Operations
  • Spark and Hadoop Integration - YARN

Spark SQL

  • Spark SQL and Performance Tuning in Spark:
  • Analyze Hive and Spark SQL Architecture, SQLContext in Spark SQL
  • Working with DataFrames
  • Implementing an Example for Spark SQL
  • Integrating Hive and Spark SQL
  • Support for JSON and Parquet File Formats
  • Implement Data Visualization in Spark
  • Loading of Data
  • Hive Queries through Spark
  • Performance Tuning Tips in Spark

Spark Streaming

  • A Simple Example
  • Architecture and Abstraction
  • Transformations
  • Stateless Transformations
  • Stateful Transformations
  • Output Operations
  • Input Sources
  • Additional Sources
  • Multiple Sources and Cluster Sizing
  • Worker Fault Tolerance
  • Receiver Fault Tolerance
  • Processing Guarantees
  • Streaming UI
  • Batch and Window Sizes
  • Level of Parallelism

Spark GraphX

  • Edges
  • Vertices
  • Types of Graphs
  • Usages
  • Simple Program

Spark MLlib

  • Vectors
  • LabeledPoints
  • Labels
  • Features
  • RDD with Vectors
  • Matrices, Stats, Maths
  • Algorithms with Spark MLlib

Module -4

STATISTICS

STATISTICS: Descriptive & Inferential Statistics

DESCRIPTIVE STATISTICS

  • Introduction to Advanced Data Analytics
  • Descriptive statistics and statistical inference for various business problems
  • Types of Variables
  • Measures of central tendency
  • Dispersion
  • Variable Distributions
  • Probability Distributions
  • Normal Distribution and Properties
  • Skewness and Kurtosis
  • Five-Number Summary Analysis
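
To give a feel for the last topic in this list, here is a minimal sketch of a five-number summary (minimum, first quartile, median, third quartile, maximum) using only Python's standard `statistics` module. The helper name `five_number_summary` and the sample marks are assumptions for illustration. Quartile conventions differ between tools, and this sketch uses the "inclusive" method, so other packages may report slightly different quartiles for the same data.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


if __name__ == "__main__":
    # Hypothetical sample data.
    marks = [9, 1, 8, 2, 7, 3, 6, 4, 5]
    print(five_number_summary(marks))
```

These five values are exactly what a box and whisker plot draws.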

INFERENTIAL STATISTICS

  • Null/Alternative Hypothesis formulation
  • Type I and Type II errors
  • One Sample T-TEST
  • Independent Sample T-TEST
  • Analysis of Variance (ANOVA)
  • MANOVA
  • Chi-Square Test (Non-Parametric Tests)

Data quality and outlier treatment

  • Outlier treatment with robust measurements
  • Outlier treatment with central tendency Mean
  • Outlier with Min Max methods
  • Imputation with series means or median values
  • Z score Calculation
  • Sampling and estimation
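
Two of the ideas above, Z score calculation and imputation with median values, can be sketched in a few lines of standard-library Python. The function names `z_scores` and `impute_median`, the use of `None` to mark a missing value, and the sample numbers are all assumptions made for this illustration. The sketch uses the population standard deviation; a course exercise may use the sample version instead.

```python
import statistics


def z_scores(values):
    """Standardize each value: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        raise ValueError("z scores are undefined when all values are equal")
    return [(v - mean) / sd for v in values]


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


if __name__ == "__main__":
    # Hypothetical data: one missing reading and one suspiciously large value.
    readings = [1, None, 3, 10]
    filled = impute_median(readings)
    print(filled)
    print(z_scores(filled))
```

A common rule of thumb flags a value as a possible outlier when its Z score is far from zero, and the median is often preferred over the mean for imputation because it is less affected by extreme values.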

Data Visualization

  • Stem and leaf
  • Dot Plot
  • Histogram
  • Density Plot
  • Frequency Plot and Cumulative Frequency Plots
  • Box and Whisker Plot
  • Scatter Plot
  • Line Graph
  • Bar Graph
  • Pie Chart
  • Tree Map
  • Cross Tabulation
  • Case Study for Visualization

Data Quality checking

  • Z score Calculation
  • Measure of position (percentile and Quartiles)
  • Measure of Asymmetry - Skewness
  • Measure of Peakedness - Kurtosis
  • Q-Q Probability Plots
  • Kolmogorov-Smirnov Test
  • Shapiro-Wilk Test
  • Data Normalization
  • Handling missing Values
  • Case Studies for Data Quality Checking

Module -5

R Language/Python for Data Analytics

Getting Started with R

  • R Basics
  • Variables and Class
  • Vectors, List, Factors, Matrix
  • Data Frames
  • Missing Values
  • Reading and Writing Data
  • Data Visualization using ggplot2
  • If-Else Conditions
  • Functions
  • Loops
  • Data manipulation

Python

  • Python Basics
  • Python Lists
  • Functions and Packages
  • NumPy
  • Control flow and Pandas

Probability

  • Counting Combinations, Generating Combinations
  • Generating Random Numbers
  • Generating Reproducible Random Numbers
  • Generating a Random Sample
  • Generating Random Sequences
  • Randomly Permuting a Vector
  • Probabilities for Discrete Distributions
  • Probabilities for Continuous Distributions
  • Converting Probabilities to Quantiles
  • Plotting a Density Function

Graphics

  • Edges
  • Vertices
  • Graphs
  • Programs

Machine Learning

  • Introduction to Machine Learning
  • Types Of Machine Learning
  • Real time use cases in Machine Learning
  • Types of Algorithms and Types of Problems –
    • Regression
    • Classification
    • Clustering
    • Collaborative Filtering
    • Optimization
    • Prediction
  • Regression –
    • Linear Regression
    • Logistic Regression
  • Classification –
    • Logistic Regression
    • Decision Tree, Random Forest
    • KNN, SVM
    • Naive Bayes
  • Clustering –
    • K-means Clustering
