Updated: 2021-07-16 11:14:22
Cover page
Practical Data Analysis Cookbook
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Preface
What this book covers
What you need for this book
Who this book is for
Sections
Conventions
Reader feedback
Customer support
Chapter 1. Preparing the Data
Introduction
Reading and writing CSV/TSV files with Python
Reading and writing JSON files with Python
Reading and writing Excel files with Python
Reading and writing XML files with Python
Retrieving HTML pages with pandas
Storing and retrieving from a relational database
Storing and retrieving from MongoDB
Opening and transforming data with OpenRefine
Exploring the data with OpenRefine
Removing duplicates
Using regular expressions and GREL to clean up data
Imputing missing observations
Normalizing and standardizing the features
Binning the observations
Encoding categorical variables
Chapter 2. Exploring the Data
Producing descriptive statistics
Exploring correlations between features
Visualizing the interactions between features
Producing histograms
Creating multivariate charts
Sampling the data
Splitting the dataset into training, cross-validation, and testing
Chapter 3. Classification Techniques
Testing and comparing the models
Classifying with Naïve Bayes
Using logistic regression as a universal classifier
Utilizing Support Vector Machines as a classification engine
Classifying calls with decision trees
Predicting subscribers with random tree forests
Employing neural networks to classify calls
Chapter 4. Clustering Techniques
Assessing the performance of a clustering method
Clustering data with k-means algorithm
Finding an optimal number of clusters for k-means
Discovering clusters with mean shift clustering model
Building fuzzy clustering model with c-means
Using hierarchical model to cluster your data
Finding groups of potential subscribers with DBSCAN and BIRCH algorithms
Chapter 5. Reducing Dimensions
Creating three-dimensional scatter plots to present principal components
Reducing the dimensions using the kernel version of PCA
Using Principal Component Analysis to find things that matter
Finding the principal components in your data using randomized PCA
Extracting the useful dimensions using Linear Discriminant Analysis
Using various dimension reduction techniques to classify calls using the k-Nearest Neighbors classification model
Chapter 6. Regression Methods
Identifying and tackling multicollinearity
Building Linear Regression model
Using OLS to forecast how much electricity can be produced
Estimating the output of an electric plant using CART
Employing the kNN model in a regression problem
Applying the Random Forest model to a regression analysis
Gauging the amount of electricity a plant can produce using SVMs
Training a Neural Network to predict the output of a power plant
Chapter 7. Time Series Techniques
Handling date objects in Python
Understanding time series data
Smoothing and transforming the observations
Filtering the time series data
Removing trend and seasonality
Forecasting the future with ARMA and ARIMA models
Chapter 8. Graphs
Handling graph objects in Python with NetworkX
Using Gephi to visualize graphs
Identifying people whose credit card details were stolen
Identifying those responsible for stealing the credit cards
Chapter 9. Natural Language Processing
Reading raw text from the Web
Tokenizing and normalizing text
Identifying parts of speech, handling n-grams, and recognizing named entities
Identifying the topic of an article
Identifying the sentence structure
Classifying movies based on their reviews