Step-by-step to a Data Scientist

July 11, 2022

List: PyCaret Articles

PyCaret is a useful AutoML Python library because it lets us build and deploy machine learning models with only a few lines of code. We can also perform preprocessing, compare models, and tune hyperparameters, again with very little code.

This article summarizes the PyCaret articles published on this blog.

Dockerfile for PyCaret

We create a Docker image for PyCaret from a Dockerfile. The goal of this post is to learn how to build a Docker image from a Dockerfile using Docker commands.

Tutorial of PyCaret, Regression Analysis

This post is for beginners.

In this post, we walk through a PyCaret tutorial by performing a regression analysis on the Boston house prices dataset. The post is written as a step-by-step guide.

Prediction of Diabetes Progression by PyCaret, Regression Analysis

This post is a good example of regression analysis.

The purpose is to learn the basics of regression analysis using PyCaret. Using a well-known dataset, we cover everything from building the model to analyzing the results.
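PyCaret wraps the model-building and scoring steps, but the underlying idea is small. As a minimal, library-free sketch (it is not PyCaret's API and does not use the diabetes dataset; the numbers are made up), the snippet below fits a one-feature least-squares line and computes R² and MAE, two of the metrics PyCaret reports when comparing regression models.

```python
from statistics import fmean


def fit_simple_ols(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


# Toy data (illustrative only): y is roughly 2x + 1 with a little noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_simple_ols(xs, ys)
preds = [intercept + slope * x for x in xs]
print(f"intercept={intercept:.3f} slope={slope:.3f}")
print(f"R2={r2_score(ys, preds):.3f} MAE={mae(ys, preds):.3f}")
```

With real data, PyCaret also handles the train/test split and cross-validation, which this sketch skips by scoring on the training points.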

PyCaret 2.3.6, incredible update; Convert model and Web App
PyCaret 2.3.6, incredible update; Dashboard and EDA functions

PyCaret received a major update in version 2.3.6.

Version 2.3.6 added several new features. In these articles, you can check the major changes and get an idea of the latest features in PyCaret.

July 11, 2022

Weekly Article News #29

Recommended articles the author read this week.
This newsletter is posted every Monday.

This week's articles are about the environment for a machine learning project, e.g., MLflow and Jupyter Notebook.

In a machine learning project, it is very important to prepare not only a data analysis environment such as Jupyter Notebook but also an MLflow environment for managing experiment records. Managing experiment records improves reproducibility and makes the project run more efficiently.

Containerize your whole Data Science Environment (or anything you want) with Docker-Compose

docker-compose is a powerful tool for building the environment of a machine learning project. With docker-compose, we can build the data-analysis and experiment-management environments as separate containers. This article explains how to build such an environment.

Manage your machine learning life cycle with MLflow in Python

This article shows an example of how to use the MLflow tracking server, a tool for managing experiment records. There are several ways to set up an MLflow tracking server, and this article suggests one practical setup.

June 13, 2022

Weekly Article News #28

Recommended articles the author read this week.
This newsletter is posted every Monday.

This week's articles are about explainable AI.

Top AutoML Python libraries in 2022

AutoML (automated machine learning) is one of the hot topics. This article introduces notable AutoML libraries, e.g., PyCaret, AutoKeras, and AutoGluon.

Data Drift Explainability: Interpretable Shift Detection with NannyML

NannyML is an open-source Python library for estimating post-deployment model performance and detecting data drift. Drift is one of the hot topics in MLOps, and this library is based on an interesting algorithm.
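NannyML ships its own drift-detection and performance-estimation methods, which are not reproduced here. To make the idea of drift concrete, the sketch below swaps in a much simpler, widely used univariate metric, the Population Stability Index (PSI), written in plain Python. The function name `psi`, the equal-width binning, and the `eps` smoothing are illustrative choices, not NannyML's API.

```python
import math


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature.

    Bins are equal-width over the reference range; values outside that range
    are clipped into the first/last bin. eps avoids log(0) for empty bins.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clip out-of-range values
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    # PSI = sum over bins of (current - reference) * ln(current / reference)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


reference = [i / 100 for i in range(100)]      # toy "training" feature values
production = [v + 0.5 for v in reference]      # toy "post-deployment" values, shifted
print("no drift:", psi(reference, reference))
print("shifted :", psi(reference, production))
```

Comparing a feature with itself gives 0, and the value grows as the current distribution moves away from the reference. Univariate checks like this look at one feature at a time and can miss changes in the relationships between features.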

May 9, 2022

Weekly Article News #27

Recommended articles the author read this week.
This newsletter is posted every Monday.

This week's articles are about explainable AI.

What Is a Transformer Model?

A transformer is a relatively recent neural network architecture. It can learn context within sequential data, such as the words in a sentence. The attention mechanism in a transformer may be one of the most valuable concepts for us to learn.
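The attention mechanism itself fits in a few lines. The sketch below implements single-head scaled dot-product attention, softmax(QKᵀ/√d_k)V, in plain Python on toy 2-dimensional vectors. It omits the learned projection matrices, masking, and multiple heads of a real transformer, so treat it as an illustration of the formula rather than a model.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for lists of vectors (single head, no mask)."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # attention weights, sum to 1
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three "tokens" with 2-dimensional embeddings (toy numbers, illustrative only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Self-attention: here Q = K = V = x; real transformers first apply learned projections.
out, attn = scaled_dot_product_attention(x, x, x)
for row in attn:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence, which is how the model mixes in context.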

Visualizing multicollinearity in Python

Multicollinearity is an annoying problem for data scientists. This article shows how to visualize multicollinear relationships between features.
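A correlation matrix is a common starting point for such a visualization (for example, drawn as a heatmap). The plain-Python sketch below computes pairwise Pearson correlations for three made-up features and a variance inflation factor (VIF) for one pair. The feature names and values are hypothetical, and the article may use different tools.

```python
import math
from statistics import fmean


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    a_bar, b_bar = fmean(a), fmean(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(features):
    """Dict keyed by (feature_i, feature_j) -> correlation."""
    names = list(features)
    return {(i, j): pearson(features[i], features[j]) for i in names for j in names}


# Hypothetical features: "rooms" and "area" move together, "age" does not.
features = {
    "rooms": [2, 3, 3, 4, 5, 6],
    "area": [50, 68, 72, 90, 110, 128],
    "age": [30, 5, 22, 14, 40, 9],
}
corr = correlation_matrix(features)
names = list(features)
print("       " + "".join(f"{n:>8}" for n in names))
for i in names:
    print(f"{i:>7}" + "".join(f"{corr[(i, j)]:8.2f}" for j in names))

# Regressing one feature on a single other feature gives R^2 = r^2,
# so its variance inflation factor is 1 / (1 - r^2).
r = corr[("rooms", "area")]
print(f"VIF(rooms | area) = {1 / (1 - r ** 2):.1f}")
```

A rule of thumb often quoted in practice flags VIF values above roughly 5 to 10 as problematic; here "rooms" and "area" are nearly redundant.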

March 27, 2022 (updated August 16, 2022)

Demo of a Web App. for 3D Scatter Plot

The web app for 3D scatter plots has now been released.

We can plot a CSV-format dataset as shown in the following image.

URL of the web app.

https://caron14-streamlit-3d-plot-by-plotly-plot3d-on-streamlit-bzo7i5.streamlitapp.com/

GitHub repository with the full code

https://github.com/caron14/streamlit_3d_plot_by_plotly

Image for the Web App.

March 23, 2022

Weekly Article News #26

Recommended articles the author read this week.
This newsletter is posted every Monday.

This week's articles are about explainable AI.

Our approach to building transparent and explainable AI systems

An informative article about explainable AI. Building an explainable AI system is known to be difficult, and the author of this blog thinks this article offers a hint on how to do it.

FastTreeSHAP: Accelerating SHAP value computation for trees

One powerful way to build an explainable AI model is to use SHAP. However, computing SHAP values for a tree-based ML model on a large dataset is computationally expensive. This article introduces a new SHAP-based library that is much faster.
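To see why speed matters, the sketch below computes exact Shapley values by brute force: it enumerates every coalition of features and fills in absent features from a single baseline point. It is a plain-Python illustration of the definition, not the API of the `shap` or FastTreeSHAP libraries, and the model and baseline are hypothetical. Its cost doubles with every added feature, which is the problem tree-specific algorithms such as TreeSHAP and FastTreeSHAP address.

```python
from itertools import combinations
from math import factorial


def exact_shap(model, x, baseline):
    """Exact Shapley values by enumerating every feature coalition.

    Features outside a coalition are replaced by the baseline value
    (a single background point). Cost grows as 2^n in the number of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phis.append(phi)
    return phis


# Hypothetical toy model with an interaction term (illustrative only).
def model(z):
    return 2.0 * z[0] + 3.0 * z[1] + z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phis = exact_shap(model, x, baseline)
print([round(p, 3) for p in phis])
print(sum(phis), model(x) - model(baseline))  # efficiency: the two should match
```

For this toy model the additive terms go to their own features, the interaction term `z[0] * z[2]` is split equally between features 0 and 2, and the values sum to `model(x) - model(baseline)`.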

February 28, 2022

Weekly Article News #25

Recommended articles the author read this week.
This newsletter is posted every Monday.

This week's articles are about VAEs (variational autoencoders).

Pyro tutorial: Variational Autoencoders

A tutorial on VAEs using Pyro, a universal probabilistic programming language (PPL) written in Python and backed by PyTorch.

Variational Autoencoder: Intuition and Implementation

This article explains the theoretical background of VAEs in detail.
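Two pieces of VAE theory come up in almost every derivation: the reparameterization trick and the closed-form KL divergence between the encoder's diagonal Gaussian and a standard normal prior. The plain-Python sketch below shows both for a single example. The encoder outputs `mu` and `log_var` are made-up numbers, and a real VAE (in Pyro or PyTorch) would compute these on tensors with automatic differentiation.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1).

    The randomness lives in eps, so z is a differentiable function of mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ) = 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


# Hypothetical encoder outputs for one input (illustrative numbers).
mu, log_var = [0.5, -1.0], [0.0, math.log(0.25)]
rng = random.Random(0)
z = reparameterize(mu, log_var, rng)
print("z  =", z)
print("KL =", kl_to_standard_normal(mu, log_var))
```

In training, this KL term is added to the reconstruction loss; it is zero only when the encoder outputs exactly the prior (`mu = 0`, `log_var = 0`).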

The author also introduces several new features in the following articles.

February 22, 2022

Weekly Article News #24

Recommended articles the author read this week.
This newsletter is posted every Monday.

Include diagrams in your Markdown files with Mermaid

An informative GitHub blog post about how to create diagrams in Markdown.

Practical Quantization in PyTorch

The official PyTorch blog post about quantization. Quantization is a technique for making your DNN run faster and use less memory.
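The core arithmetic of quantization is an affine mapping between floats and 8-bit integers, defined by a scale and a zero point. The plain-Python sketch below quantizes a few made-up weights to the int8 range [-128, 127] and back. It illustrates the idea only and is not PyTorch's quantization API, which also handles calibration with observers, per-channel scales, and quantized kernels.

```python
def quantize_params(x_min, x_max, qmin=-128, qmax=127):
    """Affine (asymmetric) quantization parameters for a float range."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # make sure 0.0 is representable
    scale = (x_max - x_min) / (qmax - qmin) or 1.0   # float step per integer step
    zero_point = round(qmin - x_min / scale)          # integer that represents 0.0
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """q = clamp(round(x / scale) + zero_point)"""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """x_hat = (q - zero_point) * scale"""
    return [(q - zero_point) * scale for q in qvalues]


weights = [-0.9, -0.1, 0.0, 0.37, 1.2]  # toy float32 weights (illustrative)
scale, zp = quantize_params(min(weights), max(weights))
q = quantize(weights, scale, zp)
restored = dequantize(q, scale, zp)
print("scale =", scale, "zero_point =", zp)
print("int8 :", q)
print("error:", [round(r - w, 4) for r, w in zip(restored, weights)])
```

Each weight now needs 1 byte instead of 4, at the cost of a rounding error of at most half a `scale` step for values inside the calibrated range.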

The author also introduces several new features in the following articles.

February 13, 2022

Weekly Article News #23

Recommended articles the author read this week.
This newsletter is posted every Monday.

Comparison of AutoML solutions 2021

AutoML is one of the hot topics, and many AutoML tools have appeared recently. This article lets us check the recent trends.

Run Your Python Code as Quickly as C++

Sometimes our Python code is not as fast as we expect. This article introduces a really simple method to speed up Python code.

The author also introduces several new features in the following articles.

February 1, 2022 (updated February 13, 2022)

Weekly Article News #22

Recommended articles the author read this week.
This newsletter is posted every Monday.

T-distributed Stochastic Neighbor Embedding(t-SNE)

Dimensionality reduction techniques are important because they let us understand a dataset visually. Principal component analysis (PCA) is one of the best-known techniques. However, t-SNE may also be a good choice because, unlike PCA, it can capture nonlinear structure in the data.
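Much of t-SNE's nonlinear behavior comes from how it defines its objective: it converts distances into neighbor probabilities with a Gaussian kernel in the original space and a heavy-tailed Student-t kernel in the embedding, then minimizes the KL divergence between the two. The plain-Python sketch below computes those quantities for a toy dataset and two candidate embeddings. It simplifies by using one fixed Gaussian width `sigma` instead of the per-point perplexity search and omits the gradient-descent optimization, so it is not a usable t-SNE implementation.

```python
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def joint_p(X, sigma=1.0):
    """Symmetric high-dimensional affinities p_ij with a Gaussian kernel.

    Simplification: one fixed sigma for every point; real t-SNE picks a sigma
    per point by binary search to match a target perplexity.
    """
    n = len(X)
    cond = []
    for i in range(n):
        row = [0.0 if j == i else math.exp(-sq_dist(X[i], X[j]) / (2 * sigma ** 2)) for j in range(n)]
        total = sum(row)
        cond.append([v / total for v in row])  # conditional p_{j|i}
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def joint_q(Y):
    """Low-dimensional affinities q_ij with a heavy-tailed Student-t (1 dof) kernel."""
    n = len(Y)
    num = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j])) for j in range(n)] for i in range(n)]
    total = sum(map(sum, num))
    return [[v / total for v in row] for row in num]


def kl_divergence(P, Q):
    """KL(P || Q), the cost t-SNE minimizes with respect to the embedding Y."""
    return sum(p * math.log(p / q) for Pr, Qr in zip(P, Q) for p, q in zip(Pr, Qr) if p > 0)


# Toy 3-D data: two tight pairs far apart (illustrative only).
X = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]]
good = [[0.0, 0.0], [0.1, 0.0], [4.0, 4.0], [4.1, 4.0]]  # keeps each pair together
bad = [[0.0, 0.0], [4.0, 4.0], [0.1, 0.0], [4.1, 4.0]]   # splits the pairs apart
P = joint_p(X)
print("cost(good) =", kl_divergence(P, joint_q(good)))
print("cost(bad)  =", kl_divergence(P, joint_q(bad)))
```

The embedding that keeps each tight pair together gets a much lower cost than the one that splits them, which is the signal t-SNE's optimizer follows when it moves points in the embedding.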

What makes LightGBM lightning fast?

LightGBM, a gradient-boosted decision tree library, is known for its speed. This article explains the features that make it so fast.
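One of the speed-ups described in the LightGBM paper is Gradient-based One-Side Sampling (GOSS): rows with small gradients are already well fitted, so most of them can be skipped when searching for splits. The plain-Python sketch below shows only that sampling step; the article may focus on other features, such as histogram-based split finding. The parameter names mirror LightGBM's `top_rate` and `other_rate` settings, but the code is an illustration, not LightGBM's implementation.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Gradient-based One-Side Sampling (GOSS), as described for LightGBM.

    Keep the top_rate fraction of rows with the largest |gradient|, randomly
    sample other_rate (as a fraction of all rows) from the rest, and up-weight
    those sampled rows by (1 - top_rate) / other_rate so gradient sums stay
    roughly unbiased. Returns a list of (row_index, weight) pairs.
    """
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top = order[:n_top]                    # large-gradient rows: always kept
    rest = order[n_top:]                   # small-gradient rows: subsampled
    rng = random.Random(seed)
    sampled = rng.sample(rest, n_other)
    amplify = (1.0 - top_rate) / other_rate
    return [(i, 1.0) for i in top] + [(i, amplify) for i in sampled]


rng = random.Random(42)
grads = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # toy gradients (illustrative)
chosen = goss_sample(grads)
print(len(chosen), "of", len(grads), "rows are used to search for the next split")
```

Split finding then runs on 30% of the rows in this setting, while the amplification weight keeps the small-gradient rows' total contribution to the gradient statistics approximately correct.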

The author also introduces several new features in the following articles.