Step-by-step to a Data Scientist

September 26, 2022

Weekly Article News #32

Here are the recommended articles the author read this week.
This letter is posted every Monday.

This week, the author introduces two practical OSS libraries for validating the sensitivity of model predictions, i.e., how sensitive the output is to small changes in the input.

Foolbox

A Python library for running fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX.

CleverHans

A Python library for implementing adversarial attacks against machine learning models.
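To make "sensitivity to small input changes" concrete, here is a minimal sketch of the fast gradient sign method (FGSM), one of the classic attacks available in both libraries. It is written in plain Python against a hand-made logistic-regression model whose weights and input are invented for illustration; it is not the API of Foolbox or CleverHans, which run such attacks on real PyTorch, TensorFlow, or JAX models.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(g: float) -> int:
    """Return -1, 0, or +1 depending on the sign of g."""
    return (g > 0) - (g < 0)

def fgsm(w, b, x, y, eps):
    """FGSM for binary logistic regression.

    The gradient of the cross-entropy loss with respect to the input is
    (p - y) * w, so every feature is moved by eps in the sign direction
    of that gradient, which increases the loss.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical model and input, chosen only for illustration.
w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.4, 0.1, 0.2], 1

x_adv = fgsm(w, b, x, y, eps=0.1)
print(round(predict_proba(w, b, x), 3), round(predict_proba(w, b, x_adv), 3))
```

With eps=0.1, no feature moves by more than 0.1, yet the predicted probability of the true class drops from about 0.69 to about 0.55. A robust model would change much less under such a small perturbation.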

August 29, 2022

Weekly Article News #31

On writing clean Jupyter notebooks

Jupyter Notebook is an excellent tool for developers. However, working in a notebook differs from writing scripts, so there are things you should know. This article introduces valuable tips for keeping notebooks clean.

Introducing Snapshot Testing for Jupyter Notebooks

nbsnapshot is a wonderful OSS tool for testing Jupyter notebooks. In script style we write test code, but this is difficult in notebook style. This OSS makes it possible and easy!
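The idea behind snapshot testing can be sketched in plain Python: record a value produced by a notebook on every run, and flag a new value that falls far outside the history of past runs. This is a simplified illustration in that spirit, not nbsnapshot's interface. The function `check_snapshot`, the JSON file layout, and the 3-standard-deviation rule are hypothetical choices made for this sketch.

```python
import json
import statistics
import tempfile
from pathlib import Path

def check_snapshot(path: Path, key: str, value: float,
                   n_std: float = 3.0, min_history: int = 2) -> bool:
    """Hypothetical snapshot check for one numeric notebook output.

    Returns True when there is not enough history yet, or when `value`
    lies within mean +/- n_std * stdev of the recorded values.
    Only passing values are added to the history.
    """
    history = json.loads(path.read_text()) if path.exists() else {}
    past = history.get(key, [])
    ok = True
    if len(past) >= min_history:
        mean = statistics.mean(past)
        std = statistics.stdev(past)
        ok = abs(value - mean) <= n_std * std
    if ok:
        history[key] = past + [value]
        path.write_text(json.dumps(history))
    return ok

with tempfile.TemporaryDirectory() as tmp:
    snap = Path(tmp) / "snapshot.json"
    for acc in [0.91, 0.92, 0.90, 0.91]:
        check_snapshot(snap, "accuracy", acc)  # builds up the history
    print(check_snapshot(snap, "accuracy", 0.55))  # False: far outside past runs
```

Comparing against a history, rather than one exact stored value, tolerates the small run-to-run noise that notebooks with random seeds or live data usually have.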

August 1, 2022

Weekly Article News #30

DeepSpeed

An excellent library, developed by Microsoft, for optimizing deep learning training and inference.

Top Explainable AI (XAI) Python Frameworks in 2022

XAI (Explainable AI) is one of the recent hot topics. This article introduces six popular OSS frameworks: SHAP, LIME, Shapash, ELI5, InterpretML, and OmniXAI.

July 11, 2022

List: PyCaret Articles

PyCaret is a useful AutoML Python library that lets us build and deploy machine learning models with little code. Preprocessing, model comparison, and hyperparameter tuning can also be done with low code, of course.

This article lists the PyCaret articles introduced in this blog.

Dockerfile for PyCaret

We create a Docker image for PyCaret from a Dockerfile. This post aims to help you master building a Docker image from a Dockerfile with docker commands.

Tutorial of PyCaret, Regression Analysis

This post is for beginners.

In this post, we walk through a PyCaret tutorial with a regression analysis of the Boston house prices dataset. The post is written as a step-by-step guide.

Prediction of Diabetes Progression by PyCaret, Regression Analysis

This post is a good example of regression analysis.

The purpose is to learn the basics of regression analysis using PyCaret. Using a famous dataset, we will master the basics of everything from model construction to the analysis of results.
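As a reminder of what a basic regression model computes under the hood, here is a minimal sketch of one-variable ordinary least squares plus the coefficient of determination R², one of the metrics PyCaret reports for regression. It is plain Python with made-up toy data, not PyCaret code.

```python
def fit_ols(xs, ys):
    """Least-squares slope and intercept for y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx

def r2_score(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up toy data that roughly follow y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_ols(xs, ys)
preds = [a * x + b for x in xs]
print(f"slope={a:.3f}, intercept={b:.3f}, R2={r2_score(ys, preds):.4f}")
```

Here the fitted slope is 1.99 and the intercept 0.05. An R² close to 1 means the line explains almost all of the variance in y.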

PyCaret 2.3.6, incredible update; Convert model and Web App
PyCaret 2.3.6, incredible update; Dashboard and EDA functions

PyCaret received a major update in version 2.3.6.

Several new features were added in version 2.3.6, and these articles cover the major changes. They are also worth reading to get an idea of the latest features in PyCaret.

July 11, 2022

Weekly Article News #29

This week, the articles are about the environment for a machine learning project, e.g., MLflow and Jupyter Notebook.

In a machine learning project, it is very important to prepare not only a data analysis environment such as Jupyter Notebook but also an MLflow environment for managing experiment records. Managing experiment records improves reproducibility and makes the project run more efficiently.

Containerize your whole Data Science Environment (or anything you want) with Docker-Compose

Docker-compose is a powerful tool for building the environment of a machine learning project. With docker-compose, we can build the data-analysis and experiment-management environments as separate containers. This article shows how to build such an environment.

Manage your machine learning life cycle with MLflow in Python

This article shows one example of how to use the MLflow tracking server, a tool for managing experiment records. There are several ways to set up an MLflow tracking server, and this article suggests one helpful setup.
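The core of what a tracking server stores is the parameters and metrics of each run, and that core can be illustrated with a file-based stand-in in plain Python. The `log_run` and `best_run` helpers below are hypothetical and are not MLflow's API. MLflow itself provides functions such as `mlflow.log_param` and `mlflow.log_metric`, plus a UI for browsing the runs. The accuracy values are made up.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path

def log_run(store: Path, params: dict, metrics: dict) -> str:
    """Hypothetical helper: append one run (params + metrics) as a JSON line."""
    run_id = uuid.uuid4().hex
    record = {"run_id": run_id, "time": time.time(),
              "params": params, "metrics": metrics}
    with store.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return run_id

def best_run(store: Path, metric: str, higher_is_better: bool = True) -> dict:
    """Hypothetical helper: return the logged run with the best `metric`."""
    with store.open(encoding="utf-8") as f:
        runs = [json.loads(line) for line in f if line.strip()]

    def metric_of(run: dict) -> float:
        return run["metrics"][metric]

    return max(runs, key=metric_of) if higher_is_better else min(runs, key=metric_of)

with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "runs.jsonl"
    for lr, acc in [(0.1, 0.84), (0.01, 0.91), (0.001, 0.88)]:
        log_run(store, {"learning_rate": lr}, {"accuracy": acc})
    print(best_run(store, "accuracy")["params"])
```

Keeping every run in one shared store is what makes experiments reproducible and comparable. A tracking server adds the same idea on top of a database, artifact storage, and a web UI that the whole team can use.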

June 13, 2022

Weekly Article News #28

This week, the articles are about AutoML and data drift.

Top AutoML Python libraries in 2022

AutoML (automated machine learning) is one of the hot topics. This article introduces notable AutoML libraries, e.g., PyCaret, AutoKeras, and AutoGluon.

Data Drift Explainability: Interpretable Shift Detection with NannyML

NannyML is an open-source Python library for estimating post-deployment model performance. Drift is one of the hot topics in MLOps, and this library is based on an interesting algorithm.
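The basic question behind data drift detection is whether a feature's distribution in production has shifted from its reference distribution. One generic way to check this for a single feature is the two-sample Kolmogorov-Smirnov statistic, sketched below in plain Python. This is not NannyML's implementation, and the synthetic data and the 0.2 threshold are arbitrary assumptions made for illustration.

```python
import random
from bisect import bisect_right

def ks_statistic(reference, analysis):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, ana = sorted(reference), sorted(analysis)
    d = 0.0
    # The maximum gap is always reached at one of the observed values.
    for v in ref + ana:
        gap = abs(bisect_right(ref, v) / len(ref) - bisect_right(ana, v) / len(ana))
        d = max(d, gap)
    return d

rng = random.Random(0)  # fixed seed so the example is repeatable
reference = [rng.gauss(0.0, 1.0) for _ in range(500)]  # e.g. training period
same = [rng.gauss(0.0, 1.0) for _ in range(500)]       # no drift
shifted = [rng.gauss(1.0, 1.0) for _ in range(500)]    # mean shifted by 1

THRESHOLD = 0.2  # arbitrary cut-off chosen for this illustration
for name, sample in [("same", same), ("shifted", shifted)]:
    d = ks_statistic(reference, sample)
    print(name, round(d, 3), "drift" if d > THRESHOLD else "ok")
```

A univariate check like this says which feature moved, but not whether model performance actually suffered. Estimating that impact is the harder problem the library targets.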

May 9, 2022

Weekly Article News #27

This week, the articles are about transformer models and multicollinearity.

What Is a Transformer Model?

A transformer is a relatively recent AI model based on neural networks. It can learn context in sequential data, such as the sentences of a language. The attention mechanism in a transformer may be one of the most valuable concepts to learn.
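The attention mechanism can be written in a few lines. Below is a minimal sketch of standard scaled dot-product attention, softmax(QKᵀ/√d_k)V, in plain Python with tiny made-up matrices. Real transformers use tensor libraries, learned projections for Q, K, and V, and multiple heads, all of which are omitted here.

```python
import math

def softmax(xs):
    """Softmax with the max subtracted for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors, one row per token.
    Returns the output rows and the attention weights.
    """
    d_k = len(K[0])
    weights = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights.append(softmax(scores))
    # Each output row is a weighted average of the value rows.
    out = [[sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
           for row in weights]
    return out, weights

# Three tokens with 2-dimensional queries, keys, and values (made-up numbers).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out, weights = attention(Q, K, V)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1, so every output token is a context-dependent mixture of all value vectors. This is how a transformer lets each word "look at" the other words in a sentence.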

Visualizing multicollinearity in Python

Multicollinearity is an annoying problem for data scientists. In this article, we can learn how to visualize multicollinearity relationships between features.
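A simple first check for multicollinearity is the pairwise Pearson correlation between features. Heatmaps of this matrix are a common way to visualize it. The plain-Python sketch below computes the correlations and flags highly correlated pairs. The feature names, the data, and the 0.9 threshold are made up for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def collinear_pairs(features: dict, threshold: float = 0.9):
    """Return (name_a, name_b, r) for feature pairs with |r| >= threshold."""
    pairs = []
    for a, b in combinations(features, 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs

# Hypothetical housing features: the same area in two units is redundant.
features = {
    "area_m2": [50, 60, 80, 100, 120, 150],
    "area_ft2": [538, 646, 861, 1076, 1292, 1615],  # about area_m2 * 10.764
    "age_years": [30, 5, 12, 40, 8, 20],
}
print(collinear_pairs(features))
```

Pairwise correlation only catches relationships between two features. Collinearity that involves three or more features at once needs other tools, such as variance inflation factors.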

March 27, 2022 (updated August 16, 2022)

Demo of a Web App for a 3D Scatter Plot

Now, the web app for the 3D-scatter plot has been released.

We can plot a CSV-format dataset in 3D, as shown in the following image.

URL of the web app.

https://caron14-streamlit-3d-plot-by-plotly-plot3d-on-streamlit-bzo7i5.streamlitapp.com/

GitHub repository for the full code

https://github.com/caron14/streamlit_3d_plot_by_plotly

Image for the Web App.

March 23, 2022

Weekly Article News #26

This week, the articles are about explainable AI.

Our approach to building transparent and explainable AI systems

An informative article about explainable AI. Building an explainable AI system is well known to be difficult, and the author thinks this article offers a hint.

FastTreeSHAP: Accelerating SHAP value computation for trees

One powerful way to build an explainable AI model is to use SHAP. However, computing SHAP values for a tree-based ML model on a large dataset is computationally expensive. This article introduces FastTreeSHAP, a new SHAP-based library that is much faster.
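The exact Shapley value formula shows why SHAP can be expensive. For each feature, it averages that feature's marginal contribution over every subset of the other features, so the number of model evaluations grows as 2^M for M features. The brute-force sketch below uses plain Python and a made-up three-feature model. It assumes that a feature "missing" from a coalition is replaced by a baseline value, which is one common simplification. TreeSHAP and FastTreeSHAP avoid this enumeration by exploiting the structure of tree models.

```python
import math
from itertools import combinations

def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating all feature subsets (O(2^M) model calls).

    A feature outside the coalition takes its baseline value (a simplifying
    assumption; SHAP libraries offer several ways to handle missing features).
    """
    m = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return model(z)

    phis = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        phi = 0.0
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M!
            weight = math.factorial(size) * math.factorial(m - size - 1) / math.factorial(m)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi += weight * (value(coalition | {i}) - value(coalition))
        phis.append(phi)
    return phis

def model(z):
    """Hypothetical model with an interaction between features 0 and 2."""
    return 2.0 * z[0] + 1.0 * z[1] + 3.0 * z[0] * z[2]

x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(model, x, baseline)
print(phis)
```

The values add up to model(x) minus model(baseline), here 7. The interaction term 3 * z0 * z2 is split equally between features 0 and 2, which gives attributions of 3.5, 2.0, and 1.5.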

February 28, 2022