Uday Keith

    Recent posts by Uday Keith

    5 min read

    The Worst Kind of Data: Missing Data

    By Uday Keith on Aug 17, 2020 1:33:23 PM

    Most publicly available datasets, and most datasets we meet at the workplace, are complete. From time to time, however, we encounter datasets in which some or many entries are missing. The problem of missing data exists on a spectrum: a few missing entries among millions are virtually negligible, whereas 10% or more missing can be crippling.

    The problem of missing data has multiple layers, so let us peel it like the onion it is. At its most basic, enough missing data can skew the distribution(s) that the data follows.
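    To make the "skewed distribution" point concrete, here is a minimal sketch using only the Python standard library. The income data, the helper name `apply_missingness`, and the drop probabilities are all hypothetical illustrations, not taken from the article. It contrasts data missing completely at random (MCAR), where every entry is equally likely to vanish, with data missing not at random (MNAR), where high values are likelier to go missing. It also reports the fraction of entries lost.

```python
import random
import statistics


def apply_missingness(values, drop_prob, seed=0):
    """Simulate missing data: each value x is dropped with probability drop_prob(x).

    Returns the surviving (observed) values and the fraction that went missing.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    observed = [x for x in values if rng.random() >= drop_prob(x)]
    missing_fraction = 1 - len(observed) / len(values)
    return observed, missing_fraction


# Hypothetical "complete" data: 100,000 incomes drawn from a normal distribution.
gen = random.Random(42)
incomes = [gen.gauss(50_000, 10_000) for _ in range(100_000)]

# MCAR: every entry has the same 10% chance of vanishing.
mcar, mcar_frac = apply_missingness(incomes, lambda x: 0.10)

# MNAR (hypothetical rule): incomes above 60,000 go missing far more often.
mnar, mnar_frac = apply_missingness(incomes, lambda x: 0.50 if x > 60_000 else 0.05)

print(f"true mean: {statistics.fmean(incomes):,.0f}")
print(f"MCAR mean: {statistics.fmean(mcar):,.0f} ({mcar_frac:.1%} missing)")
print(f"MNAR mean: {statistics.fmean(mnar):,.0f} ({mnar_frac:.1%} missing)")
```

    Under these assumptions both scenarios lose roughly 10% to 12% of the entries. The MCAR mean should stay close to the true mean, while the MNAR mean is pulled noticeably lower because the high earners are disproportionately absent. How much data is missing matters, but so does *which* data is missing.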

    Topics: coding Data Science Programming Tips
    5 min read

    The Beautiful Binomial Logistic Regression

    By Uday Keith on Aug 17, 2020 1:32:57 PM

    Logistic regression is an important classification model, and one worth understanding in all its complexity. There are a few reasons to consider it:

    Topics: regression Data Science
    8 min read

    K-Means Clustering: All You Need to Know

    By Uday Keith on Aug 17, 2020 5:16:04 AM

    In machine learning, we are often in the realm of “function approximation”. That is, we have a certain ground truth (y) and associated variables (X), and our aim is to identify a function of those variables that does a good job of approximating the ground truth. This exercise in function approximation is also known as “supervised learning”.
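    As a small illustration of function approximation, the sketch below assumes the simplest possible family of functions, a straight line f(x) = a + b·x, and fits it to noisy ground truth with ordinary least squares. The data, the true line y = 3 + 2x, and the helper `fit_line` are hypothetical choices for illustration. This is supervised learning in miniature, not the k-means method the post goes on to discuss (k-means has no ground truth y).

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y ≈ a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope that minimizes the squared error
    a = mean_y - b * mean_x  # intercept: the fitted line passes through the means
    return a, b


# Hypothetical ground truth: y = 3 + 2x plus Gaussian noise.
rng = random.Random(0)
X = [i / 10 for i in range(100)]
y = [3 + 2 * x + rng.gauss(0, 0.5) for x in X]

a, b = fit_line(X, y)
predictions = [a + b * x for x in X]
mse = sum((p - t) ** 2 for p, t in zip(predictions, y)) / len(y)
print(f"fitted f(x) = {a:.2f} + {b:.2f}x, mean squared error = {mse:.3f}")
```

    The fitted coefficients should land near the true values of 3 and 2, and the mean squared error quantifies how well f approximates y. Richer models swap the straight line for a more flexible function family, but the goal is the same.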

    Topics: Machine Learning Python Data Science Programming Tips