
feature/Adaboost implementation #207

Open
JasonShin opened this issue Jan 19, 2019 · 6 comments
Assignees
Labels
feature New feature that does not exist in Kalimdor.js yet

Comments


JasonShin commented Jan 19, 2019

  • I'm submitting a ...
    [x] feature request

  • Summary

An AdaBoost classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

As a first step toward adding boosting models to the library, AdaBoost is an ideal model to start with, since implementing it will also help me understand how boosting works.
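For reference, the boosting loop described above can be sketched with single-feature threshold stumps in numpy. This is a rough sketch following the structure of the ML-From-Scratch example, not the final library API; `fit_adaboost` and its return format are illustrative names only:

```python
import numpy as np

def fit_adaboost(X, y, n_clf=5):
    """Train AdaBoost with one-feature threshold stumps. Labels y must be in {-1, 1}."""
    n_samples, n_features = X.shape
    w = np.full(n_samples, 1 / n_samples)  # start with uniform sample weights
    clfs = []
    for _ in range(n_clf):
        best = None
        min_error = float("inf")
        # Greedily pick the stump (feature, threshold, polarity) with lowest weighted error
        for feature_i in range(n_features):
            for threshold in np.unique(X[:, feature_i]):
                p = 1
                preds = np.ones(n_samples)
                preds[X[:, feature_i] < threshold] = -1
                error = np.sum(w[y != preds])
                if error > 0.5:
                    # Flipping the polarity flips the error
                    error, p = 1 - error, -1
                if error < min_error:
                    min_error = error
                    best = {"p": p, "threshold": threshold, "feature_index": feature_i}
        # alpha: the classifier's vote weight, larger for lower weighted error
        best["alpha"] = 0.5 * np.log((1.0 - min_error) / (min_error + 1e-10))
        preds = np.ones(n_samples)
        negative_idx = best["p"] * X[:, best["feature_index"]] < best["p"] * best["threshold"]
        preds[negative_idx] = -1
        # Re-weight: boost the weight of misclassified samples, then renormalize
        w *= np.exp(-best["alpha"] * y * preds)
        w /= np.sum(w)
        clfs.append(best)
    return clfs

# Toy 1-D dataset: negatives below 2, positives at or above
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
clfs = fit_adaboost(X, y, n_clf=3)
print(clfs[0])  # a stump splitting feature 0 at threshold 2.0
```

Each round reweights the samples so that the next stump is fitted against the mistakes of the previous ones, which is the "focus more on difficult cases" behaviour from the summary.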

  • Illustration

  • References
@JasonShin JasonShin added this to the Sprint 1 milestone Jan 19, 2019
@JasonShin JasonShin self-assigned this Jan 19, 2019
@JasonShin JasonShin added the feature New feature that does not exist in Kalimdor.js yet label Jan 19, 2019
JasonShin (Member Author) commented

I found that the ML-From-Scratch example and the StatQuest explanation align exactly in terms of implementation. I will base the feature primarily on the ML-From-Scratch example.

JasonShin (Member Author) commented

Started the implementation here: https://github.com/machinelearnjs/machinelearnjs/tree/feature/adaboost


JasonShin commented Jan 26, 2019

@BenjaminMcDonald suggested the following refactoring:

tf.sum(tf.where(tf.equal(y, pred), tf.zerosLike(w), w))

is equivalent to

// Sum of the weights of misclassified samples, e.g.
// w = [0.213, 0.21342], y = [1, 2], prediction = [2, 2]
// -> only index 0 is misclassified -> error = 0.213
let error = Array.from(w.dataSync())
  .filter((_, index) => y[index] !== prediction[index])
  .reduce((total, x) => total + x, 0); // initial value guards the no-misclassification case
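For cross-checking against the Python implementation, the same misclassified-weight sum is a one-line boolean mask in numpy (a sketch using the example values from the comment above, with `w`, `y`, and `prediction` as plain arrays):

```python
import numpy as np

w = np.array([0.213, 0.21342])   # sample weights (values from the comment above)
y = np.array([1, 2])             # true labels
prediction = np.array([2, 2])    # stump predictions

# Sum the weights of misclassified samples via a boolean mask
error = np.sum(w[y != prediction])
print(error)  # 0.213 -- only the first sample is misclassified
```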

JasonShin (Member Author) commented

The fit function seems to be working fine now.

There's still a problem with predict.


JasonShin commented Jan 31, 2019

Prediction is still always returning 1s.

To debug the issue:

  1. Grab the classifier attributes: alpha, polarity, threshold, and feature_index
  2. Get the Python implementation
  3. Try running Python's predict using the classifier attributes/save points

Observations:

  1. The classifiers' attributes are all the same.

Alternative solutions:

  1. Find a different example
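A quick way to see why identical classifier attributes matter: if every stump shares the same polarity, threshold, and feature, the weighted vote is just one stump scaled by the summed alphas, so the ensemble can never disagree with that single stump (minimal numpy sketch; the stump values are rounded from the saved attributes below):

```python
import numpy as np

# Six identical stumps, as observed in the saved classifier attributes
clfs = [{"p": -1, "threshold": 0.6455, "feature_index": 0, "alpha": 11.51}] * 6

X = np.array([[5.4], [0.1], [7.7], [0.2]])
votes = np.zeros(len(X))
for clf in clfs:
    preds = np.ones(len(X))
    preds[clf["p"] * X[:, clf["feature_index"]] < clf["p"] * clf["threshold"]] = -1
    votes += clf["alpha"] * preds

# Every round repeats the same stump's vote, so the ensemble adds no diversity:
# the sign of the sum is exactly the single stump's prediction
print(np.sign(votes))
```

So identical attributes collapse the ensemble to one weak learner, which explains why the output is constant whenever that one stump predicts the same class for all the test samples.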

JasonShin (Member Author) commented

Python experiment

import numpy as np

clfs = [
    { "p": -1, "threshold": 0.6455696225166321, "feature_index": 0, "alpha": 11.512925148010254 },
    { "p": -1, "threshold": 0.6455696225166321, "feature_index": 1, "alpha": 11.512925148010254 },
    { "p": -1, "threshold": 0.6455696225166321, "feature_index": 2, "alpha": 11.512925148010254 },
    { "p": -1, "threshold": 0.6455696225166321, "feature_index": 3, "alpha": 11.512925148010254 },
    { "p": -1, "threshold": 0.6455696225166321, "feature_index": 4, "alpha": 11.512925148010254 },
    { "p": -1, "threshold": 0.6455696225166321, "feature_index": 5, "alpha": 11.512925148010254 },
]

test_x = np.array([ [ 5.4, 3, 4.5, 1.5 ],
  [ 5.6, 2.7, 4.2, 1.3 ],
  [ 7.7, 3, 6.1, 2.3 ],
  [ 5, 3.6, 1.4, 0.2 ],
  [ 6.3, 3.4, 5.6, 2.4 ],
  [ 5.2, 3.5, 1.5, 0.2 ],
  [ 5.5, 2.6, 4.4, 1.2 ],
  [ 5.2, 2.7, 3.9, 1.4 ],
  [ 4.9, 2.5, 4.5, 1.7 ],
  [ 6.6, 3, 4.4, 1.4 ],
  [ 6.2, 2.9, 4.3, 1.3 ],
  [ 5.5, 4.2, 1.4, 0.2 ],
  [ 6.9, 3.1, 5.1, 2.3 ],
  [ 6.5, 3.2, 5.1, 2 ],
  [ 5.5, 3.5, 1.3, 0.2 ],
  [ 6.3, 2.8, 5.1, 1.5 ],
  [ 4.7, 3.2, 1.3, 0.2 ],
  [ 7.2, 3, 5.8, 1.6 ],
  [ 6.1, 2.8, 4, 1.3 ],
  [ 7.4, 2.8, 6.1, 1.9 ],
  [ 6.7, 3.1, 4.4, 1.4 ],
  [ 5.1, 2.5, 3, 1.1 ],
  [ 4.4, 2.9, 1.4, 0.2 ],
  [ 4.9, 3.1, 1.5, 0.1 ],
  [ 5.9, 3, 5.1, 1.8 ],
  [ 6.7, 3.1, 4.7, 1.5 ],
  [ 5.1, 3.5, 1.4, 0.3 ],
  [ 6.1, 2.9, 4.7, 1.4 ],
  [ 5.6, 3, 4.5, 1.5 ],
  [ 7, 3.2, 4.7, 1.4 ],
  [ 6, 3, 4.8, 1.8 ],
  [ 6.4, 2.9, 4.3, 1.3 ],
  [ 5.6, 2.9, 3.6, 1.3 ],
  [ 6.5, 3, 5.2, 2 ],
  [ 6.1, 2.8, 4.7, 1.2 ],
  [ 6.3, 3.3, 6, 2.5 ],
  [ 4.9, 3.1, 1.5, 0.1 ],
  [ 5.7, 3.8, 1.7, 0.3 ] ])

def predict(X):
    n_samples = np.shape(X)[0]
    y_pred = np.zeros((n_samples, 1))
    # For each classifier => label the samples
    for clf in clfs:
        # Set all predictions to '1' initially
        predictions = np.ones(np.shape(y_pred))
        # The indexes where the sample values are below threshold
        negative_idx = (clf['p'] * X[:, clf['feature_index']] < clf['p'] * clf['threshold'])
        # Label those as '-1'
        predictions[negative_idx] = -1
        # Add predictions weighted by the classifiers alpha
        # (alpha indicative of classifier's proficiency)
        y_pred += clf['alpha'] * predictions

    # Return sign of prediction sum
    y_pred = np.sign(y_pred).flatten()

    return y_pred

# predictions = predict(test_x)

# print(predictions)

print(test_x[:, 4])
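One thing this experiment surfaces (presumably why the last line prints `test_x[:, 4]`): `test_x` has only 4 features, so the `feature_index` values of 4 and 5 in the saved classifiers are out of range, and indexing column 4 raises an `IndexError`. A defensive check along these lines (a hypothetical helper, not from the branch) would catch corrupted save points early:

```python
import numpy as np

def validate_clfs(clfs, X):
    """Raise early if any saved stump indexes a feature that X does not have."""
    n_features = X.shape[1]
    bad = [c["feature_index"] for c in clfs if not 0 <= c["feature_index"] < n_features]
    if bad:
        raise ValueError(f"feature_index out of range for {n_features} features: {bad}")

X = np.zeros((3, 4))  # 4 features, like the iris test set above
clfs = [{"feature_index": 3}, {"feature_index": 5}]
try:
    validate_clfs(clfs, X)
except ValueError as e:
    print(e)  # feature_index out of range for 4 features: [5]
```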

@JasonShin JasonShin removed this from the Sprint 1 milestone May 10, 2019