==K-Nearest Neighbour==
 
  
* 15/06: Recorded class - K-Nearest Neighbour
 
 
:* https://drive.google.com/drive/folders/1BaordCV9vw-gxLdJBMbWioX2NW7Ty9Lm
 
 
 
* StatQuest: https://www.youtube.com/watch?v=HVXime0nQeI
 
 
 
<img src="https://upload.wikimedia.org/wikipedia/commons/e/e7/KnnClassification.svg"  class="center" style="display: block; margin-left: auto; margin-right: auto; width: 200pt;" />
 
 
 
 
<br />
 
KNN determines the class of an unlabeled observation by identifying the k labeled observations nearest to it: the observation is assigned to the class that is most common among those k neighbours. It is a simple method, but a very powerful one.
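The figure above shows how the predicted class can change with k. A minimal sketch of that effect in Python (the points and labels below are invented to mimic the figure, not taken from it):

```python
from collections import Counter
import math

# Invented points loosely mimicking the figure: two 'triangle' examples
# close to the query, 'square' examples a bit further out.
training = [((0.5, 0.0), 'triangle'), ((0.0, 0.6), 'triangle'),
            ((1.0, 0.0), 'square'), ((0.0, 1.1), 'square'),
            ((1.0, 1.0), 'square'), ((2.0, 0.0), 'square')]

def knn_predict(query, data, k):
    """Majority class among the k training points nearest to `query`."""
    nearest = sorted(data, key=lambda p: math.dist(query, p[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_predict((0.0, 0.0), training, k=3))  # → 'triangle'
print(knn_predict((0.0, 0.0), training, k=5))  # → 'square'
```

With k=3 the two close triangles outvote one square; widening to k=5 brings in three squares, flipping the prediction.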
 
 
 
[[File:KNearest_Neighbors_from_the_Udemy_course_Pierian_data1.mp4|800px|thumb|center|Udemy course, Pierian data https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/]]
 
 
 
k-NN is ideal for classification tasks where the relationships between the attributes and the target classes are:

* numerous

* complex

* difficult to interpret

and where instances of the same class are fairly homogeneous.
 
 
 
<br />
 
'''Applications of this learning method include:'''
 
* Computer vision applications:
 
:* Optical character recognition
 
:* Face recognition
 
* Recommendation systems
 
* Pattern detection in genetic data
 
 
 
<br />
 
'''Basic Implementation:'''
 
 
* Training Algorithm:
 
:* Simply store the training examples
 
 
 
* Prediction Algorithm:
 
:# Calculate the distance from x to all points in your data (Udemy Course)
 
:# Sort the points in your data by increasing distance from x (Udemy Course)
 
:# Predict the majority label of the "k" closest points (Udemy Course)
 
 
:* Find the <math>k</math> training examples <math>(x_{1},y_{1}),...(x_{k},y_{k})</math> that are '''nearest''' to the test example <math>x</math> (Noel)
 
:* Predict the most frequent class among those <math>y_{i}'s</math>. (Noel)
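The prediction steps above can be sketched in plain Python (a minimal illustration; the 2-D points and labels are invented for the example):

```python
from collections import Counter
import math

# Invented 2-D training examples: ((features), class label)
training = [((1.0, 1.0), 'A'), ((1.2, 0.8), 'A'),
            ((4.0, 4.0), 'B'), ((4.2, 3.9), 'B'), ((3.8, 4.1), 'B')]

def knn_predict(x, data, k=3):
    # 1. Calculate the distance from x to all points in the data
    distances = [(math.dist(x, point), label) for point, label in data]
    # 2. Sort the points by increasing distance from x
    distances.sort(key=lambda pair: pair[0])
    # 3. Predict the majority label of the k closest points
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]

print(knn_predict((3.9, 4.0), training))  # → 'B'
```

Note that "training" here really is just storing the examples: all the work happens at prediction time.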
 
 
 
* '''Improvements:'''
 
:* Weighting training examples based on their distance
 
:* Alternative measures of "nearness"
 
:* Finding "close" examples in a large training set quickly
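The first improvement, weighting training examples by distance, can be sketched as follows (again with invented points; the weighting scheme 1/(distance + eps) is one common choice, not the only one):

```python
from collections import defaultdict
import math

# Invented points: one 'A' very close to the query, the 'B's further away.
training = [((0.1, 0.15), 'A'), ((1.0, 1.0), 'B'),
            ((1.1, 0.9), 'B'), ((2.0, 2.0), 'B')]

def weighted_knn_predict(query, data, k=3, eps=1e-9):
    """Each of the k nearest neighbours votes with weight 1/(distance + eps)."""
    nearest = sorted(data, key=lambda p: math.dist(query, p[0]))[:k]
    votes = defaultdict(float)
    for point, label in nearest:
        votes[label] += 1.0 / (math.dist(query, point) + eps)
    return max(votes, key=votes.get)

# A plain majority vote over the 3 nearest points (1 'A', 2 'B') would say 'B';
# distance weighting lets the single very close 'A' win.
print(weighted_knn_predict((0.1, 0.1), training))  # → 'A'
```

Closer neighbours thus count for more, which makes the prediction less sensitive to the exact choice of k.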
 
 
 
'''Strengths and Weaknesses:'''
 
{| class="wikitable"
 
|+
 
!Strengths
 
!Weaknesses
 
|-
 
|The algorithm is simple and effective
 
|The method does not produce a model, which limits the insight available into the relationships between the features and the class
 
|-
 
|Fast training phase
 
|Slow classification phase; requires a lot of memory
 
|-
 
|Capable of reflecting complex relationships
 
|Cannot handle nominal features or missing data without additional pre-processing
 
|-
 
|Unlike many other methods, no assumptions about the distribution of the data are made
 
|
 
|}
 
 
 
 
 
 
 
<br />
 
