Next lecture (TM&S)

by Marco Viviani

Dear students,

This message is to let you know that the next TM&S class will take place on Thursday, November 28, from 10:30 am to 12:30 pm (official end: 12:00 pm) in Lab 718 (U7).

Following this, from 12:00 to 13:00, Dr. Luca Herranz-Celotti will be available in the lab for an extra hour to clarify any doubts about what you saw in the previous lab session. Please start thinking about the questions you would like to ask him.

Best regards,

Marco Viviani

In reply to Marco Viviani

Re: Next lecture (TM&S)

by Luca Celotti

Hi everyone! Somebody asked: "Why do we need to specify a number of clusters in AgglomerativeClustering (AC)?" I wasn't sure, so I looked into it a bit. As far as I understand, AC keeps merging the closest data points (and then the closest clusters), and it will keep going until everything ends up in the same bag unless you tell it to stop. There are usually two stopping criteria: (i) specify how many clusters you want, and the merging stops as soon as the tree has been reduced to that number of clusters; (ii) specify a maximal distance between two clusters, and the merging stops once all remaining pairwise distances are above that threshold. So my answer to the question is: you need to specify a number of clusters so that AC knows when to stop.

What do you think?

Also see the definition of the n_clusters parameter in the sklearn documentation for AC:

[Screenshots of the sklearn documentation for AgglomerativeClustering, showing the definition of the n_clusters parameter]
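
To make the two stopping criteria concrete, here is a minimal sketch (toy data from make_blobs and an arbitrary threshold, not the lab notebook) that runs AgglomerativeClustering once with a fixed number of clusters and once with a distance threshold:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

# Toy data: three well-separated blobs (illustrative only, not the lab dataset).
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Criterion (i): stop merging once the requested number of clusters is reached.
ac_k = AgglomerativeClustering(n_clusters=3)
labels_k = ac_k.fit_predict(X)

# Criterion (ii): do not fix the number of clusters; instead stop merging when
# the linkage distance between the closest pair of clusters exceeds a threshold
# (the value 10.0 is an arbitrary choice for this toy data).
ac_d = AgglomerativeClustering(n_clusters=None, distance_threshold=10.0)
labels_d = ac_d.fit_predict(X)

print("clusters with n_clusters=3:        ", len(set(labels_k)))
print("clusters with distance_threshold:  ", ac_d.n_clusters_)
```

Note that when distance_threshold is set, n_clusters must be None; the number of clusters actually found is then available after fitting as the n_clusters_ attribute.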

In reply to Luca Celotti

Re: Next lecture (TM&S)

by Marco Viviani

Dear students,

Let me add some further details. Unlike flat clustering, hierarchical clustering (whether agglomerative or divisive) does not, in principle, require defining the desired number of clusters, because the limiting cases are a single cluster containing everything (agglomerative approach) or as many clusters as there are documents (divisive approach).

In practice, however, since these are only limiting cases, and since hierarchical algorithms are computationally much more demanding than flat clustering algorithms, it is necessary to set criteria that “stop” the algorithm once a “satisfactory” partition into clusters has been obtained.

Several criteria (explained in class) can be considered at this point, including those also correctly suggested by Luca.
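
As a purely illustrative sketch of the limiting case and of the two stopping criteria mentioned above (toy data and arbitrary thresholds, not course material), the following code builds the full agglomerative tree once with scipy and then cuts it either into a fixed number of clusters or at a distance threshold:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: three small Gaussian groups (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(20, 2)) for loc in (0, 3, 6)])

# Build the complete agglomerative merge tree (dendrogram) once.
Z = linkage(X, method="ward")

# Limiting case: letting the merging run to the end puts everything in one cluster.
root = fcluster(Z, t=1, criterion="maxclust")

# Stopping criterion (i): cut the tree so as to obtain a fixed number of clusters.
by_k = fcluster(Z, t=3, criterion="maxclust")

# Stopping criterion (ii): cut the tree where the merge distance exceeds a
# threshold (2.0 is an arbitrary, data-dependent choice here).
by_dist = fcluster(Z, t=2.0, criterion="distance")

print(len(set(root)), len(set(by_k)), len(set(by_dist)))
```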

Best regards,

Marco Viviani