PCI – Chapter Three – Discovering Groups

I am a third of the way into the problems in chapter three and am already thinking that I should have used Python for chapter two. Python is absolutely wonderful and efficient, with no fluff whatsoever. Live and learn…

Chapter three opens by noting that the techniques and practices it covers are data-intensive (and, I bet, computationally heavy as well). Having learned from the previous chapter, I am definitely not using SQL as my sole tool for the job…

This chapter focuses on data clustering, which, as best I can describe it, means working out how alike the items in a data set are when there is not enough known information to make obvious comparisons or well-defined associations.
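
To make "finding how alike items are" concrete for myself, here is a minimal sketch of one common approach, agglomerative (bottom-up) hierarchical clustering: start with every item in its own group and keep merging the two closest groups. This is my own toy version, not the book's code. The function names, the use of plain Euclidean distance between group centroids, and the sample points are all assumptions I picked for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def agglomerate(items, num_clusters):
    """Merge the two closest clusters until only num_clusters remain.

    items maps a label to its numeric vector. Each cluster is a list of
    labels, and cluster-to-cluster closeness is measured between centroids.
    """
    clusters = [[label] for label in items]
    while len(clusters) > num_clusters:
        # combinations() yields index pairs (i, j) with i < j.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: euclidean(
                centroid([items[l] for l in clusters[pair[0]]]),
                centroid([items[l] for l in clusters[pair[1]]]),
            ),
        )
        # Fold cluster j into cluster i; since j > i, deleting j keeps i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Hypothetical 2-D points forming two obvious groups.
    points = {
        "a": [1.0, 1.0],
        "b": [1.2, 0.9],
        "c": [8.0, 8.0],
        "d": [8.1, 7.9],
    }
    print(agglomerate(points, 2))  # [['a', 'b'], ['c', 'd']]
```

Nothing in the code knows what "a" or "c" mean; the groups fall out of the numbers alone, which is the whole point when there is no obvious way to compare the items directly.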

Having just finished the problems on word vectors, I am still eager to find some relevant use of this skill that helps me understand the problem domain. Basically, I am parsing blog feeds, counting how often each word appears in each one, and using those frequencies to infer which blogs are alike. Although this is an awesome extension of chapter two, I haven't yet come up with a practical app or tool that would make a good (exciting, for me) case for it.
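
Here is roughly what the word-vector step boils down to, as a small sketch. It assumes the feed text has already been fetched (no feed parsing here), and the blog names and text are made up. It builds one word-count vector per blog over a shared vocabulary, then compares two blogs with the Pearson correlation. Pearson is my choice for the illustration because it looks at whether word counts rise and fall together rather than at raw totals, so a long blog and a short one on the same topic can still score as alike; treat it as one reasonable measure, not necessarily the book's exact code.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-case the text, split it into alphabetic words, and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def blog_vectors(blogs):
    """Turn {blog name: text} into a shared vocabulary and one count vector per blog."""
    counts = {name: word_counts(text) for name, text in blogs.items()}
    vocab = sorted(set().union(*counts.values()))
    # Counter returns 0 for words a blog never uses, so every vector has len(vocab) entries.
    return vocab, {name: [c[w] for w in vocab] for name, c in counts.items()}


def pearson(v1, v2):
    """Pearson correlation of two equal-length vectors: 1.0 means the counts move together."""
    n = len(v1)
    sum1, sum2 = sum(v1), sum(v2)
    sum1_sq = sum(x * x for x in v1)
    sum2_sq = sum(y * y for y in v2)
    p_sum = sum(x * y for x, y in zip(v1, v2))
    num = p_sum - sum1 * sum2 / n
    den = math.sqrt((sum1_sq - sum1 ** 2 / n) * (sum2_sq - sum2 ** 2 / n))
    return 0.0 if den == 0 else num / den


if __name__ == "__main__":
    # Hypothetical blogs; real input would be the words pulled from each feed.
    blogs = {
        "python-blog": "python code python tips code",
        "coding-blog": "code python tips tips",
        "cooking-blog": "soup bread soup recipe",
    }
    vocab, vectors = blog_vectors(blogs)
    print(vocab)
    print(pearson(vectors["python-blog"], vectors["coding-blog"]))   # positive
    print(pearson(vectors["python-blog"], vectors["cooking-blog"]))  # negative
```

Turning the similarity into a distance (for example 1 minus the correlation) is what lets vectors like these feed into a clustering routine.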

Then again, I am only a third of the way into the chapter, so my perspective may change in the next few days… I hope so.
For now, the most important lesson is that if all you have is a hammer, everything looks like a nail.

Giving Python a chance!
