Building an Identity Extraction Engine in PHP
When it comes to building customized experiences for your users, the key is understanding who those users are and what they’re interested in. The biggest problem with the traditional way of doing this, a profile system, is that the content is entirely user-curated: users can enter whatever they want. While this lets people portray themselves to the outside world however they wish, it makes the profile an unreliable identity source, because it is based on perceived identity.

What makes perceived identity dangerous for user understanding is that I can portray myself as anyone I want online. The personality people present online is rarely their actual personality; more often it is how they want others to think of them. This is why, to understand people, you have to find a way of extracting personality information from how they normally use the web or your site. The old adage "Actions speak louder than words" rings completely true here.

This is the real rationale behind building an identity extraction system: true user understanding. In this article we will cover the basics of building one of these systems by extracting entity data from the web pages that a user visits. As we determine the main categories of each page through this process, we can begin to correlate the data from multiple pages and start constructing a personality profile based on the user’s interest levels.
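
To make the idea of correlating data across pages concrete, here is a minimal sketch. It is not the engine this article goes on to build: it swaps real entity extraction for plain keyword counting, and the function names `extractKeywords` and `buildProfile`, the stop-word list, and the sample pages are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: count candidate keywords in one page's HTML.
function extractKeywords(string $html, array $stopWords): array
{
    // Drop script and style bodies so their contents are not counted.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);

    // Pad each tag with a space so words in adjacent elements stay
    // separate, then strip the tags and decode HTML entities.
    $text = strip_tags(str_replace('<', ' <', $html));
    $text = strtolower(html_entity_decode($text));

    // Split on anything that is not a letter; discard empty pieces.
    $words = preg_split('/[^a-z]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $counts = [];
    foreach ($words as $word) {
        // Skip very short words and stop words.
        if (strlen($word) < 3 || in_array($word, $stopWords, true)) {
            continue;
        }
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }
    return $counts;
}

// Hypothetical helper: merge per-page counts into one interest profile.
function buildProfile(array $pages, array $stopWords): array
{
    $profile = [];
    foreach ($pages as $html) {
        foreach (extractKeywords($html, $stopWords) as $word => $count) {
            $profile[$word] = ($profile[$word] ?? 0) + $count;
        }
    }
    arsort($profile); // highest counts first
    return $profile;
}

// Sample data (assumed): two pages a user might have visited.
$stopWords = ['the', 'and', 'for', 'with'];
$pages = [
    '<html><body><h1>Cycling gear</h1><p>The best cycling helmets.</p></body></html>',
    '<html><body><p>Cycling routes and hiking trails.</p></body></html>',
];

$profile = buildProfile($pages, $stopWords);
print_r($profile);
```

A term that recurs across several visited pages rises to the top of the profile, which is the rough intuition behind an interest level. A real system would weight terms by where they appear on the page and map them to categories rather than relying on raw counts.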

The Core Steps Behind Building an Entity Data Scraper

In traditional social identity models, you could use a person’s friendships (or connections) to construct a relationship graph and use it to surface more relevant content to that user, on the assumption that connected people will more than likely share some personality traits. This model is, unfortunately, as antiquated as the traditional social network.


