Introducing Zap, a generic machine learning pipeline developed by ByteGain for making predictions based on online user behavior. The pipeline creates website- and task-specific models without knowing anything about the structure of the website. It is designed to minimize the amount of website-specific code, which is achieved by factoring all website-specific logic into example generators.
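To make the "example generator" idea concrete, here is a minimal sketch of what such a component might look like. All names here (the class, the raw event dicts, the features and label) are illustrative assumptions, not Zap's actual API; the point is that site-specific knowledge is confined to one small class while the rest of the pipeline stays generic.

```python
# Hypothetical example generator: the ONLY place that knows this particular
# site's event types and URLs. It turns a raw event stream into
# (features, label) training examples consumed by a generic pipeline.

class SignupExampleGenerator:
    """Converts raw events for one site into examples for a signup-prediction task."""

    def generate(self, events):
        examples = []
        for session in self._sessions(events):
            features = {
                "num_pageviews": sum(1 for e in session if e["type"] == "pageview"),
                "clicked_pricing": any(e.get("path") == "/pricing" for e in session),
            }
            label = any(e["type"] == "signup" for e in session)
            examples.append((features, label))
        return examples

    def _sessions(self, events):
        # Group events by user id; a real generator would also split on time gaps.
        by_user = {}
        for e in events:
            by_user.setdefault(e["user"], []).append(e)
        return by_user.values()
```

Swapping in a different website or prediction task would then mean writing a new generator, not touching the model or serving code.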
As more people spend their days on the Internet, businesses have followed by exchanging their physical stores for websites and apps. However, in the process of moving online, many businesses have lost the personal touch they had with their customers. Instead, they serve generic experiences and random pop-ups. The irony is that the increase in online activity means that user web data has evolved into a high-resolution reflection of a person’s daily life. Businesses could use this highly textured online data to predict user behavior, provide recommendations and increase engagement.
A recent study commissioned by the Digital Advertising Alliance (DAA) found that over 40% of respondents would prefer personalized marketing based on their online behavior. Many consumers have also experienced, and now expect, the magic of Google Search autocomplete, Netflix recommendations and more. Moving forward, every business, not just big tech companies, could benefit from personalized experiences.
Analyzing web data is a complicated problem, and machine learning has become the standard way to solve such problems once heuristics grow too complex. Machine learning is especially effective when a large amount of data is available, in which case deep learning (for an introduction, see the Deep Learning textbook) is almost the canonical approach, producing state-of-the-art results. Most modern deep learning models are based on artificial neural networks.
We believe that neural networks represent the beginning of a fundamental shift in software development. In "classic" software development (Software 1.0), the programmer identifies a specific point in program space with some desirable behavior and expresses it by writing lines of code. In contrast, neural networks (Software 2.0) are written in a far more abstract, human-unfriendly language: the weights of the network. The weights are not specified explicitly but learned via backpropagation and stochastic gradient descent. Deep learning has made advances on problems that previously resisted the best attempts of the machine learning community. It has also turned out to be excellent at discovering intricate structure in high-dimensional data, making it a natural candidate for analyzing web data. Finally, deep learning requires very little feature engineering and can easily take advantage of increases in the amount of available data and computation.
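The Software 1.0 vs. 2.0 contrast can be sketched in a few lines. The toy below learns the logical AND function from labeled examples instead of hand-coding `if x0 and x1:`; it uses plain logistic regression (the simplest one-layer "network") trained with stochastic gradient descent. This is an illustrative sketch of the general technique, not Zap's actual model.

```python
import math
import random

def train(examples, lr=0.5, epochs=200, seed=0):
    """Fit logistic-regression weights by SGD. examples: list of (x, y), y in {0, 1}."""
    examples = list(examples)
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        rng.shuffle(examples)  # "stochastic": visit examples in random order
        for x, y in examples:
            p = predict(w, b, x)
            g = p - y  # gradient of log-loss w.r.t. the pre-activation
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Sigmoid of the weighted sum: the model's probability that y = 1."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

# Software 2.0: specify behavior with data, let SGD find the weights.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(data)
```

The programmer never writes the decision rule; it emerges as learned weights. Real deep networks replace the single weighted sum with many stacked layers and use backpropagation to compute the gradients, but the training loop has the same shape.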
Making predictions based on web data is important for multi-billion dollar online industries, yet there is little published research in this area. Tech giants such as Google, Facebook and Yahoo published research on it some time ago, but that work is now dated. With Zap we had to re-think many aspects of data generation, data collection, feature engineering, applying machine learning models and serving predictions.