For many years, we had a surface-level relationship with eBay. For us, it was a great place to sell stuff. We weren't involved in eBay politics or the policies behind the scenes, because we simply didn't need to be. We listed our products and they sold. Easy peasy. It wasn't until our sales began to plummet (see more here) that we began to dig into the depths of the eBay marketplace.
We previously discussed eBay being infiltrated by hackers in early 2014 (more on that here), Google's algorithm update (read about that here) and the infiltration of new players into the eCommerce marketplace (read about Alibaba here). We also discussed eBay's faulty defect system (more here) and the split from PayPal (read about that here).
Our information excavation did not stop there. We recently learned that, in 2013, eBay replaced its simple search engine with a new search ranking system called Cassini. Cassini was designed by Hugh Williams, Vice President of eBay's Experience, Search and Platforms. He developed it to replace Voyager, the search engine employed by eBay from 2002 to 2012. This change would transform eBay's literal keyword search engine into one that was much more "intuitive". The new system would essentially out-think buyers and sellers. Using the massive amounts of data that eBay accumulates with each search, the engine was built to find the "best match" between a particular buyer and a seller's item. Under Williams' guidance, some 100 engineers created just that. Thus, Cassini was born.
Why would eBay develop such a complex system? That is exactly the question that I seek to answer with this post. To accomplish this, I will be reviewing direct quotes from Hugh Williams himself, as presented at a 2012 conference (watch the full speech here).
Early in his speech, Williams stated, "What we were really trying to do is to do a good job of understanding the intent of the user's query and not be so literal about matching the user's query to the title of the item." He spoke frequently about the massive amount of data that eBay has collected over the years. This data included the tendencies of buyers and sellers and the behavior of each individual visitor to the site. He explained, “The idea is to take that data, mine that data, look for patterns, create understanding from that data, and then use that to better match users' literal queries to the real intent that the users have.” To accomplish this, Cassini “pre-processes” or “rewrites the query,” adding words to and/or removing words from the search terms typed in by the user.
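To make the idea of query rewriting a little more concrete, here is a minimal sketch in Python. It is only an illustration of the general technique of dropping and adding search terms. The word lists and the function name `rewrite_query` are made up for this example, and Cassini's real rules are mined from eBay's data and have not been published.

```python
# Hypothetical rewrite rules, invented for illustration only.
# Cassini's actual rules are learned from eBay's data and are not public.
RELATED_TERMS = {"tv": ["television"], "sneakers": ["trainers"]}
NOISE_WORDS = {"a", "the", "for", "please"}


def rewrite_query(query: str) -> list[str]:
    """Drop noise words from the query, then append related terms."""
    # Removing words: keep only the terms that carry meaning.
    terms = [t for t in query.lower().split() if t not in NOISE_WORDS]

    # Adding words: append related terms that are not already present.
    rewritten = list(terms)
    for term in terms:
        for extra in RELATED_TERMS.get(term, []):
            if extra not in rewritten:
                rewritten.append(extra)
    return rewritten


print(rewrite_query("a stand for the TV"))
```

In this toy version, a search for "a stand for the TV" is matched against listings using "stand", "tv" and "television", even though the buyer never typed the last word.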
Williams went on to describe "five buckets of information." He stated that these buckets hold everything that is used to pre-process each search query within eBay. The first bucket contains textual information. Included in this bucket are things found within a listing itself, such as the title and description, as well as words associated with the listing through its categories and any tags chosen by the seller. The second bucket holds images. He emphasized high-quality images with a pleasing aspect ratio and a clean background. There was no explanation as to how this information was to be harvested from the images. The third bucket is full of knowledge about the seller. Where are they located in relation to the buyer? What is their past shipping record? How well do they communicate with buyers via email? The fourth bucket is where information about the buyer is stored. Do they prefer items listed at auction or at a fixed price? Do they tend to look for free shipping? Are they attracted to certain categories? The fifth and final bucket contains behavioral information about each item. How many people are watching a particular item? Are offers being made for it? Is it clicked on frequently?
According to Williams, the pre-qualifications made before search results appear do not end there. The rest of the process includes what he refers to as a determination of the item quality. As he says, “What we’re trying to do, using our extreme data, is compute for any new item that’s listed on eBay, what’s the likelihood that it will sell, and what price might it sell for... and then we can use that in ranking.” This seems to suggest that, as soon as an item is listed, Cassini decides whether or not the item is "relevant" and has a chance of selling at the set price. If the item is deemed irrelevant or "incorrectly" priced, it is immediately downgraded in search results. Williams went on to say, “...as soon as it really is listed on the site, customers begin to interact with it and we get much richer information. So, as time passes by...we put less and less reliance in our ranking on what we pre-computed when the seller listed it, and we put more and more emphasis on how the buyers are interacting with it.” Those interactions include bidding, offering, watching, asking questions, clicking and viewing. He explained, “We’re beginning to build up a story about whether that item is interesting….If it’s an auction, by the time it’s about to complete, we’re almost entirely depending in our ranking function on the information the buyers have given us and we’ve almost completely retired our pre-estimation we made when the item was first listed.” In other words, an item will only move up in ranking if people are interacting with it in what Cassini believes is a positive way. If not, it will stay low within the search results and possibly move down even further.
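The shift Williams describes, from the pre-computed estimate toward buyer behavior, can be pictured as a weighted blend. The sketch below is our own guess at the simplest possible version: a straight-line blend based on how much of the listing's life has passed. Williams did not say what formula eBay uses, so the function name, the 0-to-1 scores and the linear weighting are all assumptions.

```python
def blended_score(precomputed: float, behavioral: float,
                  elapsed_fraction: float) -> float:
    """Blend two quality scores (each assumed to be between 0 and 1).

    elapsed_fraction is the share of the listing's lifetime that has passed:
    0.0 when the item is first listed, 1.0 when the listing ends.
    The linear weighting is an assumption, not eBay's published method.
    """
    # Clamp the weight so out-of-range inputs cannot push it past 0 or 1.
    weight = min(max(elapsed_fraction, 0.0), 1.0)
    # Early on the pre-computed estimate dominates; near the end, behavior does.
    return (1.0 - weight) * precomputed + weight * behavioral


# An item predicted to sell well (0.8) that buyers are mostly ignoring (0.2).
for fraction in (0.0, 0.5, 1.0):
    print(fraction, round(blended_score(0.8, 0.2, fraction), 2))
```

Under these assumptions, the item starts with a score of 0.8 on the strength of the prediction alone, falls to 0.5 halfway through the listing, and ends at 0.2 once only buyer behavior counts.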
In each of his presentations, Mr. Williams is quick to point out that there are huge challenges with a machine like Cassini, even admitting to weaknesses within the system. While he does not seem to have solutions for the problems presented by a pre-processing search engine, he is forever optimistic that the system was built to flush out problems. In his speech, Williams stated:
At our scale, everything that seems intuitive and seems to be easy, turns out to be a hard engineering problem….So we have typically over 6 million item updates per hour. So that’s things like bids, offers, revisions of descriptions, people adding additional photographs, answering questions, all those kinds of things. A rate of about 6 million an hour. And you would need to process these events if you were going to accurately compute the behavioral side of this item quality factor. These events are generated on individual machines and we have a large farm of thousands of machines...The individual machines create logging events; those events are sort of put out onto a bus, and there’s a cluster of machines that’s listening to that bus watching for interesting events. It’s collecting those events, it’s then partitioning those events by item ID, so we’re getting all of the events for an item on a particular machine, then we have to sort those so that they can be collated together so we can do some computation of what’s the meaning of those behaviors, and once we’ve done that we need to decide how interesting is that set of behaviors. Because we have such a large queue of behaviors to process, that we need to make sure we put the most interesting behaviors at the front of the queue. So then, [we have] our priority queuing exercise going on and then we’re consuming that queue into our ranking function, and then updating the statistics about the items so that we can correctly compute the ranks.
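The steps in that quote (collect events, partition them by item ID, sort them, decide how interesting they are, and put the most interesting at the front of a queue) can be mimicked on a single machine in a few lines of Python. This is a toy sketch, nothing like eBay's farm of thousands of machines. The event weights and function names are invented for the example, since Williams did not say how "interesting" is measured.

```python
import heapq
from collections import defaultdict

# Hypothetical weights for how "interesting" each event type is.
# Williams did not give real values; these are made up for illustration.
EVENT_WEIGHTS = {"bid": 5, "offer": 4, "question": 2, "revision": 1}


def build_priority_queue(events):
    """Build a heap of (-interest, item_id) from (timestamp, item_id, type) events."""
    # Partition the events by item ID.
    by_item = defaultdict(list)
    for timestamp, item_id, event_type in events:
        by_item[item_id].append((timestamp, event_type))

    heap = []
    for item_id, item_events in by_item.items():
        # Sort each item's events by time so they can be collated in order.
        item_events.sort()
        # Score how interesting this set of behaviors is.
        interest = sum(EVENT_WEIGHTS.get(kind, 0) for _, kind in item_events)
        # heapq is a min-heap, so negate the score to pop the largest first.
        heapq.heappush(heap, (-interest, item_id))
    return heap


def drain(heap):
    """Yield (item_id, interest) pairs, most interesting first."""
    while heap:
        negative_interest, item_id = heapq.heappop(heap)
        yield item_id, -negative_interest


events = [
    (3, "A", "bid"),
    (1, "B", "revision"),
    (2, "A", "offer"),
    (4, "C", "question"),
]
print(list(drain(build_priority_queue(events))))
```

Here item A, with a bid and an offer, reaches the ranking step ahead of item C (one question) and item B (one revision). In the real system, each item drained from the queue would have its statistics updated so its rank could be recomputed.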
Cassini brings up a whole new set of information that we must attempt to follow and understand. In the coming days, we will share our personal thoughts and observations regarding Cassini and its home within the eBay marketplace.
See the rest of our eBay series here:
Part I - Why are My eBay Sales Down?
Part II - eBay Takes a Hit
Part III - Google Takes Aim
Part IV - eBay Alternatives?
Part V - Fixing the Defects in eBay's Defect System
Part VI - Breaking Up is Hard to Do: eBay's Split from PayPal
Part VII - Is eBay's Cassini Really the "Best Match"?
Part VIII - Is eBay's Cassini Stuck in Orbit?
The Plan, Part I - Positioning for More than Survival
The Plan, Part II - Expanding Our Reach Beyond eBay