Adding context to your dynamic pricing problem can increase opportunities as well as challenges
In my previous article, I conducted a thorough analysis of the most popular strategies for tackling the dynamic pricing problem using simple Multi-armed Bandits. If you've come here from that piece, firstly, thank you. It's by no means an easy read, and I truly appreciate your enthusiasm for the subject. Secondly, get ready, as this new article promises to be even more demanding. However, if this is your introduction to the topic, I strongly advise starting with the previous article. There, I present foundational concepts, which I'll assume readers are familiar with in this discussion.
Anyway, a brief recap: the prior analysis aimed to simulate a dynamic pricing scenario. The main goal was to test various price points as quickly as possible to find the one yielding the highest cumulative reward. We explored four distinct algorithms: greedy, ε-greedy, Thompson Sampling, and UCB1, detailing the strengths and weaknesses of each. Although the methodology employed in that article is theoretically sound, it relies on oversimplifications that don't hold up in more complex, real-world situations. The most problematic of these simplifications is the assumption that the underlying process is stationary, meaning the optimal price remains constant regardless of the external environment. This is clearly not the case. Consider, for example, fluctuations in demand during holiday seasons, sudden shifts in competitor pricing, or changes in raw material costs.
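To make the stationarity problem concrete, here is a minimal sketch of how the best price can move with the external environment. The candidate prices, the logistic demand curve, and the holiday shift of its midpoint are all assumptions made up for this illustration, not figures from the previous article.

```python
import math

# Hypothetical candidate prices, chosen only for illustration.
PRICES = [15, 20, 25, 30, 35, 40]


def purchase_probability(price, holiday):
    """Assumed logistic demand curve: the chance of a sale falls as price rises.

    The midpoint (price at which half the customers buy) is assumed to be
    higher during holidays, i.e. customers tolerate higher prices.
    """
    midpoint = 35.0 if holiday else 25.0
    return 1.0 / (1.0 + math.exp((price - midpoint) / 3.0))


def expected_revenue(price, holiday):
    """Expected reward of a price point: price times probability of purchase."""
    return price * purchase_probability(price, holiday)


def best_price(holiday):
    """Return the candidate price with the highest expected revenue."""
    return max(PRICES, key=lambda p: expected_revenue(p, holiday))


if __name__ == "__main__":
    print("best price, regular period:", best_price(holiday=False))
    print("best price, holiday period:", best_price(holiday=True))
```

Under these assumed demand curves the two periods have different optimal prices, so an algorithm that converges on a single price for all situations is necessarily wrong in at least one of them.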
To solve this issue, Contextual Bandits come into play. Contextual Bandits are an extension of the Multi-armed Bandit problem where the decision-making agent not only receives a reward for each action (or "arm") but also has access to context or environment-related information before choosing an arm. The context can be any piece of information that might influence the outcome, such as customer demographics or external market conditions.
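As a rough preview of the idea, the sketch below extends ε-greedy with the simplest possible form of context: it keeps a separate reward estimate for every (context, price) pair. This is only one way to build a contextual bandit, and not necessarily the approach developed later in this article. The class name `ContextualEpsilonGreedy`, the two contexts, the prices, and the simulated demand curve are all hypothetical.

```python
import math
import random


class ContextualEpsilonGreedy:
    """Tabular epsilon-greedy: one running mean reward per (context, arm)."""

    def __init__(self, contexts, prices, epsilon=0.1, seed=0):
        self.prices = list(prices)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {c: [0] * len(self.prices) for c in contexts}
        self.means = {c: [0.0] * len(self.prices) for c in contexts}

    def choose(self, context):
        """Observe the context first, then pick an arm (index into prices)."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.prices))  # explore
        means = self.means[context]
        return max(range(len(self.prices)), key=lambda i: means[i])  # exploit

    def update(self, context, arm, reward):
        """Incrementally update the mean reward for this (context, arm)."""
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        self.means[context][arm] += (reward - self.means[context][arm]) / n


def purchase_probability(price, context):
    """Assumed demand curve; holiday customers accept higher prices."""
    midpoint = 35.0 if context == "holiday" else 25.0
    return 1.0 / (1.0 + math.exp((price - midpoint) / 3.0))


def simulate(rounds=5000, seed=0):
    """Run the observe-context, choose-price, collect-reward loop."""
    env = random.Random(seed)
    contexts = ["regular", "holiday"]
    prices = [15, 20, 25, 30, 35, 40]
    agent = ContextualEpsilonGreedy(contexts, prices, epsilon=0.1, seed=seed)
    for _ in range(rounds):
        context = env.choice(contexts)            # environment reveals context
        arm = agent.choose(context)               # agent sets a price
        price = prices[arm]
        bought = env.random() < purchase_probability(price, context)
        agent.update(context, arm, price if bought else 0.0)  # reward = revenue
    return agent


if __name__ == "__main__":
    agent = simulate()
    for context, means in agent.means.items():
        best = max(range(len(means)), key=lambda i: means[i])
        print(context, "-> estimated best price:", agent.prices[best])
```

A table like this only works when the context takes a handful of discrete values. With richer contexts such as customer demographics, the per-context table would typically be replaced by a model that generalizes across contexts.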
Here's how they work: before deciding which arm to pull (or, in our case, which price to set), the agent observes the current…