Predicting Car Depreciation using Online Reviews

August 30, 2023 General

Group 11

The biggest risk for a leasing company is incorrectly predicting the residual value of a car at the end of the lease period. So how can the future value of a new type of vehicle be predicted most accurately? Answering this question could give car leasing companies the chance to optimize pricing, lease terms, and overall profitability. Traditional depreciation models are based on historical sales data and vehicle characteristics, so their predictions fall short for new brands or new models with no historical data. In this blog, we attempt to draw depreciation curves based on car characteristics and online reviews.

Two open-source datasets were available for this experiment:

  • Car marketplace data containing vehicle new price, second-hand price, kilometers driven, age, and vehicle characteristics such as engine power, mileage, transmission, seats, etc.
  • Driver review data containing detailed opinion articles for each car model.

We started by extracting the main topics of discussion from the reviews using an unsupervised algorithm. One of the two major topics extracted from the drivers’ reviews is represented by the following word cloud.

We then applied sentiment analysis techniques both to the full text of the reviews and per extracted topic for each car. The sentiment score was added as an extra feature to the used-car marketplace dataset, along with the actual depreciation, defined as the ratio between the second-hand and new prices.
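To make this step concrete, here is a minimal sketch of how a sentiment score and the price ratio could be attached to each listing. It is not the method used in the experiment: the tiny word lexicon, the function names `sentiment_score` and `value_ratio`, the model names, and all prices and review texts are made up for illustration, and a real analysis would use a trained sentiment model.

```python
import re

# Hypothetical mini-lexicon, for illustration only.
POSITIVE = {"comfortable", "reliable", "smooth", "great", "efficient"}
NEGATIVE = {"noisy", "unreliable", "expensive", "poor", "cramped"}


def sentiment_score(text):
    """Return (positive - negative) / matched words, a score in [-1, 1]."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def value_ratio(second_hand_price, new_price):
    """Ratio between second-hand and new price, as described in the text."""
    return second_hand_price / new_price


# Made-up reviews and listings (not taken from the datasets).
reviews = {
    "model_a": "Smooth, reliable and comfortable on long trips.",
    "model_b": "Noisy cabin and poor, cramped rear seats, but efficient.",
}
listings = [
    {"model": "model_a", "new_price": 30000.0, "second_hand_price": 21000.0},
    {"model": "model_b", "new_price": 25000.0, "second_hand_price": 15000.0},
]

# One score per model, then joined onto every listing of that model.
scores = {model: sentiment_score(text) for model, text in reviews.items()}
for row in listings:
    row["sentiment"] = scores[row["model"]]
    row["value_ratio"] = value_ratio(row["second_hand_price"], row["new_price"])

for row in listings:
    print(row["model"], row["sentiment"], row["value_ratio"])
```

The same join would work per topic by keeping one score per (model, topic) pair instead of one per model.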

We were then able to rank the brands and models in the dataset by sentiment.

We built a depreciation prediction model to extract the relative importance of the various features. Although the most important ones are kilometers driven, age, and vehicle characteristics, there is a significant contribution from sentiment, meaning the depreciation prediction becomes more accurate when sentiment is taken into account alongside the other features. Another contributing feature is the location of the car.
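One common way to read feature importance out of a fitted model is permutation importance: shuffle one feature column and measure how much the prediction error grows. The sketch below illustrates the idea under loud assumptions. The blog does not say which model or importance measure was used, so the linear regression here is a stand-in, and the data, the coefficients in `synthetic_ratio`, and the feature names are all invented.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def mse(w, rows, y):
    return sum((predict(w, r) - t) ** 2 for r, t in zip(rows, y)) / len(y)


def permutation_importance(w, rows, y, col, rng, repeats=20):
    """Average increase in MSE after shuffling one feature column."""
    base = mse(w, rows, y)
    total = 0.0
    for _ in range(repeats):
        values = [r[col] for r in rows]
        rng.shuffle(values)
        shuffled = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, values)]
        total += mse(w, shuffled, y) - base
    return total / repeats


def synthetic_ratio(age, km_10k, sentiment):
    """Invented generating rule, only so the example has a known answer."""
    return 0.95 - 0.04 * age - 0.02 * km_10k + 0.05 * sentiment


feature_names = ["age_years", "km_10k", "sentiment"]
rows = [
    [1, 1.5, 0.8], [2, 2.0, -0.4], [3, 5.5, 0.2], [4, 4.0, -0.9],
    [5, 9.0, 0.6], [6, 7.5, -0.1], [7, 12.0, 0.9], [8, 10.0, -0.7],
]
y = [synthetic_ratio(*r) for r in rows]

weights = fit_ols(rows, y)
rng = random.Random(0)  # fixed seed so the run is repeatable
importances = {
    name: permutation_importance(weights, rows, y, col, rng)
    for col, name in enumerate(feature_names)
}
for name, score in importances.items():
    print(name, score)
```

A feature whose shuffling barely changes the error contributes little to the prediction; in this toy setup the sentiment column has a non-zero importance only because the invented generating rule includes it.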

Finally, we were able to draw depreciation curves based on sentiment alone, showing a difference of up to 15% between the positive and negative sentiment curves at 80,000 km.
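As a rough illustration of how two such curves could be compared, the sketch below fits one curve per sentiment group and reads off the gap at 80,000 km. Everything in it is an assumption: the exponential form `ratio = exp(-k * km)`, the helper names `fit_decay_rate` and `predicted_ratio`, and the data points, which are synthetic and do not reproduce the figures from the experiment.

```python
import math


def fit_decay_rate(points):
    """Least-squares fit of ln(ratio) = -k * km through the origin.

    Returns the decay rate k per kilometer.
    """
    num = sum(km * math.log(ratio) for km, ratio in points)
    den = sum(km * km for km, _ in points)
    return -num / den


def predicted_ratio(k, km):
    """Second-hand / new price ratio predicted by the assumed curve."""
    return math.exp(-k * km)


# Synthetic (km, second-hand / new price) observations per sentiment group.
positive = [(20000, 0.88), (50000, 0.74), (100000, 0.55)]
negative = [(20000, 0.84), (50000, 0.65), (100000, 0.42)]

k_pos = fit_decay_rate(positive)
k_neg = fit_decay_rate(negative)

# Gap between the two curves at 80,000 km, in ratio points.
gap = predicted_ratio(k_pos, 80000) - predicted_ratio(k_neg, 80000)
print(f"positive: {predicted_ratio(k_pos, 80000):.3f}")
print(f"negative: {predicted_ratio(k_neg, 80000):.3f}")
print(f"gap at 80,000 km: {gap:.3f}")
```

Forcing the fit through the origin encodes the assumption that a car with zero kilometers is worth its new price; a curve in age instead of kilometers could be fitted the same way.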

Although more data would allow a more granular and accurate analysis, this experiment shows that the effect of sentiment on car depreciation can be quantified and has material predictive power for the residual value. When a new car comes on the market, say the Cybertruck, a car leasing company could gather all the test reviews by scraping the internet, extract the overall sentiment, and position the vehicle on the positive or negative depreciation curve to set the lease terms.