Discussion
The results indicate that cosmetic features offer some signal about a building's overall compliance with building standards, with significant caveats. Cosmetic features explained 57% of the variation in assessment scores in our analysis, and a predictive linear model based on cosmetic features correctly classified buildings as "at or above" or "below" average in quality 82% of the time.
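To make the two headline numbers concrete, here is a minimal sketch of how a one-predictor linear model can be turned into an "at or above" / "below" labeller. It assumes a single aggregate cosmetic score per building (as described under Limitations), takes the mean assessment score as the "average" cut-off (our assumption; the exact cut-off is not stated here), and uses six made-up buildings rather than the study's data. For brevity it scores the same points it was fitted on; a proper evaluation would use held-out test data.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(xs, ys, a, b):
    """Share of the variation in ys explained by the fitted line."""
    my = mean(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def label(score, cutoff):
    """Binary label: 'at or above' average versus 'below' average."""
    return "at or above" if score >= cutoff else "below"

def accuracy(xs, ys, a, b, cutoff):
    """Fraction of buildings whose predicted label matches their actual label."""
    hits = sum(label(a + b * x, cutoff) == label(y, cutoff) for x, y in zip(xs, ys))
    return hits / len(xs)

# Hypothetical data: aggregate cosmetic score (x) and assessment score (y).
cosmetic = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
assessment = [60, 78, 65, 70, 75, 90]

a, b = fit_line(cosmetic, assessment)
cutoff = mean(assessment)  # assumed definition of "average"
print(round(r_squared(cosmetic, assessment, a, b), 2))  # 0.54
print(accuracy(cosmetic, assessment, a, b, cutoff))
```

On this toy data the line explains about 54% of the variation yet labels only 4 of 6 buildings correctly. The two measures are related but distinct: R² measures fit on the continuous score, while accuracy only checks which side of the cut-off each prediction falls on.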
However, 18% of the time our simple cosmetics-based model mislabelled buildings, misjudging their quality. These errors were, moreover, significantly more likely to be false positives: when the model made a mistake, it was more likely to judge a building to be "at or above" average when in fact it was not.
The bias also varied by category of housing. The tilt toward false positives was most pronounced for social housing cooperatives: 3 of every 4 labelling errors within this group placed "below" average buildings in the "at or above" average category. The model was therefore more likely to judge a low-quality building of this type to be a high-quality one based solely on its looks. The bias was least pronounced for Toronto City Housing Corporation (TCHC) buildings, where fewer than 2 in 3 errors were false positives.
The overall accuracy of labels assigned to Toronto City Housing Corporation buildings was also notably lower than for any other type of housing. While the model's predictions were about 82% accurate for both social and private housing, they were only 75% accurate for buildings managed by the TCHC. Moreover, the model's propensity to label "above average" buildings as "below" was notably higher for this group than for others. In other words, while there may have been fewer "above average" buildings in our selection of TCHC buildings (and apparently there are plenty of "below" average TCHC buildings to be found), our model was less likely to recognize the very good ones based on their cosmetic features alone.
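The per-category comparisons in the last two paragraphs amount to computing the same error statistics separately for each housing type. Below is a minimal sketch, assuming each prediction is stored as a (category, predicted, actual) tuple; the records are invented and the category names are placeholders. Here `fn_rate` is the share of truly "at or above" buildings that the model labelled "below", which corresponds to the propensity discussed for TCHC buildings.

```python
from collections import defaultdict

ABOVE, BELOW = "at or above", "below"

def metrics_by_group(records):
    """Accuracy, false-positive share of errors, and false-negative rate per group.

    records: iterable of (group, predicted, actual) tuples.
    """
    counts = defaultdict(lambda: {"n": 0, "pos": 0, "fp": 0, "fn": 0})
    for group, pred, actual in records:
        c = counts[group]
        c["n"] += 1
        c["pos"] += int(actual == ABOVE)                  # truly at or above average
        c["fp"] += int(pred == ABOVE and actual == BELOW)  # looks better than it is
        c["fn"] += int(pred == BELOW and actual == ABOVE)  # good building missed
    results = {}
    for group, c in counts.items():
        errors = c["fp"] + c["fn"]
        results[group] = {
            "accuracy": 1 - errors / c["n"],
            "fp_share_of_errors": c["fp"] / errors if errors else None,
            "fn_rate": c["fn"] / c["pos"] if c["pos"] else None,
        }
    return results

# Hypothetical predictions for two housing categories.
records = [
    ("TCHC", ABOVE, ABOVE), ("TCHC", BELOW, ABOVE),
    ("TCHC", BELOW, BELOW), ("TCHC", ABOVE, BELOW),
    ("private", ABOVE, ABOVE), ("private", BELOW, BELOW),
    ("private", ABOVE, BELOW), ("private", ABOVE, ABOVE),
]

for group, m in metrics_by_group(records).items():
    print(group, m)
```

Because `fn_rate` divides by the number of truly "at or above" buildings in each group, it separates "the model misses good buildings in this group" from "this group simply has fewer good buildings in the sample", which accuracy alone cannot do.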
Limitations
There are limitations to our study. For one, while RentSafeTO assesses many features that may contribute to a good experience for a renter, plenty of other features that individuals may value are not reflected in evaluation scores. Moreover, very different combinations of features can lead two buildings to the same RentSafeTO score: one building may have rats while another lacks garbage chutes, yet the two may look identical on paper. Renters therefore need to interpret RentSafeTO scores cautiously, as their own idea of an "above average" building may not match what the scores capture.
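A toy calculation shows how two quite different buildings can end up with the same number. The item names, the 1 to 5 scale, and the rule of reporting the total as a percentage of the maximum possible are illustrative assumptions for this sketch, not RentSafeTO's published scoring method.

```python
# Hypothetical item scores on a 1-5 scale (5 = best). Item names and the
# scoring rule below are illustrative assumptions, not RentSafeTO's formula.
building_a = {"pest_control": 1, "garbage_chutes": 5, "lighting": 4, "elevators": 4}
building_b = {"pest_control": 5, "garbage_chutes": 1, "lighting": 4, "elevators": 4}

def overall_score(items, max_per_item=5):
    """Total item score expressed as a percentage of the maximum possible."""
    return 100 * sum(items.values()) / (max_per_item * len(items))

print(overall_score(building_a), overall_score(building_b))  # 70.0 70.0

# The totals match even though the problems are in different places.
differing = sorted(k for k in building_a if building_a[k] != building_b[k])
print(differing)  # ['garbage_chutes', 'pest_control']
```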
In addition, our regression exercise has several basic limitations. For one, there were many more private housing evaluations in our data set than evaluations of any other type of housing. As a result, our regression was trained to fit this subset of the data best. The bias toward "at or above" average labels reflects an overall trend in this category of housing data, and this bias carried over into the model's evaluations of other types. In addition, the parameters of our regression model depended on how we separated training and test data. We fit our regression to a randomly selected sample of the data; had we picked a different sample, the parameters might have changed. We also combined all 10 cosmetic features into one aggregate feature to simplify the problem, discarding some important information along the way. We don't know, for example, whether a building with an average cosmetic score of 2.5 had many beautiful features and a few stinkers, or whether all its cosmetics were equal. And some cosmetics, like building cleanliness, may be more informative than others, like the state of the walls. Moreover, combinations of cosmetic features may say more about building quality than any single cosmetic feature alone.
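The dependence on the training/test split is easy to see in a small simulation. The sketch below generates hypothetical data from an assumed linear relationship (score = 40 + 10 * cosmetic + noise; all numbers are invented), then refits the slope on several different random 70% training samples. Each seed gives a somewhat different estimate, which is the instability a single random split hides.

```python
import random
from statistics import mean

def fit_slope(pairs):
    """OLS slope of assessment score on aggregate cosmetic score."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def slope_for_split(data, seed, train_frac=0.7):
    """Shuffle with the given seed, keep the first train_frac as training data, fit."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_frac)
    return fit_slope(rows[:cut])

# Hypothetical noisy data drawn from an assumed relationship.
gen = random.Random(0)
data = []
for _ in range(60):
    x = gen.uniform(1, 5)
    data.append((x, 40 + 10 * x + gen.gauss(0, 8)))

# Same data, five different random training samples.
slopes = [slope_for_split(data, seed) for seed in range(5)]
print([round(s, 2) for s in slopes])
```

Repeating the fit over many splits (or using cross-validation) and reporting the spread of the estimates is one common way to quantify this sensitivity.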