Toronto Rentals: Can You Judge Them by Their Look?

Introduction

Toronto suffers from a lack of affordable housing. This shortage makes it easy for landlords to take advantage of prospective tenants, who may have few options when they look for a place to live.

To ameliorate the situation, Toronto city council funded several apartment inspection and enforcement programs designed to improve building conditions and encourage transparency. One such program is RentSafeTO, a bylaw enforcement system that monitors the state of over 3,600 apartment buildings and ensures the buildings meet standards. RentSafeTO officers collect data every two years about more than 50 features from each registered building; these include information about pest problems, lighting conditions, and more.

But the list of RentSafeTO features is extensive and the formula used to create evaluations complex, which may leave some renters wondering if they can simplify the job.

This report is designed to answer this question. It asks specifically if the building features that RentSafeTO inspectors consider to be "cosmetic" align with overall building evaluations. If yes, aesthetic features that might be easily visible to potential renters may carry information about important, yet complex features (like the state of electrical wiring in the walls) that might both be hidden to the average renter and difficult to assess. Our research question can therefore be phrased as follows:

Can a building's cosmetics tell you if it meets the city's comprehensive building standards, as defined by RentSafeTO's evaluation scores?

Related Work

Several other studies have looked at the degree to which tenants' perceptions of a building are correlated with overall building quality. The Perceived Housing Quality Scale (PHQS) was developed in order to measure tenant perceptions [1]; features measured on this scale include perceptions of noise level, privacy, parking availability, and building safety. In general, renters' perceptions measured on such scales have been shown to be strongly related to the physical conditions of the buildings [2]; poor maintenance and structural issues correlate with lower perceived housing quality. However, these prior studies captured the perceptions of tenants who already lived in the buildings and had an intimate knowledge of building properties. In our research we look at cosmetic features as they are perceived by inspectors during a building visit, rather than by the buildings' residents.

Dataset

We report key details about our dataset using a framework inspired by Gebru et al. (2018).

Motivation and Distribution

We explore data provided by RentSafeTO to Toronto's Open Data Portal, which is a catalogue of digital data about city operations. This data is made available for anyone to use and is released in the hopes of encouraging transparency in city government.

Composition and Collection Process

The RentSafeTO data contains evaluation scores for buildings registered with the program. These scores are based on the assessments of RentSafeTO officers. There are 50 items that inform their overall building evaluations; 17 are considered by inspection officers to be "high risk", 23 are "moderate risk" and the remaining 10 are considered "cosmetic". Examples of cosmetic features include the state of fencing or cleanliness, while examples of moderate risk features include the state of mail receptacles and lobby floors. High risk items include the state of building drainage and security of exterior doors, among other things. Each item is rated on a scale between 1 and 3, with 1 being at the low end. A 0 is sometimes assigned to an item when building managers do not comply with inspection officers. Overall scores are created using a function that weights all individual item ratings to create a single, combined score; this combined score ranges between 0 and 100 with 0 being low and 100 high.

We focus on evaluations collected in 2023 for 1952 buildings that were registered with RentSafeTO at that time. This data includes 175 evaluations for buildings managed by the Toronto Community Housing Corporation (TCHC), 135 evaluations of social housing complexes and 1641 evaluations of private apartment buildings. The buildings lie in all 25 wards of the city; some are large and some small, some were built last year and others are more than 200 years old.

Preprocessing

Before analysis, features that were not considered to be "cosmetic" were dropped from the dataset. To simplify the analysis, we then averaged the ratings of all cosmetic items in the assessment inventory into a single cosmetic score for each building; this is a number that ranges between 0 and 3. Finally, overall evaluation scores were rounded to the nearest integer (between 0 and 100).
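The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for this report: the item names (`fencing`, `cleanliness`, `exterior_grounds`, `lobby_floors`), the field name `overall_score`, and the sample values are all hypothetical, and only three of the 10 cosmetic items are shown.

```python
from statistics import mean

# Hypothetical cosmetic item names; the real inventory has 10 cosmetic items.
COSMETIC_ITEMS = ["fencing", "cleanliness", "exterior_grounds"]

def preprocess(record):
    """Reduce one raw evaluation record to an average cosmetic score
    and a rounded overall score, dropping all non-cosmetic items."""
    ratings = [record[item] for item in COSMETIC_ITEMS]
    return {
        "cosmetic_score": mean(ratings),                # float between 0 and 3
        "overall_score": round(record["overall_score"]),  # integer, 0 to 100
    }

# Made-up record; "lobby_floors" stands in for a non-cosmetic item.
example = {"fencing": 3, "cleanliness": 2, "exterior_grounds": 3,
           "lobby_floors": 1, "overall_score": 86.6}
cleaned = preprocess(example)
```

Note that Python's built-in `round` resolves exact halves to the nearest even integer, which may differ from the rounding rule applied in the report.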

Methods

To help us answer our research question, we perform a linear regression analysis. In doing this, we relate the average measure of each building's cosmetics (a floating point number ranging between 0 and 3) to the overall evaluation score (an integer ranging between 0 and 100) that was assigned to the building by RentSafeTO.

A linear regression posits that there is a linear relationship between an input variable (in this case, a building's average cosmetic score) and an output variable (in this case, a building's overall evaluation score). We therefore model the relationship between overall evaluations and cosmetics as follows:

overall evaluation = cosmetic score * alpha + beta
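For a model of this form, a common way to estimate alpha and beta is ordinary least squares. The sketch below assumes that estimator; it is an illustration rather than the routine used in this report, and the function name `fit_line` and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates for y = alpha * x + beta."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    alpha = sxy / sxx              # slope
    beta = y_bar - alpha * x_bar   # intercept
    return alpha, beta

# Made-up points that lie exactly on y = 10x + 60.
alpha, beta = fit_line([1.0, 2.0, 3.0], [70, 80, 90])
```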

Given such a linear model, we can ask if it can be used to recognize buildings that are above or below average in overall quality based on cosmetics alone. To do this, we stratify the RentSafeTO data by housing type (i.e. social housing, TCHC housing and private apartment buildings) and randomly select 1/3 of the records for each type (650 in total) to create "test" data. The remaining data (1302 records) is used to form a "training" set. We then calculate the "most likely" values for alpha and beta in the equation above using our training set and standard linear regression estimation techniques. This regression is then used to label the buildings in our test set as either "at or above" or "below" average in terms of overall quality based on cosmetics. We label a building as "at or above" average if our regression predicts its overall evaluation score to be over the training set's average; if not, we label it as "below".
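The split and labelling procedure described above might look like the following sketch. The function names, the `type` field, and the fixed random seed are assumptions made for illustration; in particular, the per-type rounding here may not reproduce the exact test set sizes used in this report, and the comparison treats a prediction equal to the threshold as "at or above".

```python
import random
from collections import defaultdict

def stratified_split(records, test_fraction=1/3, seed=0):
    """Hold out roughly test_fraction of each housing type as test data."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for rec in records:
        by_type[rec["type"]].append(rec)
    train, test = [], []
    for group in by_type.values():
        shuffled = group[:]            # copy so the input is left untouched
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test

def label(cosmetic_score, alpha, beta, threshold):
    """Label a building from its predicted overall evaluation score."""
    predicted = alpha * cosmetic_score + beta
    # Assumption: ties with the threshold count as "at or above".
    return "at or above" if predicted >= threshold else "below"

# Made-up records: six private buildings and three TCHC buildings.
records = ([{"type": "private", "id": i} for i in range(6)]
           + [{"type": "tchc", "id": i} for i in range(6, 9)])
train, test = stratified_split(records)
```

Splitting within each housing type keeps the proportions of private, social and TCHC buildings similar in the training and test sets.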

We report the accuracy of our regression-based judgements, as well as the number of false positives (i.e. "below" average buildings we labelled as "at or above" based on cosmetics) and false negatives (i.e. "at or above" average buildings we labelled as "below"). We also report the overall "goodness" of our trained regression model using a measure called R-squared. R-squared tells us the percentage of variance in evaluation score that is explained by variation in cosmetics; it ranges between 0 and 100. An R-squared near 100 indicates there is a strong linear relationship between our chosen input and output variables, while an R-squared of 0 tells us no such relationship is to be found.
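These measures can be computed as in the sketch below. The function names and sample labels are hypothetical, and R-squared is expressed as a percentage to match the convention used in this report.

```python
def evaluate(predicted, actual):
    """Return accuracy plus false positive and false negative counts.
    A false positive is a "below" building labelled "at or above"."""
    pairs = list(zip(predicted, actual))
    correct = sum(p == a for p, a in pairs)
    fp = sum(p == "at or above" and a == "below" for p, a in pairs)
    fn = sum(p == "below" and a == "at or above" for p, a in pairs)
    return correct / len(pairs), fp, fn

def r_squared(ys, fitted):
    """Percentage of variance in ys explained by the fitted values."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained
    ss_tot = sum((y - y_bar) ** 2 for y in ys)              # total
    return 100 * (1 - ss_res / ss_tot)

# Made-up labels: one false positive and one false negative out of four.
predicted = ["at or above", "at or above", "below", "below"]
actual = ["at or above", "below", "below", "at or above"]
accuracy, fp, fn = evaluate(predicted, actual)
```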

Results

Figure 1. Training Set Data and The Best Fit Line. Data points have been stratified by type (private building, social housing, or TCHC housing).

Figure 1 illustrates the data points in the training set as well as the best fit line that relates cosmetic scores to overall evaluation scores in this data.

The linear model derived from the estimation process was as follows:

overall evaluation = cosmetic score * 14.2 + 51.50

This regression's R-squared was 57%, indicating that average cosmetic score was able to explain 57% of the variation in the training set's overall evaluation scores. The average evaluation score was found to be 88.1, with a standard deviation of 6.5 points, and a median of 87.
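As a worked illustration of the fitted model, the sketch below plugs in the reported coefficients. It assumes that the reported average of 88.1 is the training set average used as the labelling threshold; the names `predict` and `cutoff` are hypothetical.

```python
ALPHA, BETA = 14.2, 51.50   # fitted coefficients reported above
TRAIN_MEAN = 88.1           # assumed to be the training set average

def predict(cosmetic_score):
    """Predicted overall evaluation for a given average cosmetic score."""
    return ALPHA * cosmetic_score + BETA

# Cosmetic score at which the prediction crosses the threshold,
# obtained by solving ALPHA * x + BETA = TRAIN_MEAN for x.
cutoff = (TRAIN_MEAN - BETA) / ALPHA

best_case = predict(3.0)    # prediction for a perfect cosmetic score
```

Under these assumptions the cutoff is roughly 2.58, so a building would need a cosmetic average above that value to be labelled "at or above", and the highest score the model can predict is about 94.1.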

Figure 2. Test Set Data and Predicted Overall Ratings. Predicted ratings are '+' symbols on the best fit line, and actual ratings are the dots. Data points have been stratified by building type (private building, social housing, or TCHC housing).

Figure 2 illustrates data points in the test set and the model's predictions of evaluation score. Predicted scores were thresholded to create the labels "at or above" and "below" average in overall quality.

Figure 3. Confusion matrices for the regression-based labels.

Figure 3 details the accuracy of our regression's labelling of test set data using confusion matrices. The x-axis of these matrices shows us the labels our model assigned and the y-axis shows us the labels that correspond to buildings' true assessments (i.e. the assessments of officers). The brightness of the individual cells reflects the number of buildings represented by that cell. Bright colours along the diagonals suggest relatively "accurate" predictions, while bright colours in the off-diagonal cells indicate labelling errors (either false positives or false negatives). We have created confusion matrices for all buildings in the test set as well as for different building categories (social housing, TCHC housing, or private buildings).

We find the overall accuracy of our assigned labels was 81.85%. Of the errors, 70.34% were false positives (i.e. incorrectly classified as "at or above" average) while 29.66% were false negatives (i.e. incorrectly classified as "below" average).

There were 546 records for private building evaluations in our test set. The accuracy of label assignment in this subset was 82.42%; 70.83% of the errors here were false positives. Of all the "above average" buildings in this set, 10% failed to be recognized by the model. There were 46 records for social housing cooperatives; our labelling accuracy for this group was 82.61% and 75% of the recorded errors were false positives. In this set, only 7% of the "above average" buildings failed to be recognized by our model. Finally, there were 58 records for buildings managed by the Toronto Community Housing Corporation (TCHC); the accuracy for this subset was 75.86% with 64.29% of the recorded errors being false positives. In this set, however, more than 25% of the "above average" buildings failed to be recognized by our model.
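The percentages above can be converted back into approximate error counts, as sketched below. These counts are reconstructions by arithmetic from the reported group sizes and percentages, assuming accuracy is the share of correctly labelled buildings in each group; they are not figures taken from the underlying analysis, and the names used are hypothetical.

```python
# (group size, accuracy, share of errors that were false positives)
groups = {
    "all":     (650, 0.8185, 0.7034),
    "private": (546, 0.8242, 0.7083),
    "social":  (46,  0.8261, 0.75),
    "tchc":    (58,  0.7586, 0.6429),
}

def implied_counts(n, accuracy, fp_share):
    """Return (errors, false positives, false negatives) implied by
    the reported percentages, rounded to whole buildings."""
    errors = round(n * (1 - accuracy))
    fp = round(errors * fp_share)
    return errors, fp, errors - fp

counts = {name: implied_counts(*vals) for name, vals in groups.items()}
```

Under these assumptions the per-group error counts sum to the overall error count, which suggests the reported percentages are internally consistent.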

Discussion

The results indicate that cosmetic features provide some indication of a building's overall compliance with building standards, with significant caveats. Cosmetic features explained 57% of the variation in assessment scores in our analysis, and a predictive linear model based on cosmetic features correctly identified buildings as "at or above" or "below" average in quality 82% of the time.

However, 18% of the time our simple cosmetics-based model incorrectly labelled, or misjudged the quality of, buildings. Errors were, moreover, considerably more likely to be false positives (meaning that when our model made a mistake, it was more likely to judge a building to be above average when in fact it was not).

Bias in our model was also not independent of the category of housing. The bias in favour of false positives was most pronounced for social housing cooperatives; 3 of every 4 labelling errors within this group posited "below" average buildings to be "at or above" average. The model was therefore more likely to judge a low quality building of this type to be a high quality one based solely on its looks. The bias was least pronounced for Toronto Community Housing Corporation buildings; fewer than 2 in 3 errors were false positives in this group.

The overall accuracy rate for labels assigned to Toronto Community Housing Corporation buildings was also notably lower than for any other type of housing. While the accuracy of the model's predictions for both social and private housing was about 82%, it was only about 76% for buildings managed by the TCHC. Moreover, our model's propensity to label "above average" buildings as "below" was notably higher for this group than for others. That is, while there may have been fewer "above average" buildings in our selection of TCHC buildings (and apparently, there are plenty of "below" average TCHC buildings to be found), our model was less likely to recognize the very good ones based on their cosmetic features alone.

Limitations

There are limitations to our study. For one, while RentSafeTO assesses many features that may contribute to a good experience for a renter, there are plenty of other features that may be of value to individuals and that are not reflected in evaluation scores. Moreover, very different combinations of features may lead two buildings to arrive at the same RentSafeTO score. One building may have rats while another lacks garbage chutes; both may appear to be identical based on the numbers. Renters therefore need to interpret the RentSafeTO scores cautiously, as their own assessments of "above average" buildings may not be quite the same.

In addition, our regression exercise has many basic limitations. For one, there were significantly more private housing evaluations in our data set than evaluations for any other type of housing. As a result, our regression was trained to fit this subset of data the best. The bias toward "at or above" average labels reflects an overall trend in this particular category of housing data and this bias influenced model evaluations of other types. In addition, the parameters in our regression model depended on our separation of training and test data. We fit our regression to a randomly selected sample of data; had we picked a different sample, parameters may have changed. We also combined all 10 cosmetic features into one aggregate feature to simplify our problem, discarding some important information along the way. We don't know, for example, if a building with an average cosmetic score of 2.5 had many beautiful features and a few stinkers or if all its cosmetics were equal. And some cosmetics, like building cleanliness, may be more informative than other cosmetics, like the state of the walls. Moreover, combinations of cosmetic features may tell you more about building quality than a single cosmetic feature alone.

Conclusion

So, can a building's cosmetics tell you RentSafeTO's evaluation score?

Despite the limitations, our regression exercise seems to indicate that cosmetic building features carry some, but not all, of the information required to make an accurate guess as to a building's overall compliance with standards, as reflected in the RentSafeTO evaluations.

But if the building is managed by the Toronto Community Housing Corporation, tread carefully. While the evaluation scores for many TCHC buildings may indeed be lower than those of buildings that are privately owned or managed by cooperatives, there are still some with solid evaluation scores but a questionable cosmetic look. So you can't always judge a book by its cover; some buildings may be investing in essentials at the expense of their outward vibe.

Of course, the very best thing to do when considering a building rental is to look at the exhaustive collection of data in the RentSafeTO database and sit with the information for a while on your own. The database is there for you and everyone else to review, and it contains information that may help steer you in the direction of a home managed by people you can trust ... and away from some real stinkers.

References

1. Caffaro, F., Galati, D., & Roccato, M. (2016). Development and validation of the Perception of Housing Quality Scale (PHQS). TPM-Testing, Psychometrics, Methodology in Applied Psychology, 23(1), 37–51.

2. Maclennan, D., & Williams, R. (2006). Housing Quality and the Perceptions of Renters: A Cross-National Study. Housing Studies, 21(1), 15–34.

The Toronto Open City Data Portal https://www.toronto.ca/city-government/data-research-maps/open-data/

by X for EECS1516