
Time of Day Classifier

Purpose

The purpose of this model is to detect the light level (a.k.a. time of day) that the driver sees. The motivation is that dawn/dusk is known to be the riskiest time of day to drive [1]. Time of day could potentially be approximated with just a clock, but the relationship is not that simple.

Light Level ~ Clock Time + Timezone + Geographic Position + Vehicle Heading + etc...

Using machine vision to visually assess "is the sun rising/setting in front of me right now?" is seemingly simpler than trying to think through all of that complexity.

The proposed model takes a single 3 x 224 x 224 image as input and outputs a probability for each of 4 classes:

1) Day
2) Dawn/Dusk
3) Night
4) Undefined

In practice, the Undefined class is almost never used in the training data. It is mostly reserved for totally artificial environments such as a tunnel where no daylight is visible in the photo.
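
As a minimal sketch of how the four-class output could be turned into a single label, the snippet below applies a softmax to raw scores and picks the most probable class. The class order, the function names, and the example scores are assumptions for illustration; the source does not state which output index maps to which class.

```python
import math

# Assumed class order; the actual index-to-class mapping is not given in this document.
CLASS_NAMES = ["Day", "Dawn/Dusk", "Night", "Undefined"]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(probabilities):
    """Return the (label, probability) pair with the highest probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return CLASS_NAMES[best], probabilities[best]

# Made-up scores, purely for illustration.
probs = softmax([2.0, 0.5, -1.0, -3.0])
label, confidence = predict_label(probs)
```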

Data Considerations

The model was trained using the BDD100k dataset as described previously. This dataset has approximately 70,000 training images and 10,000 validation images.

Images were resized down to 224 x 224 pixels in order to align with the SageMaker Image Classification container.

The dataset was highly imbalanced initially. As shown below, data was down-sampled for training. Validation statistics reported further down are based upon the original validation dataset.

| Level | Original Count | Down-Sample Rate | Final Count |
|---|---|---|---|
| Daytime | 36,728 | 15% | 5,574 |
| Dawn/Dusk | 5,027 | 100% | 5,027 |
| Night | 27,971 | 20% | 5,672 |
| Undefined | 137 | 100% | 137 |
| Total | 69,863 | 23.5% | 16,410 |
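
One way to implement per-class down-sampling like this is sketched below. It keeps each image independently with its class's rate, which is an assumption; the source does not say how the sample was drawn. The `down_sample` function, the `(image_id, label)` record shape, and the seed are hypothetical.

```python
import random

# Per-class keep rates taken from the table above.
KEEP_RATE = {"Daytime": 0.15, "Dawn/Dusk": 1.0, "Night": 0.20, "Undefined": 1.0}

def down_sample(records, keep_rate, seed=0):
    """Keep each (image_id, label) record with probability keep_rate[label]."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [
        (image_id, label)
        for image_id, label in records
        if rng.random() < keep_rate[label]
    ]

# Toy data for illustration only: 1,000 daytime and 50 dawn/dusk records.
toy = [(i, "Daytime") for i in range(1000)]
toy += [(1000 + i, "Dawn/Dusk") for i in range(50)]
sampled = down_sample(toy, KEEP_RATE)
```

With independent sampling the final count per class is only approximately rate times original count, which may be one reason the final counts in the table are not exact multiples.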

Model Architecture

The model was trained using the AWS SageMaker Image Classification container, which trains a convolutional neural network with MXNet. Beyond that, the AWS user documentation unfortunately does not give many details on the architecture built behind the scenes. A raw visualization of the architecture exported from SageMaker can be found here. It appears to match the ResNet architecture [2], terminating with a 4-node classification head.

Below are the key hyperparameters that were selected:

| Hyperparameter | Value | Notes |
|---|---|---|
| Epochs | 6 | |
| Pretrained Weights | 1 | AWS provides weights pretrained on ImageNet with 11,000 categories |
| Image Size | 3 x 224 x 224 | Pretrained weights are only supported at this image size |
| Layers | 18 | The minimum supported layer count. The model did show elements of overfitting even at this restricted layer count |
| Optimizer | Adam | |
| Learning Rate | 0.001 | |
| Mini Batch Size | 16 | |
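
For reference, the settings above could be expressed as a hyperparameter dictionary for the SageMaker Image Classification algorithm, roughly as sketched here. The key names follow the algorithm's documented hyperparameters, but the mapping from this table to those keys is an assumption, as are `num_classes` and `num_training_samples` (taken from the class list and the down-sampled total rather than from a stated configuration).

```python
# Sketch only: key names follow the SageMaker Image Classification
# hyperparameter documentation; the mapping from the table is assumed.
hyperparameters = {
    "epochs": 6,
    "use_pretrained_model": 1,      # 1 = start from the provided pretrained weights
    "image_shape": "3,224,224",     # channels,height,width
    "num_layers": 18,
    "optimizer": "adam",
    "learning_rate": 0.001,
    "mini_batch_size": 16,
    "num_classes": 4,               # assumed: Day, Dawn/Dusk, Night, Undefined
    "num_training_samples": 16410,  # assumed: the down-sampled training total
}
```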

Performance

Overall, the model performs fairly well, with 92% accuracy on validation. However, it struggles with the "Dawn/Dusk" class (precision 0.57, recall 0.53), often labeling it as "Daytime" instead. In part, I assume this is due to the ambiguity of how "Dawn/Dusk" was defined during labeling.

Find additional fit statistics in the appendix.

Future Enhancements

My use of the BDD100k dataset does not exactly match what that dataset was built for. Teaching autonomous vehicles to drive isn't directly in line with my intended use (identifying risks to a human driver). While it was good enough for a school project, future research should look to collect a different dataset more in line with this use case. A big gap for this "time of day" model was that the definitions of daytime/dusk/night were inconsistent between photos. A better dataset for this purpose would likely have a numeric target like "hours till sunset", which is more objective.
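
To make the "hours till sunset" idea concrete, here is a minimal sketch of how such a target could be computed if each photo came with a capture timestamp and a known local sunset time. Both inputs are assumptions (BDD100k is not described here as providing them), and `hours_until_sunset` is a hypothetical helper.

```python
from datetime import datetime

def hours_until_sunset(captured_at, sunset_at):
    """Signed hours from capture time to sunset; negative once the sun has set."""
    return (sunset_at - captured_at).total_seconds() / 3600.0

# Illustrative values only: a photo taken at 18:30 with sunset at 20:00.
target = hours_until_sunset(
    datetime(2024, 6, 1, 18, 30),
    datetime(2024, 6, 1, 20, 0),
)
```

A signed value like this would let a regression model distinguish "shortly before sunset" from "shortly after" without relying on a subjective dusk label.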

Additionally, rebuilding the model in a different tool stack would likely be worthwhile. The AWS SageMaker Image Classification container was used as a learning opportunity, but the lack of control I had over the model was limiting. The model was prone to overfitting, and the container does not give a data scientist many options to mitigate it.

Appendix

F1 Score

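| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Daytime | 0.93 | 0.94 | 0.94 | 5258 |
| Dawn/Dusk | 0.57 | 0.53 | 0.55 | 778 |
| Night | 0.98 | 0.98 | 0.98 | 3929 |
| Undefined | 0.74 | 0.57 | 0.65 | 35 |
| Accuracy | | | 0.92 | 10000 |
| Macro avg | 0.80 | 0.75 | 0.78 | 10000 |
| Weighted avg | 0.92 | 0.92 | 0.92 | 10000 |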
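
The summary rows can be reproduced from the per-class rows, as the sketch below illustrates. It uses the rounded figures from the report, so results agree only to about two decimal places; the helper names are hypothetical.

```python
# (precision, recall, f1, support) per class, copied from the report above.
REPORT = {
    "Daytime": (0.93, 0.94, 0.94, 5258),
    "Dawn/Dusk": (0.57, 0.53, 0.55, 778),
    "Night": (0.98, 0.98, 0.98, 3929),
    "Undefined": (0.74, 0.57, 0.65, 35),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def macro_avg(column):
    """Unweighted mean of one metric column across classes."""
    values = [row[column] for row in REPORT.values()]
    return sum(values) / len(values)

def weighted_avg(column):
    """Mean of one metric column weighted by each class's support."""
    total = sum(row[3] for row in REPORT.values())
    return sum(row[column] * row[3] for row in REPORT.values()) / total

dawn_dusk_f1 = f1(0.57, 0.53)  # about 0.55
macro_f1 = macro_avg(2)        # about 0.78
weighted_f1 = weighted_avg(2)  # about 0.92
```

The gap between the macro average (0.78) and the weighted average (0.92) reflects how much the two small classes, Dawn/Dusk and Undefined, lag behind Daytime and Night.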

Confusion Matrix

Model Interpretation

Below is an example image from each class where the model correctly labeled the image. When reading these images, a blue region means that it contributed to the confidence, and a red region means it detracted from the confidence.


  1. AARP recommends older drivers avoid driving during dusk 

  2. ResNet Architecture Paper