
1. Why does our implementation always seem a bit better than other implementations in general?

e.g. DeepLab on VOC, ERFNet on Cityscapes, CULane, etc. For all methods, thanks to the memory saved by mixed precision training, we use relatively large batch sizes (at least 4 or 8) and large learning rates, and we do not freeze BN parameters. Also, checkpointing of the best model is applied for segmentation.
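
A minimal sketch of this training recipe with PyTorch automatic mixed precision (the `model`, data loaders, and `evaluate` function here are hypothetical placeholders, not the repo's actual training utilities):

```python
import torch
from torch.cuda.amp import GradScaler, autocast

# `model`, `train_loader`, `val_loader` and `evaluate` are hypothetical placeholders,
# not the actual PytorchAutoDrive training utilities.
def train(model, train_loader, val_loader, evaluate, epochs=30, lr=0.01):
    model = model.cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss(ignore_index=255)
    scaler = GradScaler()  # mixed precision saves memory, enabling larger batches
    best_metric = 0.0

    for epoch in range(epochs):
        model.train()  # BN statistics and affine parameters stay trainable (not frozen)
        for images, targets in train_loader:
            images, targets = images.cuda(), targets.cuda()
            optimizer.zero_grad()
            with autocast():  # forward pass and loss in mixed precision
                loss = criterion(model(images), targets)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()

        metric = evaluate(model, val_loader)  # e.g. mIoU for segmentation
        if metric > best_metric:  # checkpoint the best model on the validation set
            best_metric = metric
            torch.save(model.state_dict(), 'best.pt')
```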

2. Why does our code seem to run faster?

We mostly employ mixed precision training, which is fast. We also try to optimize GPU-to-CPU conversions and I/O whenever we can so that they do not become a bottleneck, e.g. in lane detection testing. And we support arbitrary batch sizes in test inference.
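
For illustration, a hypothetical sketch of the idea of avoiding per-sample GPU-to-CPU transfers during testing (not the repo's actual test code):

```python
import torch

@torch.no_grad()
def batched_inference(model, loader):
    """Hypothetical sketch: run test inference at an arbitrary batch size and
    avoid per-sample GPU-to-CPU transfers by converting results in one call."""
    model.eval().cuda()
    all_preds = []
    for images in loader:  # the test batch size is independent of training
        with torch.cuda.amp.autocast():  # mixed precision also speeds up inference
            logits = model(images.cuda(non_blocking=True))
        all_preds.append(logits.argmax(dim=1))  # stays on GPU inside the loop
    # a single GPU-to-CPU conversion at the end instead of one per sample
    return torch.cat(all_preds).cpu()
```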

3. Why is our SCNN implementation better than the original but appears worse on TuSimple?

If you look at the VGG16-SCNN results on CULane, our re-implementation is no doubt far superior (over 1.5% improvement), mainly because the original SCNN is in old Torch7, while we are based on the modern torchvision backbone implementations and ImageNet pre-training. We also did a simple grid search for the learning rate on the validation set. However, the original SCNN is often reported to achieve 96.53% on TuSimple in the literature, much higher than ours. That is mainly because it was a competition entry; there is no way we (using only the train set) can compete with that. Note that the original SCNN paper never claimed that a TuSimple experiment without competition tricks can reach that performance.

4. Why do our Baselines in lane detection have surprisingly good performance?

As methods become more advanced, they are less dependent on hyper-parameters (such as learning rate or backbone). For instance, our re-implemented SCNN achieves similarly good results across VGG, ResNets and ERFNet. Baselines, on the other hand, are the most sensitive to hyper-parameters. Unlike other researchers, who tend not to tune hyper-parameters on Baselines (we can't really blame them, their resources are focused on their own methods), we conduct the same learning rate grid search on Baselines (sketched below). As a result, we find that larger learning rates bring much better Baseline performance.
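
The grid search itself is nothing fancy; a hypothetical sketch (where `train_and_validate` stands in for a full training run that returns a validation metric):

```python
def grid_search_lr(train_and_validate, candidate_lrs=(0.01, 0.02, 0.05, 0.1, 0.2)):
    """`train_and_validate(lr)` is a placeholder for a full training run that
    returns a validation metric (e.g. F1 on the CULane val set). The candidate
    learning rates are illustrative assumptions."""
    best_lr, best_metric = None, float('-inf')
    for lr in candidate_lrs:
        metric = train_and_validate(lr)
        if metric > best_metric:
            best_lr, best_metric = lr, metric
    return best_lr, best_metric
```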

5. ResNet backbone in lane detection

In this field, we find the seminal SCNN paper provides the most precise evaluation of ResNet baselines. However, for ResNet backbones we follow RESA and reduce the channel number to 128, so that adding SCNN/RESA remains computationally feasible. The SCNN paper did not reduce channels and further applied dropout on ResNet-101, while we find the RESA configuration brings similarly good performance at a lower compute budget. We also support the modified lightweight ResNet-18 from LSTR, which is called ResNet18s in this repo.
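
A rough sketch of the RESA-style channel reduction described above (module names, the ResNet variant, and the exact layer split are assumptions, not the repo's actual code):

```python
import torch.nn as nn
import torchvision

class ReducedResNetBackbone(nn.Module):
    """Illustrative sketch: a torchvision ResNet trunk followed by a 1x1 conv
    that reduces the feature channels to 128 before SCNN/RESA-style modules."""
    def __init__(self, out_channels=128):
        super().__init__()
        resnet = torchvision.models.resnet34(pretrained=True)  # ImageNet pre-training
        # keep everything up to the last residual stage (drops avgpool/fc), -> 512 channels
        self.trunk = nn.Sequential(*list(resnet.children())[:-2])
        self.reduce = nn.Conv2d(512, out_channels, kernel_size=1, bias=False)

    def forward(self, x):
        return self.reduce(self.trunk(x))
```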

6. Why is our LSTR performance lower than the official implementation on TuSimple?

We did not use the transformer's masking or frozen batch norm, we use only 2 classes (lane/not lane), and we use a larger learning rate & batch size for faster convergence (our training time on an RTX 2080 Ti is only ~1/3 of the original implementation's). These design choices do not degrade performance. The reason for a lower accuracy is that we use the train set, while the original paper's performance (>96%) is obtained on train+val. The performance we got from the original code on train is Accuracy: 95.25%, FP: 0.0647, FN: 0.0454, while our best model achieves Accuracy: 95.06%, FP: 0.0486, FN: 0.0418. The results are similar, except we have a notably lower FP, since we restricted the maximum number of lanes to <=5 for fair comparison with segmentation methods (which assume this prior knowledge in network design & post-processing).
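
A hypothetical sketch of what such a <=5 lane restriction can look like in post-processing (names and the threshold are illustrative, not LSTR's or our actual code):

```python
import torch

def keep_top_lanes(lane_scores, lane_curves, max_lanes=5, threshold=0.5):
    """Keep at most `max_lanes` predicted lanes, ranked by confidence.
    lane_scores: (N,) confidence per predicted lane; lane_curves: (N, ...) curve params."""
    keep = lane_scores > threshold          # drop low-confidence predictions first
    scores, curves = lane_scores[keep], lane_curves[keep]
    if scores.numel() > max_lanes:          # then cap the number of lanes
        top = torch.topk(scores, k=max_lanes).indices
        scores, curves = scores[top], curves[top]
    return scores, curves
```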

7. Why is the SAD implementation indefinitely postponed?

The reproducibility of this method has always been an issue, especially when there are engineering tricks that were not properly documented. For example, our baseline ENet already gets 95.61% on TuSimple with train alone; it could possibly reach the paper's ENet-SAD performance with train+val. We speculate that the gains of SAD are mostly illusions from imperfect baseline hyper-parameter tuning, and can mostly be achieved with learning rate adjustments (note that this is merely speculation, and should not be used as proof in any form of scientific paper!). Until we successfully obtain performance gains from this method, we will not post its code in this repo. Anyone is, however, encouraged to try SAD and make a pull request!

8. The effect of strong data augmentation on different lane detection approaches

Interestingly, segmentation-based methods (e.g. SCNN) do not overfit much, while non-segmentation methods (e.g. LSTR) easily overfit the data. Strong data augmentation (color distortion, flipping, rotating, cropping) is crucial for methods such as LSTR to work at all. However, it can introduce too much disturbance and degrade segmentation methods' performance.
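
As an illustration, a torchvision-style composition of such strong augmentations (the parameters are assumptions; in lane detection the geometric transforms must also be applied to the lane labels with the same random parameters, which this image-only sketch omits):

```python
import torchvision.transforms as T

# Illustrative strong-augmentation pipeline, image side only.
strong_augmentation = T.Compose([
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.1),  # color distortion
    T.RandomHorizontalFlip(p=0.5),                                         # flipping
    T.RandomRotation(degrees=10),                                          # rotating
    T.RandomResizedCrop(size=(360, 640), scale=(0.8, 1.0)),                # cropping
    T.ToTensor(),
])
```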

9. BézierLaneNet disclaimer

Our BézierLaneNet is the CVPR 2022 Feng et al. paper, which is unrelated to another lane detection system with (almost) the same name: mo-vic/BezierLaneNet. That BezierLaneNet directly uses PolyLaneNet to optimize Bézier curves, and similarly found a sampling loss useful, as BézierLaneNet does. Although it was never evaluated on any benchmark or written up in any academic paper or report, the two works are independent findings. The PytorchAutoDrive team and the BézierLaneNet authors recommend recognizing them together (on the matter of introducing Bézier curves to lane detection), although whether to do that academically is left to one's own deliberation.
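
For context, a minimal sketch of what optimizing Bézier curves with a sampling loss can look like (cubic curves and the L1 distance are assumptions here, not a claim about either implementation):

```python
import torch
import torch.nn.functional as F

def sample_cubic_bezier(control_points, num_samples=100):
    """Sample points on cubic Bézier curves from their 4 control points.
    control_points: (B, 4, 2) tensor; returns (B, num_samples, 2)."""
    t = torch.linspace(0, 1, num_samples, device=control_points.device).view(1, -1, 1)
    p0, p1, p2, p3 = (p.unsqueeze(1) for p in control_points.unbind(dim=1))
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

def sampling_loss(pred_ctrl, gt_ctrl):
    # L1 distance between points sampled on the predicted and ground-truth curves
    return F.l1_loss(sample_cubic_bezier(pred_ctrl), sample_cubic_bezier(gt_ctrl))
```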
