Isma Hadji
PhD candidate at York University
e-mail: hadjisma [@] cse.yorku.ca
Research Interests:

I have a strong interest in computer vision and machine learning in general. My PhD research focuses on analytically defined representations with applications to video analysis. In particular, I aim to inject more domain priors into convolutional network design, to gain a better understanding of the resulting representations and ultimately to design more efficient architectures.
Applications I am interested in include recognition and detection in video, as well as video content synthesis.



A New Large Scale Dynamic Texture Dataset with Application to ConvNet Understanding
Isma Hadji and Richard P. Wildes
European Conference on Computer Vision (ECCV) 2018


This paper introduces a new large-scale dynamic texture dataset. With over 10,000 videos, our Dynamic Texture DataBase (DTDB) is two orders of magnitude larger than any previously available dynamic texture dataset. DTDB comes with two complementary organizations, one based on dynamics independent of spatial appearance and one based on spatial appearance independent of dynamics. These complementary organizations allow for uniquely insightful experiments regarding the abilities of major classes of spatiotemporal ConvNet architectures to exploit appearance vs. dynamic information. We also present a new two-stream ConvNet that provides an alternative to the standard optical-flow-based motion stream, broadening the range of dynamic patterns that can be encompassed. The resulting motion stream is shown to outperform the traditional optical flow stream by considerable margins. Finally, the utility of DTDB as a pretraining substrate is demonstrated via transfer learning on a different dynamic texture dataset, as well as on the companion task of dynamic scene recognition, resulting in a new state-of-the-art.
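To make the two-stream idea concrete, here is a minimal late-fusion sketch (an illustration only, not the paper's architecture): each stream produces class scores, which are converted to probabilities and averaged. The array names and the equal weighting are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def late_fusion(appearance_logits, motion_logits, w=0.5):
    """Fuse per-stream class scores by a weighted average of probabilities."""
    return w * softmax(appearance_logits) + (1.0 - w) * softmax(motion_logits)
```

With equal weights, a confident motion stream can override a weakly confident appearance stream, which is the usual motivation for fusing at the score level rather than picking a single stream's prediction.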

| PDF | Project website | Poster | Video summary | Bibtex

@inproceedings{hadji2018dtdb,
  author    = {I. Hadji and R. P. Wildes},
  title     = {A New Large Scale Dynamic Texture Dataset with Application to ConvNet Understanding},
  booktitle = {ECCV},
  year      = {2018},
}

What Do We Understand About Convolutional Networks?
Isma Hadji
arXiv e-print 2018

This document reviews the most prominent proposals using multilayer convolutional architectures. Importantly, the various components of a typical convolutional network are discussed through a review of different approaches that base their design decisions on biological findings and/or sound theoretical bases. In addition, the different attempts at understanding ConvNets via visualizations and empirical studies are reviewed. The ultimate goal is to shed light on the role of each layer of processing involved in a ConvNet architecture, distill what we currently understand about ConvNets, and highlight critical open problems.

| PDF | Slides | Bibtex

@article{hadji2018understanding,
  author  = {I. Hadji},
  title   = {What Do We Understand About Convolutional Networks?},
  journal = {arXiv},
  volume  = {1803.08834},
  year    = {2018},
}

A Spatiotemporal Oriented Energy Network for Dynamic Texture Recognition
Isma Hadji and Richard P. Wildes
IEEE International Conference on Computer Vision (ICCV) 2017 (spotlight paper, 2.61% acceptance rate)


This paper presents a novel hierarchical spatiotemporal orientation representation for spacetime image analysis. It is designed to combine the benefits of the multilayer architecture of ConvNets with a more controlled approach to spacetime analysis. A distinguishing aspect of the approach is that, unlike most contemporary convolutional networks, no learning is involved; rather, all design decisions are specified analytically with theoretical motivations. This approach makes it possible to understand what information is being extracted at each stage and layer of processing, as well as to minimize heuristic choices in design. Another key aspect of the network is its recurrent nature, whereby the output of each layer of processing feeds back to the input. To keep the network size manageable across layers, a novel cross-channel feature pooling is proposed. The resulting multilayer architecture systematically reveals hierarchical image structure in terms of multiscale, multiorientation properties of visual spacetime. To illustrate its utility, the network has been applied to the task of dynamic texture recognition. Empirical evaluation on multiple standard datasets shows that it sets a new state-of-the-art.
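To give a flavor of analytically defined spatiotemporal oriented energies, the fragment below is a simplified sketch, not the network itself (the full model uses higher-order Gaussian derivative filters, recurrence, and cross-channel pooling): it computes first-order directional derivative energies over a video volume and applies divisive normalization across orientation channels, so each channel reports the relative dominance of its orientation.

```python
import numpy as np
from scipy import ndimage

def oriented_energy_layer(video, sigma=1.5, eps=1e-8):
    """Illustrative oriented-energy filtering on a (T, H, W) grayscale volume."""
    # Gaussian-smoothed partial derivatives along t, y and x.
    grads = [ndimage.gaussian_filter(video, sigma, order=o)
             for o in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]]
    # A few spacetime orientations as unit vectors in (t, y, x).
    dirs = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                     [1, 1, 0], [1, 0, 1], [0, 1, 1]], float)
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Directional derivative = u . grad; energy = squared (rectified) response.
    energies = np.stack([(d[0] * grads[0] + d[1] * grads[1] + d[2] * grads[2]) ** 2
                         for d in dirs])
    # Divisive normalization across orientation channels.
    return energies / (energies.sum(axis=0, keepdims=True) + eps)
```

Because every filter and the normalization are specified in closed form, one can state exactly what each channel measures, which is the sense in which such a design trades learned parameters for interpretability.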

| PDF | Project website | Poster | Code | Talk | Bibtex

@inproceedings{hadji2017soenet,
  author    = {I. Hadji and R. P. Wildes},
  title     = {A Spatiotemporal Oriented Energy Network for Dynamic Texture Recognition},
  booktitle = {ICCV},
  year      = {2017},
}

Local-to-Global Signature Descriptor for 3D Object Recognition
Isma Hadji and Guilherme N. DeSouza
Workshop on Robust Local Descriptors, IEEE Asian Conference on Computer Vision (ACCV) 2014

In this paper, we present a novel 3D descriptor that bridges the gap between global and local approaches. While local descriptors have proved to be the more attractive choice for object recognition within cluttered scenes, they remain less discriminating precisely due to the limited scope of the local neighborhood. On the other hand, global descriptors can better capture relationships between distant points, but are generally affected by occlusions and clutter. We therefore propose the Local-to-Global Signature (LGS) descriptor, which relies on surface point classification together with signature-based features to overcome the drawbacks of both local and global approaches. As our tests demonstrate, the proposed LGS captures the structure of objects more robustly while remaining resilient to clutter and occlusion and avoiding sensitive, low-level features such as point normals. Tests performed on four different datasets demonstrate the robustness of the proposed LGS descriptor when compared to three state-of-the-art descriptors: SHOT, Spin Images, and FPFH. In general, LGS outperformed all three descriptors, on some datasets with a 50-70% increase in recall.

| PDF | Slides | Code | Bibtex

@inproceedings{hadji2014lgs,
  author    = {I. Hadji and G. N. DeSouza},
  title     = {Local-to-Global Signature Descriptor for 3D Object Recognition},
  booktitle = {wACCV},
  year      = {2014},
}

Least Expected Features for 3D Keypoint Detection
Isma Hadji and Guilherme N. DeSouza
Technical Report, EECS, University of Missouri, 2014


Most object recognition algorithms rely on the detection of a subset of important or discriminative visual stimuli (keypoints) as a first step towards the description of those objects. Independently of the type of 3D feature used, all 3D detectors rely on a local criterion for keypoint selection. This criterion is usually a point-wise saliency measure based on experimentally learnt thresholds. In this research, we question both the threshold-based approach and the local character of traditional 3D keypoint detection schemes. First, we decouple keypoint selection from experimentally learnt thresholds that depend on low-level features for saliency detection. To this end, we propose the Least Expected Feature criterion (LEFT) for saliency detection. Second, we introduce the concept of finding keypoints using a global approach, as opposed to the more traditional local, neighborhood-based approaches. It turns out that adopting the proposed global LEFT criterion allows for the selection of very distinctive keypoints across the entire object while avoiding sensitive and noisy regions. Our LEFT criterion selects only outstanding points, as opposed to traditional detectors that select points across the entire object, even in smooth, non-salient regions.
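As a toy illustration of the "least expected" idea (my simplification, not the report's actual method), one can score every point by the estimated density of its feature value over the whole object and keep the globally rarest points; no hand-tuned saliency threshold is involved. The scalar-feature input and the histogram density estimate are illustrative assumptions.

```python
import numpy as np

def least_expected_keypoints(features, k=10, bins=16):
    """Return indices of the k points whose scalar feature value is rarest
    across the whole object (a global, threshold-free selection)."""
    # Estimate the feature density over all points with a histogram.
    hist, edges = np.histogram(features, bins=bins, density=True)
    # Look up each point's bin density (internal edges give bins 0..bins-1).
    bin_idx = np.clip(np.digitize(features, edges[1:-1]), 0, bins - 1)
    density = hist[bin_idx]
    # The least expected points are those in the lowest-density bins.
    return np.argsort(density)[:k]
```

For instance, on an object where 90% of points share a common feature value, the selection concentrates on the minority of points with unusual values, which mirrors the contrast drawn above with detectors that fire even in smooth, non-salient regions.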

| PDF | Code | Bibtex

@techreport{hadji2014left,
  author      = {I. Hadji and G. N. DeSouza},
  title       = {Bridging the Gap Between Local and Global Approaches for 3D Object Recognition},
  institution = {University of Missouri-Columbia},
  year        = {2014},
}