Vector's Industry Innovation team has released the Computer Vision: Applications in Manufacturing, Surgery, Traffic, Satellites, and Unlabelled Data Recognition Technical Report. It details experiments and insights from the computer vision (CV) project, a multi-phase industrial-academic collaborative project focusing on recent advances in CV, one of the largest and fastest-growing areas of AI.
As part of this project, Arthur Berrill, RBC's Innovation and Technology CTO & Distinguished Technologist, spoke at Vector's recent Computer Vision (CV) Symposium about RBC's unique use case: applying computer vision to satellite imagery to detect geolocation features and track them over time.
This is done by RBC's new Computer Vision Engine, a component of the RBC Brain, the bank's enterprise AI platform. The RBC Brain was launched to make the bank a trusted, tailored partner for every client, built on deep understanding, personalized digital experiences, and expert advisor relationships. It lets the bank iteratively improve client experiences by developing an increasingly accurate understanding of clients' requests.
The RBC Brain's Computer Vision Engine uses a mix of deep learning and location intelligence techniques. It offers a novel way to get accurate, up-to-date predictions across various financial products, and it makes those predictions automatically, at scale, and in surprising detail. The Engine is also notable for another reason: its development required finding an AI solution to a decades-old technical challenge in satellite imagery, computer vision, and location intelligence, a "world-class problem," as they put it.
Staying current on asset values, liabilities, and investments through operational risk management and economic predictions is critical for a high-performing financial institution. A new neighborhood feature near a small business or a personal asset may mean new value that RBC customers could access through personal and commercial banking products. For example, a park might be developed near, or right across from, a commercial unit. Acquiring this knowledge accurately and automatically lets RBC offer newly relevant financial products, as well as more competitive pricing on existing products for its clients. Giving clients the opportunity to use that new value as soon as it appears is a priority for the bank. But keeping track of changes that affect individual businesses and assets is a tall order. The number of assets and businesses is massive, and regularly reviewing them through manual, time-consuming processes to identify new and salient features is an immense challenge. Solving this problem with an automated process would be a real breakthrough, enabling the bank to be proactive and precise about which offerings would be most helpful to which clients, and when.
Satellite imagery, specifically synthetic-aperture radar (SAR) and hyperspectral imagery, provides a key to this breakthrough. Together, these technologies can capture images of buildings and assets, penetrate cloud cover and tree foliage, detect soil permeability, show tree canopy volumes, and distinguish among plant species. This means that remarkably detailed insight about a location and its value can be gleaned from the imagery, including a building's volume (estimated from its outline against the ground), a location's susceptibility to natural disaster (through analysis of the soil's ability to absorb rainfall), and even the carbon sequestration potential of the greenery on a lot (through identification of tree type and number, an insight that could enable the asset owner to register the lot as a carbon sink and engage in carbon trading).
But unlocking such insight to inform timely solutions for clients requires an automated ability to detect those features and to notice changes from shot to shot over the same area.
This is where RBC's computer vision lab comes in, and in particular its work on models that perform instance segmentation. This technique lets a model identify objects in an image and count how many times each appears: it classifies every pixel into a category and also assigns it to a specific object instance. In simple terms, given a picture of three buildings, the model can not only determine which pixels belong to the category "building," but can also recognize that the image contains three separate instances of that category. It's a technique commonly used to analyze X-ray images, support autonomous driving, and perform land-use mapping on satellite imagery.
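As a rough sketch of the difference between a per-pixel "building" mask and separate instances, the Python below takes a binary mask (assumed here to be the output of some semantic segmentation step) and splits it into connected regions, one id per building. This is a classical flood-fill illustration only, not how RBC's Engine or modern instance segmentation networks actually produce instances; the function name `label_instances` and the toy grid are hypothetical.

```python
from collections import deque


def label_instances(mask):
    """Assign an instance id to each 4-connected region of 1-pixels.

    `mask` is a list of equal-length rows of 0/1 values, where 1 means
    "building" and 0 means "background". Returns (labels, count): a grid of
    instance ids (0 = background) and the number of instances found.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 1 and labels[r][c] == 0:
                count += 1  # an unlabelled building pixel starts a new instance
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # flood-fill every building pixel connected to (r, c)
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Three rooftops separated by one-pixel strips of background.
roofs = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1, 0, 1, 1],
]
labels, n = label_instances(roofs)
print(n)  # 3
```

The same "building" class covers every 1-pixel, but the instance ids 1, 2, and 3 are what let a system say there are three separate buildings.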
To understand how it's employed in the Brain's Computer Vision Engine, consider a satellite image of a typical subdivision showing densely packed rows of roofs from above. Each two-dimensional roof in that image corresponds to a building footprint. Using instance segmentation, the Engine can identify each footprint and accurately map its shape. It can then compare these footprints with those from earlier images to pinpoint changes that may indicate new value. Done well, this gives the bank the automation breakthrough that enables fast and precise recommendations, offers, and other services for its personal and commercial clients.
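The comparison step can be sketched by matching footprints between two dates with intersection-over-union (IoU) and reporting any footprint in the newer image that has no counterpart in the older one. The article does not say how the Engine matches footprints; IoU matching, the 0.5 threshold, and the helper names here are assumptions for illustration, with each footprint represented as a set of pixel coordinates.

```python
def iou(a, b):
    """Intersection-over-union of two footprints given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def new_footprints(before, after, threshold=0.5):
    """Return indices of footprints in `after` with no match in `before`.

    A footprint counts as matched when its IoU with some earlier footprint
    is at least `threshold` (an assumed, tunable value).
    """
    return [
        i for i, fp in enumerate(after)
        if all(iou(fp, old) < threshold for old in before)
    ]


def box(r0, c0, h, w):
    """Hypothetical helper: a rectangular footprint h pixels tall and w wide."""
    return {(r, c) for r in range(r0, r0 + h) for c in range(c0, c0 + w)}


before = [box(0, 0, 3, 3), box(0, 5, 3, 3)]
after = [box(0, 0, 3, 3), box(0, 5, 3, 3), box(5, 0, 2, 4)]  # one new building
print(new_footprints(before, after))  # [2]
```

A threshold below 1.0 tolerates small registration shifts between acquisitions, so a building that is slightly misaligned in the newer image is still treated as the same footprint.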
But there's a reason it hasn't been done before. Standing in the way is a thorny technical impediment called the adjacent objects challenge.
"I can tell you from 40-odd years in the location intelligence business that this particular problem hasn't been solved," Berrill says, "and it's a tough problem."
The problem is that when objects in an image have very little space between them, like the building footprints in a dense subdivision, instance segmentation models have a hard time recognizing them as separate. In that example, the models often predict that tightly packed buildings are attached when in fact they're not. Here's why: in the part of the image where two buildings are close together, there are far more pixels representing building footprints than pixels representing the narrow space separating them. When the model classifies the relatively few pixels that represent the space between buildings, it will often assign them to the dominant class in that area; in other words, it predicts that they, too, depict a building footprint. In these situations, models have a hard time "seeing" the boundaries and separation, even when they're distinguishable to the human eye, because each prediction is heavily influenced by the surrounding pixels.
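A toy model makes the mechanism concrete. Below, a 3x3 majority vote stands in for a classifier whose per-pixel decision is dominated by surrounding context (an assumed simplification; real networks are far more complex). Two buildings separated by a one-pixel gap are correctly counted as two instances in the ground truth, but after the vote the gap pixels are outnumbered by building pixels, get relabelled as building, and the two footprints merge into one. The helpers `majority_smooth` and `count_instances` are hypothetical.

```python
def majority_smooth(mask):
    """Toy stand-in for a context-heavy classifier: each pixel takes the
    majority class of its 3x3 neighbourhood (clipped at the image border)."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [mask[y][x]
                      for y in range(max(0, r - 1), min(rows, r + 2))
                      for x in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = 1 if 2 * sum(window) > len(window) else 0
    return out


def count_instances(mask):
    """Count 4-connected regions of 1-pixels with an iterative flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and (r, c) not in seen:
                count += 1
                seen.add((r, c))
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return count


# Two 4x3 buildings separated by a single column of background pixels.
ground_truth = [[1, 1, 1, 0, 1, 1, 1] for _ in range(4)]
predicted = majority_smooth(ground_truth)
print(count_instances(ground_truth), count_instances(predicted))  # 2 1
```

Every gap pixel sees six building pixels out of nine in its window (four out of six at the top and bottom rows), so the vote always goes to "building": the "warped townhouse" effect in miniature.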
"This often happens when pixels belonging to multiple objects exist in close vicinity to one another and is amplified by the data imbalance problem pervasive in such large aerial and satellite imagery," says Dr. Ehsan Amjadian, Head of Data Science at RBC and an Adjunct Professor of Computer Science at the University of Waterloo. "There's a much greater number of non-boundary pixels within the aerial and satellite images than boundary pixels."
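The imbalance Dr. Amjadian describes can be measured directly. The sketch below marks boundary pixels (defined here, as an assumption, as building pixels with a 4-neighbour carrying a different label) in a hypothetical 100x100 tile, counts them against everything else, and derives inverse-frequency class weights of the kind often used in a weighted cross-entropy loss. Loss reweighting is a common generic remedy shown only to quantify the imbalance; it is not the VAE-based approach RBC developed.

```python
def boundary_mask(labels):
    """Mark building pixels (label > 0) that touch a 4-neighbour with a different label."""
    rows, cols = len(labels), len(labels[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] == 0:
                continue  # background pixels are not counted as boundary here
            for ny, nx in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= ny < rows and 0 <= nx < cols and labels[ny][nx] != labels[r][c]:
                    out[r][c] = True
                    break
    return out


def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * class_count), so rare classes weigh more."""
    total = sum(counts.values())
    return {k: total / (len(counts) * n) for k, n in counts.items()}


# Hypothetical 100x100 tile: two 40x40 buildings separated by a one-pixel gap (column 49).
labels = [[0] * 100 for _ in range(100)]
for r in range(20, 60):
    for c in range(9, 49):
        labels[r][c] = 1
    for c in range(50, 90):
        labels[r][c] = 2

edges = boundary_mask(labels)
n_boundary = sum(row.count(True) for row in edges)
counts = {"boundary": n_boundary, "other": 100 * 100 - n_boundary}
weights = inverse_frequency_weights(counts)
print(counts)  # {'boundary': 312, 'other': 9688}
print({k: round(w, 2) for k, w in weights.items()})  # {'boundary': 16.03, 'other': 0.52}
```

Even with two buildings only one pixel apart, boundary pixels make up about 3% of this tile, so an unweighted per-pixel loss gives them very little influence on training.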
Because of this, the model may predict that two buildings are attached in some way, often producing something that looks like a warped townhouse, when in reality they're distinct. Left unsolved, this problem would undermine the Computer Vision Engine's reliability.
Enter Dr. Elham Ahmadi, a Data Science Lead at RBC and the technical lead for RBC's computer vision practice. It was Dr. Ahmadi who cracked this nut. That metaphor is no accident: the idea for a solution came to her while she was working on an entirely different problem in the computer vision domain, specifically applying a variational autoencoder to identify defects in nuts and bolts on a manufacturing line. That work was done in a Vector Institute project focused on computer vision.
Ahmadi explains: "The concept, the variational autoencoder, was used in the anomaly detection with a different architecture and for another purpose in the Vector project. But it sparked an idea: we can apply a new architecture and a new method based on variational autoencoders to solve the problem of the aerial images."
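For context on the manufacturing use case, VAE-based anomaly detection commonly works by training the model only on defect-free examples and then flagging inputs it reconstructs poorly. The sketch below shows just that scoring step; the reconstructions are hard-coded stand-ins for a trained VAE's output, and the four-pixel "images", the helper names, and the 0.05 threshold are hypothetical. It does not reproduce the Vector project's architecture.

```python
def reconstruction_error(x, x_hat):
    """Mean squared error between a flattened input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def flag_anomalies(inputs, reconstructions, threshold):
    """True for each sample whose reconstruction error exceeds `threshold`."""
    return [reconstruction_error(x, xh) > threshold
            for x, xh in zip(inputs, reconstructions)]


# Stand-ins: a VAE trained on normal bolts reproduces the normal pattern,
# so the defective pixel (0.1) is "repaired" in its reconstruction.
inputs = [[0.2, 0.8, 0.8, 0.2],   # normal part
          [0.2, 0.8, 0.1, 0.2]]   # part with a defect
reconstructions = [[0.25, 0.75, 0.8, 0.2],
                   [0.2, 0.8, 0.8, 0.2]]
print(flag_anomalies(inputs, reconstructions, threshold=0.05))  # [False, True]
```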
The inner workings of variational autoencoders are complex, but it's enough to understand that their design makes them particularly good at pixel-level classification, even when there are very few pixels to go on. Dr. Ahmadi's innovative idea was to modify variational autoencoders so they could precisely analyze challenging satellite images, a critical capability for the RBC Brain's proprietary Computer Vision Engine.
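For readers who want a little more of those inner workings, a standard VAE is trained by maximizing the evidence lower bound (ELBO) below, with an encoder $q_\phi(z \mid x)$, a decoder $p_\theta(x \mid z)$, and a prior $p(z)$. This is the textbook objective only; the article does not describe the loss or architecture of RBC's modified, patent-pending model.

$$
\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
$$

With a Gaussian encoder $q_\phi(z \mid x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$ over a $d$-dimensional latent space and prior $p(z) = \mathcal{N}(0, I)$, the KL term has a closed form, and sampling is made differentiable with the reparameterization $z = \mu + \sigma \odot \epsilon$, $\epsilon \sim \mathcal{N}(0, I)$:

$$
D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big) = \frac{1}{2} \sum_{j=1}^{d} \left( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \right)
$$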
"To the best of our knowledge, it's the first time that such a corrective architecture has been employed to tackle the well-known pixel imbalance problem in satellite images. This innovation has finally solved the problem," says Dr. Amjadian. The superior resulting boundaries are shown in the rightmost column of Figure 1 above.
Berrill agrees: "Being able to do this automatically is a big breakthrough."
This new approach is patent pending, and the capability it enables, which brings AI, location intelligence, and satellite imagery together, has the potential to significantly change the speed and precision of the bank's product marketing and, in turn, asset and business owners' access to wealth.
RBC's Innovation and Technology CTO Arthur Berrill started with a question: Why would a bank be interested in CV? With this innovation, the answer should be clear. The next questions are what value RBC can unlock, how much intellectual property it can create, and how many clients it can serve by staying at computer vision's cutting edge. The bank is working on these questions as part of a commitment that includes collaboration with ecosystem partners like Vector.
Full descriptions of the technical implementations and results of each use case from Vector's Computer Vision Symposium are provided in the report. The project toolkit includes various datasets and useful image and video tools, such as data augmentation and visualization utilities, provided by the Vector AI Engineering team. The project code is available in the Computer Vision Project Repo.