Why Data Extraction is Critical to the Digital Transformation of Real Estate Transactions

Modern businesses and the digital technologies that power them rely on the rapid analysis of large amounts of data to drive decisions, automation, reporting and, ultimately, the customer experience. In real estate, mortgage finance, and title and settlement, data fuels everything from web-based home listing searches to mortgage loan application and origination and, increasingly, title and escrow. As the real estate transaction shifts from a paper-intensive process to a more digital experience, the massive amounts of information on mortgage documents and within public land records must be extracted, digitized and turned into structured data before it can be useful.

Collecting this data used to involve the manual keying and re-keying of information, but has evolved to include the use of powerful data extraction technologies, including optical character recognition, in combination with machine learning and cloud computing.

Recently recognized as a HousingWire Tech Trendsetter, First American Data & Analytics Chief Information Officer Calvin Powell shared his thoughts on the evolution of data extraction technology and how that evolution is fueling the next generation of title and escrow automation.

Question: How did you get into data extraction as a professional field?

"I’ve been fortunate to work in the real estate industry with leading businesses across the entire real estate transaction, from the front end with mortgage financing, through title and escrow, and on to the closing and post-closing. Along the way, I’ve been heavily involved in the adoption and application of a variety of leading technologies, including the data extraction and related technologies that are empowering the future of title and escrow automation."

Question: How has data extraction technology evolved over the last 5-10 years?

"The evolution really can be traced back to the 1990s when First American led the industry shift to computer-based title plants. Today, we’re pioneering the automation of title and escrow. Foundational to this evolution is the information on title and mortgage documents and public land records. Initially, those documents were converted to document images, like a scanned document, and indexed in a database via the manual keying and re-keying of only a few key pieces of data, like property address, owners’ names and one or two other data fields. This helped create indexed files of document images that could be quickly located within a title plant.

As we look to fuel title and escrow automation, we need to lift considerably more, if not all, of the data from these title and mortgage documents and public land records. That would have been a daunting process with the technology available 10 years ago, and even more daunting if done manually, as it was 20-30 years ago.

However, our patented advances in optical character recognition technology, combined with the advent of cloud computing and machine learning, allow us to compile the data rapidly and efficiently from these documents in a way that allows it to be analyzed from a risk identification and risk decisioning perspective. The more data you have, the stronger and more robust your risk models become, and First American has the industry’s largest and most comprehensive property and ownership dataset. And we’re adding more data to that dataset every day to maintain that leadership."

Question: What role does data extraction play in supporting the digital transformation of real estate transactions?

"Data extraction is central to building and maintaining the high-quality, accurate and comprehensive datasets needed to fuel digital transformation and automation. Real estate transactions and their accompanying property records have traditionally been very paper intensive. To enable a more digital real estate transaction, all of the information captured in documents within the mortgage finance and real estate transaction process must be turned into structured data that can be analyzed and also packaged for consumption by the technologies used by the other parties in a transaction, such as the lender, realtor, and title and escrow provider.

Our patented data extraction technologies have accelerated our ability to lift data from documents that often have different formats depending on local preferences, customs and other considerations. Once the data is extracted and structured, we can then validate the accuracy of that data with an extremely high level of confidence. One of our advantages as a leading data provider is the vast amount of reference data we hold, which we can use to cross-check the data we extract from documents against other sources, so we can quickly and efficiently determine a confidence level in how well our extraction engines are performing.

The rapid extraction and validation of the data means high-quality data can be fed into our risk decisioning models, enabling us to automate title decisions and help accelerate the real estate transaction process, which benefits the buyer, seller, lender and realtor."
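
To make the cross-checking idea above concrete, here is a minimal Python sketch of how extracted fields might be compared against reference data to estimate how well an extraction engine is performing. It is an illustration only, not a description of First American's patented technology: the `PropertyRecord` fields, the `normalize` rule, the match on `parcel_id` and the simple agreement-rate score are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PropertyRecord:
    # Hypothetical fields for illustration; real documents carry many more.
    parcel_id: str
    owner_name: str
    property_address: str


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are not counted as mismatches."""
    return " ".join(value.lower().split())


def field_agreement(extracted: PropertyRecord, reference: PropertyRecord) -> float:
    """Return the fraction of compared fields whose normalized values match."""
    fields = ("owner_name", "property_address")
    matches = sum(
        normalize(getattr(extracted, name)) == normalize(getattr(reference, name))
        for name in fields
    )
    return matches / len(fields)


def extraction_confidence(
    extracted: List[PropertyRecord],
    reference_by_parcel: Dict[str, PropertyRecord],
) -> Optional[float]:
    """Average field agreement over extracted records that have a reference record.

    Records with no reference entry are skipped. Returns None when nothing
    could be cross-checked.
    """
    scores = [
        field_agreement(record, reference_by_parcel[record.parcel_id])
        for record in extracted
        if record.parcel_id in reference_by_parcel
    ]
    return sum(scores) / len(scores) if scores else None
```

In this sketch a low score would suggest that an extraction engine, or a particular document format, needs review before its output feeds a decisioning model. A production system would likely use fuzzy matching and per-field weighting rather than exact comparison of normalized strings.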

Question: What’s on the horizon for data extraction technology?

"We’re still in the early stages of implementing more advanced modeling and machine learning capabilities. As we continue to enhance the breadth and complexity of our data, we’ll continue to have greater opportunities to leverage more robust models that will help further improve and accelerate our decisioning processes.

We’ve also begun deploying our technology in new ways to help solve other challenges. We’ve architected these solutions so they are easy to pivot and adapt to new use cases. Our CovenantGuard™ solution is a great example. The existence of discriminatory restrictive covenants in county-recorded land records became a real concern across the country over the past several years. In California, a bill was enacted that requires the county recorder of each California county to establish a restrictive covenant modification program to assist in the identification and redaction of unlawfully restrictive covenants in public land records. Depending on the county, these land records can number in the millions and are often paper documents, so sifting through them to locate discriminatory restrictive covenants can be an overwhelming task for county staff. With CovenantGuard, we can leverage our existing data and our patented data extraction technologies to help counties identify where these discriminatory restrictive covenants may exist and help the county redact the discriminatory language.

So, we’re looking at a variety of ways that we can deploy our proprietary technologies to provide value, and it will be exciting to share more details about these projects in the months ahead."
