Instant Comparable Set for Analytics
CompStak’s Property CompSet takes a systematic, data-driven approach to generating comparable property sets instantly. Rather than relying on time-consuming, attribute-by-attribute filters, the CompSet uses a bottom-up methodology to automatically identify the most comparable properties for a given subject asset. Drawing on CompStak’s extensive property database and machine learning techniques, the system objectively segments asset classes and normalizes key market metrics. This lets it rank comparable properties free from the bias introduced by buyer or seller perspectives, giving analysts a ready-to-use competitive set that accelerates decision-making.
Methodology: A Two-Step Inference Pipeline
Data & Attributes
- The process begins with a comprehensive dataset enriched with key property attributes. These attributes are normalized against market-specific distributions to ensure consistent comparisons across diverse regions and asset classes.
- Differences in building class, age, low-rise versus high-rise characteristics, and other factors can lead to significant variations in historical data, causing deviations that a simple average or weighted average approach cannot adequately address. These methods fail to capture the nuanced dynamics of the commercial real estate (CRE) market. To overcome these complexities, a comprehensive market rent index is necessary. This approach assumes that, all other factors being equal, macro market momentum is the primary force influencing rent levels. By treating variations in property and lease-level characteristics as independent variables and isolating the effect of time, the market rent index effectively captures rent changes driven by overall market trends, offering a robust tool for understanding and predicting market dynamics.
Step 1 – Nearest Neighbors Search
In the initial phase, a vector database search is used to reduce the search space dramatically through a multi-stage filtering process against all CompStak properties:
Vectorization Process
Key property attributes are transformed into two distinct vector representations: location vectors (derived from geographic coordinates) and property feature vectors (containing normalized property attributes such as property size).
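The two vector types can be illustrated with a minimal sketch. The source does not specify the actual encodings, so everything here is an assumption: the location vector maps latitude/longitude to a point on the unit sphere, and the feature vector min-max scales each attribute against hypothetical market ranges. The function names and sample values are illustrative only.

```python
import math

def location_vector(lat_deg, lon_deg):
    """Map latitude/longitude to a 3-D unit vector (assumed encoding)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )

def feature_vector(attrs, market_ranges):
    """Min-max scale each attribute against (hypothetical) market ranges."""
    vec = []
    for name, (lo, hi) in market_ranges.items():
        scaled = (attrs[name] - lo) / (hi - lo) if hi > lo else 0.0
        vec.append(min(1.0, max(0.0, scaled)))  # clamp to [0, 1]
    return vec

# Hypothetical subject property and market ranges.
subject = {"size_sqft": 250_000, "floors": 20, "year_built": 1985}
ranges = {
    "size_sqft": (10_000, 1_000_000),
    "floors": (1, 60),
    "year_built": (1900, 2025),
}

loc = location_vector(40.7549, -73.9840)
feat = feature_vector(subject, ranges)
```

Placing locations on the unit sphere means that nearby properties have nearby vectors, so a standard vector index can approximate geographic proximity without special handling of longitude wrap-around.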
Hybrid Search Architecture
- Location-Based Search: Utilizes spatial vectors to identify properties within geographic proximity, accounting for market boundaries and building density
- Attribute-Based Search: Simultaneously searches using property feature vectors to find functionally similar properties regardless of location
- Dual Vector Strategy: Both searches operate in parallel, allowing the system to capture both spatial and characteristic-based similarity. A hybrid algorithm uses calibrated weights to combine similarity scores from both location-based and attribute-based search.
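As a rough illustration of the dual vector strategy, the sketch below combines a location similarity and an attribute similarity with a single weight. The use of cosine similarity, the `location_weight` value of 0.6, and the toy vectors are all assumptions made for this example; the calibrated weights used in production are not given in the source.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def hybrid_score(loc_sim, attr_sim, location_weight=0.6):
    """Weighted blend of the two similarity scores (weight is hypothetical)."""
    return location_weight * loc_sim + (1.0 - location_weight) * attr_sim

def rank_candidates(subject, candidates, location_weight=0.6):
    """Score every candidate against the subject and sort best-first."""
    scored = []
    for cand in candidates:
        loc_sim = cosine_similarity(subject["loc"], cand["loc"])
        attr_sim = cosine_similarity(subject["feat"], cand["feat"])
        scored.append((cand["id"], hybrid_score(loc_sim, attr_sim, location_weight)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors: A matches on both axes, B only on attributes, C only on location.
subject = {"loc": (1.0, 0.0), "feat": (0.5, 0.5)}
candidates = [
    {"id": "A", "loc": (1.0, 0.0), "feat": (0.5, 0.5)},
    {"id": "B", "loc": (0.0, 1.0), "feat": (0.5, 0.5)},
    {"id": "C", "loc": (1.0, 0.0), "feat": (1.0, 0.0)},
]
ranked = rank_candidates(subject, candidates)
```

With location weighted more heavily than attributes, a nearby but less similar property (C) outranks a distant but functionally identical one (B). Changing the weight shifts that balance.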
Intelligent Filtering & Grouping
- Market-specific filters prevent cross-market contamination while maintaining sufficient candidate pool size
- Property type filters (office, retail, industrial) ensure category-appropriate comparisons
- Optional property subtype filtering provides additional granularity for specialized property subtype categories, such as Data Centers or Self Storage locations
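The filters above can be sketched as a simple predicate over candidate records. The field names (`market`, `property_type`, `subtype`) and the sample records are hypothetical and are not taken from CompStak's schema.

```python
def filter_candidates(properties, market, property_type, subtype=None):
    """Keep properties in the same market and property type.

    The subtype filter is applied only when a subtype is supplied.
    """
    result = []
    for prop in properties:
        if prop["market"] != market:
            continue
        if prop["property_type"] != property_type:
            continue
        if subtype is not None and prop.get("subtype") != subtype:
            continue
        result.append(prop)
    return result

# Hypothetical records.
properties = [
    {"id": 1, "market": "New York", "property_type": "office", "subtype": None},
    {"id": 2, "market": "New York", "property_type": "industrial", "subtype": "Data Center"},
    {"id": 3, "market": "New Jersey", "property_type": "industrial", "subtype": "Warehouse"},
    {"id": 4, "market": "New York", "property_type": "industrial", "subtype": "Warehouse"},
]

ny_industrial = filter_candidates(properties, "New York", "industrial")
ny_data_centers = filter_candidates(properties, "New York", "industrial", subtype="Data Center")
```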
Performance Optimization Strategy
- Market-based grouping reduces computational overhead by limiting search scope to relevant geographic regions
- Asynchronous processing enables concurrent searches across multiple market/property type combinations
- The system efficiently processes millions of properties, typically returning 500-1000 highly relevant candidates within milliseconds
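One way to picture the concurrent, market-grouped searches is with Python's `asyncio`. This is a sketch under stated assumptions: `search_group` is a hypothetical stand-in that scans an in-memory list, whereas a real implementation would await a vector database query.

```python
import asyncio

async def search_group(market, property_type, index):
    """Stand-in for one scoped vector search (hypothetical)."""
    await asyncio.sleep(0)  # yield control, simulating an I/O wait
    return [
        p["id"]
        for p in index
        if p["market"] == market and p["property_type"] == property_type
    ]

async def search_all(groups, index):
    """Run one search per (market, property type) group concurrently."""
    tasks = [search_group(market, ptype, index) for market, ptype in groups]
    results = await asyncio.gather(*tasks)  # results keep the order of `groups`
    return dict(zip(groups, results))

# Hypothetical in-memory index.
index = [
    {"id": 1, "market": "New York", "property_type": "office"},
    {"id": 2, "market": "New Jersey", "property_type": "industrial"},
    {"id": 3, "market": "New York", "property_type": "office"},
]
groups = [("New York", "office"), ("New Jersey", "industrial")]
found = asyncio.run(search_all(groups, index))
```

Because each group limits its search to one market and property type, the individual searches are independent and can be awaited together without coordination.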
This multi-layered approach ensures that the initial candidate selection captures both the most geographically proximate and functionally similar properties, creating an optimal foundation for the subsequent machine learning refinement phase.
Step 2 – Machine Learning Refinement
The candidate set from Step 1 is further analyzed using a tree-based XGBoost model that employs market-contextualized feature engineering:
Market-Relative Feature Analysis
Rather than using raw property differences, the model employs percentile-based comparisons that account for market-specific norms. For example, a 10-floor difference between office properties in Midtown Manhattan represents a much smaller relative variance than the same difference between industrial properties in New Jersey, where building heights are typically more uniform.
Property Type-Specific Feature Sets
Office Properties:
- Property Attributes: Key attributes include property size, floors, and year built, evaluated relative to local office market norms
- Lease Attributes: Office-to-retail ratios, flex space allocation, and mixed-use space composition based on tenant lease data
- Market Position: Rent levels compared to local office market distributions
- Geographic Location: Proximity and submarket positioning within the broader office ecosystem
Industrial Properties:
- Property Attributes: Key attributes include ceiling height, property size, year built, and number of loading docks
- Lease Attributes: Tenant industry makeup and distribution, and occupancy patterns
- Property Subtype Similarity: Warehouse, manufacturing, flex industrial, and distribution center classifications
Market Normalization Process
- Percentile Ranking: All quantitative features are converted to percentile rankings within their respective markets
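The effect of percentile ranking can be shown with a small worked example. The floor counts below are invented for illustration, and the rank definition used (share of market values less than or equal to the given value) is one common convention, assumed here rather than taken from the source.

```python
from bisect import bisect_right

def percentile_rank(value, market_values):
    """Share of market values less than or equal to `value`, in [0, 1]."""
    ordered = sorted(market_values)
    return bisect_right(ordered, value) / len(ordered)

def percentile_gap(a, b, market_values):
    """Absolute difference between two values' percentile ranks."""
    return abs(percentile_rank(a, market_values) - percentile_rank(b, market_values))

# Hypothetical floor counts for two markets.
office_floors = [5, 12, 20, 25, 30, 35, 40, 45, 50, 60]
industrial_floors = [1, 1, 1, 1, 1, 2, 2, 2, 3, 11]

# The same raw 10-floor difference, measured within each market.
office_gap = percentile_gap(30, 40, office_floors)
industrial_gap = percentile_gap(1, 11, industrial_floors)
```

In this toy data the 10-floor difference spans about 20 percentile points in the office market but 50 in the industrial market, which is the kind of market-relative contrast the model relies on.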
Model Training & Refinement
- User-Contributed Data: The XGBoost model is trained on custom comparable sets contributed by CompStak users, grounding it in real-world validation
- Focused Analysis: By concentrating on the pre-filtered candidate pool, the model can perform detailed feature analysis without computational constraints
- Re-ranking Logic: Candidates are re-ordered based on the comprehensive feature analysis, often significantly improving upon the initial vector search results
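The training and re-ranking steps can be sketched without the model itself. In the example below, `label_pairs` shows one plausible way to turn a user-contributed comparable set into binary training labels, and `toy_score` is a hand-written stand-in for the trained XGBoost model. Both functions, the field names, and the gap values are assumptions for illustration.

```python
def label_pairs(subject_id, candidate_ids, user_comp_ids):
    """Label each (subject, candidate) pair 1 if a user chose the candidate, else 0."""
    chosen = set(user_comp_ids)
    return [(subject_id, cid, int(cid in chosen)) for cid in candidate_ids]

def rerank(candidates, score_fn):
    """Re-order candidates by a model score, highest first."""
    return sorted(candidates, key=score_fn, reverse=True)

def toy_score(candidate):
    """Stand-in for a trained model: smaller percentile gaps score higher."""
    return -(candidate["size_gap"] + candidate["age_gap"])

# Candidates in their initial vector-search order (hypothetical gaps).
candidates = [
    {"id": "A", "size_gap": 0.30, "age_gap": 0.20},
    {"id": "B", "size_gap": 0.05, "age_gap": 0.10},
    {"id": "C", "size_gap": 0.10, "age_gap": 0.25},
]

pairs = label_pairs("S", ["A", "B", "C"], user_comp_ids=["B"])
reranked = rerank(candidates, toy_score)
```

Here the candidate that came first in the vector search (A) drops to last once the feature-level gaps are considered, which illustrates how re-ranking can change the initial order.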
Probability Scoring
- Confidence Measurement: Each comparison receives a probability score indicating the confidence level of the comparability match
- Interpretable Output: Raw model scores are transformed through logistic scaling to produce user-friendly probabilities while preserving the relative ranking accuracy
- Market-Adjusted Thresholds: Probability interpretation accounts for market-specific data quality and coverage variations
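Logistic scaling of raw scores can be sketched as follows. The `scale` and `shift` parameters are hypothetical calibration knobs, not values from the source. Any positive `scale` keeps the transformation strictly increasing, so the ranking of candidates is unchanged.

```python
import math

def logistic(raw_score, scale=1.0, shift=0.0):
    """Map a raw score to (0, 1) with a numerically stable logistic function."""
    z = scale * (raw_score - shift)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow for large negative z
    return ez / (1.0 + ez)

# Hypothetical raw model scores for four candidates.
raw_scores = [2.0, -1.0, 0.0, 0.5]
probabilities = [logistic(s) for s in raw_scores]

# The ordering by raw score and by probability is identical.
order_by_raw = sorted(range(len(raw_scores)), key=lambda i: raw_scores[i])
order_by_prob = sorted(range(len(raw_scores)), key=lambda i: probabilities[i])
```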
This approach ensures that property comparisons are meaningful within their specific market and property type context, rather than applying universal standards that may not reflect local real estate dynamics.
Pipeline Overview

Product Impact
CompStak Platform Integration
CompStak maintains the Property CompSet for all properties in our database. It is available on the property page under the Competitive Set tab, which shows the model-suggested comparable set and lets users customize the selection.

API Integration
In addition to the platform experience, Property CompSet can be accessed directly through the CompStak API. Users can provide either an address or geographic coordinates (latitude/longitude), along with key property attributes such as property type, property size, floors, year built, ceiling height, and number of loading docks. The API returns a ranked set of comparable properties with associated probability scores, enabling seamless integration into external workflows such as portfolio analysis, underwriting, and valuation pipelines.

Disclaimer: This document is proprietary and confidential. Unauthorized copying, sharing or distribution of this document or the information contained herein is strictly prohibited.
