kelper/addis_care_real_data_analysis.py

Step 1: Data Aggregation by ZIP Code

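# Group by ZIP and count providers
zip_analysis = df.groupby('zip').agg(
    total_providers=('provider_type', 'count'),  # Total providers per ZIP
    state=('state', 'first'),                    # State for each ZIP
).reset_index()

# Separate ALF and HCBS counts by ZIP
alf_by_zip = alf_providers.groupby('zip').agg(
    alf_count=('provider_type', 'count'),        # ALF count per ZIP
).reset_index()

hcbs_by_zip = hcbs_providers.groupby('zip').agg(
    hcbs_count=('provider_type', 'count'),       # HCBS count per ZIP
).reset_index()

# Attach the ALF and HCBS counts to the per-ZIP table; ZIPs with none get 0
zip_analysis = (
    zip_analysis
    .merge(alf_by_zip, on='zip', how='left')
    .merge(hcbs_by_zip, on='zip', how='left')
    .fillna({'alf_count': 0, 'hcbs_count': 0})
)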

Step 2: Calculate Provider Percentages

# Calculate what percentage of providers are ALF vs HCBS
zip_analysis['alf_percentage'] = zip_analysis['alf_count'] / zip_analysis['total_providers'] * 100
zip_analysis['hcbs_percentage'] = zip_analysis['hcbs_count'] / zip_analysis['total_providers'] * 100
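
As a quick sanity check of the percentage step, here is a minimal sketch on a toy table. It assumes zip_analysis already carries total_providers, alf_count, and hcbs_count columns; the ZIP codes and counts are made up for illustration and are not taken from the real data.

```python
import pandas as pd

# Hypothetical per-ZIP counts (illustrative values only, not real data)
zip_analysis = pd.DataFrame({
    'zip': ['10001', '10002', '10003'],
    'total_providers': [10, 4, 8],
    'alf_count': [2, 4, 0],
    'hcbs_count': [8, 0, 6],
})

# Share of each ZIP's providers that are ALF or HCBS, in percent
zip_analysis['alf_percentage'] = zip_analysis['alf_count'] / zip_analysis['total_providers'] * 100
zip_analysis['hcbs_percentage'] = zip_analysis['hcbs_count'] / zip_analysis['total_providers'] * 100

print(zip_analysis[['zip', 'alf_percentage', 'hcbs_percentage']])
```

In the third toy row the two percentages do not add up to 100. That happens whenever a ZIP contains providers that are neither ALF nor HCBS, so the two columns should be read independently.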

Step 3: Define Risk Factors (Based on Industry Knowledge)

Since we don't have real Medicaid data, I used provider characteristics as proxies for Medicaid dependency. The thresholds below come from industry knowledge, not from measured Medicaid figures:

Risk Factor 1: HCBS-Dominant Areas (>70% HCBS)

Risk Factor 2: High Provider Density (>100 total providers)

Risk Factor 3: ALF-Heavy Areas (>50% ALF)
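
The three thresholds above can be turned into boolean flags per ZIP. The following is only a sketch: the flag column names (hcbs_dominant, high_density, alf_heavy) and the sample rows are hypothetical, and it deliberately does not combine the flags into a score.

```python
import pandas as pd

# Hypothetical per-ZIP summary (illustrative values only, not real data)
zip_analysis = pd.DataFrame({
    'zip': ['10001', '10002', '10003'],
    'total_providers': [150, 40, 10],
    'alf_percentage': [20.0, 60.0, 10.0],
    'hcbs_percentage': [80.0, 30.0, 50.0],
})

# Thresholds taken from the three risk factors listed above
HCBS_DOMINANT_PCT = 70      # Risk Factor 1: more than 70% HCBS
HIGH_DENSITY_COUNT = 100    # Risk Factor 2: more than 100 total providers
ALF_HEAVY_PCT = 50          # Risk Factor 3: more than 50% ALF

# One boolean column per risk factor (column names are assumptions)
zip_analysis['hcbs_dominant'] = zip_analysis['hcbs_percentage'] > HCBS_DOMINANT_PCT
zip_analysis['high_density'] = zip_analysis['total_providers'] > HIGH_DENSITY_COUNT
zip_analysis['alf_heavy'] = zip_analysis['alf_percentage'] > ALF_HEAVY_PCT

print(zip_analysis[['zip', 'hcbs_dominant', 'high_density', 'alf_heavy']])
```

Keeping the thresholds as named constants makes them easy to adjust. All three comparisons are strict, matching the ">" in the factor definitions, so a ZIP at exactly 70% HCBS is not flagged.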

Step 4: Calculate Risk Scores