Ancestry & Breast Cancer

BREAST CANCER · ANCESTRY GENOMICS

How Genetic Ancestry Shapes Breast Cancer Biology

A multi-omics re-analysis of TCGA-BRCA (n=1,084) and the RA-QA Arab cohort (n=24), examining transcriptomic, immune, pathway, genomic, and prognostic layers.

Roelands et al. 2021 (npj Breast Cancer) · TCGA-BRCA (n=1,084) · RA-QA (n=24) · RNA-seq · Ancestry genomics · Multi-omics re-analysis

Executive Summary

Central Finding

Ancestry shapes breast cancer biology across every molecular layer tested

Genetic ancestry shapes breast cancer biology across transcriptomic (11,424 DEGs), immune (Th2 most robust, TReg challenged), pathway (78 enriched, IFNα top novel), genomic (CNA-driven, not mutation-driven), and prognostic (98.8% non-overlapping signatures) layers; however, the total survival effect is null after adjusting for subtype, age, and stage.

1,084 TCGA patients analyzed · 11,424 DEGs identified (FDR<0.05) · 15 novel findings · 20 drug targets mapped


RA-QA Arab Cohort

The RA-QA cohort (24 breast cancer patients, 16 of whom are Arab) is the first RNA-seq dataset from this population. FAM20C, upregulated in Arab patients, cross-validates in the larger TCGA cohort as higher in AA than EA tumors (padj=0.009), but RA-QA is severely underpowered for genome-wide discovery.

Clinical Implications

Three therapeutic themes emerge: (1) immunotherapy optimization: AA tumors show an IFNα-hot/checkpoint-high phenotype; (2) targeted therapy: PI3K-Akt dominates AA prognosis despite a lower PIK3CA mutation frequency; (3) precision medicine: the 98.8% non-overlapping prognostic signatures argue for ancestry-specific molecular tests.

Study Design & Data Quality

24 RA-QA samples · 1,084 TCGA-BRCA patients · mean κ=0.977 ancestry concordance · 0 discordant samples

Ancestry Calling Method Concordance

Pairwise Cohen's kappa across 5 SNP-based ancestry calling methods. All pairs exceed κ=0.95, indicating near-perfect agreement.
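The pairwise agreement statistic can be sketched in a few lines. This is a minimal illustration, assuming each calling method yields one categorical ancestry label per sample in a shared sample order; the method names and labels below are hypothetical, and the document does not say how its five methods were implemented.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two methods' marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / n**2
    if expected == 1.0:
        return 1.0  # both methods assign one identical label to every sample
    return (observed - expected) / (1 - expected)


def pairwise_kappa(calls):
    """calls: dict mapping method name -> labels in a shared sample order."""
    return {
        (m1, m2): cohens_kappa(calls[m1], calls[m2])
        for m1, m2 in combinations(sorted(calls), 2)
    }


calls = {  # hypothetical method names and ancestry calls
    "method_a": ["EUR", "EUR", "AFR", "EAS"],
    "method_b": ["EUR", "EUR", "AFR", "EAS"],
    "method_c": ["EUR", "AFR", "AFR", "EAS"],
}
kappas = pairwise_kappa(calls)
mean_kappa = sum(kappas.values()) / len(kappas)
```

With five methods this yields ten pairs; the reported mean κ would be the average over those pairs.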

Top Features by Batch Effect

Cohen's d for the 10 features with largest systematic differences between RA-QA and TCGA (PERMANOVA R²=0.180). Red = higher in RA-QA, blue = higher in TCGA.

Admixture Proportions by Ancestry Group

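| Group | N | AFR mean | EUR mean | EAS mean | SAS mean |
|---|---|---|---|---|---|
| EUR | 753 | 0.0147 | 0.9635 | 0.0100 | 0.0075 |
| AFR | 164 | 0.7998 | 0.1751 | 0.0133 | 0.0028 |
| EAS | 54 | 0.0202 | 0.0282 | 0.9417 | 0.0076 |
| SAS | 8 | 0.0256 | 0.2186 | 0.0333 | 0.7205 |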

Mean admixture proportions across ancestry groups. AFR-classified patients show ~18% European admixture, creating a meaningful gradient for continuous analyses.

Ancestry Validation

Five independent SNP-based ancestry calling methods show near-perfect concordance (mean κ=0.977, 0 discordant samples). AFR-classified patients show ~18% European admixture, creating a meaningful gradient for continuous analyses.

Batch Effects

Strong systematic batch effects between RA-QA and TCGA enrichment scores (PERMANOVA R²=0.180, 55/58 features significant at FDR<0.05, median |Cohen’s d|=2.07). Cross-cohort quantitative comparison is invalid. All within-cohort analyses (Tracks 2–7) remain valid.
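For reference, a per-feature Cohen's d and the median |d| summary can be computed as below. This sketch assumes the pooled-standard-deviation form of d and per-sample enrichment scores keyed by feature name; the document does not state which variant of d it used, and the feature names in the test are hypothetical.

```python
import math
from statistics import mean, median, variance


def cohens_d(x, y):
    """Cohen's d with pooled SD; positive when x (e.g. RA-QA) exceeds y (e.g. TCGA)."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def median_abs_d(cohort_a, cohort_b):
    """cohort_*: dict feature name -> list of per-sample enrichment scores."""
    return median(abs(cohens_d(cohort_a[f], cohort_b[f])) for f in cohort_a)
```

A median |d| around 2 means the typical feature's cohort means differ by about two pooled standard deviations, which is why quantitative cross-cohort comparison is ruled out.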

RA-QA Arab Cohort Analysis

24 samples · 3 DEGs (FDR<0.1) · 7 OS events · minimum detectable HR 9.44

Sample Structure: PCA of RA-QA Expression Profiles

PCA of 24 RA-QA samples colored by ethnicity (Arab/Asian/Other). No ethnicity-driven clustering is observed — subtype and individual variation dominate.

Differentially Expressed Genes: Arab vs Non-Arab

Top 20 genes by significance (Arab vs non-Arab, adjusted for Basal/non-Basal subtype). Only 3 genes reach FDR<0.1: ASIC3, OR10G1P, FAM20C — all upregulated in Arab patients. Gray bars = not significant.

Kaplan-Meier Survival Stratifications

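| Stratification | Group | N | Events | Event rate | Log-rank p |
|---|---|---|---|---|---|
| Ethnicity | Arab | 16 | 7 | 43.8% | n/a |
| Ethnicity | Asian | 5 | 0 | 0.0% | n/a |
| Ethnicity | Other | 3 | 0 | 0.0% | n/a |
| Ethnicity (binary) | Arab | 16 | 7 | 43.8% | 0.2483 |
| Ethnicity (binary) | Non-Arab | 8 | 0 | 0.0% | 0.2483 |
| ICR cluster | ICR-High | 7 | 0 | 0.0% | 0.2811 |
| ICR cluster | ICR-Low | 17 | 7 | 41.2% | 0.2811 |
| IMS_PAM50 | Basal | 9 | 2 | 22.2% | 0.7590 |
| IMS_PAM50 | Her2 | 3 | 1 | 33.3% | 0.7590 |
| IMS_PAM50 | LumA | 7 | 3 | 42.9% | 0.7590 |
| IMS_PAM50 | LumB | 2 | 1 | 50.0% | 0.7590 |
| IMS_PAM50 | Normal | 3 | 0 | 0.0% | 0.7590 |
| Subtype (binary) | Basal | 9 | 2 | 22.2% | 0.6034 |
| Subtype (binary) | Non-Basal | 15 | 5 | 33.3% | 0.6034 |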

All 7 OS events occur in Arab patients (43.8% event rate). Non-Arab patients (n=8) have 0 events, likely reflecting shorter median follow-up (2.0 vs 8.4 years). No stratification reaches log-rank significance (all p>0.24).

FAM20C: Cross-Cohort Validated

FAM20C (Golgi casein kinase, known oncogene in TNBC) is upregulated in Arab patients (LFC=+2.18, padj=0.053) and cross-validates in TCGA AA vs EA (LFC=+0.25, padj=0.009). This is the strongest RA-QA finding and the only gene confirmed across both cohorts.

Power Limitations

With 24 samples and only 7 OS events (all in Arab patients), the RA-QA cohort is severely underpowered. The minimum detectable HR at 80% power is 9.44. Larger Middle Eastern cohorts are needed for ancestry-specific breast cancer discovery in this population.
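A minimum detectable HR of this size follows from the Schoenfeld approximation for a two-sided log-rank test. The sketch below assumes that approximation and the 16 Arab vs 8 non-Arab split as the comparison; the document does not say which formula or allocation produced 9.44.

```python
import math
from statistics import NormalDist


def min_detectable_hr(events, prop_group1, alpha=0.05, power=0.80):
    """Smallest HR detectable by a two-sided log-rank test (Schoenfeld approximation).

    events: total observed events; prop_group1: fraction of patients in one group.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    log_hr = (z_alpha + z_beta) / math.sqrt(events * prop_group1 * (1 - prop_group1))
    return math.exp(log_hr)


hr = min_detectable_hr(events=7, prop_group1=16 / 24)
```

With 7 events and a 16:8 split this returns about 9.45; using the rounded values z = 1.96 and 0.84 instead gives 9.44, which suggests, but does not confirm, how the reported figure was obtained. Because the required log-HR shrinks with the square root of the event count, even a tenfold increase in events leaves the detectable HR well above 2.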

No Ethnicity-Driven Structure

Unsupervised analysis (PCA, UMAP, Leiden clustering) reveals no ethnicity-driven transcriptomic structure. Clusters align only weakly with molecular subtype (ARI=0.027) and not with ethnicity, consistent with the cohort lacking power for population-level transcriptomic discovery.
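The adjusted Rand index (ARI) quoted here measures agreement between cluster labels and subtype labels after correcting for chance: 0 is expected for random labelings and 1 for identical partitions, so 0.027 is close to no alignment. A minimal pure-Python sketch, assuming two label lists over the same samples (the labels in the test are illustrative):

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same samples."""
    n = len(labels_a)
    pair_counts = Counter(zip(labels_a, labels_b))          # contingency table cells
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)                    # chance-level index
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (max_index - expected)
```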

Genome-Wide Differential Expression

11,424 DEGs (FDR<0.05) · 158 with large effect (|LFC|>1) · 193 BasalMyo DEGs (|LFC|>1) · r=0.84 direction concordance

Top 30 DEGs by Significance — All Subtypes (AA vs EA)

Top 30 most significant genes from the all-subtypes AA vs EA contrast (limma, adjusted for subtype, age, stage). Red = higher in AA, Blue = higher in EA. Stars: *** FDR < 0.001, ** < 0.01, * < 0.05. LOC90784 (lncRNA, chr19) is the most significant gene (padj = 3.99e-45).

Top 30 DEGs by Significance — BasalMyo (AA vs EA)

Top 30 most significant genes from the BasalMyo-restricted AA vs EA contrast. Stars: *** FDR < 0.001, ** < 0.01, * < 0.05. LOC90784 remains the top hit (padj = 1.11e-12). 193 genes reach |LFC|>1 (136 upregulated in AA), versus 158 in the all-subtypes contrast, consistent with a stronger ancestry signal in basal-like tumors.

Top 100 DEGs — All Subtypes

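| Gene | log₂FC | Avg expr | t-statistic | P-value | FDR |
|---|---|---|---|---|---|
| LOC90784 | -1.00 | 8.25 | -15.76 | 2.1e-49 | 4.0e-45 |
| CROCCL1 | +1.03 | 7.92 | 14.51 | 7.0e-43 | 6.5e-39 |
| CRYBB2 | +1.24 | 1.45 | 14.20 | 2.6e-41 | 1.6e-37 |
| FAM3A | +0.81 | 9.32 | 13.80 | 2.6e-39 | 1.2e-35 |
| HEXDC | +1.08 | 7.94 | 13.74 | 5.1e-39 | 1.9e-35 |
| NACA2 | +1.27 | 5.45 | 13.72 | 6.5e-39 | 2.0e-35 |
| PRSS45 | +1.51 | 1.38 | 13.43 | 1.6e-37 | 4.2e-34 |
| DDX6 | -0.56 | 11.79 | -13.11 | 5.4e-36 | 1.3e-32 |
| SNRNP70 | +0.74 | 11.32 | 12.95 | 3.3e-35 | 6.8e-32 |
| OGFOD2 | +0.77 | 8.03 | 12.94 | 3.7e-35 | 6.9e-32 |
| FGD4 | -0.87 | 8.13 | -12.69 | 5.6e-34 | 9.4e-31 |
| CDK10 | +0.87 | 8.90 | 12.66 | 7.6e-34 | 1.2e-30 |
| EXD3 | +0.86 | 7.39 | 12.38 | 1.5e-32 | 2.2e-29 |
| FBXL8 | +1.15 | 5.62 | 12.37 | 1.7e-32 | 2.3e-29 |
| C19orf60 | +1.12 | 8.03 | 12.31 | 3.3e-32 | 4.1e-29 |
| SCAND1 | +1.06 | 9.25 | 12.15 | 1.7e-31 | 2.0e-28 |
| DDX51 | +0.60 | 8.83 | 12.08 | 3.6e-31 | 3.9e-28 |
| NSUN5P1 | +1.17 | 6.41 | 12.07 | 3.8e-31 | 3.9e-28 |
| SPPL2B | +0.70 | 9.50 | 11.99 | 9.0e-31 | 8.8e-28 |
| LRRC37A2 | -0.97 | 6.84 | -11.95 | 1.4e-30 | 1.3e-27 |

First 20 of 100 genes shown.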

Top 100 genes from the all-subtypes contrast, ranked by adjusted p-value. Positive logFC indicates higher expression in AA; negative indicates higher in EA.

LOC90784: Novel Top Hit

LOC90784, an uncharacterized lncRNA on chromosome 19, is the most significant gene (padj=3.99e-45 all subtypes; padj=1.11e-12 BasalMyo). In BasalMyo, AFR admixture explains up to 36.7% of its expression variance, the largest share reported for any gene in this dataset.

FAM20C Replicates from RA-QA

FAM20C, upregulated in Arab patients in the RA-QA cohort (T2_S1), replicates in TCGA with the same direction (LFC=+0.25, padj=0.009 all subtypes), confirming it as a cross-cohort ancestry-associated gene.

Pathway Enrichment Analysis

78 significant pathways · 87.5% of original pathways recovered · 71 novel pathways · IFNα NES +1.97 (top novel)

Top GSEA Pathways: BasalMyo AA vs EA

Normalized enrichment scores (NES) for the top 30 GSEA pathways in the BasalMyo AA vs EA contrast. Red = enriched in AA (positive NES), Blue = enriched in EA (negative NES). Stars denote FDR significance: *** < 0.001, ** < 0.01, * < 0.05. IFNα response (NES=+1.97) is the top novel Hallmark finding not in the original paper.
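The NES values above derive from a GSEA running-sum enrichment score (ES). As a sketch of that core step only, assuming a gene list ranked by a per-gene statistic in descending order and the classic weight of 1, the ES can be computed as below; the document does not state which GSEA implementation or ranking metric it used.

```python
def enrichment_score(ranked_genes, gene_set, weight=1.0):
    """GSEA-style weighted running-sum enrichment score (ES).

    ranked_genes: list of (gene, statistic) pairs sorted by statistic, descending
    gene_set: set of gene names
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hit_weights = [abs(stat) ** weight for gene, stat in ranked_genes if gene in gene_set]
    n_miss = len(ranked_genes) - len(hit_weights)
    hit_total = sum(hit_weights)
    if not hit_weights or n_miss == 0 or hit_total == 0:
        raise ValueError("gene set must partly overlap the ranked list with nonzero statistics")
    running, best = 0.0, 0.0
    for gene, stat in ranked_genes:
        if gene in gene_set:
            running += abs(stat) ** weight / hit_total   # step up on a hit
        else:
            running -= 1.0 / n_miss                       # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best
```

GSEA then divides the ES by the mean of same-signed null scores from permutations to obtain the NES, a step omitted here, so the value returned is not directly comparable to the NES column.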

Recovery of Original Paper's 16 Pathways

| Original pathway | Matched term | NES | FDR | Concordant |
|---|---|---|---|---|
| Oxidative phosphorylation | Oxidative phosphorylation | +1.98 | 0.0016 | Yes |
| Oxidative phosphorylation | Oxidative Phosphorylation (GO:0006119) | +1.99 | 0.0050 | Yes |
| DNA repair | HALLMARK_DNA_REPAIR | +1.64 | 0.0060 | Yes |
| UVB-induced MAPK signaling | HALLMARK_UV_RESPONSE_DN | -1.68 | 0.0061 | Yes |
| Oxidative phosphorylation | HALLMARK_OXIDATIVE_PHOSPHORYLATION | +1.67 | 0.0065 | Yes |
| PI3K-Akt mTOR signaling | HALLMARK_MTORC1_SIGNALING | -1.45 | 0.0374 | Yes |
| mTOR signaling | HALLMARK_MTORC1_SIGNALING | -1.45 | 0.0374 | Yes |
| Estrogen response | Estrogen-dependent nuclear events downstream of ESR-membrane signaling | -1.63 | 0.0822 | Yes |
| ERK MAPK signaling | Gastrin-CREB signalling pathway via PKC and MAPK | -1.61 | 0.0970 | Yes |
| EGF signaling | Signaling by EGFR | -1.51 | 0.1339 | Yes |
| ErbB signaling | ERBB Signaling Pathway (GO:0038127) | -1.71 | 0.1948 | Yes |
| Angiogenesis | VEGFA-VEGFR2 Pathway | -1.41 | 0.1992 | Yes |
| PI3K-AKT signaling | PI3K events in ERBB2 signaling | -1.39 | 0.2122 | Yes |
| PI3K-Akt mTOR signaling | PI3K events in ERBB2 signaling | -1.39 | 0.2122 | Yes |
| ErbB signaling | PI3K events in ERBB2 signaling | -1.39 | 0.2122 | Yes |
| ErbB signaling | ErbB signaling pathway | -1.63 | 0.2559 | Yes |
| PI3K-AKT signaling | HALLMARK_PI3K_AKT_MTOR_SIGNALING | -1.15 | 0.2851 | Yes |
| mTOR signaling | Energy dependent regulation of mTOR by LKB1-AMPK | -1.25 | 0.3181 | Yes |
| AMPK signaling | Energy dependent regulation of mTOR by LKB1-AMPK | -1.25 | 0.3181 | Yes |
| Angiogenesis | Cell Migration Involved In Sprouting Angiogenesis (GO:0002042) | -1.45 | 0.3771 | Yes |
| Estrogen response | Regulation Of Intracellular Estrogen Receptor Signaling Pathway (GO:0033146) | -1.37 | 0.4310 | Yes |
| DNA repair | DNA Repair (GO:0006281) | -1.36 | 0.4426 | No |
| AMPK signaling | AMPK signaling pathway | -1.25 | 0.4930 | Yes |
| ERK MAPK signaling | Regulation Of MAPK Cascade (GO:0043408) | -1.29 | 0.4987 | Yes |
| PI3K-Akt mTOR signaling | PI3K-Akt signaling pathway | -1.15 | 0.5326 | Yes |
| PI3K-AKT signaling | PI3K-Akt signaling pathway | -1.15 | 0.5326 | Yes |
| DNA repair | Gap-filling DNA repair synthesis and ligation in GG-NER | +1.13 | 0.5858 | Yes |
| Angiogenesis | VEGF signaling pathway | +1.06 | 0.6941 | No |
| EGF signaling | VEGF signaling pathway | +1.06 | 0.6941 | No |
| Estrogen response | HALLMARK_ESTROGEN_RESPONSE_LATE | +0.98 | 0.7211 | No |
| Estrogen response | Estrogen signaling pathway | -0.96 | 0.7297 | Yes |
| mTOR signaling | mTOR signaling pathway | -0.94 | 0.7796 | Yes |
| PTEN signaling | Regulation of PTEN gene transcription | -0.90 | 0.7934 | Yes |
| MAPK up genes | HALLMARK_KRAS_SIGNALING_UP | -0.94 | 0.8329 | Yes |
| ERK MAPK signaling | MAPK signaling pathway | +0.93 | 0.9016 | No |
| DNA repair | Mismatch repair | +0.89 | 0.9364 | Yes |
| Angiogenesis | HALLMARK_ANGIOGENESIS | -0.60 | 0.9977 | Yes |

Mapping of the original paper's 16 pathways to Hallmark, KEGG, Reactome, and GO-BP gene sets. "Concordant" indicates whether the enrichment direction matches the original finding. 14 of the 16 pathways were recovered by keyword matching; 5 of these reach FDR<0.05, all with concordant direction.

Novel Pathways Not in Original Paper (BasalMyo GSEA)

| Pathway | Collection | NES | FDR | Direction |
|---|---|---|---|---|
| Impaired BRCA2 binding to PALB2 | Reactome | -2.17 | <1e-16 | EA > AA |
| Defective HDR through Homologous Recombination Repair (HRR) due to PALB2 loss of BRCA2/RAD51/RAD51C binding function | Reactome | -2.17 | <1e-16 | EA > AA |
| Defective HDR through Homologous Recombination Repair (HRR) due to PALB2 loss of BRCA1 binding function | Reactome | -2.17 | <1e-16 | EA > AA |
| Defective homologous recombination repair (HRR) due to BRCA1 loss of function | Reactome | -2.17 | <1e-16 | EA > AA |
| Herpes simplex virus 1 infection | KEGG | -2.13 | <1e-16 | EA > AA |
| HALLMARK_PROTEIN_SECRETION | Hallmark | -1.93 | <1e-16 | EA > AA |
| HALLMARK_INTERFERON_ALPHA_RESPONSE | Hallmark | +1.97 | 9.3e-4 | AA > EA |
| Mitochondrial translation initiation | Reactome | +2.02 | 0.0011 | AA > EA |
| Mitochondrial translation elongation | Reactome | +2.05 | 0.0016 | AA > EA |
| Mitochondrial translation termination | Reactome | +2.03 | 0.0016 | AA > EA |
| Respiratory electron transport | Reactome | +1.99 | 0.0018 | AA > EA |
| Ribosome | KEGG | +2.01 | 0.0024 | AA > EA |
| Aerobic Respiration (GO:0009060) | GO-BP | +2.07 | 0.0037 | AA > EA |
| Complex I biogenesis | Reactome | +1.97 | 0.0038 | AA > EA |
| Cellular Respiration (GO:0045333) | GO-BP | +2.04 | 0.0043 | AA > EA |
| Proton Motive Force-Driven Mitochondrial ATP Synthesis (GO:0042776) | GO-BP | +2.00 | 0.0045 | AA > EA |
| Proton Motive Force-Driven ATP Synthesis (GO:0015986) | GO-BP | +2.00 | 0.0045 | AA > EA |
| M-decay: degradation of maternal mRNAs by maternally stored factors | Reactome | -2.04 | 0.0047 | EA > AA |
| Mitochondrial Translation (GO:0032543) | GO-BP | +2.00 | 0.0053 | AA > EA |
| Mitochondrial Gene Expression (GO:0140053) | GO-BP | +2.08 | 0.0057 | AA > EA |

First 20 of 30 pathways shown.

GSEA pathways significant in BasalMyo AA vs EA that were not among the original paper's 16 reported pathways. These 30 represent the top novel findings by |NES|, spanning Reactome, GO-BP, KEGG, and Hallmark collections.

IFNα: Top Novel Discovery

Interferon alpha response is the most significantly enriched Hallmark pathway in AA BasalMyo (NES=+1.97, FDR<0.001), alongside IFNγ, TNFα/NF-κB, and p53. This interferon/inflammatory signature was not identified in the original paper and suggests AA basal tumors have a more immunologically ‘hot’ microenvironment.

Direction Concordance

All 5 recoverable significant pathways from the original paper show 100% direction concordance. DNA repair and oxidative phosphorylation enriched in AA; mTORC1 and UV response enriched in EA — exactly as originally reported.

Ancestry Dose-Response Analysis

26/35 pathways significant · 59.7% of genes dose-responsive · top gene R²=0.208 · 0 significant survival interactions

Pathway Enrichment vs AFR Admixture (Regression Beta)

OLS regression beta for each pathway enrichment score vs continuous AFR admixture proportion (n=880), adjusted for age, stage, and TDA subtype. Red = increases with AFR ancestry, Blue = decreases with AFR ancestry, Gray = not significant (FDR≥0.05). Stars: *** < 0.001, ** < 0.01, * < 0.05. DNA repair has the strongest positive dose-response (R²=0.077). 26/35 pathways are significant.
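The regression behind these betas can be sketched with NumPy least squares. This assumes a numeric design matrix for the covariates (for example age plus one-hot stage and subtype columns) and defines partial R² as the drop in residual sum of squares when AFR is added to the covariate-only model; the data below are simulated, and the document's own software and p-value procedure are not specified.

```python
import numpy as np


def admixture_regression(y, afr, covariates):
    """OLS of y on continuous AFR admixture, adjusting for covariates.

    y: (n,) enrichment score or expression value per patient
    afr: (n,) AFR admixture proportion in [0, 1]
    covariates: (n, k) numeric design columns (e.g. age, one-hot stage/subtype)
    Returns (beta_afr, partial_r2).
    """
    intercept = np.ones((len(y), 1))
    reduced = np.hstack([intercept, covariates])          # covariates only
    full = np.hstack([reduced, afr.reshape(-1, 1)])       # covariates + AFR
    coef_full, *_ = np.linalg.lstsq(full, y, rcond=None)
    coef_red, *_ = np.linalg.lstsq(reduced, y, rcond=None)
    sse_full = np.sum((y - full @ coef_full) ** 2)
    sse_red = np.sum((y - reduced @ coef_red) ** 2)
    # Partial R²: share of the reduced model's residual variance explained by AFR.
    partial_r2 = (sse_red - sse_full) / sse_red
    return coef_full[-1], partial_r2


rng = np.random.default_rng(0)
n = 200
age = rng.normal(60, 10, n)
afr = rng.uniform(0, 1, n)
y = 0.02 * age + 1.5 * afr + rng.normal(0, 0.1, n)  # simulated, not study data
beta, r2 = admixture_regression(y, afr, age.reshape(-1, 1))
```

The simulated signal is deliberately strong; real pathway scores give far smaller partial R² values, such as the 0.077 reported for DNA repair.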

Immune Cell Scores vs AFR Admixture

TReg is the only immune cell type whose score increases significantly with AFR admixture (beta=+0.038, padj=1.3e-4); NK CD56bright cells show a significant negative association (padj=0.011). Only these 2 of 23 cell types reach FDR<0.05.

Gene-Level Overlap: Admixture vs Categorical DE

97.6% gene overlap between methods · 100% direction concordance · 11,129 admixture-significant genes · 11,424 categorical-DE-significant genes

Venn diagram: both methods 10,858 genes; admixture only 271; categorical DE only 566.

97.6% of admixture-responsive genes overlap with categorical DE genes, with 100% direction concordance. Both approaches detect the same underlying biology.

Top 50 Admixture-Responsive Genes (All Subtypes)

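| Gene | Beta (AFR) | FDR | Partial R² | Avg expression |
|---|---|---|---|---|
| CROCCL1 | +1.36 | 1.0e-41 | 0.208 | 7.87 |
| LOC90784 | -1.22 | 2.2e-40 | 0.201 | 8.25 |
| CRYBB2 | +1.59 | 9.7e-38 | 0.189 | 1.44 |
| PRSS45 | +1.98 | 3.5e-37 | 0.186 | 1.33 |
| NACA2 | +1.63 | 1.5e-35 | 0.178 | 5.44 |
| FAM3A | +1.03 | 1.8e-35 | 0.178 | 9.31 |
| HEXDC | +1.38 | 8.4e-35 | 0.174 | 7.93 |
| DDX6 | -0.71 | 3.4e-32 | 0.163 | 11.79 |
| SNRNP70 | +0.95 | 8.2e-32 | 0.161 | 11.30 |
| OGFOD2 | +0.98 | 1.4e-30 | 0.155 | 8.02 |
| NSUN5P1 | +1.56 | 3.8e-30 | 0.153 | 6.39 |
| WASH7P | +1.07 | 2.6e-29 | 0.149 | 9.74 |
| CDK10 | +1.10 | 2.8e-29 | 0.149 | 8.88 |
| EXD3 | +1.12 | 3.2e-29 | 0.149 | 7.37 |
| C19orf60 | +1.44 | 2.9e-28 | 0.144 | 8.02 |
| FGD4 | -1.07 | 4.5e-28 | 0.143 | 8.13 |
| SPPL2B | +0.90 | 1.2e-27 | 0.141 | 9.47 |
| DDX51 | +0.76 | 1.3e-27 | 0.141 | 8.81 |
| SCXB | +1.53 | 1.3e-27 | 0.141 | 1.63 |
| ZNF414 | +0.94 | 1.3e-27 | 0.141 | 7.29 |

First 20 of 50 genes shown.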

Top 50 genes ranked by partial R² from OLS regression of expression vs continuous AFR admixture proportion, adjusted for age, stage, and TDA subtype. CROCCL1 (R²=0.208) is the most admixture-predictive gene in this all-subtypes model.

Dose-Response, Not Binary

Ancestry-associated molecular differences scale with admixture proportion: a graded, dose-dependent pattern rather than a binary switch. 1,285 of 1,928 extended gene sets (66.6%) show a significant dose-response, consistent with a polygenic regulatory model.

AMPK Survival: Suggestive but Not Significant

AMPK shows a suggestive survival interaction in BasalMyo (HR=0.624, p=0.055, padj=0.52), directionally consistent with the original paper’s opposing prognostic effects. However, 0/232 formal interaction tests reach FDR significance, limited by only 19 BasalMyo OS events.

Immune Landscape & Checkpoints

7 methods used · Th2 most robust (significant in 6/7) · TReg significant in 1/7 methods · 18/27 checkpoints significant

Immune Cell Type Concordance Across Methods

Number of deconvolution methods (out of 7: ssGSEA, AUCell, MLM, ULM, z-score, GSEA, consensus) finding significant ancestry differences (FDR<0.05) per cell type. Th2 is the most robust (6/7, 100% EA>AA direction concordance). TReg is significant only in ssGSEA: its 4-gene Bindea signature falls below the minimum gene-set size (tmin=5) required by the other methods, so they could not score it.

Checkpoint Gene Expression: Ancestry Effect Sizes

Cohen’s d effect size for ancestry differences in 27 immune checkpoint genes (all subtypes, AA vs EA). Red = higher in AA, Blue = higher in EA, Gray = not significant (FDR≥0.05). Stars: *** < 0.001, ** < 0.01, * < 0.05. OX40 shows the largest effect (d=+0.82, AA>EA). Actionable inhibitory checkpoints PD-1, CTLA-4, LAG-3 are all higher in AA tumors.

Th2, Not TReg

Th2 is the most robustly ancestry-differential immune cell type (6/7 methods, 100% EA>AA direction). TReg — the original paper’s headline immune finding — cannot be validated by alternative methods because its 4-gene Bindea signature falls below standard minimum thresholds. The ssGSEA TReg signal is extremely strong (padj=6.5e-9) but method-singular.

OX40: Largest Checkpoint Effect

OX40 (TNFRSF4) shows the largest ancestry effect among checkpoints (d=+0.82, AA>EA, padj=3.2e-18). All three actionable inhibitory checkpoints (PD-1, CTLA-4, LAG-3) are higher in AA, consistent with the IFNα-enriched immune microenvironment.

Distinct Immunosuppression

AA and EA tumors appear to rely on different immunosuppressive mechanisms. AA tumors show higher IDO1/TGFB1 (immune-engaged suppression), while EA tumors show higher CD39/CD73 (adenosine-mediated metabolic suppression). This may have implications for selecting immunomodulatory therapies by ancestry.

Transcription Factor Regulation

574 TFs tested · 293 TFs significant · top TF PAX7 (d=0.97) · FOXP3 not significant (padj=0.51)

Top Transcription Factors by Ancestry Effect Size

Cohen’s d effect size for TF activity differences (ULM + CollecTRI) between AA and EA tumors (all subtypes). Red = higher in AA, Blue = higher in EA, Gray = not significant (FDR≥0.05). Key mechanistic TFs highlighted: GATA3 (Th2 master regulator, EA>AA), IRF7 (IFNα regulator, AA>EA), FOXP3 (TReg master TF, not significant). Stars: *** < 0.001, ** < 0.01, * < 0.05.
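TF activities of this kind come from a univariate linear model (ULM): for each sample, expression across genes is regressed on the TF's signed target weights, with zero for non-targets, and the t-statistic of the slope is taken as the activity. The sketch below re-implements only that idea in NumPy, assuming a samples × genes matrix already aligned to a weight vector from a network such as CollecTRI; it is not the API of any specific package, and the toy network is hypothetical.

```python
import numpy as np


def ulm_activity(expr, weights):
    """Per-sample TF activity as the t-statistic of expr ~ weights (ULM idea).

    expr: (n_samples, n_genes) expression matrix, genes aligned with weights
    weights: (n_genes,) signed TF->target weights, 0 for genes the TF does not regulate
    """
    x = weights - weights.mean()
    sxx = np.sum(x ** 2)
    n_genes = x.size
    activities = []
    for y in expr:
        yc = y - y.mean()
        slope = np.sum(x * yc) / sxx                      # least-squares slope
        resid = yc - slope * x                            # intercept absorbed by centering
        se = np.sqrt(np.sum(resid ** 2) / (n_genes - 2) / sxx)
        activities.append(slope / se)
    return np.array(activities)


# Toy network: hypothetical TF activating genes 0-1 and repressing gene 2.
rng = np.random.default_rng(1)
w = np.array([1.0, 1.0, -1.0, 0, 0, 0, 0, 0, 0, 0])
expr = np.vstack([2 * w + rng.normal(0, 0.1, 10), -2 * w + rng.normal(0, 0.1, 10)])
t = ulm_activity(expr, w)
```

Sample 0 follows the TF's regulatory signs and scores strongly positive; sample 1 inverts them and scores strongly negative. Group-level effect sizes such as those in the chart would then be computed on these per-sample activities.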

Top 50 Ancestry-Differential TFs (All Subtypes)

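| TF | Cohen's d | Direction | FDR | Mean AA | Mean EA |
|---|---|---|---|---|---|
| PAX7 | +0.968 | AA>EA | 1.6e-22 | -0.533 | -0.850 |
| HOXA10 | -0.842 | EA>AA | 1.0e-18 | 1.084 | 1.383 |
| HDAC5 | +0.744 | AA>EA | 5.7e-14 | 1.011 | 0.672 |
| HDAC7 | +0.736 | AA>EA | 3.0e-13 | -1.301 | -1.614 |
| MSX2 | -0.723 | EA>AA | 6.2e-16 | 1.384 | 1.675 |
| FOXP1 | -0.722 | EA>AA | 9.3e-15 | -2.715 | -2.491 |
| TAF1 | -0.716 | EA>AA | 3.0e-13 | 3.569 | 3.842 |
| HOXA9 | -0.715 | EA>AA | 3.0e-13 | 1.802 | 2.144 |
| HEY1 | -0.696 | EA>AA | 1.4e-11 | 1.404 | 1.693 |
| NEUROD2 | +0.693 | AA>EA | 1.2e-11 | -1.434 | -1.637 |
| EHF | +0.673 | AA>EA | 4.2e-12 | 1.391 | 1.071 |
| KDM5C | +0.672 | AA>EA | 1.2e-11 | 2.024 | 1.856 |
| SMARCA1 | +0.653 | AA>EA | 4.1e-12 | 1.664 | 1.422 |
| E2F6 | -0.648 | EA>AA | 7.8e-11 | 0.390 | 0.560 |
| KAT2B | -0.643 | EA>AA | 3.5e-12 | 0.653 | 0.817 |
| MECP2 | +0.639 | AA>EA | 2.2e-11 | -0.242 | -0.590 |
| PTF1A | +0.637 | AA>EA | 2.0e-9 | -3.020 | -3.292 |
| RCOR2 | +0.634 | AA>EA | 1.9e-8 | 2.178 | 2.018 |
| TEAD1 | +0.629 | AA>EA | 3.2e-10 | 1.865 | 1.638 |
| NR2F1 | -0.627 | EA>AA | 1.2e-11 | 1.895 | 2.099 |

First 20 of 50 TFs shown.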

GATA3 Explains Th2

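GATA3, the master TF for Th2 differentiation, has significantly higher activity in EA tumors (d=−0.47, padj=2.0e-6). This offers a plausible mechanistic link to Th2 cells being the most robustly EA-enriched immune cell type, although in bulk breast tumors GATA3 activity also reflects luminal epithelial differentiation.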

IRF7 Explains IFNα

IRF7, a master regulator of type I interferon signaling, is significantly higher in AA tumors (d=+0.33, padj=1.1e-3), consistent with the IFNα pathway enrichment found in T2_S3.

FOXP3 Challenges TReg

FOXP3, the canonical TReg master TF, does not differ significantly between ancestries (padj=0.51). If TReg infiltration truly differed by ancestry, FOXP3 activity would be expected to differ as well, making this the strongest challenge to the original paper's TReg finding.

Pathway Network Rewiring

76.0% shared edges · 94 differential correlations · 27 significant Breslow-Day pairs · 12 pattern switches

Top Differential Pathway Correlations (AA vs EA)

Difference in Spearman correlation (Δρ = ρAA − ρEA) for the top 30 most differentially correlated pathway pairs. Red = Wnt-involved pairs (hub of rewiring), Orange = stronger co-enrichment in AA, Blue = stronger co-enrichment in EA. Stars: *** FDR < 0.001, ** < 0.01, * < 0.05.

AMPK–PI3K Pathway Coupling

ρ(AA)=0.545 · ρ(EA)=0.555 · Δρ=−0.010 · p=0.86

AMPK–PI3K/AKT correlation is virtually identical across ancestries (ρ ≈ 0.55, p = 0.86), arguing against differential cross-talk as the explanation for the original paper's claim of opposing AMPK prognostic effects by ancestry.
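Whether two correlations differ between independent groups is commonly tested with Fisher's z-transform. The sketch below assumes that test and hypothetical group sizes of 170 AA and 780 EA patients; the document gives neither the sizes nor the test used, and applying the transform to Spearman ρ is only an approximation.

```python
import math
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided z-test for r1 != r2 in independent samples via Fisher's z-transform.

    Asymptotic for Pearson r; applying it to Spearman rho is a further approximation.
    """
    z_diff = math.atanh(r1) - math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = z_diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Group sizes are assumptions, not values reported in the document.
z, p = compare_independent_correlations(0.545, 170, 0.555, 780)
```

With these assumed sizes the test gives p of roughly 0.87, in the same range as the reported p = 0.86.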

Breslow-Day Significant Pairs (27)

| Feature A | Feature B | OR (AA) | OR (EA) | Pattern (AA) | Pattern (EA) | Switch | Breslow-Day FDR |
|---|---|---|---|---|---|---|---|
| [HM] Estrogen response | [HM] Wnt beta catenin signaling | 0.17 | 1.40 | mut-excl | co-occ | YES | 1.3e-6 |
| [HM] Estrogen response | [HM] KRAS signaling down | 0.34 | 1.90 | mut-excl | co-occ | YES | 1.9e-4 |
| Th2 cells | [HM] Estrogen response | 12.25 | 1.97 | co-occ | co-occ | no | 7.3e-4 |
| [HM] DNA repair | [HM] PI3K Akt mTOR signaling | 0.18 | 0.94 | mut-excl | mut-excl | no | 0.0011 |
| [HM] Wnt beta catenin signaling | [HM] mTORC1 signaling | 2.25 | 0.49 | co-occ | mut-excl | YES | 0.0013 |
| [HM] Estrogen response | [HM] UV response up | 0.15 | 0.71 | mut-excl | mut-excl | no | 0.0023 |
| [HM] G2M checkpoint | [HM] Wnt beta catenin signaling | 2.98 | 0.67 | co-occ | mut-excl | YES | 0.0023 |
| [HM] Estrogen response | [HM] Notch signaling | 0.25 | 1.08 | mut-excl | co-occ | YES | 0.0037 |
| [HM] Mitotic spindle | [HM] Notch signaling | 3.29 | 0.82 | co-occ | mut-excl | YES | 0.0066 |
| [HM] Notch signaling | [IPA] ErbB Signaling | 9.55 | 2.10 | co-occ | co-occ | no | 0.0066 |
| [HM] Oxidative phosphorylation | [HM] PI3K Akt mTOR signaling | 0.17 | 0.70 | mut-excl | mut-excl | no | 0.0066 |
| [HM] E2F targets | [HM] Wnt beta catenin signaling | 2.47 | 0.65 | co-occ | mut-excl | YES | 0.0068 |
| [HM] Myc targets | [HM] PI3K Akt mTOR signaling | 0.58 | 2.23 | mut-excl | co-occ | YES | 0.0068 |
| [HM] Myc targets | [HM] Wnt beta catenin signaling | 2.47 | 0.65 | co-occ | mut-excl | YES | 0.0068 |
| [HM] Mitotic spindle | [HM] Wnt beta catenin signaling | 3.29 | 0.85 | co-occ | mut-excl | YES | 0.0074 |
| [HM] PI3K Akt mTOR signaling | [HM] TGF beta signaling | 5.44 | 1.43 | co-occ | co-occ | no | 0.0142 |
| [HM] PI3K Akt mTOR signaling | [HM] UV response down | 4.42 | 1.22 | co-occ | co-occ | no | 0.0183 |
| Th1 cells | [TPW] Immunogenic Cell Death (ICD) | 8.49 | 2.28 | co-occ | co-occ | no | 0.0308 |
| [TPW] Immunogenic Cell Death (ICD) | aDC | 12.25 | 3.12 | co-occ | co-occ | no | 0.0308 |
| [HM] Mitotic spindle | [TPW] Immunogenic Cell Death (ICD) | 4.90 | 1.43 | co-occ | co-occ | no | 0.0334 |

First 20 of 27 pairs shown.

Wnt: Hub of Rewiring

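Wnt/β-catenin is the central hub of ancestry-specific rewiring, with pattern switches involving G2M checkpoint, mTORC1, E2F targets, Myc targets, mitotic spindle, and estrogen response. In AA tumors, Wnt co-occurs with the proliferative programs (G2M, E2F, Myc, mitotic spindle) and with mTORC1, whereas in EA tumors these pairs are mutually exclusive, i.e. Wnt is decoupled from the cell cycle. The Wnt–estrogen response pair switches in the opposite direction.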
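The pattern labels in the table appear to follow the odds ratio (OR < 1 labeled mutually exclusive, OR > 1 co-occurring), and the Breslow-Day test asks whether the AA and EA odds ratios differ. Below is a minimal two-stratum sketch without Tarone's correction. It assumes each feature pair has been dichotomized (for example high vs low enrichment) into one 2×2 table per ancestry; the document does not describe the dichotomization rule, and the counts in the test are invented.

```python
import math


def _expected_cell(n1, m1, total, psi):
    """Expected top-left count of a 2x2 table with fixed margins under odds ratio psi."""
    if abs(psi - 1.0) < 1e-12:
        return n1 * m1 / total
    lo, hi = max(0, n1 + m1 - total), min(n1, m1)
    # Solve e * (total - n1 - m1 + e) = psi * (n1 - e) * (m1 - e) for e.
    qa = 1.0 - psi
    qb = total - n1 - m1 + psi * (n1 + m1)
    qc = -psi * n1 * m1
    disc = math.sqrt(qb * qb - 4 * qa * qc)
    for root in ((-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)):
        if lo <= root <= hi:
            return root
    raise ValueError("no admissible root for these margins")


def breslow_day_two_strata(table_aa, table_ea):
    """Breslow-Day test (no Tarone correction) that two 2x2 odds ratios are equal.

    Each table is [[a, b], [c, d]]; returns (chi-square statistic, p-value) with 1 df.
    """
    tables = [table_aa, table_ea]
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in tables)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in tables)
    psi = num / den  # Mantel-Haenszel common odds ratio
    stat = 0.0
    for (a, b), (c, d) in tables:
        total, n1, m1 = a + b + c + d, a + b, a + c
        e = _expected_cell(n1, m1, total, psi)
        var = 1.0 / (1 / e + 1 / (n1 - e) + 1 / (m1 - e) + 1 / (total - n1 - m1 + e))
        stat += (a - e) ** 2 / var
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, 1 df
    return stat, p_value
```

A pair flagged as a switch combines a significant Breslow-Day result with odds ratios on opposite sides of 1, as in the Estrogen response / Wnt row (0.17 in AA vs 1.40 in EA).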

AMPK-PI3K: Not Different

AMPK–PI3K pathway coupling is nearly identical across ancestries (ρ(AA)=0.545, ρ(EA)=0.555, p=0.86), arguing against differential cross-talk as the explanation for the original paper's claim of opposing AMPK prognostic effects by ancestry.

Genomic Architecture

6 significant arm events · 16p loss OR=3.87 · 0 significant gene mutations · 9,839 genes tested

Significant Chromosomal Arm Events (FDR < 0.05)

All 6 significant arm-level CNA events are more frequent in AA tumors. Dashed line at OR=1 marks no difference. Red = arm loss, Blue = arm gain. Stars: *** padj < 0.001, ** < 0.01, * < 0.05.

Continuous Genomic Features (All Subtypes)

Median values for AA vs EA (sample sizes vary by feature: 161–179 AA, 761–795 EA). FGA is higher in AA (p=0.011, borderline after FDR). Aneuploidy is significantly higher in AA BasalMyo (median 17 vs 14, p=0.008).

Key Driver Mutation Frequencies

Top nominally significant genes. PIK3CA/CDH1 lower in AA (luminal-associated), TP53 higher in AA (basal-associated). FBXW7 novel (OR=4.92). None survive FDR correction.

BasalMyo Aneuploidy: Intrinsic Ancestry Effect

Median AA 17.0 (n=52) · median EA 14.0 (n=101) · p=0.008 · rank-biserial r=−0.26

Within BasalMyo (subtype-controlled), AA tumors have a 21% higher median aneuploidy score than EA tumors (17 vs 14). This is the only genomic feature with a significant within-subtype difference, suggesting that the ancestry-associated chromosomal instability signal is not purely driven by subtype composition.
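The rank-biserial r accompanying a Mann-Whitney comparison can be computed directly from pairwise comparisons. The sign convention below, r = 1 − 2U/(n₁n₂) with U counted for the AA group, is inferred from the reported r = −0.26 alongside a higher AA median and is an assumption; some tools use the opposite sign.

```python
def rank_biserial(group_1, group_2):
    """Rank-biserial correlation r = 1 - 2*U / (n1 * n2).

    U counts pairs where a group_1 value exceeds a group_2 value (ties count 0.5),
    so under this convention r < 0 means group_1 tends to be larger.
    """
    u = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_1
        for b in group_2
    )
    return 1 - 2 * u / (len(group_1) * len(group_2))
```

Under this convention, r = −0.26 corresponds to an AA tumor exceeding a randomly chosen EA tumor in 63% of pairs (U/(n₁n₂) = (1 + 0.26)/2).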

Top 20 Nominally Differentially Mutated Genes (All Subtypes)

9,839 genes tested, 0 at FDR < 0.1, 70 nominally significant (p < 0.05). All adjusted p-values = 1.0.

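| Gene | Freq AA | Freq EA | Odds ratio | p-value | Driver |
|---|---|---|---|---|---|
| PIK3CA | 21.1% | 36.5% | 0.47 | 1.6e-4 | driver |
| CDH1 | 5.0% | 14.6% | 0.31 | 4.4e-4 | driver |
| TP53 | 46.0% | 31.1% | 1.88 | 4.6e-4 | driver |
| FBXW7 | 5.0% | 1.1% | 4.92 | 0.0027 | |
| KCTD8 | 1.9% | 0.0% | n/a | 0.0052 | |
| CYP27A1 | 1.9% | 0.0% | n/a | 0.0052 | |
| RBBP9 | 1.9% | 0.0% | n/a | 0.0052 | |
| ANK3 | 0.0% | 3.5% | 0.00 | 0.0085 | |
| RPAP2 | 2.5% | 0.3% | 9.67 | 0.0101 | |
| NAPA | 2.5% | 0.3% | 9.67 | 0.0101 | |
| EFR3B | 2.5% | 0.3% | 9.67 | 0.0101 | |
| PUM1 | 2.5% | 0.3% | 9.67 | 0.0101 | |
| CBFB | 0.0% | 3.1% | 0.00 | 0.0136 | driver |
| SACS | 0.0% | 3.1% | 0.00 | 0.0136 | |
| VPS13C | 0.0% | 3.3% | 0.00 | 0.0137 | |
| KIF4A | 5.0% | 1.6% | 3.26 | 0.0138 | |
| OTOA | 3.1% | 0.7% | 4.85 | 0.0182 | |
| LCMT2 | 3.1% | 0.7% | 4.85 | 0.0182 | |
| KLHL28 | 1.9% | 0.1% | 14.43 | 0.0183 | |
| ZFHX2 | 1.9% | 0.1% | 14.43 | 0.0183 | |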

Top 20 genes shown (161 AA vs 761 EA). The Driver column flags known cancer drivers; n/a marks odds ratios that cannot be estimated because no EA tumor carries the mutation.
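The odds ratios and nominal p-values in this table are consistent with a two-sided Fisher exact test on mutated vs wild-type counts per ancestry, although the document does not name the test. A pure-Python sketch under that assumption:

```python
from math import comb


def fisher_exact_2x2(aa_mut, aa_wt, ea_mut, ea_wt):
    """Two-sided Fisher exact test and sample odds ratio for AA vs EA mutation counts."""
    n_aa = aa_mut + aa_wt
    n_mut = aa_mut + ea_mut
    n = n_aa + ea_mut + ea_wt

    def prob(x):
        # Hypergeometric probability of x mutated AA tumors given fixed margins.
        return comb(n_mut, x) * comb(n - n_mut, n_aa - x) / comb(n, n_aa)

    p_obs = prob(aa_mut)
    lo, hi = max(0, n_aa + n_mut - n), min(n_aa, n_mut)
    # Sum every table no more probable than the observed one.
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    if aa_wt * ea_mut == 0:
        odds_ratio = float("inf") if aa_mut * ea_wt > 0 else float("nan")
    else:
        odds_ratio = (aa_mut * ea_wt) / (aa_wt * ea_mut)
    return odds_ratio, min(p_value, 1.0)


# Assumed KCTD8-like counts: 3/161 AA (1.9%) vs 0/761 EA (0.0%) mutated.
odds_ratio, p = fisher_exact_2x2(aa_mut=3, aa_wt=158, ea_mut=0, ea_wt=761)
```

For these assumed counts the test returns p ≈ 0.0052, agreeing with the table, and an infinite odds ratio, which would explain the missing odds ratios for KCTD8, CYP27A1, and RBBP9.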

CNA-Driven, Not Mutation-Driven

Ancestry genomic differences in breast cancer are primarily copy number-driven (6 significant arm events, FGA higher in AA) rather than mutation-driven (0/9,839 genes at FDR<0.1). This may explain why the original paper's mutation analysis (script C_36) was never shown.

16p Loss: Top CNA Event

16p loss is the most ancestry-differential CNA event (OR=3.87, padj=6.5e-4, AA>EA). This arm harbors PALB2 (DNA repair) and CIITA (antigen presentation), potentially linking CNA to both the DNA repair pathway activation and immune phenotype differences observed in AA tumors.

Survival & Prognosis

Total effect (null): HR=1.05, p=0.84 · Signature overlap: 1.2% (2/165) · EA C-index: 0.652 · TF SHAP share: 37–48%

Total ancestry effect on survival is null (HR=1.05, p=0.837)

After adjusting for subtype, age, and stage, African ancestry is not significantly associated with overall survival (OS endpoint, n=957, 137 events). Breast cancer survival disparities appear largely explained by clinical factors rather than ancestry-specific molecular features.

SHAP Feature Importance: Ancestry-Specific Survival Models

Top 20 features by mean |SHAP| value for each ancestry-specific CoxPH model. Colors indicate feature group.

Feature groups: TF activity · Pathway · Immune · Checkpoint · Genomic

AA model is dominated by PI3K-Akt-mTOR (SHAP=0.510), while EA model spreads importance across TF and checkpoint features. NT5E (CD73) is the top EA feature — consistent with the adenosine immunosuppression pathway identified in the checkpoint analysis.

Feature Group SHAP Contributions

TF activity contributes 37–48% of SHAP importance across all models, peaking in BasalMyo (47.6%). This layer was entirely absent from the original paper.
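Group shares like these are obtained by summing mean |SHAP| over the features in each group and dividing by the total. The document does not say which SHAP explainer it used; the sketch below assumes a linear CoxPH log-hazard with independent features, the case where SHAP values have the closed form β_j (x_ij − mean_j). Feature and group names in the test are hypothetical.

```python
import numpy as np


def linear_shap_group_shares(X, beta, groups):
    """Mean |SHAP| per feature and each feature group's share of the total.

    Assumes a linear log-partial hazard X @ beta and independent features, for
    which the SHAP value of feature j in sample i is beta_j * (X_ij - mean_j).
    X: (n, p) feature matrix; beta: (p,) Cox coefficients; groups: p group labels.
    """
    shap_values = (X - X.mean(axis=0)) * beta
    importance = np.abs(shap_values).mean(axis=0)
    total = importance.sum()
    shares = {}
    for group, value in zip(groups, importance):
        shares[group] = shares.get(group, 0.0) + float(value / total)
    return importance, shares
```

For nonlinear survival models a model-agnostic explainer would be needed, and the shares would no longer follow from the coefficients alone.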

Causal Mediation: Ancestry → Survival

OS endpoint (n=957, 137 events). TReg is the sole significant mediator (indirect = −0.089, p=0.030) via suppression (negative indirect effect despite null total). PD-1 is marginal (p=0.064). AMPK shows zero mediation (p=0.918). Green = p<0.05, Yellow = p<0.10.

Prognostic Signature Divergence

EA signature: 82 genes · AA signature: 85 genes · Shared: 2 genes (GNG4, RPS6KA6) · Overlap: 1.2%

Of 165 total unique prognostic genes, only 2 (1.2%) are shared between EA and AA signatures. This is the most extreme prognostic divergence observed at any molecular level.
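The 1.2% figure is the number of shared genes divided by the 165 unique genes in the union of the two signatures (82 + 85 − 2), i.e. a Jaccard index. A small sketch with hypothetical placeholder gene names sized like the two signatures:

```python
def signature_overlap(signature_a, signature_b):
    """Shared genes and Jaccard overlap (shared / union) of two gene signatures."""
    a, b = set(signature_a), set(signature_b)
    shared = a & b
    return shared, len(shared) / len(a | b)


# Hypothetical placeholder names sized like the EA (82) and AA (85) signatures.
ea_sig = [f"gene{i}" for i in range(82)]
aa_sig = [f"gene{i}" for i in range(80, 165)]
shared, jaccard = signature_overlap(ea_sig, aa_sig)
```

Here `shared` holds two genes and `1 - jaccard` rounds to the 98.8% non-overlap quoted in the text.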

EA Prognostic Signature (82 genes)

C-index = 0.652 (n=726, 101 events). Top 30 of 82 genes shown.

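| Gene | Coefficient | Hazard ratio |
|---|---|---|
| PIGR | -0.1451 | 0.865 |
| L1CAM | +0.1446 | 1.156 |
| LOC100130148 | -0.1400 | 0.869 |
| CLEC3A | +0.1286 | 1.137 |
| QPRT | +0.1263 | 1.135 |
| IGFBPL1 | -0.1220 | 0.885 |
| KLRB1 | -0.1177 | 0.889 |
| TNFRSF18 | -0.1171 | 0.889 |
| PCDHGB5 | +0.1064 | 1.112 |
| PCDHGA2 | +0.1036 | 1.109 |
| CD163L1 | +0.0986 | 1.104 |
| C6orf141 | -0.0958 | 0.909 |
| LOC647859 | +0.0943 | 1.099 |
| SULT4A1 | -0.0899 | 0.914 |
| CYP4B1 | +0.0841 | 1.088 |
| DAPL1 | -0.0768 | 0.926 |
| CACNA1H | +0.0735 | 1.076 |
| NFE2 | -0.0719 | 0.931 |
| CNNM1 | -0.0709 | 0.932 |
| C11orf70 | +0.0697 | 1.072 |

Rows 1–20 of 30 shown.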

AA Prognostic Signature (85 genes)

C-index = 0.408 (n=106, 13 events), below the 0.5 expected from a non-informative model. Top 30 of 85 genes shown.

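| Gene | Coefficient | Hazard ratio |
|---|---|---|
| EN2 | +0.0725 | 1.075 |
| PEG3 | +0.0609 | 1.063 |
| TMSB15A | +0.0597 | 1.061 |
| ZNF385B | -0.0589 | 0.943 |
| KLHDC7B | -0.0527 | 0.949 |
| CCNA1 | -0.0512 | 0.950 |
| ALOX15 | +0.0462 | 1.047 |
| ZNF215 | -0.0400 | 0.961 |
| NOTUM | +0.0397 | 1.041 |
| CRISPLD1 | +0.0375 | 1.038 |
| PNMA2 | -0.0364 | 0.964 |
| CHRM1 | +0.0356 | 1.036 |
| PCSK6 | -0.0349 | 0.966 |
| PXDNL | +0.0347 | 1.035 |
| SEMA3E | -0.0335 | 0.967 |
| ANKRD43 | -0.0335 | 0.967 |
| MCTP2 | -0.0329 | 0.968 |
| DNAH11 | +0.0321 | 1.033 |
| TRPA1 | -0.0319 | 0.969 |
| ARNTL2 | -0.0311 | 0.969 |

Rows 1–20 of 30 shown.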

Null Total Effect

The total ancestry effect on survival is null (HR=1.046, p=0.837) after adjusting for subtype, age, and stage. Breast cancer survival disparities between ancestries are largely explained by subtype composition, age, and stage rather than ancestry-specific molecular features.

98.8% Prognostic Divergence

EA (82 genes) and AA (85 genes) prognostic signatures share only 2 genes (GNG4, RPS6KA6) — 1.2% overlap. This is the most extreme prognostic divergence observed at any molecular level, arguing strongly for ancestry-specific molecular tests.

TF Activity: Most Informative Layer

Transcription factor activity features contribute 37–48% of total SHAP importance across all survival models, exceeding pathways, immune cells, checkpoints, and genomic features. This layer was undetectable in the original paper, which lacked TF features.

Evidence Synthesis & Clinical Implications

82.3% (93/113) features significant in ≥1 track · 32 features significant in ≥3 tracks · 10 clinical hypotheses · 9 FDA-approved drugs

Cross-Track Evidence Matrix

54 features across 12 analysis tracks. Cell color indicates significance level. Features colored by type: Pathway, Transcription Factor, Immune Cell, Checkpoint, CNA Arm Event, Genomic Summary.

82.3% of features (93/113 examined) are significant in at least one analysis track. 10 features achieve 4-track convergence (including DNA repair, oxidative phosphorylation, mTORC1, Myc targets, and p53), followed by 22 features with 3-track support including key TFs (IRF7, NFKB1, AR) and TReg.

Relation to Original Paper

7 extend (47%) · 7 orthogonal (47%) · 1 contradicts (7%)

Novel Findings (15)

| # | Finding | Tracks | Relation to paper | Confidence |
|---|---|---|---|---|
| 1 | IFNα response top enriched in AA BasalMyo (NES=+1.97) | 3 | Extends | High |
| 2 | Wnt/β-catenin is hub of ancestry-specific rewiring (12 pattern switches) | 3 | Orthogonal | High |
| 3 | Th2 most robust immune cell type (6/7 methods), not TReg | 3 | Extends | High |
| 4 | TReg challenged: FOXP3 NS (padj=0.51), 1/7 methods only | 3 | Contradicts | Medium |
| 5 | OX40 largest checkpoint effect (d=+0.82, AA>EA) | 3 | Orthogonal | High |
| 6 | Distinct immunosuppression: IDO1/TGFB1 (AA) vs CD39/CD73 (EA) | 3 | Orthogonal | Medium |
| 7 | TF activity most prognostically informative (37–48% SHAP) | 3 | Orthogonal | Medium |
| 8 | Prognostic signatures 1.2% overlap (2/165 unique genes) | 3 | Extends | Medium |
| 9 | 16p loss top CNA event (OR=3.87, PALB2/CIITA locus) | 2 | Orthogonal | Medium |
| 10 | Ancestry genomic differences CNA-driven, not mutation-driven | 2 | Orthogonal | High |
| 11 | Total ancestry-survival effect is null (HR=1.046, p=0.837) | 3 | Extends | High |
| 12 | AMPK shows zero mediation (p=0.918): modification ≠ mediation | 4 | Extends | High |
| 13 | 59.7% of genes show admixture dose-response | 3 | Extends | High |
| 14 | PI3K-Akt-mTOR dominates AA prognosis (SHAP=0.510) | 3 | Extends | Medium |
| 15 | PAX7 novel top TF (d=+0.97, AA>EA, no prior BC link) | 2 | Orthogonal | Medium |

Confidence: High = multi-track convergence with large effects; Medium = significant but fewer supporting tracks or smaller effects.

Drug-Target Mappings (20)

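9 FDA-approved and 11 investigational drugs mapped to ancestry-differential targets. HDAC5 and HDAC7 are class IIa HDACs, whereas entinostat and tucidinostat primarily inhibit class I HDACs, so those two mappings are indirect.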

| Drug | Target | Class | Status | Direction |
|---|---|---|---|---|
| Pembrolizumab | PDCD1 (PD-1) | Anti-PD-1 mAb | FDA-Approved | AA>EA |
| Nivolumab | PDCD1 (PD-1) | Anti-PD-1 mAb | FDA-Approved | AA>EA |
| Ipilimumab | CTLA4 (CTLA-4) | Anti-CTLA-4 mAb | FDA-Approved | AA>EA |
| Relatlimab + Nivolumab (Opdualag) | LAG3 (LAG-3) | Anti-LAG-3 + Anti-PD-1 combo | FDA-Approved | AA>EA |
| Ivuxolimab (PF-04518600) | TNFRSF4 (OX40) | OX40 agonist mAb | Phase I/II | AA>EA |
| BMS-986156 | TNFRSF18 (GITR) | GITR agonist mAb | Phase I/II | AA>EA |
| Epacadostat | IDO1 | IDO1 inhibitor | Phase III | AA>EA |
| Galunisertib | TGFB1 (TGF-β1) | TGFβR1 kinase inhibitor | Phase II | AA>EA |
| Oleclumab | NT5E (CD73) | Anti-CD73 mAb | Phase II | EA>AA |
| TTX-030 | ENTPD1 (CD39) | Anti-CD39 mAb | Phase I | EA>AA |
| Ciforadenant (CPI-444) | ADORA2A (A2AR) | A2AR antagonist | Phase I/II | AA>EA |
| Alpelisib | PIK3CA (PI3K pathway) | PI3Kα inhibitor | FDA-Approved | EA>AA (mutation), AA prognostic |
| Capivasertib | AKT1/2/3 (AKT pathway) | AKT inhibitor | FDA-Approved | AA prognostic (SHAP=0.510) |
| Everolimus | MTOR (mTOR pathway) | mTOR inhibitor | FDA-Approved | EA>AA (enrichment), AA prognostic |
| LGK974/WNT974 | WNT pathway (multiple) | PORCN inhibitor | Phase I/II | Differentially wired (AA: proliferative coupling) |
| Olaparib | PARP1/2 (DNA repair) | PARP inhibitor | FDA-Approved | AA>EA (DNA repair activation, 16p/PALB2 loss) |
| Talazoparib | PARP1/2 (DNA repair) | PARP inhibitor | FDA-Approved | AA>EA |
| Entinostat | HDAC5/HDAC7 | Class I HDAC inhibitor | Phase III | AA>EA |
| Tucidinostat (Chidamide) | HDAC5/HDAC7 | HDAC inhibitor | Approved | AA>EA |
| JQ1 / OTX015 (BET inhibitors) | MYC | BET bromodomain inhibitor | Phase I/II | AA>EA |

Immunotherapy Optimization

AA BasalMyo tumors display an IFNα-hot/checkpoint-high phenotype (PD-1, CTLA-4, LAG-3, and OX40 all higher in AA than in EA), which may predict enhanced response to immune checkpoint inhibitors (ICIs). The distinct immunosuppressive mechanisms suggest IDO1 inhibitors for AA tumors and adenosine-pathway (CD39/CD73) inhibitors for EA tumors.

Targeted Therapy

PI3K-Akt-mTOR signaling dominates AA prognosis (SHAP=0.510) despite a lower PIK3CA mutation frequency in AA tumors, arguing for biomarker selection based on pathway activity rather than mutation status. 16p loss (including PALB2) together with DNA repair activation may support expanded PARP inhibitor eligibility for AA patients.
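The difference between activity-based and mutation-based selection can be made concrete. This is a minimal sketch, assuming a simple mean-z-score activity score (a swapped-in method; the report does not say how pathway activity was quantified), a hypothetical three-gene PI3K-Akt-mTOR set, and toy expression values. The function name `pathway_activity` is illustrative.

```python
import numpy as np

def pathway_activity(expr, genes, pathway_genes):
    """Per-sample pathway activity as the mean z-score of the pathway's genes.

    expr: (n_genes, n_samples) array of log-scale expression values
    genes: gene symbols labelling the rows of expr
    pathway_genes: gene symbols defining the pathway (hypothetical set below)
    """
    wanted = set(pathway_genes)
    rows = [i for i, g in enumerate(genes) if g in wanted]
    if not rows:
        raise ValueError("none of the pathway genes are in the expression matrix")
    sub = expr[rows, :]
    # z-score each gene across samples so highly expressed genes do not dominate
    mu = sub.mean(axis=1, keepdims=True)
    sd = sub.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0
    return ((sub - mu) / sd).mean(axis=0)

# Toy data: 4 genes x 4 samples; GAPDH is a non-pathway control row
genes = ["PIK3CA", "AKT1", "MTOR", "GAPDH"]
expr = np.array([
    [5.0, 6.0, 8.0, 5.5],
    [4.0, 5.0, 7.5, 4.5],
    [3.0, 3.5, 6.0, 3.2],
    [10.0, 10.1, 9.9, 10.0],
])
pik3ca_mutant = np.array([True, True, False, False])  # hypothetical mutation calls

scores = pathway_activity(expr, genes, {"PIK3CA", "AKT1", "MTOR"})
top = int(np.argmax(scores))
print("activity scores:", np.round(scores, 2))
print(f"highest-activity sample: {top}, PIK3CA mutant: {pik3ca_mutant[top]}")
```

In this toy example the sample with the highest pathway activity is PIK3CA wild-type, illustrating the kind of mismatch a mutation-only selection rule can produce.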

Precision Medicine Infrastructure

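The 98.8% non-overlap between ancestry-specific prognostic signatures, together with transcription factor (TF) activity being the most informative feature type (37–48% of SHAP importance), argues for ancestry-specific molecular tests. Because 59.7% of genes show an admixture dose-response, continuous admixture proportion, rather than a binary ancestry label, should be considered as a covariate in clinical trials.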
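Why a continuous covariate matters can be shown with one gene. The report does not specify its dose-response test, so this sketch assumes a per-gene linear fit against a continuous ancestry proportion and contrasts it with the same patients collapsed to a binary label at 0.5. The admixture proportions and expression values are hypothetical; `dose_response` is an illustrative name.

```python
import numpy as np

def dose_response(expr_gene, admixture):
    """Least-squares slope and Pearson r of one gene's expression vs. admixture proportion."""
    x = np.asarray(admixture, dtype=float)
    y = np.asarray(expr_gene, dtype=float)
    slope, _intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, r

# Hypothetical ancestry proportions for 8 patients and toy expression values
admix = np.array([0.05, 0.15, 0.30, 0.45, 0.55, 0.70, 0.85, 0.95])
noise = np.array([0.05, -0.04, 0.03, -0.02, 0.02, -0.03, 0.04, -0.05])
expr = 2.0 + 3.0 * admix + noise

slope, r_cont = dose_response(expr, admix)
# The same patients with ancestry collapsed to a binary label at 0.5
binary = (admix >= 0.5).astype(float)
r_bin = np.corrcoef(binary, expr)[0, 1]
print(f"slope={slope:.2f}  r(continuous)={r_cont:.3f}  r(binary)={r_bin:.3f}")
```

Dichotomizing discards the gradient within each group, so the binary label explains noticeably less of the expression variation than the continuous proportion does.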

Limitations & Conclusions

This re-analysis extends the Roelands et al. 2021 findings across multiple molecular layers, but several limitations constrain interpretation. Each is stated below with its implications for the reported results.

RA-QA Power

The RA-QA cohort (n=24, 7 overall-survival events) is severely underpowered for genome-wide discovery: the minimum detectable HR at 80% power is 9.44. Only 3 genes are differentially expressed at FDR<0.1, and no survival predictor reaches significance. Larger Middle Eastern cohorts are urgently needed.
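To see why 7 events leave so little power, the minimum detectable HR can be approximated with Schoenfeld's formula, D = (z₁₋α/₂ + z_power)² / (p(1 − p)·ln(HR)²), solved for HR. The report does not state which method or group split produced 9.44, so this sketch assumes a binary covariate with group fraction p, a two-sided α of 0.05, and 80% power; the function name is illustrative.

```python
from math import exp, sqrt
from statistics import NormalDist

def min_detectable_hr(events, p_group=0.5, alpha=0.05, power=0.80):
    """Smallest HR detectable by a two-sided Cox/log-rank test (Schoenfeld approximation).

    events: number of observed events D
    p_group: fraction of patients in one arm of the binary covariate
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    log_hr = z_sum / sqrt(events * p_group * (1 - p_group))
    return exp(log_hr)

balanced = min_detectable_hr(7)
unbalanced = min_detectable_hr(7, p_group=1 / 3)
print(f"balanced split:  {balanced:.2f}")
print(f"one-third split: {unbalanced:.2f}")
```

With 7 events a balanced split gives a minimum detectable HR of roughly 8.3, and a one-third/two-thirds split gives roughly 9.45, close to the reported value; any imbalance between groups raises the threshold further.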

Cross-Cohort Batch Effects

Strong systematic batch effects separate RA-QA from TCGA (PERMANOVA R²=0.180; 55/58 features significant at FDR<0.05; median |Cohen’s d|=2.07), so direct quantitative cross-cohort comparison of enrichment scores is invalid. Analyses performed entirely within TCGA are unaffected.
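Both batch-effect statistics can be sketched in a few lines. This Python sketch assumes Euclidean distances on a samples-by-features matrix of enrichment scores (the report does not state its distance metric or permutation count) and runs on simulated data, not the RA-QA/TCGA scores. With a fixed number of groups and samples, the PERMANOVA pseudo-F is a monotone function of R², so permuting R² yields the same p-value as permuting pseudo-F.

```python
import numpy as np

def permanova_r2(X, groups, n_perm=999, seed=0):
    """One-way PERMANOVA on Euclidean distances; returns (R^2, permutation p-value)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n

    def _r2(labels):
        ss_within = 0.0
        for g in np.unique(labels):
            idx = np.flatnonzero(labels == g)
            block = d2[np.ix_(idx, idx)]
            ss_within += block[np.triu_indices(len(idx), k=1)].sum() / len(idx)
        return 1.0 - ss_within / ss_total

    r2_obs = _r2(groups)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        if _r2(rng.permutation(groups)) >= r2_obs:
            hits += 1
    return r2_obs, (hits + 1) / (n_perm + 1)

def cohens_d(a, b):
    """Standardized mean difference (a - b) using the pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Simulated stand-in: 20 samples per "cohort", 5 features, cohort 1 shifted by 3 SD
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(3.0, 1.0, (20, 5))])
cohort = np.array([0] * 20 + [1] * 20)

r2_value, p_value = permanova_r2(X, cohort, n_perm=199)
med_d = np.median([abs(cohens_d(X[cohort == 1, j], X[cohort == 0, j])) for j in range(X.shape[1])])
print(f"PERMANOVA R^2={r2_value:.3f} (p={p_value:.3f}), median |d|={med_d:.2f}")
```

R² is the share of total multivariate dispersion explained by cohort membership, while the median |d| summarizes per-feature shifts in pooled-SD units; the two can diverge when a few features carry most of the separation.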

Null Total Survival Effect

The total ancestry effect on overall survival is not statistically significant (HR=1.046, p=0.837) after adjusting for subtype, age, and stage. Although individual molecular features differ extensively by ancestry, these differences do not aggregate into a net survival disparity. Clinical disparities therefore likely arise from non-molecular factors such as access to care, screening, and socioeconomic context.
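A non-significant p-value describes the test, not the size of a plausible effect. Assuming the reported p-value comes from a two-sided Wald test on log(HR) (the report does not say), an approximate 95% confidence interval can be back-calculated from the HR and p alone. The helper name is illustrative.

```python
from math import exp, log
from statistics import NormalDist

def wald_ci_from_hr_p(hr, p, level=0.95):
    """Approximate CI for a hazard ratio from its estimate and two-sided Wald p-value."""
    nd = NormalDist()
    z_obs = nd.inv_cdf(1 - p / 2)            # |z| implied by the p-value
    se = abs(log(hr)) / z_obs                # standard error of log(HR)
    z_crit = nd.inv_cdf(1 - (1 - level) / 2)
    return exp(log(hr) - z_crit * se), exp(log(hr) + z_crit * se)

lo, hi = wald_ci_from_hr_p(1.046, 0.837)
print(f"HR 1.046, approx. 95% CI {lo:.2f} to {hi:.2f}")
```

Under that assumption the interval spans roughly 0.68 to 1.61, so the estimate is consistent with no net effect but does not by itself rule out moderate effects in either direction.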

Retrospective Observational Design

All findings are observational and correlational, and clinical hypotheses require prospective validation. The 10 therapeutic hypotheses are intended to guide further research, not as treatment recommendations. The analysis also lacks protein-level validation, an assessment of genetic heterogeneity within the Arab population, adjustment for socioeconomic confounders, and single-cell validation.

Relation to Original Paper

Of the 15 key findings, 7 extend the original paper’s conclusions (47%), 7 are orthogonal, addressing dimensions the original did not explore (47%), and 1 contradicts it (7%): TReg robustness is challenged by non-significant FOXP3 and concordance in only 1 of 7 methods. The re-analysis recovers 87.5% of the original pathways, with 100% direction concordance among significant hits.

Conclusions

Genetic ancestry shapes breast cancer biology across every molecular layer tested: transcriptomic (11,424 DEGs), immune (Th2 most robust across 6/7 methods; TReg challenged by FOXP3 non-significance), pathway (78 enriched, with IFNα the top novel finding), genomic (CNA-driven, not mutation-driven), and prognostic (98.8% non-overlapping survival signatures). For most genes (59.7%), these effects follow a continuous dose-response with admixture proportion rather than a binary switch.

Critically, the total ancestry effect on overall survival is null (HR=1.046, p=0.837) after adjusting for subtype, age, and stage. The extensive molecular differences do not aggregate into a net survival disparity — suggesting that clinical outcome disparities arise from non-molecular factors such as access to care, screening practices, and socioeconomic context.

Of the 15 key findings, 7 extend the original paper (47%), 7 are orthogonal (47%), and 1 contradicts (7% — TReg robustness). All 10 therapeutic hypotheses require prospective validation before clinical translation.

This report was generated with the assistance of AI. While every effort has been made to ensure accuracy, AI can make mistakes — please verify key findings against primary data before drawing conclusions.