BREAST CANCER · ANCESTRY GENOMICS
How Genetic Ancestry Shapes Breast Cancer Biology
A multi-omics re-analysis of TCGA-BRCA (n=1,084) and the RA-QA Arab cohort (n=24), examining transcriptomic, immune, pathway, genomic, and prognostic layers.
Roelands et al. 2021 (npj Breast Cancer) · TCGA-BRCA (n=1,084) · RA-QA (n=24) · RNA-seq · Ancestry genomics · Multi-omics re-analysis
Executive Summary
Central Finding
Ancestry shapes breast cancer biology across every molecular layer tested
Genetic ancestry shapes breast cancer biology across transcriptomic (11,424 DEGs), immune (Th2 most robust, TReg challenged), pathway (78 enriched, IFNa top novel), genomic (CNA-driven, not mutation-driven), and prognostic (98.8% non-overlapping signatures) layers — but the total survival effect is null after adjusting for subtype, age, and stage.
1,084
TCGA Patients Analyzed
11,424
DEGs Identified (FDR<0.05)
15
Novel Findings
20
Drug Targets Mapped
RA-QA Arab Cohort
The RA-QA cohort (24 breast cancer patients, 16 of them Arab) is the first RNA-seq dataset from this population. FAM20C, upregulated in Arab patients, cross-validates in the larger TCGA cohort (padj=0.009), but RA-QA is severely underpowered for genome-wide discovery.
Clinical Implications
Three therapeutic themes emerge: (1) Immunotherapy optimization — AA tumors show IFNa-hot/checkpoint-high phenotype; (2) Targeted therapy — PI3K-Akt dominates AA prognosis despite lower PIK3CA mutations; (3) Precision medicine — ancestry-specific molecular tests needed given 98.8% non-overlapping prognostic signatures.
Study Design & Data Quality
24
RA-QA Samples
1,084
TCGA-BRCA Patients
0.977 κ
Ancestry Concordance
0
Discordant Samples
Ancestry Calling Method Concordance
Pairwise Cohen's kappa across 5 SNP-based ancestry calling methods. All pairs exceed κ=0.95, indicating near-perfect agreement.
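The agreement statistic can be reproduced for any single pair of callers. The sketch below is a minimal pure-Python version of Cohen's kappa; the function name and the six example calls are illustrative assumptions, not the pipeline's code or study data.

```python
from collections import Counter

def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length lists of categorical ancestry calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("inputs must be non-empty and equal in length")
    n = len(calls_a)
    # Observed agreement: fraction of samples labelled identically by both methods.
    p_observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement: product of the two methods' marginal label frequencies.
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both methods assign one identical label to every sample
    return (p_observed - p_chance) / (1.0 - p_chance)

# Hypothetical calls for six samples (not study data).
method_1 = ["EUR", "EUR", "AFR", "AFR", "EAS", "EUR"]
method_2 = ["EUR", "EUR", "AFR", "EUR", "EAS", "EUR"]
kappa = cohens_kappa(method_1, method_2)
```

Running this over every pair of the five methods and averaging would give a mean kappa of the kind reported here.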
Top Features by Batch Effect
Cohen's d for the 10 features with largest systematic differences between RA-QA and TCGA (PERMANOVA R²=0.180). Red = higher in RA-QA, blue = higher in TCGA.
Admixture Proportions by Ancestry Group
| Group | N | AFR Mean | EUR Mean | EAS Mean | SAS Mean |
|---|---|---|---|---|---|
| EUR | 753 | 0.0147 | 0.9635 | 0.0100 | 0.0075 |
| AFR | 164 | 0.7998 | 0.1751 | 0.0133 | 0.0028 |
| EAS | 54 | 0.0202 | 0.0282 | 0.9417 | 0.0076 |
| SAS | 8 | 0.0256 | 0.2186 | 0.0333 | 0.7205 |
Mean admixture proportions across ancestry groups. AFR-classified patients show ~18% European admixture, creating a meaningful gradient for continuous analyses.
Ancestry Validation
Five independent SNP-based ancestry calling methods show near-perfect concordance (mean κ=0.977, 0 discordant samples). AFR-classified patients show ~18% European admixture, creating a meaningful gradient for continuous analyses.
Batch Effects
Strong systematic batch effects between RA-QA and TCGA enrichment scores (PERMANOVA R²=0.180, 55/58 features significant at FDR<0.05, median |Cohen’s d|=2.07). Cross-cohort quantitative comparison is invalid. All within-cohort analyses (Tracks 2–7) remain valid.
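For reference, a per-feature batch effect size of this kind can be computed as Cohen's d with a pooled standard deviation. This is a sketch under that assumption; the source does not state which variant of d was used, and the scores below are made up.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference (a minus b) using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two values")
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("no within-group variance; d is undefined")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical enrichment scores for one feature in each cohort.
raqa_scores = [2.0, 2.5, 3.0, 3.5]
tcga_scores = [1.0, 1.5, 2.0, 2.5]
d = cohens_d(raqa_scores, tcga_scores)  # positive means higher in RA-QA
```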
RA-QA Arab Cohort Analysis
24
Samples
3
DEGs (FDR<0.1)
7
OS Events
9.44
Min Detectable HR
Sample Structure: PCA of RA-QA Expression Profiles
PCA of 24 RA-QA samples colored by ethnicity (Arab/Asian/Other). No ethnicity-driven clustering is observed — subtype and individual variation dominate.
Differentially Expressed Genes: Arab vs Non-Arab
Top 20 genes by significance (Arab vs non-Arab, adjusted for Basal/non-Basal subtype). Only 3 genes reach FDR<0.1: ASIC3, OR10G1P, FAM20C — all upregulated in Arab patients. Gray bars = not significant.
Kaplan-Meier Survival Stratifications
| Stratification | Group | N | Events | Event Rate (%) | Log-rank p |
|---|---|---|---|---|---|
| Ethnicity | Arab | 16 | 7 | 43.8% | — |
| Ethnicity | Asian | 5 | 0 | 0.0% | — |
| Ethnicity | Other | 3 | 0 | 0.0% | — |
| Ethnicity (binary) | Arab | 16 | 7 | 43.8% | 0.2483 |
| Ethnicity (binary) | Non-Arab | 8 | 0 | 0.0% | 0.2483 |
| ICR Cluster | ICR-High | 7 | 0 | 0.0% | 0.2811 |
| ICR Cluster | ICR-Low | 17 | 7 | 41.2% | 0.2811 |
| IMS_PAM50 | Basal | 9 | 2 | 22.2% | 0.7590 |
| IMS_PAM50 | Her2 | 3 | 1 | 33.3% | 0.7590 |
| IMS_PAM50 | LumA | 7 | 3 | 42.9% | 0.7590 |
| IMS_PAM50 | LumB | 2 | 1 | 50.0% | 0.7590 |
| IMS_PAM50 | Normal | 3 | 0 | 0.0% | 0.7590 |
| Subtype (binary) | Basal | 9 | 2 | 22.2% | 0.6034 |
| Subtype (binary) | Non-Basal | 15 | 5 | 33.3% | 0.6034 |
All 7 OS events occur in Arab patients (43.8% event rate). Non-Arab patients (n=8) have 0 events, likely reflecting shorter median follow-up (2.0 vs 8.4 years). No stratification reaches log-rank significance (all p>0.24).
FAM20C: Cross-Cohort Validated
FAM20C (Golgi casein kinase, known oncogene in TNBC) is upregulated in Arab patients (LFC=+2.18, padj=0.053) and cross-validates in TCGA AA vs EA (LFC=+0.25, padj=0.009). This is the strongest RA-QA finding and the only gene confirmed across both cohorts.
Power Limitations
With 24 samples and only 7 OS events (all in Arab patients), the RA-QA cohort is severely underpowered. The minimum detectable HR at 80% power is 9.44. Larger Middle Eastern cohorts are needed for ancestry-specific breast cancer discovery in this population.
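The source does not state how the minimum detectable HR was derived. One common approximation is Schoenfeld's formula for a two-sided log-rank test; the sketch below assumes that formula, 7 events, and the 16:8 Arab to non-Arab split. Under those assumptions it returns roughly 9.45, close to the reported 9.44.

```python
from math import exp, sqrt
from statistics import NormalDist

def min_detectable_hr(n_events, frac_group1, alpha=0.05, power=0.80):
    """Smallest HR above 1 detectable by a two-sided log-rank test
    (Schoenfeld approximation, assumed here; not confirmed by the source)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    # Schoenfeld: events = (z_alpha + z_power)^2 / (p1 * p2 * ln(HR)^2).
    # Solving for ln(HR) given the available number of events:
    log_hr = (z_alpha + z_power) / sqrt(n_events * frac_group1 * (1 - frac_group1))
    return exp(log_hr)

hr = min_detectable_hr(n_events=7, frac_group1=16 / 24)
```

Because the number of events sits under the square root, even doubling the event count would still leave the detectable HR very large.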
No Ethnicity-Driven Structure
Unsupervised analysis (PCA, UMAP, Leiden clustering) reveals no ethnicity-driven transcriptomic structure. Clustering aligns weakly with molecular subtype (ARI=0.027) and not with ethnicity, confirming that the cohort lacks power for transcriptomic discovery at the population level.
Genome-Wide Differential Expression
11,424
DEGs (FDR<0.05)
158
Large Effect (|LFC|>1)
193
BasalMyo DEGs (|LFC|>1)
r=0.84
Direction Concordance
Top 30 DEGs by Significance — All Subtypes (AA vs EA)
Top 30 most significant genes from the all-subtypes AA vs EA contrast (limma, adjusted for subtype, age, stage). Red = higher in AA, Blue = higher in EA. Stars: *** FDR < 0.001, ** < 0.01, * < 0.05. LOC90784 (lncRNA, chr19) is the most significant gene (padj = 3.99e-45).
Top 30 DEGs by Significance — BasalMyo (AA vs EA)
Top 30 most significant genes from the BasalMyo-restricted AA vs EA contrast. Stars: *** FDR < 0.001, ** < 0.01, * < 0.05. LOC90784 remains the top hit (padj = 1.11e-12). 193 genes reach |LFC|>1 with 136 upregulated in AA, reflecting the stronger ancestry signal in basal-like tumors.
Top 100 DEGs — All Subtypes
| Gene | log₂FC | Avg Expr | t-stat | P-value | FDR |
|---|---|---|---|---|---|
| LOC90784 | -1.00 | 8.25 | -15.76 | 2.1e-49 | 4.0e-45 |
| CROCCL1 | +1.03 | 7.92 | 14.51 | 7.0e-43 | 6.5e-39 |
| CRYBB2 | +1.24 | 1.45 | 14.20 | 2.6e-41 | 1.6e-37 |
| FAM3A | +0.81 | 9.32 | 13.80 | 2.6e-39 | 1.2e-35 |
| HEXDC | +1.08 | 7.94 | 13.74 | 5.1e-39 | 1.9e-35 |
| NACA2 | +1.27 | 5.45 | 13.72 | 6.5e-39 | 2.0e-35 |
| PRSS45 | +1.51 | 1.38 | 13.43 | 1.6e-37 | 4.2e-34 |
| DDX6 | -0.56 | 11.79 | -13.11 | 5.4e-36 | 1.3e-32 |
| SNRNP70 | +0.74 | 11.32 | 12.95 | 3.3e-35 | 6.8e-32 |
| OGFOD2 | +0.77 | 8.03 | 12.94 | 3.7e-35 | 6.9e-32 |
| FGD4 | -0.87 | 8.13 | -12.69 | 5.6e-34 | 9.4e-31 |
| CDK10 | +0.87 | 8.90 | 12.66 | 7.6e-34 | 1.2e-30 |
| EXD3 | +0.86 | 7.39 | 12.38 | 1.5e-32 | 2.2e-29 |
| FBXL8 | +1.15 | 5.62 | 12.37 | 1.7e-32 | 2.3e-29 |
| C19orf60 | +1.12 | 8.03 | 12.31 | 3.3e-32 | 4.1e-29 |
| SCAND1 | +1.06 | 9.25 | 12.15 | 1.7e-31 | 2.0e-28 |
| DDX51 | +0.60 | 8.83 | 12.08 | 3.6e-31 | 3.9e-28 |
| NSUN5P1 | +1.17 | 6.41 | 12.07 | 3.8e-31 | 3.9e-28 |
| SPPL2B | +0.70 | 9.50 | 11.99 | 9.0e-31 | 8.8e-28 |
| LRRC37A2 | -0.97 | 6.84 | -11.95 | 1.4e-30 | 1.3e-27 |
Top 100 genes from the all-subtypes contrast, ranked by adjusted p-value. Positive logFC indicates higher expression in AA; negative indicates higher in EA.
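The FDR column is an adjusted p-value. The source says only "FDR"; the sketch below assumes the Benjamini-Hochberg procedure, which is limma's default adjustment, and uses made-up p-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is capped by the adjusted values of all larger p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for five genes.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
fdr = benjamini_hochberg(raw_p)
```

With these inputs the third and fourth genes share an adjusted value, which shows why nominal significance (p<0.05) does not guarantee FDR<0.05.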
LOC90784: Novel Top Hit
LOC90784, an uncharacterized lncRNA on chromosome 19, is the most significant gene (padj=3.99e-45 all subtypes, padj=1.11e-12 BasalMyo). AFR admixture explains up to 36.7% of its expression variance in BasalMyo, making it the most ancestry-predictive gene in this dataset.
FAM20C Replicates from RA-QA
FAM20C, upregulated in Arab patients in the RA-QA cohort (T2_S1), replicates in TCGA with the same direction (LFC=+0.25, padj=0.009 all subtypes), confirming it as a cross-cohort ancestry-associated gene.
Pathway Enrichment Analysis
78
Significant Pathways
87.5%
Original Recovered
71
Novel Pathways
+1.97
IFNα NES (Top Novel)
Top GSEA Pathways: BasalMyo AA vs EA
Normalized enrichment scores (NES) for the top 30 GSEA pathways in the BasalMyo AA vs EA contrast. Red = enriched in AA (positive NES), Blue = enriched in EA (negative NES). Stars denote FDR significance: *** < 0.001, ** < 0.01, * < 0.05. IFNα response (NES=+1.97) is the top novel Hallmark finding not in the original paper.
Recovery of Original Paper's 16 Pathways
| Original Pathway | Matched Term | NES | FDR | Concordant |
|---|---|---|---|---|
| Oxidative phosphorylation | | +1.98 | 0.0016 | Yes |
| Oxidative phosphorylation | | +1.99 | 0.0050 | Yes |
| DNA repair | | +1.64 | 0.0060 | Yes |
| UVB-induced MAPK signaling | | -1.68 | 0.0061 | Yes |
| Oxidative phosphorylation | | +1.67 | 0.0065 | Yes |
| PI3K-Akt mTOR signaling | | -1.45 | 0.0374 | Yes |
| mTOR signaling | | -1.45 | 0.0374 | Yes |
| Estrogen response | | -1.63 | 0.0822 | Yes |
| ERK MAPK signaling | | -1.61 | 0.0970 | Yes |
| EGF signaling | | -1.51 | 0.1339 | Yes |
| ErbB signaling | | -1.71 | 0.1948 | Yes |
| Angiogenesis | | -1.41 | 0.1992 | Yes |
| PI3K-AKT signaling | | -1.39 | 0.2122 | Yes |
| PI3K-Akt mTOR signaling | | -1.39 | 0.2122 | Yes |
| ErbB signaling | | -1.39 | 0.2122 | Yes |
| ErbB signaling | | -1.63 | 0.2559 | Yes |
| PI3K-AKT signaling | | -1.15 | 0.2851 | Yes |
| mTOR signaling | | -1.25 | 0.3181 | Yes |
| AMPK signaling | | -1.25 | 0.3181 | Yes |
| Angiogenesis | | -1.45 | 0.3771 | Yes |
| Estrogen response | | -1.37 | 0.4310 | Yes |
| DNA repair | | -1.36 | 0.4426 | No |
| AMPK signaling | | -1.25 | 0.4930 | Yes |
| ERK MAPK signaling | | -1.29 | 0.4987 | Yes |
| PI3K-Akt mTOR signaling | | -1.15 | 0.5326 | Yes |
| PI3K-AKT signaling | | -1.15 | 0.5326 | Yes |
| DNA repair | | +1.13 | 0.5858 | Yes |
| Angiogenesis | | +1.06 | 0.6941 | No |
| EGF signaling | | +1.06 | 0.6941 | No |
| Estrogen response | | +0.98 | 0.7211 | No |
| Estrogen response | | -0.96 | 0.7297 | Yes |
| mTOR signaling | | -0.94 | 0.7796 | Yes |
| PTEN signaling | | -0.90 | 0.7934 | Yes |
| MAPK up genes | | -0.94 | 0.8329 | Yes |
| ERK MAPK signaling | | +0.93 | 0.9016 | No |
| DNA repair | | +0.89 | 0.9364 | Yes |
| Angiogenesis | | -0.60 | 0.9977 | Yes |
Mapping of the original paper's 16 pathways to Hallmark, KEGG, Reactome, and GO-BP gene sets. "Concordant" indicates whether the enrichment direction matches the original finding. 14/16 pathways recovered by keyword, 5 reach FDR<0.05 with 100% direction concordance among significant hits.
Novel Pathways Not in Original Paper (BasalMyo GSEA)
| Pathway | Collection | NES | FDR | Direction |
|---|---|---|---|---|
| | Reactome | -2.17 | <1e-16 | EA > AA |
| | Reactome | -2.17 | <1e-16 | EA > AA |
| | Reactome | -2.17 | <1e-16 | EA > AA |
| | Reactome | -2.17 | <1e-16 | EA > AA |
| | KEGG | -2.13 | <1e-16 | EA > AA |
| | Hallmark | -1.93 | <1e-16 | EA > AA |
| Interferon alpha response | Hallmark | +1.97 | 9.3e-4 | AA > EA |
| | Reactome | +2.02 | 0.0011 | AA > EA |
| | Reactome | +2.05 | 0.0016 | AA > EA |
| | Reactome | +2.03 | 0.0016 | AA > EA |
| | Reactome | +1.99 | 0.0018 | AA > EA |
| | KEGG | +2.01 | 0.0024 | AA > EA |
| | GO-BP | +2.07 | 0.0037 | AA > EA |
| | Reactome | +1.97 | 0.0038 | AA > EA |
| | GO-BP | +2.04 | 0.0043 | AA > EA |
| | GO-BP | +2.00 | 0.0045 | AA > EA |
| | GO-BP | +2.00 | 0.0045 | AA > EA |
| | Reactome | -2.04 | 0.0047 | EA > AA |
| | GO-BP | +2.00 | 0.0053 | AA > EA |
| | GO-BP | +2.08 | 0.0057 | AA > EA |
GSEA pathways significant in BasalMyo AA vs EA that were not among the original paper's 16 reported pathways. These 30 represent the top novel findings by |NES|, spanning Reactome, GO-BP, KEGG, and Hallmark collections.
IFNα: Top Novel Discovery
Interferon alpha response is the most significantly enriched Hallmark pathway in AA BasalMyo (NES=+1.97, FDR<0.001), alongside IFNγ, TNFα/NF-κB, and p53. This interferon/inflammatory signature was not identified in the original paper and suggests AA basal tumors have a more immunologically ‘hot’ microenvironment.
Direction Concordance
All 5 significant pathways recovered from the original paper match the originally reported direction (100% concordance): DNA repair and oxidative phosphorylation are enriched in AA, while mTORC1 and UV response are enriched in EA.
Ancestry Dose-Response Analysis
26
Pathways Sig (26/35)
59.7%
Genes Dose-Responsive
0.208
Top Gene R²
0
Survival Interactions
Pathway Enrichment vs AFR Admixture (Regression Beta)
OLS regression beta for each pathway enrichment score vs continuous AFR admixture proportion (n=880), adjusted for age, stage, and TDA subtype. Red = increases with AFR ancestry, Blue = decreases with AFR ancestry, Gray = not significant (FDR≥0.05). Stars: *** < 0.001, ** < 0.01, * < 0.05. DNA repair has the strongest positive dose-response (R²=0.077). 26/35 pathways are significant.
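As a simplified illustration of the dose-response idea, the sketch below fits an ordinary least squares line of one pathway score on AFR admixture proportion and reports the slope and R². It deliberately omits the age, stage, and subtype covariates used in the actual analysis, and the data points are invented.

```python
def simple_ols(x, y):
    """Slope, intercept, and R^2 for a one-predictor least squares fit."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return beta, intercept, r_squared

# Hypothetical admixture proportions and pathway enrichment scores.
afr_fraction = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
pathway_score = [0.10, 0.18, 0.35, 0.38, 0.55, 0.60]
beta, intercept, r_squared = simple_ols(afr_fraction, pathway_score)
```

A positive slope corresponds to a red bar in the figure (score increases with AFR ancestry), a negative slope to a blue bar.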
Immune Cell Scores vs AFR Admixture
TReg is the only immune cell type that increases significantly with AFR admixture (beta=+0.038, padj=1.3e-4). NK CD56bright shows a significant negative association (padj=0.011). Together, only 2/23 cell types reach FDR<0.05.
Gene-Level Overlap: Admixture vs Categorical DE
97.6%
Gene overlap between methods
100%
Direction concordance
11,129
Admixture sig
11,424
Categorical DE sig
97.6% of admixture-responsive genes overlap with categorical DE genes, with 100% direction concordance. Both approaches detect the same underlying biology.
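The two summary numbers here, gene overlap and direction concordance, can be sketched as follows. The function name, gene labels, and effect sizes are hypothetical; the denominator for overlap is taken to be the admixture-significant set, as the sentence above implies.

```python
def overlap_summary(admixture_effects, categorical_effects):
    """Compare two gene -> signed effect mappings of significant genes.

    Returns the fraction of admixture genes also found by the categorical
    contrast, and the fraction of shared genes with the same effect sign.
    """
    if not admixture_effects:
        raise ValueError("admixture_effects must not be empty")
    shared = admixture_effects.keys() & categorical_effects.keys()
    same_sign = sum(
        1 for gene in shared
        if (admixture_effects[gene] > 0) == (categorical_effects[gene] > 0)
    )
    overlap = len(shared) / len(admixture_effects)
    concordance = same_sign / len(shared) if shared else float("nan")
    return overlap, concordance

# Hypothetical significant genes with signed effects (not study data).
admixture = {"GENE_A": 1.0, "GENE_B": -1.0, "GENE_C": 2.0, "GENE_D": 0.5}
categorical = {"GENE_A": 0.5, "GENE_B": -0.2, "GENE_C": -1.0, "GENE_E": 1.0}
overlap, concordance = overlap_summary(admixture, categorical)
```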
Top 50 Admixture-Responsive Genes (All Subtypes)
| Gene | Beta (AFR) | FDR | Partial R² | Avg Expression |
|---|---|---|---|---|
| CROCCL1 | +1.36 | 1.0e-41 | 0.208 | 7.87 |
| LOC90784 | -1.22 | 2.2e-40 | 0.201 | 8.25 |
| CRYBB2 | +1.59 | 9.7e-38 | 0.189 | 1.44 |
| PRSS45 | +1.98 | 3.5e-37 | 0.186 | 1.33 |
| NACA2 | +1.63 | 1.5e-35 | 0.178 | 5.44 |
| FAM3A | +1.03 | 1.8e-35 | 0.178 | 9.31 |
| HEXDC | +1.38 | 8.4e-35 | 0.174 | 7.93 |
| DDX6 | -0.71 | 3.4e-32 | 0.163 | 11.79 |
| SNRNP70 | +0.95 | 8.2e-32 | 0.161 | 11.30 |
| OGFOD2 | +0.98 | 1.4e-30 | 0.155 | 8.02 |
| NSUN5P1 | +1.56 | 3.8e-30 | 0.153 | 6.39 |
| WASH7P | +1.07 | 2.6e-29 | 0.149 | 9.74 |
| CDK10 | +1.10 | 2.8e-29 | 0.149 | 8.88 |
| EXD3 | +1.12 | 3.2e-29 | 0.149 | 7.37 |
| C19orf60 | +1.44 | 2.9e-28 | 0.144 | 8.02 |
| FGD4 | -1.07 | 4.5e-28 | 0.143 | 8.13 |
| SPPL2B | +0.90 | 1.2e-27 | 0.141 | 9.47 |
| DDX51 | +0.76 | 1.3e-27 | 0.141 | 8.81 |
| SCXB | +1.53 | 1.3e-27 | 0.141 | 1.63 |
| ZNF414 | +0.94 | 1.3e-27 | 0.141 | 7.29 |
Top 50 genes ranked by partial R² from OLS regression of expression vs continuous AFR admixture proportion, adjusted for age, stage, and TDA subtype. CROCCL1 (R²=0.208) is the most admixture-predictive gene.
Dose-Response, Not Binary
Ancestry-associated molecular differences scale linearly with admixture proportion — a graded, dose-dependent phenomenon, not a binary switch. 1,285/1,928 extended gene sets (66.6%) show significant dose-response, supporting a polygenic regulatory model.
AMPK Survival: Suggestive but Not Significant
AMPK shows a suggestive survival interaction in BasalMyo (HR=0.624, p=0.055, padj=0.52), directionally consistent with the original paper’s opposing prognostic effects. However, 0/232 formal interaction tests reach FDR significance, limited by only 19 BasalMyo OS events.
Immune Landscape & Checkpoints
7
Methods Used
6/7 sig
Th2: Most Robust
1/7
TReg: 1/7 Methods
18/27
Checkpoints Sig
Immune Cell Type Concordance Across Methods
Number of deconvolution methods (out of 7: ssGSEA, AUCell, MLM, ULM, z-score, GSEA, consensus) finding significant ancestry differences (FDR<0.05) per cell type. Th2 is the most robust (6/7, 100% EA>AA direction concordance). TReg is significant only in ssGSEA because its 4-gene Bindea signature falls below tmin=5 for other methods.
Checkpoint Gene Expression: Ancestry Effect Sizes
Cohen’s d effect size for ancestry differences in 27 immune checkpoint genes (all subtypes, AA vs EA). Red = higher in AA, Blue = higher in EA, Gray = not significant (FDR≥0.05). Stars: *** < 0.001, ** < 0.01, * < 0.05. OX40 shows the largest effect (d=+0.82, AA>EA). Actionable inhibitory checkpoints PD-1, CTLA-4, LAG-3 are all higher in AA tumors.
Th2, Not TReg
Th2 is the most robustly ancestry-differential immune cell type (6/7 methods, 100% EA>AA direction). TReg — the original paper’s headline immune finding — cannot be validated by alternative methods because its 4-gene Bindea signature falls below standard minimum thresholds. The ssGSEA TReg signal is extremely strong (padj=6.5e-9) but method-singular.
OX40: Largest Checkpoint Effect
OX40 (TNFRSF4) shows the largest ancestry effect among checkpoints (d=+0.82, AA>EA, padj=3.2e-18). All three actionable inhibitory checkpoints (PD-1, CTLA-4, LAG-3) are higher in AA, consistent with the IFNα-enriched immune microenvironment.
Distinct Immunosuppression
AA and EA tumors employ different immunosuppressive mechanisms. AA tumors show higher IDO1/TGFB1 (immune-engaged suppression), while EA tumors show higher CD39/CD73 (adenosine-mediated metabolic suppression). This has direct implications for checkpoint inhibitor selection by ancestry.
Transcription Factor Regulation
574
TFs Tested
293
TFs Significant
d=0.97
PAX7 (Top, d=0.97)
NS
FOXP3 (NS, padj=0.51)
Top Transcription Factors by Ancestry Effect Size
Cohen’s d effect size for TF activity differences (ULM + CollecTRI) between AA and EA tumors (all subtypes). Red = higher in AA, Blue = higher in EA, Gray = not significant (FDR≥0.05). Key mechanistic TFs highlighted: GATA3 (Th2 master regulator, EA>AA), IRF7 (IFNα regulator, AA>EA), FOXP3 (TReg master TF, not significant). Stars: *** < 0.001, ** < 0.01, * < 0.05.
Top 50 Ancestry-Differential TFs (All Subtypes)
| TF | Cohen's d | Direction | FDR | Mean AA | Mean EA |
|---|---|---|---|---|---|
| PAX7 | +0.968 | AA>EA | 1.6e-22 | -0.533 | -0.850 |
| HOXA10 | -0.842 | EA>AA | 1.0e-18 | 1.084 | 1.383 |
| HDAC5 | +0.744 | AA>EA | 5.7e-14 | 1.011 | 0.672 |
| HDAC7 | +0.736 | AA>EA | 3.0e-13 | -1.301 | -1.614 |
| MSX2 | -0.723 | EA>AA | 6.2e-16 | 1.384 | 1.675 |
| FOXP1 | -0.722 | EA>AA | 9.3e-15 | -2.715 | -2.491 |
| TAF1 | -0.716 | EA>AA | 3.0e-13 | 3.569 | 3.842 |
| HOXA9 | -0.715 | EA>AA | 3.0e-13 | 1.802 | 2.144 |
| HEY1 | -0.696 | EA>AA | 1.4e-11 | 1.404 | 1.693 |
| NEUROD2 | +0.693 | AA>EA | 1.2e-11 | -1.434 | -1.637 |
| EHF | +0.673 | AA>EA | 4.2e-12 | 1.391 | 1.071 |
| KDM5C | +0.672 | AA>EA | 1.2e-11 | 2.024 | 1.856 |
| SMARCA1 | +0.653 | AA>EA | 4.1e-12 | 1.664 | 1.422 |
| E2F6 | -0.648 | EA>AA | 7.8e-11 | 0.390 | 0.560 |
| KAT2B | -0.643 | EA>AA | 3.5e-12 | 0.653 | 0.817 |
| MECP2 | +0.639 | AA>EA | 2.2e-11 | -0.242 | -0.590 |
| PTF1A | +0.637 | AA>EA | 2.0e-9 | -3.020 | -3.292 |
| RCOR2 | +0.634 | AA>EA | 1.9e-8 | 2.178 | 2.018 |
| TEAD1 | +0.629 | AA>EA | 3.2e-10 | 1.865 | 1.638 |
| NR2F1 | -0.627 | EA>AA | 1.2e-11 | 1.895 | 2.099 |
GATA3 Explains Th2
GATA3, the master TF for Th2 differentiation, has significantly higher activity in EA tumors (d=-0.47, padj=2.0e-6). This mechanistically explains why Th2 cells are the most robustly EA-enriched immune cell type.
IRF7 Explains IFNα
IRF7, the master regulator of type I interferon signaling, is significantly higher in AA tumors (d=+0.33, padj=1.1e-3), explaining the IFNα pathway enrichment discovered in T2_S3.
FOXP3 Challenges TReg
FOXP3, the canonical TReg master TF, is NOT significantly different between ancestries (padj=0.51). If TReg infiltration truly differed by ancestry, FOXP3 activity should differ too. This is the strongest challenge to the original paper’s TReg finding.
Pathway Network Rewiring
76.0%
Shared Edges
94
Diff Correlations
27
Sig BD Pairs
12
Pattern Switches
Top Differential Pathway Correlations (AA vs EA)
Difference in Spearman correlation (Δρ = ρAA − ρEA) for the top 30 most differentially correlated pathway pairs. Red = Wnt-involved pairs (hub of rewiring), Orange = stronger co-enrichment in AA, Blue = stronger co-enrichment in EA. Stars: *** FDR < 0.001, ** < 0.01, * < 0.05.
AMPK–PI3K Pathway Coupling
0.545
ρ AA
0.555
ρ EA
-0.010
Δρ
0.86
p-value
AMPK–PI3K/AKT correlation is virtually identical across ancestries (ρ ≈ 0.55, p = 0.86), ruling out differential cross-talk as the explanation for the original paper’s claim of opposing AMPK prognostic effects by ancestry.
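One standard way to test whether two independent correlations differ is Fisher's z-transformation. The source does not name its test, so the sketch below is an assumption. It also borrows the AFR and EUR group sizes from the admixture table (164 and 753), which may not be the exact sample sizes behind this comparison, and Fisher's z is strictly derived for Pearson rather than Spearman correlations. Under those assumptions the p-value comes out well above 0.05, in line with the reported result.

```python
from math import atanh, sqrt
from statistics import NormalDist

def compare_correlations(rho_1, n_1, rho_2, n_2):
    """Two-sided Fisher z-test for the difference of two independent correlations."""
    if n_1 <= 3 or n_2 <= 3:
        raise ValueError("each group needs more than three samples")
    z_diff = atanh(rho_1) - atanh(rho_2)
    standard_error = sqrt(1 / (n_1 - 3) + 1 / (n_2 - 3))
    z_stat = z_diff / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_value

# Correlations from the cards above; group sizes are assumed (see lead-in).
z_stat, p_value = compare_correlations(0.545, 164, 0.555, 753)
```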
Breslow-Day Significant Pairs (27)
| Feature A | Feature B | OR AA | OR EA | Pattern AA | Pattern EA | Switch | BD FDR |
|---|---|---|---|---|---|---|---|
| [HM] Estrogen response | [HM] Wnt beta catenin signaling | 0.17 | 1.40 | mut-excl | co-occ | YES | 1.3e-6 |
| [HM] Estrogen response | [HM] KRAS signaling down | 0.34 | 1.90 | mut-excl | co-occ | YES | 1.9e-4 |
| Th2 cells | [HM] Estrogen response | 12.25 | 1.97 | co-occ | co-occ | no | 7.3e-4 |
| [HM] DNA repair | [HM] PI3K Akt mTOR signaling | 0.18 | 0.94 | mut-excl | mut-excl | no | 0.0011 |
| [HM] Wnt beta catenin signaling | [HM] mTORC1 signaling | 2.25 | 0.49 | co-occ | mut-excl | YES | 0.0013 |
| [HM] Estrogen response | [HM] UV response up | 0.15 | 0.71 | mut-excl | mut-excl | no | 0.0023 |
| [HM] G2M checkpoint | [HM] Wnt beta catenin signaling | 2.98 | 0.67 | co-occ | mut-excl | YES | 0.0023 |
| [HM] Estrogen response | [HM] Notch signaling | 0.25 | 1.08 | mut-excl | co-occ | YES | 0.0037 |
| [HM] Mitotic spindle | [HM] Notch signaling | 3.29 | 0.82 | co-occ | mut-excl | YES | 0.0066 |
| [HM] Notch signaling | [IPA] ErbB Signaling | 9.55 | 2.10 | co-occ | co-occ | no | 0.0066 |
| [HM] Oxidative phosphorylation | [HM] PI3K Akt mTOR signaling | 0.17 | 0.70 | mut-excl | mut-excl | no | 0.0066 |
| [HM] E2F targets | [HM] Wnt beta catenin signaling | 2.47 | 0.65 | co-occ | mut-excl | YES | 0.0068 |
| [HM] Myc targets | [HM] PI3K Akt mTOR signaling | 0.58 | 2.23 | mut-excl | co-occ | YES | 0.0068 |
| [HM] Myc targets | [HM] Wnt beta catenin signaling | 2.47 | 0.65 | co-occ | mut-excl | YES | 0.0068 |
| [HM] Mitotic spindle | [HM] Wnt beta catenin signaling | 3.29 | 0.85 | co-occ | mut-excl | YES | 0.0074 |
| [HM] PI3K Akt mTOR signaling | [HM] TGF beta signaling | 5.44 | 1.43 | co-occ | co-occ | no | 0.0142 |
| [HM] PI3K Akt mTOR signaling | [HM] UV response down | 4.42 | 1.22 | co-occ | co-occ | no | 0.0183 |
| Th1 cells | [TPW] Immunogenic Cell Death (ICD) | 8.49 | 2.28 | co-occ | co-occ | no | 0.0308 |
| [TPW] Immunogenic Cell Death (ICD) | aDC | 12.25 | 3.12 | co-occ | co-occ | no | 0.0308 |
| [HM] Mitotic spindle | [TPW] Immunogenic Cell Death (ICD) | 4.90 | 1.43 | co-occ | co-occ | no | 0.0334 |
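The Pattern and Switch columns appear to follow a simple rule: an odds ratio above 1 is labelled co-occurrence, an odds ratio below 1 mutual exclusivity, and a switch is flagged when the two ancestries fall on opposite sides. The sketch below encodes that inferred rule; the function names are illustrative, and treating an odds ratio of exactly 1 as mutual exclusivity is an arbitrary assumption.

```python
def classify_pattern(odds_ratio):
    """Label a feature pair by its odds ratio (rule inferred from the table)."""
    return "co-occ" if odds_ratio > 1 else "mut-excl"

def is_pattern_switch(or_aa, or_ea):
    """True when AA and EA fall on opposite sides of OR = 1."""
    return classify_pattern(or_aa) != classify_pattern(or_ea)

# Three rows copied from the table: (feature A, feature B, OR AA, OR EA).
rows = [
    ("Estrogen response", "Wnt beta catenin signaling", 0.17, 1.40),
    ("Th2 cells", "Estrogen response", 12.25, 1.97),
    ("DNA repair", "PI3K Akt mTOR signaling", 0.18, 0.94),
]
switches = [is_pattern_switch(or_aa, or_ea) for _, _, or_aa, or_ea in rows]
```

Note that a pair can differ significantly between ancestries by the Breslow-Day test without switching pattern, as in the Th2 row, where both odds ratios exceed 1 but differ sixfold.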
Wnt: Hub of Rewiring
Wnt/β-catenin is the central hub of ancestry-specific rewiring, showing pattern switches with G2M, mTORC1, E2F, Myc, Mitotic spindle, and Estrogen response. In AA tumors, Wnt drives a coordinated proliferative program; in EA tumors, Wnt is decoupled from cell cycle.
AMPK-PI3K: Not Different
AMPK-PI3K pathway coupling is nearly identical across ancestries (ρAA=0.545, ρEA=0.555, p=0.86), ruling out differential cross-talk as the explanation for the original paper’s claim of opposing AMPK prognostic effects by ancestry.
Genomic Architecture
6
Sig Arm Events
3.87
16p Loss OR
0
Sig Gene Mutations
9,839
Genes Tested
Significant Chromosomal Arm Events (FDR < 0.05)
All 6 significant arm-level CNA events are more frequent in AA tumors. Dashed line at OR=1 marks no difference. Red = arm loss, Blue = arm gain. Stars: *** padj < 0.001, ** < 0.01, * < 0.05.
Continuous Genomic Features (All Subtypes)
Median values for AA vs EA (sample sizes vary by feature: 161–179 AA, 761–795 EA). FGA is higher in AA (p=0.011, borderline after FDR). Aneuploidy is significantly higher in AA BasalMyo (median 17 vs 14, p=0.008).
Key Driver Mutation Frequencies
Top nominally significant genes. PIK3CA/CDH1 lower in AA (luminal-associated), TP53 higher in AA (basal-associated). FBXW7 novel (OR=4.92). None survive FDR correction.
BasalMyo Aneuploidy: Intrinsic Ancestry Effect
17.0
Median AA (n=52)
14.0
Median EA (n=101)
0.008
p-value
r = −0.26
Rank-biserial
Within BasalMyo (subtype-controlled), AA tumors have 21% higher aneuploidy scores than EA. This is the only genomic feature with a significant within-subtype difference, confirming that the overall FGA signal is not purely driven by subtype composition.
Top 20 Nominally Differentially Mutated Genes (All Subtypes)
9,839 genes tested, 0 at FDR < 0.1, 70 nominally significant (p < 0.05). All adjusted p-values = 1.0.
| Gene | Freq AA | Freq EA | Odds Ratio | p-value | Driver |
|---|---|---|---|---|---|
| PIK3CA | 21.1% | 36.5% | 0.47 | 1.6e-4 | driver |
| CDH1 | 5.0% | 14.6% | 0.31 | 4.4e-4 | driver |
| TP53 | 46.0% | 31.1% | 1.88 | 4.6e-4 | driver |
| FBXW7 | 5.0% | 1.1% | 4.92 | 0.0027 | |
| KCTD8 | 1.9% | 0.0% | ∞ | 0.0052 | |
| CYP27A1 | 1.9% | 0.0% | ∞ | 0.0052 | |
| RBBP9 | 1.9% | 0.0% | ∞ | 0.0052 | |
| ANK3 | 0.0% | 3.5% | 0.00 | 0.0085 | |
| RPAP2 | 2.5% | 0.3% | 9.67 | 0.0101 | |
| NAPA | 2.5% | 0.3% | 9.67 | 0.0101 | |
| EFR3B | 2.5% | 0.3% | 9.67 | 0.0101 | |
| PUM1 | 2.5% | 0.3% | 9.67 | 0.0101 | |
| CBFB | 0.0% | 3.1% | 0.00 | 0.0136 | driver |
| SACS | 0.0% | 3.1% | 0.00 | 0.0136 | |
| VPS13C | 0.0% | 3.3% | 0.00 | 0.0137 | |
| KIF4A | 5.0% | 1.6% | 3.26 | 0.0138 | |
| OTOA | 3.1% | 0.7% | 4.85 | 0.0182 | |
| LCMT2 | 3.1% | 0.7% | 4.85 | 0.0182 | |
| KLHL28 | 1.9% | 0.1% | 14.43 | 0.0183 | |
| ZFHX2 | 1.9% | 0.1% | 14.43 | 0.0183 |
Showing 20 genes (161 AA vs 761 EA). Known cancer drivers are marked in the Driver column.
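The odds ratios in the table can be approximately recomputed from the two frequency columns. The sketch below works from the rounded percentages rather than raw counts, so the last digit may differ from the table for some genes; the function name is illustrative.

```python
def odds_ratio(freq_aa, freq_ea):
    """Odds ratio (AA vs EA) from two mutation frequencies given as fractions."""
    if not (0 <= freq_aa < 1 and 0 <= freq_ea < 1):
        raise ValueError("frequencies must be in the range [0, 1)")
    if freq_ea == 0:
        # Matches the infinity entries in the table when only AA carries mutations.
        return float("inf") if freq_aa > 0 else float("nan")
    odds_aa = freq_aa / (1 - freq_aa)
    odds_ea = freq_ea / (1 - freq_ea)
    return odds_aa / odds_ea

pik3ca_or = odds_ratio(0.211, 0.365)  # PIK3CA: 21.1% AA vs 36.5% EA
cdh1_or = odds_ratio(0.050, 0.146)    # CDH1: 5.0% AA vs 14.6% EA
```

Both values round to the table entries (0.47 and 0.31). A significance test such as Fisher's exact test would need the underlying counts, which are not shown here.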
CNA-Driven, Not Mutation-Driven
Ancestry genomic differences in breast cancer are primarily copy number-driven (6 significant arm events, FGA higher in AA) rather than mutation-driven (0/9,839 genes at FDR<0.1). This explains why the original paper’s mutation analysis (script C_36) was never shown.
16p Loss: Top CNA Event
16p loss is the most ancestry-differential CNA event (OR=3.87, padj=6.5e-4, AA>EA). This arm harbors PALB2 (DNA repair) and CIITA (antigen presentation), potentially linking CNA to both the DNA repair pathway activation and immune phenotype differences observed in AA tumors.
Survival & Prognosis
HR=1.05 p=0.84
Total Effect (Null)
1.2% (2/165)
Signature Overlap
0.652
EA C-index
37–48%
TF SHAP Share
HR=1.05
p = 0.837
Total ancestry effect on survival is null
After adjusting for subtype, age, and stage, African ancestry does not confer a survival advantage or disadvantage (OS endpoint, n=957, 137 events). Breast cancer survival disparities are largely explained by clinical factors, not ancestry-specific molecular features.
SHAP Feature Importance: Ancestry-Specific Survival Models
Top 20 features by mean |SHAP| value for each ancestry-specific CoxPH model. Colors indicate feature group.
AA model is dominated by PI3K-Akt-mTOR (SHAP=0.510), while EA model spreads importance across TF and checkpoint features. NT5E (CD73) is the top EA feature — consistent with the adenosine immunosuppression pathway identified in the checkpoint analysis.
Feature Group SHAP Contributions
TF activity contributes 37–48% of SHAP importance across all models, peaking in BasalMyo (47.6%). This layer was entirely absent from the original paper.
Causal Mediation: Ancestry → Survival
OS endpoint (n=957, 137 events). TReg is the sole significant mediator (indirect = −0.089, p=0.030) via suppression (negative indirect effect despite null total). PD-1 is marginal (p=0.064). AMPK shows zero mediation (p=0.918). Green = p<0.05, Yellow = p<0.10.
Prognostic Signature Divergence
82
EA signature genes
85
AA signature genes
2
Shared genes
1.2%
Overlap
GNG4, RPS6KA6
Shared gene names
Of 165 total unique prognostic genes, only 2 (1.2%) are shared between EA and AA signatures. This is the most extreme prognostic divergence observed at any molecular level.
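The overlap figure is a Jaccard index: shared genes divided by the union of both signatures. The sketch below assumes signature sizes of 82 (EA) and 85 (AA) with 2 shared genes, which gives the 165 unique genes stated above. Apart from GNG4 and RPS6KA6, the gene names are placeholders.

```python
def jaccard(set_a, set_b):
    """Jaccard index: size of the intersection over size of the union."""
    union = set_a | set_b
    if not union:
        raise ValueError("both sets are empty")
    return len(set_a & set_b) / len(union)

shared_genes = {"GNG4", "RPS6KA6"}
# Placeholder names pad each signature to its reported size.
ea_signature = shared_genes | {f"EA_PLACEHOLDER_{i}" for i in range(80)}
aa_signature = shared_genes | {f"AA_PLACEHOLDER_{i}" for i in range(83)}

overlap = jaccard(ea_signature, aa_signature)
overlap_percent = round(100 * overlap, 1)
```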
EA Prognostic Signature (82 genes)
C-index = 0.652 (n=726, 101 events). Top 30 of 82 genes shown.
| Gene | Coefficient | Hazard Ratio |
|---|---|---|
| PIGR | -0.1451 | 0.865 |
| L1CAM | +0.1446 | 1.156 |
| LOC100130148 | -0.1400 | 0.869 |
| CLEC3A | +0.1286 | 1.137 |
| QPRT | +0.1263 | 1.135 |
| IGFBPL1 | -0.1220 | 0.885 |
| KLRB1 | -0.1177 | 0.889 |
| TNFRSF18 | -0.1171 | 0.889 |
| PCDHGB5 | +0.1064 | 1.112 |
| PCDHGA2 | +0.1036 | 1.109 |
| CD163L1 | +0.0986 | 1.104 |
| C6orf141 | -0.0958 | 0.909 |
| LOC647859 | +0.0943 | 1.099 |
| SULT4A1 | -0.0899 | 0.914 |
| CYP4B1 | +0.0841 | 1.088 |
| DAPL1 | -0.0768 | 0.926 |
| CACNA1H | +0.0735 | 1.076 |
| NFE2 | -0.0719 | 0.931 |
| CNNM1 | -0.0709 | 0.932 |
| C11orf70 | +0.0697 | 1.072 |
Rows 1–20 of 30 shown.
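The Hazard Ratio column is the exponential of the Cox coefficient, so each row can be checked directly. A short sketch using the first three rows of the EA table:

```python
from math import exp

# Coefficients copied from the first three rows of the EA signature table.
coefficients = {"PIGR": -0.1451, "L1CAM": 0.1446, "LOC100130148": -0.1400}

# HR = exp(coefficient): below 1 is protective, above 1 is adverse.
hazard_ratios = {gene: round(exp(coef), 3) for gene, coef in coefficients.items()}
```

Because these are per-unit effects on the model's expression scale, coefficients this small correspond to modest per-gene hazard changes; the prognostic signal comes from the combined signature.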
AA Prognostic Signature (85 genes)
C-index = 0.408 (n=106, 13 events). Top 30 of 85 genes shown.
| Gene | Coefficient | Hazard Ratio |
|---|---|---|
| EN2 | +0.0725 | 1.075 |
| PEG3 | +0.0609 | 1.063 |
| TMSB15A | +0.0597 | 1.061 |
| ZNF385B | -0.0589 | 0.943 |
| KLHDC7B | -0.0527 | 0.949 |
| CCNA1 | -0.0512 | 0.950 |
| ALOX15 | +0.0462 | 1.047 |
| ZNF215 | -0.0400 | 0.961 |
| NOTUM | +0.0397 | 1.041 |
| CRISPLD1 | +0.0375 | 1.038 |
| PNMA2 | -0.0364 | 0.964 |
| CHRM1 | +0.0356 | 1.036 |
| PCSK6 | -0.0349 | 0.966 |
| PXDNL | +0.0347 | 1.035 |
| SEMA3E | -0.0335 | 0.967 |
| ANKRD43 | -0.0335 | 0.967 |
| MCTP2 | -0.0329 | 0.968 |
| DNAH11 | +0.0321 | 1.033 |
| TRPA1 | -0.0319 | 0.969 |
| ARNTL2 | -0.0311 | 0.969 |
Rows 1–20 of 30 shown.
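To read the two C-index values, recall that the C-index is the fraction of comparable patient pairs that a model ranks correctly, so 0.5 corresponds to random ranking. The sketch below is a minimal version of Harrell's C that ignores tied event times; it is not the implementation used for the reported values, and the example data are invented.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index; pairs with tied times are skipped."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher predicted risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical follow-up times (years), event flags, and model risk scores.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
c_perfect = harrell_c(times, events, [4.0, 3.0, 2.0, 1.0])
c_reversed = harrell_c(times, events, [1.0, 2.0, 3.0, 4.0])
c_uninformative = harrell_c(times, events, [1.0, 1.0, 1.0, 1.0])
```

With only 13 events, as in the AA model, the number of comparable pairs is small and the estimate is correspondingly unstable.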
Null Total Effect
The total ancestry effect on survival is null (HR=1.046, p=0.837) after adjusting for subtype, age, and stage. Breast cancer survival disparities between ancestries are largely explained by subtype composition, age, and stage rather than ancestry-specific molecular features.
98.8% Prognostic Divergence
EA (82 genes) and AA (85 genes) prognostic signatures share only 2 genes (GNG4, RPS6KA6) — 1.2% overlap. This is the most extreme prognostic divergence observed at any molecular level, arguing strongly for ancestry-specific molecular tests.
TF Activity: Most Informative Layer
Transcription factor activity features contribute 37–48% of total SHAP importance across all survival models — exceeding pathways, immune cells, checkpoints, and genomic features. This was undetectable in the original paper which lacked TF features.
Evidence Synthesis & Clinical Implications
82.3% (93/113)
Features Sig ≥1 Track
32
Features Sig ≥3 Tracks
10
Clinical Hypotheses
9
FDA-Approved Drugs
Cross-Track Evidence Matrix
54 features across 12 analysis tracks. Cell color indicates significance level. Features colored by type: Pathway, Transcription Factor, Immune Cell, Checkpoint, CNA Arm Event, Genomic Summary.
82.3% of features (93/113 examined) are significant in at least one analysis track. 10 features achieve 4-track convergence (including DNA repair, oxidative phosphorylation, mTORC1, Myc targets, and p53), followed by 22 features with 3-track support including key TFs (IRF7, NFKB1, AR) and TReg.
Relation to Original Paper
7
Extend (47%)
7
Orthogonal (47%)
1
Contradict (7%)
Novel Findings (15)
| # | Finding | Tracks | Relation to Paper | Confidence |
|---|---|---|---|---|
| 1 | IFNα response top enriched in AA BasalMyo (NES=+1.97) | 3 | Extends | High |
| 2 | Wnt/β-catenin is hub of ancestry-specific rewiring (12 pattern switches) | 3 | Orthogonal | High |
| 3 | Th2 most robust immune cell type (6/7 methods), not TReg | 3 | Extends | High |
| 4 | TReg challenged: FOXP3 NS (padj=0.51), 1/7 methods only | 3 | Contradicts | Medium |
| 5 | OX40 largest checkpoint effect (d=+0.82, AA>EA) | 3 | Orthogonal | High |
| 6 | Distinct immunosuppression: IDO1/TGFB1 (AA) vs CD39/CD73 (EA) | 3 | Orthogonal | Medium |
| 7 | TF activity most prognostically informative (37–48% SHAP) | 3 | Orthogonal | Medium |
| 8 | Prognostic signatures 1.2% overlap (2/167 genes) | 3 | Extends | Medium |
| 9 | 16p loss top CNA event (OR=3.87, PALB2/CIITA locus) | 2 | Orthogonal | Medium |
| 10 | Ancestry genomic differences CNA-driven, not mutation-driven | 2 | Orthogonal | High |
| 11 | Total ancestry-survival effect is null (HR=1.046, p=0.837) | 3 | Extends | High |
| 12 | AMPK shows zero mediation (p=0.918): modification ≠ mediation | 4 | Extends | High |
| 13 | 59.7% of genes show admixture dose-response | 3 | Extends | High |
| 14 | PI3K-Akt-mTOR dominates AA prognosis (SHAP=0.510) | 3 | Extends | Medium |
| 15 | PAX7 novel top TF (d=+0.97, AA>EA, no prior BC link) | 2 | Orthogonal | Medium |
Confidence: High = multi-track convergence with large effects; Medium = significant but fewer supporting tracks or smaller effects.
Drug-Target Mappings (20)
9 FDA-approved drugs and 11 others are mapped to ancestry-differential targets. Of the 11, 10 are investigational and 1 (tucidinostat) is listed as approved but not FDA-approved.
| Drug | Target | Class | Status | Direction |
|---|---|---|---|---|
| Pembrolizumab | PDCD1 (PD-1) | Anti-PD-1 mAb | FDA-Approved | AA>EA |
| Nivolumab | PDCD1 (PD-1) | Anti-PD-1 mAb | FDA-Approved | AA>EA |
| Ipilimumab | CTLA4 (CTLA-4) | Anti-CTLA-4 mAb | FDA-Approved | AA>EA |
| Relatlimab + Nivolumab (Opdualag) | LAG3 (LAG-3) | Anti-LAG-3 + Anti-PD-1 combo | FDA-Approved | AA>EA |
| Ivuxolimab (PF-04518600) | TNFRSF4 (OX40) | OX40 agonist mAb | Phase I/II | AA>EA |
| BMS-986156 | TNFRSF18 (GITR) | GITR agonist mAb | Phase I/II | AA>EA |
| Epacadostat | IDO1 | IDO1 inhibitor | Phase III | AA>EA |
| Galunisertib | TGFB1 (TGF-β1) | TGFβR1 kinase inhibitor | Phase II | AA>EA |
| Oleclumab | NT5E (CD73) | Anti-CD73 mAb | Phase II | EA>AA |
| TTX-030 | ENTPD1 (CD39) | Anti-CD39 mAb | Phase I | EA>AA |
| Ciforadenant (CPI-444) | ADORA2A (A2AR) | A2AR antagonist | Phase I/II | AA>EA |
| Alpelisib | PIK3CA (PI3K pathway) | PI3Kα inhibitor | FDA-Approved | EA>AA (mutation), AA prognostic |
| Capivasertib | AKT1/2/3 (AKT pathway) | AKT inhibitor | FDA-Approved | AA prognostic (SHAP=0.510) |
| Everolimus | MTOR (mTOR pathway) | mTOR inhibitor | FDA-Approved | EA>AA (enrichment), AA prognostic |
| LGK974/WNT974 | WNT pathway (multiple) | PORCN inhibitor | Phase I/II | differentially wired (AA: proliferative coupling) |
| Olaparib | PARP1/2 (DNA repair) | PARP inhibitor | FDA-Approved | AA>EA (DNA repair activation, 16p/PALB2 loss) |
| Talazoparib | PARP1/2 (DNA repair) | PARP inhibitor | FDA-Approved | AA>EA |
| Entinostat | HDAC5/HDAC7 | Class I HDAC inhibitor | Phase III | AA>EA |
| Tucidinostat (Chidamide) | HDAC5/HDAC7 | HDAC inhibitor | Approved | AA>EA |
| JQ1 / OTX015 (BET inhibitors) | MYC | BET bromodomain inhibitor | Phase I/II | AA>EA |
Immunotherapy Optimization
AA BasalMyo tumors display an IFNα-hot/checkpoint-high phenotype (PD-1, CTLA-4, LAG-3, OX40 all AA>EA), which may predict enhanced response to immune checkpoint inhibitors (ICIs). Distinct immunosuppressive mechanisms suggest IDO inhibitors for AA tumors and adenosine pathway inhibitors (CD39/CD73) for EA tumors.
Targeted Therapy
PI3K-Akt-mTOR dominates AA prognosis (SHAP=0.510) despite lower PIK3CA mutation frequency, arguing for biomarker selection based on pathway activity rather than mutation status. 16p loss (PALB2) together with DNA repair activation suggests expanded PARP inhibitor eligibility for AA patients.
Precision Medicine Infrastructure
The 98.8% non-overlapping prognostic signatures and the status of TF activity as the most informative feature type (37–48% SHAP) argue for ancestry-specific molecular tests. The admixture dose-response seen in 59.7% of genes argues for using continuous admixture proportion as a clinical trial covariate.
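Treating admixture as a continuous covariate means regressing a molecular readout on the admixture proportion rather than comparing two ancestry labels. The sketch below fits an ordinary least-squares line for a single gene. It is an illustration only: the function name and the toy values are hypothetical, and the report does not state which model it used to call a dose-response.

```python
def ols_slope_intercept(x: list, y: list) -> tuple:
    """Least-squares slope and intercept of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data: admixture proportion (0 to 1) and expression of one gene.
admixture = [0.0, 0.25, 0.5, 0.75, 1.0]
expression = [1.0, 1.5, 2.0, 2.5, 3.0]
slope, intercept = ols_slope_intercept(admixture, expression)
```

A nonzero slope indicates that expression changes gradually with admixture proportion, which is the pattern a binary ancestry label cannot capture.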
Limitations & Conclusions
This re-analysis extends the Roelands et al. 2021 findings across multiple molecular layers, but several limitations constrain interpretation. Each is stated below with its implications for the reported results.
RA-QA Power
The RA-QA cohort (n=24, 7 OS events) is severely underpowered for genome-wide discovery. The minimum detectable HR at 80% power is 9.44. Only 3 DEGs reach FDR<0.1, and no survival predictors reach significance. Larger Middle Eastern cohorts are urgently needed.
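To show why 7 events leave so little power, the sketch below applies Schoenfeld's approximation for a two-group log-rank comparison. The report does not say how its 9.44 figure was derived, so everything here is an assumption: the formula choice, the two-sided alpha of 0.05, and the group allocation fraction. The result therefore need not reproduce 9.44 exactly.

```python
from math import exp, sqrt
from statistics import NormalDist


def min_detectable_hr(events: int, power: float = 0.80, alpha: float = 0.05,
                      frac_group1: float = 0.5) -> float:
    """Smallest HR (>1) detectable under Schoenfeld's approximation.

    Solves events = (z_alpha + z_beta)^2 / (p * (1 - p) * ln(HR)^2) for HR,
    where p is the fraction of patients in the first group.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = std_normal.inv_cdf(power)
    log_hr = (z_alpha + z_beta) / sqrt(events * frac_group1 * (1 - frac_group1))
    return exp(log_hr)


balanced = min_detectable_hr(events=7)                       # assumed 50/50 split
unbalanced = min_detectable_hr(events=7, frac_group1=1 / 3)  # assumed 1:2 split
```

Under these assumptions a balanced split already requires an HR above 8, and any imbalance between groups pushes the threshold higher, which is in line with the order of magnitude reported.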
Cross-Cohort Batch Effects
Strong systematic batch effects exist between RA-QA and TCGA (PERMANOVA R²=0.180, 55/58 features significant at FDR<0.05, median |Cohen’s d|=2.07). Direct quantitative cross-cohort comparison of enrichment scores is therefore invalid. All within-TCGA analyses are unaffected.
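For readers unfamiliar with the effect-size metric quoted above, Cohen's d is the difference in group means divided by a pooled standard deviation. The sketch below uses the common pooled-variance definition. The function name and toy scores are hypothetical, and the report does not specify which variant of d it computed.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(x: list, y: list) -> float:
    """Cohen's d with pooled sample variance (positive when mean(x) > mean(y))."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


# Toy enrichment scores for one feature in two cohorts (illustrative only).
cohort_a = [2.0, 4.0, 6.0]
cohort_b = [1.0, 3.0, 5.0]
d = cohens_d(cohort_a, cohort_b)
```

A median |d| above 2 means the typical feature differs between cohorts by more than two pooled standard deviations, far larger than the ancestry effects reported within TCGA.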
Null Total Survival Effect
The total ancestry effect on survival is null (HR=1.046, p=0.837) after adjusting for subtype, age, and stage. While individual molecular features differ extensively by ancestry, these differences do not aggregate into a net survival disparity. Clinical disparities may instead arise from non-molecular factors (access to care, screening, socioeconomic context), which this analysis did not measure.
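The report gives the HR and p-value but no confidence interval. An approximate interval can be back-calculated if one assumes the p-value comes from a two-sided Wald test on the log hazard ratio. That is an assumption, not something the report states, and the rounding of the inputs adds further imprecision. A minimal sketch, with a hypothetical function name:

```python
from math import exp, log
from statistics import NormalDist


def wald_ci_from_hr_and_p(hr: float, p: float, level: float = 0.95) -> tuple:
    """Approximate CI for an HR, recovered from a two-sided Wald p-value."""
    std_normal = NormalDist()
    z_observed = std_normal.inv_cdf(1 - p / 2)   # |ln(HR)| / SE under the Wald test
    se = abs(log(hr)) / z_observed               # implied standard error of ln(HR)
    z_crit = std_normal.inv_cdf(0.5 + level / 2)
    return exp(log(hr) - z_crit * se), exp(log(hr) + z_crit * se)


# HR and p-value as reported in the text.
lower, upper = wald_ci_from_hr_and_p(hr=1.046, p=0.837)
```

Under these assumptions the interval runs from roughly 0.7 to 1.6, so "null" here means no detectable effect rather than proof that a moderate effect in either direction is absent.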
Retrospective Observational Design
All findings are observational and correlational. Clinical hypotheses require prospective validation. The 10 therapeutic hypotheses are intended to guide further research and are not treatment recommendations. Not addressed in this analysis: protein-level validation, intra-Arab genetic heterogeneity, socioeconomic confounders, and single-cell validation.
Relation to Original Paper
Of the 15 key findings, 7 extend the original paper’s conclusions (47%), 7 are orthogonal (new dimensions not explored in the original), and 1 contradicts it (TReg robustness, challenged by non-significant FOXP3 and support from only 1 of 7 methods). 87.5% of the original pathways are recovered, with 100% direction concordance among significant hits.
Conclusions
Genetic ancestry shapes breast cancer biology across every molecular layer tested: transcriptomic (11,424 DEGs), immune (Th2 most robust across 6/7 methods; TReg challenged by FOXP3 non-significance), pathway (78 enriched, IFNα top novel finding), genomic (CNA-driven, not mutation-driven), and prognostic (98.8% non-overlapping survival signatures). These effects follow a continuous dose-response with admixture proportion rather than a binary switch.
Critically, the total ancestry effect on overall survival is null (HR=1.046, p=0.837) after adjusting for subtype, age, and stage. The extensive molecular differences do not aggregate into a net survival disparity — suggesting that clinical outcome disparities arise from non-molecular factors such as access to care, screening practices, and socioeconomic context.
Of the 15 key findings, 7 extend the original paper (47%), 7 are orthogonal (47%), and 1 contradicts (7% — TReg robustness). All 10 therapeutic hypotheses require prospective validation before clinical translation.
This report was generated with the assistance of AI. While every effort has been made to ensure accuracy, AI can make mistakes — please verify key findings against primary data before drawing conclusions.