Anatomically modern humans (AMHs) exhibit a suite of craniofacial and prosocial characteristics that are reminiscent of traits distinguishing domesticated species from their wild counterparts (1–3). This has led to the formulation of a self-domestication hypothesis according to which modern humans (3) went through a domestication process in the course of their evolution. Recent evidence, along with the well-warranted distinction between domestication and selective breeding (4), is also extending this notion to other species that might have undergone a self-domestication phase, such as cats, dogs, and bonobos (3). Thus, as self-domestication represents a special case of domestication, the most parsimonious hypothesis must posit the same core mechanisms to underlie both. For this reason, the self-domestication hypothesis also entails the prediction that key aspects of modern humans’ anatomy and cognition can be illuminated by studies of the so-called “domestication syndrome,” the core set of domestication-related traits that was recently proposed to result from mild neural crest (NC) deficits (5). However, both the neurocristopathic basis of domestication and its extension to the evolution of AMHs remain to be tested experimentally.
Williams-Beuren syndrome [WBS; OMIM (Online Mendelian Inheritance in Man) 194050] and Williams-Beuren region duplication syndrome (7dupASD; OMIM 609757), caused respectively by the hemideletion or hemiduplication of 28 genes at the 7q11.23 region [WBS critical region (WBSCR)], represent a paradigmatic pair of neurodevelopmental conditions whose NC-related craniofacial dysmorphisms and cognitive/behavioral traits (6, 7) bear directly on domestication-related traits relevant for AMHs (facial reduction and retraction, pronounced friendliness, and reduced reactive aggression) (fig. S1A). Structural variants in WBS genes, for example in the case of GTF2I and its paralogs, have been shown to underlie stereotypical hypersociability in domestic dogs and foxes (8, 9).
Among the WBSCR genes, we focus here on the chromatin regulator BAZ1B (also known as Williams syndrome transcription factor, WSTF), on the basis of the following lines of evidence that implicate it in domestication-relevant craniofacial features: (i) its established role in NC maintenance and migration in Xenopus laevis and the craniofacial defects observed in knockout mice (10, 11); (ii) the observation that its expression is affected by domestication-related events in canids (12); (iii) the first formulation of the neurocristopathic hypothesis of domestication, which included BAZ1B among the genes influencing NC development (5); (iv) the most comprehensive studies focusing on regions of the modern human genome associated with selective sweep signals compared to Neanderthals/Denisovans (hereafter “archaics”) (13, 14), one of which specifically included BAZ1B within the detected portions of the WBSCR; and (v) the thus far most detailed study systematically exploring high-frequency (HF) (>90%) changes in modern humans for which archaic humans carry the ancestral state, which found BAZ1B enriched for mutations in modern humans (most of which fall in the regulatory regions of the gene) (15).
Our previous work had established the largest cohort of 7q11.23 patient-derived induced pluripotent stem cell (iPSC) lines and revealed major disease-relevant transcriptional dysregulation that was already apparent at the pluripotent state and became further exacerbated upon differentiation (16). Here, we first harness this resource to dissect the impact of BAZ1B dosage on the NC of patients with WBS and 7dupASD, both in terms of function (i.e., NC migration and induction) and of transcriptional and chromatin dysregulation, thereby defining the BAZ1B dosage–dependent circuits controlling the NC. Next, we apply these experimentally determined BAZ1B-dependent circuits underlying craniofacial morphogenesis to interrogate the evidence from paleogenomic analyses, which were thus far only of a correlative nature. We find major convergence between the BAZ1B control and the genes harboring regulatory changes in the modern human lineage. Together, the definition of the role of BAZ1B dosage in craniofacial neurocristopathy and its application to domestication-relevant paleogenomics demonstrate a major contribution of BAZ1B to the modern human face and offer experimental validation for the prediction at the heart of NC-based accounts of (self-) domestication: that the modern human face acquired its shape as an instance of mild neurocristopathy.
Establishment and validation of an extensive cohort of patient-specific BAZ1B-interfered NC stem cell lines
To dissect the role of BAZ1B in the craniofacial dysmorphisms that characterize WBS and 7dupASD, we started from our previous characterization of WBS patient– and 7dupASD patient–specific iPSC lines and differentiated derivatives (16) and selected a cohort of 11 NC stem cell (NCSC) lines (four from patients with WBS, three from patients with 7dupASD, and four from control individuals), which also represent the largest cohort of patient-specific NCSCs described so far. Given the centrality of the cranial NC for the development of the face, we first validated the cranial identity of our NCSC cohort by transcriptomic profiling through a manually curated gene expression signature (fig. S2A), confirming their suitability for the study of craniofacial dysregulations. We then knocked down BAZ1B via RNA interference in all lines across the three genetic conditions, including also NCSCs derived from a particularly informative patient with atypical WBS (hereafter atWBS) bearing a partial deletion of the region that spares BAZ1B and six additional genes (Fig. 1A) (17). To establish a high-resolution gradient of BAZ1B dosages, we selected two distinct short hairpin RNA (shRNA) against BAZ1B (i.e., sh1 and sh2) along with a scrambled shRNA sequence (hereafter scr) as negative control, for a total of 32 NCSC lines. Knockdown (KD) efficiency was evaluated at the RNA level by quantitative polymerase chain reaction (qPCR) (Fig. 1B and fig. S1C), confirming the attainment of the desired gradient with an overall reduction of about 40% for sh1 and 70% for sh2, as well as reduction at the protein level, as detected by Western blot (fig. S1E).
BAZ1B dosage imbalance impairs NCSC migration and induction
NCSCs need to migrate to reach specific target regions in the developing embryo and give rise to distinct cell types and tissues, including craniofacial structures that are major areas of change in human evolution. Since BAZ1B KD was shown to affect the migration of the NC in X. laevis and to promote cancer cell invasion in different lung cancer cell lines (10, 18), we hypothesized that the BAZ1B dosage imbalances entailed in the 7q11.23 syndromes could result in a defective regulation of NCSC migration and might underlie the NC-related alterations typical of patients with WBS and 7dupASD. To test this, we compared the migration properties of patient-specific BAZ1B KD NCSC lines (sh2) to their respective control NCSC line (scr) by the well-established wound-healing assay. The 7dupASD NCSC KD lines took longer to fill the wound when compared to the respective control lines (scr), as indicated by images taken at 8 and 16 hours after a gap was created on the plate surface (Fig. 1C and fig. S1F). We instead observed an opposite behavior for the WBS BAZ1B KD lines, which were faster than the respective scr lines in closing the gap (Fig. 1C and fig. S1F). In contrast to the previous observations from X. laevis (10), we also observed a minor delay in NC induction as a consequence of BAZ1B KD (Fig. 1D and fig. S1D), by means of a differentiation protocol based on NC delamination from adherent embryoid bodies (EBs), which recapitulates the initial steps of NC generation (19). In particular, starting from 2 to 3 days after attachment of EBs, we observed a lower number of outgrowing cells in the KD line (Fig. 1D, days 7 and 10), coupled with an evidently higher cell mortality. Cells were eventually able to acquire the typical NC morphology, although lower differentiation efficiency was evident, as shown by images taken at day 12. In addition, the delay in NC formation was associated with a down-regulation of well-established critical regulators of NC migration and maintenance, including NR2F1, NR2F2, TFAP2A, and SOX9 (Fig. 1E). These results show that BAZ1B regulates the developing NC starting from its earliest migratory stages and that the symmetrically opposite 7q11.23 dosages alterations prime NCSCs to symmetrically opposite deficits upon BAZ1B interference. In turn, the central role of the NC in the development of facial morphology allows relating such findings to the symmetrically opposite craniofacial dysmorphisms of the two 7q11.23 syndromes.
BAZ1B interference disrupts key NC-specific transcriptional circuits
Having defined the functional impact of BAZ1B dosage on NC function, we predicted that a main molecular readout of its dosage imbalances would be at the level of transcriptional regulation, given its critical role as transcriptional regulator in different cell and animal models (20–22). To test this hypothesis and gain mechanistic insights into the specific BAZ1B dosage–dependent downstream circuits, we subjected 32 interfered NCSC lines to high-coverage RNA sequencing (RNA-seq) analysis. As shown in fig. S2A, a manually curated signature from an extensive literature review (23–28) validated the cranial identity of our NCSC lines, while clustering by Pearson correlation excluded the presence of any genotype- or hairpin-specific expression change. Confirming our previous observations in the two largest cohorts of iPSC lines (29), a principal component analysis (PCA) corroborated the significant impact of individual genetic backgrounds on transcriptional variability, with most “KD lines” clustering with their respective control “scr line.” This was consistent with the narrow range of experimentally interfered BAZ1B dosages and pointed to a selective BAZ1B dosage–dependent transcriptional vulnerability (fig. S2B).
To dissect it, we thus resorted to a combination of classical pairwise comparative analysis, contrasting shBAZ1B-interfered NCSC lines (sh1 + sh2) with their respective controls (scr), with a complementary regression analysis using BAZ1B expression levels as independent variables, subtracting the contribution of individual genetic backgrounds. This design increases robustness and sensitivity in the identification of genes that, across multiple genetic backgrounds and target gene dosages, might have a different baseline (scr) across individuals while still being robustly dysregulated upon BAZ1B interference.
The two analyses identified a total of 448 genes with false discovery rate (FDR) < 0.1 (1192 with P < 0.01 and FDR < 0.25) whose transcriptional levels followed BAZ1B dosage, in either a direct (202; 539 with P < 0.01 and FDR < 0.25) or an inverse (246; 653 with P < 0.01 and FDR < 0.25) fashion. In addition, genes identified in the regression analysis included around 90% of the differentially expressed genes (DEGs) (27 of 29, FDR < 0.1) found in the comparative analysis (Fig. 2A). Consistent with the differential efficiency of the two short hairpins, we found a globally stronger transcriptional impact for the group of samples targeted by sh2 (fig. S2C) and a milder but nevertheless clearly distinguishable effect of sh1, resulting in particularly informative gradient of dosages over the scr control lines.
Particularly noteworthy among the genes that we found correlated with BAZ1B levels were (i) crucial regulators of cranial NC, further highlighting a convergent BAZ1B dosage–dependent dysregulation of the foundational CUL3-centered regulatory axis orchestrating NC-mediated craniofacial morphogenesis (30), and (ii) genes associated with variation of human facial shape or causative of dysmorphic facial features and mild intellectual disability when mutated (Fig. 2B and table S1).
Gene Ontology (GO) analysis performed on genes directly following BAZ1B levels suggested specific enrichments in biological processes such as histone phosphorylation, chromosome localization, RNA processing, and splicing. Genes inversely following BAZ1B levels were instead enriched in categories particularly relevant for NC and NC-derivative functions, such as cell migration and cardiovascular and skeletal development (Fig. 2C). By querying the OMIM database, we found that several DEGs were associated with genetic disorders whose phenotypes include “mental retardation,” “intellectual disability,” and/or “facial dysmorphisms” (Fig. 2D), underscoring the pertinence of BAZ1B-dependent dysregulation across both the neurocristopathic and cognitive axes.
Last, a master regulator analysis identified candidate regulators of BAZ1B DEGs, including factors involved in enhancer marking [CEBPB, p300, RBBP5, HDAC2 (histone deacetylase 2), KDM1A, and TCF12], promoter activation [TBP (TATA box–binding protein), TAF1 (TBP-associated factor 1), and POL2 (polymerase 2)], and chromatin remodeling (CTCF, RAD21, and YY1) (Fig. 2E and fig. S2D), several of which are themselves causative genes of intellectual disability syndromes with neurocristopathic involvement, as in the case of our recently identified Gabriele–de Vries syndrome caused by YY1 haploinsufficiency (31, 32). Chromatin remodeling was indeed the most prominently enriched group within the overall domain of transcriptional regulation. Two master regulators are particularly noteworthy, as they are themselves regulated by BAZ1B dosage. The first is EGR1 (FDR < 0.1), which is itself among the genes inversely correlated with BAZ1B levels, which is implicated in cranial development (in animal models) (33, 34) and whose promoter has been recently shown to feature a bivalent state in human embryonic cranial NC (23, 35). The second is MXI1, identified as master regulator of genes directly following BAZ1B levels (FDR < 0.001), which is itself found among the genes inversely correlated with BAZ1B and is itself a regulator of BAZ1B, pointing to a cross-talk between the two (fig. S2C). Notably, two differentially expressed targets of MXI1, TGFB2 and NFIB, are also involved in intellectual disability and craniofacial defects (30, 36, 37).
BAZ1B regulates the NC epigenome in a dosage-dependent manner
The transcriptional readout and functional impact of BAZ1B dosage (at the level of NC induction and migration) established its role as a master controller of the NC. We thus predicted, on the basis of its molecular function, that BAZ1B would directly bind to key NC target genes and that for some of these, the binding would be dosage sensitive. These genes would be, in turn, the most likely direct targets to mediate the dosage-dependent transcriptional and functional phenotypes described above. To test this prediction, we set out to both identify BAZ1B direct targets and characterize their promoter and enhancer states, so as to mechanistically link their transcriptional dysregulation with BAZ1B dosage–dependent chromatin binding. Given the absence of chromatin immunoprecipitation (ChIP)–grade BAZ1B antibodies, to carry out our ChIP coupled with sequencing (ChIP-seq) on scr and KD lines, we first designed a tagging strategy to establish, by CRISPR-Cas9 editing, a series of in-frame 3xFLAG endogenously tagged BAZ1B alleles in representative iPSCs of the four genotypes (Fig. 3A and fig. S3, A and B). These were then differentiated to NCSCs (fig. S3C) and subjected to ChIP-seq with anti-FLAG antibody, enabling a faithful characterization of BAZ1B genome-wide occupancy across dosages (one tagged allele in WBS, two tagged alleles in atWBS and CTL, and two tagged alleles in the context of 1.5-fold dosages in 7dupASD).
PCA shows a clear separation of the samples by BAZ1B copy number, with CTL and atWBS samples clustering more closely and WBS and 7dupASD samples clustering at opposing positions (Fig. 3B). To call NC-specific enhancer regions and promoter-enhancer associations, we exploited for chromatin annotation the unprecedented resolution afforded by the patients’ cohort with its underlying variability and proceeded to (i) select chromosomal regions featuring H3K4me1 and H3K27ac in at least two individuals; (ii) exclude regions marked with H3K4me3 in at least two individuals;(iii) eliminate regions bearing a transcription start site (TSS); and (iv) associate each putative enhancer to the closest TSS, identifying a total of 30,8470 putative enhancer regions. Notably, BAZ1B binds 75% of its targets at their enhancer regions (6747 genes), with the remaining 2297 targets bound at promoters (Fig. 3C). In addition, 40% of genes expressed in NC are bound by BAZ1B, either exclusively at enhancers (27.4%) or exclusively at promoters (3.5%) or at both regions (9%). This highlights its pervasiveness within the NC epigenome (Fig. 3C) and is also reflected in the key functional enrichments observed for the BAZ1B direct targets that are also expressed and that include “axon guidance,” “tube development,” “dendrite development,” “outflow tract morphogenesis,” “odontogenesis,” “wound healing,” and “endochondral bone morphogenesis” (Fig. 3D). Many of the phenomena captured by these GO categories (e.g., odontogenesis and endochondral bone morphogenesis) are linked to recent changes in the bone structure of modern (versus archaic) humans, with Homo sapiens having characteristically smaller teeth than its extinct relatives.
Last and consistent with the enrichments in NC-defining categories uncovered above, the analysis of BAZ1B bound regions revealed major convergence with the binding motifs of critical NC regulators, including two motifs similar to those of TFAP2A and NEUROG2, and one equally associated to TAL1, TCF12, AP4, and ASCL1 (Fig. 3E and text S1A). Thus, BAZ1B binding regions are enriched for target sites of major regulators of NC and its neural derivatives (38, 39), among which TFAP2A stands out given its core role in neural border formation and NC induction and differentiation (40) through the binding and stabilization of NC-specific enhancers, in concert with NR2F1, NR2F2, and EP300 (41).
Last, we identified 81 regions that are quantitatively bound by BAZ1B depending on its copy number (FDR < 0.1) (Fig. 3F), 153 regions differentially bound concordantly in WBS and 7dupASD compared to control and atWBS samples (FDR < 0.1) (fig. S4A), and 176 and 25 regions differentially bound preferentially in WBS (fig. S4B) and 7dupASD (FDR < 0.1) (fig. S4C), respectively.
Given the prominence of its binding to distal regulatory regions, we then set out to define the BAZ1B dosage–dependent impact on NCSC-specific enhancers by integrating H3K27ac, H3K4me1, H3K27me3, and H3K4me3 profiles. We thus performed a regression analysis on BAZ1B levels for the distribution of the three histone marks in the aforementioned regions and found H3K27ac to be the most affected, with 7254 genes differentially acetylated at their enhancers, followed by a differential distribution of the H3K4me1 (4048) and H3K27me3 (2136) marks (fig. S4D). This enabled the overlay of epigenomic and transcriptomic profiles, uncovering that among the 1192 DEGs identified in the regression RNA-seq analysis, 21.3% (257 of 1192) are associated to enhancers that are both bound by BAZ1B and differentially H3K27-acetylated in a manner concordant with BAZ1B levels (fig. S4E), with a stronger overlap for genes whose expression is inversely correlated with BAZ1B levels (160 versus 97). The same held for DEGs that have a concordant differential distribution of H3K4me1 mark at enhancers (123 versus 55), underscoring the consistency of the impact of BAZ1B dosage on distal regulation (fig. S4F). In contrast, a lower number of genes (36) showed a concordant differential distribution of the H3K27me3 mark and, at the same time, were bound by BAZ1B at enhancers (fig. S4G), indicating that BAZ1B preferentially affects active chromatin. From this integrative analysis, we could thus lastly identify a core set of 30 bona fide direct targets of BAZ1B, which are genes whose expression tightly follows BAZ1B levels and whose enhancers are bound by BAZ1B and clearly differentially modified (Fig. 3G, fig. S4H, and text S1B). Together, this first dosage-faithful analysis of BAZ1B occupancy in a diverse cohort of human NCSCs establishes its pervasive and mostly distal targeting of the NC-specific epigenome, with a preferential activator role on the critical transcriptional circuits that define NC fate and function.
Intersection with paleogenomic datasets uncovers a key evolutionary role for BAZ1B
Mild NC deficits have been put forth as a unifying explanatory framework for the defining features of the so-called domestication syndrome, with BAZ1B listed among the putative underlying genes because of its previously reported role in the NC of model organisms (5, 10, 11). The recent observation that its expression is affected by domestication-related mobile element insertion methylation in gray wolves (12) further supported its role in domestication, offering an intriguing parallel to the paleogenomic results that had detected BAZ1B within the regions of the modern genome reflective of selective sweeps and found it enriched for putatively regulatory mutations in AMHs (15).
Having defined the molecular circuits through which BAZ1B regulates NC, and since NC changes have been implicated in the domestication syndrome (5), since craniofacial differences correlate with self-domestication (1), and since 7q11.23 dosage-related craniofacial differences in humans relate to the H. sapiens versus Neanderthal comparison (fig. S1A), we set out to test the role of BAZ1B dosage in the differences between modern and archaic humans. For this, we carried out a systematic integrative analysis of the overlaps between our empirically defined BAZ1B dosage–sensitive genes (blue Venn in Fig. 4B) and a combination of uniquely informative datasets highlighting differences between modern humans and archaics (Neanderthals/Denisovans) (represented in Fig. 4A by skulls illustrating the more “gracile” and “juvenile” profile in AMH relative to Neanderthals visible in the overall shape of the neurocranium, reduced prognathism, brow ridges, and nasal projections) (1, 13–15). Specifically, as shown in Fig. 4B, these datasets include (i) genes associated with signals of positive selection in the modern branch compared to archaic lineages (purple Venn) (13, 14); (ii) genes harboring (nearly) fixed mutations in moderns versus archaics (pink Venn); and (iii) genes associated with signals of positive selection in the four paradigmatic domesticated species dog, cat, cattle, and horse (1) (orange Venn), to reveal statistically significant overlaps between them and genes associated with signals of positive selection in the modern branch compared to archaic lineages. In turn, the list of genes harboring (nearly) fixed mutations in moderns versus archaics contains three classes: (i) genes harboring high-frequency changes (15), (ii) genes harboring high-frequency missense mutations (red barplot), and (iii) genes enriched for high-frequency mutations in regulatory regions (green barplot) [data based on (15)] (Fig. 4C). As shown in the barplots, the obviously very limited number of high-quality coverage archaic genomes available results in a much higher number of nearly fixed changes identified in archaics (left/negative side of the plot) versus modern humans (right side) (Fig. 4C), setting a comparatively much higher threshold for the identification of nearly fixed modern changes.
These analyses are visualized in Fig. 4D (and detailed in tables S2 and S3) through a matrix that intersects all BAZ1B dosage–dependent genes (partitioned in the two categories of directly and inversely correlated and ordered across the full range of biological significance and regulatory proximity, from simply DEGs to bona fide direct targets) with the evolutionary changes underlying domestication and self-domestication, yielding the following key insights (color coded for degree of overlap and marked for significance in hypergeometric tests). First, the most significant pattern was obtains at the intersection with the top 10% genes showing an excess of (nearly) fixed mutations in the regulatory regions of modern humans compared to archaics, across both directly and inversely BAZ1B level–dependent genes (table S2). This same category of nearly fixed modern regulatory changes is also the only one that returns a statistically significant overlap with the most stringent core of BAZ1B dosage–dependent targets (i.e., DEGs whose enhancers are both directly bound by BAZ1B and differentially marked upon its decrease), demonstrating that BAZ1B directly controls, in an exquisitely dosage-dependent manner, this coherent and particularly relevant set of genes that underwent regulatory changes in human evolution. Second, the overall strongest overlaps map to the class of genes that are inversely correlated to BAZ1B levels, which we found to be strongly and specifically enriched for head morphogenesis and NC categories (Fig. 2C), thereby confirming craniofacial morphogenesis as the key domain of functionally relevant overlap between BAZ1B dosage and (self-) domestication changes relevant to the evolution of AMHs. Third, despite the spuriously inflated number of apparently fixed mutations in archaics (15), the overall extent of overlap between genes affected by BAZ1B dosage and our modern and archaic sets does not reveal significantly more hits for archaics. Globally, we found consistently more overlapping genes between the BAZ1B targets and the modern human data and even no statistically significant overlap for any list of the archaic-specific mutations when crossed against genes directly correlated to BAZ1B level. We find this noteworthy, given the evidence that the Neanderthal face also displays derived characteristics (42) that could be the result of modifications of genes that could overlap with those highlighted in this work. Last, the (lower) midface emerges as a particularly salient area of functionally relevant overlap (as illustrated in Fig. 4E and detailed in table S3), given the specific genes that our analysis unearthed: (i) COL11A1, one of the few craniofacial genes highlighted across domestication studies (dog, house sparrow, and pig breeds), which lies in a region of the human genome that resisted archaic introgression (13) and is associated with Marshall syndrome; (ii) XYLT1, one of the five genes (along with ACAN, SOX9, COL2A1, and NFIX) that affect lower and midfacial protrusion, are among the top differentially methylated genes compared to archaics and were also highlighted in a recent study on regulatory changes that shaped the modern human facial and vocal anatomy (tables S1 and S3) (43); and (iii) NFIB, which belongs to the same gene family as NFIX and shares some of its functions. In sum, the direct and dosage-sensitive control by BAZ1B of genes that underwent regulatory changes in human evolution and whose altered expression underlies neurocristopathic facial dysmorphisms is consistent with the hypothesis of mild neurocristopathy as the mechanistic core selected in the self-domestication of the modern human face.
As recently reconstructed (3), the idea of human self-domestication dates back, at least in terms of scientific record, to Johann Friedrich Blumenbach at the onset of the 19th century. Following on his seminal account of domestication systematized in Variations of Animals and Plants under Domestication (44), Charles Darwin also considered the analogy between modern humans and domesticated species in The Descent of Man (45), yet his emphasis on controlled breeding as a key aspect of domestication led him to frame domestication and self-domestication as distinct phenomena and thereby leave Blumenbach’s intuition largely undeveloped (46). Since then, the possibility that the anatomical and cognitive-behavioral hallmarks of AMHs could result from an evolutionary process bearing such significant similarities to the domestication of animals as to share the same underlying cause has been refined into the full-fledged self-domestication hypothesis (1, 2). As recently argued (1, 3), convergent lines of evidence also indicate that self-domestication is temporally aligned with the emergence of AMH, although the process may have acquired further momentum with the gradual expansion of our species (1, 3). However, despite spurring considerable interest, the self-domestication hypothesis has thus far failed to marshal conclusive evidence largely because of two factors: (i) the lack of a coherent explanation, even at a theoretical level, of what developmental and genetic mechanisms could underlie domestication in general and (ii) the absence of suitable experimental systems in which those mechanisms could be specifically tested in the case of human self-domestication. The first problem was tackled by the recent proposition of mild NC deficits as a central and unifying functional layer underlying domestication (5). This constituted a major conceptual advance, particularly because it generated the testable hypothesis of an altered NC gene expression program in domesticated species relative to their wild-type ancestors. For humans, given the obvious lack of gene expression data from archaic hominins, we reasoned that this hypothesis could be verified by examining the genetic changes between archaic and modern humans in light of the gene regulatory networks directly inferred from human neurocristopathies. We thus set out to test whether specific human neurodevelopmental disorders, carefully selected on the basis of both craniofacial and cognitive-behavioral traits relevant to domestication, could illuminate the regulatory circuits shaping the modern human face and hence be harnessed for an empirical validation of the self-domestication hypothesis. Specifically, we reasoned that WBS and 7dupASD, through their uniquely informative set of symmetrically opposite phenotypes at the level of face morphology (fig. S1A) and sociality, constituted a paradigmatic test case to probe the heuristic potential of neurodevelopmental disease modeling for the experimental understanding of human evolution. The following key insights confirm the validity of this approach.
First, we identified the 7q11.23 region BAZ1B gene as a master regulator of the modern human face on the basis of a molecular and functional dissection in the thus far largest cohort of WBS patient– and 7dupASD patient–specific NCSCs and across an exhaustive range of BAZ1B dosages. Notably, our cohort also included NCSCs from a patient with rare WBS featuring a much milder WBS gestalt and harboring an atypical, BAZ1B-sparing deletion that served as a particularly informative control, as confirmed by the clustering of atypical NCSC lines with controls when probed for BAZ1B occupancy. In particular, exploiting the fine-grained resolution of BAZ1B dosages recapitulated in our cohort, we could couple classical pairwise comparisons with a more sophisticated regression analysis on BAZ1B levels, thereby revealing major BAZ1B dosage–dependent transcriptional alterations pivoting around clusters of pathways that are crucial for NC development and maintenance, as well as for its downstream skeletal and cardiac outputs.
Second, we repurposed the versatility of CRISPR-Cas9 to generate an allelic series of endogenously tagged BAZ1B across 7q11.23 dosages (including the BAZ1B-sparing atypical patient as uniquely relevant control) to define its dosage-dependent genome-wide occupancy. Taking advantage of previous extensive work on the NCSC chromatin landscape (41, 47–49), we were able to define a pivotal role for BAZ1B in NCSC enhancer regulation, consistent with its preferential binding of distal regulatory regions, and to partition its dosage-dependent regulation into bona fide direct and indirect targets. The overall balance between the numbers of genes up- or down-regulated upon BAZ1B KD—together with the greater overlap, sheer size, and significance of enrichments in chromatin remodeling categories over other domains of transcription regulation—further corroborates the inclusion of BAZ1B among the factors acting upstream of enhancer and promoter modulations to enable or reinforce rather than specify their net outcome. Last, this molecular readout was translated to the functional level with the definition of an impairment in both NCSC migration and outgrowth from EBs upon decrease in BAZ1B, providing the first validation of BAZ1B involvement in key functions of the developing human NC.
Third, our investigation provides the first experimental evidence for the neurocristopathic hypothesis that had been put forth to explain the domestication syndrome and had pointed to BAZ1B as one of the candidates underlying this syndrome (5). Among the key NC hubs affected by BAZ1B dosage, we uncovered three additional critical genes—EDN3, MAGOH, and ZEB2—that had also been predicted in the same model because they are associated with behavioral changes found in domesticates, thereby defining a regulatory hierarchy for this coherent set of genes underlying domestication.
Last, the empirical determination of BAZ1B dosage–sensitive genes in NC models from AMHs with accentuated domestication-relevant traits allowed us to expose, in a functionally relevant manner, the genetic differences between modern versus archaic. This brought to the fore the significant convergence between BAZ1B-dependent circuits and genes harboring regulatory changes in the human lineage, reinforcing the notion that regulatory regions contain some of the most significant changes relevant for the modern lineage. This is also reinforced by the recent identification of AMH-specific hypermethylation in the regulatory region of BAZ1B itself (43).
Last, it is noteworthy that genes implicated in NC development also play significant roles in the establishment of brain circuits that are critical for cognitive processes like language or theory of mind prominently affected in 7q11.23 syndromes. Among the genes downstream of BAZ1B that we uncovered in this study, FOXP2, ROBO1, and ROBO2 have long been implicated in brain wiring processes critical for vocal learning in several species (50, 51), including humans, and will warrant further mechanistic dissection in light of the distinctive linguistic profile of WBS individuals. In conclusion, our findings establish the heuristic power of neurodevelopmental disease modeling for the study of human evolution.
MATERIALS AND METHODS
Ethics approvals were reported in the study that established the original iPSC cohort (16) and also apply to the additional samples included in this study (7dupASD3 and CTL4R).
Fibroblast reprogramming and iPSC culture
WBS1, WBS2, WBS3, WBS4, 7dupASD1, atWBS1, and CTL2 fibroblasts were reprogrammed using the mRNA Reprogramming Kit (Stemgent), while the 7dupASD2 and CTL1R lines were reprogrammed with the microRNA Booster Kit (Stemgent). The CTL3 line was reprogrammed by transfection with the STEMCCA polycistronic lentiviral vector followed by Cre-mediated excision of the integrated polycistron. 7dupASD3 and CTL4R fibroblasts were reprogrammed using the Simplicon RNA Reprogramming Kit (Millipore).
Before differentiation, iPSC lines were cultured on Matrigel hESC-qualified Matrix (BD Biosciences)–coated plates, diluted 1:40 in Dulbecco’s minimum essential medium/F-12, and grown in mTeSR 1 medium (STEMCELL Technologies). They were passaged upon treatment with Accutase (Sigma-Aldrich) and then plated in mTeSR 1 medium supplemented with 5 μM Y-27632 (Sigma-Aldrich).
NCSCs were detached using Accutase and counted, and 1 × 106 cells per experimental condition were fixed in 4% paraformaldehyde and then blocked in 10% bovine serum albumin. Cells were incubated for 1 hour with primary antibodies conjugated to fluorophores (HNK1–fluorescein isothiocyanate and nerve growth factor receptor–Alex Fluor 647; BD Biosciences). Analyses were performed on a FACSCalibur instrument (BD Biosciences), and data were analyzed with FCS express software (Tree Star). Fluorescence-activated cell sorting characterization for 7dupASD3 and CTL4R lines is reported in fig. S1B; for all the other lines, see (16).
Lentiviral vector production and NCSC transfection
BAZ1B KD was performed using validated pLKO.1 TRC vector TRCN0000013338 (referred to as sh1) and TRCN0000013341 (referred to as sh2). A pLKO.1 TRC vector containing a scrambled short hairpin sequence was used as a negative control.
Second generation lentiviral vectors were produced through calcium phosphate transfection of human embryonic kidney 293T cells and ultracentrifugation (2 hours, 20°C, 20,000 rpm).
NCSCs (3 to 4 × 105) were infected upon splitting and then selected by adding puromycin (1 μg/ml) to the medium.
RNA extraction, retrotranscription, and real-time qPCR
RNA was extracted using the RNeasy Micro Plus Kit (QIAGEN) according to the manufacturer’s instructions. Retrotranscribed cDNA was obtained from 0.5 to 1 μg of total RNA using the SuperScript VILO retrotranscription kit (Thermo Fisher Scientific).
Real-time qPCR was performed on a 7500 Fast Real-Time PCR system (Applied Biosystems) using SYBR Green Master Mix (Applied Biosystems) as the detecting reagent. A total cDNA amount corresponding to 15 ng of starting RNA was used for each reaction. Each sample was analyzed in triplicate and normalized to GAPDH. Relative mRNA quantity was calculated by the comparative cycle threshold (Ct) method using the formula 2−∆Ct [BAZ1B, CCTCGCAGTAAGAAAGCAAAC (forward) and ACTCATCCAGCTCCTTTTGAC (reverse); GAPDH, GCACCGTCAAGGCTGAGAAC (forward) and AGGGATCTCGCTCCTGGAA (reverse); NR2F1, AGAAGCTCAAGGCGCTACAC (forward) and GGGTACTGGCTCCTCACGTA (reverse); NR2F2, GCAAGTGGAGAAGCTCAAGG (forward) and GCTTTCCACATGGGCTACAT (reverse); TFAP2A, GCCTCTCGCTCCTCAGCTCC (forward) and CGTTGGCAGCTTTACGTCTCCC (reverse); and SOX9, AGTACCCGCACTTGCACAAC (forward) and GTAATCCGGGTGGTCCTTCT (reverse)].
RNA-seq libraries preparation
Library preparation for RNA-seq was performed according to the TruSeq Total RNA sample preparation protocol (Illumina), starting from 250 ng to 1 μg of total RNA. cDNA library quality was assessed in an Agilent 2100 Bioanalyzer using the High Sensitivity DNA Kit. Libraries were sequenced with the Illumina HiSeq machine at a read length of 50–base pair (bp) paired end and a coverage of 35 million of reads per sample.
Protein extraction and Western blot
NCSCs were lysed in radioimmunoprecipitation assay buffer [10 mM tris (pH 8.0), 1% Triton X-100, 0.1% sodium deoxycholate, 0.1% SDS, 140 mM NaCl, and 1 mM EDTA] supplemented with protease inhibitor cocktail (Sigma-Aldrich) and 0.5 mM phenylmethylsulfonyl fluoride (Sigma-Aldrich) for 1 hour at 4°C.
Protein extracts (30 to 50 μg per sample) were supplemented with NuPAGE LDS sample buffer (Thermo Fisher Scientific) and 50 mM dithiothreitol (Thermo Fisher Scientific) and denatured at 95°C for 3 min. Then, extracts were run on a precast NuPAGE 4 to 12% bis-tris Gel (Thermo Fisher Scientific) in NuPAGE MOPS SDS Running Buffer (Thermo Fisher Scientific) and transferred to a 0.45-μm nitrocellulose membrane (GE Healthcare) for 1 hour at 100 V in a buffer containing 20% absolute ethanol and 10% 0.25 M tris base and 1.9 M glycine. The membranes were blocked in TBST [50 mM tris (pH 7.5), 150 mM NaCl, and 0.1% Tween 20] and 5% milk for 1 hour, incubated with primary antibodies overnight at 4°C and with secondary antibodies for 1 hour at room temperature. Primary [BAZ1B (Abcam) and glyceraldehyde-3-phosphate dehydrogenase (GAPDH; Millipore)] and secondary antibodies were diluted in TBST and 5% milk. Blots were detected with the ECL Prime Western Blotting Detection Reagents (Sigma-Aldrich) and scanned using the ChemiDoc system (Bio-Rad).
Cells (5 × 104 to 7 × 104) were plated in each of the two Matrigel-coated wells of silicone culture-inserts (Ibidi) attached to six-well culture plates. After 24 hours, the insert was removed, medium was changed to remove dead cells, and time lapse was performed for 24 hours at the rate of one image every 10 min at ×10 magnification; each condition was analyzed in duplicate. Images were acquired with the BX61 upright microscope equipped with a motorized stage from Olympus or the Nikon Eclipse Ti inverted microscope equipped with a motorized stage from Nikon and analyzed with ImageJ.
Endogenous BAZ1B tagging via CRISPR-Cas9
iPSCs were pretreated with 10 μM rho kinase inhibitor for 4 hours, and then 2 × 106 cells were electroporated using the Neon system with the Cas9/single-guide RNA ribonucleoprotein complex and the donor plasmid (synthesized by GeneArt). The donor plasmid contained three FLAG tags followed by a self-cleaving peptide (P2A) and a hygromycin resistance (HygroR). The 3xFLAG-P2A-HygroR cassette was flanked by BAZ1B-specific homology arms (5′ HA and 3′ HA) to promote homologous recombination and then subcloned into a bacterial backbone (Fig. 3A).
After 48 hours, iPSC medium was supplemented with hygromycin B (50 μg/μl), and selection medium was maintained for 15 days. Fifteen to 20 clones per iPSC line were then subjected to PCR to (i) evaluate the presence of the cassette and the insertion in the correct genomic locus and (ii) distinguish heterozygously tagged from homozygously tagged clones (fig. S3A). We could isolate a clone with a homozygous integration from the CTL, the atWBS, and the typical WBS but not from the 7dupASD line. In the 7dupASD clone, the FLAG tag was present in two of three copies, as shown by a digital PCR analysis (fig. S3B).
DNA (60 ng) was amplified in a reaction volume containing the following reagents: QuantStudio 3D Digital PCR Master Mix v2 (Thermo Fisher Scientific), Custom TaqMan Copy Number Assays SM 20× FAM labeled (Thermo Fisher Scientific), and TaqMan Copy Number Reference Assay 20× (Thermo Fisher Scientific) VIC labeled (Thermo Fisher Scientific). The mix was loaded on a chip using the QuantStudio 3D Digital PCR Chip Loader. The chips were then loaded on the ProFlex PCR System (Thermo Fisher Scientific), and data were analyzed using the “QuantStudio 3D AnalysisSuite Cloud Software.” The entire process was performed by the qPCR Service at Cogentech, Milano [Custom (FLAG) TaqMan Copy Number Assays: forward primer, TGGACAGTCCAGAGGACGAA; reverse primer, CACCCTTGTCGTCATCGTCTT; and probe, FAMACAGAAGAAGGACTACAAAGACG and TaqMan Copy Number Reference Assay: TERT (VIC) (catalog number 4403316)].
ChIP coupled with sequencing
Approximately 2 × 105 cells were used (~100 μg of chromatin) for histone mark IP, and 1 mg of chromatin was used for BAZ1B-FLAG IP. Cells were fixed with phosphate-buffered saline, containing 1% formaldehyde (Sigma-Aldrich), for 10 min to cross-link proteins and DNA, when the reaction was then stopped by adding 125 mM glycine for 5 min. Cells were lysed with SDS buffer containing 100 mM NaCl, 50 mM tris-HCl (pH 8.0), 5 mM EDTA (pH 8.0), and 10% SDS, at which point chromatin pellets were resuspended in IP buffer containing 1 volume of SDS buffer and 0.5 volume of Triton dilution buffer [100 mM tris-HCl (pH 8.5), 5 mM EDTA (pH 8.0), and 5% Triton X-100]. Chromatin was then sonicated using the S220 Focused-ultrasonicator (Covaris) to generate <300 bp DNA fragments (for histone mark IPs) or the Branson Digital Sonifier to generate 500 to 800 bp DNA fragments (for BAZ1B-FLAG IP).
Sonicated chromatin was incubated overnight at 4°C with primary antibodies [H3K27ac (Abcam), H3K4me1 (Abcam), H3K4me3 (Abcam), H3K27me3 (Cell Signaling Technology), and FLAG (Sigma-Aldrich)] and then for 3 hours with Dynabeads Protein G (Thermo Fisher Scientific). Beads were washed three times with low-salt wash buffer [0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM tris-HCl (pH 8.0), and 150 mM NaCl] and once with high-salt wash buffer [0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM tris-HCl (pH 8.0), and 500 mM NaCl]. Immunocomplexes were eluted in decross-linking buffer (1% SDS and 100 mM NaHCO3) at 65°C for 2 hours. DNA was purified using QIAquick PCR columns (QIAGEN) and quantified with a Qubit dsDNA HS assay kit (Thermo Fisher Scientific). DNA libraries were prepared by the sequencing facility at European Institute of Oncology according to the protocol described by Blecher-Gonen and colleagues (53), and DNA was sequenced on the Illumina HiSeq 2000 platform. For the FLAG ChIP, samples were run in duplicate.
RNA-seq data were quantified using Salmon 0.91 to calculate read counts and transcripts per million in a transcript- and gene-wise fashion, using the quasi-mapping offline algorithm (54) on the GRCh38 (National Center for Biotechnology Information) database. edgeR was used for differential gene expression analysis (DEA), using generalized linear regression methods, to identify pattern of differential expression following two different schemes:
1) A factorial analysis based on the definition of one group of scrambled and one group of KD samples to identify genes dysregulated similarly across short hairpins characterized by different efficiencies.
2) A numerical analysis in which log-normalized [Trimmed Mean of M-values (TMM)] BAZ1B levels, as quantified by RNA-seq, was used as independent variable.
All analyses were performed dropping individual variations (~individual+KD or ~individual+BAZ1B) to account for the genetic background of each individual. In particular, this design is expected to permit the identification of genes, which change expression level upon KD even in situations in which genotype-specific makeups would lead BAZ1B-dependent genes to have unique expression levels in scramble lines. In the factorial analysis, DEGs were identified and characterized by filtering for fold change (FC) > 1.25 and FDR < 0.05 unless explicitly indicated.
To our knowledge, performing a regression analysis at a gene-specific level has never been performed. We were able to do this because of the availability of a large set of samples (11 individuals) and because of the two short hairpins robustly respectively reducing BAZ1B expression levels, respectively by ~40 and ~70% in all individuals lines. To validate the quality of our numerical differential expression analysis, we took advantage of HipSci data (55, 56) and iPSCpoweR tools (29). We took 50 of 105 possible combinations of 13 random individual RNA-seq data from the healthy HipSci cohort, representing both sexes and having at least two technical replicates per individual. Unfortunately, HipSci does not contain at least 13 individuals with three clones per individual. Thus, we performed four alternative DEAs with edgeR (table S4) on the 50 different random combinations of 13 individuals identified (200 DEAs in total, on 22 samples, two clones per individual), using the same model matrix used for the regression analysis (~individual+BAZ1B) and using BAZ1B levels of scramble and sh2 lines. All analyses identified very low number of spurious DEGs (fig. S2E). Thus, we used the “Edg2” pipeline (table S4) because it does not discard genes with higher variability (Edg2 and Edg4 versus Edg1 and Edg3), and it is based on a better suited algorithm (Edg2 versus Edg4). With our model matrix, filtering by P < 0.01 (and FDR < 0.25), using Edg2 on a random HipSci data, we obtained an average of 93.32 DEGs (on average) with a median equal to 43 (table S5). GO enrichments were performed using topGO R package version 2.28.0.
Master regulatory analysis was performed via hypergeometric test by measuring gene set enrichments in lists of transcription factor targets provided by the TFBS tools database (57). Both GO and transcription factor enrichment analyses were performed considering background genes expressed in at least two samples in our NCSC cohort.
ChIP-seq experiments were analyzed both qualitatively and quantitatively. Reads were trimmed with the FASTX-Toolkit (-Q33 -t 20 -l 22), aligned with Bowtie 1.0 (-v 2 -m 1) on the Human hg38 reference genome, and peaks were called using MACS 2.1.1. H3K4me1, H3K27ac, H3K4me3, and H3K27me3 peaks were called with –broad using default parameters and q < 0.05.
Qualitative analysis, including intersection and comparison of bed files, was performed using BedTools version 2.23.
To define enhancer regions, we intersected those marked by H3K4me1 and H3K27ac in at least two samples, discarded regions with H3K4me3 in at least two samples, and discarded regions overlapping with TSS. Motif enrichment was performed by using HOMER v4.10.
Quantification of reads per region was performed with DeepTools 3.0.2. Differential mark deposition was conducted by means of edgeR 3.24.1 inside R 3.3.3. To define mark deposition following BAZ1B levels, we used the same design as for RNA-seq data (~individual+BAZ1B).
To identify BAZ1B bound regions and to avoid losing identification of lowly covered regions, we resorted to (i) aggregation of all sample aligned reads and (ii) peak calling with MACS2 using –extsize 800 and q < 0.25. BAZ1B binding coverage was calculated with DeepTools, with the same parameters used for histone marks, on the identified peak regions. Differentially bound regions were identified with edgeR.
Assembly of archaic and modern human lists
The archaic/modern lists were generated from the material presented in (15). We used high-coverage genotypes for three archaic individuals: one Denisovan (58), one Neanderthal from the Denisova cave in Altai mountains (59), and another Neanderthal from Vindija cave, Croatia (60). The data are publicly available at http://cdna.eva.mpg.de/neandertal/Vindija/VCF/, with the human genome version hg19 as reference. High-frequency (HF) differences were defined as positions where more than 90% of present-day humans carry a derived allele, while at least the Denisovan and one Neanderthal carry the ancestral allele. High-frequency changes in archaics were defined as occurring at less than 1% in present-day humans, while at least two archaic individuals carry the derived allele. The HF lists used here were examined as presented in (15), with the exception of the HF lists in regulatory regions, which were extracted from the same dataset but not presented as such in the original paper.
Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/5/12/eaaw7908/DC1
Fig. S1. BAZ1B KD validation in iPSC-derived NCSCs and evaluation of its impact on NCSC migration.
Fig. S2. BAZ1B KD affects the transcriptome of iPSC-derived NCSCs.
Fig. S3. Generation of BAZ1B-FLAG iPSC lines and differentiation to NCSCs.
Fig. S4. BAZ1B KD induces a significant chromatin remodeling at distal regions.
Text S1A. Detailed description of HOMER motif enrichments performed on BAZ1B ChIP-seq data.
Text S1B. List of key direct targets of BAZ1B involved in neural- and NC-related development and relevant associated literature.
Table S1. Genes relevant for NC and NC-derived features whose expression follows BAZ1B levels.
Table S2A. Significant genes in human evolution.
Table S2B. Regulatory excess in archaic humans, overlap with BAZ1B targets.
Table S2C. Mutation excess in archaic humans, overlap with BAZ1B targets.
Table S2D. Regulatory changes (exclusive) in archaic humans, overlap with BAZ1B targets.
Table S2E. Missense mutations in archaic humans, overlap with BAZ1B targets.
Table S2F. Mutation excess in archaic humans corrected for length, overlap with BAZ1B targets.
Table S2G. Regulatory excess in modern humans, overlap with BAZ1B targets.
Table S2H. Mutation excess in modern humans, overlap with BAZ1B targets.
Table S2I. Regulatory changes (exclusive) in modern humans, overlap with BAZ1B targets.
Table S2J. Missense mutations in modern humans, overlap with BAZ1B targets.
Table S2K. Mutation excess in modern humans corrected for length, overlap with BAZ1B targets.
Table S2L. Genes under positive selection in domesticated animals, overlap with BAZ1B targets.
Table S2M. Genes under positive selection from Peyrégne et al. (13) in modern humans, overlap with BAZ1B targets.
Table S2N. Genes under positive selection from Racimo (14) in modern humans, overlap with BAZ1B targets.
Table S3. Crucial genes identified in the overlap between BAZ1B datasets and archaic versus modern human datasets reported in this study.
Table S4. Alternative differential expression analysis functions tested with iPSCpower to assess the efficacy of our design matrix (~individual+BAZ1B). R code provided.
Table S5. Number of genes differentially expressed following BAZ1B data in our numerical analysis compared to an analysis conducted on randomized HipSci data, using Edg2 function (see table S4).