%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% ARTICLE ABOUT FATE OF SYNONYMOUS MUTATIONS IN HIV
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass[rmp, twocolumn]{revtex4}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newcommand{\Author}{Fabio~Zanini and Richard~A.~Neher}
\newcommand{\Title}{Deleterious synonymous mutations hitch-hike to high frequency in HIV \env~evolution}
\newcommand{\Keywords}{{HIV}, {synonymous}, {population genetics}}


\usepackage[english]{babel}
\usepackage[utf8x]{inputenc}
\usepackage{amsmath,amsfonts,amssymb,eucal,eurosym}
\usepackage{color}
\usepackage{subfig}
\usepackage{graphicx}
%\usepackage[font=small, format=hang, labelfont={sf,bf}, figurename=Fig.]{caption}
\usepackage{natbib}
\usepackage{pslatex}
\usepackage[colorlinks,linkcolor=red,citecolor=red]{hyperref}
\hypersetup{pdfauthor={\Author}, pdftitle={\Title}, pdfkeywords={\Keywords}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\graphicspath{{./figures/}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\DeclareMathOperator\de{d\!}
\newcommand{\comment}[1]{\textit{\textcolor{red}{#1}}}
\newcommand{\mut}{\mu}
\newcommand{\mfit}{\langle F\rangle}
\newcommand{\mexpfit}{\langle e^{F}\rangle}
\newcommand{\ox}{r}
\newcommand{\co}{\rho}
\newcommand{\gt}{g}
\newcommand{\locus}{s}
\newcommand{\locuspm}{t}
\newcommand{\OO}{\mathcal{O}}
\newcommand{\env}{\textit{env}}
\newcommand{\rev}{\textit{rev}}
\newcommand{\FIG}[1]{Fig.~\ref{fig:#1}}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\title{\Title}
\author{\Author}
\date{\today}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{abstract}
\noindent
Intrapatient HIV evolution is governed by selection on the protein level in the
arms race with the immune system (killer T-cells and antibodies). Synonymous
mutations do not have an immunity-related phenotype and are often assumed to be
neutral. In this paper, we show that synonymous changes in epitope-rich regions
are often deleterious but still reach frequencies of order one. We analyze time
series of viral sequences from the V1-C5 part of {\it env} within individual
hosts and observe that synonymous derived alleles rarely fix in the
viral population. Simulations suggest that such synonymous mutations
have a (Malthusian) selection coefficient of the order of $-0.001$, and that
they are brought up to high frequency by linkage to neighbouring beneficial
nonsynonymous alleles (genetic draft). As far as the biological causes are
concerned, we detect a negative correlation between fixation of an allele and
its involvement in evolutionarily conserved RNA stem-loop structures.
This phenomenon is not observed in other parts of the HIV genome, in which
selective sweeps are less dense and the genetic architecture less constrained.
%In absence of antiretroviral treatment, HIV is very successful at producing
%mutants that are able to stay undetected by the host immune system for months,
%boosting the infection~\citep{richman_rapid_2003, bunnik_autologous_2008,
%moore_limited_2009}. The high mutation rate at the core of this process,
%however, also generates genetic noise in the background of beneficial alleles.
%Because of the limited outcrossing of HIV, hitchhikers usually stay linked to
%the focal allele for a time comparable to its sweeping time, i.e. a few
%months~\citep{neher_recombination_2010, batorsky_estimate_2011}. The later fate
%of these accessory mutations depends more and more, as they decouple genetically
%from the escape allele via recombination, on their own fitness effects. In this
%study we show that, in a genetic region particularly dense of selective sweeps,
%synonymous hitchhikers tend to revert on a time scale of the order of 500 days,
%because of their deleterious fitness effects. In addition, we provide evidence
%that the biological origin for these deleterious effects resides, at least
%partially, in the disruption of macroevolutionarily conserved RNA secondary
%structures, termed ``insulating stems''.
\end{abstract}


\maketitle

\section{Introduction}

HIV evolves rapidly within a single host during the course of the infection. This evolution is driven by strong selection imposed by the host immune system via killer T cells (CTLs) and neutralizing antibodies (AB)~\citep{pantaleo_immunopathogenesis_1996} and facilitated by the high mutation rate of HIV~\citep{mansky}.
When the host develops a CTL or AB response against a particular viral epitope, mutations that reduce or prevent recognition of the epitope frequently emerge. Escape mutations in epitopes targeted by CTLs typically evolve early in infection and spread rapidly through the population~\citep{McMichael}. Later in infection, the most rapidly evolving parts of the HIV genome are the so-called variable loops of the envelope proteins, which need to avoid recognition by neutralizing antibodies. Mutations in \env{} spread through the population within a few months (see \figurename~\ref{fig:aft}, solid lines). During chronic infection, the (Malthusian) effect size of these beneficial
mutations is of the order of $s_e \sim 0.01$~\citep{neher_recombination_2010}.

These escape mutations are selected for their effect on the amino acid sequence of the viral proteins. The viral genome, however, needs to meet additional constraints that operate on the RNA level, such as efficient processing and translation, nuclear export, and packaging into the viral capsid. Purifying selection beyond the protein sequence is therefore expected, while it seems reasonable that the bulk of positive selection through the immune system should be restricted to amino acid sequences. A few important RNA elements are well characterized. For example, within \env{} a certain RNA sequence, called the \rev{} response element (RRE), is used by HIV to enhance
nuclear export of some of its transcripts~\citep{fernandes_hiv-1_2012}. Another
well-studied case is the interaction between viral reverse transcriptase, viral
ssRNA, and the host tRNA$^\text{Lys3}$: the latter is required for priming
reverse transcription (RT) and is bound by a specific pseudoknotted RNA structure
in the viral 5' untranslated region~\citep{barat_interaction_1991,
paillart_vitro_2002}. Recent studies
have shown that genetically engineered HIV strains whose codon usage bias
(CUB) is skewed towards more or less abundant tRNAs replicate better or worse,
respectively~\citep{ngumbela_quantitative_2008, li_codon-usage-based_2012}.


INFLUENZA PSEUDO VACCINE.

SYNONYMOUS CONSERVATION. DO WE HAVE A PLOT OF GENOME WIDE CONSERVATION, MAYBE FOR SUPPLEMENT?

Despite evidence for the functional importance of specific RNA sequences, synonymous mutations are commonly used as approximately neutral markers in studies of viral evolution. Neutral markers allow inferences about the stochastic forces driving evolution \citep{smth}. Here, we characterize the dynamics of synonymous mutations in \env{} and show that a substantial fraction of these mutations are deleterious. The central quantity we investigate is the probability of fixation of a mutation, conditional on its population frequency. Even though the synonymous mutations are deleterious and cannot be used as neutral markers, we show that the degree to which they hitch-hike with nearby non-synonymous mutations is very informative. Extending the analysis of fixation probabilities to the non-synonymous mutations, we show that time-dependent selection or strong competition of escape mutations inside the same epitope is necessary to explain the observed patterns of fixation and loss.


One simple way to assess the neutrality of synonymous mutations is to look at
their level of conservation. Deleterious mutations at functional sites are
expected to be absent or rare across the viral population; conversely, mutant
alleles that reach high frequencies are expected to be neutral. If genetic sites
are independent, the equilibrium frequency of a deleterious allele with fitness
$-s$ is $\mut / |s|$, where $\mut$ is the mutation rate per site per generation;
neutral alleles have no equilibrium frequency and can slowly fix via genetic
drift~\citep{ewens_mathematical_2004}.
If the focal synonymous mutant is linked
to another nonneutral allele, however, its frequency is the result of the
combined fitness effects of both sites, and simple conservation-level analyses
fail. Since recombination in HIV is known to happen
rarely~\citep{neher_recombination_2010, batorsky_estimate_2011}, the genetic
context of the synonymous change at hand must be taken into account. Our
results underline the importance of the latter scenario for intrapatient HIV
evolution.
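
As a brief sketch of where the equilibrium frequency $\mut/|s|$ comes from (assuming independent sites, a deterministic description, one-way mutation towards the deleterious allele, and $\mut \ll |s| \ll 1$), the frequency $\nu$ of the deleterious allele approximately obeys
\begin{equation*}
\frac{d\nu}{dt} \approx \mut\,(1-\nu) - |s|\,\nu\,(1-\nu) = (1-\nu)\,(\mut - |s|\,\nu),
\end{equation*}
whose stable fixed point is $\nu^* = \mut/|s|$. For illustration only, with assumed values $\mut = 10^{-5}$ per site per generation and $|s| = 10^{-3}$ (neither is an estimate from the data analyzed here), one obtains $\nu^* = 10^{-2}$, far below frequencies of order one.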




\section{Results}
A neutral mutation segregating at frequency $\nu$ has a probability $\nu$ to spread through the population and fix, while it is lost with probability $1-\nu$. This is a simple consequence of the fact that exactly one of the $N$ individuals currently present will be the common ancestor of the entire population at a particular locus, and this ancestor has a probability $\nu$ of carrying the mutation; see the illustration in \FIG{illustration}. Deleterious or beneficial mutations, in contrast, should fix less or more often, respectively. Time series sequence data therefore suggest a simple way to investigate average properties of different classes of mutations.
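
The counting argument can be written out explicitly; the following is a sketch under the assumptions of a neutral locus and a population of $N$ individuals, of which $n = \nu N$ carry the mutation ($n$ is a symbol introduced here for illustration). Each individual is equally likely to become the ancestor of the future population at that locus, so that
\begin{equation*}
p_{fix}(\nu) = \sum_{i=1}^{n} \frac{1}{N} = \frac{n}{N} = \nu, \qquad p_{loss}(\nu) = 1 - \nu .
\end{equation*}
For example, neutral mutations observed at $\nu = 0.3$ are expected to fix in $30\%$ of cases; a fixed fraction well below the diagonal $p_{fix} = \nu$ points to deleterious effects, and one above it to beneficial effects.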

\paragraph{Synonymous mutations in \env, C2-V5 are mostly deleterious}

\FIG{aft} shows time series data of the frequencies of all mutations observed in \env, C2-V5, in patient ??\citep{shankarappa_consistent_1999,liu_selection_2006}. Despite many synonymous mutations reaching high frequency, very few fix. This observation is further quantified in panels ? and ?, which stratify the data of ?? patients (see methods) according to the frequency at which different mutations are observed. Considering all mutations in a frequency interval around $\nu_0$ at some time $t_i$, we calculate the fraction that is found at frequency 1, frequency 0, or at intermediate frequency at later time points $t_f$. Plotting these fixed, lost, and polymorphic fractions against the time interval $t_f-t_i$, we see that most synonymous mutations segregate for roughly one year and are lost much more frequently than expected. The ultimate probability of loss or fixation is shown as a function of the initial frequency $\nu_0$ in panel ??. In contrast to synonymous mutations, the non-synonymous mutations seem to follow more or less the neutral expectation -- a point to which we will come back below.


\begin{figure}
\begin{center}
\includegraphics[width=\linewidth]{Shankarappa_allele_freqs_trajectories_syn_nonsynp8}
\includegraphics[width=\linewidth]{Shankarappa_fix_loss_dt_times}
\caption{Synonymous mutations rarely fix in \env, C2-V5: Panel A shows mutation
frequency trajectories observed in patient ?? \cite{shankarappa_consistent_1999}; nonsynonymous
and synonymous mutations are shown as solid and dashed lines, respectively.
Colors indicate the position of the site along the C2-V5 region (red to blue) ADD COLORBAR. MAYBE MAKE FIGURE WITH SYNONYMOUS AND NONSYN SEPARATELY.
While non-synonymous mutations frequently fix, very few synonymous mutations do, even though they are frequently observed at intermediate frequencies. Panel B quantifies the time course of loss and fixation of synonymous mutations observed in a frequency interval around $\nu_0$. Also shown is the ultimate fraction of synonymous mutations that fix as a function of the initial frequency $\nu_0$.
}
\label{fig:aft}
\end{center}
\end{figure}

\citet{bunnik_autologous_2008} present a longitudinal data set on the entire \env~gene of ?? patients at ?? time points with ??-?? sequences each. Repeating the above analysis separately on the C2-V5 region studied above and on the remainder of \env{} reveals strikingly different behavior inside and outside the hypervariable region. Within C2-V5, these data fully confirm the observations made in the data set by \citet{shankarappa_consistent_1999}. In the remainder of \env, however, observed synonymous mutations behave as if they were neutral; see \FIG{fixp}.

ARE OBSERVED SYNONYMOUS MUTATIONS OUTSIDE C2-V5 NEUTRAL? DOES LOSS/FIX CORRELATE WITH CONSERVATION. CAN WE LOOK AT THE AVERAGE LEVEL OF CONSERVATION STRATIFIED BY MAX FREQ? MAYBE WE COULD HAVE ONE -- COMPLETELY CIRCULAR -- FIGURE SHOWING LOSS/FIX VS CONSERVATION.

These observations suggest that many of the synonymous mutations in the part of \env~that includes the hypervariable regions are deleterious, while outside this region only roughly neutral mutations are polymorphic.



\begin{figure}
\begin{center}
\subfloat{\includegraphics[width=0.49\linewidth]{Bunnik2008_fixmid_syn_ShankanonShanka}}
\caption{Left panel: fixation probability of derived synonymous alleles is strongly
suppressed in C2-V5 relative to other parts of the {\it env} gene and to
nonsynonymous alleles.
Right panel: fixation of new alleles is especially rare in the conserved regions flanking the V
loops. The black dashed line is the prediction from neutral
theory, shown for comparison. Data from
Refs.~\cite{shankarappa_consistent_1999, bunnik_autologous_2008}.}
\label{fig:fixp}
\end{center}
\end{figure}


\paragraph{Synonymous mutations in C2-V5 tend to disrupt conserved RNA stems}
One possible {\it a priori} explanation for the lack of fixation of synonymous mutations in C2-V5 is secondary structure in the viral RNA. If any RNA secondary structures are relevant for HIV replication,
mutations in nucleotides involved in those base pairs are expected to be deleterious and to revert preferentially.
Many functionally important secondary structure elements have been characterized, including the \rev{} response element (RRE), which enhances nuclear export of some viral transcripts~\citep{fernandes_hiv-1_2012}. Another well-studied case is the interaction between viral reverse transcriptase, viral ssRNA, and the host tRNA$^\text{Lys3}$: the latter is required for priming reverse transcription (RT) and is bound by a specific pseudoknotted RNA structure in the viral 5' untranslated region~\citep{barat_interaction_1991, paillart_vitro_2002}. It was suggested early on that parts of the viral genome that have the potential to form stems are better conserved than the remainder \citep{forsdyke_reciprocal_1995}.

Recently, the propensity of nucleotides of the HIV genome to form base pairs has been measured using the SHAPE assay (a biochemical reaction preferentially altering unpaired bases) \citep{watts_architecture_2009}. The SHAPE assay has shown that the variable regions V1 to V5 tend to be unpaired, while the conserved regions between those variable regions form stems. We partition all synonymous alleles observed
at intermediate frequencies above 10-15\% according to their ultimate fate
(fixation or extinction). Subsequently, we align our sequences to the reference
NL4-3 strain used in ref.~\citep{watts_architecture_2009} and assign them SHAPE
reactivities. As shown in \figurename~\ref{fig:SHAPE} (left panel) in a
cumulative histogram, the reactivities of fixed alleles are systematically larger
than those of alleles that are doomed to extinction. In other words, alleles that are
likely to break RNA helices are also more likely to revert and finally be
lost from the population. We then split the synonymous mutations in the C2-V5 region further into conserved and variable regions and found that the biggest depression in fixation probability is observed in the conserved stems, while the variable loops show little deviation from the neutral signature; see \FIG{SHAPE}B.

In addition to RNA secondary structure, we have considered other possible
explanations for a fitness effect of synonymous mutations, in particular codon
usage bias (CUB). HIV is known to prefer A-rich codons relative to highly expressed
human housekeeping genes~\citep{jenkins_extent_2003}. Moreover, codon-optimized
and -pessimized viruses have recently been generated and shown to replicate
better or worse than wild type strains,
respectively~\citep{li_codon-usage-based_2012, ngumbela_quantitative_2008,
coleman_virus_2008}. We do not find, however, evidence for any contribution of
CUB to the ultimate fate of synonymous alleles. Several lines of thought support
this result. First of all, although codon-optimized HIV seems to perform better
{\it in vitro}, the distance in CUB between HIV and human genes is not shrinking
at the macroevolutionary level. Second, within a single patient, we do not
observe any bias towards more human-like CUB in the synonymous mutations that
reach fixation rather than extinction. Third, it is a common phenomenon for
retroviruses to use codons that differ from those of their hosts, and CUB effects
on fitness are thought to be so small that divergent nucleotide composition has
been suggested as a possible mechanism for viral
speciation~\citep{bronson_nucleotide_1994}. Fourth, CUB in the V1-C5 region is
not very different from other parts of the HIV genome, whereas the reduced
fixation probability is only observed there. In conclusion, although we cannot
exclude an effect of CUB on fitness as a general rule, we expect it to be a
minor effect in our context.

\begin{figure}
\begin{center}
\subfloat{\includegraphics[width=0.49\linewidth]{mixed_Shankarappa_Bunnik2008_Liu_fixation_reactivity_Vandflanking_fromSHAPE}}
\subfloat{\includegraphics[width=0.49\linewidth]{Shankarappa_fixmid_syn_V_regions.pdf}}
\caption{Watts et al. have measured the reactivity of HIV nucleotides to {\it
in vitro} chemical attack and shown that some nucleotides are more likely to
be involved in RNA secondary folds. C1-C5 regions, in particular, show
conserved stem-loop structures~\citep{watts_architecture_2009}. We show that
among all derived alleles in those regions reaching frequencies of order one,
there is a negative correlation between fixation and involvement in
base pairing in an RNA stem (left panel). The rest of the genome does not show any
correlation (right panel). There might be too few silent polymorphisms in the
first place, or the signal might be masked by non-functional RNA
structures. Data from Refs.~\cite{shankarappa_consistent_1999,
bunnik_autologous_2008, liu_selection_2006}.}
\label{fig:SHAPE}
\end{center}
\end{figure}


\paragraph{Deleterious mutations are brought to high frequency by hitch-hiking}
While the observation that some fraction of synonymous mutations is deleterious is not unexpected, it seems odd that we observe them at high population frequency -- at least in some regions of the genome. The region of \env~in which we observe deleterious mutations at high frequency is special in that it undergoes frequent adaptive changes to evade recognition by neutralizing antibodies \cite{Williamson}. Due to the limited amount of recombination in HIV \cite{neher_recombination_2010,batorsky_2011}, deleterious mutations that are linked to adaptive variants can reach high frequency \citep{maynard_smith}.

The potential for hitch-hiking is already apparent from the allele frequency trajectories in \FIG{aft}, where many mutations appear to change rapidly in frequency as a flock. Deleterious synonymous mutations can be amplified exponentially by selection on linked nonsynonymous sites, a process known as {\it genetic draft}~\citep{neher_genetic_2011}. In order to be advected to high frequency by a linked adaptive mutation, the deleterious effect of the mutation has to be substantially smaller than the adaptive effect. The latter was estimated to be on the order of $s_a = 0.01$ per day. The approximate magnitude of the deleterious effects can be estimated from \FIG{fixtimes}, which shows the distribution of times for synonymous alleles to fix or be lost starting from intermediate frequencies. The typical time to loss is of the order of 500 days. If this loss is driven by the deleterious effect of the mutation, this corresponds to deleterious effects of roughly $0.002$ per day.
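
This order-of-magnitude estimate can be sketched as follows, assuming that, once linkage to the adaptive mutation is lost, the decline of a deleterious allele is deterministic and governed only by its own (Malthusian) effect $s_d > 0$ (a symbol introduced here for illustration). The ratio of mutant to wild-type frequency then decays exponentially,
\begin{equation*}
\frac{\nu(t)}{1-\nu(t)} = \frac{\nu(0)}{1-\nu(0)}\, e^{-s_d t},
\end{equation*}
so that the characteristic time to loss is $\tau \sim 1/s_d$, up to a logarithmic factor of order one. With $\tau \approx 500$ days this gives $s_d \sim 1/(500~\text{days}) = 0.002$ per day, a factor of five below $s_a = 0.01$ per day.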

To get a better idea of the range of parameters that are compatible with the observations and our interpretation, we perform computer simulations of evolving viral populations under selection and rare recombination. For this purpose, we use the recently published package FFPopSim, which includes a module dedicated to intra-patient
HIV evolution~\citep{zanini_ffpopsim:_2012}. We analyze many combinations of
parameters such as population size, recombination rate, selection coefficient
and density of escape mutations, and deleterious effect of synonymous mutations.

The main result of the simulations is that genetic draft can indeed bring weakly deleterious mutations to high frequencies and result in a dependence of the fixation probability on initial frequency that is compatible with observations. We quantify the reduction in fixation probability by the area under the diagonal.... Since neutral mutations are much more likely to rise to high frequency than deleterious ones, the majority of the synonymous mutations needs to be slightly deleterious for a significant reduction of $p_{fix}$ to be observed. Furthermore, the two crucial parameters that control the fixation probability
are the following: (a) the deleterious effects of hitchhikers compared to
the beneficial effects of escape mutants, and (b) the density of escape
mutations. Intuitively, a higher density of escape mutations (i.e., epitopes)
enables a larger degree of genetic draft, because escape mutations start to
combine and their effects add up. In \figurename~\ref{fig:simheat} (left panel),
we show that this is indeed the case in simulations.

SHOW THAT THE DEPRESSION WORKS.

\begin{figure}
\begin{center}
\subfloat{\includegraphics[width=0.49\linewidth]{fixation_loss_shortgenome_area_ada_frac_del_eff_coi_0_01_nescepi_6_heat.pdf}}
\subfloat{\includegraphics[width=0.49\linewidth]{fixation_loss_shortgenome_area_ada_frac_del_eff_coi_0_01_nescepi_6_nonsyn_heat.pdf}}
\caption{Simulations of the escape competition scenario show that the density of
selective sweeps and the size of the deleterious effects of synonymous
mutations are the main driving forces of the phenomenon. A convex fixation
probability is recovered, as seen in the data, along the diagonal (left panel):
denser sweeps can support more deleterious linked mutations. The density of
sweeps is limited, however, by the nonsynonymous fixation probability, which is
quite close to neutrality (right panel). Moreover, strong competition between
escape mutants is required, so that several escape mutants are ``found'' by HIV
within a few months of antibody production.}
\label{fig:simheat}
\end{center}
\end{figure}


However, if hitch-hiking is driven by non-synonymous mutations that are unconditionally beneficial, we should find that non-synonymous mutations almost always fix once they reach high frequencies -- in contrast with \FIG{fixp}, which shows that non-synonymous mutations fix as if they were neutral. We know, however, that non-synonymous variation in the variable regions is driven by positive selection. Inspecting the trajectories of non-synonymous mutations suggests the rapid rise and fall of many alleles. We test two biologically plausible mechanisms
that could explain the transient rise of non-synonymous mutations: time-dependent selection and
within-epitope competition. If the immune system starts to recognize the escape mutant before its fixation, the mutant might cease to be beneficial and disappear despite its quick initial rise in
frequency. An example of the fixation probabilities generated by this kind of
model is shown in \figurename~\ref{fig:simfixp} (right panel). In support of this idea, \citet{richman_rapid_2003,
bunnik_autologous_2008} report antibody responses to escape mutants. These responses are delayed by a few months, roughly matching the average sweep time of an escape mutant. Alternatively, several different escape mutations in the same epitope can arise almost simultaneously and start to spread. Their fitness benefits are not additive, because each of them is essentially sufficient to escape. As a consequence, several mutations rise to high frequency, while the escape mutation with the smallest cost is most likely to eventually fix. In simulations, this kind of epistatic interaction within epitopes reduces fixation probabilities (\figurename~\ref{fig:simfixp}, left panel). The emergence of multiple sweeping nonsynonymous mutations in real HIV infections has been shown
previously~\citep{moore_limited_2009}.

\section{Discussion}
Despite several known functional roles for RNA secondary structure in the HIV genome, synonymous mutations are often used as approximately neutral markers in evolutionary studies of viruses. We have shown that the majority of synonymous mutations in the conserved regions C2-C5 of the \env~gene are deleterious. Comparison with recent biochemical studies of the pairing propensity of bases in the RNA genome suggests that these mutations are deleterious in part because they disrupt stems in RNA secondary structure. Furthermore, we provide evidence that these mutations are brought to high frequency through linkage to adaptive mutations. The latter mutations are only transiently adaptive, either through coevolution with the immune system or through redundant escape within an epitope.

Our observations and conclusions rely heavily on longitudinal data in which the dynamics of mutations can be explicitly observed. The fact that deleterious mutations can be brought to high frequencies through hitch-hiking underscores the vigor of the coevolution with the immune system. The fact that multiple escape mutations in the same epitope -- as is indeed observed in studies of antibody escape \citep{sdfsd} -- are necessary to explain the patterns of fixation of non-synonymous mutations points towards a large population size that rapidly discovers adaptive mutations. A similar point has been made recently by \citet{boltz_ultrasensitive_2012} in the context of preexisting drug resistance mutations.

The observed hitch-hiking highlights the importance of linkage due to infrequent recombination for the evolution of HIV \citep{neher_recombination_2010,batorsky_estimate_2011,joseffson_smth_2011}. The recombination rate has been estimated to be on the order of $\rho = 10^{-5}$ per base and day. It takes roughly $t_{sw} = s^{-1} \log (1/\nu_0)$ generations for an adaptive mutation with growth rate $s$ to rise from an initially low frequency $\nu_0\sim \mu$ to frequency one. This implies that a region of length $l = (\rho t_{sw})^{-1} = s/(\rho \log (1/\nu_0))$ remains linked to the adaptive mutation. With $s=0.01$, $l\approx 100$ bases, which is consistent with strong linkage between the variable loops and the stems in between. Furthermore, we do not expect much hitch-hiking to extend far beyond the variable regions, consistent with the lack of signal outside of C2-V5. In case of much stronger selection -- such as observed during early CTL escape or drug resistance evolution -- the linked region is of course a lot larger.
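
As a worked example of this estimate (a sketch only: the value $\nu_0 \sim 10^{-5}$ is an assumed initial frequency of the order of a per-site mutation rate and is not taken from the data, and the logarithm is written as $\log(1/\nu_0)$ so that the sweep time is positive), with $s = 0.01$ per day and $\rho = 10^{-5}$ per base and day one finds
\begin{align*}
t_{sw} &= \frac{\log(1/\nu_0)}{s} \approx \frac{11.5}{0.01~\text{day}^{-1}} \approx 1150~\text{days}, \\
l &= \frac{1}{\rho\, t_{sw}} \approx \frac{1}{10^{-5} \times 1150}~\text{bases} \approx 87~\text{bases},
\end{align*}
i.e., of the order of $100$ bases. Because $l$ depends on $\nu_0$ only logarithmically, the result is not sensitive to the assumed initial frequency.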

The functional significance of the insulating RNA structure stems between the hypervariable loops has been proposed previously~\citep{watts_architecture_2009, sanjuan_interplay_2011}. Our analysis is akin to that in ref.~\citep{sanjuan_interplay_2011} in terms of overall goals, but provides direct evidence that insulating stems are relevant for viral fitness {\it in vivo}. Our analysis is limited by the availability of longitudinal data, which requires a focus on the variable regions of \env. Conserved RNA structures likely exist (and several are known) in different parts of the HIV genome. In the absence of repeated adaptive substitutions in the vicinity that cause hitch-hiking, the deleterious synonymous mutations will remain at low frequencies and can only be observed by deep sequencing methods.

As far as population genetics models are concerned, our study uncovers the
subtle balance of evolutionary forces governing intrapatient HIV evolution. The
fixation and extinction times and probabilities represent a rich yet simple
set of summary statistics against which to test sequencing data and computer simulations, as
noted independently in ref.~\citep{strelkowa_clonal_2012} in the context of
influenza. Furthermore, our results emphasize the inadequacy of independent-site
models of HIV evolution, especially in the light of transient effects on
sweeping sites, such as time-dependent selection and within-epitope negative
epistasis. Although a final word about which of these mechanisms is more
widespread is yet to be spoken, both intuition and biological evidence from the
literature support a mixed scenario~\citep{richman_rapid_2003,
moore_limited_2009}. Note also that, unlike influenza, HIV does recombine, if
rarely; hence clonal interference as studied in
ref.~\citep{strelkowa_clonal_2012} is only a short-term effect. In conclusion,
we regard two consequences of this state of affairs as particularly relevant for
clinical purposes. On the one hand, the intervention of the host immune system
appears, if late, effective in fighting escape mutants, so that an active
stimulation of the host immune system towards a more prompt response might be a
viable treatment route. On the other hand, if HIV is indeed able to generate
several escape mutants at the same time, as both data and calculations seem to
indicate, an early response against some of them might not suffice to control
the viral load.

\section{Methods}
\comment{to be written\dots}
\section*{Acknowledgements}
\comment{to be written\dots}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bibliographystyle{natbib}
\bibliography{bib}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%