%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass[rmp, twocolumn]{revtex4}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\Author}{Fabio~Zanini and Richard~A.~Neher}
\newcommand{\Title}{Deleterious synonymous mutations hitch-hike to high frequency in HIV \env~evolution}
\newcommand{\Keywords}{{HIV}, {synonymous}, {population genetics}}
\usepackage[english]{babel}
\usepackage[utf8x]{inputenc}
\usepackage{amsmath,amsfonts,amssymb,eucal,eurosym}
\usepackage{color}
\usepackage{subfig}
\usepackage{graphicx}
\usepackage{natbib}
\usepackage{pslatex}
\usepackage[colorlinks,linkcolor=red,citecolor=red]{hyperref}
\hypersetup{pdfauthor={\Author}, pdftitle={\Title}, pdfkeywords={\Keywords}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\graphicspath{{./figures/}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\DeclareMathOperator\de{d\!}
\newcommand{\comment}[1]{\textit{\textcolor{red}{#1}}}
\newcommand{\mut}{\mu}
\newcommand{\mfit}{\langle F\rangle}
\newcommand{\mexpfit}{\langle e^{F}\rangle}
\newcommand{\ox}{r}
\newcommand{\co}{\rho}
\newcommand{\gt}{g}
\newcommand{\locus}{s}
\newcommand{\locuspm}{t}
\newcommand{\OO}{\mathcal{O}}
\newcommand{\env}{\textit{env}}
\newcommand{\rev}{\textit{rev}}
\newcommand{\FIG}[1]{Fig.~\ref{fig:#1}}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\title{\Title}
\author{\Author}
\date{\today}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{abstract}
\noindent
Intrapatient HIV evolution is governed by selection at the protein level in the
arms race with the immune system (killer T cells and antibodies). Synonymous
mutations do not have an immunity-related phenotype and are often assumed to be
neutral. In this paper, we show that synonymous changes in epitope-rich regions
are often deleterious but still reach frequencies of order one. We analyze time
series of viral sequences from the V1-C5 part of {\it env} within individual
hosts and observe that synonymous derived alleles rarely fix in the
viral population. Simulations suggest that such synonymous mutations
have a (Malthusian) selection coefficient of the order of $-0.001$, and that
they are brought up to high frequency by linkage to neighbouring beneficial
nonsynonymous alleles (genetic draft). As far as the biological causes are
concerned, we detect a negative correlation between fixation of an allele and
its involvement in evolutionarily conserved RNA stem-loop structures.
This phenomenon is not observed in other parts of the HIV genome, in which
selective sweeps are less dense and the genetic architecture less constrained.
\end{abstract}

\maketitle

\section{Introduction}

HIV evolves rapidly within a single host during the course of the infection.
This evolution is driven by strong selection imposed by the host immune system
via killer T cells (CTLs) and neutralizing antibodies
(AB)~\citep{pantaleo_immunopathogenesis_1996} and facilitated by the high
mutation rate of HIV~\citep{mansky_lower_1995}. When the host develops a CTL or
AB response against a particular viral epitope, mutations that reduce or prevent
recognition of the epitope frequently emerge. Escape mutations in epitopes
targeted by CTLs typically evolve early in infection and spread rapidly through
the population~\citep{mcmichael_immune_2009}. Later in infection, the most
rapidly evolving parts of the HIV genome are the so-called variable loops of the
envelope protein gp120, which need to avoid recognition by neutralizing ABs.
Mutations in \env, the gene encoding gp120, spread through the population
within a few months (see \figurename~\ref{fig:aft}, solid lines). During chronic
infection, the (Malthusian) effect size of these beneficial mutations is of the
order of $s_a \sim 0.01$~\citep{neher_recombination_2010}.

These escape mutations are selected for their effect on the amino acid sequence
of the viral proteins. The viral genome, however, needs to meet additional
constraints such as efficient processing and translation, nuclear export, and
packaging into the viral capsid: all these processes operate at the RNA level. A
few important RNA elements are well characterized. For example, a certain RNA
sequence, called the \rev{} response element (RRE), is used by HIV to enhance
nuclear export of some of its transcripts~\citep{fernandes_hiv-1_2012}. Another
well-studied case is the interaction between viral reverse transcriptase, viral
ssRNA, and the host tRNA$^\text{Lys3}$: the latter is required for priming
reverse transcription (RT) and is bound by a specific pseudoknotted RNA structure
in the viral 5' untranslated region~\citep{barat_interaction_1991,
paillart_vitro_2002}. Furthermore, recent studies have shown that genetically
engineered HIV strains with codon usage bias (CUB) patterns skewed towards more
or less abundant tRNAs replicate better or worse,
respectively~\citep{ngumbela_quantitative_2008, li_codon-usage-based_2012}.
Purifying selection beyond the protein sequence is therefore expected, while it
seems reasonable that the bulk of positive selection through the immune system
should be restricted to amino acid sequences.

Despite evidence for the functional importance of specific RNA sequences,
synonymous mutations are commonly used as approximately neutral markers in
studies of viral evolution. Neutral markers allow inference about the stochastic
forces driving evolution~\citep{smth}. Here, we characterize the dynamics of
synonymous mutations in \env{} and show that a substantial fraction of these
mutations are deleterious. The central quantity we investigate is the
probability of fixation of a mutation, conditional on its population frequency.
Even though the synonymous mutations are deleterious and cannot be used as
neutral markers, we show that the degree to which they hitchhike with nearby
non-synonymous mutations is very informative; their ability to hitchhike for
extended times is itself rooted in the small recombination rate of
HIV~\citep{neher_recombination_2010, batorsky_estimate_2011}. Extending the
analysis of fixation probabilities to the non-synonymous mutations, we show that
time-dependent selection or strong competition of escape mutations inside the
same epitope is necessary to explain the observed patterns of fixation and
loss.

%One simple way to assess the neutrality of synonymous mutations is to look at
%their level of conservation. Deleterious mutations at functional sites are
%expected to be absent or rare across the viral population; vice versa, mutant
%alleles that reach high frequencies are expected to be neutral. If genetic sites
%are independent, the equilibrium frequency of a deleterious allele with fitness
%$-s$ is $\mut / |s|$, where $\mut$ is the mutation rate per site per generation;
%neutral alleles have no equilibrium frequency and can slowly fix via genetic
%drift~\citep{ewens_mathematical_2004}. This intuitive picture does not hold in
%presence of genetic linkage and, in particular, for HIV evolution, because
%recombination is rare~\citep{neher_recombination_2010, batorsky_estimate_2011}.
%A more likely scenario, at least in escape mutation-rich regions, is the
%following: if the focal synonymous mutant is linked to a beneficial allele
%nearby, the latter essentially carries the synonymous allele toward high
%frequencies for a time on the order of the inverse of the recombination rate.
%The two slowly decouple afterwards, and the fitness effect of the synonymous
%allele starts to be visible on its own. On the one hand, genetic linkage and
%hitchhiking confound the interpretation of conservation levels; on the other,
%as we show below, purifying selection can be still observed with the help of
%longitudinal data.

\section{Results}

A neutral mutation segregating at frequency $\nu$ has a probability $\nu$ to
spread through the population and fix, while it is lost with probability
$1-\nu$. This is a simple consequence of the fact that exactly one of the $N$
individuals currently present will be the common ancestor of the entire
population at a particular locus, and this ancestor has a probability $\nu$ of
carrying this mutation; see the illustration in \FIG{fixp}. Deleterious or
beneficial mutations, in contrast, should fix less or more often, respectively.
Time series sequence data therefore suggest a simple way to investigate average
properties of different classes of mutations.

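
As a point of reference, and under the assumption that sites evolve
independently (which linkage in HIV violates), the standard single-locus
diffusion approximation for a haploid population of size $N$ gives the fixation
probability of an allele with selection coefficient $s$ at frequency $\nu$ as
\begin{equation}
P_\text{fix}(\nu) = \frac{1 - e^{-2Ns\nu}}{1 - e^{-2Ns}} \ ,
\end{equation}
which reduces to $P_\text{fix}(\nu) = \nu$ in the neutral limit $s \to 0$, lies
below the diagonal for $s<0$ and above it for $s>0$. We quote this textbook
result only to illustrate the expected direction of the deviation; $N$ and $s$
are generic symbols here, not fitted quantities.
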
\paragraph{Synonymous polymorphisms in \env, C2-V5 are mostly deleterious}

\FIG{aft} shows time series data of the frequencies of all mutations observed
in \env, C2-V5, in patient 8~\citep{shankarappa_consistent_1999,liu_selection_2006}.
Despite many synonymous mutations reaching high frequency, very few fix. This
observation is further quantified in panels \ref{fig:fixp1} and
\ref{fig:fixp2}, which stratify the data of
7-10 patients (see methods) according to the frequency at which different
mutations are observed. Considering all mutations in a frequency interval
around $\nu_0$ at some time $t_i$, we calculate the fraction that is found at
frequency 1, frequency 0, or at intermediate frequency at later time points
$t_f$. Plotting these fixed, lost, and polymorphic fractions against the time
interval $t_f-t_i$, we see that most synonymous mutations segregate for roughly
one year and are lost much more frequently than expected. The ultimate
probability of loss or fixation is shown as a function of the initial frequency
$\nu_0$ in panel ??. In contrast to synonymous mutations, the non-synonymous
mutations seem to follow the neutral expectation more or less -- a point to
which we will come back below.

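
The bookkeeping behind these fractions can be sketched as follows, in notation
introduced here for illustration only. Let $n(\nu_0)$ be the number of
observations of a derived allele at a frequency within the interval around
$\nu_0$ at some time $t_i$, pooled over patients, and let
$n_\text{fix}(\Delta t)$ and $n_\text{loss}(\Delta t)$ be the numbers of those
alleles found at frequency one and zero, respectively, a time
$\Delta t = t_f - t_i$ later. Then
\begin{align}
f_\text{fix}(\Delta t\,|\,\nu_0) &= \frac{n_\text{fix}(\Delta t)}{n(\nu_0)} \ , \\
f_\text{loss}(\Delta t\,|\,\nu_0) &= \frac{n_\text{loss}(\Delta t)}{n(\nu_0)} \ , \\
f_\text{poly}(\Delta t\,|\,\nu_0) &= 1 - f_\text{fix}(\Delta t\,|\,\nu_0) - f_\text{loss}(\Delta t\,|\,\nu_0) \ ,
\end{align}
and the fixation probability is estimated by the long-time limit of
$f_\text{fix}$. How time lags are binned and how finite sample sizes are treated
is left unspecified in this sketch.
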
\begin{figure}
\begin{center}
\includegraphics[width=\linewidth]{Shankarappa_allele_freqs_trajectories_syn_nonsynp8}
\caption{Synonymous mutations rarely fix in \env, C2-V5: mutation frequency
trajectories observed in patient 8~\cite{shankarappa_consistent_1999}.
Nonsynonymous and synonymous mutations are shown as solid and dashed lines,
respectively. Colors indicate the position of the site along the C2-V5 region
SEPARATELY. While non-synonymous mutations frequently fix, very few synonymous
mutations do even though they are frequently observed at intermediate
frequencies.}
\label{fig:aft}
\end{center}
\end{figure}

\citet{bunnik_autologous_2008} present longitudinal data sets on the entire
\env~gene of 3 patients at $\sim 5$ time points with approximately 5-20
sequences each. Repeating the above analysis separately on the C2-V5 region
studied above and on the remainder of \env~reveals strikingly different behavior
inside and outside the hypervariable region. Within C2-V5, these data fully
confirm the observations made in the data set by
\citet{shankarappa_consistent_1999}. In the remainder of \env, however, observed
synonymous mutations behave as if they were neutral; see \FIG{fixp}.

These observations suggest that many of the synonymous mutations in the part of
\env~that includes the hypervariable regions are deleterious, while outside
these regions the polymorphic synonymous mutations are mostly roughly neutral.

\begin{figure}
\begin{center}
\subfloat{\includegraphics[width=0.9\linewidth]{Shankarappa_fix_loss_dt_times}
\label{fig:fixp1}}\\
\subfloat{\includegraphics[width=0.9\linewidth]{Bunnik2008_fixmid_syn_ShankanonShanka}
\label{fig:fixp2}}
\caption{Left panel: time course of loss and fixation of synonymous mutations
observed in a frequency interval around $\nu_0$. The ultimate fraction of
synonymous mutations that fix as a function of intermediate frequency $\nu_0$
is the fixation probability. Right panel: the fixation probability of derived
synonymous alleles in C2-V5 is strongly suppressed compared to that of
synonymous alleles in other parts of the {\it env} gene and to that of
nonsynonymous alleles. Data from
Refs.~\cite{shankarappa_consistent_1999, bunnik_autologous_2008}.}
\label{fig:fixp}
\end{center}
\end{figure}

\paragraph{Synonymous mutations in C2-V5 tend to disrupt conserved RNA stems}
One possible {\it a priori} explanation for the lack of fixation of synonymous
mutations in C2-V5 is secondary structure in the viral RNA. If any RNA
secondary structures are relevant for HIV replication, mutations in nucleotides
involved in the corresponding base pairs are expected to be deleterious and to
revert preferentially. Many functionally important secondary structure elements
have been characterized, including the \rev{} response element (RRE), which
enhances nuclear export of some viral
transcripts~\citep{fernandes_hiv-1_2012}. Another
well-studied case is the interaction between viral reverse transcriptase, viral
ssRNA, and the host tRNA$^\text{Lys3}$: the latter is required for priming
reverse transcription (RT) and is bound by a specific pseudoknotted RNA structure
in the viral 5' untranslated region~\citep{barat_interaction_1991,
paillart_vitro_2002}. It has been suggested early on that parts of the viral
genome that have the potential to form stems are better conserved than the
remainder~\citep{forsdyke_reciprocal_1995}.

Recently, the propensity of nucleotides of the HIV genome to form base pairs has
been measured using the SHAPE assay (a biochemical reaction preferentially
altering unpaired bases)~\citep{watts_architecture_2009}. The SHAPE assay has
shown that the variable regions V1 to V5 tend to be unpaired, while the
conserved regions between those variable regions form stems. We partition all
synonymous alleles observed at intermediate frequencies above 10-15\% according
to their final destiny (fixation or extinction). Subsequently, we align our
sequences to the reference NL4-3 strain used in
ref.~\citep{watts_architecture_2009} and assign them SHAPE reactivities. As
shown in the cumulative histogram in \FIG{SHAPE} (left panel), the reactivities
of fixed alleles are systematically larger than those of alleles that are doomed
to extinction. In other words, alleles that are likely to break RNA helices
are also more likely to revert and finally be lost from the population. We then
split the synonymous mutations in the C2-V5 region further into conserved and
variable regions and find that the biggest depression in fixation probability
is observed in the conserved stems, while the variable loops show little
deviation from the neutral signature; see \FIG{SHAPE}B.

In addition to RNA secondary structure, we have considered other possible
explanations for a fitness effect of synonymous mutations, in particular codon
usage bias (CUB). Compared to highly expressed human housekeeping genes, HIV is
known to prefer A-rich codons~\citep{jenkins_extent_2003}. Moreover,
codon-optimized and -pessimized viruses have recently been generated and shown
to replicate better or worse than wild type strains,
respectively~\citep{li_codon-usage-based_2012, ngumbela_quantitative_2008,
coleman_virus_2008}. We do not find, however, evidence for any contribution of
CUB to the ultimate fate of synonymous alleles. Several lines of thought support
this result. First of all, although codon-optimized HIV seems to perform better
{\it in vitro}, the distance in CUB between HIV and human genes is not shrinking
at the macroevolutionary level. Second, within a single patient, we do not
observe any bias towards more human-like CUB in the synonymous mutations that
reach fixation rather than extinction. Third, it is common for retroviruses to
use codons that differ from those of their hosts, and CUB effects
on fitness are thought to be so small that divergent nucleotide composition has
been suggested as a possible mechanism for viral
speciation~\citep{bronson_nucleotide_1994}. Fourth, CUB in the V1-C5 region is
not very different from other parts of the HIV genome, whereas the reduced
fixation probability is only observed there. In conclusion, although we cannot
exclude an effect of CUB on fitness as a general rule, we expect it to be a
minor effect in our context.

\begin{figure}
\begin{center}
\subfloat{\includegraphics[width=0.9\linewidth]{mixed_Shankarappa_Bunnik2008_Liu_fixation_reactivity_Vandflanking_fromSHAPE}}\\
\subfloat{\includegraphics[width=0.9\linewidth]{Shankarappa_fixmid_syn_V_regions.pdf}}
\caption{Watts et al. have measured the reactivity of HIV nucleotides to {\it
in vitro} chemical attack and shown that some nucleotides are more likely to
be involved in RNA secondary folds. C1-C5 regions, in particular, show
conserved stem-loop structures~\citep{watts_architecture_2009}. We show that
among all derived alleles in those regions reaching frequencies of order one,
there is a negative correlation between fixation and involvement in a base
pair in an RNA stem (left panel). The rest of the genome does not show any
correlation (right panel). There might be too few silent polymorphisms in the
first place, or the signal might be masked by non-functional RNA
structures. Data from Refs.~\cite{shankarappa_consistent_1999,
bunnik_autologous_2008, liu_selection_2006}.}
\label{fig:SHAPE}
\end{center}
\end{figure}

\paragraph{Deleterious mutations are brought to high frequency by hitch-hiking}
While the observation that some fraction of synonymous mutations is deleterious
is not unexpected, it seems odd that we observe them at high population
frequency -- at least in some regions of the genome. The region of \env~in which
we observe deleterious mutations at high frequency is special in that it
undergoes frequent adaptive changes to evade recognition by neutralizing
antibodies~\cite{williamson_adaptation_2003}. Due to the limited amount of
recombination in HIV~\cite{neher_recombination_2010,batorsky_estimate_2011},
deleterious mutations that are linked to adaptive variants can reach high
frequency~\citep{smith_hitch-hiking_1974}.

The potential for hitch-hiking is already apparent from the allele frequency
trajectories in \FIG{aft}, where many mutations appear to change rapidly in
frequency as a flock. Deleterious synonymous mutations can be amplified
exponentially by selection on linked nonsynonymous sites, a process known as
{\it genetic draft}~\citep{gillespie_genetic_2000, neher_genetic_2011}. In order
to be advected to high frequency by a linked adaptive mutation, the deleterious
effect of the mutation has to be substantially smaller than the adaptive effect.
The latter was estimated to be on the order of $s_a = 0.01$ per day. The
approximate magnitude of the deleterious effects can be estimated from
\FIG{fixp} (left panel), which shows the distribution of times for synonymous
alleles to fix or get lost starting from intermediate frequencies. The
typical time to loss is of the order of 500 days. If this loss is driven by the
deleterious effect of the mutation, this corresponds to deleterious effects of
roughly $s_d \sim - 0.002$ per day.

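
The arithmetic behind this estimate can be sketched as follows, assuming as a
simplification that a deleterious allele that has decoupled from its beneficial
background decays exponentially, $\nu(t) \approx \nu_0\, e^{-|s_d| t}$. The
frequency then drops by a factor $e$ in a time $1/|s_d|$. Identifying this time
with the typical time to loss, denoted $T_\text{loss}$ here for convenience,
gives
\begin{equation}
|s_d| \sim \frac{1}{T_\text{loss}} \approx \frac{1}{500\ \text{days}} = 0.002\ \text{per day} \ ,
\end{equation}
about one fifth of $s_a$. Logarithmic factors depending on the initial frequency
and on the sampling depth are neglected in this rough estimate.
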
To get a better idea of the range of parameters that are compatible with the
observations and our interpretation, we perform computer simulations of
evolving viral populations under selection and rare recombination. For this
purpose, we use the recently published package FFPopSim, which includes a module
dedicated to intra-patient HIV evolution~\citep{zanini_ffpopsim:_2012}. We
analyze many combinations of parameters such as population size, recombination
rate, selection coefficient and density of escape mutations, and deleterious
effect of synonymous mutations.

The main result of the simulations is that genetic draft can indeed bring weakly
deleterious mutations to high frequencies and result in a dependence of the
fixation probability on initial frequency that is compatible with observations.
We quantify the reduction in fixation probability by the area under the
diagonal.~\comment{EXPLAIN!} Since neutral mutations are much more likely to
rise to high frequency than deleterious ones, the majority of the synonymous
mutations need to be slightly deleterious in order to observe a significant
reduction of $P_\text{fix}$. Furthermore, the two crucial parameters that
control the fixation probability
are the following: (a) the deleterious effects of hitchhikers compared to
the beneficial effects of escape mutants, and (b) the density of escape
mutations. Intuitively, a higher density of escape mutations (i.e., epitopes)
enables a larger degree of genetic draft, because escape mutations start to
combine and their effects add up. In \figurename~\ref{fig:simheat} (left panel),
we show that this is indeed the case in simulations.

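
One possible way to make the area measure explicit, given here as an assumption
for illustration and not necessarily the definition used for the figures, is
\begin{equation}
A = \int_0^1 \left[\nu_0 - P_\text{fix}(\nu_0)\right] d\nu_0 \ ,
\end{equation}
the area between the neutral diagonal $P_\text{fix} = \nu_0$ and the measured
fixation probability. With this convention, $A = 0$ for neutral alleles and
$A = 1/2$, the full area under the diagonal, for alleles that never fix.
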
\begin{figure}
\begin{center}
\subfloat{\includegraphics[width=0.9\linewidth]{fixation_loss_shortgenome_distance_ada_frac_del_eff_coi_various.pdf}}\\
\subfloat{\includegraphics[width=0.9\linewidth]{fixation_loss_shortgenome_area_ada_frac_del_eff_coi_0_01_nescepi_6_heat.pdf}}\\
\subfloat{\includegraphics[width=0.9\linewidth]{fixation_loss_shortgenome_area_ada_frac_del_eff_coi_0_01_nescepi_6_nonsyn_heat.pdf}}
\caption{The depression in $P_\text{fix}$ depends on the deleterious effect size
of the synonymous alleles (panel A). Simulations of the escape competition
scenario show that the density of selective sweeps and the size of the
deleterious effects of synonymous mutations are the main driving forces of the
phenomenon. A convex fixation probability is recovered, as seen in the data,
along the diagonal (panel B): denser sweeps can support more deleterious
linked mutations. The density of sweeps is limited, however, by the
nonsynonymous fixation probability, which is quite close to neutrality (panel
C). Moreover, strong competition between escape mutants is required, so that
several escape mutants are ``found'' by HIV within a few months of antibody
production.}
\label{fig:simheat}
\end{center}
\end{figure}

However, if hitch-hiking is driven by non-synonymous mutations that are
unconditionally beneficial, we should find that non-synonymous mutations almost
always fix once they reach high frequencies -- in contrast with \FIG{fixp}, which
shows that non-synonymous mutations fix as if they were neutral. We know,
however, that non-synonymous variation in the variable regions is driven by
positive selection. Inspecting the trajectories of non-synonymous mutations
suggests a rapid rise and fall of many alleles. We test two biologically
plausible mechanisms that could explain the transient rise
of non-synonymous mutations: time-dependent selection and within-epitope
competition. If the immune system starts to recognize the escape mutant before
its fixation, the mutant might cease to be beneficial and disappear despite its
quick initial rise in frequency. In support of this idea,
\citet{richman_rapid_2003, bunnik_autologous_2008} report antibody responses to
escape mutants. These responses are delayed by a few months, roughly matching the
average sweep time of an escape mutant. Alternatively, several different escape
mutations in the same epitope can arise almost simultaneously and start to
spread. Their fitness benefits are not additive, because each of them is
essentially sufficient to escape. As a consequence, several mutations rise to
high frequency, while the escape mutation with the smallest cost is most likely
to eventually fix. In simulations, this kind of epistatic interaction within
epitopes reduces fixation probabilities. The emergence of
multiple sweeping nonsynonymous mutations in real HIV infections has been shown
previously~\citep{moore_limited_2009, bar_early_2012}.

\section{Discussion}
Despite several known functional roles for RNA secondary structure in the HIV
genome, synonymous mutations are often used as approximately neutral markers in
evolutionary studies of viruses. We have shown that the majority of synonymous
mutations in the conserved regions C2-C5 of the \env~gene are deleterious.
Comparison with recent biochemical studies of the pairing propensity of bases in
the RNA genome suggests that these mutations are deleterious in part because
they disrupt stems in RNA secondary structure. Furthermore, we provide evidence
that these mutations are brought to high frequency through linkage to adaptive
mutations. The latter mutations are only transiently adaptive, either through
coevolution with the immune system or redundant escape within an epitope.

Our observations and conclusions rely heavily on longitudinal data in which the
dynamics of mutations can be explicitly observed. The fact that deleterious
mutations can be brought to high frequencies through hitch-hiking underscores
the vigor of the coevolution with the immune system. The fact that
multiple escape mutations in the same epitope -- as is indeed observed in
studies of antibody escape~\citep{moore_limited_2009, bar_early_2012} -- are
necessary to explain the patterns of fixation of non-synonymous mutations points
towards a large population size that rapidly discovers adaptive mutations. A
similar point has been made recently by Boltz {\it et al.} in the context of
preexisting drug resistance mutations~\citep{boltz_ultrasensitive_2012}.

The observed hitch-hiking highlights the importance of linkage due to infrequent
recombination for the evolution of HIV
\citep{neher_recombination_2010,batorsky_estimate_2011,
josefsson_majority_2011}. The recombination rate has been estimated to be on the
order of $\rho = 10^{-5}$ per base and day. It takes roughly $t_{sw} = s^{-1}
\log \nu_0^{-1}$ generations for an adaptive mutation with growth rate $s$ to
rise from an initially low frequency $\nu_0\sim \mu$ to frequency one. This
implies that a region of length $l = (\rho t_{sw})^{-1} = s/ (\rho \log
\nu_0^{-1})$ remains linked to the adaptive mutation. With $s=0.01$, $l\approx
100$ bases, which is consistent with strong linkage between the variable loops
and the stems in between. Furthermore, we do not expect much hitch-hiking to
extend far beyond the variable regions, consistent with the lack of signal
outside of C2-V5. In
case of much stronger selection -- such as observed during early CTL escape or
drug resistance evolution -- the linked region is of course a lot larger.

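
As a worked example of this estimate, take $s = 0.01$ per day and
$\rho = 10^{-5}$ per base and day as above, and assume
$\log(1/\nu_0) \approx 10$, i.e.\ $\nu_0 \approx 5\times 10^{-5}$ (an
illustrative value of the order of a per-site mutation rate, not a measurement
reported here). Measuring time in days, one finds
\begin{align}
t_{sw} &= \frac{\log(1/\nu_0)}{s} \approx \frac{10}{0.01\ \text{day}^{-1}} = 10^{3}\ \text{days} \ , \\
l &= \frac{1}{\rho\, t_{sw}} \approx \frac{1}{10^{-5} \times 10^{3}}\ \text{bases} = 100\ \text{bases} \ .
\end{align}
Since $l$ depends on $\nu_0$ only logarithmically, the result is not very
sensitive to the precise value assumed.
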
The functional significance of the insulating RNA structure stems between the
hypervariable loops has been proposed
previously~\citep{watts_architecture_2009, sanjuan_interplay_2011}.
\citet{sanjuan_interplay_2011} have shown that insulating stems are relevant for
viral fitness {\it in vivo}. Our analysis is limited by the availability of
longitudinal data, which requires a focus on the variable regions of \env.
Conserved RNA structures likely exist (and several are known) in different parts
of the HIV genome. In the absence of repeated adaptive substitutions in the
vicinity that cause hitch-hiking, the deleterious synonymous mutations will
remain at low frequencies and can only be observed by deep sequencing methods.

As far as population genetics models are concerned, our study uncovers the
subtle balance of evolutionary forces governing intrapatient HIV evolution. The
fixation and extinction times and probabilities represent rich yet simple
summary statistics against which to test sequencing data and computer
simulations, as noted independently in ref.~\citep{strelkowa_clonal_2012} in the
context of influenza. Furthermore, our results emphasize the inadequacy of
independent-site models of HIV evolution, especially in the light of transient
effects on sweeping sites, such as time-dependent selection and within-epitope
negative epistasis. Although a final word about which of these mechanisms is
more widespread is yet to be spoken, both intuition and biological evidence from
the literature support a mixed scenario~\citep{richman_rapid_2003,
moore_limited_2009, bar_early_2012}. Note also that, unlike influenza, HIV does
recombine, if rarely; hence clonal interference as studied in
ref.~\citep{strelkowa_clonal_2012} is only a short-term effect. In conclusion,
we regard two consequences of this state of affairs as particularly relevant for
clinical purposes.

\section{Methods}
\comment{to be written\dots}
\section*{Acknowledgements}
\comment{to be written\dots}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\bibliographystyle{natbib}
\bibliography{bib}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%