Chapter 5 Viruses, HIV and Drug Resistance
5.1 What are viruses ?
Viruses occupy a strange place in the tree of life, with many debating if they are actually alive or not. André Lwoff gave what is probably the most fitting definition: “viruses are viruses”424. Despite this ambiguity, viruses share some common characteristics which allow us to define them as intracellular parasites425:
- Viruses have some type of genetic information, contained in DNA or RNA.
- This genetic information is protected by some form of envelope.
- They use the cellular machinery of host cells to make copies of themselves.
While we all know that viruses can be pathogenic and dangerous (the recent example of SARS-CoV2 springs to mind), that is not necessarily the case. Some viruses like GBV-C426 or certain strains of H5N1 Influenza427 are non pathogenic and essentially harmless.
Viruses have been discovered for all three domains of life: Eukaryota, Bacteria and Archea. In Eukaryota many viruses have been discovered for animals (both vertebrate428 and invertebrate429), plants430, protozoa431, chromista432 and even fungi433. Bacterial viruses known as phages have been known to exists since the beginning of the 20th century434,435. These bacteriophages are being considered as a therapeutic alternative to antibiotics436,437 which could help with multi-drug-resistant bacterial pathogens. Archea are also known to have their own viral infections438,439.
Strangely even viruses of viruses seem to exist, such as the plant satellite virus440,441 or hepatitis delta virus442,443. These “viroids” do not infect viral hosts per se but they cannot replicate on their own. Replication must happen during co-infection with a larger virus. More recently, true viruses of viruses called virophages have been discovered. These virophages like sputnik444 or zamilon445 specifically infect giant viruses.
There is a huge diversity of viruses affecting all types of life, and new viruses are being discovered all the time446. This diversity hints at a rich and long evolutionary history. When and where viruses originated is still under study447,448 and we might never know how they emerged. It is, however, believed that they may have played an important role in the emergence of eukaryotic cells449. This co-evolution between virus and host cell shows a strong link between the two organisms and some parts of the human genome are likely of ancient viral origin450,451. It has been estimated that 1% to 8% of the human genome are endogenous retroviral sequences452,453.
The rich diversity of viruses is reflected in the variety of genetic information support, replication strategy, physical and genomic size, as well as shape. The differences in genetic information support and replication strategy form the basis of the Baltimore virus classification system454 , still used today455 to classify virus lineages.
As stated above, all viruses have some genetic information. This information is stored either as DNA or as RNA, which is the molecule of choice for 70% of human pathogenic viruses456 (HIV and SARS-CoV 2 are RNA viruses).
For DNA viruses, the molecule can be double-stranded as for Herpesvirus457,458, single-stranded like in the case of Papillomavirus459 or even circular in the case of the Hepatitis B virus460. This molecular diversity is also present in RNA viruses where the RNA molecule can be double-stranded like for Rotavirus461, or single-stranded. Furthermore, for single-stranded RNA viruses the strand can either be positive (i.e. can be directly translated into a protein) like the Hepatitis C virus462 or Poliovirus463,464; conversely there are negative-strand RNA viruses, for which the complementary strand of RNA must be synthesized before translation into a protein, such as the Influenza or Measles viruses465.
This diversity in genetic information support implies a necessary diversity in replication strategy. The main replication strategies are as follows466:
The RNA molecule is directly copied as RNA. This is the strategy followed by single-stranded RNA coronaviruses467, Dengue viruses468 or Hepatitis C virus469.
The DNA molecule is directly replicated as DNA. this can happen for both single-stranded DNA viruses like Papillomavirus470and double-stranded DNA viruses like Herpes simplex virus471.
The DNA molecule is replicated by going through an RNA intermediary like Hepatitis B virus472.
The RNA molecule is replicated by going through a DNA intermediary. This strategy is used by retroviruses that integrate this viral DNA intermediary into the host DNA, like HIV-1 (see Section 5.2.2).
Finally, the genetic diversity of viruses is reflected in their physical characteristics: viruses come in all shapes and sizes. Physical size range from 17nm for plant satellite viruses473 to the giant, 400nm Mimivirus474. Genomic size is also quite variable. There is a stark contrast between the 860 bp Circovirus SFBeef and the 2.5 Mbp Pandoravirus salinus genomes475. Viruses also come in a variety of shapes476: icosahedral for HIV, helical for the tobacco mosaic virus or a distinctive head-tail shape for bacteriophages.
Although there are a large number of viruses, and many of them are of great importance for human health, we will now focus on one virus of particular importance: Human Immunodeficiency Virus otherwise known as HIV.
5.2 Getting to know HIV
5.2.1 Quick presentation of HIV
HIV is a single-stranded RNA retrovirus that is responsible for the Acquired Immune Deficiency Syndrome (AIDS) pandemic that has been around for the last couple decades. This virus is transmitted through sexual contact or through blood. Sexual activity is the largest transmission factor followed by intravenous drug use477,478.
HIV infects cells of the host immune system, specifically CD4 T-cell lymphocytes and destroys them due to its replication process479. CD4 T-cells are an essential part of the immune system response, helping fight against infection in humans. An HIV infection typically starts with an asymptomatic phase that can last years, followed by a growth in viral replication leading to a decrease in CD4 cells which progresses into AIDS480. During AIDS, when the CD4 cell count is low enough, opportunistic diseases such as pneumonia or tuberculosis481 can easily infect the host, leading to death when the immune system is weak enough.
HIV/AIDS is one of the deadliest pandemics in history, estimated to have lead to the death of 36 million people482. In 2010 approximately 33 million people were infected with HIV483, 2.6 million of which were due to new infections, and 1.8 million died of AIDS. Most of the new infections happened in economically developing regions of the world, 70% of them coming from sub-Saharan Africa483. As of 2020, these numbers have decreased with “only” 1.5 million new infections and 680,000 AIDS deaths, which is encouraging from a public health perspective.
The HIV-1 virus was discovered simultaneously in 1983 by Françoise Barré-Sinoussi, Luc Montagnier484 and Robert Gallo485. There exists a second HIV-2 virus discovered shortly after HIV-1486, it is however less transmissible than HIV-1 which is largely responsible for the global HIV/AIDS pandemic487. In Africa in 2006, HIV-1 infections were rising where HIV-2 were declining488.
While both viruses are of zoonotic origin, from transmissions of Simian Immunodeficiency Virus (SIV) from primates to humans, HIV-1 most likely originates from an SIV present in chimpanzees489–491, and HIV-2 from an SIV present in Sooty mangabeys492–494.
Several independent such transmissions have resulted in 4 lineages of HIV-1 labeled groups M, N, O and P495 (similarly HIV-2 is split into groups A to H also resulting from independent zoonotic transmissions). Groups N and P have been identified in only a handful of individuals in Cameroon, and group O is estimated to a few thousand cases in western Africa. The majority of the pandemic is due to viruses from group M.
The most recent common ancestor, i.e. the putative virus that founded group M, is estimated to have originated in what is now the Democratic Republic of Congo496–498 at some point between 1910 and 1931496,499,500.
Group M is further subdivided into 9 subtypes each with distinct genetic characteristics, labeled A to K491,501. Like in many viruses502, when 2 genetically different strains of HIV co-infect a single host there is a risk of genetic recombination leading to a new strain503. During recombination, a new genome is formed from parts of the original genomes. This can lead to new strains that can spread and form lineages of their own. HIV strains resulting from recombination are called Circulating Recombinant Forms (CRFs). There are currently 118 identified HIV-1 CRFs in the Los Alamos National Laboratory HIV sequence database504 (1 for HIV-2). Many unique recombinant forms (URFs) also exist. URFs and CRFs are both the result of intra-host genetic recombination a URF becomes a CRF once it has been identified in at least three epidemiologically independent infected individuals505. Recombination can be particularly bothersome, complicating evolutionary analyses506, facilitating the emergence of drug resistance and hindering vaccine development507.
While subtype C represented almost half of global infections from 2004 to 2007, subtype B is the majority subtype in richer countries of North America and Western Europe508 where sequencing efforts are more common. This accounts for an over-representation of subtype B sequences in public databases such as the Los Alamos sequence database where 54% of sequences are of the B subtype and only 15% are C509.
5.2.2 The replication cycle of HIV
The virus’s replication cycle and its immune-cell host specificity are what makes it particularly dangerous. This replication cycle can broadly be categorized into 9 separate steps510,511 shown in Figure 5.1.
An HIV virion binds itself to the CD4 host cell through membrane proteins.
The virion envelope and host cell membrane fuse together, allowing the viral genetic material and proteins to enter the host cell.
The viral RNA is reverse-transcribed into viral DNA.
The viral DNA is integrated into the host cell genome.
The integrated viral DNA is transcribed by the host cell machinery into multiple copies of viral RNA.
The viral RNA is translated into immature viral polyproteins.
The viral polyproteins are cleaved to form individual viral proteins.
The newly synthesized viral RNA and viral proteins gather around the host-cell membrane which starts budding to create a new virion.
Once the budding is complete, the virion is released from the host cell and matures before being able to infect other CD4 cells and replicate again.
The successive infection of CD4 cells by HIV virions leads to cellular death due to inflammatory response and/or activation of apoptosis512,513. The gradual depletion of CD4 cells in the infected individual’s body lead to the suppression of the immune system, and eventually to AIDS.
5.2.3 Genetics of HIV
The replication cycle described in Section 5.2.2 is made possible by the 15 proteins of HIV. These proteins are coded by 9 separate genes514. An overview of the HIV proteins, their structure and localization within the viral particle can be seen in Figure 5.2.
The HIV genome is made up of three main genes each coding for polyproteins and six genes coding for proteins with regulatory or accessory roles. The three polyproteins correspond to long chains of amino acids which are subsequently cleaved at specific positions to produce separate viral proteins.
The gag (“group-specific-antigen”) gene codes for the Gag polyprotein which, once cleaved, results in four proteins with mainly structural roles:
The Matrix protein (MA or p17) lines the internal surface of the virion membrane, maintaining the shape and structural integrity of the virion.
The Capsid protein (CA or p24) forms an inner core (the capsid) inside the virion around the viral RNA. It helps protect the viral genetic information.
The Nucleocapsid protein (NC or p7) binds with the viral RNA inside the capsid, stabilizing the molecule and further protecting the genetic information.
The p6 protein is a small, largely unstructured protein515 that is suspected of playing a role in virion budding and release from the host cell at the end of the replication cycle516,517.
The pol (“polymerase”) gene codes for the Pol polyprotein. After cleaving, this results in three essential viral enzymes at the heart of the replication cycle:
The Protease (PR) is responsible for cleaving the Gag, Pol and Env polyproteins to get the individual viral proteins. Without it, the individual viral proteins cannot come into being and therefore cannot function, stopping viral replication.
The Reverse Transcriptase (RT or p51/p66) is responsible for synthesizing viral DNA from the viral RNA template contained in the virion. This is the first step in hijacking the cellular machinery for replication. Without viral DNA, HIV replication is impossible.
The Integrase (IN) is responsible for integrating the viral DNA produced by RT in to the host cell DNA. Once the viral DNA is inside the host genome it can be transcribed and then translated (as described in Section 1.1) to produce new copies of the viral RNA and proteins. Without this integration step the viral genetic information cannot be expressed and the replication cycle is stopped.
These three proteins are of particular importance and we will go into more detail about them in Section 5.3.2.
The env (“envelope”) gene codes for Env, the third and last polyprotein. The two resulting proteins coat the membrane of the virion and are responsible for binding with the CD4 host cells.
The Surface protein (SU or gp120) binds to receptors on the surface of CD4 cells and allows the virion to attach itself to the host cell518. It also enables membrane fusion, the essential first step in the viral replication cycle519.
The Transmembrane protein (TM or gp41) anchors SU into the virion membrane.
The 6 remaining genes all code for single proteins. Two of these have essential regulatory roles and the remaining four accessory roles.
The tat (“trans-activator of transcription”) gene codes for Tat, the first essential regulatory protein. Tat activates and promotes transcription leading to more numerous and longer copies of the viral RNA520. The rev (for “regulator of virion”) gene codes for Rev, the second essential regulatory protein. Rev helps transcribed viral RNA exit the nucleus of the host cell in order to be translated to viral proteins or be packaged in new, budding virions521.
The remaining four accessory genes are as follows: nef (“negative regulatory factor”) code for the Nef protein which prevents the production of the CD4 cellular defense proteins increasing infectivity522; vif (“viral infectivity factor”) codes for the Vif protein which also increases viral infectivity523; vpu (“viral protein U”) codes for Vpu which likely helps during release of new virions523,524 as well as preventing production of CD4 in the host cell. It is not believed to be present in the mature virion as it binds to host cellular membranes525; vpr (“viral protein R”) likely helps viral DNA enter the host cell nucleus and prevents the natural host cell reproduction cycle526.
The existence of a 10th HIV-1 gene was suggested in 1988527, overlapping the env gene and coding for proteins on the other strand of viral DNA than the other genes. This putative gene was named asp (“antisense protein”) and Asp transcripts were isolated during an HIV-1 infection in 2002528. The function of this protein is still unknown but it has been shown to have a strong evolutionary correlation with HIV-1 group M responsible for the pandemic529. This Asp protein is still a source of debate and is under active research530.
5.3 Drug resistance in HIV
Although the HIV/AIDS pandemic has been very deadly around the world, we are not completely defenseless against it. The first antiretroviral therapy (ART) drugs were made available in the late eighties, only a couple years after discovering the virus. ART reduce the viral load in an HIV positive patient reducing its transmissibility532. While ART is not a cure for an HIV infection it has been shown to drastically reduce mortality and morbidity533. ART is estimated to have saved the lives of 9.5 million individuals between 1995 and 2015534.
5.3.1 A quick history of ART
The first available anti-HIV drug was Zidovudine (ZDV, also known as azidothymidine or AZT) approved by the FDA for usage in the USA in 1987535, a few years only after the discovery of the virus. This drug was a reverse transcriptase inhibitor (RTI) therefore preventing the viral RNA from being transcribed into viral DNA. Unfortunately, 3 years later, strains of HIV resistant to ZDV were circulating536. This rapid emergence of resistance to treatment is common for HIV537 due to its very high evolution rate538 allowing it to explore many possible mutations in response to selective pressures, as well as the frequent occurrence of genetic recombination539. To counter this resistance new drugs were rapidly developed and, between 1988 and 1995, four more RTIs were approved by the FDA. Using a combination of these drugs was also shown to be effective and led to a slower rise of resistance540.
Then, focus was shifted to the development of a new type of drug: Protease Inhibitors (PI). Between 1995 and 1997, 4 of them were approved. These, taken in combination with RTI made it harder for the virus to develop resistance541. A new class of RTIs was also explored, Non-Nucleoside RTIs (NNRTIs) that block the RT action in another manner than the previously approved Nucleoside RTIs (NRTIs). When taken in combination with other drugs they are also highly effective542. As the years advanced even more drug targets were explored, with 5 Integrase inhibitors (INSTI) being approved since 2007543, A Fusion Inhibitor (FI) in 2003544, and 3 other Entry inhibitors (EI)545,546 since 2007 all targeting different steps in the replication cycle of HIV (see Table B.1 and Figure 5.3).
In response to the rapid emergence of resistance in HIV when treated with a single drug, clinicians started systematically treating HIV with a combination of multiple drugs targeting different proteins, as early as 1996. This is now referred to as highly active antiretroviral combination therapy (HAART, also known as tritherapy). HAART usually consists of 2 NRTIs coupled with another drug: NNRTI or PI at first and later FI or INSTI547. As of 2008, 22 anti-HIV single drugs were approved by the FDA548, and 27 as of today. This large array of available drugs made HAART possible and gave options to clinicians to switch targets when the multi-resistant HIV emerged. It is important to note here that, while high-income countries had access to this large panel of antiviral drugs, in most lower-income countries that was not the case. This meant that drug switching and second-line5 drug regimens were rarely possible in these countries, leading to multi-resistant viruses549.
With the advent of HAART, patients had access to more potent treatments. However, the complexity of treatment regimens grew. They often involved several pills a day, taken at precise intervals. Complex drug regimens have been associated with poorer treatment adherence550,551. This can lead to poor treatment outcome, as well as the emergence of multi-resistant HIV strains552 and their spread within the population. To avoid this issue, increasingly more single pill regimens are being developed with a staggering 7 new drugs approved by the FDA in 2018. These single pill regimens greatly reduce the burden of adherence for patients, leading to better therapeutic outcomes, and reduced healthcare costs553.
Most recently, some studies explored using some of these single pill regimens (such as Truvada, c.f. Table B.1) as prophylactics, called Pre-exposure prophylaxis (PrEP). Putting uninfected but at risk populations on ART, before any known exposure, has been shown to effectively lower the risk of infection554–556. When adherence is maintained, this risk reduction has been estimated to be between 44% and 100%557. As of 2022, Truvada is the only authorized drug for PrEP in Europe558. Descovy and Apretude are also authorized for PrEP in the USA559.
All of these drugs are widely used and are by now very well studied, therefore detailed guidelines on all the aspects of ART; when to start, which drugs to use, when to change drugs; are issued and updated regularly by practitioners560 and global instances561 alike.
5.3.2 Main mechanisms of viral proteins, antiretroviral drugs and associated resistance.
Each ART drug targets a specific protein. Most of them target one of the three pol proteins: RT, PR and IN. The structure of these proteins is inherently linked to their function, and as such is essential to take into account when developing ART. Similarly, the structure of these proteins is very important when studying the resistance mechanisms developed by the virus562,563. In this section we will go over the main structural elements and how they relate to treatment and resistance, for RT, IN and PR.
5.3.2.1 Reverse transcriptase
The reverse transcriptase protein is the most targeted protein, in number of ART drugs (c.f. Figure 5.3 and Table B.1). The mature protein is formed of two subunits: p51 and p66. These two subunits are translated from the same section of the pol gene, and have the same amino acid sequence, but p51 is cleaved and is shorter than p66. The p66 subunit contains the active sites of RT whereas p51 plays a mainly structural role.
The p66 sububit can be separated into 5 domains564. The “fingers”, “palm”, and “thumb” domains are linked together and folded to form a canal through which the RNA template and newly synthesized viral DNA can pass through. The polymerase active site, responsible for incorporating nucleotides to the viral DNA molecule, is situated in the “palm” domain at the bottom of the canal. The “RNase” domain of RT contains a secondary active site responsible for cleaving the viral RNA template from the viral DNA so that the RT can fill out the complementary strand of viral DNA before integration into the host genome. The final “connection” domain is simply a link between the “RNase” and the “thumb” domains. A three dimensional view of RT with these domains highlighted can be seen in Figure 5.4.
Reverse Transcriptase inhibitors can be separated into two classes: Nucleoside RTIs (NRTIs) and Non-Nucleoside RTIs (NNRTIs). They inhibit the action of RT in two disctinct manners:
NRTIs are analogues of free nucleotides in the host cell. They competitively inhibit RT and can be used to elongate the viral DNA chain. Once an NRTI is incorporated, further elongation of the DNA molecule is impossible and the viral DNA cannot be synthesized anymore. This is similar to the chain terminating nucleotides introduced in Section 1.2.
NNRTIs bind to a specific region of the p51 subunit: the Non Nucleoside Inhibitor Binding Pocket (NNIBP) (A view of RT with the NNIBP visible is shown in Figure 6.4). This pocket, although it is on the p51 subunit is spatially situated very close to the polymerase active site. NNRTIs bind to the NNIBP to change the conformation of the active site, lowering its flexibility566, and thus non-competitively inhibiting the action of RT.
Research has been conducted into inhibition of the RNase active site of RT567,568 which could also inhibit the action of RT. There is, however, to this day, no approved treatment that inhibits the RNase action of RT.
Drug resistance mutations (DRMs) that arise in HIV from the selective pressures resulting from RTI exposure can similarly be grouped into two categories: NRTI and NNRTI resistance mutations.
NRTI resistance mutations can further be subcategorized into two groups569,570. The first type of NRTI resistance mutations are mutations that prevent the incorporation of NRTIs into the viral DNA molecule. M184V and M184I, indicating the replacement, at site number 184, of a Methionine by a Valine or an Isoleucine respectively, are very common NRTI resistance mutations. These V and I amino acids have a different structure than the original M, interfering with the incorporation of lamiduvine (3TC) but not dNTP571. The second type of mutation, allows RT to remove an incorporated NRTI from the viral DNA to resume synthesis. Thymidine Analog Mutations (TAMs), M41L, D67N, K70R, L210W, T215Y/F and K219Q/E confer resistance to azidothymidine (AZT) through this mechanism572,573.
Similarly, NNRTI resistance mutations work via several different mechanisms574,575. Some NNRTI resistance mutations, like Y181C, lower the affinity of the NNIBP to NNRTIs preventing binding of drugs to RT. Others, like K103N change the conformation of the p51 subunit, making the NNIBP disappear. NNRTI resistance mutations are particularly dangerous because they often confer cross-resistance to multiple NNRTIs without affecting the polymerase action very much562, giving rise to viruses that are both fit and highly resistant. This is contrast to NRTI resistance mutations that generally incur a fitness cost for the virus, lowering its efficacy576.
5.3.2.2 Protease
The Protease protein, also a major drug target for ART, cleaves the gag and pol polyproteins in order to produce functional viral proteins, essential to replication. It has a symmetric, dimeric, structure. That is to say: it is composed of two identical chains of amino acids577,578. A structural view of PR is shown in Figure 5.5.
These two chains are folded in order to create a “tunnel” through which the polyproteins enter. In the middle of this “tunnel”, at the bottom, is the active site. The active site is composed of two Aspartate residues, one on each chain. Using water, they can provoke a chemical reaction that cleaves the polyprotein at a specific position579.
The roof of the “tunnel” is formed by the flaps, a flexible region from each of the two chains that can open or close the “tunnel”580. These flaps most likely control the access of polyproteins to the active site581,582.
All the approved Protease Inhibitors (PIs) share a similar mode of action. Each PI binds to the active site of the PR, denying access to the “tunnel” for polyproteins, and stopping the catalytic action of PR583,584. Tipranavir, one of the more recent PIs, also binds with the flaps584.
According to Prabu-Jeyabalan et al., PR does not recognize the specific sequence of the polyprotein cleavage site but rather its shape585. They proposed an inhibitor based on the shape of all polyproteins combined, which establishes more bonds with PR, making it supposedly more efficient586 than current approved PIs.
As is the case with RTIs, when under selective pressure due to PIs, the virus tends to develop PI associated DRMs. Most PI resistance mutations result in an enlarged “tunnel”. This tends to lower the affinity of the PIs to the active site, but also the affinity of polyproteins, lowering the fitness of the virus significantly541. In addition, some mutations on the gag polyprotein seem to lower the efficacy of PIs, although the underlying mechanism is not well known541.
Some mutations in the flaps of PR have also been shown to confer PI resistance. It seems likely that these mutations change conformation of the flaps, opening them and leading to the release of inhibitors from the active site587.
5.3.2.3 Integrase
The integrase protein is the third major anti-retroviral drug target. It is responsible for integrating the viral DNA into the host genome. IN is a tetramer composed of four identical amino acid chains588,589. Each of these chains contain three domains linked together by flexible linker sequences: the N-terminal domain, the catalytic core and the C-terminal domain. In each tetramer, two chains provide the active site for the integration reaction while the other two have a mostly structural role. It is probable that the N-terminal domain, which is very conserved, is necessary for stable tetramerization of IN monomers590. This tetrameric structure is shown in Figure 5.6.
Several steps are needed in order to integrate the viral DNA with the host genome591. First, IN binds to the both ends of the viral DNA, using the C-terminal domains, forming a closed loop. Secondly, both ends of the viral DNA molecule are then prepared for integration by the catalytic core. Third, the host DNA is captured with C-terminal domains. Then, the strand-transfer is done within the catalytic core: the host DNA is cut in two places and a single strand from each end of the viral DNA are attached to these two breakpoints. Finally, the IN tetramer detaches from the linked molecules and the final steps necessary to create a single hybrid DNA molecule are done by the host cellular machinery. A graphical representation of this process can be found in Figure 1 of Maertens et al. (2022)591.
Integrase Strand Transfer Inhibitors (INSTIs), as their name indicates, block the strand transfer reaction. They achieve this by strongly binding to the active site of the IN tetramer after it has formed a complex with the viral DNA591,592. In doing so, INSTIs prevent the IN / viral DNA complex from binding to the host DNA, effectively preventing strand transfer.
In the presence of INSTIs during therapy, once more, the HIV virus develops resistance mutations over time. These mutations all lower affinity of IN to INSTIs, preventing bonding591,593. Since most INSTIs behave similarly, this means that cross-resistance to INSTIs is quite common for INSTI DRMs593,594. Again, these mutations tend to lower the overall viral fitness necessitating secondary compensatory mutations to restore fitness593,594.
5.3.2.4 Other drug targets
For now, resistance has not been observed for novel drugs like entry inhibitors. This might be because the genetic barrier to resistance is higher and not enough time has passed since their introduction for resistance to emerge.
For all the other drug targets however, as stated earlier in this section, resistance is documented and problematic. Resistance has even been detected for PrEP which is prophylactic595,596. This seems to be rare however, and mostly due to unknown pre-treatment HIV infections597.
5.3.3 Consequences of resistance on global health
HIV resistance to ART drugs is problematic from a global health perspective. Indeed, circulation of resistant strains of HIV within populations can lead to treatment-naive individuals that will not respond well to treatment.
More concerning, is the fact that transmission of resistant strains of HIV between treatment-naive individuals is the main mode of resistance transmission in the UK598,599 and Switzerland600. This treatment-naive to treatment-naive transmission is particularly insidious since it can go undetected and creates long lasting drug resistant strain reservoirs in the treatment-naive population. This of course is dangerous since some infected individuals might experience poor therapeutic outcomes and even treatment failure when administered first line regimens601. To avoid this, genotypic resistance testing has become standard practice when choosing the therapeutic strategy in high-income countries, but more effort must be done to make resistance testing less expensive and more cost-efficient in lower and middle income countries602.
Although the transmitted drug resistance described above is problematic, a large portion of DRMs incur a fitness cost for the resistant strain603,604. This means that, although they are selected when exposed to the evolutionary pressure of ART, when the treatment is interrupted there is another pressure leading these costly mutations to disappear. This reversion is commonly observed after interruption of treatment, however the median reversion times vary widely from 1 to 13 years605 depending on the severity of the fitness loss and type of mutation. This means that, although reversion can possibly lead to loss of resistance, this can potentially take a long time and possibly longer than the treatment interruption.
In practice, it is therefore very important to keep an eye on all drug resistance mutations, their population dynamics, and spread as well as their presence or absence in a particular strain before starting treatment.
5.3.4 Finding DRMs 6
Finding and categorizing mutations as DRMs is an important task in light of the public health implications mentioned in Section 5.3.3. As such, this is an active part of the HIV research field.
The most important thing needed in order to study DRMs is, of course, viral sequences. To facilitate the search for DRMs, several sequence databases exist. Sequences are often linked to metadata related to the treatment status of the patient from which the sequence was obtained. This metadata can be quite variable: from a coarse level binary indicator of treatment to a finely detailed list of all treatments received and associated phenotypic measurements like viral load.
Databases like the UK-CHIC607, UK HIV drug resistance database (https://www.hivrdb.org.uk/) and Swiss cohort study (https://www.shcs.ch/) host sequences on a national level, although access can be granted to international researchers. Other databases like the PANGEA database608 host sequences from multiple countries in sub-Saharan Africa. The Stanford HIV drug resistance database (https://hivdb.stanford.edu/) hosts HIV sequences with some phenotypic data33,609. Finally some database only host sequences, such as the Los Alamos HIV sequence database (http://www.hiv.lanl.gov/). However, with few specific treatment or resistance related metadata610, these have less direct applicability to the DRM search task.
Some databases, like the Stanford HIV resistance database also store specific knowledge about known resistance mutations, keeping and regularly updating lists of clinically important DRMs as well as their impact on ART611,612. Additionally, Stanford also offers tools for clinicians to do genotypic resistance testing with interpretable results613.
The first step of mutations discovery is usually some kind of statistical association analysis611,614 where the association between treatment status (coarse of fine grained) and specific mutations is statistically tested. This is usually done with Fisher association tests615,616 or correlation testing with the Spearman correlation617. This results in a list of mutations that are significantly associated with a given treatment and corresponding p-values.
Since, on a given sequence dataset, several mutations are usually tested at once, this can lead to inflated false positives618 and spurious associations619. Fortunately, this is a well studied problem and many methods exist to control this effect by controlling the Familywise Error Rate (FWER) e.g. with the Bonferroni procedure620, or the False Discovery Rate (FDR) e.g. with the Benjamini-Hochberg procedure621. These methods are often applied when testing for resistance association615,622,623. However, these correction methods are a double-edged sword, some of them can be very conservative and lead to falsely rejecting true associations624. In some studies on resistance, phylogenetic correlation between the sequences is also accounted for625,626.
Statistical testing on treatment status, while informative, can only associate a mutation with a treatment. In order to actually validate whether a mutation causes resistance or not, biological analyses are needed611,614. The easiest of these are in vitro analyses where live viruses are subjected to a phenotypical assay. These assays measure the susceptibility of HIV viruses to a wide array of drugs, which can then be statistically associated with genetic traits like specific mutations. These assays like phenosense627 or antivirogram628 are widely used629–631. Viruses can be obtained from clinical isolates632, or viruses with specific mutations can be manufactured with site directed mutagenesis633,634. In vivo studies can be conducted by sequencing viruses from patients failing ART, following over time and studying the association between their treatment response and HIV genetics635,636.
More recently, as sequence database grow bigger and bigger (The UK-CHIC database contains more than 80,000 HIV sequences with treatment status), methods based on statistical and machine learning are being used to study resistance. Most approaches rely on training models to predict some type of resistance: either classifying sequences as resistant or not331,637 of predicting a phenotypic response like fold resistance compared to wild type638. Initial approaches were mainly designed for clinical testing, rather than new DRM search, and distributed via web services639,640.
Initially these approaches were based on models like decision trees641, SVMs639 or logistic regression642. Over time the use of more complex models such as neural networks has increased, with increased prediction accuracy638.
By analyzing the important features used by trained models to predict resistance, it is possible to find features corresponding to mutations, that are useful for predicting, and therefore likely associated with, drug resistance (see Chapter 6). With the improvement in methods to interpret and extract features from complex models such as deep neural networks, this approach has been used with deep learning models331. This novel way of finding resistance associated mutations has the potential to uncover complex mutational effects that simple association testing cannot.
5.4 Conclusion
Viruses are surprisingly complex in light of their apparent simplicity. They are ubiquitous and present an extreme diversity. Whether they are pathogenic or not, the role of viruses in a myriad of processes and niches make them interesting and important to study. The sequences of these viruses, although small can be very useful for evolutionary as well as clinical analyses.
Although the study of viruses as a whole is very useful, HIV is particularly important to study. The impact of the HIV pandemic on global health has been severe, both in Lower and Higher income countries. It is therefore paramount to fully understand the underlying mechanisms and evolutionary adaptations of this virus. Its high mutation rate allows it to quickly explore evolutionary alternatives when exposed to drugs, making anti HIV therapy a complex endeavor.
Fortunately, with large scale sequencing efforts it is possible to study and track these evolutionary adaptations to treatments. This allows us to adapt therapeutic strategies as well as develop new compounds and approaches. In this context, studying and finding the virus’s mutational processes is paramount. This is especially important when studying resistance to RTIs as they form the backbone of first line regimen combination therapies, and are the most common type of anti-HIV drug. This process is made easier by the large scale sequence repositories now available, and the usage of machine and statistical learning to leverage that data.