The Data

In 2004, as part of the resolution of a fraud action brought by New York State, GlaxoSmithKline agreed to post the data from all its studies on the company website. It posted lengthy Clinical Study Reports (CSRs) for all the pediatric antidepressant studies (Paxil). It also posted shorter Summary Reports on Avandia and other drugs.

There are two CSRs for Study 329, one for the Acute Phase (528 pages) and one for the Continuation Phase (264 pages).

Clinical Study Report 329: Acute Phase.

Clinical Study Report 329: Continuation Phase.

In January 2012, Peter Doshi noticed that the CSRs for Study 329 referred to a number of Appendices (A to H), but that these were not present. He wrote to the New York State Attorney General’s Office, which contacted GSK. GSK agreed to post Appendices A to G, but Appendix H was posted without content.

Appendix A: Protocol & Related Material (952 pages).
Appendix B: Patient Data Listings (640 pages).
Appendix C: Efficacy (660 pages).
Appendix D: Adverse Events (224 pages).
Appendix E: Vital Signs (89 pages).
Appendix F: Laboratory Values (856 pages).
Appendix G: CRF Tabulations by Patient (2073 pages).
Appendix H: [Empty Shell]

Data Periscope


The links above are to the documents posted on the GSK site. These are Adobe Acrobat files that are images and not searchable. The 329 Team converted these to searchable PDFs. In addition, we made some combined files to make it easier to search the collection. Links to these files are below.

Clinical Study Report 329: Acute Phase. (OCR)

Clinical Study Report 329: Continuation Phase. (OCR)

Appendix A: Protocol & Related Material (952 pages OCR).
Appendix B: Patient Data Listings (640 pages OCR).
Appendix C: Efficacy (660 pages OCR).
Appendix D: Adverse Events (224 pages OCR).
Appendix E: Vital Signs (89 pages OCR).
Appendix F: Laboratory Values (856 pages OCR).
Appendix G: CRF Tabulations by Patient (2073 pages OCR).
Appendix H: [Empty Shell] (OCR)

All appendices combined. (OCR very large file)
All reports, synopses, appendices combined. (OCR very large file)

The original coding used by GSK was from an obscure, inaccessible coding dictionary. The Study 329 RIAT Team believed that a more modern and widely used coding system was more appropriate, and so they re-coded all the adverse events.

Originally, GSK posted all Appendices as PDF documents. Eventually, with the exception of Appendix H, GSK did provide the data in electronic form to the team. For the harms of treatment, we had already created our own “live” spreadsheets, which contain both GSK’s codes and the RIAT team’s coding side by side. These can be downloaded and analyzed. Having the data available is important for debating the meaning of observations, challenging approaches taken and spotting errors.
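As a rough illustration of the kind of analysis the side-by-side coding allows, the sketch below tallies how often the two codes for the same event differ. It is not the team's own code (the analysis itself was done in R). The column names (`patient_id`, `verbatim_term`, `gsk_code`, `riat_code`) and the sample rows are hypothetical placeholders, not taken from the actual spreadsheet. Anyone working with the downloaded file would need to export it to CSV and substitute the real column headings.

```python
import csv
import io
from collections import Counter

# Placeholder rows only: the layout and values are assumptions for
# illustration, not data from Study 329.
SAMPLE_CSV = """patient_id,verbatim_term,gsk_code,riat_code
P001,placeholder term 1,CODE_A,CODE_A
P002,placeholder term 2,CODE_B,CODE_X
P003,placeholder term 3,CODE_B,CODE_Y
"""


def tally_recoding(rows):
    """Count each (gsk_code, riat_code) pair across adverse-event rows."""
    return Counter(
        (row["gsk_code"].strip(), row["riat_code"].strip()) for row in rows
    )


def disagreements(pairs):
    """Keep only the pairs where the two coding systems differ."""
    return {pair: n for pair, n in pairs.items() if pair[0] != pair[1]}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
pairs = tally_recoding(rows)

# List every original code that was re-coded to something different.
for (gsk, riat), n in sorted(disagreements(pairs).items()):
    print(f"{gsk} -> {riat}: {n}")
```

To run this against a real export, replace `io.StringIO(SAMPLE_CSV)` with an opened file, for example `open("harms.csv", newline="")`, where the file name is again an assumption.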

The most difficult challenge lay in getting access to the data in Appendix H. Appendix H contains the individual patient-level data in the form of Clinical / Case Report Forms (CRFs). There are roughly 77,000 pages, between 200 and 300 for each of 273 patients. The correspondence with GSK between December 2013 and March 2014 reveals the negotiations that took place to get access to this data. Click here to view the correspondence.

The Adverse Harms Data Spreadsheet re-created by the 329 RIAT Team contains material from Appendix H that gave rise to adverse events not listed in the original Appendix D. To make the data usable, the Study 329 RIAT Team had to create Excel spreadsheets and re-enter the data.

GSK also granted access “for audit purposes” to Appendix H, the CRFs. Even though all patient names and details were redacted, this access was not in the form of a PDF. It was through a “periscope” – a remote access portal that reached into GSK and allowed Joanna Le Noury to scrutinize the 77,000 pages individually without being able to print or download them. She had to make manual notations for each document.

We used the R environment for statistical computing and graphics for this study. Under our data access agreement with GSK, we cannot post the data or provide direct access to it. However, the full efficacy data is available in PDF format (see Appendices B, C, and D above).

Click here for our Harms Data Spreadsheet.

Click here for our Patient Demographics Withdrawal Reasons Spreadsheet.

Click here to view the basic form used for the files imported into R for the analysis, followed by the code for the calculations.

Click here for the Data Sharing Agreement between GSK and RIAT.