Penn Treebank Revised: English News Text Treebank LDC2015T13
Ann Bies and Justin Mott and Colin Warner

LDC2015T13_Penn_Treebank_revised.tar.zst 6.86MB
Type: Dataset
Tags: nlp, english, natural language, corpus, news, text, newswire, Treebank, LDC, corpora, Penn Treebank, Penn, 2015, LDC2015T13, parsing, tagging, part of speech, WSJ, PTB

Bibtex:
@article{,
title= {Penn Treebank Revised: English News Text Treebank LDC2015T13},
journal= {},
author= {Ann Bies and Justin Mott and Colin Warner},
year= {2015},
url= {https://doi.org/10.35111/xpjy-at91},
doi= {10.35111/xpjy-at91},
isbn= {1-58563-724-6},
dcmi= {text},
languages= {english},
language= {english},
ldc= {LDC2015T13},
abstract= {# Penn Treebank Revised: English News Text Treebank - 2015

## Metadata

* Item Name:	English News Text Treebank: Penn Treebank Revised
* Author(s):	Ann Bies, Justin Mott, Colin Warner
* LDC Catalog No.:	LDC2015T13
* ISBN:	1-58563-724-6
* DOI:	https://doi.org/10.35111/xpjy-at91
* Release Date:	July 15, 2015
* Member Year(s):	2015
* DCMI Type(s):	Text
* Data Source(s):	newswire
* Application(s):	parsing, tagging, part of speech tagging, natural language processing
* Language(s):	English
* Language ID(s):	eng
* License(s):	LDC User Agreement for Non-Members
* Online Documentation:	LDC2015T13 Documents
* Licensing Instructions:	Subscription & Standard Members, and Non-Members
* Citation:	Bies, Ann, Justin Mott, and Colin Warner. English News Text Treebank: Penn Treebank Revised LDC2015T13. Web Download. Philadelphia: Linguistic Data Consortium, 2015.


## Introduction

English News Text Treebank: Penn Treebank Revised was developed by the Linguistic Data Consortium (LDC) with funding through a gift from Google Inc. It consists of a combination of automated and manual revisions of the [Penn Treebank](https://catalog.ldc.upenn.edu/LDC99T42) annotation of Wall Street Journal (WSJ) stories. The data comprises 1,203,648 word-level tokens in 49,191 sentence-level tokens, covering all 2,312 of the original Penn Treebank WSJ files.

## Data

This release includes revised tokenization, part-of-speech, and syntactic treebank annotation intended to bring the full WSJ treebank section into compliance with the agreed-upon policies and updates implemented for current English treebank annotation specifications at LDC. Examples of treebanks that follow these specifications include English Web Treebank ([LDC2012T13](https://catalog.ldc.upenn.edu/LDC2012T13)), OntoNotes ([LDC2013T19](https://catalog.ldc.upenn.edu/LDC2013T19)), and English translation treebanks such as English Translation Treebank: An-Nahar Newswire ([LDC2012T02](https://catalog.ldc.upenn.edu/LDC2012T02)). English Treebank Supplemental Guidelines are included in this release.

## Samples

Please view these [treebank](https://catalog.ldc.upenn.edu/desc/addenda/LDC2015T13.tree.txt) and [tokenized](https://catalog.ldc.upenn.edu/desc/addenda/LDC2015T13.txt) samples.
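
For readers new to bracketed treebank notation, the following is a minimal Python sketch of how word and part-of-speech pairs might be read from a single tree. It assumes the treebank files use the conventional Penn Treebank parenthesized format, with leaves written as `(TAG word)` and empty elements tagged `-NONE-`; the release documentation should be checked before relying on this. The sample sentence and the names `SAMPLE_TREE`, `LEAF`, and `tagged_tokens` are hypothetical and are not taken from the corpus or from any LDC tool.

```python
import re

# Hypothetical tree in conventional Penn Treebank bracketing (not from the corpus).
SAMPLE_TREE = (
    "( (S (NP-SBJ (DT The) (NN market)) "
    "(VP (VBD rose) (NP-TMP (NN yesterday))) (. .)) )"
)

# A leaf is "(TAG word)": two fields with no whitespace or parentheses inside them.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def tagged_tokens(tree, skip_traces=True):
    """Return (word, tag) pairs for the leaves of one bracketed tree."""
    pairs = [(word, tag) for tag, word in LEAF.findall(tree)]
    if skip_traces:
        # Assumption: -NONE- marks empty elements (traces), which are not surface words.
        pairs = [(word, tag) for word, tag in pairs if tag != "-NONE-"]
    return pairs


print(tagged_tokens(SAMPLE_TREE))
```

A regular expression is enough here because only the leaves are needed; recovering the full constituent structure would call for a proper recursive parser or an existing treebank reader.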

## Updates

None at this time.
},
keywords= {nlp, english, natural language, corpus, news, text, newswire, Treebank, LDC, corpora, PTB, Penn Treebank, Penn, 2015, LDC2015T13, parsing, tagging, part of speech, WSJ},
terms= {},
license= {},
superseded= {}
}
