Penn Treebank Revised: English News Text Treebank LDC2015T13
Ann Bies and Justin Mott and Colin Warner

LDC2015T13_Penn_Treebank_revised.tar.zst 6.86MB
Type: Dataset
Tags: nlp, english, natural language, corpus, news, text, newswire, Treebank, LDC, corpora, Penn Treebank, Penn, 2015, LDC2015T13, parsing, tagging, part of speech, WSJ, PTB

Bibtex:
@article{,
title= {Penn Treebank Revised: English News Text Treebank LDC2015T13},
journal= {},
author= {Ann Bies and Justin Mott and Colin Warner},
year= {2015},
url= {https://doi.org/10.35111/xpjy-at91},
doi= {10.35111/xpjy-at91},
isbn= {1-58563-724-6},
dcmi= {text},
languages= {english},
language= {english},
ldc= {LDC2015T13},
abstract= {# Penn Treebank Revised: English News Text Treebank - 2015

## Metadata

* Item Name:	English News Text Treebank: Penn Treebank Revised
* Author(s):	Ann Bies, Justin Mott, Colin Warner
* LDC Catalog No.:	LDC2015T13
* ISBN:	1-58563-724-6
* DOI:	https://doi.org/10.35111/xpjy-at91
* Release Date:	July 15, 2015
* Member Year(s):	2015
* DCMI Type(s):	Text
* Data Source(s):	newswire
* Application(s):	parsing, tagging, part of speech tagging, natural language processing
* Language(s):	English
* Language ID(s):	eng
* License(s):	LDC User Agreement for Non-Members
* Online Documentation:	LDC2015T13 Documents
* Licensing Instructions:	Subscription & Standard Members, and Non-Members
* Citation:	Bies, Ann, Justin Mott, and Colin Warner. English News Text Treebank: Penn Treebank Revised LDC2015T13. Web Download. Philadelphia: Linguistic Data Consortium, 2015.
* Related Works:	View


## Introduction

English News Text Treebank: Penn Treebank Revised was developed by the Linguistic Data Consortium (LDC) with funding through a gift from Google Inc. It combines automated and manual revisions of the [Penn Treebank](https://catalog.ldc.upenn.edu/LDC99T42) annotation of Wall Street Journal (WSJ) stories. The data comprises 1,203,648 word-level tokens in 49,191 sentence-level tokens, covering all 2,312 of the original Penn Treebank WSJ files.

## Data

This release includes revised tokenization, part-of-speech, and syntactic treebank annotation intended to bring the full WSJ treebank section into compliance with the agreed-upon policies and updates implemented for current English treebank annotation specifications at LDC. Examples of treebanks annotated under those specifications include English Web Treebank ([LDC2012T13](https://catalog.ldc.upenn.edu/LDC2012T13)), OntoNotes ([LDC2013T19](https://catalog.ldc.upenn.edu/LDC2013T19)), and English translation treebanks such as English Translation Treebank: An-Nahar Newswire ([LDC2012T02](https://catalog.ldc.upenn.edu/LDC2012T02)). English Treebank Supplemental Guidelines are included in this release.

## Samples

Please view these [treebank](https://catalog.ldc.upenn.edu/desc/addenda/LDC2015T13.tree.txt) and [tokenized](https://catalog.ldc.upenn.edu/desc/addenda/LDC2015T13.txt) samples.
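
For readers who want to inspect a treebank sample programmatically, the following is a minimal sketch in Python. It assumes the trees use the conventional Penn Treebank bracketed notation, with part-of-speech tags as the innermost labels and empty elements tagged `-NONE-`. The helper names `read_tree` and `tagged_words` are hypothetical, and the sample sentence is invented for illustration rather than taken from the corpus.

```python
import re


def read_tree(text):
    """Parse one bracketed tree into nested lists of the form [label, child, ...]."""
    # Tokens are opening brackets, closing brackets, or runs of other characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        node = []
        # The outermost wrapper bracket has no label; record it as an empty string.
        if tokens[pos] not in ("(", ")"):
            node.append(tokens[pos])
            pos += 1
        else:
            node.append("")
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse())
            else:
                node.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return node

    return parse()


def tagged_words(node):
    """Yield (word, tag) pairs from preterminal nodes, skipping -NONE- empty elements."""
    label, children = node[0], node[1:]
    if len(children) == 1 and isinstance(children[0], str):
        if label != "-NONE-":
            yield children[0], label
        return
    for child in children:
        if isinstance(child, list):
            yield from tagged_words(child)


# Invented example sentence in the assumed bracketed notation.
sample = "( (S (NP-SBJ (DT The) (NN market)) (VP (VBD rose) (ADVP (RB sharply))) (. .)) )"
pairs = list(tagged_words(read_tree(sample)))
print(pairs)
```

The nested-list representation keeps the sketch dependency-free; an established treebank reader is likely a better choice for real work with the full release.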

## Updates

None at this time.
},
keywords= {nlp, english, natural language, corpus, news, text, newswire, Treebank, LDC, corpora, Penn Treebank, Penn, 2015, LDC2015T13, parsing, tagging, part of speech, WSJ, PTB},
terms= {},
license= {},
superseded= {}
}

