New Federal Agency Releases: Department of Justice

Thanks to all who participated in June’s Sqoop user survey. Based on your answers, we prioritized adding U.S. Department of Justice (DOJ) publications. These include:

  • Press releases and speech transcripts from the top-level DOJ Office of Public Affairs

  • Releases from 93 U.S. Attorney’s Offices (USAOs) associated with each federal district court¹

The Office of Public Affairs covers the highest-profile DOJ activity. Releases from the U.S. Attorney’s Offices provide significantly more detail on many cases than we can cost-effectively obtain via PACER, such as complaint and indictment attachments or sentencing details. We now have U.S. Attorney’s Office releases associated with 21 district courts that do not otherwise give us access to docket data.


While you may already follow parts of the DOJ on Twitter, we now offer better summaries, search over the full release text, and alerting with more complete coverage. The 140-character tweets often omit the name of the company or individual being indicted, and those characters are all that Twitter makes available for search. The DOJ also offers direct email alerts via GovDelivery², but in our testing we were able to retrieve and alert on DOJ releases much more quickly than that service sent its emails. See Latency below.

We were able to retrieve and index releases back to January 1, 2014, so our historical DOJ corpus is immediately comparable with our existing sources.

In the search sidebar, we introduce a new Releases collection at the same level as (SEC) Filings, Patents and (Court) Dockets. For now it includes only DOJ releases, but in the future we will add releases from other federal agencies. For example, the Federal Trade Commission (FTC) publishes press releases that are similar in format to DOJ releases and likewise focus on enforcement actions. We may add sub-filters to allow selecting specific agencies, document types, or other divisions.

Roughly two-thirds of users’ current saved search queries are broad enough to start matching releases, either because they do not filter on any collection or because they include a specific COURT: scope, which may now include DOJ releases.
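To make the two conditions above concrete, here is a minimal sketch of how one might estimate which saved queries could start returning releases. The `SavedQuery` model, its field names, the `"releases"` collection key and the `"nysd"` court identifier are all hypothetical illustrations, not Sqoop’s actual schema or query engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedQuery:
    """Hypothetical saved-search model for illustration, not Sqoop's schema."""
    text: str
    collections: frozenset = frozenset()   # empty means no collection filter
    court_scopes: frozenset = frozenset()  # values used in COURT: scopes


def may_match_releases(q: SavedQuery) -> bool:
    """True if a saved query could start returning DOJ releases."""
    # No collection filter: the new Releases collection is searched too.
    if not q.collections or "releases" in q.collections:
        return True
    # Each district court has an associated USAO, so a COURT: scope may now
    # also cover that office's releases.
    return bool(q.court_scopes)


def share_affected(queries: list) -> float:
    """Fraction of saved queries that may now see releases."""
    if not queries:
        return 0.0
    return sum(may_match_releases(q) for q in queries) / len(queries)


queries = [
    SavedQuery("bitcoin"),
    SavedQuery("opioid", collections=frozenset({"dockets"}), court_scopes=frozenset({"nysd"})),
    SavedQuery("10-K", collections=frozenset({"filings"})),
]
print(f"{share_affected(queries):.2f}")  # 0.67
```

In this toy sample two of the three queries qualify: the first has no collection filter, and the second carries a COURT: scope.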

As a result, your existing alerts may now match DOJ releases, which are effectively “new” to your search results. We think you will find these releases relevant to most queries, though some queries might benefit from refinement given the new content. Since we index the full release text, queries made up of several common terms could be tightened, for example by using new query language features such as phrases.
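Why phrases tighten results over full text can be shown with a small, self-contained sketch. It contrasts a bag-of-words match (every term appears somewhere) with a phrase match (terms adjacent and in order). The tokenizer and both functions are simplified assumptions for illustration; they do not reproduce Sqoop’s query language or its actual text analysis.

```python
import re


def tokens(text: str) -> list:
    """Lowercase word tokens; a simplification of real full-text analyzers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_terms(text: str, query: str) -> bool:
    """Bag-of-words match: every query term appears somewhere in the text."""
    words = set(tokens(text))
    return all(term in words for term in tokens(query))


def matches_phrase(text: str, phrase: str) -> bool:
    """Phrase match: the query terms appear adjacent and in order."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


release = "The defendant opened a bank account and later reported fraud to regulators."
print(matches_all_terms(release, "bank fraud"))  # True
print(matches_phrase(release, "bank fraud"))     # False
```

The longer a release is, the more likely two common terms co-occur by chance, which is exactly the case the phrase form filters out.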

Not surprisingly, Donald Trump and Jeff Sessions are commonly referenced in these releases. Bitcoin, Facebook, Instagram, and Snapchat are all referenced for suspected roles in criminal activity. Twitter more frequently appears as a suggestion to follow a USAO or other Twitter account. Here are some interesting examples of DOJ activity, both prominent and obscure:

Release Summaries

For our users, and for parity with our other sources, we have put effort into providing useful at-a-glance summaries of DOJ releases in search results and alerts. For example:

Justice Department and EPA Enter Into Settlement with Harcros Chemicals to Improve its Accident Prevention…
July 31, 2017 (retrieved: 12:17pm EDT)
Department of Justice, Public Affairs, Environment and Natural Resources Division; ↓Harcros Consent Decree
The U.S. Department of Justice and the U.S. Environmental Protection Agency (EPA) today announced that Harcros Chemicals Inc. has entered into a proposed agreement to settle claims that Harcros violated provisions of the Clean Air Act…

Notes:

  • We currently link directly to the justice.gov-hosted release page. Unlike some other government sources, these pages don’t present any glaring usability issues. However, we do keep the full page content, and we may add our own detail page in the future, primarily as a cached copy, since we have seen cases where releases were later removed from justice.gov.

  • The DOJ feeds offer only a release date, without a valid timestamp. We show the DOJ-provided date first, followed by the time we found, retrieved and indexed the release.

  • The next line summarizes the department and office and, when the DOJ formats them well, the “Topic(s)”, “Component(s)”, and downloadable (↓) “Attachment(s)”, such as this consent decree, complaints, and indictments.³

  • Finally, we add a few lines from the release’s first paragraph, which frequently serves as an abstract.
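The second and fourth lines of the summary described above can be approximated with two small helpers: one joins the department, office, topics and components and appends ↓-marked attachments, and one trims the first paragraph at a word boundary. This is a sketch under assumptions: the function names, the 200-character default and the ordering of topics before components are hypothetical, chosen only so the output reproduces the Harcros example.

```python
def summary_line(department, office, topics=(), components=(), attachments=()):
    """Build the one-line source summary: names first, then ↓attachments."""
    head = ", ".join([department, office, *topics, *components])
    if not attachments:
        return head
    return head + "; " + "; ".join("↓" + name for name in attachments)


def abstract(first_paragraph: str, max_chars: int = 200) -> str:
    """Trim the release's first paragraph at a word boundary, adding an ellipsis."""
    text = " ".join(first_paragraph.split())  # collapse whitespace and newlines
    if len(text) <= max_chars:
        return text
    window = text[: max_chars + 1]
    # Prefer cutting at the last space; fall back to a hard cut for one long word.
    cut = window.rsplit(" ", 1)[0] if " " in window else text[:max_chars]
    return cut.rstrip(" ,;:") + "…"


print(summary_line(
    "Department of Justice",
    "Public Affairs",
    components=["Environment and Natural Resources Division"],
    attachments=["Harcros Consent Decree"],
))
```

Releases without well-formatted metadata simply pass empty tuples, so the line degrades to department and office.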

We’ve found cases where even the DOJ-provided date is inaccurate (e.g., dated the day before the release actually became available). In those cases, and for historical releases, we show our own full retrieval date:

August 9, 2017 (retrieved: August 10, 2017 12:24pm EDT)
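The two date formats shown so far can be produced by one small formatter. This sketch assumes a retrieval day that differs from the DOJ date is what flags the DOJ date as doubtful, and it uses a fixed-offset “EDT” zone only so the examples match; the function names and the `historic` flag are hypothetical, not our production code.

```python
from datetime import date, datetime, timedelta, timezone

# Fixed-offset zone for the examples only; real code would more likely use
# zoneinfo.ZoneInfo("America/New_York") so EST/EDT switch automatically.
EDT = timezone(timedelta(hours=-4), "EDT")


def long_date(d: date) -> str:
    # "July 31, 2017": builds the day number by hand, avoiding platform-specific %-d.
    return f"{d:%B} {d.day}, {d.year}"


def clock(t: datetime) -> str:
    # "12:17pm EDT": 12-hour clock with a lowercase am/pm suffix and zone name.
    hour = t.hour % 12 or 12
    suffix = "am" if t.hour < 12 else "pm"
    return f"{hour}:{t.minute:02d}{suffix} {t.tzname()}"


def format_release_date(doj_date: date, retrieved_at: datetime, historic: bool = False) -> str:
    """Render the date line shown for a release in results and alerts."""
    # Assumption: a retrieval day that differs from the DOJ date flags the DOJ
    # date as doubtful, so we spell out our full retrieval date.
    if historic or retrieved_at.date() != doj_date:
        retrieved = f"{long_date(retrieved_at.date())} {clock(retrieved_at)}"
    else:
        retrieved = clock(retrieved_at)
    return f"{long_date(doj_date)} (retrieved: {retrieved})"


print(format_release_date(date(2017, 7, 31), datetime(2017, 7, 31, 12, 17, tzinfo=EDT)))
# July 31, 2017 (retrieved: 12:17pm EDT)
print(format_release_date(date(2017, 8, 9), datetime(2017, 8, 10, 12, 24, tzinfo=EDT)))
# August 9, 2017 (retrieved: August 10, 2017 12:24pm EDT)
```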

If chronology is particularly important to your investigative work, you may also want to find the same release in a DOJ Twitter stream and compare against the tweet timestamp Twitter provides.

Latency

As mentioned, the DOJ is unusual among our sources in that it doesn’t give us a reliable timestamp against which to measure the latency introduced by its own systems, our crawlers, or anything in between. We currently detect new DOJ releases via many different, and in some cases overlapping, DOJ-published RSS feeds, and then retrieve and index the full release page each feed item links to.
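Because the feeds overlap, the same release can appear in several of them, so discovery needs de-duplication before the full page is fetched. Below is a minimal sketch using only the standard library’s RSS parsing via `xml.etree.ElementTree`, keyed on each item’s `<link>`. The function name, the placeholder feeds and the `justice.gov/example/...` links are hypothetical, and fetching and indexing the linked page is left out.

```python
import xml.etree.ElementTree as ET


def new_release_links(feed_documents, seen):
    """Return release links not seen before, in feed order, across overlapping RSS 2.0 feeds."""
    fresh = []
    for xml_text in feed_documents:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            link = (item.findtext("link") or "").strip()
            if link and link not in seen:
                seen.add(link)  # overlapping feeds repeat items; keep the first sighting
                fresh.append(link)
    return fresh


# Placeholder feeds and links, for illustration only.
feed_a = """<rss version="2.0"><channel>
  <item><title>A</title><link>https://www.justice.gov/example/a</link></item>
  <item><title>B</title><link>https://www.justice.gov/example/b</link></item>
</channel></rss>"""
feed_b = """<rss version="2.0"><channel>
  <item><title>B</title><link>https://www.justice.gov/example/b</link></item>
  <item><title>C</title><link>https://www.justice.gov/example/c</link></item>
</channel></rss>"""

seen = set()
print(new_release_links([feed_a, feed_b], seen))  # links a, b, c once each
print(new_release_links([feed_a, feed_b], seen))  # [] on the next poll
```

Keeping `seen` outside the function lets repeated polls of the same feeds return only genuinely new items.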

Our observations suggest that in some cases the DOJ publishes to Twitter before the corresponding updates appear in RSS, while in other cases the feeds are ahead of Twitter by a significant margin. The DOJ’s RSS feeds appear to sit behind at least two layers of caches, each adding latency, and tweets may be published on a separate batch interval. In the future we may also integrate with Twitter’s APIs, both to provide faster access when a release reaches Twitter first and to collect tweet timestamps as a basis for comparison. If we do, we will also link you to the tweet once we have it.
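Comparing sources comes down to recording when each one first surfaced a release and picking the earliest. The sketch below assumes timezone-aware timestamps (tweet times are typically given in UTC, while our display times are Eastern), and its source labels and example times are made up purely for illustration.

```python
from datetime import datetime, timedelta, timezone


def first_source(seen_at):
    """Given {source: first-seen time}, return (earliest source, its lead over the next)."""
    if len(seen_at) < 2:
        raise ValueError("need first-seen times from at least two sources")
    # Timezone-aware datetimes compare correctly even when zones differ.
    ordered = sorted(seen_at.items(), key=lambda kv: kv[1])
    (leader, t0), (_, t1) = ordered[0], ordered[1]
    return leader, t1 - t0


EDT = timezone(timedelta(hours=-4), "EDT")
rss_seen = datetime(2017, 7, 31, 12, 17, tzinfo=EDT)          # made-up RSS sighting
tweet_at = datetime(2017, 7, 31, 16, 5, tzinfo=timezone.utc)  # made-up tweet time (UTC)
leader, lead = first_source({"rss": rss_seen, "twitter": tweet_at})
print(leader, lead)  # twitter 0:12:00
```

Here 12:17pm EDT is 16:17 UTC, so the made-up tweet leads the RSS sighting by 12 minutes.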

Conclusion

As with all prior features and enhancements, we added DOJ releases as a result of your feedback, including June’s survey. Please let us know how relevant these new releases are to your searches and alerts, where we can improve further, or how else we can help you find what you are looking for. Please also let us know when we help you find news sourced from the DOJ.


  1. The Guam and Northern Mariana Islands Attorney’s Office is associated with both the Guam District Court and the Northern Mariana District Court, hence 93 USAOs for 94 district courts.

  2. The govdelivery.com domain appears to be currently owned by Granicus, a D.C. consultancy. 

  3. In some cases, download links are not structured but appear free-form in the body of the release, and our summaries currently miss them. For example: Nine Members of Hooligans Motorcycle Gang Charged in Sophisticated High-Tech Auto Theft Scheme Targeting… We can fix cases like this example.