QUAC ("Quantitative analysis of chatter" or any related acronym you like) is a package for acquiring and analyzing social internet content. Features:
- Reliable data collection and conversion of raw data into easy-to-parse, de-duplicated, and well-ordered formats. We support:
  - Tweets from the Twitter Streaming API.
  - Wikipedia hourly aggregate pageview logs.
  - Wikipedia edit history and related XML dumps.
- Estimate the origin location of tweets with no geotag. (But see issue #15.)
- Careful preservation of Unicode throughout the processing pipeline.
- Various cleanup steps to deal with tweet quirks, including very rare ones (we've seen certain weirdnesses in only one of our 1.3+ billion tweets). That is, we deal with the special cases so you don't have to.
- Parallel processing using various combinations of Make, joblib, and a simple map-reduce framework called QUACreduce, which is included.
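To give a feel for the de-duplication and map-reduce ideas in the feature list, here is a minimal sketch using only the Python standard library. It is not QUACreduce's actual interface: the record layout (dicts with `id` and `text` keys) and the function names `map_tweet`, `reduce_counts`, and `map_reduce` are assumptions made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_tweet(tweet):
    """Map step: emit a (token, 1) pair for each whitespace-separated token."""
    for token in tweet["text"].split():
        yield (token.lower(), 1)


def reduce_counts(key, values):
    """Reduce step: sum all counts emitted for one key."""
    return (key, sum(values))


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, sort the pairs by key, then reduce per key."""
    pairs = [pair for record in records for pair in mapper(record)]
    pairs.sort(key=itemgetter(0))
    return [
        reducer(key, [value for _, value in group])
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


# Hypothetical input: three records, one of which is a duplicate (same id).
tweets = [
    {"id": 1, "text": "flu season café"},
    {"id": 2, "text": "Flu shot today"},
    {"id": 1, "text": "flu season café"},
]

# De-duplicate by id; dicts keep insertion order, so first-seen order survives.
unique = list({t["id"]: t for t in tweets}.values())

# Python 3 strings are Unicode, so "café" passes through unchanged.
counts = dict(map_reduce(unique, map_tweet, reduce_counts))
```

The sort-then-group step is what lets each key be reduced independently, which is the property that makes this pattern easy to parallelize.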
QUAC is copyright © 2012–2019 Triad National Security, LLC, and others. It is open source under the Apache license and was formerly known as Twepi ("Twitter for epidemic analysis").
Use our list of issues. To maximize the chances of your bug being understood and fixed, take a look at "Three parts to every good bug report" (scroll down).
That said, note that unlike many open source projects, we make a point of being friendly to bug reporters, even newbies. Therefore, please don't hesitate to report a bug, even if you're inexperienced with QUAC or feel unsure. In almost all cases you will tell us something useful, even if the issue turns out not to be a bug per se, and we will support your efforts in this regard.
Please send us a note at [email protected] if you use QUAC, even for small uses, and/or star the project on GitHub. This type of feedback is very important for continued justification of the project to our sponsors.
Note that for many uses of QUAC (especially research) you are ethically obligated to cite it. For guidelines on how to do this, see the Citing section of the documentation.
We use QUAC for scientific research. To promote reproducibility, which is one of the core values of science, we try to open-source the code that runs our related experiments as well as QUAC itself. This code, and further information about it, can be found in the directory experiments.
- Documentation is online at <http://reidpr.github.io/quac>. (Note: this may describe a different version of QUAC than the one you have.)
- Current documentation is rooted at doc/index.html. (You'll probably need to build it first.)
- Most scripts have pretty help which you can print using the --help option and/or look at in comments at the top of the script. Modules also usually have good docstrings.