From 9d24dfb022211b5a2960e8aa56f2780e2a9785b9 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Thu, 27 Feb 2014 14:59:30 +0000 Subject: [PATCH 01/26] alberto cairo, future of data visualisation, Katie notes --- thurs/next-gen-of-data-viz.md | 87 +++++++++++++++++++++++++++++++++++ 1 file changed, 87 insertions(+) diff --git a/thurs/next-gen-of-data-viz.md b/thurs/next-gen-of-data-viz.md index e2cfdcc..b6b5fba 100644 --- a/thurs/next-gen-of-data-viz.md +++ b/thurs/next-gen-of-data-viz.md @@ -1,3 +1,90 @@ # The next generation of data viz By Alberto Cairo + +**Examples given:** + +* + +* - how much time each character spends on screen, done entirely in d3 + +**When we talk about the future of visualisation, we focus too much on the technical aspects. Other tools are available:** + +* graphly +* plotly +* cartodb + +**Future of visualisation doesn't depend on the tools we use, but on solid principles. Predictions/wishes:** + +* designers and journalists should become better at using numbers and evidence - not just data journalists +* data journalists and infographic/visualisation designers can lead the way and help raise numeracy and visual skills of their colleagues and the public +* should point out when things are wrong +* need skills to spot bad visualisations +* misuse of data - absolute numbers can mislead: divide the total number of deaths by the number of cars and report a rate, eg. deaths per 100,000 vehicles +* critiques are extremely important but don't be unkind +* interrogate the data - check margin of error; a number/variable in isolation means nothing. Put it in context +* data journalists and visualisation designers are victims of 'patternicity' - finding meaningful patterns in meaningless noise, jumping to conclusions based on preconceived ideas + +**"The first principle in journalism is that you must not fool yourself -- and you are the easiest person to fool"** - Richard Feynman + +**Ukraine map** - + +* Ukraine is a divided country. Cities in which unrest is reported all in the west. 
Split between east and west is an OVERSIMPLIFICATION. The picture is much more nuanced and data is being hidden in current reporting of the Ukrainian situation. +* Over-interpreting patterns - story is much more complicated than major news orgs' maps would have you believe. + +**John Snow map - proved connection between cholera and drinking contaminated water** + +* didn't just pay attention to the data that confirmed his hypothesis +* Also looked at the data that could potentially disprove his idea eg. people drinking water on their way to work, or people with private wells in areas of high death rate + +**Prediction two: A return to the foundation of visualisation** + +* At the moment, there is a lot of noise + +*Five features:* + +* Truthful +* Functional - what do you want your reader to do with the data? What graphic form should you use? +* Beautiful - attracting and appealing to audience. Basics of graphic design mandatory for undergrads +* Insightful - should reveal things that are important/interesting. Should help readers understand the data, not just data dumps. Layer the graphic in a way that most important facts are shown first, then the rest of the information follows +* Enlightening - reveal something the reader didn't know before + +**Too many designers focus on the beautiful, but forget the other features** + +* Mike Monteiro - How designers destroyed the world +* William Playfair - the great innovator - imports and exports to and from North America +* Avoid assuming readers will not understand the graphic, don't fear innovation but be skeptical +* Washington Post graphic - are the winter olympics for the rich? Difficult to extract pattern and compare. 
Bouncing bubbles are a distraction +* AC thinks it should be organised as a bar chart at the end - another tool for understanding the data better, accessing the complexity of the data + +**Prediction three: Designers and journalists will encode their data more than once** + +* Multi-dimensionality of the data +* eg. NYT - how Americans spend their day graphic. Stage one gives you an overview, stage two comes when you click on any of the categories +* Multiple representations for multiple functions eg. map then table +* University of Wisconsin - cartography students: + +**Prediction four: Designers will learn that writing is as important to visualisation as the visuals and graphics themselves** + +* Language Communities of Twitter by Eric Fischer - not particularly insightful because no annotation. Not pointing out the exceptions +* Editorial voice is very important, highlighting what is important in the data +* In some cases, graphics that are most important are the simplest eg. NYT where does breast cancer kill - tells you something in stages. 
Clear and close relationship between graphic and text + +**Prediction five: Data driven motion graphics will be everywhere** + +* Robert B Reich: Inequality for All +* Hans Rosling: + +**Prediction six: News visualisation designers will assume that becoming proficient in using technologies is not a goal in itself** + +* You must be a creator of devices that make the earth a better place before you can even think of becoming a fine artist + + + + + + + + + + From 13881ec03160f1e5df03a69c8d3f60f6977b66a1 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Thu, 27 Feb 2014 17:04:35 +0000 Subject: [PATCH 02/26] maps and charts in R session notes Katie --- thurs/MapsAndChartsInR | 1 + thurs/next-gen-of-data-viz.md | 1 + thurs/statistics_in_the_newsroom.md | 101 ++++++++++++++++++++++++++++ 3 files changed, 103 insertions(+) create mode 160000 thurs/MapsAndChartsInR create mode 100644 thurs/statistics_in_the_newsroom.md diff --git a/thurs/MapsAndChartsInR b/thurs/MapsAndChartsInR new file mode 160000 index 0000000..7bd4645 --- /dev/null +++ b/thurs/MapsAndChartsInR @@ -0,0 +1 @@ +Subproject commit 7bd4645eab93673a4c8be6937e62e3b786bd81f1 diff --git a/thurs/next-gen-of-data-viz.md b/thurs/next-gen-of-data-viz.md index b6b5fba..817f5fd 100644 --- a/thurs/next-gen-of-data-viz.md +++ b/thurs/next-gen-of-data-viz.md @@ -79,6 +79,7 @@ By Alberto Cairo * You must be a creator of devices that make the earth a better place before you can even think of becoming a fine artist + diff --git a/thurs/statistics_in_the_newsroom.md b/thurs/statistics_in_the_newsroom.md new file mode 100644 index 0000000..0845b5e --- /dev/null +++ b/thurs/statistics_in_the_newsroom.md @@ -0,0 +1,101 @@ +### Statistics in the newsroom + +*Rob Barry, WSJ @rob_barry* + +*Steven Rich, Washington Post @dataeditor* + + +**Don't overthink things: Basics** + +* counting, summing, grouping and ranking + +**Seek the middle** + +* Mean +* Median +* Both tell a different story, both have limitations. 
Be aware that if you are averaging something you may weight things heavily + +***Improper use of statistical methods can obscure reality and distort our stories*** + +**Other tools you can use:** + +* Quartiles: medians on steroids +* Correlation - measuring the degree to which groups of numbers move together. Range from -1 to 1 +* Example of using correlation in print - Libor furor: key rate gets new scrutiny (WSJ) + +**Visualisation** + +* Histograms: frequency - bins, buckets, groups - shows distribution of your numbers +* In R: hist(salaries) - shows distribution in one line of programming + +**Standard deviation and distribution** + +* Standard deviation - measures how far values typically sit from the average +* Normal distribution - bell curve +* Log-normal distribution - numbers grouped to the left +* Cauchy distribution - very sharp peak +* If you know the distribution you can make assertions about the probability of the values +* Monte Carlo simulation to show insider trading before companies report dismal results - improbably lucky trades +* By randomly shuffling the executive's trades we can calculate a distribution of possible returns + +**P-values** + +* Statistical significance +* Probability +* Significance levels - .05 (most common), .01 (threshold) + +**T-tests** + +* Used to test hypotheses about means when the population variance is unknown +* Developed by Gosset for the quality control of beer +* Single sample t - one group, test against hypothetical mean +* Independent samples t - 2 means, 2 groups, no relation between groups + +**ANOVA** + +* Analysis of variance - compare three or more groups +* Within vs between groups + +**Fisher's exact test** + +* Good for small data series, small sample sizes +* Gives exact P-value +* Great for categorical data eg male vs female, RH vs LH + +**Linear regression** + +* Based on y=mx+b +* x - independent variable +* y - dependent variable +* m - slope +* b - y-intercept (the value of y when x is 0) 
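
The y=mx+b bullets can be made concrete with a small least-squares fit. This is only a minimal sketch in plain Python, not something shown in the session: the function names `fit_line` and `predict` and the sample numbers are made up for illustration, and `b` is taken to be the intercept (the value of y when x is 0).

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + b. Returns (m, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Spread of x around its mean; zero means every x is the same.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    # How x and y move together around their means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx            # slope: change in y per unit of x
    b = y_bar - m * x_bar    # intercept: value of y when x is 0
    return m, b


def predict(x, m, b):
    """Value of the fitted line at x."""
    return m * x + b


# Hypothetical data: years of service (x) against salary in thousands (y).
years = [1, 2, 3, 4, 5]
salaries = [32.0, 34.5, 36.0, 39.5, 41.0]
m, b = fit_line(years, salaries)
print(f"slope={m:.2f} intercept={b:.2f} year 6 estimate={predict(6, m, b):.2f}")
```

The slope is the number most stories quote (how much y changes for each extra unit of x); the intercept mostly matters for drawing the line and making predictions.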
+ +**R-square** + +* How much of the variation in the dependent variable the independent variable explains +* Anything above .8 is highly correlated + +**Logistic regression** + +* Variable isn't always continuous +* Race, gender - binary data + +**Pearson's chi-squared** + +* Observed vs expected +* Eg. tax lien auctions in DC (Washington Post) +* Wanted to look at how frequently irregular patterns occurred between bidders. Used this test + +**How to avoid mistakes** + +* Try to prove yourself wrong +* Run it by your targets - eg government office +* Ask someone who is smarter than you - call an expert +* Make sure you are doing the correct test +* - will tell you what test is important to run + +* When you are unsure what test to use, consult IRE tipsheets - David Donald, Jen Lafleur +* Try not to use higher level stats if you don't need to +* Ask yourself what the easiest way of solving the problem is +* Seek out the outliers but give context - this is where stats come in handy: how far from the expected value are they, and then attempt to explain why +* correlation does not equal causation \ No newline at end of file From 7207336d71a65fcad7889c81d263dfcde307a0c7 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Thu, 27 Feb 2014 19:14:06 +0000 Subject: [PATCH 03/26] Free tools for data analysis --- thurs/maps_and_charts_in_r.md | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) create mode 100644 thurs/maps_and_charts_in_r.md diff --git a/thurs/maps_and_charts_in_r.md b/thurs/maps_and_charts_in_r.md new file mode 100644 index 0000000..16cdb17 --- /dev/null +++ b/thurs/maps_and_charts_in_r.md @@ -0,0 +1,20 @@ +### Maps and charts in R + +**Matt Waite: Github - mattwaite** + +**All notes: ** + + +* R: impenetrable, hates you, a lifetime commitment +* Fantastic tool but hard to learn +* Allows you to make statistical models, data visualisations, very powerful +* Integrated environment - no need to use a lot of different programmes. You do not need to leave the church of R! 
+* Easy to repeat things you have done +* Easy to iterate + +* R doesn't care about spaces +* stat="bin" - default, counts the rows in each group +* stat="identity" - uses the values in the data as they are + +* ggplot(data=enrollment, aes(x=Year, y=Students, fill=Year)) + geom_line() + ylim(0, max(enrollment$Students)) +* In enrollment$Students, enrollment is the data source, the dollar sign selects a column, and Students is the column name \ No newline at end of file From 04339717a3a71a95189e7b8be8a3b1c825595f08 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Thu, 27 Feb 2014 19:35:25 +0000 Subject: [PATCH 04/26] Deleted folder that was not needed --- thurs/MapsAndChartsInR | 1 - thurs/analysis_done_dirt_cheap.md | 46 +++++++++++++++++++++++++++++++ 2 files changed, 46 insertions(+), 1 deletion(-) delete mode 160000 thurs/MapsAndChartsInR create mode 100644 thurs/analysis_done_dirt_cheap.md diff --git a/thurs/MapsAndChartsInR b/thurs/MapsAndChartsInR deleted file mode 160000 index 7bd4645..0000000 --- a/thurs/MapsAndChartsInR +++ /dev/null @@ -1 +0,0 @@ -Subproject commit 7bd4645eab93673a4c8be6937e62e3b786bd81f1 diff --git a/thurs/analysis_done_dirt_cheap.md b/thurs/analysis_done_dirt_cheap.md new file mode 100644 index 0000000..8512d1a --- /dev/null +++ b/thurs/analysis_done_dirt_cheap.md @@ -0,0 +1,46 @@ +### Analysis, done dirt cheap + +**Free CAR tools** + +**Turning a mess into data** + +* Tabula - pdf to excel +* Chrome scraper - good for scraping tables +* Overview - integrates with Document Cloud +* Mr. People + +**When good data goes bad** + +* CSVkit - command line tool for trimming csv files +* OpenRefine - cleaning data, standardising + +**Custom stats** + +* R/Rstudio +* BayesDB + +**Summaries and viz** + +* LibreOffice Calc - used to be terrible, now a lot better. A bit like excel +* GGobi +* Sci2 +* MicroStrategy - analytics desktop. Free, not open source. 
Exploratory visualisations, a bit like Tableau + +**Making it Internet** + +* SQLite Manager - simplest db manager +* MySQL + +**Geospatial analysis** + +* QGIS - new 2.0 version released called Dufour +* PostGIS +* SpatiaLite +* TileMill +* OpenGeo + + + + + + From f6b933e4007e80d4818ff9a031e0e0093f2cbde1 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Thu, 27 Feb 2014 19:48:17 +0000 Subject: [PATCH 05/26] data tools for analysis --- thurs/analysis_done_dirt_cheap.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/thurs/analysis_done_dirt_cheap.md b/thurs/analysis_done_dirt_cheap.md index 8512d1a..9fdfdee 100644 --- a/thurs/analysis_done_dirt_cheap.md +++ b/thurs/analysis_done_dirt_cheap.md @@ -13,6 +13,7 @@ * CSVkit - command line tool for trimming csv files * OpenRefine - cleaning data, standardising +* Mr Data Converter **Custom stats** @@ -30,6 +31,9 @@ * SQLite Manager - simplest db manager * MySQL +* Data Tables +* FreeDive - Knight digital media center, uses google docs to get data online +* Transcribable - showing pdf and readers can type in comments, search themselves. 
Used by ProPublica **Geospatial analysis** @@ -39,6 +43,15 @@ * TileMill * OpenGeo +**Kitchen sink installers** + +* Bitnami +* OSGeo-Live +* Vagrant VM with iPy + +OSalt.com - find open source alternatives + + + From 3539182961e68d1c123e06fcbc84f45b1701928b Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Thu, 27 Feb 2014 21:02:21 +0000 Subject: [PATCH 06/26] Making reporting better with Github --- thurs/making_reporting_better_with_github.md | 63 ++++++++++++++++++++ 1 file changed, 63 insertions(+) create mode 100644 thurs/making_reporting_better_with_github.md diff --git a/thurs/making_reporting_better_with_github.md b/thurs/making_reporting_better_with_github.md new file mode 100644 index 0000000..a454019 --- /dev/null +++ b/thurs/making_reporting_better_with_github.md @@ -0,0 +1,63 @@ +### Making reporting better with Github + +@BenBalter - government@github.com + +Git and github fundamentals tomorrow at 11am (?) + +Advanced github on Saturday at 1pm (?) + +#### Detailing the process + +**Old ways of working - uncollaborative** + +* Content shared as late as possible +* You had to be there in order to collaborate - smoke-filled back rooms +* Doesn't capture process, only captures the final outcome + +**Open source now a decentralised model** + +* Software can live on without me +* More collaboration, less friction - easier for other people to contribute to whatever you are working on +* Open Source (eg. Britannica vs Wikipedia) + +**How to work in an open-source way** + +* Electronic - discussion, planning and operations, high fidelity form of electronic communication eg. email, github or chat with transcripts +* Work should be visible, expose process, have a URL, describe how the decision came to pass +* Make everything asynchronous +* Lock free - avoid creating systemic blockers. 
Working toward shared goals shouldn't require approval +* Non-adversarial + +**Build an ecosystem around a project** + +* Task list or bug tracker +* Communicate the big picture - set a vision of reality eg Obama campaign 2008 +* What is your vision/goal +* Encourage contribution but detail how to contribute + + +#### Git + +* Version control system, decentralised +* Tracks who made which change and when +* Command line + +**Repositories** + +* Essentially a project folder, most basic element of GitHub +* Can have multiple collaborators +* Clone (noun) - initial pull: taking code that exists on a shared server and pulling it down to your own machine +* Commits - individual set of changes to one or more files. Uniquely identifiable by the hash. Tells you who made the change, what changed (diff), when it was made etc +* Branches - parallel versions of the same repo eg Back to the Future, going back to a parallel 1985. Don't upset original version. Can then be merged back in + +**Github** - social network for git + +* Users: personal to individual +* Organisations: group many users together, are administered by users, can own repositories +* Issues: suggested improvements, tasks. Created by users, closed by admins +* Forks: personal copy of another user's repository which then lives as a repository in your own account. Can make changes without affecting the original, can be public or private. 
Used to submit a *pull request* +* Feedback repos + + + + From 249e78ef371c86ec58b11db6291b1c8ec24c62f3 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Fri, 28 Feb 2014 12:22:24 +0000 Subject: [PATCH 07/26] How the internet watches you --- thurs/how_the_internet_watches_you.md | 35 +++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) create mode 100644 thurs/how_the_internet_watches_you.md diff --git a/thurs/how_the_internet_watches_you.md b/thurs/how_the_internet_watches_you.md new file mode 100644 index 0000000..e2277a7 --- /dev/null +++ b/thurs/how_the_internet_watches_you.md @@ -0,0 +1,35 @@ +### How the internet watches you + +**Email** + +* Send an email, encoded as packets with metadata +* Goes through local router +* Sends it into the ISP's "last mile" network +* ISP sends it to a part of the internet backbone, built of optical fibre - NSA +* After bouncing around sent to another ISP's "last mile" +* Handed off to mail server + +**Who can see what after your email leaves your computer** + +*The ISP* + +* Local router: wired or wifi router at home or company, coffee shop, hotel etc - normally provided by ISP +* Local router can always see address of site you are visiting (eg gmail), header info on the email and sometimes the content of the email itself +* Header information: metadata - think of your email message as a postcard that anyone can read +* Contains address of recipient, sender, time it was sent and IP address (which can be geocoded) + +*Every backbone provider along the way* + +* This includes anyone they decide (or are forced) to share that information with + +*Email server* + +**Resources** + +* +* +* + +**Longer version of this presentation:** + +* \ No newline at end of file From 7615e8d1b6986c77fbc7aca150d7c6b826507fb1 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Fri, 28 Feb 2014 14:29:14 +0000 Subject: [PATCH 08/26] holding algorithms accountable --- fri/holding_algorithms_accountable.md | 17 +++++++++++++++++ 1 file changed, 
17 insertions(+) create mode 100644 fri/holding_algorithms_accountable.md diff --git a/fri/holding_algorithms_accountable.md b/fri/holding_algorithms_accountable.md new file mode 100644 index 0000000..1101e62 --- /dev/null +++ b/fri/holding_algorithms_accountable.md @@ -0,0 +1,17 @@ +### Holding algorithms accountable + +*Print outs of recent Tow Center report upstairs* + +**Question of responsibility** + +* Correlation does not equal causation +* Correlation does not equal intent (with regard to algorithms - maybe the designer did not intend for this to happen) +* Understanding the design process of building algorithms can help understand why they function in the way they do + + +**A couple of different ways that journalists can approach algorithms:** + +* Why are they important? (explainers etc) +* Pick apart algorithms and expose flaws - how might an algorithm be discriminatory +* Does it break a law? Does it make us feel uneasy? If so should be questioned + From 218f14252ac7fa95a959b20f3e0b6462bb656228 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Fri, 28 Feb 2014 15:58:24 +0000 Subject: [PATCH 09/26] Introduction to R --- fri/introduction_to_r.md | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) create mode 100644 fri/introduction_to_r.md diff --git a/fri/introduction_to_r.md b/fri/introduction_to_r.md new file mode 100644 index 0000000..e0babfb --- /dev/null +++ b/fri/introduction_to_r.md @@ -0,0 +1,37 @@ +### Introduction to R + +##### Sharon Machlis, Computerworld + + + + +R project for statistical computing - great for sophisticated data analysis, but also useful for basic and intermediate work eg. grouping, plotting, exploratory data visualisations. Works primarily on the command line + +**When to use R over Excel:** + +* Amazing community and lots of add-on packages +* Writing scripts - reproducible research - check, share, reproduce again and again. 
Create script once and use again and again + +**Tour of R** + +* Console - where you type in commands +* Top right - history of all commands you have run +* Environment tab - as you store variables, they will appear here +* Bottom right - where visualisations will appear, where you can view add on packages, help files etc +* When you run a calculation, the console shows the index of the first result on each line in brackets, eg. [1] +* First command: library(lattice) +* install.packages() - installs add-on packages +* data() - lists the sample datasets available; data(name) loads one +* plot(melanoma) - draws a chart of melanoma data in the bottom right window +* Up arrow cycles through the history of commands +* Store a model in our own variable: + * regline <- lm(melanoma$incidence ~ melanoma$year) + * lm: linear model + * abline: trend line + * <- is the assignment operator (works like equals) + +* Store variables eg. x <- 5 +* c() function - combine values into a vector +* Indexing starts at 1 not 0, unlike many other computer languages + +When indexing a data frame, always remember the comma: melanoma[5,] shows everything in row 5, melanoma[,2] shows everything in column 2 From babbd328fa2c2f7cf26a96f80e435695bca26252 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Fri, 28 Feb 2014 11:24:39 -0500 Subject: [PATCH 10/26] Edited footer --- thurs/next-gen-of-data-viz.md | 1 + 1 file changed, 1 insertion(+) diff --git a/thurs/next-gen-of-data-viz.md b/thurs/next-gen-of-data-viz.md index 817f5fd..7c65357 100644 --- a/thurs/next-gen-of-data-viz.md +++ b/thurs/next-gen-of-data-viz.md @@ -89,3 +89,4 @@ By Alberto Cairo + From 9412575af5c6b73392a0ba48942ba416813ea4c7 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Fri, 28 Feb 2014 19:39:41 +0000 Subject: [PATCH 11/26] Maps with Tilemill and Mapbox --- fri/maps_with_leaflet_and_mapbox.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 fri/maps_with_leaflet_and_mapbox.md diff --git a/fri/maps_with_leaflet_and_mapbox.md b/fri/maps_with_leaflet_and_mapbox.md new file mode 100644 index 0000000..76a7a12 --- /dev/null +++ b/fri/maps_with_leaflet_and_mapbox.md @@ -0,0 +1,17 @@ +### Maps with 
leaflet and mapbox + +1. Get data from Texas open data site. Add FIPS number + +2. Open shapefile in qgis + +3. For all of your columns tell qgis your relevant columns are strings then save as a csvt - save in same folder: + + eg. "String","String","String","String","String","String","String","Real","String","String","String" + +4. Joins: Join shapefile and data on FIPS number. Now attribute table should show both + +5. Save as geojson + +**All session notes and example files can be found here:** + + From 57666ac1437a77238a89f502295a6fcb94cec67f Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Sat, 1 Mar 2014 16:00:46 +0000 Subject: [PATCH 12/26] Notes from Leaflet/Mapbox demo. Better notes on Github --- fri/maps_with_leaflet_and_mapbox.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/fri/maps_with_leaflet_and_mapbox.md b/fri/maps_with_leaflet_and_mapbox.md index 76a7a12..c046dde 100644 --- a/fri/maps_with_leaflet_and_mapbox.md +++ b/fri/maps_with_leaflet_and_mapbox.md @@ -15,3 +15,12 @@ **All session notes and example files can be found here:** +* Possible to style geojson files with leaflet, but it works really well with Mapbox +* example1.html +* On the command line, navigate to your folder +* One line of Python to show in browser +* example2.html - added styles and div for legend, set zoom level etc +* Add grid layers to show interactive elements +* example.html - added marker + + From a5dead3eb03f8fab1022a05c0f572b4c1d5034b9 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Sat, 1 Mar 2014 16:01:07 +0000 Subject: [PATCH 13/26] Katie's notes from the NICAR lightning talks --- fri/lightning_talks.md | 122 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 122 insertions(+) create mode 100644 fri/lightning_talks.md diff --git a/fri/lightning_talks.md b/fri/lightning_talks.md new file mode 100644 index 0000000..c74b6b9 --- /dev/null +++ b/fri/lightning_talks.md @@ -0,0 +1,122 @@ +### Lightning talks + +####Refactoring - why your code sucks and 
how to fix it +**@onyxfish** + +*"Code read and modified much more quickly than written"* + +What is refactoring? - improving code quality without adding features + + **Follow the code smells:** + + 1. Duplicated code - if it's there twice it's wrong + 2. Loooong functions + 3. Inconsistent style - don't copy and paste! + +Develop good habits by refactoring, otherwise your code is like word clouds - bad! + +--- + +#### A few of my favourite wee things +**@lenagroeger** + +* Small multiples - sequences of small graphics +* Tiny text - highlight difference in size, outliers +* Tiny art +* Mini maps - can give more context for a story - little map next to a large map +* Inline pictures in text +* Mini graphics - sparklines, can now be put in tweets. Icons eg. navigation icons - noun project. Tiny states (State Face) + +--- + +#### Natural language processing in the kitchen +**@anthonyjpesce** + +* Text dump of LA Times full of recipes +* python >>> import nltk +* Feed trainer words, parts of speech, trigrams +* Turn horrible text file into structured data +* Good if you know in advance what you are looking for +* Blog post: Slides: + +--- + +#### Five algorithms in five minutes +**@chasedavis** + +* All code: +* Loops: slow code - vectorisation +* Naive Bayes - solves classification problems +* Iterative algorithms +* Vantage point trees - for fuzzy search. Solves problem of adults who can't spell +* Latent Dirichlet Allocation - topic modelling algo. + +--- + +#### What can we learn from terrible data viz +**@katiepark** + +* Not everything needs a chart! +* Watch your scale +* Know your data types +* Double check your information, then do it again! +* Don't overdo it! Simple is often best + +--- + +#### Calculus for journalism +**@dataeditor** + +* Journalists should do maths! 
Creative discipline +* Calculus = change +* Compound interest - can compound at different rates +* Chemical leak - spouting cylinder +* Riemann sums - calculate the area under a curve + +--- + +**@sisiwei** + +--- + +#### The whole internet in 5 minutes +**@jeremybowers** + +[Slides](https://docs.google.com/presentation/d/1h41aj_hg-8Y0cotOjSIEOBBoPUxIuU_Ol45jbFhVqlY/edit#slide=id.p) + +[Text](https://gist.github.com/jeremyjbowers/9279751) + +--- + +#### How to raise an army + + +* How do we make Knight Lab a community of webmakers? +* Community: Brown bag lunches +* learn.knightlab.com - self guided curriculum for digital literacy +* Open lab hours - people come and work on things +* Lessons learned - kill imposter syndrome newsnerdfirsts.tumblr.com +* Communication is the key + +--- + +#### You must learn! + +*Five lessons from the history of data viz* + +* Nothing we are doing is new +* Seriously, nothing we are doing is new +* It is a wild world out there +* You've already been scooped by a computer + +[Ben Welsh's slides](https://docs.google.com/presentation/d/1f9RJO8-6pxJn1LWvVmWaLPuVzklxiTSJL5PScl3NcPI/edit#slide=id.g2b056d122_07) + + + + + + + + + + From ec06c2049df1950d807a7414a3252c26b134f6c9 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Sat, 1 Mar 2014 16:01:36 +0000 Subject: [PATCH 14/26] Notes from Tor session --- sat/tor.md | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) create mode 100644 sat/tor.md diff --git a/sat/tor.md b/sat/tor.md new file mode 100644 index 0000000..a3a9a0e --- /dev/null +++ b/sat/tor.md @@ -0,0 +1,47 @@ +### Tor + +Kelley Misata - kelley@torproject.org + +* 9 full time employees +* 30 contractors +* 5000 volunteers +* 300,000 daily users + +Started as US Navy research lab project - 2002 onion router technology + +**Tips:** + +* Always make sure the address starts with https:// + +* Consider using PGP or GPG encryption with email + +* When using chat, use secure chat or [Adium](https://adium.im/) (OTR) + +* 
Enable full disk encryption + +* Keep sensitive information separate + +* Power down your computer often to close connections + +* Use a separate laptop and Tails - partner project that allows user to run Tor on a USB + +**How does Tor work?** + +Network of 5000 volunteers: + +* Alice sends an email to Bob +* Email sent to Alice's ISP +* Goes into the network at the entry node - traffic encrypted +* Passed to middle node - traffic encrypted +* Exit node - traffic encrypted +* Bob's ISP +* Bob + +*Only encrypting traffic. Should encrypt message before sending into network* + +Tor browser bundle - based on Firefox + +Tor doesn't know who the volunteers are. Could be bad actors - no way to tell. Anyone can become a relay - page detailing the risks, what it means. Exit nodes funnel traffic out and are first point of contact for law enforcement. + + + From 818c1778f19d3b4de3495ab8f6f2acac0bbcad84 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Sat, 1 Mar 2014 16:56:50 +0000 Subject: [PATCH 15/26] Notes from design talk --- sat/it's_not_just_for_looks.md | 71 ++++++++++++++++++++++++++++++++++ 1 file changed, 71 insertions(+) create mode 100644 sat/it's_not_just_for_looks.md diff --git a/sat/it's_not_just_for_looks.md b/sat/it's_not_just_for_looks.md new file mode 100644 index 0000000..ec6015c --- /dev/null +++ b/sat/it's_not_just_for_looks.md @@ -0,0 +1,71 @@ + +### It's not just for looks: +Presentation as a storytelling tool + +*Moderator: Chrys Wu: @MacDiva* + +####Helene Sears: @MateerS + +BBC online workflow: + +1. Define: Getting the who, what, when, where of the project down on paper. Figure out who has a right to an opinion, weekly checkin schedule etc. Write a headline of what you want to create +2. Brainstorm: All hands on deck, as many ideas as you can come up with, quantity not quality. Designers to understand editorial process. Journos to try not to come with a sketch of what you expect and be disappointed if it doesn't come off. 
No one party should dominate. Keep pushing on ideas, first idea you come up with probably not the best one. +3. Wireframe and prototype: Test frequently and early, sanity check your ideas to make sure they make sense to people not involved in the project. Set up questions you will ask in user testing. Distil information to find out which parts are relevant. Use proto.io and paper prototypes +4. Refine and protect: Mobile first starting with the smallest screen. Consistent, optimised user experience. Accessibility should be key +5. Deliver: More and more designer/developers at the BBC. Testing on a whole stack of devices, every browser + +[BBC newsgraphics website](http://bbc.co.uk/newsgraphics) + +Twitter: @bbcnewsgraphics + + +--- + +####Aron Pilhofer: @Pilhofer + +NYT: Work completely outside of the content management system. Everything we do is collaborative + +Would describe the team as a "product development team" + +Newsrooms very good at doing single pane graphics, but not so good at devising big new ways + +What's the story? What's the headline? + +**Oscar coverage:** + +Old model: + +* Feature creep - a huge problem. User comes to site and can't figure out what to do in the first 7 seconds + +New model: + +* Get rid of all tabs - tabs are LAZY. Design becomes much cleaner +* Twitter coverage not just on a hashtag +* Presenters referenced live blog and live blog referenced them +* Deprioritised the ballot because it was only important to a small number of users (analytics) +* Event tracker to check what readers are doing +* Do readers use the tools in the way that we expect? Answer: yes + + +--- + +#### Alison at NPR @Alykat + +**Planet Money makes a T-shirt** + +* Eight people on the team: Two designer/developers, 3 interactive producers, project manager, journos +* Brainstorming - started with the process story, storyboarded everything +* Visual style guide for all photographers to help get a consistent look. 
Cheat sheet made and specified down to the lenses that would be used, technical specification +* Inspiration - Serengeti Lion project from National Geographic +* Wireframed structure - visually driven with a text and graphic component +* Broke down into chapters. How do we break this up within chapters? +* Amazing visuals of the process so lead with video. +* Conversational voice guides you through +* Get more in-depth - scripts for video and text below written mindfully of each other +* Design decisions around the interface were deliberate +* If it doesn't work on mobile it doesn't work +* No autoplaying video on mobile +* Reach out on Instagram to ask people to share pictures of themselves wearing the T-shirt +* User testing is so important: watched as people went through the site - significant changes occurred after this + + From ec8ed3467e584059360b034dee1f93791f90d764 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Sat, 1 Mar 2014 16:58:23 +0000 Subject: [PATCH 16/26] Changed spelling of Alyson's name --- sat/it's_not_just_for_looks.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sat/it's_not_just_for_looks.md b/sat/it's_not_just_for_looks.md index ec6015c..970a426 100644 --- a/sat/it's_not_just_for_looks.md +++ b/sat/it's_not_just_for_looks.md @@ -46,10 +46,11 @@ New model: * Event tracker to check what readers are doing * Do readers use the tools in the way that we expect? 
Answer: yes +Need to improve on accessibility testing --- -#### Alison at NPR @Alykat +#### Alyson at NPR @Alykat **Planet Money makes a T-shirt** From b2086be8ed470067a7460a738ed8d6b7387aed63 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Sat, 1 Mar 2014 16:59:52 +0000 Subject: [PATCH 17/26] Added link to slides --- sat/it's_not_just_for_looks.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/sat/it's_not_just_for_looks.md b/sat/it's_not_just_for_looks.md index 970a426..d8da17e 100644 --- a/sat/it's_not_just_for_looks.md +++ b/sat/it's_not_just_for_looks.md @@ -4,6 +4,8 @@ Presentation as a storytelling tool *Moderator: Chrys Wu: @MacDiva* +[Slides](http://j.mp/nicar14) + ####Helene Sears: @MateerS BBC online workflow: @@ -70,3 +72,6 @@ Need to improve on accessibility testing * User testing is so important: watched as people went through the site - significant changes occurred after this + + + From 3329c58e49035ab3f08b8f303a9b2ddc47b9373e Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Sat, 1 Mar 2014 20:36:30 +0000 Subject: [PATCH 18/26] Session notes from journo/dev session --- sat/crossing_the_language_boundaries.md | 29 +++++++++++++++++++++++++ 1 file changed, 29 insertions(+) create mode 100644 sat/crossing_the_language_boundaries.md diff --git a/sat/crossing_the_language_boundaries.md b/sat/crossing_the_language_boundaries.md new file mode 100644 index 0000000..d3f1b4d --- /dev/null +++ b/sat/crossing_the_language_boundaries.md @@ -0,0 +1,29 @@ +###Crossing the language boundaries + +* Email sucks - better to use a ticket system such as Jira + +* Have developers in your editorial meeting + +* Make sure you leave enough time for testing news apps + +* Is it something you might want to reuse or repurpose? + +* What is the absolute minimum set of things it is worth launching - minimum viable product? Can you launch more features as time goes on? 
+ +* CMS Inflexible, but inflexibility can be an opportunity as CMS rely on clear workflow - Gateway drug to programming? Key to understanding logic + +* Participating in hackathons - solutions to problems: 2:1 devs to reporters + +* Source: learning sessions - Matt Waite series on journalism ethics + +* Just because it's a big story doesn't mean it should be a big application. Just because the journalist thinks the data is important, it doesn't mean it should be a bit data application + +* Developers should avoid pushing for unrealistic deadlines from journalists because the reporting process takes a long time + +* Always remember to fact check the app!! + +* Very difficult to be a very good developer and a very good reporter simultaneously - very different types of disciplines, but good to have enough of an understanding to be able to *communicate* easily between each other + +* Learning lunches - talking about things, putting in context, talk about what is and is not easy to do, easy to scrape vs hard to scrape etc. Create a space for everyone to ask questions that there is no other good time to ask - [link](github.com/veltman/learninglunches) + +* \ No newline at end of file From a1d35e8e7bf14c520d8d4fc3d8d9cb2cb1aedbe1 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Sat, 1 Mar 2014 20:52:59 +0000 Subject: [PATCH 19/26] More notes --- sat/crossing_the_language_boundaries.md | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/sat/crossing_the_language_boundaries.md b/sat/crossing_the_language_boundaries.md index d3f1b4d..502ec77 100644 --- a/sat/crossing_the_language_boundaries.md +++ b/sat/crossing_the_language_boundaries.md @@ -16,7 +16,7 @@ * Source: learning sessions - Matt Waite series on journalism ethics -* Just because it's a big story doesn't mean it should be a big application. 
Just because the journalist thinks the data is important, it doesn't mean it should be a bit data application +* Just because it's a big story doesn't mean it should be a big application. Just because the journalist thinks the data is important, it doesn't mean it should be a big data application * Developers should avoid pushing for unrealistic deadlines from journalists because the reporting process takes a long time @@ -26,4 +26,17 @@ * Learning lunches - talking about things, putting in context, talk about what is and is not easy to do, easy to scrape vs hard to scrape etc. Create a space for everyone to ask questions that there is no other good time to ask - [link](github.com/veltman/learninglunches) -* \ No newline at end of file +* Static vs dynamic: + * Static - Actual file sitting somewhere, pre-wrapped food for you to take + * Dynamic - food in a restaurant, assembled on the fly from a db because you asked for it, file does not actually exist + + +* Static site generator, benefits of a static site, but a database behind the scenes, so our users feel like they are using a vending machine. Speedier than a dynamic site + +* Workflow - a lot of backward, sideways movement on the reporting side + +* Who edits data applications? Fuzzy line between reporters and developers. Reporters would love to have great interactives whilst they are writing so they could learn from them + +* What can be done, should be done and what you want to do. Always remember to go back to the mission - get information to the public + +* Check your metrics - why do your editors want these apps? Will people actually use them? 
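
The static vs dynamic distinction in the notes above can be sketched in a few lines. This is only an illustration under stated assumptions, not anything shown in the session: the `stories` table, the sample rows, and the function names `make_db`, `render_dynamic` and `build_static` are all hypothetical. The dynamic path assembles a page from the database each time it is asked for (the restaurant); the static site generator runs the same template once per row ahead of time and leaves real files on disk (the vending machine).

```python
import html
import sqlite3
import tempfile
from pathlib import Path
from string import Template

# One shared template, so both approaches produce identical markup.
PAGE = Template("<h1>$title</h1><p>$body</p>")


def make_db():
    """Create a small in-memory database with hypothetical sample stories."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stories (slug TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO stories VALUES (?, ?, ?)",
        [
            ("budget", "City budget", "Spending rose this year."),
            ("schools", "School results", "Scores held steady."),
        ],
    )
    return conn


def render_dynamic(conn, slug):
    """Dynamic: build the page on request; no file ever exists."""
    row = conn.execute(
        "SELECT title, body FROM stories WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        return None
    title, body = row
    return PAGE.substitute(title=html.escape(title), body=html.escape(body))


def build_static(conn, out_dir):
    """Static generator: render every row once and write real files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    rows = conn.execute("SELECT slug, title, body FROM stories ORDER BY slug")
    for slug, title, body in rows:
        path = out_dir / f"{slug}.html"
        page = PAGE.substitute(title=html.escape(title), body=html.escape(body))
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    conn = make_db()
    with tempfile.TemporaryDirectory() as tmp:
        for path in build_static(conn, tmp):
            print(path.name, path.read_text(encoding="utf-8"))
```

Both paths share the template, so a reader sees the same page either way; what changes is when the database work happens. With the generated files, that work is done once at build time rather than on every request, which is where the speed benefit mentioned in the notes would come from.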
\ No newline at end of file From 57eb699c9630dcebb7916eaf430a2469cf023b3f Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Sat, 1 Mar 2014 21:43:29 +0000 Subject: [PATCH 20/26] Link to all slides from NICAR2014 --- sun/link_to_all_slides_etc.md | 2 ++ 1 file changed, 2 insertions(+) create mode 100644 sun/link_to_all_slides_etc.md diff --git a/sun/link_to_all_slides_etc.md b/sun/link_to_all_slides_etc.md new file mode 100644 index 0000000..7f8befc --- /dev/null +++ b/sun/link_to_all_slides_etc.md @@ -0,0 +1,2 @@ +### Link to all slides etc + \ No newline at end of file From 3061c9a843d281f08f63c00d7526534d85dd514f Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Sat, 1 Mar 2014 21:43:40 +0000 Subject: [PATCH 21/26] Slides from talk --- sat/threat_modelling_for_journalists.md | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 sat/threat_modelling_for_journalists.md diff --git a/sat/threat_modelling_for_journalists.md b/sat/threat_modelling_for_journalists.md new file mode 100644 index 0000000..5ec8bb4 --- /dev/null +++ b/sat/threat_modelling_for_journalists.md @@ -0,0 +1,3 @@ +### Threat modelling for journalists + +[Link to slides](http://www.scribd.com/doc/209968137/Threat-Modeling-Planning-Digital-Security-for-your-Story) \ No newline at end of file From 9435875a329bc290c436fe8f04fe034cf3a4e73c Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Sat, 1 Mar 2014 21:43:48 +0000 Subject: [PATCH 22/26] Notes from session --- sat/scraping_data.md | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) create mode 100644 sat/scraping_data.md diff --git a/sat/scraping_data.md b/sat/scraping_data.md new file mode 100644 index 0000000..b7db307 --- /dev/null +++ b/sat/scraping_data.md @@ -0,0 +1,29 @@ +### Scraping data + +**When to scrape:** + +*When no one has the data you need* + +* Eg to answer the question - how many adoptions have taken place internationally where the children are subsequently rehomed. 
Scraping a messageboard +* Down them all - download manager, plugin for FF + +*When they won't give you the data you need and you don't have time for FOIA* + +* The Second Mile - child abuse case. Database of Penn. state police - sex offenders register +* Firebug - get under the hood of the website. Chrome in network tab, run search on the website. 'POST' request to the server, parameters that it is feeding to the servers +* Used free tool called 'ie unit' - quick focus interface as you navigate through the page it goes into the page and figures out what js is running as you click through. Write script then run it and it will download the data +* Helium scraper - basic version of software for $99. 10 day trial, point and click tool + +*When there is no one to ask or you don't want them to know you are doing it* + +* Counterfeit pharma coming out of China + +*When you want regular updates* + +* Crime numbers - good for police reporters, provide context in your reporting + +*Cautions:* + +* How do you know you got them all? 
+* Look for the hidden treasure - 'download dataset' +* Involve the lawyers, be mindful of legal implications From 9ca63eca3cb465c86537307874362a67d730b509 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Sun, 2 Mar 2014 16:03:07 +0000 Subject: [PATCH 23/26] Notes from session --- sun/feel_like_hacking_without_really_doing_it.md | 4 ++++ 1 file changed, 4 insertions(+) create mode 100644 sun/feel_like_hacking_without_really_doing_it.md diff --git a/sun/feel_like_hacking_without_really_doing_it.md b/sun/feel_like_hacking_without_really_doing_it.md new file mode 100644 index 0000000..30f0383 --- /dev/null +++ b/sun/feel_like_hacking_without_really_doing_it.md @@ -0,0 +1,4 @@ +### Feel like hacking without really doing it + +[Samantha Sunne's website](http://www.samanthasunne.com/nicar14/) + From 10a21a4e859a08460942e0cc0cfe8ee7bbbb2394 Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Sun, 2 Mar 2014 16:03:14 +0000 Subject: [PATCH 24/26] Notes from session --- sun/where_do_you_want_your_career_to_go?.md | 40 +++++++++++++++++++++ 1 file changed, 40 insertions(+) create mode 100644 sun/where_do_you_want_your_career_to_go?.md diff --git a/sun/where_do_you_want_your_career_to_go?.md b/sun/where_do_you_want_your_career_to_go?.md new file mode 100644 index 0000000..83d4fb8 --- /dev/null +++ b/sun/where_do_you_want_your_career_to_go?.md @@ -0,0 +1,40 @@ +### Where do you want your career to go? + +Anthony DeBarros (Gannett Digital) +Megan Luther (IRE/NICAR) +Lea Thompson (Independent Journalist) +Sisi Wei (ProPublica) +Ellen Gabler (Milwaukee Journal Sentinel) + +**Two tips from each panel member:** + +* New jobs found by meeting people, not by sending in CV +* Keep on learning new skills +* Computer Assisted Reporting essential but don't forget to polish your writing skills. No one will read your investigation if it's not an enjoyable read. 
Go back to basics +* Be prepared to learn additional skills on evenings and weekends +* Understand the organisation you are going to work for - who are the people in the organisation that can get things done? Find a good champion for your work and listen to them +* Be willing to do the jobs that other people may not want to do - managers appreciate people who go the extra mile +* Get to know everyone on the team and try your best to make a good impression (esp when interning) +* If you are interested in learning code, you should be prepared to put in extra time outside of work +* Learn video - very important in getting a job: how do you get excel into pictures? You have to put it into pictures + +**How to get unstuck when you feel stuck in a job:** + +* Remain positive at all times. Don't complain. Be a builder and don't badmouth your colleagues +* Journalism isn't a 9-5, you must devote some of your own time. Your energy level and how much you want to work at this has everything to do with your success +* When learning code, don't feel disheartened! You can do it! +* Be active, not passive - have a long term vision of where you want to be and build step by step +* Use IRE tipsheets +* Be an ideas person, have imagination +* If you feel overwhelmed it's okay to take a break and figure out what is important to you +* Extra Extra blog on IRE +* Don't be afraid to brag and share your work + +**How to get better at something you're not good at:** + +* Don't forget to keep on working on the things you have learned here otherwise you will forget +* Do the hard things, don't stick to the stuff you feel confident in +* Don't try to teach yourself everything at once. Don't try to learn an entire system, start small and build on the previous project eg. 
bar chart using html, static then move towards dynamic next time +* Work on your networking skills + +*Ira Glass on Storytelling - Vimeo* \ No newline at end of file From 398e6e47f9c85256dda6d8dfc33fcf36d832f15b Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Sun, 2 Mar 2014 16:23:27 +0000 Subject: [PATCH 25/26] Feel like hacking notes --- sun/feel_like_hacking_without_really_doing_it.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/sun/feel_like_hacking_without_really_doing_it.md b/sun/feel_like_hacking_without_really_doing_it.md index 30f0383..2ba460a 100644 --- a/sun/feel_like_hacking_without_really_doing_it.md +++ b/sun/feel_like_hacking_without_really_doing_it.md @@ -2,3 +2,12 @@ [Samantha Sunne's website](http://www.samanthasunne.com/nicar14/) +Examples and tutorials on the website + +* API console - look at the raw data ourselves: [Apigee](https://apigee.com) +* Twiangulate - who people have in common + +[Find out exif data from photographs](regex.info/exif.cgi) + +* Creepy but useful for making sure pictures were taken where they said they were +* You might want to strip the metadata out of your pictures before uploading them if you don't want people to know where you are, or take a screenshot of the photo From 38c5cec4d934cfaea2d07f590bfa7cc287f57f6b Mon Sep 17 00:00:00 2001 From: Katie Carnie Date: Sun, 2 Mar 2014 16:23:43 +0000 Subject: [PATCH 26/26] Updated career roundtable notes --- sun/where_do_you_want_your_career_to_go?.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sun/where_do_you_want_your_career_to_go?.md b/sun/where_do_you_want_your_career_to_go?.md index 83d4fb8..3f1a449 100644 --- a/sun/where_do_you_want_your_career_to_go?.md +++ b/sun/where_do_you_want_your_career_to_go?.md @@ -37,4 +37,5 @@ Ellen Gabler (Milwaukee Journal Sentinel) * Don't try to teach yourself everything at once. Don't try to learn an entire system, start small and build on the previous project eg. 
bar chart using html, static then move towards dynamic next time * Work on your networking skills -*Ira Glass on Storytelling - Vimeo* \ No newline at end of file +*Ira Glass on Storytelling - Vimeo* +Everyone feels like they suck sometimes!