This repository was archived by the owner on Jul 26, 2019. It is now read-only.

Wc.go hadoop prep #2

Merged
eburdon merged 9 commits into eburdon:master from jordan-heemskerk:wc.go_hadoop_prep on Jul 25, 2016

Conversation

@jordan-heemskerk

Contributor
  • Cleaned out a bunch of stuff that shouldn't be in the repo
  • Restructured wc.go to work with Hadoop streaming, first step in getting it to go on EMR
  • Restructured graphbuilder.go to work with the new wc.go outputs

@jordan-heemskerk

Contributor Author

@eburdon this too!

@jordan-heemskerk

Contributor Author

ec01ea0 is the crowning achievement... see this fvbock/trie#2

@eburdon

eburdon commented Jul 25, 2016

Owner

😮 You contributed to open source?! NIIIIIIIIIIIIICE

Just looking through the files now... I was thinking that once this is stable, we'd just fire up the smallest EC2 cluster and run EMR on that instead of spot instances until the 12th. Shouldn't be too expensive and would prevent Lambda from having to configure every time. Thoughts?

@jordan-heemskerk

Contributor Author

> I was thinking that once this is stable, we'd just fire up the smallest EC2 cluster and run EMR on that instead of spot instances until the 12th. Shouldn't be too expensive and would prevent Lambda from having to configure every time. Thoughts?

My thoughts exactly. We can spin up a small EMR cluster and leave it running. Lambda can just submit jobs to it using the API (available for most major languages) and then fetch the results from S3 when they are available.
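As a rough illustration of what Lambda would submit, here is a minimal Go sketch that builds the argument list for one Hadoop streaming step. It assumes the step is run through EMR's `command-runner.jar` with `hadoop-streaming` as the first argument, and that the binary takes a `map` or `reduce` mode argument. The function name and every `s3://example-bucket/...` path are hypothetical, and the actual submission call is left to whichever SDK the caller uses.

```go
package main

import (
	"fmt"
	"strings"
)

// streamingStepArgs builds the arguments for a Hadoop streaming step.
// The "map" / "reduce" mode arguments are an assumption about how the
// binary is invoked; adjust them to match the real command line.
func streamingStepArgs(binaryURI, input, output string) []string {
	// -files ships the binary to each task's working directory, where it
	// is available under its base file name.
	name := binaryURI[strings.LastIndex(binaryURI, "/")+1:]
	return []string{
		"hadoop-streaming",
		"-files", binaryURI,
		"-mapper", name + " map",
		"-reducer", name + " reduce",
		"-input", input,
		"-output", output,
	}
}

func main() {
	// Hypothetical bucket and paths, for illustration only.
	args := streamingStepArgs(
		"s3://example-bucket/bin/wc",
		"s3://example-bucket/input/",
		"s3://example-bucket/output/run-1/",
	)
	for _, a := range args {
		fmt.Printf("%q\n", a)
	}
}
```

The output location should be a fresh path per run, since Hadoop refuses to write into an output directory that already exists; the caller can then read the result files from that prefix in S3 once the step completes.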

@eburdon

eburdon commented Jul 25, 2016

Owner

👍 Just for deleting all the junk alone... I packaged the existing repo just for safety's sake.

Otherwise, looks great, and the plan sounds solid! Merge when ready.

@eburdon

eburdon commented Jul 25, 2016

Owner

To confirm, looks like there's 1 input, 1 output now?

@jordan-heemskerk

Contributor Author

I don't think I have permission to merge in this repo. You can just do it if you want, or gimme god mode :P.

@jordan-heemskerk

Contributor Author

Input and output all happen over STDIN and STDOUT now, as required by Hadoop streaming. There is one input; it controls whether the execution is mapping or reducing.

@eburdon

eburdon commented Jul 25, 2016

Owner

haha ok. Alex can get the command needed to run from lambda from this codebase / readme?

@eburdon merged commit 284b516 into eburdon:master on Jul 25, 2016

@jordan-heemskerk

jordan-heemskerk commented Jul 25, 2016

Contributor Author

> Alex can get the command needed to run from lambda from this codebase / readme?

@eburdon is he in this repo yet? Have him ask me; it's going to depend on what he is calling it from.
