comcast4-lambda-parallization
I have used many AWS services. Of all the AWS offerings, I like Lambda the most.
Lambda itself has some hard limits. The two biggest are time and space. A single invocation can run for at most 15 minutes (as of the last time I used it), and space is capped at roughly 10 GB.
You can give a function more memory to get more compute, but time and space remain the biggest constraints if you want to do anything fancy in Lambda.
Lambda is not for fancy or large jobs. It shines when you use it as glue between many operations.
I have used Lambda to automate data operations, e.g., data availability checks, EMR launches, and quick file processing.
One day, a guy came to me about parallelizing some Python code with Spark. His idea was to have Spark call his Python code in parallel. That does not work very well: Spark is great at data processing, but it is not great as a general-purpose parallel execution engine. I made it work with Lambda and SNS instead. I put an SNS trigger on the S3 output, and on each trigger I launched Lambdas en masse. With this, I was able to run his work in less than 15 minutes. With his original idea of running everything in Spark, he could not even get the job to finish.
There is one downside, however: launching too many Lambdas at the same time can cripple your whole AWS operation, because many jobs you run in AWS rely on Lambda without your knowing it, and they all share the same account-level concurrency limit. I worked around this with a different mechanism. But the core idea stands: use Lambda when you want to push parallel execution to the extreme.
Nowadays there are some up-and-coming Python frameworks for parallel execution, and some seem quite promising, e.g., Ray.
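As a rough illustration of the S3 to SNS to Lambda fan-out described above, here is a minimal sketch of what the Lambda side might look like. It is not the code I actually ran. It assumes the S3 bucket publishes object-created notifications to an SNS topic that the function subscribes to, and `process_object` is a hypothetical placeholder for the per-file Python work. Each notification triggers its own invocation, which is where the parallelism comes from.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an SNS-delivered S3 notification."""
    objects = []
    for record in event.get("Records", []):
        # SNS wraps the original S3 notification as a JSON string.
        s3_event = json.loads(record["Sns"]["Message"])
        # S3 test events carry no "Records", so default to an empty list.
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects


def process_object(bucket, key):
    """Hypothetical placeholder for the per-file Python work."""
    return f"processed s3://{bucket}/{key}"


def handler(event, context):
    """Lambda entry point: one invocation handles the files in one notification."""
    return [process_object(bucket, key) for bucket, key in extract_s3_objects(event)]
```

The point of the design is that the function itself stays small and stateless. The fan-out happens in the trigger, not in the code, so each file written to S3 gets its own short-lived worker that fits comfortably inside the 15-minute limit.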