sparksteps package
Submodules
sparksteps.cluster module
Create EMR cluster.
Parse AWS tags.
- Examples:
  >>> from pprint import pprint
  >>> pprint(parse_tags(['name="Peanut Pug"', 'age=5']))
  [{'Key': 'name', 'Value': '"Peanut Pug"'}, {'Key': 'age', 'Value': '5'}]
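The documented example suggests that each tag string is split at its first `=` and that any quotes in the value are kept verbatim. A minimal sketch of that behaviour follows; `parse_tags_sketch` is a hypothetical name, and the package's actual implementation may differ.

```python
def parse_tags_sketch(raw_tags):
    """Turn 'key=value' strings into AWS-style tag dictionaries.

    Hypothetical re-implementation based only on the documented example.
    """
    tags = []
    for raw in raw_tags:
        # Split on the first '=' only, so the value is kept exactly as given.
        key, value = raw.split('=', 1)
        tags.append({'Key': key, 'Value': value})
    return tags


print(parse_tags_sketch(['name="Peanut Pug"', 'age=5']))
```

The resulting list of `Key`/`Value` dictionaries is the shape AWS APIs generally expect for resource tags.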
sparksteps.pricing module
Get optimal pricing for EC2 instances.
- class sparksteps.pricing.Spot(availability_zone, timestamp, price)
  Bases: tuple
  - availability_zone: Alias for field number 0
  - price: Alias for field number 2
  - timestamp: Alias for field number 1
- class sparksteps.pricing.Zone(name, max, min, mean, current)
  Bases: tuple
  - current: Alias for field number 4
  - max: Alias for field number 1
  - mean: Alias for field number 3
  - min: Alias for field number 2
  - name: Alias for field number 0
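Both classes are documented as tuple subclasses whose attributes alias positional fields, which is the shape `collections.namedtuple` produces. The sketch below rebuilds equivalent types to show how attribute and index access line up; the sample values are made up for illustration and are not real AWS prices.

```python
from collections import namedtuple

# Field order mirrors the documented "Alias for field number N" entries.
Spot = namedtuple('Spot', 'availability_zone timestamp price')
Zone = namedtuple('Zone', 'name max min mean current')

# Illustrative values only.
zone = Zone(name='us-east-1a', max=0.30, min=0.10, mean=0.20, current=0.15)

print(zone.current)  # access by attribute name
print(zone[4])       # same field, accessed by position
```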
- sparksteps.pricing.determine_best_price(demand_price, aws_zone)
  Calculate optimal bid price.
  - Args:
    - demand_price (float): on-demand cost of AWS instance
    - aws_zone (Zone): AWS zone namedtuple ('name max min mean current')
  - Returns:
    - float: bid price
    - bool: whether to use spot pricing
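The reference gives only the inputs and the `(float, bool)` return contract, not the rule used to pick the bid. The sketch below illustrates that contract with an invented heuristic: `best_price_sketch`, `demand_factor`, and `bid_multiplier` are all hypothetical and should not be read as the package's actual logic.

```python
from collections import namedtuple

Zone = namedtuple('Zone', 'name max min mean current')


def best_price_sketch(demand_price, aws_zone,
                      demand_factor=0.8, bid_multiplier=2.0):
    """Return (bid_price, use_spot) using a made-up heuristic."""
    ceiling = demand_price * demand_factor
    # If spot already costs nearly as much as on-demand, fall back to on-demand.
    if aws_zone.current >= ceiling:
        return demand_price, False
    # Otherwise bid above the current spot price, capped below on-demand.
    bid = min(aws_zone.current * bid_multiplier, ceiling)
    return round(bid, 2), True


cheap = Zone('us-east-1a', max=0.30, min=0.05, mean=0.12, current=0.10)
print(best_price_sketch(0.50, cheap))
```

Returning the on-demand price together with `False` lets a caller use the same pair of values whether or not a spot request is made.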
- sparksteps.pricing.get_bid_price(client, instance_type)
  Determine AWS bid price.
  - Args:
    - client: boto3 client
    - instance_type: EC2 instance type
  - Returns:
    - float: bid price
    - bool: whether to use spot pricing
  - Examples:
    >>> import boto3
    >>> client = boto3.client('ec2', region_name='us-east-1')
    >>> print(get_bid_price(client, 'm3.2xlarge'))
- sparksteps.pricing.get_demand_price(aws_region, instance_type)
  Get AWS instance demand price.
  >>> print(get_demand_price('us-east-1', 'm4.2xlarge'))
- sparksteps.pricing.get_spot_price_history(ec2_client, instance_type, lookback=1)
  Return dictionary of price history by availability zone.
  - Args:
    - ec2_client: EC2 client
    - instance_type (str): get results by the specified instance type
    - lookback (int): number of hours to look back for spot history
  - Returns:
    - float: bid price for the instance type.
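One way to build a per-zone price history is boto3's EC2 `describe_spot_price_history` call, grouping the returned records by availability zone. The sketch below assumes that approach; `spot_history_sketch` is a hypothetical name, the `Linux/UNIX` product filter is an assumption, and only the first page of results is read.

```python
import datetime
from collections import defaultdict, namedtuple

Spot = namedtuple('Spot', 'availability_zone timestamp price')


def spot_history_sketch(ec2_client, instance_type, lookback=1):
    """Group recent spot prices by availability zone (first page only)."""
    end = datetime.datetime.now(datetime.timezone.utc)
    start = end - datetime.timedelta(hours=lookback)
    response = ec2_client.describe_spot_price_history(
        InstanceTypes=[instance_type],
        ProductDescriptions=['Linux/UNIX'],  # assumed product filter
        StartTime=start,
        EndTime=end,
    )
    history = defaultdict(list)
    for record in response['SpotPriceHistory']:
        # The API returns SpotPrice as a string, so convert it to float.
        spot = Spot(record['AvailabilityZone'], record['Timestamp'],
                    float(record['SpotPrice']))
        history[spot.availability_zone].append(spot)
    return dict(history)
```

With a real client this would be called as `spot_history_sketch(boto3.client('ec2', region_name='us-east-1'), 'm3.2xlarge')`, which requires AWS credentials and network access.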
sparksteps.steps module
Create EMR steps and upload files.
- class sparksteps.steps.CmdStep
  Bases: object
  - cmd
  - on_failure = 'CANCEL_AND_WAIT'
  - step
  - step_name
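The attributes listed for `CmdStep` (`cmd`, `on_failure`, `step`, `step_name`) match the pieces of an EMR step definition. The sketch below shows one way such a base class could assemble the dictionary that boto3's EMR `add_job_flow_steps` accepts. `CmdStepSketch` and `EchoStep` are hypothetical, and the use of `command-runner.jar` is an assumption rather than something the reference states.

```python
class CmdStepSketch:
    """Hypothetical base class: wraps a command as an EMR step definition."""

    on_failure = 'CANCEL_AND_WAIT'

    @property
    def step_name(self):
        return self.__class__.__name__

    @property
    def cmd(self):
        # Subclasses supply the command as a list of arguments.
        raise NotImplementedError

    @property
    def step(self):
        # Shape accepted by boto3's EMR add_job_flow_steps(Steps=[...]).
        return {
            'Name': self.step_name,
            'ActionOnFailure': self.on_failure,
            'HadoopJarStep': {
                'Jar': 'command-runner.jar',  # assumed runner
                'Args': self.cmd,
            },
        }


class EchoStep(CmdStepSketch):
    """Toy subclass used only to show how the pieces fit together."""

    on_failure = 'CONTINUE'

    @property
    def cmd(self):
        return ['echo', 'hello']


print(EchoStep().step)
```

Keeping `on_failure` as a class attribute lets each subclass override the failure policy, which mirrors how the documented subclasses list different `on_failure` values.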
- class sparksteps.steps.CopyStep(bucket, filename)
  Bases: sparksteps.steps.CmdStep
  - cmd
  - key
  - s3_uri
  - step_name
- class sparksteps.steps.DebugStep
  Bases: sparksteps.steps.CmdStep
  - cmd
  - on_failure = 'TERMINATE_CLUSTER'
  - step_name
- class sparksteps.steps.S3DistCp(s3_dist_cp)
  Bases: sparksteps.steps.CmdStep
  - cmd
  - on_failure = 'CONTINUE'
  - step_name
- class sparksteps.steps.SparkStep(app_path, submit_args=None, app_args=None)
  Bases: sparksteps.steps.CmdStep
  - cmd
  - remote_app
  - step_name
- class sparksteps.steps.UnzipStep(dirpath)
  Bases: sparksteps.steps.CmdStep
  - cmd
  - dirname
  - remote_dirpath
  - remote_zipfile
  - step_name
  - zipfile