sparksteps package
Submodules
sparksteps.cluster module
Create EMR cluster.
Parse AWS tags of the form key=value into a list of Key/Value dicts.
Examples:
   >>> from pprint import pprint
   >>> pprint(parse_tags(['name="Peanut Pug"', 'age=5']))
   [{'Key': 'name', 'Value': '"Peanut Pug"'}, {'Key': 'age', 'Value': '5'}]
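The parsing shown in the example can be reproduced with a few lines of Python. The following is a minimal sketch, not the library's implementation: the name `parse_tags_sketch` is hypothetical, and splitting on the first `=` only is an assumption inferred from the example output.

```python
def parse_tags_sketch(raw_tags):
    """Turn ['key=value', ...] into [{'Key': key, 'Value': value}, ...].

    Hypothetical stand-in for parse_tags; behaviour is inferred from the
    documented example only.
    """
    parsed = []
    for raw in raw_tags:
        # Split on the first '=' only, so a value may itself contain '='.
        key, sep, value = raw.partition('=')
        if not sep:
            raise ValueError('expected key=value, got %r' % raw)
        parsed.append({'Key': key, 'Value': value})
    return parsed


print(parse_tags_sketch(['name="Peanut Pug"', 'age=5']))
```

Quotes inside a value are kept verbatim here, which matches the `'"Peanut Pug"'` value in the documented output.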
sparksteps.pricing module
Get optimal pricing for EC2 instances.

class sparksteps.pricing.Spot(availability_zone, timestamp, price)
   Bases: tuple

   availability_zone
      Alias for field number 0

   price
      Alias for field number 2

   timestamp
      Alias for field number 1

class sparksteps.pricing.Zone(name, max, min, mean, current)
   Bases: tuple

   current
      Alias for field number 4

   max
      Alias for field number 1

   mean
      Alias for field number 3

   min
      Alias for field number 2

   name
      Alias for field number 0
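Both classes are named tuples, so each field can be read by name or by the position listed as its alias. The sketch below rebuilds equivalent types with `collections.namedtuple` to show this; the sample zone name, timestamp, and prices are made up for illustration.

```python
from collections import namedtuple

# Field order matches the "Alias for field number N" entries above.
Spot = namedtuple('Spot', 'availability_zone timestamp price')
Zone = namedtuple('Zone', 'name max min mean current')

# Illustrative values only.
spot = Spot('us-east-1a', '2017-01-01T00:00:00Z', 0.12)
zone = Zone(name='us-east-1a', max=0.20, min=0.10, mean=0.15, current=0.12)

# Access by name and by index refer to the same field.
print(spot.price == spot[2])
print(zone.current == zone[4])

# Named tuples also unpack like ordinary tuples.
name, high, low, average, current = zone
print(name, high, low, average, current)
```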

sparksteps.pricing.determine_best_price(demand_price, aws_zone)
   Calculate optimal bid price.

   Args:
      demand_price (float): on-demand cost of AWS instance
      aws_zone (Zone): AWS zone namedtuple ('name max min mean current')

   Returns:
      float: bid price
      bool: whether to use spot pricing
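The page does not say how the bid price is chosen. The sketch below shows one plausible heuristic with the same inputs and the same (float, bool) result shape. The function name `sketch_best_price`, the `threshold` parameter, and the doubling rule are all assumptions for illustration, not the library's algorithm.

```python
from collections import namedtuple

Zone = namedtuple('Zone', 'name max min mean current')


def sketch_best_price(demand_price, aws_zone, threshold=0.8):
    """Return (bid_price, use_spot) under a hypothetical heuristic."""
    # Never pay more than this fraction of the on-demand price for spot.
    ceiling = demand_price * threshold
    if aws_zone.current >= ceiling:
        # Spot is too close to on-demand to justify the interruption risk.
        return demand_price, False
    # Bid above the current spot price for headroom, capped at the ceiling.
    return min(2 * aws_zone.current, ceiling), True


cheap = Zone('us-east-1a', max=0.15, min=0.08, mean=0.11, current=0.10)
pricey = Zone('us-east-1b', max=0.95, min=0.70, mean=0.85, current=0.90)
print(sketch_best_price(1.0, cheap))
print(sketch_best_price(1.0, pricey))
```

Returning the on-demand price together with `False` lets a caller use the same tuple whether or not spot instances end up being requested.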

sparksteps.pricing.get_bid_price(client, instance_type)
   Determine AWS bid price.

   Args:
      client: boto3 client
      instance_type: EC2 instance type

   Returns:
      float: bid price
      bool: whether to use spot pricing

   Examples:
      >>> import boto3
      >>> client = boto3.client('ec2', region_name='us-east-1')
      >>> print(get_bid_price(client, 'm3.2xlarge'))

sparksteps.pricing.get_demand_price(aws_region, instance_type)
   Get the on-demand price of an AWS instance.

   >>> print(get_demand_price('us-east-1', 'm4.2xlarge'))

sparksteps.pricing.get_spot_price_history(ec2_client, instance_type, lookback=1)
   Return dictionary of price history by availability zone.

   Args:
      ec2_client: EC2 client
      instance_type (str): get results by the specified instance type
      lookback (int): number of hours to look back for spot history

   Returns:
      dict: spot price history keyed by availability zone
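Grouping price records by availability zone and reducing each group to a `Zone` summary can be sketched without any AWS calls. The helpers `group_by_zone` and `summarize_zone` below are hypothetical, the sample records are invented, and how the library actually derives `Zone` values is not stated on this page.

```python
from collections import defaultdict, namedtuple
from statistics import mean

Spot = namedtuple('Spot', 'availability_zone timestamp price')
Zone = namedtuple('Zone', 'name max min mean current')


def group_by_zone(spots):
    """Map each availability zone name to its list of Spot records."""
    history = defaultdict(list)
    for spot in spots:
        history[spot.availability_zone].append(spot)
    return dict(history)


def summarize_zone(name, spots):
    """Reduce one zone's records to a Zone; 'current' is the latest price."""
    # ISO 8601 timestamps in the same format sort chronologically as strings.
    ordered = sorted(spots, key=lambda s: s.timestamp)
    prices = [s.price for s in ordered]
    return Zone(name, max(prices), min(prices), mean(prices), prices[-1])


# Invented sample data for illustration.
records = [
    Spot('us-east-1a', '2017-01-01T01:00:00Z', 0.14),
    Spot('us-east-1a', '2017-01-01T00:00:00Z', 0.10),
    Spot('us-east-1b', '2017-01-01T00:30:00Z', 0.20),
]
history = group_by_zone(records)
zones = [summarize_zone(name, spots) for name, spots in sorted(history.items())]
for zone in zones:
    print(zone)
```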
sparksteps.steps module
Create EMR steps and upload files.

class sparksteps.steps.CmdStep
   Bases: object

   cmd

   on_failure = 'CANCEL_AND_WAIT'

   step

   step_name

class sparksteps.steps.CopyStep(bucket, filename)
   Bases: sparksteps.steps.CmdStep

   cmd

   key

   s3_uri

   step_name

class sparksteps.steps.DebugStep
   Bases: sparksteps.steps.CmdStep

   cmd

   on_failure = 'TERMINATE_CLUSTER'

   step_name

class sparksteps.steps.S3DistCp(s3_dist_cp)
   Bases: sparksteps.steps.CmdStep

   cmd

   on_failure = 'CONTINUE'

   step_name

class sparksteps.steps.SparkStep(app_path, submit_args=None, app_args=None)
   Bases: sparksteps.steps.CmdStep

   cmd

   remote_app

   step_name

class sparksteps.steps.UnzipStep(dirpath)
   Bases: sparksteps.steps.CmdStep

   cmd

   dirname

   remote_dirpath

   remote_zipfile

   step_name

   zipfile
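The step classes above share `cmd`, `step_name`, `on_failure`, and `step` attributes, which suggests a base class that assembles an EMR step definition from whatever command a subclass supplies. The sketch below illustrates that pattern under stated assumptions: the `Sketch*` class names, the use of the class name as `step_name`, the `aws s3 cp` command, and the `/home/hadoop/` destination are all hypothetical. The dict layout (`Name`, `ActionOnFailure`, `HadoopJarStep`) follows the step format accepted by the EMR API.

```python
class SketchCmdStep:
    """Hypothetical base class: builds an EMR step dict from a command."""

    on_failure = 'CANCEL_AND_WAIT'

    @property
    def step_name(self):
        # Assumption: use the class name as the step's display name.
        return self.__class__.__name__

    @property
    def cmd(self):
        raise NotImplementedError('subclasses supply the command as a list')

    @property
    def step(self):
        return {
            'Name': self.step_name,
            'ActionOnFailure': self.on_failure,
            'HadoopJarStep': {
                # command-runner.jar runs an arbitrary command on the cluster.
                'Jar': 'command-runner.jar',
                'Args': self.cmd,
            },
        }


class SketchCopyStep(SketchCmdStep):
    """Hypothetical step that copies one S3 object onto the master node."""

    def __init__(self, bucket, filename):
        self.bucket = bucket
        self.filename = filename

    @property
    def s3_uri(self):
        return 's3://%s/%s' % (self.bucket, self.filename)

    @property
    def cmd(self):
        # Assumed destination directory; adjust for the actual cluster.
        return ['aws', 's3', 'cp', self.s3_uri, '/home/hadoop/']


copy_step = SketchCopyStep('my-bucket', 'app.py').step
print(copy_step)
```

A subclass changes failure handling by overriding the `on_failure` class attribute, as `DebugStep` and `S3DistCp` do in the listing above.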