sparksteps package

Submodules

sparksteps.cluster module

Create EMR cluster.

sparksteps.cluster.emr_config(release_label, keep_alive=False, **kw)[source]

sparksteps.cluster.parse_conf(raw_conf_list)[source]

Parse configuration items for spark-defaults.
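
To illustrate the kind of input involved, the sketch below splits `key=value` strings and wraps them in the `spark-defaults` classification block that EMR accepts. The helper name `parse_conf_sketch` and its return shape are assumptions made for illustration; they are not taken from `parse_conf` itself.

```python
def parse_conf_sketch(raw_conf_list):
    """Hypothetical helper: turn ['k=v', ...] into an EMR spark-defaults block."""
    properties = {}
    for item in raw_conf_list:
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = item.partition('=')
        properties[key.strip()] = value.strip()
    return [{'Classification': 'spark-defaults', 'Properties': properties}]


conf = parse_conf_sketch(['spark.executor.memory=4g',
                          'spark.driver.extraJavaOptions=-Dfoo=bar'])
```

Splitting on the first `=` only keeps Java-style options such as `-Dfoo=bar` intact in the value.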

sparksteps.cluster.parse_tags(raw_tags_list)[source]

Parse AWS tags.

Examples:
>>> from pprint import pprint
>>> pprint(parse_tags(['name="Peanut Pug"', 'age=5']))
[{'Key': 'name', 'Value': '"Peanut Pug"'}, {'Key': 'age', 'Value': '5'}]

sparksteps.pricing module

Get optimal pricing for EC2 instances.

class sparksteps.pricing.Spot(availability_zone, timestamp, price)

Bases: tuple

availability_zone

Alias for field number 0

price

Alias for field number 2

timestamp

Alias for field number 1

class sparksteps.pricing.Zone(name, max, min, mean, current)

Bases: tuple

current

Alias for field number 4

max

Alias for field number 1

mean

Alias for field number 3

min

Alias for field number 2

name

Alias for field number 0
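
The field aliases listed for `Spot` and `Zone` are what `collections.namedtuple` generates. A minimal sketch that recreates both types from the documented field order (the sample values are made up):

```python
from collections import namedtuple

# Field order taken from the class signatures documented above.
Spot = namedtuple('Spot', 'availability_zone timestamp price')
Zone = namedtuple('Zone', 'name max min mean current')

# Illustrative values only.
zone = Zone(name='us-east-1a', max=0.30, min=0.10, mean=0.20, current=0.15)
```

Each field can be read by name (`zone.current`) or by position (`zone[4]`), which is what "Alias for field number 4" refers to.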

sparksteps.pricing.determine_best_price(demand_price, aws_zone)[source]

Calculate optimal bid price.

Args:
demand_price (float): on-demand cost of AWS instance
aws_zone (Zone): AWS zone namedtuple ('name max min mean current')
Returns:
float: bid price
bool: boolean to use spot pricing
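
The rule used to pick the bid is not described here. As a rough sketch of how a `(bid price, use spot)` pair could be derived from an on-demand price and a `Zone` profile, the hypothetical heuristic below bids a fixed premium over the recent maximum spot price and falls back to on-demand when that bid gives no saving. The function name `best_price_sketch` and the `premium` parameter are assumptions, not part of sparksteps.

```python
from collections import namedtuple

Zone = namedtuple('Zone', 'name max min mean current')


def best_price_sketch(demand_price, aws_zone, premium=1.2):
    """Hypothetical heuristic, not the library's rule.

    Bid slightly above the recent maximum spot price; if that bid is not
    cheaper than on-demand, fall back to the on-demand price.
    """
    bid = round(aws_zone.max * premium, 2)
    if bid < demand_price:
        return bid, True        # use spot pricing
    return demand_price, False  # spot offers no saving


# Illustrative zone profiles.
cheap = Zone('us-east-1a', max=0.10, min=0.05, mean=0.07, current=0.06)
pricey = Zone('us-east-1b', max=0.50, min=0.30, mean=0.40, current=0.45)
```

With an on-demand price of 0.40, the first zone yields a spot bid and the second falls back to on-demand.
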
sparksteps.pricing.get_bid_price(client, instance_type)[source]

Determine AWS bid price.

Args:
client: boto3 client
instance_type: EC2 instance type
Returns:
float: bid price
bool: whether to use spot pricing
Examples:
>>> import boto3
>>> client = boto3.client('ec2', region_name='us-east-1')
>>> print(get_bid_price(client, 'm3.2xlarge'))

sparksteps.pricing.get_demand_price(aws_region, instance_type)[source]

Get AWS instance demand price.

Examples:
>>> print(get_demand_price('us-east-1', 'm4.2xlarge'))

sparksteps.pricing.get_spot_price_history(ec2_client, instance_type, lookback=1)[source]

Return dictionary of price history by availability zone.

Args:
ec2_client: EC2 client
instance_type (str): get results by the specified instance type
lookback (int): number of hours to look back for spot history
Returns:
float: bid price for the instance type.
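
A minimal sketch of how spot price records for a look-back window could be requested through the EC2 client's `describe_spot_price_history` call. The helper name `spot_history_sketch`, the `'Linux/UNIX'` product filter, and the `FakeEC2Client` stand-in (used so the example runs without AWS credentials) are assumptions; pagination via `NextToken` is ignored.

```python
import datetime


def spot_history_sketch(ec2_client, instance_type, lookback=1):
    """Hypothetical helper: fetch raw spot price records for the last `lookback` hours."""
    end_time = datetime.datetime.now(datetime.timezone.utc)
    start_time = end_time - datetime.timedelta(hours=lookback)
    response = ec2_client.describe_spot_price_history(
        StartTime=start_time,
        EndTime=end_time,
        InstanceTypes=[instance_type],
        ProductDescriptions=['Linux/UNIX'],  # assumed filter
    )
    return response['SpotPriceHistory']


class FakeEC2Client:
    """Offline stand-in for boto3.client('ec2'); records the arguments it receives."""

    def describe_spot_price_history(self, **kwargs):
        self.kwargs = kwargs
        return {'SpotPriceHistory': [
            {'AvailabilityZone': 'us-east-1a', 'SpotPrice': '0.1200',
             'InstanceType': kwargs['InstanceTypes'][0]},
        ]}


fake = FakeEC2Client()
history = spot_history_sketch(fake, 'm3.2xlarge', lookback=2)
```

With a real client, replace `fake` with `boto3.client('ec2', region_name='us-east-1')`.
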
sparksteps.pricing.get_zone_profile(zone_history)[source]

sparksteps.pricing.price_by_zone(price_history)[source]
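
Neither function is described here. Going by their names and the `Zone` tuple, one plausible reading is that spot records are grouped by availability zone and each group is summarised as a `Zone`. The sketch below follows that reading; the function names, the record keys (which match the EC2 `SpotPriceHistory` response), and the choice of the first price as `current` are all assumptions.

```python
from collections import defaultdict, namedtuple
from statistics import mean

Zone = namedtuple('Zone', 'name max min mean current')


def price_by_zone_sketch(price_history):
    """Hypothetical: group spot prices by availability zone, keeping input order."""
    prices = defaultdict(list)
    for item in price_history:
        prices[item['AvailabilityZone']].append(float(item['SpotPrice']))
    return dict(prices)


def zone_profile_sketch(zone_history):
    """Hypothetical: summarise each zone's prices as a Zone tuple.

    Assumes the first price in each list is the most recent one.
    """
    return [Zone(name, max(p), min(p), mean(p), p[0])
            for name, p in zone_history.items()]


# Illustrative records shaped like EC2 spot price history entries.
records = [
    {'AvailabilityZone': 'us-east-1a', 'SpotPrice': '0.25'},
    {'AvailabilityZone': 'us-east-1a', 'SpotPrice': '0.75'},
    {'AvailabilityZone': 'us-east-1b', 'SpotPrice': '0.5'},
]
profiles = zone_profile_sketch(price_by_zone_sketch(records))
```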

sparksteps.steps module

Create EMR steps and upload files.

class sparksteps.steps.CmdStep[source]

Bases: object

cmd
on_failure = 'CANCEL_AND_WAIT'
step
step_name
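
The attributes above suggest a base class whose subclasses provide a command and a name, from which an EMR step definition is built. A minimal sketch of that pattern, assuming the step is run through `command-runner.jar`; the class names `CmdStepSketch` and `EchoStep` are hypothetical and the real implementation may differ.

```python
class CmdStepSketch:
    """Hypothetical base class: subclasses supply `cmd` (and optionally `step_name`)."""

    on_failure = 'CANCEL_AND_WAIT'

    @property
    def step_name(self):
        return self.__class__.__name__

    @property
    def cmd(self):
        raise NotImplementedError

    @property
    def step(self):
        # Dictionary shape accepted by the EMR API for a command-runner step.
        return {
            'Name': self.step_name,
            'ActionOnFailure': self.on_failure,
            'HadoopJarStep': {'Jar': 'command-runner.jar', 'Args': self.cmd},
        }


class EchoStep(CmdStepSketch):
    """Illustrative subclass, not part of sparksteps."""

    on_failure = 'CONTINUE'

    @property
    def cmd(self):
        return ['echo', 'hello']


step = EchoStep().step
```

Overriding `on_failure` per subclass mirrors how `DebugStep` and `S3DistCp` below declare different failure actions.
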
class sparksteps.steps.CopyStep(bucket, filename)[source]

Bases: sparksteps.steps.CmdStep

cmd
key
s3_uri
step_name
class sparksteps.steps.DebugStep[source]

Bases: sparksteps.steps.CmdStep

cmd
on_failure = 'TERMINATE_CLUSTER'
step_name
class sparksteps.steps.S3DistCp(s3_dist_cp)[source]

Bases: sparksteps.steps.CmdStep

cmd
on_failure = 'CONTINUE'
step_name
class sparksteps.steps.SparkStep(app_path, submit_args=None, app_args=None)[source]

Bases: sparksteps.steps.CmdStep

cmd
remote_app
step_name
class sparksteps.steps.UnzipStep(dirpath)[source]

Bases: sparksteps.steps.CmdStep

cmd
dirname
remote_dirpath
remote_zipfile
step_name
zipfile
sparksteps.steps.get_basename(path)[source]
sparksteps.steps.ls_recursive(dirname)[source]

Recursively list files in a directory.
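
A recursive listing of this kind is commonly written with `os.walk`. The sketch below is one such version under the hypothetical name `ls_recursive_sketch`; whether the real function returns a list or a generator is not stated here.

```python
import os
import tempfile


def ls_recursive_sketch(dirname):
    """Hypothetical: yield the path of every file under `dirname`."""
    for root, _dirs, files in os.walk(dirname):
        for name in files:
            yield os.path.join(root, name)


# Build a small throwaway tree to walk.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    for rel in ('a.txt', os.path.join('sub', 'b.txt')):
        open(os.path.join(tmp, rel), 'w').close()
    found = sorted(os.path.relpath(p, tmp) for p in ls_recursive_sketch(tmp))
```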

sparksteps.steps.setup_steps(s3, bucket, app_path, submit_args=None, app_args=None, uploads=None, s3_dist_cp=None)[source]
sparksteps.steps.upload_steps(s3_resource, bucket, path)[source]

Upload files to S3 and get steps.

sparksteps.steps.zip_to_s3(s3_resource, dirpath, bucket, key)[source]

Zip folder and upload to S3.
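
One way to zip a folder and upload it is to build the archive in memory and write it with the S3 resource's `Object(...).put(Body=...)`. The sketch below does that under the hypothetical name `zip_to_s3_sketch`; `FakeS3` is a stand-in for `boto3.resource('s3')` so the example runs offline, and the bucket and key names are made up.

```python
import io
import os
import tempfile
import zipfile


def zip_to_s3_sketch(s3_resource, dirpath, bucket, key):
    """Hypothetical: zip `dirpath` in memory and upload it to s3://bucket/key."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(dirpath):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the zipped folder.
                archive.write(full, os.path.relpath(full, dirpath))
    s3_resource.Object(bucket, key).put(Body=buffer.getvalue())


class FakeS3:
    """Offline stand-in for boto3.resource('s3'); keeps uploads in a dict."""

    def __init__(self):
        self.store = {}

    def Object(self, bucket, key):
        fake = self

        class _Obj:
            def put(self, Body):
                fake.store[(bucket, key)] = Body

        return _Obj()


s3 = FakeS3()
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'main.py'), 'w') as f:
        f.write('print("hi")\n')
    zip_to_s3_sketch(s3, tmp, 'my-bucket', 'sources/app.zip')

uploaded = zipfile.ZipFile(io.BytesIO(s3.store[('my-bucket', 'sources/app.zip')]))
```

Building the archive in memory avoids a temporary file but is only suitable for folders that fit comfortably in RAM.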

Module contents