Data-science-on-gcp - Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

Overview

data-science-on-gcp

Source code accompanying the book:

  • Data Science on the Google Cloud Platform, 2nd Edition. Valliappa Lakshmanan. O'Reilly, Jan 2022. Branch: edition2 [being built]
  • Data Science on the Google Cloud Platform (1st Edition). Valliappa Lakshmanan. O'Reilly, Jan 2017. Branch: edition1_tf2 (also: main)

Try out the code on Google Cloud Platform

Open in Cloud Shell

The code on Qwiklabs (see below) is continually tested, and this repo is kept up to date. The code should work as-is for you; however, there are three very common problems that readers report:

  • Ch 2: Download data fails. The Bureau of Transportation Statistics website used to download the airline dataset periodically goes down or changes availability due to government furloughs and the like. Please use the instructions in 02_ingest/README.md to copy the data from my bucket (a sketch of the commands follows this list). The rest of the chapters work off the data in the bucket and will be fine.
  • Ch 3: Permission errors. These typically occur because we expect you to copy the airline data into your own bucket; you don't have write access to gs://cloud-training-demos-ml/. The instructions tell you to change the bucket name to one that you own. Please do that.
  • Ch 4, 10: Dataflow doesn't do anything. The real-time simulation requires that you run simulate.py and the Dataflow pipeline simultaneously. If the Dataflow pipeline is not making progress, make sure that the simulation program is still running.
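
For Chapters 2 and 3, the copy from my bucket and the load into BigQuery look roughly like this (a sketch only: your-bucket-name is a placeholder for a bucket you own, and the authoritative steps are in 02_ingest/README.md):

    cd data-science-on-gcp/02_ingest
    # copy the prepared 2015 flights CSVs from the book's bucket into your own bucket
    ./ingest_from_crsbucket.sh your-bucket-name
    # load the copied CSVs into BigQuery
    bash bqload.sh your-bucket-name 2015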

If the code doesn't work for you, I recommend that you try the corresponding Qwiklabs lab to see if there is some step that you missed. If you still have problems, please leave feedback in Qwiklabs, or file an issue in this repo.

Try out the code on Qwiklabs

Purchase book

Read online or download a PDF of the book

Buy on Amazon.com

Updates to book

I updated the book in Nov 2019 with TensorFlow 2.0, Cloud Functions, and BigQuery ML.

Comments
  • GSP195-Updates for Python3 in 02_ingest/monthlyupdate folder

    • Updated the Python version in the app.yaml file.
    • Removed the flexible environment from app.yaml, as it is incompatible with Python 3.
    • Updated the script handlers to the auto parameter in app.yaml (a sketch follows this list).
    • Updated the Flask version needed for Python 3 in requirements.txt.
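
    A sketch of what app.yaml might look like after these changes (assumed content, not the exact file in the repo):

    runtime: python39
    handlers:
    - url: /.*
      script: auto
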
    opened by pandekalyani 10
  • [CRITICAL] WORKER TIMEOUT errors in stderr (via Stackdriver)

    From the App Engine logs:

    GET 200 86 B 2 ms Chrome 64 /
    INFO: Rejected non-Cron request
    GET 200 236 B 2 ms Chrome 64 /ingest
    INFO: Received cron request true
    INFO: scheduling ingest of year=2016 month=02
    INFO: Requesting data for 2016-02-*
    [2018-02-21 05:04:42 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:15)
    [2018-02-21 05:04:42 +0000] [15] [INFO] Worker exiting (pid: 15)

    opened by arsenyspb 8
  • Chapter 2: call_cf.sh not working

    For some reason the call_cf.sh script does not work as expected. I've played a little with the year and month specified in the file, e.g. changed the year to 2016 and tried various values for the month. When I run the file as a bash script, no files are added to the bucket. I have no idea why it fails, because running the commands line by line in the shell works just fine and the requested month is added to the bucket.

    opened by kaijennissen 6
  • Many Chapter 9 issues

    There are a lot of issues that make following along with Chapter 9 difficult to impossible.

    When submitting the exact code that you have in this repository with the following command:

    gcloud ai-platform jobs submit training $JOBNAME \      
    --region=$REGION \
    --module-name=trainer.task \
    --package-path=$(pwd)/flights/trainer \
    --job-dir=$OUTPUT_DIR \
    --runtime-version=1.14 \
    --staging-bucket=gs://$BUCKET \
    --master-machine-type=n1-standard-4 \
    --scale-tier=CUSTOM \
    -- \
    --bucket=$BUCKET --num_examples=100000 --func=linear
    

    It doesn't work due to some error that I don't understand:

    Traceback (most recent call last):
      File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 105, in <module>
        model.train_and_evaluate(func_to_call)
      File "/root/.local/lib/python2.7/site-packages/trainer/model.py", line 220, in train_and_evaluate
        callbacks=[cp_callback])
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training.py", line 780, in fit
        steps_name='steps_per_epoch')
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/training_arrays.py", line 274, in model_iteration
        batch_outs = f(actual_inputs)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
        run_metadata=self.run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1458, in __call__
        run_metadata_ptr)
    FailedPreconditionError: Table not initialized.
    	 [[{{node features/carrier_indicator/hash_table_Lookup/LookupTableFindV2}}]]
    

    There are more issues where that came from. I had to comment out import hypertune because that isn't available from pip. Also, the command in the book says to use --runtime-version 2.0, but that isn't even a publicly available version (I fell back to 1.14; not sure if that's the reason for this error).

    To address this I tried to fall back to the commands you use in the GitHub repo instead of the ones you list in the book, but your README lists scripts (e.g. retrain_cloud.sh) that don't even exist in the repository, and, more importantly, I can't figure out how these scripts line up with what I'm reading in the book.

    On the whole, while I've been able to follow along with the book up to this point, I can't really do it with Chapter 9.

    opened by eliot1785 6
  • Chapter 2: Getting 500 - Internal Server Error when Running ingest_flights.py

    Screenshot of results from running ingest_flights.py:

    It looks like the link used to request the data in ingest.py may be broken: 'https://www.transtats.bts.gov/DownLoad_Table.asp?Table_ID=236&Has_Group=3&Is_Zipped=0'.

    Receiving a 500 error when trying to access that link.

    Is there an alternative link from that site that could be used?

    opened by dylanmpeck 5
  • Fix for cloud scheduler

    It appears that the HTTP trigger requires request.get_json(force=True), according to https://stackoverflow.com/questions/53216177/http-triggering-cloud-function-with-cloud-scheduler

    opened by kaijennissen 5
  • Chapter 2 - Optional Tasks

    Hello there! I'm currently following this tutorial, and when I ran ./deploy_cf.sh after setting the token and modifying it in main.py, I got the following error:

    Allow unauthenticated invocations of new function [ingest_flights_vNdUIIMPCAy5v8715MV361o4TTdDBssZ]? (y/N)? y
    Deploying function (may take a while - up to 2 minutes)...failed.
    ERROR: (gcloud.functions.deploy) OperationError: code=3, message=Function failed on loading user code.
    Error message: File main.py is expected to contain a function named ingest_flights_vNdUIIMPCAy5v8715MV361o4TTdDBssZ
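
    The error indicates that gcloud expects main.py to contain a function whose name matches the deployed function name. A hedged sketch of the generic gcloud pattern that keeps the Python entry point fixed and varies only the deployed name (this is not necessarily what deploy_cf.sh does, and the entry-point name is assumed):

    # sketch only: assumes main.py defines a function named ingest_flights
    gcloud functions deploy ingest_flights_$TOKEN \
        --entry-point=ingest_flights \
        --runtime=python39 \
        --trigger-http \
        --allow-unauthenticated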

    opened by andresgalarza-astro 4
  • Added dependency for 04_streaming exercises.

    This PR adds a missing dependency for the exercises.

    It doesn't appear to work properly with the --upgrade flag, as it seems to complain about a version conflict, so it was added as a separate line.

    Related issue: https://github.com/GoogleCloudPlatform/data-science-on-gcp/issues/34. The suggested answer from #34 doesn't seem to work, or requires more investigation.

    opened by bdabrowski 4
  • Issue in 02_ingest/monthlyupdate/ingest_flights.py

    In the download function,

    response = urlopen(url, PARAMS)

    TypeError: POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str.

    I am getting a type error because PARAMS needs to be encoded in bytes. However, when I try the following code, there is still an error.

    PARAMS = urllib.parse.urlencode(PARAMS).encode("utf-8")

    opened by FaizSaeed 4
  • Error compiling FlightsMLService

    Caused by: com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected an int but was BEGIN_ARRAY at line 1 column 91 path $.predictions[0].classes
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:224)
        at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:41)
        at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.read(CollectionTypeAdapterFactory.java:82)
        at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.read(CollectionTypeAdapterFactory.java:61)
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:129)
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:220)
        at com.google.gson.Gson.fromJson(Gson.java:887)
        at com.google.gson.Gson.fromJson(Gson.java:852)
        at com.google.gson.Gson.fromJson(Gson.java:801)
        at com.google.gson.Gson.fromJson(Gson.java:773)
        at com.google.cloud.training.flights.FlightsMLService.sendRequest(FlightsMLService.java:106)
        at com.google.cloud.training.flights.FlightsMLService.main(FlightsMLService.java:177)

    opened by mshearer0 4
  • Wide and Deep Model and runtime version.

    Wide and Deep model produces the same probabilities regardless of request instance values. Linear and DNN models function correctly.

    model.py:

    def get_model(output_dir, nbuckets, hidden_units, learning_rate):
        # return linear_model(output_dir)
        return dnn_model(output_dir)
        # return wide_and_deep_model(output_dir, nbuckets, hidden_units, learning_rate)

    deploy_model.sh needs --runtime-version=1.6 added:

    gcloud ml-engine versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION} --runtime-version=1.6

    opened by mshearer0 4
  • Ch.4 - Time Correction - input/output error when running df02.py

    Hi! I'm getting the following error when running df02.py (df01.py worked fine) - any advice please?

    (beam_env) jgammerman@cloudshell:~/data-science-on-gcp/04_streaming/transform (peppy-booth-371612)$ python3 ./df02.py
    Traceback (most recent call last):
      File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
      File "apache_beam/runners/common.py", line 624, in apache_beam.runners.common.SimpleInvoker.invoke_process
      File "/home/jgammerman/beam_env/lib/python3.9/site-packages/apache_beam/transforms/core.py", line 1879, in <lambda>
      File "/home/jgammerman/data-science-on-gcp/04_streaming/transform/./df02.py", line 39, in <lambda>
      File "/home/jgammerman/data-science-on-gcp/04_streaming/transform/./df02.py", line 24, in addtimezone
      File "/home/jgammerman/beam_env/lib/python3.9/site-packages/timezonefinder/timezonefinder.py", line 260, in __init__
      File "/home/jgammerman/beam_env/lib/python3.9/site-packages/timezonefinder/timezonefinder.py", line 92, in __init__
    OSError: [Errno 5] Input/output error: '/home/jgammerman/beam_env/lib/python3.9/site-packages/timezonefinder/poly_zone_ids.bin'

    This is followed by some more output (omitted for brevity) which ends as follows:

    RuntimeError: OSError: [Errno 5] Input/output error: '/home/jgammerman/beam_env/lib/python3.9/site-packages/timezonefinder/poly_zone_ids.bin' [while running 'Map(<lambda at df02.py:39>)']
    Exception ignored in: <function AbstractTimezoneFinder.__del__ at 0x7f1142a9a1f0>
    Traceback (most recent call last):
      File "/home/jgammerman/beam_env/lib/python3.9/site-packages/timezonefinder/timezonefinder.py", line 97, in __del__
    AttributeError: poly_zone_ids

    opened by jgammerman 1
  • README improvement for chapters 2 and 3 regarding upload to BQ

    Hello,

    Excellent book so far, but a problem I've been having is uploading the 2015 CSVs from my Cloud Storage bucket to BigQuery.

    Both the ch2 and ch3 READMEs just tell you to run:

    cd data-science-on-gcp/02_ingest
    ./ingest_from_crsbucket.sh bucketname

    But this only copies the CSVs from the book's bucket to the user's. It doesn't cover the next stage i.e. uploading to BQ.

    The alternative route of ingesting from the original source of data also doesn't work: I found that my Google Cloud Shell kept disconnecting halfway through the upload process.

    Therefore I'd recommend adding the following instruction to both READMEs, showing you explicitly how to do the upload to BQ:

    bash bqload.sh bucketname 2015

    opened by jgammerman 2
  • AttributeError: tzinfo

    ERROR:apache_beam.runners.direct.executor:Giving up after 4 attempts.
    WARNING:apache_beam.runners.direct.executor:A task failed with exception: tzinfo
    WARNING:apache_beam.runners.direct.executor:A task failed with exception: tzinfo
    Traceback (most recent call last):
      File "/home/gcpuser/data-science-on-gcp/04_streaming/realtime/./avg01.py", line 82, in <module>
        run(project=args['project'], bucket=args['bucket'], region=args['region'])
      File "/home/gcpuser/data-science-on-gcp/04_streaming/realtime/./avg01.py", line 61, in run
        (all_events
      File "/home/gcpuser/.local/lib/python3.9/site-packages/apache_beam/pipeline.py", line 598, in __exit__
        self.result.wait_until_finish()
      File "/home/gcpuser/.local/lib/python3.9/site-packages/apache_beam/runners/direct/direct_runner.py", line 588, in wait_until_finish
        self._executor.await_completion()
      File "/home/gcpuser/.local/lib/python3.9/site-packages/apache_beam/runners/direct/executor.py", line 432, in await_completion
        self._executor.await_completion()
      File "/home/gcpuser/.local/lib/python3.9/site-packages/apache_beam/runners/direct/executor.py", line 480, in await_completion
        raise update.exception
      File "/home/gcpuser/.local/lib/python3.9/site-packages/apache_beam/runners/direct/executor.py", line 370, in __call__
        self.attempt_call(
      File "/home/gcpuser/.local/lib/python3.9/site-packages/apache_beam/runners/direct/executor.py", line 416, in attempt_call
        result = evaluator.finish_bundle()
      File "/home/gcpuser/.local/lib/python3.9/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 704, in finish_bundle
        data = self._read_from_pubsub(self.source.timestamp_attribute)
      File "/home/gcpuser/.local/lib/python3.9/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 693, in _read_from_pubsub
        results = [_get_element(rm.message) for rm in response.received_messages]
      File "/home/gcpuser/.local/lib/python3.9/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 693, in <listcomp>
        results = [_get_element(rm.message) for rm in response.received_messages]
      File "/home/gcpuser/.local/lib/python3.9/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 681, in _get_element
        timestamp = Timestamp.from_utc_datetime(message.publish_time)
      File "/home/gcpuser/.local/lib/python3.9/site-packages/apache_beam/utils/timestamp.py", line 106, in from_utc_datetime
        if dt.tzinfo != pytz.utc and dt.tzinfo != datetime.timezone.utc:
    AttributeError: tzinfo

    opened by bumegha 2
  • Chapter 6 create_cluster.sh script problem

    I had to remove the zone from line 21 of https://github.com/GoogleCloudPlatform/data-science-on-gcp/blob/main/06_dataproc/create_cluster.sh for the script to work; otherwise I kept getting a "resource not found" error.
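
    A sketch of a region-only invocation along the lines described above (assumed form; the cluster name is a placeholder and the actual create_cluster.sh sets more options):

    # sketch only: pass a region and omit the zone so Dataproc auto-selects one
    gcloud dataproc clusters create ch6cluster \
        --region=${REGION} \
        --num-workers=2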

    opened by tjaensch 0
  • Crash during Deployment of model on Vertex AI

    Chapter 9 within "Deploy model to Vertex AI" section in flights_model_tf2.ipynb. The deployment crashes when you try to execute the first cell:

    ...
    # upload model
    gcloud beta ai models upload --region=$REGION --display-name=$MODEL_NAME \
         --container-image-uri=us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.${TF_VERSION}:latest \
         --artifact-uri=$EXPORT_PATH
    MODEL_ID=$(gcloud ai models list --region=$REGION --format='value(MODEL_ID)' --filter=display_name=${MODEL_NAME})
    echo "MODEL_ID=$MODEL_ID"
    
    # deploy model to endpoint
    gcloud ai endpoints deploy-model $ENDPOINT_ID \
      --region=$REGION \
      --model=$MODEL_ID \
      --display-name=$MODEL_NAME \
      --machine-type=n1-standard-2 \
      --min-replica-count=1 \
      --max-replica-count=1 \
      --traffic-split=0=100
    

    When I check the Vertex Endpoints, one was created but something else seems to have gone wrong.

    Output:

    gs://tribbute-ml-central/ch9/trained_model/export/flights_20220803-222758/
    Creating Endpoint for flights-20220803-223154
    ENDPOINT_ID=974809417000157184
    MODEL_ID=

    This is followed by a very long error (only part of it is pasted here):

    Using endpoint [https://us-central1-aiplatform.googleapis.com/]
    WARNING: The following filter keys were not present in any resource : display_name
    Using endpoint [https://us-central1-aiplatform.googleapis.com/]
    Waiting for operation [7706081518493368320]...
    .....done.
    Created Vertex AI endpoint: projects/591020730428/locations/us-central1/endpoints/974809417000157184.
    Using endpoint [https://us-central1-aiplatform.googleapis.com/]
    Using endpoint [https://us-central1-aiplatform.googleapis.com/]
    ERROR: gcloud crashed (InvalidDataFromServerError): Error decoding response "{
      "models": [
        {
          "name": "projects/591020730428/locations/us-central1/models/1316788319564070912",
          "displayName": "flights-20220803-223002",
          "predictSchemata": {},
          "containerSpec": {
            "imageUri": "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-9:latest"
          },
          "supportedDeploymentResourcesTypes": [
            "DEDICATED_RESOURCES"
          ],
          "supportedInputStorageFormats": [
            "jsonl",
            "bigquery",
            "csv",
            "tf-record",
            "tf-record-gzip",
            "file-list"
          ],
          "supportedOutputStorageFormats": [
            "jsonl",
            "bigquery"
          ],
          "createTime": "2022-08-03T22:30:12.377079Z",
          "updateTime": "2022-08-03T22:30:14.993220Z",
          "etag": "AMEw9yOIRZqfqqO_ngaA77Jw8Fs9E_kcI8tkqAIsTzFViX-aIrRbHfc0d2HRBihT32rp",
          "supportedExportFormats": [
            {
              "id": "custom-trained",
              "exportableContents": [
                "ARTIFACT"
              ]
            }
          ],
    ...
    
    
    If you would like to report this issue, please run the following command:
      gcloud feedback
    
    To check gcloud for common problems, please run the following command:
      gcloud info --run-diagnostics
    Using endpoint [https://us-central1-aiplatform.googleapis.com/]
    ERROR: (gcloud.ai.endpoints.deploy-model) could not parse resource []
    ---------------------------------------------------------------------------
    CalledProcessError                        Traceback (most recent call last)
    /tmp/ipykernel_1/3503756464.py in <module>
    ----> 1 get_ipython().run_cell_magic('bash', '', '# note TF_VERSION and ENDPOINT_NAME set in 1st cell\n# TF_VERSION=2-6\n# ENDPOINT_NAME=flights\n\nTIMESTAMP=$(date +%Y%m%d-%H%M%S)\nMODEL_NAME=${ENDPOINT_NAME}-${TIMESTAMP}\nEXPORT_PATH=$(gsutil ls ${OUTDIR}/export | tail -1)\necho $EXPORT_PATH\n\nif [[ $(gcloud ai endpoints list --region=$REGION \\\n        --format=\'value(DISPLAY_NAME)\' --filter=display_name=${ENDPOINT_NAME}) ]]; then\n    echo "Endpoint for $MODEL_NAME already exists"\nelse\n    # create model\n    echo "Creating Endpoint for $MODEL_NAME"\n    gcloud ai endpoints create --region=${REGION} --display-name=${ENDPOINT_NAME}\nfi\n\nENDPOINT_ID=$(gcloud ai endpoints list --region=$REGION \\\n              --format=\'value(ENDPOINT_ID)\' --filter=display_name=${ENDPOINT_NAME})\necho "ENDPOINT_ID=$ENDPOINT_ID"\n\n# delete any existing models with this name\nfor MODEL_ID in $(gcloud ai models list --region=$REGION --format=\'value(MODEL_ID)\' --filter=display_name=${MODEL_NAME}); do\n    echo "Deleting existing $MODEL_NAME ... $MODEL_ID "\n    gcloud ai models delete --region=$REGION $MODEL_ID\ndone\n\n# upload model\ngcloud beta ai models upload --region=$REGION --display-name=$MODEL_NAME \\\n     --container-image-uri=us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.${TF_VERSION}:latest \\\n     --artifact-uri=$EXPORT_PATH\nMODEL_ID=$(gcloud ai models list --region=$REGION --format=\'value(MODEL_ID)\' --filter=display_name=${MODEL_NAME})\necho "MODEL_ID=$MODEL_ID"\n\n# deploy model to endpoint\ngcloud ai endpoints deploy-model $ENDPOINT_ID \\\n  --region=$REGION \\\n  --model=$MODEL_ID \\\n  --display-name=$MODEL_NAME \\\n  --machine-type=n1-standard-2 \\\n  --min-replica-count=1 \\\n  --max-replica-count=1 \\\n  --traffic-split=0=100\n')
    
    /opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
       2470             with self.builtin_trap:
       2471                 args = (magic_arg_s, cell)
    -> 2472                 result = fn(*args, **kwargs)
       2473             return result
       2474 
    
    /opt/conda/lib/python3.7/site-packages/IPython/core/magics/script.py in named_script_magic(line, cell)
        140             else:
        141                 line = script
    --> 142             return self.shebang(line, cell)
        143 
        144         # write a basic docstring:
    
    /opt/conda/lib/python3.7/site-packages/decorator.py in fun(*args, **kw)
        230             if not kwsyntax:
        231                 args, kw = fix(args, kw, sig)
    --> 232             return caller(func, *(extras + args), **kw)
        233     fun.__name__ = func.__name__
        234     fun.__doc__ = func.__doc__
    
    /opt/conda/lib/python3.7/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
        185     # but it's overkill for just that one bit of state.
        186     def magic_deco(arg):
    --> 187         call = lambda f, *a, **k: f(*a, **k)
        188 
        189         if callable(arg):
    
    /opt/conda/lib/python3.7/site-packages/IPython/core/magics/script.py in shebang(self, line, cell)
        243             sys.stderr.flush()
        244         if args.raise_error and p.returncode!=0:
    --> 245             raise CalledProcessError(p.returncode, cell, output=out, stderr=err)
        246 
        247     def _run_script(self, p, cell, to_close):
    
    CalledProcessError: Command 'b'# note TF_VERSION and ENDPOINT_NAME set in 1st cell\n# TF_VERSION=2-6\n# ENDPOINT_NAME=flights\n\nTIMESTAMP=$(date +%Y%m%d-%H%M%S)\nMODEL_NAME=${ENDPOINT_NAME}-${TIMESTAMP}\nEXPORT_PATH=$(gsutil ls ${OUTDIR}/export | tail -1)\necho $EXPORT_PATH\n\nif [[ $(gcloud ai endpoints list --region=$REGION \\\n        --format=\'value(DISPLAY_NAME)\' --filter=display_name=${ENDPOINT_NAME}) ]]; then\n    echo "Endpoint for $MODEL_NAME already exists"\nelse\n    # create model\n    echo "Creating Endpoint for $MODEL_NAME"\n    gcloud ai endpoints create --region=${REGION} --display-name=${ENDPOINT_NAME}\nfi\n\nENDPOINT_ID=$(gcloud ai endpoints list --region=$REGION \\\n              --format=\'value(ENDPOINT_ID)\' --filter=display_name=${ENDPOINT_NAME})\necho "ENDPOINT_ID=$ENDPOINT_ID"\n\n# delete any existing models with this name\nfor MODEL_ID in $(gcloud ai models list --region=$REGION --format=\'value(MODEL_ID)\' --filter=display_name=${MODEL_NAME}); do\n    echo "Deleting existing $MODEL_NAME ... $MODEL_ID "\n    gcloud ai models delete --region=$REGION $MODEL_ID\ndone\n\n# upload model\ngcloud beta ai models upload --region=$REGION --display-name=$MODEL_NAME \\\n     --container-image-uri=us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.${TF_VERSION}:latest \\\n     --artifact-uri=$EXPORT_PATH\nMODEL_ID=$(gcloud ai models list --region=$REGION --format=\'value(MODEL_ID)\' --filter=display_name=${MODEL_NAME})\necho "MODEL_ID=$MODEL_ID"\n\n# deploy model to endpoint\ngcloud ai endpoints deploy-model $ENDPOINT_ID \\\n  --region=$REGION \\\n  --model=$MODEL_ID \\\n  --display-name=$MODEL_NAME \\\n  --machine-type=n1-standard-2 \\\n  --min-replica-count=1 \\\n  --max-replica-count=1 \\\n  --traffic-split=0=100\n'' returned non-zero exit status 1.
    
    opened by shinchri 9
  • problem extracting bigquery table data to cloud storage

    I am in Chapter 9 of the book, trying to extract a BigQuery table into Google Cloud Storage, and I ran into a problem.

    The code where I ran into the problem is below (inside the cell, the correct project ID is printed for PROJECT):

    %%bash
    PROJECT=$(gcloud config get-value project)
    for dataset in "train" "eval" "all"; do
      TABLE=dsongcp.flights_${dataset}_data
      CSV=gs://${BUCKET}/ch9/data/${dataset}.csv
      echo "Exporting ${TABLE} to ${CSV} and deleting table"
      bq --project_id=${PROJECT} extract --destination_format=CSV $TABLE $CSV
      bq --project_id=${PROJECT} rm -f $TABLE
    done
    

    For some odd reason, I am getting a weird error message:

    Exporting dsongcp.flights_train_data to gs://tribbute-ml-central/ch9/data/train.csv and deleting table in project tribbute-ml
    BigQuery error in extract operation: BigQuery API has not been used in project
    457198359311 before or it is disabled. Enable it by visiting
    https://console.developers.google.com/apis/api/bigquery.googleapis.com/overview?project=457198359311
    then retry. If you enabled this API recently, wait a few minutes for the
    action to propagate to our systems and retry.
    

    This is followed by:

    CalledProcessError: Command 'b'PROJECT="tribbute-ml"\nfor dataset in "train" "eval" "all"; do\n  TABLE=dsongcp.flights_${dataset}_data\n  CSV=gs://${BUCKET}/ch9/data/${dataset}.csv\n  echo "Exporting ${TABLE} to ${CSV} and deleting table in project ${PROJECT}"\n  bq extract --project_id=${PROJECT} --location=us-central1 --destination_format=CSV $TABLE $CSV\n  bq --project_id=${PROJECT} rm -f $TABLE\ndone\n'' returned non-zero exit status 1.
    

    The weird thing is that I already enabled the BigQuery API for my project, and "457198359311" is not even my project. (I verified that in the bash command the correct project ID gets printed.)

    Anyone knows what's causing this issue and how to fix it?

    opened by shinchri 2
Owner
Google Cloud Platform