# help
n
Hi all, I am new here. I am trying to self-host the API on GKE, behind a corporate proxy. However, the job that downloads the data exits with error 400 when it tries to retrieve the downloadURL from the pricing API. The proxy environment variables are defined and I can curl the API successfully from the container, but the command below fails:
npm run data:download
{"level":30,"time":1676947535300,"pid":922,"hostname":"cloud-pricing-api-77cfdcf8d4-q2hwv","msg":"Starting: downloading DB data"}
{"level":50,"time":1676947535343,"pid":922,"hostname":"cloud-pricing-api-77cfdcf8d4-q2hwv","msg":"There was an error downloading data: Request failed with status code 400"}
w
@nutritious-motorcycle-74922 I wonder if you have
http_proxy
or
https_proxy
set in your Cloud Pricing API pod’s environment? If so, try disabling the proxy for the following domain by running
export no_proxy="pricing.api.infracost.io"
so the init job can download the pricing DB dump from us. (This is separate from the proxy you’ve set in the Infracost CLI to hit your self-hosted pricing API.)
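A minimal sketch of one way to add that host without overwriting a no_proxy value that is already set. The helper name `add_no_proxy` is hypothetical, and it assumes a POSIX shell is available in the pod. Setting the uppercase `NO_PROXY` as well is only a precaution, since tools differ in which spelling they read.

```shell
# Hypothetical helper: append a host to no_proxy without dropping existing entries.
add_no_proxy() {
  host="$1"
  case ",${no_proxy:-}," in
    *",${host},"*) ;;                                   # already listed, nothing to do
    *) no_proxy="${no_proxy:+${no_proxy},}${host}" ;;   # append, comma-separated
  esac
  NO_PROXY="$no_proxy"                                  # mirror to the uppercase name
  export no_proxy NO_PROXY
}

add_no_proxy pricing.api.infracost.io
echo "no_proxy=${no_proxy}"
```

This only helps when the pod can reach the host directly; if all outbound traffic must go through the proxy, bypassing it will not work.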
n
@white-airport-8778 I can only access the internet via a proxy in my corporate environment. All proxy variables are set up correctly on the Pod. The strange thing is the error status 400 (Bad Request). If I exec curl in the container against the pricing API URL with the same API key, it works; if I run npm run data:download manually, I get the same error.
l
@nutritious-motorcycle-74922 the download URL redirects to
https://pricing-api-db-data-settled-blowfish.s3.us-east-2.amazonaws.com
, I’m wondering if there’s anything you need to do to open access to that?
w
If I exec curl in the container against the pricing API URL with the same API key, it works; if I run npm run data:download manually, I get the same error.
Can you share the exact
curl
command you’re running? (without your API key). I wonder if curl is respecting the proxy/no-proxy envs but the Cloud Pricing API code isn’t?
n
@white-airport-8778 Yes, curl is respecting the proxy. npm is also respecting it (before adding the proxy settings to the Pod I was getting connection errors). @little-author-61621 I looked at the download source code; the error happens one step earlier, when it is trying to reach the pricing API that generates the S3 downloadUrl.
Maybe it is related to this axios issue: https://github.com/axios/axios/issues/5256
As a workaround, I updated the helm chart jobs to perform the download using curl. job-init.yaml:
command:
  - /bin/bash
  - -c
  - |
     npm run db:setup &&
     curl -s -H "X-Api-Key: ${INFRACOST_API_KEY}" https://pricing.api.infracost.io/data-download/latest | grep -o '"downloadUrl": *"[^"]*"' | grep -o '"[^"]*"$' | xargs -n1 curl --progress-bar --output ./data/products/products.csv.gz &&
     npm run data:load
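The JSON parsing step in that pipeline can also be pulled into a small function, which makes it easier to test offline and to quote the URL when passing it to curl. This is only a sketch: the function name `extract_download_url` and the sample response body are made up for illustration, and it assumes the `downloadUrl` value contains no escaped double quotes.

```shell
# Hypothetical helper: print the value of the "downloadUrl" field from JSON on stdin.
# Assumes the value contains no escaped double quotes.
extract_download_url() {
  grep -o '"downloadUrl": *"[^"]*"' | sed 's/^"downloadUrl": *"//; s/"$//'
}

# Offline example with a made-up response body (the URL is a placeholder).
sample='{"downloadUrl": "https://example.com/products.csv.gz?sig=abc&exp=1"}'
url="$(printf '%s\n' "$sample" | extract_download_url)"
echo "$url"

# In the job, the same function could replace the grep/xargs steps (not run here):
#   url="$(curl -s -H "X-Api-Key: ${INFRACOST_API_KEY}" \
#     https://pricing.api.infracost.io/data-download/latest | extract_download_url)"
#   curl --progress-bar --output ./data/products/products.csv.gz "$url"
```

Passing the URL as a quoted variable avoids relying on xargs to strip the surrounding quotes.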
w
@nutritious-motorcycle-74922 I’ve subscribed to that issue for updates, thanks for sharing the workaround! I just helped another user set up self-hosting behind their corporate internet proxy, and they didn’t run into that axios issue, so something seems odd. Can I confirm that you’ve got the PR comments part working OK too now?
n
Hi @white-airport-8778, I would understand the issue better if axios provided more details. The workaround is working fine; both jobs (Init and Cron) were updated to use curl. If you want to support this going forward, my only suggestion would be to return the raw download URL as the response. The current response is wrapped in JSON, so I had to parse it in bash.
w
Thanks, yeah I want to update this troubleshooting section with your workaround 🙂
l
@nutritious-motorcycle-74922 I wasn’t able to reproduce this exact issue, but I managed to reproduce a similar issue with proxies and Axios. I’ve pushed an update which uses node-fetch instead and released it in v0.3.12 (helm-chart version 0.6.3). Let me know if that works for you.
n
@little-author-61621 I've just tested 0.3.12. I set the HTTPS_PROXY environment variable (uppercase and lowercase), but the job does not pick up the proxy settings from the env. It still tries to download the data directly, so I am still using the workaround. Can I suggest keeping the download as curl? It is a very established utility that everyone in the IT community understands and knows how to set up. You might face situations where the proxy needs authentication, and the code would have to cater for all these use cases.
w
Thanks Alex! I’ve documented the workaround here as it’s simple and flexible for anyone who is running into proxy issues
n
That is great. If you are interested, I can send you a PR from our version of the chart with a variable _use_curl_to_download_ that renders the workaround command instead.