Generative AI Support
Introduction
I will not write that I could not have done this project without the help of Generative AI tools like Claude, Gemini and Perplexity. However, without their support:
- it would have taken me much longer to do all the things I did for this project; a rough estimate: instead of about a month of spare-time work, it would have taken me at least six months.
- I would have missed out on certain analyses, visualizations, and the use of some tools and hosting platforms.
Below I share a selection of positive examples of Generative AI support, followed by some less positive experiences and key considerations for effective and responsible use of Generative AI.
Positive examples
Brainstorming ideas for data analytics and visuals
Given a description of the data and the objectives of the report, Claude.AI gave a long, detailed list of suggested analyses and visualizations. Many of those matched ideas I had already thought of myself, but there were also new ones. Here is how the conversation went.
Me: I can use some suggestions for my capstone project in the Google data analytics certification
Claude: I’d be happy to help brainstorm ideas for your Google Data Analytics capstone project. Could you tell me what areas of data analysis interest you the most or what kind of industries you’d potentially like to work in? This would help me suggest projects that align with your career goals.
Also, are you leaning towards using any particular data analysis tools (like R, Python, SQL, Tableau) that you learned during the certification? This would help me tailor the suggestions to your technical preferences.
Me: I am using R and am thinking about creating a Tableau dashboard as well. I am using, with permission, all the data about U2 concerts from u2gigs.com.
Claude: That’s a fantastic dataset choice! U2’s concert history would offer rich opportunities for analysis across multiple dimensions. Here are some potential angles you could explore for your capstone:
- Temporal Analysis:
- Concert frequency patterns across decades
- Seasonal touring patterns
- Evolution of tour lengths over time
- Gap analysis between tours
- Geographic Analysis:
- Global reach and market penetration
- Most frequently visited cities/venues
- Regional touring patterns
- New markets entered over time
- Heat maps of concert locations
- Venue Analysis:
- Types of venues (stadiums vs. arenas vs. festivals)
- Venue capacity trends
- Favorite/recurring venues
- Indoor vs. outdoor shows
- Setlist/Performance Analysis (if the data includes this):
- Most performed songs
- Song rotation patterns
- Set length changes over time
- Debut songs by tour
For your Tableau dashboard, you could create:
- An interactive map showing tour routes
- Timeline visualizations of concert frequency
- Venue capacity comparisons
- Geographic distribution of shows by decade
Would you like me to elaborate on any of these angles? Also, what specific aspects of U2’s concert history interest you the most? This could help us focus your analysis on the most compelling stories in the data.
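To give an idea of how such a suggestion translates into code, here is a minimal sketch of the first idea (concert frequency across decades). The data frame and column names (u2_concerts with a date column) are assumptions for illustration, not the actual structure of the u2gigs.com data.

```r
# Minimal sketch of the "concert frequency across decades" idea.
# Assumes a data frame u2_concerts with a Date column named date;
# these names are illustrative, not the actual u2gigs.com data structure.
library(dplyr)
library(ggplot2)

concerts_per_decade <- u2_concerts |>
  mutate(decade = floor(as.integer(format(date, "%Y")) / 10) * 10) |>
  count(decade, name = "n_concerts")

ggplot(concerts_per_decade, aes(x = factor(decade), y = n_concerts)) +
  geom_col() +
  labs(x = "Decade", y = "Number of concerts",
       title = "U2 concert frequency by decade")
```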
Troubleshooting coding errors
When one encounters persistent, hard-to-diagnose errors while coding, the typical thing to do is to search the internet for possible causes and solutions. There are many amazing forums and communities where people help each other. For the first part of my project I used that approach to solve coding challenges. However, at some point I switched to using AI tools for support while coding, such as Claude, Perplexity, Gemini and ChatGPT. Here is an example with Claude, when there was an issue with a graph plotting distance trees of setlists:
Me: Now the whole plot is narrower and moved to the left, but the tree tip labels are still cut off, and now there is a lot of empty space on the right of the plot
Claude: The issue is likely better solved by using coord_cartesian() to extend the plot area rather than xlim_tree(). Let’s fix this: (new code provided).
Key changes:
- Removed xlim_tree()
- Added coord_cartesian(clip = 'off') to prevent label clipping
- Used scale_x_continuous(expand = ...) to add appropriate padding on both sides
- Reduced the right margin since we're handling the spacing with scale expansion

This should:
- Keep the tree at its original width
- Show all labels without cutting them off
- Maintain proper spacing
- Avoid excessive empty space
If you still see labels being cut off, we can adjust the expansion multiplier (0.3) to a larger value. Let me know if further adjustments are needed!
Me: That’s good!
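For reference, the kind of fix Claude suggested looks roughly like the sketch below. The tree object (setlist_tree) is a placeholder and the exact code in my project differs, so treat this as an illustration of the coord_cartesian(clip = 'off') and scale-expansion idea rather than the actual project code.

```r
# Illustrative sketch of the suggested fix (not the exact project code).
# Assumes a tree object setlist_tree built elsewhere from setlist distances.
library(ggtree)
library(ggplot2)

ggtree(setlist_tree) +
  geom_tiplab(size = 3) +
  # turn off clipping so long tip labels are not cut off at the panel edge
  coord_cartesian(clip = "off") +
  # expand the x axis to leave room for labels without excessive empty space
  scale_x_continuous(expand = expansion(mult = c(0.02, 0.3))) +
  theme(plot.margin = margin(5, 20, 5, 5))
```

If labels are still clipped, the right-hand expansion multiplier (0.3 above) can be increased.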
Advice for tools and hosting platforms
The AI tools were very good at advising on potential tools and hosting platforms to use for my U2 Data Analytics project as I was starting. I could have figured that out by just searching the internet, but the AI tools gave clear overviews comparing different options in just seconds. That saved me a lot of time.
Advising on the structure for an R Package
As my project advanced I was creating more and more code for all the analyses, and the R code chunks in the Quarto documents were getting long and complex. I decided it was time to start using a custom R package where I could manage custom R functions that each perform a specific analysis, processing step or transformation. Back in my scripting days I had used R packages but never created one myself… time to learn, with some help from Claude.AI!
Me: I want to create an R package for an existing R project. What location should I create this package?
Claude: I’ll help you set up a new R package structure. The best practice is to create a new, dedicated directory for your package that’s separate from your existing project files. Here’s the recommended approach:
Create a new directory for your package with a valid package name (should contain only letters, numbers, and dots, start with a letter, and not end in a dot). For example:
# Choose a location and create directory
dir.create("~/path/to/mypackage")
Set up the basic package structure using either:
- RStudio: File > New Project > New Directory > R Package
- Or from the R console using usethis:

# Install if needed
install.packages("usethis")
# Create package structure
usethis::create_package("~/path/to/mypackage")
This will create the essential package structure:
mypackage/
├── DESCRIPTION
├── NAMESPACE
├── R/
└── .Rbuildignore
Would you like me to explain how to move your existing R code into this package structure?
After this Claude.AI also helped me to convert my existing analysis code into functions, with proper header lines.
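As an illustration of what such a conversion looks like, here is a minimal sketch of an analysis function with roxygen2 header lines; the function name and arguments are hypothetical and not taken from the actual concertData package.

```r
# Hypothetical example of analysis code converted into a package function
# (placed in the package's R/ folder), with roxygen2 header lines providing
# the documentation and NAMESPACE directives.

#' Count concerts per country
#'
#' @param concert_data A data frame of concerts containing a `country` column.
#' @return A data frame with one row per country and an `n_concerts` column,
#'   sorted from most to fewest concerts.
#' @importFrom dplyr count arrange desc
#' @export
count_concerts_by_country <- function(concert_data) {
  concert_data |>
    dplyr::count(country, name = "n_concerts") |>
    dplyr::arrange(dplyr::desc(n_concerts))
}
```

Running devtools::document() then generates the help files and updates the NAMESPACE.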
Automating the deployment of changes in the Quarto web site
As explained in the Sharing > Maintenance and Updates section, I initially had a cumbersome process that involved many manual steps to push changes to my Quarto web site. That was until I asked Claude.AI (3.5 Sonnet) for help. Here was my prompt:
I have my R Studio project u2_data_analytics. I have a git repository locally and on GitHub. All the Quarto rendered files are in the docs folder and in the gh-pages branch, and published on GitHub pages. The rest is in the main branch.
The process to update is currently cumbersome. I update qmd documents in main, render these, copy the new files from docs temporarily and manually somewhere outside of my project folder, then do a git stash, after which I switch to the gh-pages branch, then manually copy paste the new files into the docs folder of the gh-pages branch, followed by git add, commit and push.
Is there a more automated way to deploy my qmd file changes to the Quarto site that is on GitHub pages?
This was Claude’s response:
I understand you want to streamline the process of updating your Quarto site deployed on GitHub Pages. Yes, there’s a much simpler automated approach using GitHub Actions. Let me create a workflow file that will handle this automatically.
GitHub Actions workflow for Quarto publishing (code was provided)
To implement this automated workflow:
Create a .github/workflows directory in your repository's main branch:

mkdir -p .github/workflows

Save the workflow file I provided as .github/workflows/quarto-publish.yml

Commit and push these changes to your main branch:

git add .github/workflows/quarto-publish.yml
git commit -m "Add Quarto publishing workflow"
git push origin main
This workflow will:
- Trigger automatically when you push changes to .qmd, .Rmd, or .md files in the main branch
- Set up R and Quarto in the GitHub Actions environment
- Install your R dependencies using renv
- Render your Quarto project
- Automatically deploy the rendered files to the gh-pages branch
Now your process becomes much simpler:
1. Make changes to your Quarto documents in the main branch
2. Commit and push to main
3. GitHub Actions will automatically handle the rendering and deployment
The changes should appear on your GitHub Pages site within a few minutes after pushing to main.
Would you like me to explain any part of this workflow in more detail?
I implemented this, but there was a failure on the first attempt due to a missing renv.lock file in my repository. As Claude.AI had timed out (I have a free account), I went to Perplexity.AI, which helped me resolve the issue:
The error “This project does not contain a lockfile” suggests that you’re using a package management system (likely renv for R or pip for Python) but haven’t created a lockfile yet. Here’s how to resolve this:
- Locally, in your R project, run:

  renv::snapshot()

  This will create a renv.lock file in your project directory.
- Commit and push the renv.lock file to your repository.
When I created the renv.lock file I got a warning about the local loading of my concertData R package, so I changed how that package is loaded in my Quarto code chunks:
From: devtools::load_all("../packages/concertData")
To: devtools::install_github("vanleeuwen-hans/concertData")
And that was basically it: the deployment is now fully automated whenever I change something in my report qmd files and push to GitHub.
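For illustration, my Quarto setup chunk then looks roughly like the sketch below; the requireNamespace() guard is an optional refinement added here for the example, not necessarily what the project chunk contains.

```r
# Sketch of a Quarto setup chunk (illustrative, not the exact project chunk):
# install the concertData package from GitHub if it is not already available,
# then load it for the analysis chunks that follow.
if (!requireNamespace("concertData", quietly = TRUE)) {
  devtools::install_github("vanleeuwen-hans/concertData")
}
library(concertData)
```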
Connecting with a private repository to protect sensitive data
The data that I used in this project was obtained from u2gigs.com (with permission) and was used during rendering of my Quarto web site, which combines the analysis code, the output and my reporting. I did not want this data to be publicly available in my GitHub repository; however, the data was needed to deploy my web site. So I asked Claude.AI for a solution, and it helped me set one up in easy-to-follow steps.
Yes, it is possible to store the data/image files needed for rendering your Quarto website in a separate private GitHub repository, and then access them during the deployment workflow. This can be done by integrating the private repository into your GitHub Actions workflow. Here are the high-level steps:
1. Create a new private GitHub repository to store the data/image files.
2. In your GitHub Actions workflow, use the actions/checkout@v3 action to fetch the files from the private repository.
3. Make the files available to the Quarto rendering process during the deployment step.
This allows you to keep sensitive or large files out of your public repository, while still using them for the website build. Let me know if you need any clarification or have additional questions!
It then proceeded to help me with more detailed instructions on how to set this up. That worked very nicely and provided me with a secure solution in very little time!
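For illustration, assuming the workflow checks the private repository out into a local data/ folder, a Quarto code chunk can then read the files like any local file; the folder and file name below are hypothetical.

```r
# Hypothetical example: the GitHub Actions workflow has checked the private
# repository out into data/, so the rendering chunk can read it as a local file.
# The folder and file names are illustrative only.
u2_gigs <- read.csv("data/u2gigs_concerts.csv", stringsAsFactors = FALSE)
head(u2_gigs)
```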
Negative examples
Sometimes the AI tools mixed up a current request with a past request. For example, they would provide me with code that included parts that had nothing to do with the current request. Starting new threads is very helpful to separate different requests; all these tools keep a history/library of past and current conversations, so it is easy to go back. Once this mixing even happened across two different conversations, so always pay attention to the output and check whether it makes sense.
Sometimes the troubleshooting did not resolve the issue; the AI tool then just kept suggesting “changes” to the code, while at times there was no actual change compared to the previous suggestion. It also happened that a fix that had not worked 15 minutes earlier came back as a suggestion once the troubleshooting became lengthy. At times the suggested code fixes got worse and worse as I kept giving feedback that it still did not work. In those cases it was important to do a manual deep-dive into the code and the problem. The good old human brain still beats the AI tools occasionally 😉
Forgetting about previously agreed formats or details also happens at times. It is important to write a clear and detailed prompt that explains what the problem is, what output I expect and in what format. There are many frameworks and best practices around; here is an example from the Google Data Analytics Certification I completed:
- Task, Context, References, Evaluate and Iterate.
I asked Claude.AI to create a nice visual of the set-up of the project, with all its repositories, web sites and the connections between them. I provided all the details and Claude.AI created a Mermaid diagram, but the layout and formatting were very basic and did not improve over a few iterations. So, creating good-quality, attractive visuals that combine text and relationships is not a strength of the AI tools. I decided to create the visual manually in PowerPoint; you can see the result on the Sharing page of this report.
It happened a few times that, when I provided a working piece of code and asked the AI tool to improve it or add certain functionality, it decided to change other things in the code and broke it. That cost me some time to figure out why the code suddenly stopped working. When I told the AI tool about it, it apologized for changing code that worked and did not need to be changed. Again, it is always important to carefully check whether the output from the AI tool makes sense.