Video Tutorial

S3 Archive Depots: AWS - Module 8


Intro to Archive Depots
Hi there, I'm Jase, a solutions engineer at Perforce, and today I want to talk about a very exciting new feature in Helix Core 2023.2, which is support for archive depots being backed by S3 storage. This is something that's been requested for quite a while, and I'm super excited about it. Archive depots, which have existed for a long time in Helix Core, allow you to take the actual file data from files that are stored on your server and move it off to somewhere else. The idea is that this could be some kind of slower storage, like a spinning disk, or even a volume that you could detach, snapshot, and save. All the metadata stays on your server.

So you'll still see the history of these files, but you won't be able to actually access that file data, and it's no longer taking up space on your main, fast, directly mounted storage, which, especially in the cloud, costs more money as well.

Now, instead, you can set up your archive depot so that it stores file data directly on S3 storage. That could be AWS's S3, but it could also be any other S3-compatible storage, as long as it follows the standard S3 API for how you interact with it, which most of them do.

Object storage, or S3 storage, is way cheaper than block storage (standard disk storage), often on the order of one fifth to one tenth of the price. And because it's in S3, it will often have extra redundancy and backups built in, though that depends on your cloud provider.

So I'm not going to get into all the details on that; you'll have to check with them and see what kind of durability they have on that data. But by putting it in S3, rather than having to put it on a drive, zip that up, detach it, store it somewhere, and then bring it back, you can just keep that S3 storage attached all the time. And if you ever need to get those files back, you can just run the p4 restore command to restore files that you've previously put into your archive. This is a really exciting feature.

So I just want to quickly walk you through how you would set this up for AWS.

AWS: S3 Setup
Here we are in the AWS dashboard. If we're going to store stuff in S3, the first thing we need to do is create a bucket. So I'm going to make a new bucket here and choose a region for it.

First, I need to give my bucket a name. I'm going to call this jase-p4-s3-archive. Bucket names do have to be globally unique, so make sure you come up with something that's unique, for example based on your company name or your name. Going down, I'm going to leave all of this on the defaults, blocking public access. I won't use bucket versioning on this, and I'm going to leave it on server-side encryption with S3-managed keys. You can change these if you have different security policies, but this gives us a nice, secure default bucket.
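As a rough aid for picking a name before you open the console, here is a minimal Python sketch that checks a candidate against a subset of the S3 bucket naming rules. The function name is made up for this example, and it cannot tell you whether the name is already taken globally; only AWS can answer that.

```python
import re

# Subset of the S3 general-purpose bucket naming rules: 3 to 63 characters,
# lowercase letters, digits, dots, and hyphens only, starting and ending
# with a letter or digit. Other rules (such as "must not look like an IP
# address") are intentionally left out of this sketch.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if the name passes the basic checks described above."""
    if ".." in name:  # two dots in a row are not allowed
        return False
    return _BUCKET_NAME.fullmatch(name) is not None


print(looks_like_valid_bucket_name("jase-p4-s3-archive"))  # True
print(looks_like_valid_bucket_name("Jase P4 S3 Archive"))  # False
```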

You'll see I have my new bucket here, and there's nothing in it. Great.

AWS: Create S3 User & Group
Now in order to access this bucket programmatically, we are going to need an access key and a secret key.

To do that, we are going to go to the Identity and Access Management (IAM) section here, go to Users, and create a new user.

I'm going to name this S3 archive demo user; you could call it whatever you want. In this case, I'm going to leave management console access unchecked because this is only a programmatic user. I will hit next. And now I need to add this user to a group.

Now, we could attach a policy directly, but it is recommended to add users to a group, so I'm going to do that. I'm going to create a group, and I'll call this, again, S3 archive demo, adding "group" this time. And then we're going to attach a policy to it. In this case, I want to create a policy.

AWS: Create S3 Policy
As a best practice, we want to give any user the minimum amount of permissions that they actually need for their job.

So in this case, the only service we need access to is S3. You could hit all actions, but we're going to go with the minimum that we need. I'm going to search for GetObject; we just need this one here, which allows us to read objects, so if you're unarchiving something, it'll be able to read that. We're going to add PutObject, which is under Write here and allows us to create objects in S3. And then we need DeleteObject, which allows it to delete objects. Now we have those three permissions selected, and you can see we have one under Read and two under Write. Under Resources here, we could say all, but we're going to be even more specific than that.

And we're going to add ARNs to restrict access. I need to add my bucket's name, so let me go back here and check. I called that jase-p4-s3-archive. I'll just copy that and paste it in here. And then for object name, I'm going to check any object name, because this whole bucket is just going to be for this archive.

So that'll put the star at the end there, and then I click add ARNs and click next. I will call this, again, s3-archive-demo-policy. I could add a description if I want, and you'll see I just have this limited read and write access there. And I will create my policy.
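For reference, the policy built in the visual editor should correspond to a JSON document along these lines. This is a minimal Python sketch that assembles it, assuming the three actions and the bucket name from this walkthrough; the function name is made up for this example, and the JSON the console actually generates may include extra fields such as a statement ID.

```python
import json

BUCKET = "jase-p4-s3-archive"  # bucket name used in this walkthrough


def build_archive_policy(bucket: str) -> dict:
    """Build a least-privilege policy: read, write, and delete objects in one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # One read action and two write actions, as selected in the video.
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                # The trailing /* is the "any object name" choice in the ARN dialog.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


print(json.dumps(build_archive_policy(BUCKET), indent=2))
```

The printed JSON can be pasted into the JSON tab of the policy editor instead of clicking through the visual editor.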

Now I'll go back to the group that I was just creating, since the policy editor opened in a new tab. I'll hit refresh here to get my policies, and if I just search for demo, I should see this s3-archive-demo-policy. I'll just check that and click create user group.

And now I'm back on creating my user, and I'm going to choose this new group that I made to add them to that group, and hit next. In here, I just get a chance to review it: they're part of this S3 demo group that has all those permissions we just set up. And I'll hit create user.

AWS: Create Access Key
Now we just need to get our secret key. So I'll click on this user, and under security credentials, I'll scroll down to access keys and create an access key. For this, I'm just going to choose other; in this case, we just need the access key and the secret. So I'll hit next, and I'll give it a description here. So again, this is the Jase S3 demo key. I'll hit create access key, and I'll get an access key and a secret access key.

So I'm going to hit copy on this and paste it into a text document off the screen, so I have that. As for the secret access key, I'm going to delete this user right after this, so I will show it to you. You'll see this big, long string. I will copy this. As soon as I hit done here, I will not have access to this secret key anymore, so I have to be sure that I write it down and save it. You can also download it in a CSV file. And I will just hit done.

All right, so we now have our new user that has access to our new S3 bucket, and I have that secret key and access key.

P4: Archive Depot Creation
Now we're going to go to P4 to create our archive depot, which we will then link to whatever S3 storage we're using. If you go to the Helix Core Administrator Guide at perforce.com, you'll find this page about reclaiming disk space by archiving files. You can search for that up here, or you can find it in the navigation on the left, under manage server and its resources. This page talks about archive depots generally, but toward the bottom, we have this new section, S3 storage for archive depots, that talks about the different keys that we're going to need and how we will set this up.

So I'm going to go through these steps and show you, but just so you know, you can come back and reference this here.

So now the first thing that we need to do is to create an archive depot. You can create an archive depot through P4Admin, but as of the time of this recording, that does not yet have the capability to set one up for S3.

So we're going to do this through the command line here. I'm going to create a depot with the p4 depot command, and I'm going to add dash t archive (-t archive), which means this is going to be an archive-type depot. And then I'm going to give it a name: this time, I will say s3-archive-demo is the name of my depot. I'll hit enter, and this is going to bring up the default text editor for my system.

When this opens up, you can see it has the name that we gave it in the command, and its type, archive. Map is normally what determines where the files are actually stored in the file system, but we are going to set this one up using the Address field. So if I go back to the documentation, I can see this example here, and we'll just go through this together.

AWS: Archive Depot Setup
So I'll paste this in, and you can see we have a new field at the bottom called Address, followed by a colon. The first thing is that it has to start with s3. Then it has a comma with no spaces around it, and then we're going to do a series of key-value pairs, each with the key and value separated by a colon. So you can see here we have region, colon. This defaulted to us-east-1, and if I go to Properties, I can see I put this bucket in us-west-2. So I'm going to copy that, go back to TextEdit, and replace that with us-west-2. Again, a comma with no spaces around it, then bucket, colon, and then the bucket name.

And it's the name I gave it, jase-p4-s3-archive. And then we have our access key, colon, and our secret key, colon. My access key was the first, shorter one that we copied, with no spaces, so I've got to make sure I remove that. And then for my secret key, I'm going to grab that from my text document as well and paste it in here. This is the format for AWS. Now, one thing to note real quick while we're here is that the address field is a list of key-value pairs. So far we've used region, bucket, access key, and secret key. The bucket, access key, and secret key are all required. The other ones depend on the implementation. So region is important for AWS, but if you're using another service like DigitalOcean, you don't need the region field. The URL field defaults to the AWS URL for buckets; it'll automatically assemble that for you. So we don't need the URL field, since we're doing this on AWS.

However, if we were doing this on DigitalOcean or another S3 storage provider, we would need to add the access URL for that. I'm just going to press Control-S to save and then Control-Q to quit. And if I go back to my terminal here, I will see "Depot s3-archive-demo saved."
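To make the shape of that Address value concrete, here is a minimal Python sketch that assembles it from its parts. The exact key spellings (region, bucket, url, accessKey, secretKey) and their order are assumptions based on the description above, so check them against the S3 storage for archive depots section of the documentation. The function name and the credentials shown are placeholders, not real values.

```python
def build_s3_address(bucket, access_key, secret_key, region=None, url=None):
    """Assemble an Address value: 's3', then comma-separated key:value pairs, no spaces."""
    pairs = []
    if region:  # important for AWS, not needed by some other providers
        pairs.append(("region", region))
    if url:  # only needed for non-AWS S3-compatible providers
        pairs.append(("url", url))
    # bucket, access key, and secret key are always required
    pairs.append(("bucket", bucket))
    pairs.append(("accessKey", access_key))
    pairs.append(("secretKey", secret_key))

    for key, value in pairs:
        # A stray space or comma (easy to pick up when pasting keys) would break the format.
        if any(ch.isspace() for ch in value) or "," in value:
            raise ValueError(f"{key} must not contain spaces or commas")

    return ",".join(["s3"] + [f"{key}:{value}" for key, value in pairs])


# Placeholder credentials for illustration only.
print(build_s3_address("jase-p4-s3-archive", "AKIAEXAMPLE", "EXAMPLESECRET",
                       region="us-west-2"))
# s3,region:us-west-2,bucket:jase-p4-s3-archive,accessKey:AKIAEXAMPLE,secretKey:EXAMPLESECRET
```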

And now if I type p4 depots, I can see all of my depots here, including this new s3-archive-demo.

P4: Archiving Files
I'm going to clear my screen, and let's go look at the documentation real quick. Further up on this same page where we were, we can see some instructions about creating and restoring files from an archive depot, but I just want to grab this link to go to the command-line reference for p4 archive.

I definitely recommend reading through this so you understand all of the options here, but some important ones: the -z flag, which makes it so that it will also archive files that have been branched to or from another revision.

Then there's -h, which means do not archive head revisions. This is very useful if you just want to archive all the history but still have easy online access to the latest revisions of everything.

One other one worth noting here is -t, which will also archive text files. By default, this is only going to archive binary files, or text files that are stored as binary data. Normally text files are small enough that you're not going to gain a lot by moving those off to an archive depot. However, if you want to, you can use the -t flag for that as well.

So I could archive this entire depot, but just for this example, I'm going to archive this one particular folder inside of it. If I navigate to that in my depot view here on the left, I will get this depot path up here, and I'm going to copy that. So the command will be p4 archive, dash uppercase D (-D), and then I have to give the name of the archive depot I want to send it to, which is this s3-archive-demo up here.

And then I can add some of these flags if I want. In my case, I'll add -t and -z to say I want even branched files and I also want text files, just to be thorough here. And then I'm going to paste in the depot path that we copied.

So I'll just copy that, come here, and paste that in. I do need to put this in quotes because I have a space in my file name here, which is generally a bad idea for exactly this reason. I'm going to add dot dot dot (...) at the end, and then hit enter.
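Put together, the command typed above looks roughly like the output of this Python sketch. The depot path is a hypothetical stand-in, since the exact path is only visible on screen in the video, and the function name is made up for this example. The -D, -h, -t, and -z flags are the ones described in this section.

```python
import shlex


def build_archive_command(archive_depot, depot_path,
                          include_text=False, include_branched=False,
                          keep_head=False):
    """Return the argument list for a p4 archive invocation."""
    args = ["p4", "archive", "-D", archive_depot]
    if keep_head:
        args.append("-h")  # do not archive head revisions
    if include_text:
        args.append("-t")  # also archive text files
    if include_branched:
        args.append("-z")  # also archive revisions branched to or from others
    args.append(depot_path)
    return args


# Hypothetical depot path containing a space, as in the demo.
cmd = build_archive_command("s3-archive-demo",
                            "//archive-demo/old game/video1/...",
                            include_text=True, include_branched=True)

# shlex.join adds the quoting that a shell needs for the space in the path.
print(shlex.join(cmd))
# p4 archive -D s3-archive-demo -t -z '//archive-demo/old game/video1/...'
```

If you run this from a script, passing the list itself to subprocess.run avoids the quoting question entirely, because no shell is involved.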

And now this is running through, and it's archiving those files off to that S3 storage. How long this takes depends on the size of the files and on whether your storage is on the same service as your server: if your server's on AWS and your S3 is also on AWS, this will be very fast. In this case, I'm actually using a server not hosted on AWS, so it's having to upload those to AWS to store them. You can see it came back, and these were some largish video and audio files. If I come back now, these files are gone from here because I've archived them, and they are no longer taking up space on my server. I find this especially useful for archiving an entire project, or for archiving the entire history of a particularly large set of files so that you just keep the head revision.

If we now come back to our bucket and refresh, we will see that we have a folder called p4, 1, depots, archive demo, old game, and you can see this whole path that we had from before has been recreated, with files, or rather folders, for each of these, which would contain all of their different revisions. So you can see this is two and a half gigs that I just saved right here, and I can see that all of these are here.

P4: Restoring Archived Files
And now I can continue on working. If I ever decide that I need to get these files back, what I can do is restore them.

Let's go back one more time to our documentation here. In this case, I'm going to click on this link for the p4 restore command. You can also get to this directly from the Helix Core command-line reference, which is one of my main bookmarks that I use all the time. And again, you can read the information there. This one's generally a lot simpler: you're just going to specify the depot from which you want to restore, and then you'll give the path of the files you want to put back.

So you don't have to get everything back out of the archive if you don't want to. So let's go through that here real quick.

Now, when you want to restore this, it's generally a best practice to use the p4 verify command first, like this: p4 verify -A, to tell it you want to verify an archive depot, and then we put in the S3 archive depot name here, slash dot dot dot (/...), or we could give the specific path that we want to verify.

What that does is just make sure that no corruption or anything has happened to those files while they've been in storage. This is honestly less of a concern, in my opinion, when it comes to cloud storage. But if you had this on an actual physical disk that you just stuck in a closet for several years, it's very possible that that data is going to degrade over time. So verify is going to check for that and make sure you're not going to bring back files that have some kind of data corruption in them.

And now to get this back, we're just going to do a very similar command. We're going to go p4 restore, dash uppercase D (-D), and again, we'll give the name of our archive depot, and then the path of where those files are going to go back to.

So in this case, this will be the same path that I used up here, so I'm just going to copy it. If this were a whole project, we could also say we just want to get back everything. In this case, we only archived this one folder, but let's say we had gradually archived various folders over time. What I could say then is: hey, everything that used to be inside of this old game depot, get me back all of that. In this case, it'll just be the same files. And of course I have a typo here, which is why I like to copy and paste: s3-archive-demo is what I called that depot, not "depo". I'll just hit enter there, and you can see it's restoring these files back. That went much faster than it did when it was uploading them. And now if I come back to P4V and hit refresh, I will see that this video1 folder is back.

All those files are back here. And if I were to sync my workspace, I would get all those files back.
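The verify-then-restore sequence from this section can be sketched the same way in Python. This assumes that the archive-depot option of p4 verify is the uppercase -A, which is worth confirming in the command reference. The depot path is again a hypothetical stand-in for the one used in the video, and the function names are made up for this example.

```python
import shlex


def build_verify_command(archive_depot):
    """Argument list to verify everything stored in an archive depot."""
    return ["p4", "verify", "-A", f"//{archive_depot}/..."]


def build_restore_command(archive_depot, depot_path):
    """Argument list to restore archived files back to their original depot path."""
    return ["p4", "restore", "-D", archive_depot, depot_path]


# Verify first, then restore only the folder that was archived.
steps = [
    build_verify_command("s3-archive-demo"),
    build_restore_command("s3-archive-demo",
                          "//archive-demo/old game/video1/..."),
]
for step in steps:
    print(shlex.join(step))
# p4 verify -A //s3-archive-demo/...
# p4 restore -D s3-archive-demo '//archive-demo/old game/video1/...'
```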

Conclusion
I hope this detailed tutorial on S3 storage for archive depots in Helix Core was helpful for you. Please let us know in the comments if there are more things you'd like to see, or if you have questions. And of course, check out our YouTube channel for more videos, and go to perforce.com for the latest updates and software downloads, as well as all the documentation that I mentioned in this video.

Course - Getting Started with Helix Core - For Admins