Thursday, February 3, 2011

How to allow wget to overwrite files

Hi,

Using the wget command, how do I instruct it to overwrite my local file every time, irrespective of how many times I invoke it?

Let's say, I want to download a file from the location: http://server/folder/file1.html

Here, whenever I run wget http://server/folder/file1.html, I want file1.html to be overwritten on my local system, irrespective of whether it has changed or has already been downloaded. My intention/use case is that when I call wget, I am sure I want to replace/overwrite the existing file.

I've tried out the following options, but each is intended for some other purpose.

  1. -nc => --no-clobber
  2. -N => Turn on time-stamping
  3. -r => Turn on recursive retrieving
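
For reference (and using the example URL above), those flags correspond to invocations like the following, each serving a different purpose rather than forcing an overwrite:

    wget -nc http://server/folder/file1.html   # skips the download if file1.html already exists
    wget -N  http://server/folder/file1.html   # re-downloads only if the remote copy is newer
    wget -r  http://server/folder/file1.html   # recursive retrieval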
  • I don't think you can do it unless you also download the directories (so pass the -x flag). If you know what the file is, you can use -O filename, for example:
    wget http://yourdomain.com/index.html -O index.html

  • wget -q http://www.whatever.com/filename.txt -O /path/filename.txt 
    

    -q is quiet mode, so you can put it in a cron job without any output from the command (a crontab sketch follows this answer).

    Gnanam : @aleroot So there is no direct option in the `wget` command that does this without my explicitly specifying `-O filename`?
    aleroot : It seems there is no way to force overwriting every file when downloading with wget. However, the -N option can force downloading and overwriting of newer files: wget -N will overwrite the original file if the size or timestamp changes.
    rasjani : Not true. Direct the output of the command to stdout and redirect it to the file: wget -q -O - $urlYouNeedToGrab > $fileYouWantToOverwriteEverytime
    From aleroot
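
    A minimal sketch of the cron approach mentioned above, reusing the hypothetical URL and path from this answer (edit the crontab with crontab -e):

    # fetch the file every hour, quietly, always writing over /path/filename.txt via -O
    0 * * * * wget -q http://www.whatever.com/filename.txt -O /path/filename.txt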
  • Untried: maybe you can work with wget -r --level=0.

    Another possibility: curl -O overwrites (but it uses a different way of choosing the file name, which may or may not matter to you). Both suggestions are sketched below.

    From Gilles
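
    Untried sketches of the two suggestions above, using the URL from the question:

    wget -r --level=0 http://server/folder/file1.html   # recursive retrieval (untried suggestion above)
    curl -O http://server/folder/file1.html             # curl names the file file1.html and overwrites an existing copy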
  • Use curl instead?

    curl http://server/folder/file1.html > file1.html
    
    Gnanam : @StuThompson I'm not a Linux expert. What is the basic difference between `wget` and `curl`? I'm sure that each command is meant for some specific purpose.
    Stu Thompson : @Gnanam: They overlap a lot in basic CLI utility, actually. Both can make an HTTP connection and save the result to disk. For a rundown of the differences, check out http://daniel.haxx.se/docs/curl-vs-wget.html Regardless, the above usage is completely valid. There are other tools in this general area, too: http://curl.haxx.se/docs/comparison-table.html
    Gnanam : @StuThompson Those 2 links are really helpful to understand the difference.
  • Why not put a small wrapper around wget in your script?

    The script could move all the files to a temporary location, then wget the remote files / web pages.

    On success, delete the files in the temporary location; on failure, move the files back and raise an error (a sketch of this approach follows below).

    There isn't a simple way to do what you want using just wget unless you know the names of all the files, in which case the -O option will allow you to force the filename of the downloaded file.
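
    A minimal sketch of such a wrapper as a shell script; the URL, file name, and backup location are placeholders based on the question:

    #!/bin/sh
    # Stash the current copy, try the download, restore the old copy on failure.
    URL="http://server/folder/file1.html"
    FILE="file1.html"
    BAK="$FILE.bak"

    mv -f "$FILE" "$BAK" 2>/dev/null      # move any existing copy aside

    if wget -q "$URL" -O "$FILE"; then
        rm -f "$BAK"                      # success: discard the stashed copy
    else
        mv -f "$BAK" "$FILE" 2>/dev/null  # failure: restore the old copy
        echo "download of $URL failed" >&2
        exit 1
    fi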

  • Like this:

    wget -q -O - $URL > $FILE

    From rasjani
