Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make 'ml restore' behave the same for broken collections/inconsistent collections #329

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

blappm
Copy link

@blappm blappm commented Nov 3, 2017

Make 'ml restore' behave the same for broken collections/inconsistent collections. At the moment it is possible to load a collection if there are modules missing, but it's not possible to load if a checksum has changed.

If a checksum has changed, the collection is lost, since '--force' is not available. So I've added also the --force parameter to be consistent with the other parts of lmod:

ml restore 'collectionname' --force

@@ -1174,7 +1175,13 @@ function M.getMTfromFile(self,tt)
activeT = nil -- done with activeT
if (#aa > 0) then
sort(aa)
LmodWarning{msg="w_Mods_Not_Loaded",module_list=concatTbl(aa," ")}
if (force ~= true) then
LmodWarning{msg="w_Missing_Coll", collectionName = collectionName, module_list = concatTbl(aa,"\", \"")}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not make this a LmodError if you exit anyway?

@@ -527,7 +527,7 @@ end
-- state. If a user has a "default" then use that collection.
-- Otherwise do a "Reset()"
-- @param collection The user supplied collection name. If *nil* the use "default"
function Restore(collection)
function Restore(collection, force)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add any new arguments to the doc string of the function.

@@ -565,6 +565,12 @@ function Restore(collection)


local masterTbl = masterTbl()
local force
if (masterTbl.force) then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the point of this?

local force = masterTbl.force

does the same? And if you only need to pass it as an argument, do you really need a local variable?

if (force ~= true) then
LmodWarning{msg="w_Missing_Coll", collectionName = collectionName, module_list = concatTbl(aa,"\", \"")}
LmodErrorExit()
return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a change from the current behaviour?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. And I considered that as a bug, Robert too as it seems :)

@rtmclay
Copy link
Member

rtmclay commented Nov 4, 2017

Thanks for reporting to me that it is possible for a collection to load even if it is missing a module. That is a bug and I'll try to fix it.

As for this PR. it makes me nervous. The point of restore is to have it behave EXACTLY as if the user did the loads by hand. Lmod goes to a lot of trouble to make sure that the restore is correct. I don't think it is a good idea to allow users to get sorta what they asked for.

Why not just force users to rebuild their collection? Comments?

@blappm
Copy link
Author

blappm commented Nov 4, 2017

Hi Robert,

Rebuilding the collection is painful. It's much more easier to load the
missing parts and the save the collection again. My patch allows this with --force.

We have some modules which change frequently (License Ports) because
new licenses are added, and other, similar stuff. With the current behaviour
module collections are almost useless since users need to rebuild them
by hand - over and over again.

@wpoely86
Copy link
Contributor

wpoely86 commented Nov 4, 2017

@blappm Can't you put the stuff that changes in a separate file and load that in the module? So the actual module file remains unchanged.

@blappm
Copy link
Author

blappm commented Nov 4, 2017

There are other examples: A missing variable in one module. We rebuild the modules
on all applications for this software and currently all module collections containing this version
can't be loaded anymore. You should really give an option (--force) to the people knowing
what they do, and not act as a guardian. IMHO the warning is enough.

@rtmclay
Copy link
Member

rtmclay commented Nov 4, 2017

The design of the collection hash is to only checks for any kinds of load() and prepend_path() and append_path() to MODULEPATH. Any other changes in a modulefile should be ignored. If that is not the case then there is a bug in the way that the collection hash is computed and that needs to be fixed.

Could you give me a modulefile that cause the hash value to changes that doesn't change the loads or changes MODULEPATH? Thanks!

@blappm
Copy link
Author

blappm commented Nov 6, 2017

I'll have to try and test this again in this case. The error we get if the modulepath has changed is different and it's missleading too: A notice that the collection hasn't been loaded is not given then, just a 'warning' that one needs to rebuild:

ml restore test
Restoring modules from user's test
Lmod Warning: The system MODULEPATH has changed: please rebuild your saved collection.

The whole concept about this modulepath in the collections is broken IMHO and makes it very difficult to exchange collections between users. Even if all modules are available but the module paths are not in the same order or one modulepath is twice in the list, loading fails.

@blappm
Copy link
Author

blappm commented Nov 6, 2017

Why not save the modulepath as well in the collection and try to load the modules with this one ?

@rtmclay
Copy link
Member

rtmclay commented Nov 6, 2017

The principal design criteria for collections is they produce the same environment as if the user did the loads by hand or sourced a shell script. That is their only function. Collections are not designed to be shared between users. How could they be? They depend on what the current $MODULEPATH. Also collections do not add to the current modules. Restoring a collection first purges ALL modules then loads the list.

If a user does:

 $ module use $HOME/my_modules; module load foo; module save bar

Then the bar collection adds ~/my_modules to MODULEPATH and remembers that. That way when a user does "module restore bar" they don't have to first remember to do the module use before restoring the bar collection.

Also what modules get loaded is highly dependent on $MODULEPATH and duplicate directories are a very bad idea.

If users want to share the same set of modules, they can easily create a shell script that does:

module use /a/b/c
module purge  # This is optional
module load a b c

and everybody can source that script. This script is even shell independent!

Lmod does use the remembered MODULEPATH to load the module files. BUT it first checks to see that the old system modulepath and the new one are the same. If they are not, Lmod errors out. This is there to make sure that the restored collection matches what the user would load by hand.

I will improve the error message when the system modulepath are different.

@blappm
Copy link
Author

blappm commented Nov 6, 2017

Our users liked to use collections exactly for that: Saveing their working module collection in a way other users (for example in the same group) can use it (of course without manual interactions). The problem is that there is no nice shortcut to do it.

If you like to keep module collections for privat use only, please think then about adding something like ml export, which could create a shellscript with 'ml use, ml purge, ml load ...'. You'd make a lot of people happy, I promise you.

@wpoely86
Copy link
Contributor

wpoely86 commented Nov 6, 2017

@blappm why don't you make it a module instead of a collection? You can use try_load? It is much easier to share then a collection?

@blappm
Copy link
Author

blappm commented Nov 7, 2017

@wpoely86 Not really an option. Our users like to share their ENV for an application, exactly like a collection. If they just use 'try load' they maybe have other stuff loaded which doesn't fit. And they like to save and load those modules as easy as possible. Of course it's possible to do it by hand, but they like that fully automated. At the moment some of them are using 'ml list' piped into sed/awk blabla
to create some sort of shell script - but this is ugly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants