<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glusterfs.git/xlators/mgmt/glusterd/src/glusterd-op-sm.h, branch v3.6.3beta2</title>
<subtitle></subtitle>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/'/>
<entry>
<title>glusterd : release cluster wide locks in op-sm during failures</title>
<updated>2015-03-04T07:31:08+00:00</updated>
<author>
<name>Atin Mukherjee</name>
<email>amukherj@redhat.com</email>
</author>
<published>2014-10-27T06:42:03+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=b646678334f4fab78883ecc1b993ec0cb1b49aba'/>
<id>b646678334f4fab78883ecc1b993ec0cb1b49aba</id>
<content type='text'>
The glusterd op-sm infrastructure has loopholes in handling error cases in
the locking/unlocking phases, which can leave stale locks behind and block
further transactions from going through.

This patch still doesn't handle all possible unlocking error cases, as the
framework has neither a retry mechanism nor a lock timeout. For example, if
unlocking fails on one of the peers, the cluster-wide lock is not released and
no further transactions can be made until the originator node (or the node
where unlocking failed) is restarted.

Following test cases were executed (with the help of gdb) after applying this
patch:

* RPC times out in lock cbk
* Decoding of RPC response in lock cbk fails
* RPC response is received from unknown peer in lock cbk
* Setting peerinfo in dictionary fails while sending lock request for first peer
  in the list
* Setting peerinfo in dictionary fails while sending lock request for other
  peers
* Lock RPC could not be sent for peers

For all of the above test cases, the success criterion is that no stale locks remain.
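
As an illustration (added here, not part of the original commit message; the
volume name is arbitrary and the exact message wording varies by release), a
stale cluster-wide lock typically surfaces as every subsequent CLI operation
being rejected until the offending glusterd is restarted:

[root@host1 ~]# gluster volume set testvol performance.cache-size 256MB
volume set: failed: Another transaction is in progress. Please try again after sometime.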

Patch link : http://review.gluster.org/9012

Change-Id: Ia1550341c31005c7850ee1b2697161c9ca04b01a
BUG: 1179136
Signed-off-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/9012
Reviewed-by: Krishnan Parthasarathi &lt;kparthas@redhat.com&gt;
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Kaushal M &lt;kaushal@redhat.com&gt;
Reviewed-on: http://review.gluster.org/9393
Reviewed-by: Raghavendra Bhat &lt;raghavendra@redhat.com&gt;
</content>
</entry>
<entry>
<title>glusterd : Volname, brickpath &amp; volfpath length validation</title>
<updated>2014-05-03T15:20:06+00:00</updated>
<author>
<name>Atin Mukherjee</name>
<email>amukherj@redhat.com</email>
</author>
<published>2014-04-08T11:40:25+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=e73fc9939aecfa9f7955653d02f12243aba02fc6'/>
<id>e73fc9939aecfa9f7955653d02f12243aba02fc6</id>
<content type='text'>
While creating a volume or adding a brick, the _POSIX_PATH_MAX validation is
done on the absolute pathname instead of the relative pathname. Because of this,
a brick path shorter than _POSIX_PATH_MAX may still fail the validation if the
directory length is greater than _POSIX_PATH_MAX - strlen(brickpath/volume name).
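
For orientation (an illustration added here, not part of the original commit
message): _POSIX_PATH_MAX is only the minimum path limit POSIX guarantees
(256 bytes); the limit a filesystem actually enforces is usually much larger
and can be queried with getconf, which typically reports 4096 on Linux:

[root@host1 ~]# getconf PATH_MAX /
4096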

This fix also corrects one CLI response message: when the brick-path length
validation passes but the volume-file length validation fails, the message now
says that the volume file is too long instead of saying that the brick path is
too long.

It is also important to note that with the current design of volfile naming, it
cannot be guaranteed that the volume name and brick path can each have a maximum
of _POSIX_PATH_MAX characters.

Change-Id: I1283d1f9dea96ae797620002c8723719f26a866d
BUG: 1085330
Signed-off-by: Atin Mukherjee &lt;amukherj@redhat.com&gt;
Reviewed-on: http://review.gluster.org/7420
Reviewed-by: Niels de Vos &lt;ndevos@redhat.com&gt;
Reviewed-by: Krishnan Parthasarathi &lt;kparthas@redhat.com&gt;
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
</entry>
<entry>
<title>glusterd/geo-rep: Looks for state_file and pid-file in gsyncd_template.conf</title>
<updated>2014-05-02T03:21:26+00:00</updated>
<author>
<name>Avra Sengupta</name>
<email>asengupt@redhat.com</email>
</author>
<published>2014-01-29T03:06:19+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=3d4a31d304064f88d2d1e414346c790f099743b5'/>
<id>3d4a31d304064f88d2d1e414346c790f099743b5</id>
<content type='text'>
If entries like state_file or pid-file are missing in the gsyncd.conf, or if
the gsyncd.conf itself is missing, glusterd looks for the missing configs in
the gsyncd_template.conf.

Status will display "Config Corrupted" as long as the entry is missing in
the config file. A missing state-file entry in both the config and the
template will not allow starting a geo-rep session.

However, stop force will successfully stop an already running session if
the state-file entries are missing in both the config file and the
template, as long as either of them has a pid-file entry.

If the pid-file entry is missing in the gsyncd.conf file, starting a
geo-rep session will not be allowed.

If the pid-file entry is missing in an already started session, then
stop force will fetch it from the config template and stop the session.

If the pid-file entry is missing in both the config and the template,
stop force will fail with an appropriate error stating that the pid-file
entry is missing.
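
For reference (an illustration added here, not part of the original commit
message; the master volume and slave names are hypothetical), the status and
stop-force behaviour described above is exercised with the usual
geo-replication CLI:

[root@host1 ~]# gluster volume geo-replication mastervol slavehost::slavevol status
[root@host1 ~]# gluster volume geo-replication mastervol slavehost::slavevol stop force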

Change-Id: I81d7cbc4af085d82895bbef46ca732555aa5365d
BUG: 1059092
Signed-off-by: Avra Sengupta &lt;asengupt@redhat.com&gt;
Reviewed-on: http://review.gluster.org/6856
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
</entry>
<entry>
<title>glusterd/Vol-Locks : Moving globals into glusterd priv and code refactoring</title>
<updated>2014-02-14T15:05:30+00:00</updated>
<author>
<name>Avra Sengupta</name>
<email>asengupt@redhat.com</email>
</author>
<published>2014-02-11T02:22:32+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=53779e4458c17a3978675585e8099c97c8c2b3a2'/>
<id>53779e4458c17a3978675585e8099c97c8c2b3a2</id>
<content type='text'>
Moved globals (the vol_lock and txn_opinfo dicts and global_txn_id) into
glusterd priv.

Moved glusterd_op_send_cli_response() out of gd_unlock_op_phase
as gd_unlock_op_phase and glusterd_clear_txn_opinfo should only
be called if the txn id has been successfully generated. The
cli resp should be sent irrespective of that.

Changed log levels from ERROR to WARNING for some volume lock logs
where the messages are expected and do not indicate an error.

Added logs for better transparency of transaction ids.

Change-Id: Ifac9b23aa9f1648c9ae252cfd3ac50bb2ed46728
BUG: 1011470
Signed-off-by: Avra Sengupta &lt;asengupt@redhat.com&gt;
Reviewed-on: http://review.gluster.org/6976
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
</entry>
<entry>
<title>glusterd: Volume locks and transaction specific opinfos</title>
<updated>2014-02-11T07:25:40+00:00</updated>
<author>
<name>Avra Sengupta</name>
<email>asengupt@redhat.com</email>
</author>
<published>2014-02-06T07:33:58+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=97ce783de326b51fcba65737f07db2c314d1e218'/>
<id>97ce783de326b51fcba65737f07db2c314d1e218</id>
<content type='text'>
With this patch we replace the existing cluster-wide lock taken on glusterds
across the cluster with volume locks, which are also taken on glusterds
across the cluster but are volume-specific. With volume locks we can perform
more than one gluster operation at the same time, as long as the operations
are being performed on different volumes.
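
As an illustration (added here, not part of the original commit message; the
volume and option names are arbitrary), per-volume locking means operations
on different volumes no longer serialize on a single cluster-wide lock, so
the following can be issued at the same time from two peers:

[root@host1 ~]# gluster volume set vol1 performance.cache-size 256MB
[root@host2 ~]# gluster volume set vol2 performance.cache-size 256MB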

We maintain a global list of volume locks (using a dict as a list), where
the key is the volume name and the value saves the uuid of the originator
glusterd. These locks are held and released per volume transaction.

In order to achieve multiple gluster operations occurring at the same time,
we also separate opinfos in the op-state-machine as a part of this patch. To
do so, we generate a unique transaction-id (uuid) per gluster transaction.
An opinfo is then associated with this transaction id, which is used
throughout the transaction. We maintain a run-time global list (using a
dict) of transaction-ids and their respective opinfos to achieve this.

Upstream Feature Page: http://www.gluster.org/community/documentation/index.php/Features/glusterd-volume-locks

Change-Id: Iaad505a854bac8de8f83beec0357eb6cde3f7ea8
BUG: 1011470
Signed-off-by: Avra Sengupta &lt;asengupt@redhat.com&gt;
Reviewed-on: http://review.gluster.org/5994
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
</entry>
<entry>
<title>glusterd/geo-rep : Allow volume to stop if geo-rep is not active.</title>
<updated>2014-01-15T06:24:58+00:00</updated>
<author>
<name>Kotresh H R</name>
<email>khiremat@redhat.com</email>
</author>
<published>2014-01-08T05:22:28+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=2f499e85a4ae0ac1e84785daa60a5bbfe979cc7f'/>
<id>2f499e85a4ae0ac1e84785daa60a5bbfe979cc7f</id>
<content type='text'>
In the staging phase of volume stop, code is added to read the state_file
for each slave of the master to which the volume belongs. If the geo-rep
session is active with at least one slave, the volume is not allowed to
stop; otherwise it is allowed.
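
Illustration (added here, not part of the original commit message; the volume
and slave names are hypothetical) of the intended behaviour:

[root@host1 ~]# gluster volume stop mastervol
(rejected while a geo-rep session of mastervol is active)
[root@host1 ~]# gluster volume geo-replication mastervol slavehost::slavevol stop
[root@host1 ~]# gluster volume stop mastervol
(succeeds once no geo-rep session of mastervol is active)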

Change-Id: I4a01a357fc86b872e9635b3d19998cdbd9545114
BUG: 1049727
Signed-off-by: Kotresh H R &lt;khiremat@redhat.com&gt;
Reviewed-on: http://review.gluster.org/6663
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
</entry>
<entry>
<title>mgmt/glusterd: fix undefined symbol error related to BD</title>
<updated>2013-11-20T06:11:29+00:00</updated>
<author>
<name>Pranith Kumar K</name>
<email>pkarampu@redhat.com</email>
</author>
<published>2013-11-19T10:08:57+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=e0dbbe851baf564037edc3b967563730a0ed9c81'/>
<id>e0dbbe851baf564037edc3b967563730a0ed9c81</id>
<content type='text'>
Change-Id: I2210f1ac7de04c6025c0ec02d998b626d41466ae
BUG: 1028672
Signed-off-by: Pranith Kumar K &lt;pkarampu@redhat.com&gt;
Reviewed-on: http://review.gluster.org/6303
Reviewed-by: M. Mohan Kumar &lt;mohan@in.ibm.com&gt;
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
</entry>
<entry>
<title>bd: posix/multi-brick support to BD xlator</title>
<updated>2013-11-13T19:38:42+00:00</updated>
<author>
<name>M. Mohan Kumar</name>
<email>mohan@in.ibm.com</email>
</author>
<published>2013-11-13T17:14:42+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=48c40e1a42efe1b59126406084821947d139dd0e'/>
<id>48c40e1a42efe1b59126406084821947d139dd0e</id>
<content type='text'>
The current BD xlator (block backend) has a few limitations, such as:
* Creation of directories not supported
* Supports only single brick
* Does not use extended attributes (and client gfid) like posix xlator
* Creation of special files (symbolic links, device nodes etc) not
  supported

The basic limitation of not allowing directory creation blocks oVirt/VDSM
from consuming the BD xlator as part of a Gluster domain, since VDSM
creates multi-level directories when GlusterFS is used as the storage
backend for storing VM images.

To overcome these limitations, a new BD xlator with the following
improvements is suggested:

* New hybrid BD xlator that handles both regular files and block device
  files
* The volume will have both POSIX and BD bricks. Regular files are
  created on POSIX bricks, block devices are created on the BD brick (VG)
* BD xlator leverages the existing POSIX xlator for most POSIX calls and
  hence sits above the POSIX xlator
* Block device file is differentiated from regular file by an extended
  attribute
* The xattr 'user.glusterfs.bd' (BD_XATTR) plays a role in mapping a
  posix file to Logical Volume (LV).
* When a client sends a request to set BD_XATTR on a posix file, a new
  LV is created and mapped to posix file. So every block device will
  have a representative file in POSIX brick with 'user.glusterfs.bd'
  (BD_XATTR) set.
* Hereafter, all operations on this file result in LV-related
  operations.

For example, opening a file that has BD_XATTR set results in opening
the LV block device, and reading the file results in reading the
corresponding LV block device.

When the BD xlator gets a request to set BD_XATTR via a setxattr call, it
creates an LV, and information about this LV is placed in the xattr of the
posix file. The xattr "user.glusterfs.bd" is used to identify that the posix
file is mapped to a BD.

Usage:
Server side:
[root@host1 ~]# gluster volume create bdvol host1:/storage/vg1_info?vg1 host2:/storage/vg2_info?vg2
It creates a distributed gluster volume 'bdvol' with Volume Group vg1
using posix brick /storage/vg1_info in host1 and Volume Group vg2 using
/storage/vg2_info in host2.

[root@host1 ~]# gluster volume start bdvol

Client side:
[root@node ~]# mount -t glusterfs host1:/bdvol /media
[root@node ~]# touch /media/posix
It creates a regular posix file 'posix' in either the host1:/vg1 or host2:/vg2 brick
[root@node ~]# mkdir /media/image
[root@node ~]# touch /media/image/lv1
It also creates a regular posix file 'lv1' in either the host1:/vg1 or
host2:/vg2 brick
[root@node ~]# setfattr -n "user.glusterfs.bd" -v "lv" /media/image/lv1
[root@node ~]#
The above setxattr results in creating a new LV in the corresponding brick's VG
and it sets 'user.glusterfs.bd' with the value 'lv:&lt;default-extent-size&gt;'
[root@node ~]# truncate -s5G /media/image/lv1
It results in resizing LV 'lv1' to 5G

New BD xlator code is placed in xlators/storage/bd directory.

Also add the volume-uuid to the VG so that the same VG can't be used for other
bricks/volumes. After deleting a gluster volume, one has to manually
remove the associated tag using vgchange &lt;vg-name&gt; --deltag
&lt;trusted.glusterfs.volume-id:&lt;volume-id&gt;&gt;

Changes from previous version V5:
* Removed support for delayed deleting of LVs

Changes from previous version V4:
* Consolidated the patches
* Removed usage of BD_XATTR_SIZE and consolidated it in BD_XATTR.

Changes from previous version V3:
* Added support in FUSE to support full/linked clone
* Added support to merge snapshots and provide information about origin
* bd_map xlator removed
* iatt structure used in inode_ctx. iatt is cached and updated during
fsync/flush
* aio support
* Type and capabilities of volume are exported through getxattr

Changes from version 2:
* Used inode_context for caching BD size and to check if loc/fd is BD or
  not.
* Added GlusterFS server offloaded copy and snapshot through setfattr
  FOP. As part of this libgfapi is modified.
* BD xlator supports stripe
* During unlinking, if an LV file is already opened, it's added to a delete
  list and bd_del_thread tries to delete it from this list when the last
  reference to that file is closed.

Changes from previous version:
* gfid is used as name of LV
* ? is used to specify VG name for creating BD volume in volume
  create, add-brick. gluster volume create volname host:/path?vg
* open-behind issue is fixed
* A replicate brick can be added dynamically and LVs from source brick
  are replicated to destination brick
* A distribute brick can be added dynamically and rebalance operation
  distributes existing LVs/files to the new brick
* Thin provisioning support added.
* bd_map xlator support retained
* setfattr -n user.glusterfs.bd -v "lv" creates a regular LV and
  setfattr -n user.glusterfs.bd -v "thin" creates thin LV
* Capability and backend information added to gluster volume info (and
  --xml) so that management tools can exploit the BD xlator.
* tracing support for bd xlator added

TODO:
* Add support to display snapshots for a given LV
* Display posix filename for list-origin instead of gfid

Change-Id: I00d32dfbab3b7c806e0841515c86c3aa519332f2
BUG: 1028672
Signed-off-by: M. Mohan Kumar &lt;mohan@in.ibm.com&gt;
Reviewed-on: http://review.gluster.org/4809
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Anand Avati &lt;avati@redhat.com&gt;
</content>
</entry>
<entry>
<title>glusterd : Improved quota volume reset command</title>
<updated>2013-10-28T18:39:21+00:00</updated>
<author>
<name>Anuradha</name>
<email>atalur@redhat.com</email>
</author>
<published>2013-10-24T09:33:48+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=fc86b3a22ab0519652f74ef8a75cf1cbfa290fb8'/>
<id>fc86b3a22ab0519652f74ef8a75cf1cbfa290fb8</id>
<content type='text'>
The quota volume reset command without the "force"
option is fixed and doesn't fail anymore. It resets
unprotected fields and not the protected ones.

Also, an appropriate message is provided to the user
for the following cases:
1. Only unprotected fields are reset; the "force" option
should be used to reset protected fields.
2. Both protected and unprotected fields are reset.
3. No field was reset; the "force" option is required.

A test case for the same is also added.

Change-Id: I24e8f1be87b79ccd81bf6f933e00608b861c7a16
BUG: 1022905
Signed-off-by: Anuradha &lt;atalur@redhat.com&gt;
Reviewed-on: http://review.gluster.org/6135
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Krishnan Parthasarathi &lt;kparthas@redhat.com&gt;
Reviewed-by: Vijay Bellur &lt;vbellur@redhat.com&gt;
</content>
</entry>
<entry>
<title>cluster/afr: [Feature] Command implementation to get heal-count</title>
<updated>2013-10-14T21:41:54+00:00</updated>
<author>
<name>Venkatesh Somyajulu</name>
<email>vsomyaju@redhat.com</email>
</author>
<published>2013-10-07T08:17:47+00:00</published>
<link rel='alternate' type='text/html' href='http://git.gluster.org/cgit/glusterfs.git/commit/?id=75caba63714c7f7f9ab810937dae69a1a28ece53'/>
<id>75caba63714c7f7f9ab810937dae69a1a28ece53</id>
<content type='text'>
Currently, to know the number of files to be healed, the user either has to
go to the backend and check the number of entries present in the
indices/xattrop directory, or can run the gluster volume heal vol-name info
command. But if a volume consists of a large number of bricks, going to each
backend and counting the number of entries is a time-consuming task, and
with the info command, if the number of entries in the indices/xattrop
directory is very high, it also consumes time.

So, as a feature, a new command is implemented.

Command 1: gluster volume heal vn statistics heal-count
This command gets the number of entries present in
every brick of a volume. The output displays only the
entry count.

Command 2: gluster volume heal vn statistics heal-count
           replica 192.168.122.1:/home/user/brickname

           This form is used when we are concerned with just one replica:
providing any one brick of a replica gets the number of entries to be
healed for that replica only.

Example:
Replicate volume with replica count 2.

Backend status:
--------------
[root@dhcp-0-17 xattrop]# ls -lia | wc -l
1918

NOTE: Out of 1918, 2 entries are &lt;xattrop-gfid&gt; dummy
entries, so the actual number of entries to be healed is
1916.

[root@dhcp-0-17 xattrop]# pwd
/home/user/2ty/.glusterfs/indices/xattrop

Command output:
--------------
Gathering count of entries to be healed on volume volume3 has been successful

Brick 192.168.122.1:/home/user/22iu
Status: Brick is Not connected
Entries count is not available

Brick 192.168.122.1:/home/user/2ty
Number of entries: 1916

Change-Id: I72452f3de50502dc898076ec74d434d9e77fd290
BUG: 1015990
Signed-off-by: Venkatesh Somyajulu &lt;vsomyaju@redhat.com&gt;
Reviewed-on: http://review.gluster.org/6044
Tested-by: Gluster Build System &lt;jenkins@build.gluster.com&gt;
Reviewed-by: Anand Avati &lt;avati@redhat.com&gt;
</content>
</entry>
</feed>
